I have many cores and one of them is pegged at 100%. It looks like this could benefit from threading. Heaptrack is built from master as of a few days ago. I am running:

heaptrack_gui heaptrack.ts-scc.29350.gz

file:
-rw-r----- 1 user user 815811007 Oct 27 16:09 heaptrack.app.29350.gz

ps output:
 2924 pts/32   Sl+    6:56 heaptrack_gui heaptrack.app.29350.gz
the heaptrack data file needs to be parsed linearly (e.g. to handle a deallocation, you need to find the matching allocation that came before it). I'll have to investigate whether other stuff could be parallelized, but it's not an easy task. patches welcome ;-)
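To illustrate the ordering constraint, here is a minimal sketch (the names below are hypothetical, not heaptrack's actual types): a deallocation event can only be resolved against the live-allocation state built up from the events parsed before it, which is what forces a single linear pass.

```cpp
#include <cstdint>
#include <unordered_map>

struct AllocationInfo
{
    std::uint64_t size = 0;
};

// live allocations, keyed by pointer value
std::unordered_map<std::uint64_t, AllocationInfo> liveAllocations;

void handleAlloc(std::uint64_t ptr, std::uint64_t size)
{
    liveAllocations[ptr] = {size};
}

void handleFree(std::uint64_t ptr)
{
    // only meaningful if the matching allocation event was already
    // parsed - processing events out of order would lose this pairing
    liveAllocations.erase(ptr);
}
```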
Git commit a189ad4d2aa09e7afcd47987bdc75537dd22d5d3 by Milian Wolff.
Committed on 16/03/2018 at 23:51.
Pushed by mwolff into branch 'master'.

Optimize: only map trace indices for allocation infos

Previously, we used to run the binary search to map a trace index to an allocation object on every (de)allocation. These events occur extremely often, of course - usually orders of magnitude more often than the allocation info events.

Now, we only map the trace index to an allocation when a new allocation info is parsed. This way, we don't need to run the slow binary search and can access the allocation object directly through the mapped index in the allocation info.

For a large data file (~13GB uncompressed) the results are quite impressive: before this patch, heaptrack_print took ca. 3min to parse the zstd compressed data. With this patch applied, we are down to 2min6s!

Before:

 Performance counter stats for 'heaptrack_print heaptrack.Application.19285.zst':

      178798,164042      task-clock:u (msec)       #    0,998 CPUs utilized
                  0      context-switches:u        #    0,000 K/sec
                  0      cpu-migrations:u          #    0,000 K/sec
             30.570      page-faults:u             #    0,171 K/sec
    551.902.999.436      cycles:u                  #    3,087 GHz
  1.540.185.452.300      instructions:u            #    2,79  insn per cycle
    332.833.340.539      branches:u                # 1861,503 M/sec
      1.350.342.839      branch-misses:u           #    0,41% of all branches

      179,193276255 seconds time elapsed

After:

 Performance counter stats for 'heaptrack_print heaptrack.Application.19285.zst':

      125579,754384      task-clock:u (msec)       #    0,999 CPUs utilized
                  0      context-switches:u        #    0,000 K/sec
                  0      cpu-migrations:u          #    0,000 K/sec
             33.982      page-faults:u             #    0,271 K/sec
    393.084.840.177      cycles:u                  #    3,130 GHz
  1.127.147.336.034      instructions:u            #    2,87  insn per cycle
    238.225.815.121      branches:u                # 1897,008 M/sec
        998.456.200      branch-misses:u           #    0,42% of all branches

      125,663808724 seconds time elapsed

M  +20   -10   src/analyze/accumulatedtracedata.cpp
M  +8    -2    src/analyze/accumulatedtracedata.h
M  +2    -1    src/analyze/gui/parser.cpp

https://commits.kde.org/heaptrack/a189ad4d2aa09e7afcd47987bdc75537dd22d5d3
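In rough terms the change amounts to the sketch below; the types here are simplified assumptions, not the real definitions from src/analyze/accumulatedtracedata.cpp. The binary search now runs once per allocation info record, and the far more frequent (de)allocation events become a direct index lookup:

```cpp
#include <cstdint>
#include <vector>

struct Allocation { /* ~48 bytes of accumulated cost data */ };

struct AllocationInfo
{
    std::uint64_t size = 0;
    std::uint32_t allocationIndex = 0; // resolved once, when parsed
};

std::vector<Allocation> allocations;
std::vector<AllocationInfo> allocationInfos;

// hot path for every (de)allocation event: previously a binary search
// over trace indices, now a plain array access via the cached index
Allocation& allocationForEvent(std::uint32_t allocationInfoIndex)
{
    return allocations[allocationInfos[allocationInfoIndex].allocationIndex];
}
```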
Git commit ef4a460cc69310618ec72cbaf284501bd19a6133 by Milian Wolff.
Committed on 16/03/2018 at 23:50.
Pushed by mwolff into branch 'master'.

Optimize AccumulatedTraceData::findAllocation

Instead of sorting the vector of Allocation objects, which are 48 bytes large, introduce a separate sorted vector of pairs of TraceIndex and AllocationIndex, each just 4 bytes large. Lookup and insertion in the middle of this much smaller container is considerably faster, improving the heaptrack_print analysis time by ~10% on one of my larger test files (from ~3 minutes down to 2min40s).

M  +20   -11   src/analyze/accumulatedtracedata.cpp
M  +3    -0    src/analyze/accumulatedtracedata.h
M  +5    -0    src/analyze/allocationdata.h

https://commits.kde.org/heaptrack/ef4a460cc69310618ec72cbaf284501bd19a6133
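The shape of this change is roughly as follows; the types are simplified stand-ins, not heaptrack's exact definitions. Because the sorted container now holds 8-byte pairs instead of 48-byte objects, std::lower_bound touches less memory and vector::insert moves far fewer bytes:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct TraceIndex
{
    std::uint32_t index = 0;
    bool operator<(TraceIndex o) const { return index < o.index; }
};

struct IndexPair
{
    TraceIndex traceIndex;         // 4 bytes
    std::uint32_t allocationIndex; // 4 bytes
};

struct Allocation { /* ~48 bytes of accumulated cost data */ };

std::vector<IndexPair> traceIndexToAllocation; // kept sorted by traceIndex
std::vector<Allocation> allocations;           // unsorted, append-only

Allocation& findAllocation(TraceIndex traceIndex)
{
    auto it = std::lower_bound(traceIndexToAllocation.begin(),
                               traceIndexToAllocation.end(), traceIndex,
                               [](const IndexPair& pair, TraceIndex needle) {
                                   return pair.traceIndex < needle;
                               });
    if (it == traceIndexToAllocation.end() || traceIndex < it->traceIndex) {
        // not found: insert into the small index vector, shifting
        // 8-byte pairs rather than 48-byte Allocation objects
        it = traceIndexToAllocation.insert(
            it, {traceIndex, std::uint32_t(allocations.size())});
        allocations.push_back({});
    }
    return allocations[it->allocationIndex];
}
```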
commit 4edc10044f2743e067b0387d04518b94afd602e1
Author: Milian Wolff <milian.wolff@kdab.com>
Date:   Fri Mar 16 19:47:19 2018 +0100

Optionally use zstd for compression of heaptrack data files

Zstandard is much faster than gzip, drastically improving both record and analysis performance of heaptrack in turn. Since Zstandard support is still missing from upstream boost, this patch introduces a copy of the zstd boost iostream code written by Reimar Döffinger and published at:

https://github.com/rdoeffinger/iostreams/tree/zstd

Many thanks to Reimar for enabling Zstandard support in heaptrack!

Below are some performance numbers for compression and decompression of heaptrack data. The dramatic reduction in compression time significantly reduces the overhead imposed by heaptrack on your system while recording data. And the time saved while decompressing the data speeds up the analysis steps later on.

Performance for compression of 224MB of raw heaptrack data:

gzip:

 Performance counter stats for 'gzip -kf heaptrack.kmail.15607' (5 runs):

       4869,759387      task-clock:u (msec)       #    0,996 CPUs utilized            ( +-  0,70% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
               112      page-faults:u             #    0,023 K/sec                    ( +-  0,52% )
    14.986.414.840      cycles:u                  #    3,077 GHz                      ( +-  0,62% )
    21.816.226.253      instructions:u            #    1,46  insn per cycle           ( +-  0,00% )
     4.022.016.531      branches:u                #  825,917 M/sec                    ( +-  0,00% )
       157.529.308      branch-misses:u           #    3,92% of all branches          ( +-  0,07% )

       4,890694017 seconds time elapsed                                          ( +-  0,60% )

The size of the gzip compressed data file is 15807235 bytes.

zstd:

 Performance counter stats for 'zstd -kf heaptrack.kmail.15607' (5 runs):

        577,288680      task-clock:u (msec)       #    0,995 CPUs utilized            ( +-  0,49% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
               634      page-faults:u             #    0,001 M/sec                    ( +-  0,08% )
     1.626.811.592      cycles:u                  #    2,818 GHz                      ( +-  0,37% )
     2.750.788.523      instructions:u            #    1,69  insn per cycle           ( +-  0,00% )
       312.504.536      branches:u                #  541,331 M/sec                    ( +-  0,00% )
        10.858.277      branch-misses:u           #    3,47% of all branches          ( +-  0,08% )

       0,580079819 seconds time elapsed                                          ( +-  0,67% )

The size of the zstd compressed data file is 16188988 bytes.

Performance for parsing the data files with heaptrack_print, which requires two decompression passes:

gzip:

 Performance counter stats for 'heaptrack_print heaptrack.kmail.15607.gz' (5 runs):

       6180,813184      task-clock:u (msec)       #    0,998 CPUs utilized            ( +-  0,76% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            20.111      page-faults:u             #    0,003 M/sec                    ( +-  0,00% )
    19.138.095.427      cycles:u                  #    3,096 GHz                      ( +-  0,66% )
    36.417.197.208      instructions:u            #    1,90  insn per cycle           ( +-  0,00% )
     7.683.651.885      branches:u                # 1243,146 M/sec                    ( +-  0,00% )
       153.959.798      branch-misses:u           #    2,00% of all branches          ( +-  1,24% )

       6,194282999 seconds time elapsed                                          ( +-  0,71% )

zstd:

 Performance counter stats for 'heaptrack_print heaptrack.kmail.15607.zst' (5 runs):

       4496,786684      task-clock:u (msec)       #    0,999 CPUs utilized            ( +-  0,43% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            24.304      page-faults:u             #    0,005 M/sec                    ( +-  0,00% )
    13.944.273.234      cycles:u                  #    3,101 GHz                      ( +-  0,40% )
    33.232.471.804      instructions:u            #    2,38  insn per cycle           ( +-  0,00% )
     6.912.954.404      branches:u                # 1537,310 M/sec                    ( +-  0,00% )
       120.004.080      branch-misses:u           #    1,74% of all branches          ( +-  0,07% )

       4,501574955 seconds time elapsed                                          ( +-  0,41% )

One of my larger data files is actually seeing even better improvements! The data is about ~13GB when uncompressed. Gzip brings it down to a much more manageable 155MB, but zstd compresses the same data down to only 77MB! Even better, we can parse the zstd compressed data file with heaptrack_print in ca. 3 minutes, while the gzip compressed data file takes a bit over 4 minutes!

CCBUG: 386256
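For reference, reading such a compressed data file through Boost.Iostreams would look roughly like the sketch below. heaptrack bundles Reimar Döffinger's zstd filter; later Boost releases ship an equivalent boost/iostreams/filter/zstd.hpp upstream, but the exact header and availability depend on your Boost version, so treat this as an assumption-laden sketch rather than heaptrack's actual code:

```cpp
#include <boost/iostreams/filter/zstd.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <fstream>
#include <string>

int main()
{
    // the filename is a placeholder, not from the bug report
    std::ifstream file("heaptrack.app.12345.zst", std::ios::binary);

    // chain the decompressor in front of the raw file source
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::zstd_decompressor());
    in.push(file);

    std::string line;
    while (std::getline(in, line)) {
        // each line is one heaptrack event, consumed sequentially
    }
}
```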
quite a few optimizations have landed over the years. if it's still too slow, please create a new bug report and upload a test file that I can use for further optimizations.