I have many cores and one of them is pegged at 100%. It looks like this could benefit from threading. Heaptrack is built from master as of a few days ago. I am running:

heaptrack_gui heaptrack.ts-scc.29350.gz

file:
-rw-r----- 1 user user 815811007 Oct 27 16:09 heaptrack.app.29350.gz

ps output:
 2924 pts/32   Sl+    6:56 heaptrack_gui heaptrack.app.29350.gz
the heaptrack data file needs to be parsed linearly (e.g. to handle a deallocation, you need to find the matching allocation that came before it). I'll have to investigate whether other stuff could be parallelized, but it's not an easy task. patches welcome ;-)
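To illustrate the ordering constraint, here is a minimal sketch (the names below are hypothetical, not heaptrack's actual types): a deallocation event can only be resolved against the live-allocation state built up from the events parsed before it, which is what forces a single linear pass.

```cpp
#include <cstdint>
#include <unordered_map>

struct AllocationInfo
{
    std::uint64_t size = 0;
};

// live allocations, keyed by pointer value
std::unordered_map<std::uint64_t, AllocationInfo> liveAllocations;

void handleAlloc(std::uint64_t ptr, std::uint64_t size)
{
    liveAllocations[ptr] = {size};
}

void handleFree(std::uint64_t ptr)
{
    // only meaningful if the matching allocation event was already
    // parsed - processing events out of order would lose this pairing
    liveAllocations.erase(ptr);
}
```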
Git commit a189ad4d2aa09e7afcd47987bdc75537dd22d5d3 by Milian Wolff.
Committed on 16/03/2018 at 23:51.
Pushed by mwolff into branch 'master'.

Optimize: only map trace indices for allocation infos

Previously, we used to run the binary search to map a trace index to an allocation object on every (de)allocation. These events occur extremely often, of course - usually orders of magnitude more often than the allocation info events.

Now, we only map the trace index to an allocation when a new allocation info is parsed. This way, we don't need to run the slow binary search and can access the allocation object directly through the mapped index in the allocation info.

For a large data file (~13GB uncompressed) the results are quite impressive: before this patch, heaptrack_print took ca. 3min to parse the zstd compressed data. With this patch applied, we are down to 2min6s!

Before:

 Performance counter stats for 'heaptrack_print heaptrack.Application.19285.zst':

      178798,164042      task-clock:u (msec)       #    0,998 CPUs utilized
                  0      context-switches:u        #    0,000 K/sec
                  0      cpu-migrations:u          #    0,000 K/sec
             30.570      page-faults:u             #    0,171 K/sec
    551.902.999.436      cycles:u                  #    3,087 GHz
  1.540.185.452.300      instructions:u            #    2,79  insn per cycle
    332.833.340.539      branches:u                # 1861,503 M/sec
      1.350.342.839      branch-misses:u           #    0,41% of all branches

      179,193276255 seconds time elapsed

After:

 Performance counter stats for 'heaptrack_print heaptrack.Application.19285.zst':

      125579,754384      task-clock:u (msec)       #    0,999 CPUs utilized
                  0      context-switches:u        #    0,000 K/sec
                  0      cpu-migrations:u          #    0,000 K/sec
             33.982      page-faults:u             #    0,271 K/sec
    393.084.840.177      cycles:u                  #    3,130 GHz
  1.127.147.336.034      instructions:u            #    2,87  insn per cycle
    238.225.815.121      branches:u                # 1897,008 M/sec
        998.456.200      branch-misses:u           #    0,42% of all branches

      125,663808724 seconds time elapsed

M  +20   -10   src/analyze/accumulatedtracedata.cpp
M  +8    -2    src/analyze/accumulatedtracedata.h
M  +2    -1    src/analyze/gui/parser.cpp

https://commits.kde.org/heaptrack/a189ad4d2aa09e7afcd47987bdc75537dd22d5d3
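In rough terms the change amounts to the sketch below; the types here are simplified assumptions, not the real definitions from src/analyze/accumulatedtracedata.cpp. The binary search now runs once per allocation info record, and the far more frequent (de)allocation events become a direct index lookup:

```cpp
#include <cstdint>
#include <vector>

struct Allocation { /* ~48 bytes of accumulated cost data */ };

struct AllocationInfo
{
    std::uint64_t size = 0;
    std::uint32_t allocationIndex = 0; // resolved once, when parsed
};

std::vector<Allocation> allocations;
std::vector<AllocationInfo> allocationInfos;

// hot path for every (de)allocation event: previously a binary search
// over trace indices, now a plain array access via the cached index
Allocation& allocationForEvent(std::uint32_t allocationInfoIndex)
{
    return allocations[allocationInfos[allocationInfoIndex].allocationIndex];
}
```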
Git commit ef4a460cc69310618ec72cbaf284501bd19a6133 by Milian Wolff.
Committed on 16/03/2018 at 23:50.
Pushed by mwolff into branch 'master'.

Optimize AccumulatedTraceData::findAllocation

Instead of sorting the vector of Allocation objects, which are 48 bytes large, introduce a separate sorted vector of pairs of TraceIndex and AllocationIndex, each just 4 bytes large. Lookup and insertion in the middle of this much smaller container is considerably faster, improving the heaptrack_print analysis time by ~10% on one of my larger test files (from ~3 minutes down to 2min40s).

M  +20   -11   src/analyze/accumulatedtracedata.cpp
M  +3    -0    src/analyze/accumulatedtracedata.h
M  +5    -0    src/analyze/allocationdata.h

https://commits.kde.org/heaptrack/ef4a460cc69310618ec72cbaf284501bd19a6133
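The shape of this change is roughly as follows; the types are simplified stand-ins, not heaptrack's exact definitions. Because the sorted container now holds 8-byte pairs instead of 48-byte objects, std::lower_bound touches less memory and vector::insert moves far fewer bytes:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct TraceIndex
{
    std::uint32_t index = 0;
    bool operator<(TraceIndex o) const { return index < o.index; }
};

struct IndexPair
{
    TraceIndex traceIndex;         // 4 bytes
    std::uint32_t allocationIndex; // 4 bytes
};

struct Allocation { /* ~48 bytes of accumulated cost data */ };

std::vector<IndexPair> traceIndexToAllocation; // kept sorted by traceIndex
std::vector<Allocation> allocations;           // unsorted, append-only

Allocation& findAllocation(TraceIndex traceIndex)
{
    auto it = std::lower_bound(traceIndexToAllocation.begin(),
                               traceIndexToAllocation.end(), traceIndex,
                               [](const IndexPair& pair, TraceIndex needle) {
                                   return pair.traceIndex < needle;
                               });
    if (it == traceIndexToAllocation.end() || traceIndex < it->traceIndex) {
        // not found: insert into the small index vector, shifting
        // 8-byte pairs rather than 48-byte Allocation objects
        it = traceIndexToAllocation.insert(
            it, {traceIndex, std::uint32_t(allocations.size())});
        allocations.push_back({});
    }
    return allocations[it->allocationIndex];
}
```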
commit 4edc10044f2743e067b0387d04518b94afd602e1
Author: Milian Wolff <milian.wolff@kdab.com>
Date:   Fri Mar 16 19:47:19 2018 +0100

Optionally use zstd for compression of heaptrack data files

Zstandard is much faster than gzip, drastically improving both record and analysis performance of heaptrack in turn. Since Zstandard support is still missing from upstream boost, this patch introduces a copy of the zstd boost iostream code written by Reimar Döffinger and published at:

https://github.com/rdoeffinger/iostreams/tree/zstd

Many thanks to Reimar for enabling Zstandard support in heaptrack!

Below are some performance numbers for compression and decompression of heaptrack data. The dramatic reduction in compression time significantly reduces the overhead imposed by heaptrack on your system while recording data. And the time saved while decompressing the data speeds up the analysis steps later on.

Performance for compression of 224MB of raw heaptrack data:

gzip:

 Performance counter stats for 'gzip -kf heaptrack.kmail.15607' (5 runs):

       4869,759387      task-clock:u (msec)       #    0,996 CPUs utilized            ( +-  0,70% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
               112      page-faults:u             #    0,023 K/sec                    ( +-  0,52% )
    14.986.414.840      cycles:u                  #    3,077 GHz                      ( +-  0,62% )
    21.816.226.253      instructions:u            #    1,46  insn per cycle           ( +-  0,00% )
     4.022.016.531      branches:u                #  825,917 M/sec                    ( +-  0,00% )
       157.529.308      branch-misses:u           #    3,92% of all branches          ( +-  0,07% )

       4,890694017 seconds time elapsed                                          ( +-  0,60% )

The size of the gzip compressed data file is 15807235 bytes.

zstd:

 Performance counter stats for 'zstd -kf heaptrack.kmail.15607' (5 runs):

        577,288680      task-clock:u (msec)       #    0,995 CPUs utilized            ( +-  0,49% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
               634      page-faults:u             #    0,001 M/sec                    ( +-  0,08% )
     1.626.811.592      cycles:u                  #    2,818 GHz                      ( +-  0,37% )
     2.750.788.523      instructions:u            #    1,69  insn per cycle           ( +-  0,00% )
       312.504.536      branches:u                #  541,331 M/sec                    ( +-  0,00% )
        10.858.277      branch-misses:u           #    3,47% of all branches          ( +-  0,08% )

       0,580079819 seconds time elapsed                                          ( +-  0,67% )

The size of the zstd compressed data file is 16188988 bytes.

Performance for parsing the data files with heaptrack_print, which requires two decompression passes:

gzip:

 Performance counter stats for 'heaptrack_print heaptrack.kmail.15607.gz' (5 runs):

       6180,813184      task-clock:u (msec)       #    0,998 CPUs utilized            ( +-  0,76% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            20.111      page-faults:u             #    0,003 M/sec                    ( +-  0,00% )
    19.138.095.427      cycles:u                  #    3,096 GHz                      ( +-  0,66% )
    36.417.197.208      instructions:u            #    1,90  insn per cycle           ( +-  0,00% )
     7.683.651.885      branches:u                # 1243,146 M/sec                    ( +-  0,00% )
       153.959.798      branch-misses:u           #    2,00% of all branches          ( +-  1,24% )

       6,194282999 seconds time elapsed                                          ( +-  0,71% )

zstd:

 Performance counter stats for 'heaptrack_print heaptrack.kmail.15607.zst' (5 runs):

       4496,786684      task-clock:u (msec)       #    0,999 CPUs utilized            ( +-  0,43% )
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            24.304      page-faults:u             #    0,005 M/sec                    ( +-  0,00% )
    13.944.273.234      cycles:u                  #    3,101 GHz                      ( +-  0,40% )
    33.232.471.804      instructions:u            #    2,38  insn per cycle           ( +-  0,00% )
     6.912.954.404      branches:u                # 1537,310 M/sec                    ( +-  0,00% )
       120.004.080      branch-misses:u           #    1,74% of all branches          ( +-  0,07% )

       4,501574955 seconds time elapsed                                          ( +-  0,41% )

One of my larger data files is actually seeing even better improvements! The data is about ~13GB when uncompressed. Gzip brings it down to a much more manageable 155MB, but zstd compresses the same data down to only 77MB! Even better, we can parse the zstd compressed data file with heaptrack_print in ca. 3 minutes, while the gzip compressed data file takes a bit over 4 minutes!

CCBUG: 386256
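For reference, reading such a compressed data file through Boost.Iostreams would look roughly like the sketch below. heaptrack bundles Reimar Döffinger's zstd filter; later Boost releases ship an equivalent boost/iostreams/filter/zstd.hpp upstream, but the exact header and availability depend on your Boost version, so treat this as an assumption-laden sketch rather than heaptrack's actual code:

```cpp
#include <boost/iostreams/filter/zstd.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <fstream>
#include <string>

int main()
{
    // the filename is a placeholder, not from the bug report
    std::ifstream file("heaptrack.app.12345.zst", std::ios::binary);

    // chain the decompressor in front of the raw file source
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::zstd_decompressor());
    in.push(file);

    std::string line;
    while (std::getline(in, line)) {
        // each line is one heaptrack event, consumed sequentially
    }
}
```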
quite a few optimizations have landed over the years. if it's still too slow, please create a new bug report and upload a test file that I can use for further optimizations.