442061 – very slow execution under Fedora 34 (readdwarf3)

Bug 442061 - very slow execution under Fedora 34 (readdwarf3)

Summary: very slow execution under Fedora 34 (readdwarf3)

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (show other bugs)
Version:	unspecified
Platform:	Fedora RPMs All

Importance:	NOR normal
Target Milestone:	---
Assignee:	Mark Wielaard

URL:
Keywords:	efficiency-and-performance

Depends on:
Blocks:

Reported:	2021-09-06 02:33 UTC by Alvaro Kuolas
Modified:	2021-09-24 21:03 UTC (History)
CC List:	5 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
perf record data (714.20 KB, application/x-xz) 2021-09-06 02:34 UTC, Alvaro Kuolas	Details
Skip units without addresses when looking for inlined functions (3.91 KB, text/plain) 2021-09-20 09:36 UTC, Mark Wielaard	Details
Only read line table for units with addresses for inlined functions (1.73 KB, text/plain) 2021-09-20 09:37 UTC, Mark Wielaard	Details
Reuse fndn_ix_Table as much as possible (12.49 KB, text/plain) 2021-09-20 09:37 UTC, Mark Wielaard	Details
Immediately skip to end of CU when not parsing children (1.22 KB, text/plain) 2021-09-20 09:38 UTC, Mark Wielaard	Details
Reuse abbrev if possible between units (5.34 KB, text/plain) 2021-09-20 09:39 UTC, Mark Wielaard	Details
Introduce abbv_state to read .debug_abbrev more lazily (13.25 KB, text/plain) 2021-09-20 09:40 UTC, Mark Wielaard	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Alvaro Kuolas 2021-09-06 02:33:02 UTC

SUMMARY

I am running valgrind on my PC with dual boot Windows 10 and Fedora 34.

Running the same test on Ubuntu 20.04 (under Windows 10 WSL2) valgrind runs in less than a second, but on Fedora 34 is very slow, several minutes slow.

On Fedora 34 it looks like the time spent is reading symbols.

This is the part that takes most of the time:

--23524-- Reading syms from /usr/lib64/libstdc++.so.6.0.29
--23524--   Considering /usr/lib/debug/.build-id/bd/d633ff5da0bba64d19ecf277a9eed7001da127.debug ..
--23524--   .. build-id is valid
--23524--   Considering /usr/lib/debug/.build-id/bd/../../../../../usr/lib/debug/usr/lib64/../../.dwz/gcc-11.2.1-1.fc34.x86_64 ..
--23524--   .. build-id is valid
--23524-- Reading syms from /usr/lib64/libm-2.33.so
--23524--   Considering /usr/lib/debug/.build-id/c1/784bbe8a93a6e62f74b16105a2076a03b398ac.debug ..
--23524--   .. build-id is valid
--23524-- Reading syms from /usr/lib64/libgcc_s-11-20210728.so.1
--23524--   Considering /usr/lib/debug/.build-id/46/e10c9ee769c5cfc7cdb638e3ccbc3169c0a949.debug ..
--23524--   .. build-id is valid
--23524--   Considering /usr/lib/debug/.build-id/46/../../../../../usr/lib/debug/lib64/../.dwz/gcc-11.2.1-1.fc34.x86_64 ..
--23524--   .. build-id is valid
--23524-- Reading syms from /usr/lib64/libc-2.33.so
--23524--   Considering /usr/lib/debug/.build-id/9d/e0f6e78ec16db4c6c167efc6b37466705edeff.debug ..
--23524--   .. build-id is valid
--23524--   Considering /usr/lib/debug/.build-id/9d/../../../../../usr/lib/debug/lib64/../.dwz/glibc-2.33-20.fc34.x86_64 ..
--23524--   .. build-id is valid



STEPS TO REPRODUCE
1. Run valgrind on a simple binary 

OBSERVED RESULT
It takes minutes to execute.

EXPECTED RESULT
It should be instant as the program is very simple like "hello world".

SOFTWARE/OS VERSIONS
Linux: 5.13.13-200.fc34.x86_64 #1 SMP Thu Aug 26 17:06:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


ADDITIONAL INFORMATION
Using GCC 11

Comment 1 Alvaro Kuolas 2021-09-06 02:34:28 UTC

Created attachment 141323 [details]
perf record data

Attached is perf record data

Comment 2 Tom Hughes 2021-09-06 06:06:21 UTC

*** This bug has been marked as a duplicate of bug 435908 ***

Comment 3 Tom Hughes 2021-09-06 06:13:23 UTC

I'm going to remove that duplicate flag because I think this is related but not the same.

The issue here is that because debuginfod is enabled in F34 we now fetch the debug information for glibc and hence have to parse it which wouldn't have happened before unless you had installed the debuginfo manually.

That interacts with a separate issue where we do more work parsing the debug information than is really needed and I thought there was a bug for that but I can't find it right now.

Comment 4 Alvaro Kuolas 2021-09-06 12:29:02 UTC

I want to clarify that I installed debuginfo to try gdb reverse-step.

The packages installed where:

dnf install debuginfo-install glibc-2.33-20.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libstdc++-11.2.1-1.fc34.x86_64 

Without the debuginfo packages it works as expected.

Comment 5 Mark Wielaard 2021-09-20 09:36:28 UTC

Created attachment 141718 [details]
Skip units without addresses when looking for inlined functions

Comment 6 Mark Wielaard 2021-09-20 09:37:18 UTC

Created attachment 141719 [details]
Only read line table for units with addresses for inlined functions

Comment 7 Mark Wielaard 2021-09-20 09:37:56 UTC

Created attachment 141720 [details]
Reuse fndn_ix_Table as much as possible

Comment 8 Mark Wielaard 2021-09-20 09:38:39 UTC

Created attachment 141721 [details]
Immediately skip to end of CU when not parsing children

Comment 9 Mark Wielaard 2021-09-20 09:39:22 UTC

Created attachment 141722 [details]
Reuse abbrev if possible between units

Comment 10 Mark Wielaard 2021-09-20 09:40:11 UTC

Created attachment 141723 [details]
Introduce abbv_state to read .debug_abbrev more lazily

Comment 11 Mark Wielaard 2021-09-20 09:49:57 UTC

For --read-inline-info (=yes by default) and --read-var-info (=no by default) we use readdwarf3. This is a very greedy parser which reads all abbrevs, DIEs and debugline entries upfront. This can cause startup times to take multiple seconds (or even minutes) when lots of debuginfo is installed (especially when the debuginfo is compressed with DWZ).

The following patches make readdwarf3 read a lot less information when not needed, detect when debug_lines or abbrevs are shared between CUs, more efficiently skip DIEs or whole CUs and makes abbrev reading lazy (so only the abbrevs that are actually needed to read the non-skipped DIEs are processed).

  readdwarf3: Skip units without addresses when looking for inlined functions
  readdwarf3: Only read line table for units with addresses for inlined functions
  readdwarf3: Reuse fndn_ix_Table as much as possible
  readdwarf3: Immediately skip to end of CU when not parsing children
  readdwarf3: Reuse abbrev if possible between units
  readdwarf3: Introduce abbv_state to read .debug_abbrev more lazily

Before (best of 3)
--read-inline-info=yes --read-var-info=yes
real 35.45
user 34.38
sys 0.95

--read-inline-info=no --read-var-info=yes
real 24.21
user 23.51
sys 0.61

--read-inline-info=yes --read-var-info=no
real 11.81
user 11.44
sys 0.34

After (best of 3)
--read-inline-info=yes --read-var-info=yes
real 13.01
user 12.63
sys 0.33

--read-inline-info=no --read-var-info=yes
real 12.88
user 12.51
sys 0.33

--read-inline-info=yes --read-var-info=no
real 0.43
user 0.38
sys 0.04

Comment 12 Mark Wielaard 2021-09-20 09:53:21 UTC

The previous timings are for a c++ "Hello, World!" program, linking against glibc and libstdc++ with Fedora 34 glibc and libstdc++ debuginfo installed.

Comment 13 Mark Wielaard 2021-09-24 21:03:19 UTC

Pushed after testing on various systems, no regressions found, loading executable and libraries with debuginfo shows faster loading.

commit 2be10685c9a54102f0943a7841a500b7371f2f4a
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Sep 19 14:30:19 2021 +0200

    readdwarf3: Introduce abbv_state to read .debug_abbrev more lazily
    
    With the inline parser often a lot of DIEs are skipped, so reading
    all abbrevs up front wastes time and memory. A lot of time and memory
    can be saved by reading the abbrevs on demand. Do this by introducing
    an abbv_state that is used to keep track of the abbrevs already read.
    This does technically make the CUConst struct not const.

commit ab6795f5c23ff155d62e6f77ec6c28817a52cd15
Author: Mark Wielaard <mark@klomp.org>
Date:   Sat Sep 18 22:16:33 2021 +0200

    readdwarf3: Reuse abbrev if possible between units
    
    Instead of destroying the ht_abbrvs after processing a CU save it
    and the offset so it can be reused for the next CU if that happens
    to have the same abbrev offset. dwz compressed DWARF often reuse
    the same abbrev for multiple CUs.

commit 0a543c57c8ffca25f4817750317c981ea2d84311
Author: Mark Wielaard <mark@klomp.org>
Date:   Sat Sep 18 03:23:52 2021 +0200

    readdwarf3: Immediately skip to end of CU when not parsing children

commit 26a776da2d196b1b88e809464e98cd0a0d6fc746
Author: Mark Wielaard <mark@klomp.org>
Date:   Sat Sep 18 00:24:38 2021 +0200

    readdwarf3: Reuse fndn_ix_Table as much as possible
    
    Both the var parser and the inl parser kept a fndn_ix_Table.
    Initialize only one per debuginfo read pass and reuse if the stmt offset
    is the same as last time (CUs can share the same line table and alt
    files do share one for all units).

commit b4a6b165baf28b7371495f6c1f12bf745ff52d02
Author: Mark Wielaard <mark@klomp.org>
Date:   Thu Sep 16 22:49:41 2021 +0200

    readdwarf3: Only read line table for units with addresses for inlined functions
    
    When parsing DIEs for inlined functions, only read the line table for
    units which can actually contain inlined_subroutines.

commit 75e3ef0f3b834f75f49333d35b5a040060247b03
Author: Mark Wielaard <mark@klomp.org>
Date:   Thu Sep 16 22:01:47 2021 +0200

    readdwarf3: Skip units without addresses when looking for inlined functions
    
    When a unit doesn't cover any addresses skip it because no actual code
    will be inside. Also use skip_DIE instead of read_DIE when not parsing
    (skipping) children.