SUMMARY I am running valgrind on my PC with dual boot Windows 10 and Fedora 34. Running the same test on Ubuntu 20.04 (under Windows 10 WSL2) valgrind runs in less than a second, but on Fedora 34 is very slow, several minutes slow. On Fedora 34 it looks like the time spent is reading symbols. This is the part that takes most of the time: --23524-- Reading syms from /usr/lib64/libstdc++.so.6.0.29 --23524-- Considering /usr/lib/debug/.build-id/bd/d633ff5da0bba64d19ecf277a9eed7001da127.debug .. --23524-- .. build-id is valid --23524-- Considering /usr/lib/debug/.build-id/bd/../../../../../usr/lib/debug/usr/lib64/../../.dwz/gcc-11.2.1-1.fc34.x86_64 .. --23524-- .. build-id is valid --23524-- Reading syms from /usr/lib64/libm-2.33.so --23524-- Considering /usr/lib/debug/.build-id/c1/784bbe8a93a6e62f74b16105a2076a03b398ac.debug .. --23524-- .. build-id is valid --23524-- Reading syms from /usr/lib64/libgcc_s-11-20210728.so.1 --23524-- Considering /usr/lib/debug/.build-id/46/e10c9ee769c5cfc7cdb638e3ccbc3169c0a949.debug .. --23524-- .. build-id is valid --23524-- Considering /usr/lib/debug/.build-id/46/../../../../../usr/lib/debug/lib64/../.dwz/gcc-11.2.1-1.fc34.x86_64 .. --23524-- .. build-id is valid --23524-- Reading syms from /usr/lib64/libc-2.33.so --23524-- Considering /usr/lib/debug/.build-id/9d/e0f6e78ec16db4c6c167efc6b37466705edeff.debug .. --23524-- .. build-id is valid --23524-- Considering /usr/lib/debug/.build-id/9d/../../../../../usr/lib/debug/lib64/../.dwz/glibc-2.33-20.fc34.x86_64 .. --23524-- .. build-id is valid STEPS TO REPRODUCE 1. Run valgrind on a simple binary OBSERVED RESULT It takes minutes to execute. EXPECTED RESULT It should be instant as the program is very simple like "hello world". SOFTWARE/OS VERSIONS Linux: 5.13.13-200.fc34.x86_64 #1 SMP Thu Aug 26 17:06:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux ADDITIONAL INFORMATION Using GCC 11
Created attachment 141323 [details] perf record data Attached is perf record data
*** This bug has been marked as a duplicate of bug 435908 ***
I'm going to remove that duplicate flag because I think this is related but not the same. The issue here is that because debuginfod is enabled in F34 we now fetch the debug information for glibc and hence have to parse it which wouldn't have happened before unless you had installed the debuginfo manually. That interacts with a separate issue where we do more work parsing the debug information than is really needed and I thought there was a bug for that but I can't find it right now.
I want to clarify that I installed debuginfo to try gdb reverse-step. The packages installed where: dnf install debuginfo-install glibc-2.33-20.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libstdc++-11.2.1-1.fc34.x86_64 Without the debuginfo packages it works as expected.
Created attachment 141718 [details] Skip units without addresses when looking for inlined functions
Created attachment 141719 [details] Only read line table for units with addresses for inlined functions
Created attachment 141720 [details] Reuse fndn_ix_Table as much as possible
Created attachment 141721 [details] Immediately skip to end of CU when not parsing children
Created attachment 141722 [details] Reuse abbrev if possible between units
Created attachment 141723 [details] Introduce abbv_state to read .debug_abbrev more lazily
For --read-inline-info (=yes by default) and --read-var-info (=no by default) we use readdwarf3. This is a very greedy parser which reads all abbrevs, DIEs and debugline entries upfront. This can cause startup times to take multiple seconds (or even minutes) when lots of debuginfo is installed (especially when the debuginfo is compressed with DWZ). The following patches make readdwarf3 read a lot less information when not needed, detect when debug_lines or abbrevs are shared between CUs, more efficiently skip DIEs or whole CUs and makes abbrev reading lazy (so only the abbrevs that are actually needed to read the non-skipped DIEs are processed). readdwarf3: Skip units without addresses when looking for inlined functions readdwarf3: Only read line table for units with addresses for inlined functions readdwarf3: Reuse fndn_ix_Table as much as possible readdwarf3: Immediately skip to end of CU when not parsing children readdwarf3: Reuse abbrev if possible between units readdwarf3: Introduce abbv_state to read .debug_abbrev more lazily Before (best of 3) --read-inline-info=yes --read-var-info=yes real 35.45 user 34.38 sys 0.95 --read-inline-info=no --read-var-info=yes real 24.21 user 23.51 sys 0.61 --read-inline-info=yes --read-var-info=no real 11.81 user 11.44 sys 0.34 After (best of 3) --read-inline-info=yes --read-var-info=yes real 13.01 user 12.63 sys 0.33 --read-inline-info=no --read-var-info=yes real 12.88 user 12.51 sys 0.33 --read-inline-info=yes --read-var-info=no real 0.43 user 0.38 sys 0.04
The previous timings are for a c++ "Hello, World!" program, linking against glibc and libstdc++ with Fedora 34 glibc and libstdc++ debuginfo installed.
Pushed after testing on various systems, no regressions found, loading executable and libraries with debuginfo shows faster loading. commit 2be10685c9a54102f0943a7841a500b7371f2f4a Author: Mark Wielaard <mark@klomp.org> Date: Sun Sep 19 14:30:19 2021 +0200 readdwarf3: Introduce abbv_state to read .debug_abbrev more lazily With the inline parser often a lot of DIEs are skipped, so reading all abbrevs up front wastes time and memory. A lot of time and memory can be saved by reading the abbrevs on demand. Do this by introducing an abbv_state that is used to keep track of the abbrevs already read. This does technically make the CUConst struct not const. commit ab6795f5c23ff155d62e6f77ec6c28817a52cd15 Author: Mark Wielaard <mark@klomp.org> Date: Sat Sep 18 22:16:33 2021 +0200 readdwarf3: Reuse abbrev if possible between units Instead of destroying the ht_abbrvs after processing a CU save it and the offset so it can be reused for the next CU if that happens to have the same abbrev offset. dwz compressed DWARF often reuse the same abbrev for multiple CUs. commit 0a543c57c8ffca25f4817750317c981ea2d84311 Author: Mark Wielaard <mark@klomp.org> Date: Sat Sep 18 03:23:52 2021 +0200 readdwarf3: Immediately skip to end of CU when not parsing children commit 26a776da2d196b1b88e809464e98cd0a0d6fc746 Author: Mark Wielaard <mark@klomp.org> Date: Sat Sep 18 00:24:38 2021 +0200 readdwarf3: Reuse fndn_ix_Table as much as possible Both the var parser and the inl parser kept a fndn_ix_Table. Initialize only one per debuginfo read pass and reuse if the stmt offset is the same as last time (CUs can share the same line table and alt files do share one for all units). commit b4a6b165baf28b7371495f6c1f12bf745ff52d02 Author: Mark Wielaard <mark@klomp.org> Date: Thu Sep 16 22:49:41 2021 +0200 readdwarf3: Only read line table for units with addresses for inlined functions When parsing DIEs for inlined functions, only read the line table for units which can actually contain inlined_subroutines. commit 75e3ef0f3b834f75f49333d35b5a040060247b03 Author: Mark Wielaard <mark@klomp.org> Date: Thu Sep 16 22:01:47 2021 +0200 readdwarf3: Skip units without addresses when looking for inlined functions When a unit doesn't cover any addresses skip it because no actual code will be inside. Also use skip_DIE instead of read_DIE when not parsing (skipping) children.