Valgrind compiled from git HEAD on Fedora 33 with gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9) If the following 2 packages are installed libstdc++-debuginfo.x86_64 libgcc-debuginfo.x86_64 then the following list of 14 DRD regression tests fail drd/tests/annotate_ignore_read (stderr) drd/tests/annotate_order_1 (stderr) drd/tests/annotate_order_2 (stderr) drd/tests/annotate_order_3 (stderr) drd/tests/annotate_smart_pointer (stderr) drd/tests/annotate_smart_pointer2 (stderr) drd/tests/annotate_spinlock (stderr) drd/tests/annotate_static (stderr) drd/tests/bug322621 (stderr) drd/tests/concurrent_close (stderr) drd/tests/dlopen (stderr) drd/tests/pth_cancel_locked (stderr) drd/tests/pth_cleanup_handler (stderr) drd/tests/sem_wait (stderr) An example diff is paulf> cat annotate_ignore_read.stderr.diff --- annotate_ignore_read.stderr.exp 2017-11-11 10:43:44.152186306 +0100 +++ annotate_ignore_read.stderr.out 2021-02-23 16:57:25.402041732 +0100 @@ -1,4 +1,26 @@ + +parse DIE(readdwarf3.c:4085): confused by: + <3><55f9a2>: Abbrev Number: 64 (DW_TAG_subrange_type) + DW_AT_upper_bound : 2147483647 + DW_AT_name : (indirect string, offset: 0x........): positive___XDLU_1__2147483647 + DW_AT_type : 0x........ + DW_AT_artificial : 1 +parse_type_DIE: +WARNING: Serious error when reading debug info +When reading debug info from /usr/lib64/libstdc++.so.6.0.28: +confused by the above DIE + +parse DIE(readdwarf3.c:4085): confused by: + <3><87da9>: Abbrev Number: 64 (DW_TAG_subrange_type) + DW_AT_upper_bound : 2147483647 + DW_AT_name : (indirect string, offset: 0x........): positive___XDLU_1__2147483647 + DW_AT_type : 0x........ + DW_AT_artificial : 1 +parse_type_DIE: +WARNING: Serious error when reading debug info +When reading debug info from /usr/lib64/libgcc_s-10-20201125.so.1: +confused by the above DIE FLAGS [phb=1, fm=0] test69: negative GLOB=30
Simpler reproducer with libstdc++-debuginfo installed: $ (echo "#include <iostream>"; echo 'int main() { std::cout << "Hello"; }') | g++ -xc++ - $ ./vg-in-place --read-var-info=yes --tool=none ./a.out ==367933== Nulgrind, the minimal Valgrind tool ==367933== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote. ==367933== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info ==367933== Command: ./a.out ==367933== parse DIE(readdwarf3.c:4085): confused by: <3><55f9a2>: Abbrev Number: 64 (DW_TAG_subrange_type) DW_AT_upper_bound : 2147483647 DW_AT_name : (indirect string, offset: 0x53c5e): positive___XDLU_1__2147483647 DW_AT_type : 0x786bc DW_AT_artificial : 1 parse_type_DIE: --367933-- WARNING: Serious error when reading debug info --367933-- When reading debug info from /usr/lib64/libstdc++.so.6.0.28: --367933-- confused by the above DIE Hello==367933==
Created attachment 136098 [details] Skip some stuff when seeing an unknown language, be less chatty about parser issues. All the issues seem to come from the multi-file, that is the shared (supplementary or alt) file containing debuginfo shared by all the gcc/runtime libraries. There are a couple of issues: - It contains entries for the 'D' language, which has some constructs we don't expect. - We don't read partial units correctly, which means we often don't know the language we are looking at. - The parser is very chatty about issues it didn't expect (even if they are ignored, it will still output something) I am not sure why this pops up now. It doesn't seem we ever parsed this correctly. Maybe nobody has been running the tests with this debuginfo installed? It also only shows up with --read-var-info=yes which some tests enable, but is disabled by default. The following workarounds make it silent (even though there are some issues reading the debuginfo).
With the suggested changes, for DRD I get, when running /usr/bin/time -p perl tests/vg_regtest drd/tests == 130 tests, 1 stderr failure, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == drd/tests/pth_cleanup_handler (stderr) The above testcase that fails generates +Process terminating with default action of signal 14 (SIGALRM) + at 0x........: (within libpthread-?.?.so) + by 0x........: pthread_once (drd_pthread_intercepts.c:?) + by 0x........: UnknownInlinedFun (gthr-default.h:700) + by 0x........: uw_init_context_1 (unwind-dw2.c:1598) + by 0x........: _Unwind_ForcedUnwind (unwind.inc:211) + by 0x........: (within libpthread-?.?.so) + by 0x........: (within libpthread-?.?.so) + by 0x........: f (pth_cleanup_handler.c:?) + by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?) + by 0x........: (within libpthread-?.?.so) + by 0x........: clone (in /...libc...) +Destroying condition variable that is being waited upon: cond 0x........, mutex 0x........ locked by thread 0 + at 0x........: pthread_cond_wait@@GLIBC_2.3.2(within libpthread-?.?.so) + by 0x........: pthread_cond_wait (drd_pthread_intercepts.c:?) + by 0x........: pthread_cond_wait (drd_pthread_intercepts.c:?) + by 0x........: vgDrd_sema_down (drd_pthread_intercepts.c:?) + by 0x........: pthread_create (drd_pthread_intercepts.c:?) + by 0x........: main (pth_cleanup_handler.c:?) + + +ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) I also noticed a substantial slowdown Runtime with change and debuginfo installed real 2200.95 user 2085.74 sys 23.89 Without the changes and with debuginfo removed [0 failures] real 244.29 user 149.81 sys 13.18 With the changes and still withoug debuginfo [0 failures] real 245.38 user 151.01 sys 13.17
(In reply to Paul Floyd from comment #3) > With the suggested changes, for DRD I get, when running > > /usr/bin/time -p perl tests/vg_regtest drd/tests > > == 130 tests, 1 stderr failure, 0 stdout failures, 0 stderrB failures, 0 > stdoutB failures, 0 post failures == > drd/tests/pth_cleanup_handler (stderr) > > The above testcase that fails generates > > +Process terminating with default action of signal 14 (SIGALRM) > + at 0x........: (within libpthread-?.?.so) > + by 0x........: pthread_once (drd_pthread_intercepts.c:?) > + by 0x........: UnknownInlinedFun (gthr-default.h:700) > + by 0x........: uw_init_context_1 (unwind-dw2.c:1598) > + by 0x........: _Unwind_ForcedUnwind (unwind.inc:211) > + by 0x........: (within libpthread-?.?.so) > + by 0x........: (within libpthread-?.?.so) > + by 0x........: f (pth_cleanup_handler.c:?) > + by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?) > + by 0x........: (within libpthread-?.?.so) > + by 0x........: clone (in /...libc...) > +Destroying condition variable that is being waited upon: cond 0x........, > mutex 0x........ locked by thread 0 > + at 0x........: pthread_cond_wait@@GLIBC_2.3.2(within libpthread-?.?.so) > + by 0x........: pthread_cond_wait (drd_pthread_intercepts.c:?) > + by 0x........: pthread_cond_wait (drd_pthread_intercepts.c:?) > + by 0x........: vgDrd_sema_down (drd_pthread_intercepts.c:?) > + by 0x........: pthread_create (drd_pthread_intercepts.c:?) > + by 0x........: main (pth_cleanup_handler.c:?) > + > + > +ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) I cannot tell whether that is good or bad. It seems the test doesn't expect this failure mode. But it seems the dwarf reader worked as expected doesn't it? > I also noticed a substantial slowdown Maybe that explains the above test failure? A timeout because valgrind is reading all that debuginfo? > Runtime with change and debuginfo installed > real 2200.95 > user 2085.74 > sys 23.89 > > Without the changes and with debuginfo removed [0 failures] > real 244.29 > user 149.81 > sys 13.18 > > With the changes and still withoug debuginfo [0 failures] > real 245.38 > user 151.01 > sys 13.17 Yes, this is also expected sadly. The dwarf reader is really slow and it is basically reading all the gcc runtime library debuginfo eagerly as soon as libstdc++.so is loaded. That takes a lot of time.
(In reply to Mark Wielaard from comment #4) > (In reply to Paul Floyd from comment #3) > > With the suggested changes, for DRD I get, when running > > > > /usr/bin/time -p perl tests/vg_regtest drd/tests > > > > == 130 tests, 1 stderr failure, 0 stdout failures, 0 stderrB failures, 0 > > stdoutB failures, 0 post failures == > > drd/tests/pth_cleanup_handler (stderr) > > > > The above testcase that fails generates > > > > +Process terminating with default action of signal 14 (SIGALRM) > > + at 0x........: (within libpthread-?.?.so) > > + by 0x........: pthread_once (drd_pthread_intercepts.c:?) > > + by 0x........: UnknownInlinedFun (gthr-default.h:700) > > + by 0x........: uw_init_context_1 (unwind-dw2.c:1598) > > + by 0x........: _Unwind_ForcedUnwind (unwind.inc:211) > > + by 0x........: (within libpthread-?.?.so) > > + by 0x........: (within libpthread-?.?.so) > > + by 0x........: f (pth_cleanup_handler.c:?) > > + by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?) > > + by 0x........: (within libpthread-?.?.so) > > + by 0x........: clone (in /...libc...) > > +Destroying condition variable that is being waited upon: cond 0x........, > > mutex 0x........ locked by thread 0 > > + at 0x........: pthread_cond_wait@@GLIBC_2.3.2(within libpthread-?.?.so) > > + by 0x........: pthread_cond_wait (drd_pthread_intercepts.c:?) > > + by 0x........: pthread_cond_wait (drd_pthread_intercepts.c:?) > > + by 0x........: vgDrd_sema_down (drd_pthread_intercepts.c:?) > > + by 0x........: pthread_create (drd_pthread_intercepts.c:?) > > + by 0x........: main (pth_cleanup_handler.c:?) > > + > > + > > +ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) > > I cannot tell whether that is good or bad. > It seems the test doesn't expect this failure mode. > But it seems the dwarf reader worked as expected doesn't it? > > > I also noticed a substantial slowdown > > Maybe that explains the above test failure? A timeout because valgrind is > reading all that debuginfo? Does this help? diff --git a/drd/tests/pth_cleanup_handler.c b/drd/tests/pth_cleanup_handler.c index 0fb3d073d..e441fa3a4 100644 --- a/drd/tests/pth_cleanup_handler.c +++ b/drd/tests/pth_cleanup_handler.c @@ -39,7 +39,7 @@ int main() pthread_t pt1, pt2; // Make sure the program exits in case a deadlock has been triggered. - alarm(20); + alarm(60); if (pthread_mutex_init(&s_mutex, NULL) != 0) {
It seems likely that there is a timeout. I'll test tonight.
The testcase passes with a time 0f 60s. But it fails still with 50s. The value is likely to be machine dependent - my PC is pretty old though. Intel(R) Xeon(R) CPU W3520 @ 2.67GHz About 10 years old.
(In reply to Paul Floyd from comment #7) > The testcase passes with a time 0f 60s. But it fails still with 50s. The > value is likely to be machine dependent - my PC is pretty old though. > > Intel(R) Xeon(R) CPU W3520 @ 2.67GHz > > About 10 years old. If the test passes the timeout doesn't really matter. Lets set it to 60 if that guarantees it passes even on old hardware. We do have to look at the slowness of the debuginfo parsing. I also noticed it with the debuginfod support. But this particular issue seems to be solved by: Make the dwarf3 reader more robust and less chatty when things go wrong Skip some stuff when seeing an unknown language, be less chatty about parser issues. All the issues seem to come from the multi-file, that is the shared (supplementary or alt) file containing debuginfo shared by all the gcc/runtime libraries. There are a couple of issues that this patch works around: - The multifile contains entries for the 'D' language, which has some constructs we don't expect. - We don't read partial units correctly, which means we often don't know the language we are looking at. - The parser is very chatty about issues it didn't expect (even if they are ignored, it will still output something) It only shows up with --read-var-info=yes which some tests enable, but which is disabled by default. Also increate the timeout of drd/tests/pth_cleanup_handler.c because DWARF reading is so slow. https://bugs.kde.org/show_bug.cgi?id=433500