Helgrind complains that pthread_cond_destroy() is called for an unknown cond, even when the condition really is initialized. Looking at the code, pthread_cond_init() is *not* instrumented in HG. Reproducible: Always Steps to Reproduce: marc.cc: #include <pthread.h> int main(int argc, char *argv[]) { pthread_cond_t c; pthread_cond_init(& c, NULL); pthread_cond_destroy(& c); return 0; } valgrind --tool=helgrind ./marc Found in 3.8.0, verified again in 3.8.1 Actual Results: malff@linux-8edv:wtf> valgrind --tool=helgrind ./marc ==17314== Helgrind, a thread error detector ==17314== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al. ==17314== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==17314== Command: ./marc ==17314== ==17314== ---Thread-Announcement------------------------------------------ ==17314== ==17314== Thread #1 is the program's root thread ==17314== ==17314== ---------------------------------------------------------------- ==17314== ==17314== Thread #1: pthread_cond_destroy: destruction of unknown cond var ==17314== at 0x4C28C79: pthread_cond_destroy_WRK (hg_intercepts.c:940) ==17314== by 0x4C2A986: pthread_cond_destroy@* (hg_intercepts.c:958) ==17314== by 0x40063F: main (marc.cc:9) ==17314== ==17314== ==17314== For counts of detected and suppressed errors, rerun with: -v ==17314== Use --history-level=approx or =none to gain increased speed, at ==17314== the cost of reduced accuracy of conflicting-access information ==17314== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) Expected Results: No error, the pthread_cond cond var should be known to HG hg_intercepts.c: /*----------------------------------------------------------------*/ /*--- pthread_cond_t functions ---*/ /*----------------------------------------------------------------*/ /* Handled: pthread_cond_wait pthread_cond_timedwait pthread_cond_signal pthread_cond_broadcast pthread_cond_destroy Unhandled: pthread_cond_init -- is this important? */ A condition does not have to be ever signaled, so instrumenting pthread_cond_init() is important to let HG know the condition exists, to avoid errors in pthread_cond_destroy().
I believe I have hit this in the wild and I'd like to be notified when it is resolved. If there's anything I can help with let me know (but it looks like there's a diagnosis already).
(In reply to comment #1) The diagnosis is correct. Fix requires adding a pthread_cond_init initcept and hooking it up to the HG internals (hg_main.c, maybe) that keep track of cond vars.
Created attachment 75894 [details] WIP patch to fix the problem I was advised to look at this bug entry after I reported "destruction of unknown cond var" when I ran mozilla thunderbird mail client under helgrind. It seems that the messages are likely to be either bugs/false positives. I have produced a work-in-progress patch to take care this issue. I wonder if the developers in the know can take a look and improve it. Before the patch, valgrind 3.8.1 reported the issue as Marc reported. (marc.c is the code reported by marc) ishikawa@debian-vbox-ci:/tmp$ gcc -g -o /tmp/a.out -lpthread ~/Dropbox/marc.c ishikawa@debian-vbox-ci:/tmp$ /tmp/a.out ishikawa@debian-vbox-ci:/tmp$ valgrind --tool=helgrind /tmp/a.out ==22785== Helgrind, a thread error detector ==22785== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al. ==22785== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==22785== Command: /tmp/a.out ==22785== ==22785== ---Thread-Announcement------------------------------------------ ==22785== ==22785== Thread #1 is the program's root thread ==22785== ==22785== ---------------------------------------------------------------- ==22785== ==22785== Thread #1: pthread_cond_destroy: destruction of unknown cond var ==22785== at 0x4027B1E: pthread_cond_destroy_WRK (hg_intercepts.c:940) ==22785== by 0x402987F: pthread_cond_destroy@* (hg_intercepts.c:958) ==22785== by 0x80484F4: main (marc.c:6) ==22785== ==22785== ==22785== For counts of detected and suppressed errors, rerun with: -v ==22785== Use --history-level=approx or =none to gain increased speed, at ==22785== the cost of reduced accuracy of conflicting-access information ==22785== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) But after the patch, it runs as follows. $ valgrind --tool=helgrind /tmp/a.out ==25080== Helgrind, a thread error detector ==25080== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al. ==25080== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==25080== Command: /tmp/a.out ==25080== ==25080== ==25080== For counts of detected and suppressed errors, rerun with: -v ==25080== Use --history-level=approx or =none to gain increased speed, at ==25080== the cost of reduced accuracy of conflicting-access information ==25080== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ishikawa@debian-vbox-ci:/tmp$ Not sure, though if it works with the initialized data as in pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; pthread_cond_t cond = PTHREAD_COND_INITIALIZER; I really want to get down to the bottom of the problem I was analyzing (random crashes of mozilla thunderbird mail client.) Seeing a mail client crash when it was exiting is very distressful: I have to wonder if the e-mail from an important client is stored properly in the file or not before the crash :-( So if developers in the know can take a look at the patch, and offer suggestions for improvement I am all ears so that I can use a patched valgrind/helgrind 3.8.1-b to dig down the problem with mozilla thunderbird (with the reduced clutter of "unknown cond var". TIA
The patch changed the error output for some error cases for pthread_cond_wait. When I run test using perl tests/vg_regtest helgrind I got the following -- Running tests in helgrind/tests ------------------------------------ annotate_hbefore: valgrind -q --fair-sched=try ./annotate_hbefore annotate_rwlock: valgrind -q ./annotate_rwlock annotate_smart_pointer: valgrind ./../../drd/tests/annotate_smart_pointer 50 5 bar_bad: valgrind -q ./bar_bad ... tc21_pthonce: valgrind --read-var-info=yes ./tc21_pthonce tc22_exit_w_lock: valgrind ./tc22_exit_w_lock tc23_bogus_condwait: valgrind ./tc23_bogus_condwait *** tc23_bogus_condwait failed (stderr) *** tc24_nonzero_sem: valgrind --hg-sanity-flags=111111 ./tc24_nonzero_sem -- Finished tests in helgrind/tests ------------------------------------ == 49 tests, 1 stderr failure, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == helgrind/tests/tc23_bogus_condwait (stderr)
(Sorry I hit the wrong key prematurely.) Here is the diff output for tc23_bogus_condwait stderr. --- tc23_bogus_condwait.stderr.exp 2012-08-17 15:59:45.000000000 +0900 +++ tc23_bogus_condwait.stderr.out 2012-12-18 17:51:33.000000000 +0900 @@ -26,6 +26,14 @@ ---------------------------------------------------------------- +Thread #x's call to pthread_cond_wait failed + with error code 22 (EINVAL: Invalid argument) + at 0x........: pthread_cond_wait_WRK (hg_intercepts.c:...) + by 0x........: pthread_cond_wait@* (hg_intercepts.c:...) + by 0x........: main (tc23_bogus_condwait.c:72) + +---------------------------------------------------------------- + Thread #x: pthread_cond_{timed}wait called with mutex of type pthread_rwlock_t* at 0x........: pthread_cond_wait_WRK (hg_intercepts.c:...) by 0x........: pthread_cond_wait@* (hg_intercepts.c:...) @@ -53,4 +61,4 @@ by 0x........: main (tc23_bogus_condwait.c:78) -ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0) +ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0) So maybe the patch needs to be tweaked a bit. TIA
Oh, how confusing. Without the patch, the pristine 3.8.1 already produced the regression error under linux 32bits. uname -a Linux debian-vm 3.2.0-4-686-pae #1 SMP Debian 3.2.32-1 i686 GNU/Linux So the regression error is not the patch's fault. == 564 tests, 6 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 1 post failure == memcheck/tests/varinfo1 (stderr) memcheck/tests/varinfo2 (stderr) memcheck/tests/varinfo3 (stderr) memcheck/tests/varinfo4 (stderr) memcheck/tests/varinfo5 (stderr) massif/tests/overloaded-new (post) helgrind/tests/tc23_bogus_condwait (stderr)
Created attachment 75908 [details] Improved patch. (A '*' was missing.) A "*" was missing in a place. So the previous patch worked on 32 bit systems, but it may not work on 64 bit patch. The second * was missing in the previous patch. + DO_CREQ_v_WW(_VG_USERREQ__HG_PTHREAD_COND_INIT_PRE, + pthread_cond_t*,cond, pthread_condattr_t*, cond_attr); I am using the current patch to test mozilla thunderbird under helgrind now.
Created attachment 75985 [details] More improvement and fixed another problem for pthread_cond A season's gift :-) More complete patch for pthread_cond_init(), and also I fixed a problem with pthread_cond_destroy(). The unpatched helgrind 3.8.1 released the internal structure for cond variable even if libpthread does not release it due to EBUSY error condition (that is tasks are waiting on the cond variable). I noticed this while trying to check the operation of helgrind against a sample program in http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/multi-thread.html specifically, http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/thread-pool-server.c by adding pthread_cond_destroy() in a key place or two. (Adding pthread_cond_destroy() caused the sample program in https://www.securecoding.cert.org/confluence/display/seccode/POS54-C.+Notify+all+POSIX+threads+waiting+on+a+condition+variable+instead+of+a+single+thread caused a hung program, but I think helgrind was OK. My mods were probably produced incorrect C proram.) Anyway, so I fixed the new problem as well. However, this patch takes care of it and I have a better output now. I am using it now to run mozilla thunderbird throug its "make mozmill" test harness to find out if there are any problems related to threads especially usage of cond varirables. Unpatched valgrind 3.8.1 showed destruction of cond var being waited upon, but I am afraid that most of the error messages were false negatives. BTW, as a result of the patch, the regression test prints out error for helgrind/tests/pth_destroy_cond.c test. But it is not a big deal and can be taken care of easily. A peaceful holiday season and a happy new year to everybody.
I can confirm this; happens for me as well. I can also confirm that the latest patch from zephyrus00jp removes the false positives for me, thanks zephyrus00jp!
(2013/03/12 17:22), Johan Walles wrote: > https://bugs.kde.org/show_bug.cgi?id=307082 > > Johan Walles <johan.walles@gmail.com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |johan.walles@gmail.com > > --- Comment #9 from Johan Walles <johan.walles@gmail.com> --- > I can confirm this; happens for me as well. > > I can also confirm that the latest patch from zephyrus00jp removes the false > positives for me, thanks zephyrus00jp! > Dear Johan, You are very welcome! I believe Julian (one of the valgrind/helgrind developers) has cleaned up my patch and is using it. He is using it to hunt down race problems in firefox. He also got false positives before the patch. So let us expect that the patch will be in the next major release of valgrind/helgrind. Chiaki Ishikawa (aka zephyrus00jp)
Chiaki, this bug still exists in the latest SVN version of Valgrind. Your patch applies though and resolves the problem. Regards //Johan
(2013/03/13 4:28), Johan Walles wrote: > https://bugs.kde.org/show_bug.cgi?id=307082 > > --- Comment #11 from Johan Walles <johan.walles@gmail.com> --- > Chiaki, this bug still exists in the latest SVN version of Valgrind. > Oh, I should have known. > Your patch applies though and resolves the problem. This is good to know. > > Regards //Johan > Chiaki
(In reply to comment #8) > The unpatched helgrind 3.8.1 released the internal structure for cond > variable > even if libpthread does not release it due to EBUSY error condition (that is > tasks are waiting on the cond variable). Fix for this committed as rev 13329 (in addition to not destroy the cvi, the cvi is kept in the FM) Thanks (should take a look at the "init" thing in the coming days
(In reply to comment #10) > (2013/03/12 17:22), Johan Walles wrote: > > https://bugs.kde.org/show_bug.cgi?id=307082 > > > > Johan Walles <johan.walles@gmail.com> changed: > > > > What |Removed |Added > > ---------------------------------------------------------------------------- > > CC| |johan.walles@gmail.com > > > > --- Comment #9 from Johan Walles <johan.walles@gmail.com> --- > > I can confirm this; happens for me as well. > > > > I can also confirm that the latest patch from zephyrus00jp removes the false > > positives for me, thanks zephyrus00jp! > > > > Dear Johan, > > You are very welcome! > > I believe Julian (one of the valgrind/helgrind developers) has cleaned up my > patch > and is using it. > He is using it to hunt down race problems in firefox. > He also got false positives before the patch. > > So let us expect that the patch will be in the next major release of > valgrind/helgrind. Revision 13332 should fix the problem. Committed patch derived from the patch attached to this bug (with some changes, e.g. doing a POST rather than a PRE,) Thanks
(2013/03/25 5:14), Philippe Waroquiers wrote: > https://bugs.kde.org/show_bug.cgi?id=307082 > > Philippe Waroquiers <philippe.waroquiers@skynet.be> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Status|UNCONFIRMED |RESOLVED > Resolution|--- |FIXED > > --- Comment #14 from Philippe Waroquiers <philippe.waroquiers@skynet.be> --- > (In reply to comment #10) >> (2013/03/12 17:22), Johan Walles wrote: >>> https://bugs.kde.org/show_bug.cgi?id=307082 >>> >>> Johan Walles <johan.walles@gmail.com> changed: >>> >>> What |Removed |Added >>> ---------------------------------------------------------------------------- >>> CC| |johan.walles@gmail.com >>> >>> --- Comment #9 from Johan Walles <johan.walles@gmail.com> --- >>> I can confirm this; happens for me as well. >>> >>> I can also confirm that the latest patch from zephyrus00jp removes the false >>> positives for me, thanks zephyrus00jp! >>> >> >> Dear Johan, >> >> You are very welcome! >> >> I believe Julian (one of the valgrind/helgrind developers) has cleaned up my >> patch >> and is using it. >> He is using it to hunt down race problems in firefox. >> He also got false positives before the patch. >> >> So let us expect that the patch will be in the next major release of >> valgrind/helgrind. > Revision 13332 should fix the problem. > Committed patch derived from the patch attached to this bug > (with some changes, e.g. doing a POST rather than a PRE,) > Thanks > Thank you for incorporating a fix for this issue, and for sharing this great tool with the world's development community. Happy Hacking (in the good old sense of the word)