307082 – HG false positive: pthread_cond_destroy: destruction of unknown cond var

Bug 307082 - HG false positive: pthread_cond_destroy: destruction of unknown cond var

Summary: HG false positive: pthread_cond_destroy: destruction of unknown cond var

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	helgrind (other bugs)
Version First Reported In:	3.8.0
Platform:	openSUSE Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-09-20 02:52 UTC by Marc Alff
Modified:	2013-04-02 18:27 UTC (History)
CC List:	5 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Attachments
WIP patch to fix the problem (6.57 KB, patch) 2012-12-18 07:21 UTC, zephyrus00jp	Details
*Improved patch. (A '' was missing.)** (6.57 KB, patch) 2012-12-18 20:07 UTC, zephyrus00jp	Details
More improvement and fixed another problem for pthread_cond (10.17 KB, patch) 2012-12-23 17:01 UTC, zephyrus00jp	Details
Show Obsolete (2) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Marc Alff 2012-09-20 02:52:07 UTC

Helgrind complains that pthread_cond_destroy() is called for an unknown cond,
even when the condition really is initialized.

Looking at the code, pthread_cond_init() is *not* instrumented in HG.


Reproducible: Always

Steps to Reproduce:
marc.cc:

#include <pthread.h>

int main(int argc, char *argv[])
{
  pthread_cond_t c;
  pthread_cond_init(& c, NULL);
  pthread_cond_destroy(& c);
  return 0;
}

valgrind --tool=helgrind ./marc

Found in 3.8.0, verified again in 3.8.1


Actual Results:  
malff@linux-8edv:wtf> valgrind --tool=helgrind ./marc
==17314== Helgrind, a thread error detector
==17314== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==17314== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==17314== Command: ./marc
==17314== 
==17314== ---Thread-Announcement------------------------------------------
==17314== 
==17314== Thread #1 is the program's root thread
==17314== 
==17314== ----------------------------------------------------------------
==17314== 
==17314== Thread #1: pthread_cond_destroy: destruction of unknown cond var
==17314==    at 0x4C28C79: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==17314==    by 0x4C2A986: pthread_cond_destroy@* (hg_intercepts.c:958)
==17314==    by 0x40063F: main (marc.cc:9)
==17314== 
==17314== 
==17314== For counts of detected and suppressed errors, rerun with: -v
==17314== Use --history-level=approx or =none to gain increased speed, at
==17314== the cost of reduced accuracy of conflicting-access information
==17314== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)


Expected Results:  
No error, the pthread_cond cond var should be known to HG


hg_intercepts.c:
/*----------------------------------------------------------------*/
/*--- pthread_cond_t functions                                 ---*/
/*----------------------------------------------------------------*/

/* Handled:   pthread_cond_wait pthread_cond_timedwait
              pthread_cond_signal pthread_cond_broadcast
              pthread_cond_destroy

   Unhandled: pthread_cond_init
              -- is this important?
*/

A condition does not have to be ever signaled, so instrumenting pthread_cond_init()
is important to let HG know the condition exists, to avoid errors in pthread_cond_destroy().

Comment 1 Leif Walsh 2012-12-04 17:26:49 UTC

I believe I have hit this in the wild and I'd like to be notified when it is resolved.  If there's anything I can help with let me know (but it looks like there's a diagnosis already).

Comment 2 Julian Seward 2012-12-05 13:50:26 UTC

(In reply to comment #1)
The diagnosis is correct.  Fix requires adding a pthread_cond_init initcept and
hooking it up to the HG internals (hg_main.c, maybe) that keep track of
cond vars.

Comment 3 zephyrus00jp 2012-12-18 07:21:57 UTC

Created attachment 75894 [details]
WIP patch to fix the problem

I was advised to look at this bug entry after I reported "destruction of unknown cond var" when I ran mozilla thunderbird mail client under helgrind.
It seems that the messages are likely to be either bugs/false positives.

I have produced a work-in-progress patch to take care this issue.
I wonder if the developers in the know can take a look and improve it.

Before the patch, valgrind 3.8.1 reported the issue as Marc reported.

(marc.c is the code reported by marc)

ishikawa@debian-vbox-ci:/tmp$ gcc -g -o /tmp/a.out  -lpthread ~/Dropbox/marc.c 
ishikawa@debian-vbox-ci:/tmp$ /tmp/a.out
ishikawa@debian-vbox-ci:/tmp$ valgrind --tool=helgrind /tmp/a.out
==22785== Helgrind, a thread error detector
==22785== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==22785== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==22785== Command: /tmp/a.out
==22785== 
==22785== ---Thread-Announcement------------------------------------------
==22785== 
==22785== Thread #1 is the program's root thread
==22785== 
==22785== ----------------------------------------------------------------
==22785== 
==22785== Thread #1: pthread_cond_destroy: destruction of unknown cond var
==22785==    at 0x4027B1E: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==22785==    by 0x402987F: pthread_cond_destroy@* (hg_intercepts.c:958)
==22785==    by 0x80484F4: main (marc.c:6)
==22785== 
==22785== 
==22785== For counts of detected and suppressed errors, rerun with: -v
==22785== Use --history-level=approx or =none to gain increased speed, at
==22785== the cost of reduced accuracy of conflicting-access information
==22785== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

But after the patch, it runs as follows.

$ valgrind --tool=helgrind /tmp/a.out
==25080== Helgrind, a thread error detector
==25080== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==25080== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25080== Command: /tmp/a.out
==25080== 
==25080== 
==25080== For counts of detected and suppressed errors, rerun with: -v
==25080== Use --history-level=approx or =none to gain increased speed, at
==25080== the cost of reduced accuracy of conflicting-access information
==25080== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
ishikawa@debian-vbox-ci:/tmp$ 


Not sure, though if it works with the initialized data as in
   pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
   pthread_cond_t cond = PTHREAD_COND_INITIALIZER;


I really want to get down to the bottom of the problem I was analyzing
(random crashes of mozilla thunderbird mail client.) Seeing a mail
client crash when it was exiting is very distressful: I have to wonder
if the e-mail from an important client is stored properly in the file
or not before the crash :-(

So if developers in the know can take a look at the patch, and offer
suggestions for improvement I am all ears so that I can use a patched
valgrind/helgrind 3.8.1-b to dig down the problem with mozilla thunderbird (with the reduced clutter of "unknown cond var".

TIA

Comment 4 zephyrus00jp 2012-12-18 09:15:00 UTC

The patch changed the error output for some error cases for pthread_cond_wait.
When I run test using

perl tests/vg_regtest helgrind

I got the following

-- Running  tests in helgrind/tests ------------------------------------
annotate_hbefore: valgrind   -q --fair-sched=try ./annotate_hbefore 
annotate_rwlock: valgrind   -q ./annotate_rwlock 
annotate_smart_pointer: valgrind   ./../../drd/tests/annotate_smart_pointer 50 5
bar_bad:         valgrind   -q ./bar_bad 
		 
		 ...


tc21_pthonce:    valgrind   --read-var-info=yes ./tc21_pthonce 
tc22_exit_w_lock: valgrind   ./tc22_exit_w_lock 
tc23_bogus_condwait: valgrind   ./tc23_bogus_condwait 
*** tc23_bogus_condwait failed (stderr) ***
tc24_nonzero_sem: valgrind   --hg-sanity-flags=111111 ./tc24_nonzero_sem 
-- Finished tests in helgrind/tests ------------------------------------

== 49 tests, 1 stderr failure, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
helgrind/tests/tc23_bogus_condwait       (stderr)

Comment 5 zephyrus00jp 2012-12-18 09:17:09 UTC

(Sorry I hit the wrong key prematurely.) Here is the diff output for tc23_bogus_condwait stderr.
--- tc23_bogus_condwait.stderr.exp	2012-08-17 15:59:45.000000000 +0900
+++ tc23_bogus_condwait.stderr.out	2012-12-18 17:51:33.000000000 +0900
@@ -26,6 +26,14 @@
 
 ----------------------------------------------------------------
 
+Thread #x's call to pthread_cond_wait failed
+   with error code 22 (EINVAL: Invalid argument)
+   at 0x........: pthread_cond_wait_WRK (hg_intercepts.c:...)
+   by 0x........: pthread_cond_wait@* (hg_intercepts.c:...)
+   by 0x........: main (tc23_bogus_condwait.c:72)
+
+----------------------------------------------------------------
+
 Thread #x: pthread_cond_{timed}wait called with mutex of type pthread_rwlock_t*
    at 0x........: pthread_cond_wait_WRK (hg_intercepts.c:...)
    by 0x........: pthread_cond_wait@* (hg_intercepts.c:...)
@@ -53,4 +61,4 @@
    by 0x........: main (tc23_bogus_condwait.c:78)
 
 
-ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
+ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)

So maybe the patch needs to be tweaked a bit.

TIA

Comment 6 zephyrus00jp 2012-12-18 19:26:46 UTC

Oh, how confusing.

Without the patch, the pristine 3.8.1 already produced the
regression error under linux 32bits.
uname -a
Linux debian-vm 3.2.0-4-686-pae #1 SMP Debian 3.2.32-1 i686 GNU/Linux

So the regression error is not the patch's fault.

== 564 tests, 6 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 1 post failure ==
memcheck/tests/varinfo1                  (stderr)
memcheck/tests/varinfo2                  (stderr)
memcheck/tests/varinfo3                  (stderr)
memcheck/tests/varinfo4                  (stderr)
memcheck/tests/varinfo5                  (stderr)
massif/tests/overloaded-new              (post)
helgrind/tests/tc23_bogus_condwait       (stderr)

Comment 7 zephyrus00jp 2012-12-18 20:07:43 UTC

Created attachment 75908 [details]
Improved patch. (A '*' was missing.)

A "*" was missing in a place.
So the previous patch worked on 32 bit systems, but it may not work on
64 bit patch. 

The second * was missing in the previous patch.
+   DO_CREQ_v_WW(_VG_USERREQ__HG_PTHREAD_COND_INIT_PRE,
+		pthread_cond_t*,cond, pthread_condattr_t*, cond_attr);

I am using the current patch to test mozilla thunderbird under helgrind now.

Comment 8 zephyrus00jp 2012-12-23 17:01:31 UTC

Created attachment 75985 [details]
More improvement and fixed another problem for pthread_cond

A season's gift :-)

More complete patch for pthread_cond_init(), and
also I fixed a problem with pthread_cond_destroy().

The unpatched helgrind 3.8.1 released the internal structure for cond variable
even if libpthread does not release it due to EBUSY error condition (that is
tasks are waiting on the cond variable).
I noticed this while trying to check the operation of helgrind against a
sample program in
http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/multi-thread.html
specifically,
http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/thread-pool-server.c
by adding pthread_cond_destroy() in a key place or two.

(Adding pthread_cond_destroy() caused the sample program in
https://www.securecoding.cert.org/confluence/display/seccode/POS54-C.+Notify+all+POSIX+threads+waiting+on+a+condition+variable+instead+of+a+single+thread  caused a hung program, but I think helgrind was OK. My mods were probably produced incorrect C proram.)


Anyway, so I fixed the new problem as well.
However, this patch takes care of it and I have a better output now.
I am using it now to run mozilla thunderbird throug its "make mozmill" test harness to find out if there are any problems related to threads especially 
usage of cond varirables. Unpatched valgrind 3.8.1 showed destruction of
cond var being waited upon, but I am afraid that most of the error messages were
false negatives.  

BTW, as a result of the patch, the regression test prints out error for helgrind/tests/pth_destroy_cond.c test. But it is not a big deal and can
be taken care of easily.

A peaceful holiday season and a happy new year to everybody.

Comment 9 Johan Walles 2013-03-12 08:22:17 UTC

I can confirm this; happens for me as well.

I can also confirm that the latest patch from zephyrus00jp removes the false positives for me, thanks zephyrus00jp!

Comment 10 zephyrus00jp 2013-03-12 13:34:01 UTC

(2013/03/12 17:22), Johan Walles wrote:
> https://bugs.kde.org/show_bug.cgi?id=307082
>
> Johan Walles <johan.walles@gmail.com> changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                   CC|                            |johan.walles@gmail.com
>
> --- Comment #9 from Johan Walles <johan.walles@gmail.com> ---
> I can confirm this; happens for me as well.
>
> I can also confirm that the latest patch from zephyrus00jp removes the false
> positives for me, thanks zephyrus00jp!
>

Dear Johan,

You are very welcome!

I believe Julian (one of the valgrind/helgrind developers) has cleaned up my patch
and is using it.
He is using it to hunt down race problems in firefox.
He also got false positives before the patch.

So let us expect that the patch will be in the next major release of valgrind/helgrind.

Chiaki Ishikawa
(aka zephyrus00jp)

Comment 11 Johan Walles 2013-03-12 19:28:43 UTC

Chiaki, this bug still exists in the latest SVN version of Valgrind.

Your patch applies though and resolves the problem.

  Regards //Johan

Comment 12 zephyrus00jp 2013-03-13 03:35:12 UTC

(2013/03/13 4:28), Johan Walles wrote:
> https://bugs.kde.org/show_bug.cgi?id=307082
>
> --- Comment #11 from Johan Walles <johan.walles@gmail.com> ---
> Chiaki, this bug still exists in the latest SVN version of Valgrind.
>

Oh, I should have known.

> Your patch applies though and resolves the problem.

This is good to know.

>
>    Regards //Johan
>

Chiaki

Comment 13 Philippe Waroquiers 2013-03-18 22:55:51 UTC

(In reply to comment #8)
> The unpatched helgrind 3.8.1 released the internal structure for cond
> variable
> even if libpthread does not release it due to EBUSY error condition (that is
> tasks are waiting on the cond variable).
Fix for this committed as rev 13329
(in addition to not destroy the cvi, the cvi is kept in the FM)
Thanks

(should take a look at the "init" thing in the coming days

Comment 14 Philippe Waroquiers 2013-03-24 20:14:04 UTC

(In reply to comment #10)
> (2013/03/12 17:22), Johan Walles wrote:
> > https://bugs.kde.org/show_bug.cgi?id=307082
> >
> > Johan Walles <johan.walles@gmail.com> changed:
> >
> >             What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                   CC|                            |johan.walles@gmail.com
> >
> > --- Comment #9 from Johan Walles <johan.walles@gmail.com> ---
> > I can confirm this; happens for me as well.
> >
> > I can also confirm that the latest patch from zephyrus00jp removes the false
> > positives for me, thanks zephyrus00jp!
> >
> 
> Dear Johan,
> 
> You are very welcome!
> 
> I believe Julian (one of the valgrind/helgrind developers) has cleaned up my
> patch
> and is using it.
> He is using it to hunt down race problems in firefox.
> He also got false positives before the patch.
> 
> So let us expect that the patch will be in the next major release of
> valgrind/helgrind.
Revision 13332 should fix the problem.
Committed patch derived from the patch attached to this bug
(with some changes, e.g. doing a POST rather than a PRE,)
Thanks

Comment 15 zephyrus00jp 2013-04-02 18:27:58 UTC

(2013/03/25 5:14), Philippe Waroquiers wrote:
> https://bugs.kde.org/show_bug.cgi?id=307082
>
> Philippe Waroquiers <philippe.waroquiers@skynet.be> changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>               Status|UNCONFIRMED                 |RESOLVED
>           Resolution|---                         |FIXED
>
> --- Comment #14 from Philippe Waroquiers <philippe.waroquiers@skynet.be> ---
> (In reply to comment #10)
>> (2013/03/12 17:22), Johan Walles wrote:
>>> https://bugs.kde.org/show_bug.cgi?id=307082
>>>
>>> Johan Walles <johan.walles@gmail.com> changed:
>>>
>>>              What    |Removed                     |Added
>>> ----------------------------------------------------------------------------
>>>                    CC|                            |johan.walles@gmail.com
>>>
>>> --- Comment #9 from Johan Walles <johan.walles@gmail.com> ---
>>> I can confirm this; happens for me as well.
>>>
>>> I can also confirm that the latest patch from zephyrus00jp removes the false
>>> positives for me, thanks zephyrus00jp!
>>>
>>
>> Dear Johan,
>>
>> You are very welcome!
>>
>> I believe Julian (one of the valgrind/helgrind developers) has cleaned up my
>> patch
>> and is using it.
>> He is using it to hunt down race problems in firefox.
>> He also got false positives before the patch.
>>
>> So let us expect that the patch will be in the next major release of
>> valgrind/helgrind.
> Revision 13332 should fix the problem.
> Committed patch derived from the patch attached to this bug
> (with some changes, e.g. doing a POST rather than a PRE,)
> Thanks
>

Thank you for incorporating a fix for this issue, and
for sharing this great tool with the world's development community.

Happy Hacking
(in the good old sense of the word)