| Summary: | I would really like helgrind to work again... | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | François PELLEGRINI <pelegrin> |
| Component: | helgrind | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED FIXED | ||
| Severity: | wishlist | CC: | dooglus, trapni |
| Priority: | NOR | ||
| Version First Reported In: | 3.2.1 | ||
| Target Milestone: | --- | ||
| Platform: | Compiled Sources | ||
| OS: | Linux | ||
| Latest Commit: | | Version Fixed/Implemented In: | |
| Sentry Crash Report: | |||
| Attachments: | thrcheck output calling g_thread_create_full() | ||
| | suppressions I had to add to stop complaints about libraries | ||
I second that. -Brian

*** This bug has been confirmed by popular vote. ***

A replacement for Helgrind is under active development and nearing beta status. Try this:

    svn co svn://svn.valgrind.org/valgrind/branches/THRCHECK
    cd THRCHECK
    ./autogen.sh && ./configure --prefix=`pwd`/Inst
    make && make install
    ./Inst/bin/valgrind --tool=thrcheck my_threaded_app

Feedback welcomed.

I just tried it, and it seems to work. Here are some comments I found after fooling around with it for only a bit:

a) Is there documentation available? I only glanced cursorily at the sources, but nothing seemed to catch my eye.

b) It might need some default suppressions. I seem to be getting possible data races for unlocking a mutex, for example (for some internal variable (data.lock or something) of the mutex itself). I doubt this is actually an error on my part.

c) The output is good; the "reason" line is very helpful. What might be nice would be "last access by <stacktrace> at <time> (read/write)".

d) Since I have no documentation: does it do deadlock detection? It would be nice to have.

e) I don't know if it is possible, but it would be nice to automatically disregard cases where thread A initializes some data, then starts a worker thread B to work on it, rather than flagging them as a "possible data race". Accesses before thread B actually came alive can never really be a data race and therefore do not need protection. The same of course applies to accesses after thread B dies.

But good work, I am sure this will help me a lot with my work. Thanks a lot.

Thanks for the feedback.

> a) Is there documentation available? I only glanced cursorily at the
> sources, but nothing seemed to catch my eye.

thrcheck/docs/tc-manual.xml. Try this:

    (cd docs && make html-docs)
    konqueror docs/html/tc-manual.html

> b) It might need some default suppressions.

Yes. It already has a lot, but probably not enough. What Linux distro are you using (and which glibc version and CPU)?

> c) The output is good; the "reason" line is very helpful.
> What might be nice would be "last access by <stacktrace> at <time> (read/write)".

Problem is, capturing a stacktrace is expensive, and doing that for every memory reference would slow it down enormously. It would be possible to take a stacktrace every time a lock changes state (is locked/unlocked), but I'm not sure how that would be useful.

> d) Since I have no documentation: does it do deadlock detection?

It does potential deadlock detection; do 'make check' and then run thrcheck/tests/{tc13_laog,tc14_laog_dinphils,tc15_laog_lockdel}.

> e) I don't know if it is possible, but it would be nice to automatically
> disregard cases where thread A initializes some data, then starts a
> worker thread B to work on it, rather than flagging them as a "possible
> data race".

Both of these cases should be handled correctly. See

    thrcheck/tests/tc02_simple_tls.c
    thrcheck/tests/tc03_re_excl.c

If they are not handled correctly, it is a bug in thrcheck which should be fixed if possible. Can you look a bit more at this? Maybe produce a small test case?

Julian Seward wrote:
[bugs.kde.org quoted mail]

Thx.

>> b) It might need some default suppressions.
>
> Yes. It already has a lot, but probably not enough. What Linux
> distro are you using (and which glibc version and CPU)?

This system was a P4 3.2GHz HT (one of the old Prescott HT ones) running Gentoo, with glibc-2.5:

    GNU C Library stable release version 2.5, by Roland McGrath et al.
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    Compiled by GNU CC version 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8).
    Compiled on a Linux >>2.6.14-gentoo-r4<< system on 2007-09-17.
    Available extensions:
        C stubs add-on version 2.1.2
        crypt add-on version 2.1 by Michael Glad and others
        Gentoo patchset 1.8
        GNU Libidn by Simon Josefsson
        GNU libio by Per Bothner
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
        Native POSIX Threads Library by Ulrich Drepper et al
        Support for some architectures added on, not maintained in glibc core.
        BIND-8.2.3-T5B
        Thread-local storage support included.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/libc/bugs.html>.

>> c) The output is good; the "reason" line is very helpful. What might be nice
>> would be "last access by <stacktrace> at <time> (read/write)".
>
> Problem is, capturing a stacktrace is expensive, and doing that for
> every memory reference would slow it down enormously. It would be
> possible to take a stacktrace every time a lock changes state
> (is locked/unlocked); but not sure how that would be useful.

Maybe it is possible to make a macro similar to CALLGRIND_TOGGLE_COLLECT that toggles collection of these stacktraces for small segments of code? Then it would be possible to only turn it on for the interesting parts.

>> d) Since I have no documentation: does it do deadlock detection?
>
> It does potential deadlock detection; do 'make check' and then run
> thrcheck/tests/{tc13_laog,tc14_laog_dinphils,tc15_laog_lockdel}.

Hmm, I do get "Lock acquisition order is inconsistent" errors. I haven't checked the testcase sources; are those programs actually deadlocking? (They seem to terminate when called normally.)

>> e) I don't know if it is possible, but it would be nice to automatically
>> disregard cases where thread A initializes some data, then starts a
>> worker thread B to work on it, rather than flagging them as a "possible
>> data race".
>
> Both of these cases should be handled correctly.
> See
>     thrcheck/tests/tc02_simple_tls.c
>     thrcheck/tests/tc03_re_excl.c
>
> If they are not handled correctly it is a bug in thrcheck which
> should be fixed if possible. Can you look a bit more at this?
> Maybe produce a small test case?

Both testcases run through valgrind without error. I will have to analyze the thing I saw in my program in more detail. I know that it is impossible for it to be an actual race, but the program is quite complex and I am not sure whether this is actually possible for valgrind to tell, or just obvious from what the program does.

> This system was a P4 3.2GHz HT (one of the old Prescott HT ones)
> running Gentoo, with glibc-2.5:

32-bit mode or 64-bit mode?

> Hmm, I do get "Lock acquisition order is inconsistent" errors. I haven't
> checked the testcase sources; are those programs actually deadlocking?
> (They seem to terminate when called normally.)

What this detects is cases where a deadlock *might* happen, depending on how the threads are scheduled. In tc14_laog_dinphils.c, if all 5 philosopher threads do

    pthread_mutex_lock(&chop[left]);

exactly at the same time, and then do the next line

    pthread_mutex_lock(&chop[right]);

exactly at the same time, then the program will deadlock. That's what the tool is telling you. I will improve the error message -- this is very new functionality.

More generally, if all threads must acquire two locks L1 and L2 before accessing a shared resource R, then all threads must acquire L1 and L2 in the same order (L1 then L2, or L2 then L1). Doing otherwise risks deadlock, since one thread could acquire L1 and another L2, and then they deadlock when trying to acquire L2 and L1 respectively.

> Both testcases run through valgrind without error. I will have to
> analyze the thing I saw in my program in more detail. I know that it is
> impossible for it to be an actual race, but the program is quite complex
> and I am not sure whether this is actually possible for valgrind to tell,
> or just obvious from what the program does.
Let me know when you have more details.

Julian Seward wrote:
[bugs.kde.org quoted mail]

32-bit. This CPU doesn't do 64-bit at all. If you want me to test some stuff, I also have various flavors of Gentoo on a Core2 Quad, a Dual Opteron, and two AthlonXPs around here, plus some other distros in Xen on the Core2 Quad.

>> Hmm, I do get "Lock acquisition order is inconsistent" errors. I haven't
>> checked the testcase sources; are those programs actually deadlocking?
>> (They seem to terminate when called normally.)
>
> What this detects is cases where a deadlock *might* happen, depending on
> how the threads are scheduled. In tc14_laog_dinphils.c, if all 5
> philosopher threads do
>
>     pthread_mutex_lock(&chop[left]);
>
> exactly at the same time, and then do the next line
>
>     pthread_mutex_lock(&chop[right]);
>
> exactly at the same time, then the program will deadlock. That's what
> the tool is telling you. I will improve the error message -- this is
> very new functionality.
>
> More generally, if all threads must acquire two locks L1 and L2 before
> accessing a shared resource R, then all threads must acquire L1 and L2
> in the same order (L1 then L2, or L2 then L1). Doing otherwise risks
> deadlock, since one thread could acquire L1 and another L2, and then
> they deadlock when trying to acquire L2 and L1 respectively.

I see. I found one such potential deadlock in my app, but since it is in the logging class the number of potential threads and places where it is called is huge. Could you perhaps output which threads tried the potential deadlock and where? Something like: "Threads X and Y now locked in order L1 <stacktrace>, L2 <stacktrace>, and it was previously locked in order L2 <stacktrace>, L1 <stacktrace> by Threads A and B."

>> Both testcases run through valgrind without error. I will have to
>> analyze the thing I saw in my program in more detail.
>> I know that it is impossible for it to be an actual race, but the program
>> is quite complex and I am not sure whether this is actually possible for
>> valgrind to tell, or just obvious from what the program does.
>
> Let me know when you have more details.

Ok, I figured out what happens:

a) Thread #1 creates a variable and initializes it.
b) Thread #2 is created and reads the variable.
c) Thread #1 later overwrites the variable. <- this triggers the potential race

In my case this cannot race, because there is network traffic involved between (b) and (c), making it impossible for them to happen at the same time ((c) is in response to a request sent at (b)).

Which leads me to the question: what kinds of synchronization points are handled correctly? Take the following example:

a) Create Threads #1 and #2.
b) Thread #2 waits on synchronization primitive X.
c) Thread #1 writes shared variable A.
d) Thread #1 triggers X to release Thread #2.
e) Thread #2 writes shared variable A.

This sequence could never race, but for which kinds of X is this detected? It obviously works for mutexes, but would it work, for example, for a condition variable, a semaphore or a memory barrier? I use constructs like these quite a lot, so it would be nice if that worked ;-) Of course, some of them synchronize via network traffic, pipes, or other things which will probably never work ...

mucki

> I see. I found one such potential deadlock in my app, but since it is in
> the logging class the number of potential threads and places where it is
> called is huge. Could you perhaps output which threads tried the
> potential deadlock and where? Something like: "Threads X and Y now
> locked in order L1 <stacktrace>, L2 <stacktrace>, and it was previously
> locked in order L2 <stacktrace>, L1 <stacktrace> by Threads A and B."

I'll look into doing that; won't be this week though. Maybe next week.

> Which leads me to the question: what kinds of synchronization points are
> handled correctly. [...] detected?
> It obviously works for mutexes, but would it work, for example,
> for a condition variable, a semaphore or a memory barrier?

Mutexes and condition variables. However, it might be possible to extend it. One way you could help is to send some small example programs using semaphores and memory barriers that demonstrate the problem. Then I can consider whether it is possible to fix the tool to handle them.

Julian Seward wrote:
[bugs.kde.org quoted mail]
No worries :) I am already happy to see that something is working and
that valgrind is not really dead (the news section looks very quiet ;-) )
>> Which leads me to the question: what kinds of synchronization points are
>> handled correctly. [...]
>> detected? It obviously works for mutexes, but would it work, for example,
>> for a condition variable, a semaphore or a memory barrier?
>
>
> Mutexes and condition variables. However it might be possible to extend it.
> One way you could help is to send some small example programs using semaphores
> and memory barriers, that demonstrate the problem. Then I can consider if
> it is possible to fix the tool to handle them.
>
I'm pretty swamped with work atm, and all my synchronization and
threading stuff is buried in utility libs, so it might take a while, but
I will see what I can do.
Created attachment 21813 [details]
thrcheck output calling g_thread_create_full()
I'm seeing the attached output when I create a thread. This is on Ubuntu Feisty:
libc6-dev 2.5-0ubuntu14
libgtk2.0-dev 2.10.11-0ubuntu3
libgtkmm-2.4-dev 2.10.8-0ubuntu
Is this a case of a missing suppression? Or am I misunderstanding the output?
I also see a data race while obtaining a read lock:

    ==6031== Possible data race during write to 0x172B8DC4
    ==6031==    at 0x46C4DB0: pthread_rwlock_rdlock (in /lib/tls/i686/cmov/libpthread-2.5.so)
    ==6031==    by 0x4380C60: synfig::RWLock::reader_lock() (mutex.cpp:180)

libpthread-2.5.so is in this package:

    libc6-i686 2.5-0ubuntu14 GNU C Library: Shared libraries [i686 optimized]

> > I see. I found one such potential deadlock in my app, but since it is in
> > the logging class the number of potential threads and places where it is
> > called is huge. Could you perhaps output which threads tried the
> > potential deadlock and where ? Something like: Threads X and Y now
> > locked in order L1 <stacktrace>, L2 <stacktrace> and it was previously
> > locked in order L2 <stacktrace>, L1 <stacktrace> by Threads A and B.
svn up to r7005 and try again. I improved the messages. They now look
like this:
Thread #1: lock order "0x7FEFFFAA0 before 0x7FEFFFA70" violated
at 0x4C23A55: pthread_mutex_lock (tc_intercepts.c:379)
by 0x40081F: main (tc13_laog1.c:24)
Required order was established by acquisition of lock at 0x7FEFFFAA0
at 0x4C23A55: pthread_mutex_lock (tc_intercepts.c:379)
by 0x400748: main (tc13_laog1.c:17)
followed by a later acquisition of lock at 0x7FEFFFA70
at 0x4C23A55: pthread_mutex_lock (tc_intercepts.c:379)
by 0x400773: main (tc13_laog1.c:18)
Kind-of verbose, and maybe not exactly what you wanted; but doable, and
should help you track down the root cause of the problem.
Created attachment 21834 [details]
suppressions I had to add to stop complaints about libraries
I've attached a list of suppressions I've added to enable me to use thrcheck in
ubuntu feisty without warnings about libraries I use.
I'm not really understanding the output from thrcheck however, so it's possible
that some of these shouldn't be included in the default install.
On Tuesday 16 October 2007 23:36, Chris Moore wrote:
> I've attached a list of suppressions I've added to enable me to use
> thrcheck in ubuntu feisty without warnings about libraries I use.

Thanks for the suppressions.

> I'm not really understanding the output from thrcheck however, so it's
> possible that some of these shouldn't be included in the default install.

There is some documentation. See comment #5 for details on how to build it.

I suggest you svn up (again), to at least r7008. That reduces the flood of lock-order messages somewhat.

Overall, though, I am concerned to present the information in a way which is understandable - that's a big concern here, as it is with all the tools. So it would be useful to know what it is you found difficult to understand - can you summarise?

It's not the format of the output that's the problem, it's what it really means. I have a class which uses a single mutex around every read and every write of a certain member variable, and yet thrcheck still complains of a possible race. I don't see how there can be a possible race when everything is protected by the same mutex, but thrcheck says there is. That's what makes me think I don't understand what thrcheck is telling me.

I've not been able to run thrcheck at all for the last couple of days.
Since I started using Boehm's garbage collector, thrcheck crashes like this:

    ==16444== Process terminating with default action of signal 11 (SIGSEGV): dumping core
    ==16444==  Access not within mapped region at address 0xBEC85190
    ==16444==    at 0x549843D: GC_mark_from (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x5498C2F: GC_mark_some (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x549075A: GC_stopped_mark (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x5490B0B: GC_try_to_collect_inner (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x549AC3D: GC_init_inner (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x5495EE5: GC_generic_malloc_inner (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x5495FE4: GC_generic_malloc (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x5496246: GC_malloc (in /usr/lib/libgc.so.1.0.2)
    ==16444==    by 0x82DB2CC: gc::operator new(unsigned) (gc_cpp.h:281)
    ==16444==    by 0x8389BCB: studio::App::App(int*, char***) (app.cpp:1104)
    ==16444==    by 0x83671D0: main (main.cpp:85)
    ==16444==
    ==16444== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    Segmentation fault (core dumped)

I don't think this is a thrcheck bug specifically. Valgrind crashes similarly. Boehm's garbage collector is here: http://www.hpl.hp.com/personal/Hans_Boehm/gc/

> I have a class which uses a single mutex around every read and every write
> of a certain member variable, and yet thrcheck still complains of a
> possible race. I don't see how there can be a possible race when
> everything is protected by the same mutex, but thrcheck says there is.
> That's what makes me think I don't understand what thrcheck is telling me.
The tool certainly has its fair share of false positives and limitations.
So it could be wrong. I'd like to know more. Are you able to show any
more details of the problem (ideally a test case) ?
Sorry for the late reply, but I was out of the office for a week.

This looks exactly like the kind of output I had in mind. I was in fact able to determine the source of one of the deadlock error messages by this. It seems libstdc++ locks a mutex when creating a static variable for the first time. Certainly something I would never have guessed to be remotely a problem.

Julian Seward wrote:
[bugs.kde.org quoted mail]

Some seem to be quite specific, but the first suppression (__lll_mutex_unlock_wake) is one that I was seeing a lot also. For some reason this seems to be detected as a race many times when someone waits on a mutex. Interestingly enough, the data that causes the race is the "lock" member of the pthread_mutex_t variable (which is internal to the library and therefore not documented). I doubt that the pthread library is actually faulty, so maybe this is just thrcheck not realizing that the access is safe? (Maybe done via cmpxchg assembly or something?)

Chris Moore wrote:
[bugs.kde.org quoted mail]

> On Monday 22 October 2007 10:21, Michael Lacher wrote:
> This looks exactly like the kind of output I had in mind. I was in fact able
> to determine the source of one of the deadlock error messages by this.
Good.
On Monday 22 October 2007 10:25, Michael Lacher wrote:
> Some seem to be quite specific, but the first suppression
> (__lll_mutex_unlock_wake) is one that I was seeing a lot also. For some
> reason this seems to be detected as a race many times when someone waits
> on a mutex. Interestingly enough, the data that causes the race is the
> "lock" member of the pthread_mutex_t variable (which is internal to the
> library and therefore not documented). I doubt that the pthread library
> is actually faulty, so maybe this is just thrcheck not realizing that
> the access is safe? (Maybe done via cmpxchg assembly or something?)
Because of the way thrcheck is constructed, it cannot properly make sense
of what is going on inside libpthread itself. So there will be a bunch of
races falsely reported inside libpthread, and I simply have to suppress them.
So you should ignore this. btw, it should handle LOCK cmpxchg etc correctly.
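Such internal libpthread reports are silenced with entries in a valgrind suppressions file. A hypothetical sketch of what such an entry might look like (the tool name, error-kind string, and frame here are assumptions based on the messages discussed above, not copied from the actual default suppressions):

```
{
   ignore-race-in-lll_mutex_unlock_wake
   Thrcheck:Race
   fun:__lll_mutex_unlock_wake
}
```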
> Because of the way thrcheck is constructed, it cannot properly make sense
> of what is going on inside libpthread itself. So there will be a bunch of
> races falsely reported inside libpthread, and I simply have to suppress
> them. So you should ignore this. Btw, it should handle LOCK cmpxchg etc
> correctly.

I see, thanks for the information.

> ------- Additional Comments From dooglus gmail com 2007-10-17 14:33
> I have a class which uses a single mutex around every read and every write
> of a certain member variable, and yet thrcheck still complains of a
> possible race. I don't see how there can be a possible race when
> everything is protected by the same mutex, but thrcheck says there is.

I've discovered that one possible cause of this is gcc's thread-unsafe code generation. See the enormously long discussions starting at

    http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
    http://lkml.org/lkml/2007/10/24/673

In short, under some conditions gcc generates code which actually has a race (which is then detected by Thrcheck) when the original source program has no race (!). This most often happens for conditional stores:

    if (cond) x = ...

where x lives in memory. See for example thrcheck/tests/tc17_sembar.c. And so I wonder if your class is also affected by this problem. Can you provide any further details?

Fixed. Will be in 3.3.0.
I am trying to debug multi-threaded MPI programs, and being able to use helgrind would be a big plus. Maybe a Christmas gift? 9^) Anyway, thanks a lot to all of the Valgrind team for the tremendous job you have done and are doing! f.p.