Bug 379630 - false positive std::mutex problems
Summary: false positive std::mutex problems
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: helgrind
Version: 3.13 SVN
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-08 09:19 UTC by Gregory Nilsson
Modified: 2023-09-13 08:37 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
test code (561 bytes, text/x-csrc)
2017-05-08 09:19 UTC, Gregory Nilsson
Details
log (11.92 KB, text/plain)
2017-05-08 09:19 UTC, Gregory Nilsson
Details
drd (10.80 KB, text/plain)
2017-05-08 13:36 UTC, Gregory Nilsson
Details
test case with mutex wrapper (790 bytes, text/x-csrc)
2017-05-09 08:11 UTC, Gregory Nilsson
Details
log when using wrapper (42.44 KB, text/plain)
2017-05-09 08:12 UTC, Gregory Nilsson
Details

Description Gregory Nilsson 2017-05-08 09:19:29 UTC
Created attachment 105387 [details]
test code

If a std::mutex goes out of scope and a new mutex happens to be allocated at the same address, Helgrind still remembers the old mutex, leading to strange behavior.

In the attached test case, a lock order violation is reported.

In my actual application the second mutex is a pthread rw mutex and when locking it I get:

Helgrind: ../../helgrind/hg_main.c:307 (lockN_acquire_reader): Assertion 'lk->kind == LK_rdwr' failed.


It seems Helgrind does not detect when a std::mutex is deallocated.

Linux eselivm2v1074l 2.6.32-504.3.3.el6.x86_64 #1 SMP Fri Dec 12 16:05:43 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
Comment 1 Gregory Nilsson 2017-05-08 09:19:53 UTC
Created attachment 105388 [details]
log
Comment 2 Ivo Raisr 2017-05-08 11:15:02 UTC
I get no violation when the attached example is compiled with gcc 6.3 and run on my Ubuntu 16.10:

valgrind --tool=helgrind --hg-sanity-flags=011111 -q ./mutex
m1=0xffefffc00, m2=0xffefffc28
m2=0xffefffc00, m1=0xffefffc28

What is your Valgrind version and compiler version?

Could you also try your example with DRD instead of Helgrind?
Comment 3 Gregory Nilsson 2017-05-08 13:35:55 UTC
I'm running gcc 4.9.2 and valgrind 3.13 SVN; I tried 3.12 as well with the same result. DRD reports no faults (log attached).

I compile my test like this (seen in log.txt):
gcc -g -std=c++11 main.cc -o test -O0 -lstdc++ -lpthread

With valgrind's -v flag I can see REDIR printouts only for pthread_mutex_lock and pthread_mutex_unlock (not for pthread_mutex_init/destroy). If I remove -lpthread when compiling, Helgrind reports no faults, but then no REDIRs for pthread_mutex appear at all; perhaps that is a hint as to why you saw no error?

I've tried setting SHOW_EVENTS to 1 in hg_main.c and still cannot see any calls to evh__HG_PTHREAD_MUTEX_DESTROY_PRE. This function would call map_locks_delete(), which removes the mutex from Helgrind's internal lock map; since it is not called, the old mutex is still found when a new mutex is allocated in the same place.

Does helgrind detect std::mutex going out of scope on your system?
Comment 4 Gregory Nilsson 2017-05-08 13:36:15 UTC
Created attachment 105390 [details]
drd
Comment 5 Ivo Raisr 2017-05-09 05:29:31 UTC
Yes, compilation options make a huge difference.
I can reproduce the problem now.

DRD with '--trace-mutex=yes' reports this:
...
m1=0xffefffc00, m2=0xffefffc28
==23335== [1] mutex_trylock   mutex 0xffefffc00 rc 0 owner 0
==23335== [1] post_mutex_lock mutex 0xffefffc00 rc 0 owner 0
==23335== [1] mutex_trylock   mutex 0xffefffc28 rc 0 owner 0
==23335== [1] post_mutex_lock mutex 0xffefffc28 rc 0 owner 0
==23335== [1] mutex_unlock    mutex 0xffefffc28 rc 1
==23335== [1] mutex_unlock    mutex 0xffefffc00 rc 1
m2=0xffefffc00, m1=0xffefffc28
==23335== [1] mutex_trylock   mutex 0xffefffc28 rc 0 owner 1
==23335== [1] post_mutex_lock mutex 0xffefffc28 rc 0 owner 1
==23335== [1] mutex_trylock   mutex 0xffefffc00 rc 0 owner 1
==23335== [1] post_mutex_lock mutex 0xffefffc00 rc 0 owner 1
==23335== [1] mutex_unlock    mutex 0xffefffc00 rc 1
==23335== [1] mutex_unlock    mutex 0xffefffc28 rc 1

And Helgrind built with "SHOW_EVENTS 1" reports this:
m1=0xffefffc10, m2=0xffefffc38
evh__hg_PTHREAD_MUTEX_LOCK_PRE(ctid=1, mutex=0xFFEFFFC10)
evh__HG_PTHREAD_MUTEX_LOCK_POST(ctid=1, mutex=0xFFEFFFC10)
evh__hg_PTHREAD_MUTEX_LOCK_PRE(ctid=1, mutex=0xFFEFFFC38)
evh__HG_PTHREAD_MUTEX_LOCK_POST(ctid=1, mutex=0xFFEFFFC38)
evh__HG_PTHREAD_MUTEX_UNLOCK_PRE(ctid=1, mutex=0xFFEFFFC38)
evh__hg_PTHREAD_MUTEX_UNLOCK_POST(ctid=1, mutex=0xFFEFFFC38)
evh__HG_PTHREAD_MUTEX_UNLOCK_PRE(ctid=1, mutex=0xFFEFFFC10)
evh__hg_PTHREAD_MUTEX_UNLOCK_POST(ctid=1, mutex=0xFFEFFFC10)
m2=0xffefffc10, m1=0xffefffc38
evh__hg_PTHREAD_MUTEX_LOCK_PRE(ctid=1, mutex=0xFFEFFFC38)
evh__HG_PTHREAD_MUTEX_LOCK_POST(ctid=1, mutex=0xFFEFFFC38)
evh__hg_PTHREAD_MUTEX_LOCK_PRE(ctid=1, mutex=0xFFEFFFC10)
evh__HG_PTHREAD_MUTEX_LOCK_POST(ctid=1, mutex=0xFFEFFFC10)
<error about lock order violation here>
evh__HG_PTHREAD_MUTEX_UNLOCK_PRE(ctid=1, mutex=0xFFEFFFC10)
evh__hg_PTHREAD_MUTEX_UNLOCK_POST(ctid=1, mutex=0xFFEFFFC10)
evh__HG_PTHREAD_MUTEX_UNLOCK_PRE(ctid=1, mutex=0xFFEFFFC38)
evh__hg_PTHREAD_MUTEX_UNLOCK_POST(ctid=1, mutex=0xFFEFFFC38)

So both tools correctly detect mutex lock and unlock.
DRD does not report a lock order violation, although it could have.

As regards detecting when the mutex is allocated or destroyed: the reproducer unfortunately invokes neither pthread_mutex_init() nor pthread_mutex_destroy().
If you disassemble the resulting binary, you'll see calls to __gthread_mutex_lock() and __gthread_mutex_unlock(), but no calls to init or destroy; they are implicit.

At this point I think you need to annotate your program for this.
See Valgrind manual [1] and also these hints [2].

[1] http://valgrind.org/docs/manual/drd-manual.html#drd-manual.C++11
[2] http://valgrind.org/docs/manual/hg-manual.html#hg-manual.effective-use
Comment 6 Gregory Nilsson 2017-05-09 08:11:31 UTC
Created attachment 105409 [details]
test case with mutex wrapper
Comment 7 Gregory Nilsson 2017-05-09 08:12:57 UTC
Created attachment 105411 [details]
log when using wrapper
Comment 8 Gregory Nilsson 2017-05-09 08:13:34 UTC
I've created a wrapper for std::mutex (updated test code attached),
with VALGRIND_HG_MUTEX_INIT_POST/VALGRIND_HG_MUTEX_DESTROY_PRE in its constructor/destructor.
The lock order violation error is now gone (log attached).

Is this how I should do it if I want helgrind to handle std::mutex correctly?
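The wrapper described above is in the attachment, not inlined here; a sketch of the same idea (a hypothetical reconstruction using the VALGRIND_HG_MUTEX_INIT_POST and VALGRIND_HG_MUTEX_DESTROY_PRE client requests from valgrind/helgrind.h, with no-op fallbacks so it builds where the Valgrind headers are absent) might look like this:

```cpp
#include <mutex>

#if __has_include(<valgrind/helgrind.h>)
#include <valgrind/helgrind.h>
#else
// Fallbacks for builds without Valgrind headers; the real client
// requests are runtime no-ops outside Valgrind anyway.
#define VALGRIND_HG_MUTEX_INIT_POST(_mutex, _mbRec) ((void)0)
#define VALGRIND_HG_MUTEX_DESTROY_PRE(_mutex)       ((void)0)
#endif

// Wraps std::mutex and tells Helgrind about its lifetime, since the
// implicit gthread init/destroy are never intercepted.
class AnnotatedMutex {
public:
    AnnotatedMutex() {
        // Second argument: 0 = not a recursive mutex.
        VALGRIND_HG_MUTEX_INIT_POST(&m_, 0);
    }
    ~AnnotatedMutex() {
        VALGRIND_HG_MUTEX_DESTROY_PRE(&m_);
    }
    void lock()     { m_.lock(); }
    void unlock()   { m_.unlock(); }
    bool try_lock() { return m_.try_lock(); }
private:
    std::mutex m_;
};
```

Because it exposes lock/unlock/try_lock, the wrapper satisfies the Lockable requirements and still works with std::lock_guard and std::unique_lock.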
Comment 9 Paul Floyd 2023-09-12 08:17:59 UTC
There are two ways to initialize a pthread_mutex_t object:
call pthread_mutex_init() or
use the static initializer PTHREAD_MUTEX_INITIALIZER

The function way is "nice" from a Valgrind perspective since we can intercept the function and record the initialization, as well as check that it gets destroyed.

The static initializer is less nice: Valgrind doesn't see anything, so it can't do any create/destroy validation.

LLVM's libc++ looks like it uses static initialization:

    __libcpp_mutex_t __m_ = _LIBCPP_MUTEX_INITIALIZER;

Not surprising; I expect it's faster and does the same thing.

The libstdc++ code is more complicated and depends on whether __GTHREAD_MUTEX_INIT is defined. If it is, static initialization is used, which is the case on POSIX systems.

Other than modifying your C++ standard library or writing a wrapper, there's no easy fix.
Comment 10 Paul Floyd 2023-09-13 08:37:17 UTC
I still get the same error with GCC 11.2.

Your wrapper looks right to me, as long as the pthread_mutex_t is the very first thing in the memory layout of std::mutex. That's the case for GCC's libstdc++, and I suspect also for LLVM's libc++.