| Summary: | Spurious lock not held error from inside pthread_cond_timedwait | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | faminebadger |
| Component: | helgrind | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | drahflow, philippe.waroquiers, pjfloyd |
| Priority: | NOR | ||
| Version First Reported In: | 3.13.0 | ||
| Target Milestone: | --- | ||
| Platform: | Gentoo Packages | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
At least simple usage of pthread_cond_wait and lock is verified by the regression tests. Do you have a small compilable reproducer ? > Do you have a small compilable reproducer ? I tried to make a cut-down version, but then it didn't reproduce. Such is the nature of debugging races I guess. However, it's quite clear from the glibc source code that pthread_cond_wait_common (the common implementation for both pthread_cond_wait and pthread_cond_timedwait) can call pthread_cond_signal after releasing the lock by calling __condvar_cancel_waiting, which contains: > __condvar_release_lock (cond, private); > > if (consumed_signal) > { > /* We effectively consumed a signal even though we didn't want to. > Therefore, we need to send a replacement signal. > If we would want to optimize this, we could do what > pthread_cond_signal does right in the critical section above. */ > __pthread_cond_signal (cond); > } >} It even implicitly says it's outside of the critical section. This suggests some interaction between waiting and cancelling that wait via a signal or interrupt. Either way though, I don't think this should be flagged as an error by valgrind, given it's purely internal to glibc, and is presumably working as intended. I have a reproducer for this warning, which also suggests is spurious (given the source of the example): Taking the example from https://en.cppreference.com/w/cpp/thread/condition_variable: #include <iostream> #include <string> #include <thread> #include <mutex> #include <condition_variable> std::mutex m; std::condition_variable cv; std::string data; bool ready = false; bool processed = false; void worker_thread() { // Wait until main() sends data std::unique_lock lk(m); cv.wait(lk, []{return ready;}); // after the wait, we own the lock. std::cout << "Worker thread is processing data\n"; data += " after processing"; // Send data back to main() processed = true; std::cout << "Worker thread signals data processing completed\n"; // Manual unlocking is done before notifying, to avoid waking up // the waiting thread only to block again (see notify_one for details) lk.unlock(); cv.notify_one(); } int main() { std::thread worker(worker_thread); data = "Example data"; // send data to the worker thread { std::lock_guard lk(m); ready = true; std::cout << "main() signals data ready for processing\n"; } cv.notify_one(); // wait for the worker { std::unique_lock lk(m); cv.wait(lk, []{return processed;}); } std::cout << "Back in main(), data = " << data << '\n'; worker.join(); } ... compiling and running via % g++-11 -O4 -ggdb test.c++ -lpthread % valgrind --error-exitcode=1 --exit-on-first-error=yes --tool=helgrind ./a.out ... gives: [...] ==30543== ---Thread-Announcement------------------------------------------ ==30543== ==30543== Thread #1 is the program's root thread ==30543== ==30543== ---------------------------------------------------------------- ==30543== ==30543== Thread #1: pthread_cond_{signal,broadcast}: dubious: associated lock is not held by any thread ==30543== at 0x4847EF6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==30543== by 0x10A241: main (test2.c++:44) This is rather annoying, especially as it is a bog standard cppreference example and the sort of thing that is likely to occur in any C++11 or later threaded code. This message is generated from static void evh__HG_PTHREAD_COND_SIGNAL_PRE ( ThreadId tid, void* cond ) Which looks like the wrapper function called before a pthread_cond_signal And there is the comment // Hmm. POSIX doesn't actually say that it's an error to call // pthread_cond_signal with the associated mutex being unlocked. // Although it does say that it should be "if consistent scheduling // is desired." For that reason, print "dubious" if the lock isn't // held by any thread. Skip the "dubious" if it is held by some // other thread; that sounds straight-out wrong. And the corresponding doc. https://pubs.opengroup.org/onlinepubs/9699919799/ I'm not sure that we want to change this. I see two other possibilities - user suppression - add a command line option, something like --check_cond_signal_mutex Also TSan does not compain about this g++ -g -o 392331 392331.cpp -pthread -fsanitize=thread ./392331 main() signals data ready for processing Worker thread is processing data Worker thread signals data processing completed Back in main(), data = Example data after processing Regarding user suppression: It would already be very helpful if it could get its own category and would not be reported as Helgrind:Misc
Currently, I'm having a suppression which is
{
<this bug URL>
Helgrind:Misc
...
fun:main
}
(no chance in being more specific due to a template function being inlined in a lot of places).
So any other Helgrind:Misc problem are now silenced as well.
The commit below makes a separate category for this kind of error "Dubious". Please reopen if you also need a command line option. commit 7d0389956e5e6ff182fdf4c2a3a9f8a202a927d4 (HEAD -> master, origin/master, origin/HEAD) Author: Paul Floyd <pjfloyd@wanadoo.fr> Date: Sun Dec 25 21:43:36 2022 +0100 Bug 392331 - Spurious lock not held error from inside pthread_cond_timedwait Added a "Dubious" error category to cover this kind of error. |
I'm getting the following error from my program: ==18784== Thread #4: pthread_cond_{signal,broadcast}: dubious: associated lock is not held by any thread ==18784== at 0x4C335B1: pthread_cond_signal_WRK (hg_intercepts.c:1384) ==18784== by 0x5CFBC3D: pthread_cond_timedwait@@GLIBC_2.3.2 (in /lib64/libpthread-2.26.so) ==18784== by 0x4C36F54: pthread_cond_timedwait_WRK.constprop.1 (hg_intercepts.c:1295) ==18784== by 0x110F6C: semaphore_wait (semaphores.c:113) ==18784== by 0x10C983: exchange_thread (exchange_thread.c:337) ==18784== by 0x4C316A6: mythread_wrapper (hg_intercepts.c:389) ==18784== by 0x5CF48E9: start_thread (in /lib64/libpthread-2.26.so) ==18784== by 0x600F61E: clone (in /lib64/libc-2.26.so) However, clearly my semaphore_wait function is locking the mutex: 111: pthread_mutex_lock( &sem->mx ); 112: if( !sem->data ) 113: pthread_cond_wait( &sem->cd, &sem->mx ); Note, I've verified pthread_mutex_lock is returning success (which was the first thought I had). The second thing I noticed is that this is a warning about pthread_cond_*SIGNAL* - which is the opposite side of the condition - I'm doing a wait here. It therefore looks like glibc is internally calling signal from wait, and valgrind is reporting this as an error.