Bug 371396

Summary: helgrind and drd pth_cond_destroy_busy testcase hang with new glibc cond var implementation
Product: [Developer tools] valgrind
Reporter: Mark Wielaard <mark>
Component: helgrind
Assignee: Julian Seward <jseward>
Status: CONFIRMED
Severity: normal
CC: ivosh
Priority: NOR
Version First Reported In: 3.12 SVN
Target Milestone: ---
Platform: Other
OS: Linux

Description Mark Wielaard 2016-10-20 23:48:35 UTC
A new condition variable implementation currently being tested in Fedora rawhide hangs the helgrind and drd pth_cond_destroy_busy tests. This is because they test a condition that triggers undefined behaviour: "Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior." With the new implementation, the thread calling pthread_cond_destroy simply hangs in this situation.

There are a couple of ways we could work around this.
- Add a timer thread that kills the whole test after some time.
  This is what we did for the bar_bad tests. It causes some non-determinism and extra thread events, and you
  have to handle different code paths/warnings/error conditions.
- Add a generic timer/watchdog to vgregtest that kills hanging tests.
  This would be nice in general to make sure that make regtest at least finishes.
- Skip the test if a newer glibc is detected. This is what I am doing for now.
- Try to call pthread_cond_broadcast in HG_PTHREAD_COND_DESTROY_PRE when we detect that pthread_cond_destroy is called on a variable that other threads are waiting on. That should in theory wake them up, making the pthread_cond_destroy call defined again.
- Simply do not call pthread_cond_destroy if we detect that it might hang.
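
To make the last two options a bit more concrete, here is a rough user-level sketch of the "detect waiters, then do not destroy" idea. It is not the Helgrind intercept code and does not show what HG_PTHREAD_COND_DESTROY_PRE actually does. The tracked_cond_t type and the tracked_cond_wait, tracked_cond_destroy and demo_destroy_busy names are made up for illustration. The assumption is that every waiter goes through the wrapper, so a waiter count kept under the mutex is accurate.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical wrapper (not a Valgrind or glibc type): a condition
   variable plus a count of the threads currently blocked on it. */
typedef struct {
    pthread_cond_t cond;
    int waiters;                /* protected by the mutex used for waiting */
} tracked_cond_t;

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static tracked_cond_t tc;
static int ready;               /* predicate the waiter sleeps on */

/* Caller holds m. Counts the caller as a waiter while it is blocked. */
static int tracked_cond_wait(tracked_cond_t *c, pthread_mutex_t *m)
{
    int rc;
    c->waiters++;
    rc = pthread_cond_wait(&c->cond, m);
    c->waiters--;
    return rc;
}

/* Caller holds the mutex. Refuses to destroy while threads are blocked,
   so pthread_cond_destroy is never reached in the undefined case. */
static int tracked_cond_destroy(tracked_cond_t *c)
{
    if (c->waiters > 0)
        return EBUSY;
    return pthread_cond_destroy(&c->cond);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!ready)
        tracked_cond_wait(&tc, &mtx);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Returns 1 if destroying a busy variable was refused with EBUSY and
   destroying it after the waiter left succeeded; -1 on setup failure. */
int demo_destroy_busy(void)
{
    pthread_t t;
    int busy_rc, idle_rc, n;

    ready = 0;
    tc.waiters = 0;
    if (pthread_cond_init(&tc.cond, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, waiter, NULL) != 0)
        return -1;

    /* Poll until the waiter has registered itself and released the mutex. */
    do {
        pthread_mutex_lock(&mtx);
        n = tc.waiters;
        pthread_mutex_unlock(&mtx);
        if (n == 0)
            sched_yield();
    } while (n == 0);

    pthread_mutex_lock(&mtx);
    busy_rc = tracked_cond_destroy(&tc);    /* a thread is still blocked */
    ready = 1;
    pthread_cond_broadcast(&tc.cond);       /* wake the waiter */
    pthread_mutex_unlock(&mtx);

    pthread_join(t, NULL);

    pthread_mutex_lock(&mtx);
    idle_rc = tracked_cond_destroy(&tc);    /* nobody is blocked any more */
    pthread_mutex_unlock(&mtx);

    return busy_rc == EBUSY && idle_rc == 0;
}
```

Build with -pthread and call demo_destroy_busy() from a test main. The broadcast variant would differ only in the busy branch: broadcast there instead of returning EBUSY. That only helps if the woken threads do not go straight back to waiting on the same variable.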

The new implementation is: https://sourceware.org/ml/libc-alpha/2016-06/msg00554.html
It is currently only applied in Fedora rawhide glibc 2.24.90, but is expected to eventually hit glibc master.
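
For the "skip the test if a newer glibc is detected" option, a runtime check could look roughly like the sketch below. This is only an illustration and not necessarily how the test is actually skipped. The version_at_least and should_skip_destroy_busy_test helpers are hypothetical names, and the 2.24.90 threshold is an assumption taken from the rawhide version mentioned above. gnu_get_libc_version() is glibc-specific, hence the guard.

```c
#include <stdio.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>   /* glibc-specific: gnu_get_libc_version() */
#endif

/* Returns 1 if the version string v ("major.minor[.patch]") is at least
   maj.min.patch, 0 if it is older, and -1 if it cannot be parsed. */
int version_at_least(const char *v, int maj, int min, int patch)
{
    int a = 0, b = 0, c = 0;

    /* A missing patch component leaves c at 0. */
    if (sscanf(v, "%d.%d.%d", &a, &b, &c) < 2)
        return -1;
    if (a != maj)
        return a > maj;
    if (b != min)
        return b > min;
    return c >= patch;
}

/* Assumption: every glibc from 2.24.90 on carries the new condition
   variable implementation. A distro backport to an older version
   number would not be detected by this check. */
int should_skip_destroy_busy_test(void)
{
#ifdef __GLIBC__
    return version_at_least(gnu_get_libc_version(), 2, 24, 90) == 1;
#else
    return 0;   /* not glibc: no basis for skipping */
#endif
}
```

A version check is a blunt instrument, since it keys on the version number rather than on the behaviour, but it keeps the test itself unchanged on older systems.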

Reproducible: Always
Comment 1 Mark Wielaard 2016-10-21 00:03:34 UTC
A workaround (skip the test if a newer glibc is detected) has been checked in as valgrind r16097.
Comment 2 Ivo Raisr 2017-05-05 15:31:23 UTC
Mark, are you happy with the workaround provided?
In other words, shall we close this bug?