glibc 2.23-pre got a new pthread barrier implementation: https://sourceware.org/ml/libc-alpha/2016-01/msg00338.html It reacts differently to bad usage of a barrier, such as the deliberate misuse in bar_bad.vgtest that checks whether helgrind detects it. In particular, the bar_bad testcase now hangs. Reproducible: Always
Created attachment 96738 [details] pthread_barrier-vs-newer-glibc-implementation These changes make the bar_bad testcase PASS against both the old and new implementation.
The patch to the testcase only works for helgrind, but the test program is also used under drd. It needs some different changes there, or you will get two failures: drd/tests/bar_bad (stderr) drd/tests/bar_bad_xml (stderr)
The problem with fixing this for drd is that DRD_(barrier_destroy) is called after pthread_barrier_destroy. But that call might now hang, in which case bar_bad's watchdog exits the program and the post handler is never called.
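The ordering problem can be illustrated with a small, self-contained sketch. This is not drd or bar_bad code: call_that_hangs() is a hypothetical stand-in for a pthread_barrier_destroy that never returns, and the "post step" stands in for the handler that would run afterwards. It assumes a POSIX system with fork(), alarm() and pipe().

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Watchdog: leave the process as soon as the alarm fires. */
static void watchdog_handler(int sig) {
    (void)sig;
    _exit(0);
}

/* Hypothetical stand-in for a library call that never returns. */
static void call_that_hangs(void) {
    for (;;)
        pause();
}

/* Runs "hanging call, then post step" in a child process guarded by a
   watchdog. Returns 1 if the post step ran, 0 if it did not. */
int post_step_ran(unsigned timeout_seconds) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        close(fds[0]);
        signal(SIGALRM, watchdog_handler);
        alarm(timeout_seconds);
        call_that_hangs();
        /* Post step: only reached if the call above returns. */
        ssize_t w = write(fds[1], "x", 1);
        (void)w;
        _exit(0);
    }

    close(fds[1]);
    char c;
    ssize_t n = read(fds[0], &c, 1); /* 0 means EOF: no post step */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

post_step_ran(1) returns 0 after about one second: the watchdog ends the child while it is still inside the hanging call, so nothing placed after that call ever executes.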
Created attachment 96765 [details] pthread_barrier-vs-newer-glibc-implementation-drd-helgrind
The simplest option to solve the drd issue is to do as in the helgrind case and have two variants of the exp files. One of them covers the case where pthread_barrier_destroy hangs, in which case drd never gets to report that it wasn't a barrier. That isn't ideal, but since it is the last test in the batch it doesn't seem to impact things too much.
Note that the patch is missing the drd/tests/Makefile.am addition of the two extra exp files:

diff --git a/drd/tests/Makefile.am b/drd/tests/Makefile.am
index 2885391..cfd74d0 100644
--- a/drd/tests/Makefile.am
+++ b/drd/tests/Makefile.am
@@ -81,8 +81,10 @@ EXTRA_DIST = \
 	atomic_var.stderr.exp \
 	atomic_var.vgtest \
 	bar_bad.stderr.exp \
+	bar_bad.stderr.exp-nohang \
 	bar_bad.vgtest \
 	bar_bad_xml.stderr.exp \
+	bar_bad_xml.stderr.exp-nohang \
 	bar_bad_xml.vgtest \
 	bar_trivial.stderr.exp \
 	bar_trivial.stdout.exp \
I checked in a workaround for the hang based on the attachment as valgrind svn r15962. This does make sure that the tests don't hang indefinitely. But it does introduce (more) non-determinism that occasionally causes these tests to fail or even trigger an internal drd assert (drd_barrier.c:352 (vgDrd_barrier_pre_wait): Assertion 'p' failed.)
Should we close this now?
(In reply to Julian Seward from comment #7)
> Should we close this now?

No, I don't think we should. There is now a workaround in place that makes sure the test doesn't hang. But that test now (non-deterministically) fails or even crashes valgrind itself.
Created attachment 102144 [details] Make bar_bad test more deterministic. On some MIPS boards, thread slp2 may end before ext1 ends, and that makes the test fail again. One of my colleagues suggested a patch in which the main thread waits for slp2 to terminate. This makes the test more deterministic.
Created attachment 102247 [details] make bar_bad test more deterministic, attempt two The previous patch will break DRD. Instead of pthread_join(), we can experiment with pthread_cancel().
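To make the difference between the two approaches concrete, here is a minimal sketch. It is not the attached patch: sleeper() and cancel_sleeper() are hypothetical names. With pthread_join() alone the main thread blocks until the other thread finishes by itself; with pthread_cancel() the main thread asks it to stop at its next cancellation point. The join at the end is there only so the sketch can observe the outcome and is not part of the suggestion.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* A thread whose only job is to sleep. sleep() is a cancellation
   point, so a pending cancel request takes effect there. */
static void *sleeper(void *arg) {
    (void)arg;
    sleep(60);
    return NULL;
}

/* Starts a sleeper, cancels it, and reports how it ended.
   Returns 1 if the thread terminated through cancellation. */
int cancel_sleeper(void) {
    pthread_t t;
    void *result = NULL;

    int rc = pthread_create(&t, NULL, sleeper, NULL);
    assert(rc == 0);

    /* Request termination instead of waiting for the sleep to end. */
    rc = pthread_cancel(t);
    assert(rc == 0);

    /* Only to observe the result in this sketch. */
    rc = pthread_join(t, &result);
    assert(rc == 0);
    return result == PTHREAD_CANCELED;
}
```

cancel_sleeper() returns 1 almost immediately rather than after 60 seconds, because the sleeper is cancelled at sleep(). Build with -pthread where the C library requires it.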
(In reply to Petar Jovanovic from comment #10)
> Created attachment 102247 [details]
> make bar_bad test more deterministic, attempt two
>
> The previous patch will break DRD. Instead of pthread_join(), we can
> experiment with pthread_cancel().

This patch, slightly modified, was committed in r16154.
Was fixed in r16154, fix announced in NEWS revision 16165
Sadly I am still seeing the sporadic failures in bar_bad/bar_bad_xml. Reported here: https://bugs.kde.org/show_bug.cgi?id=430321