Summary: | helgrind/drd bar_bad testcase hangs or crashes with new glibc pthread barrier implementation | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Mark Wielaard <mark> |
Component: | helgrind | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alex.kanavin, ivosh, mips32r2, philippe.waroquiers |
Priority: | NOR | ||
Version: | 3.11 SVN | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Attachments: |
pthread_barrier-vs-newer-glibc-implementation
pthread_barrier-vs-newer-glibc-implementation-drd-helgrind Make bar_bad test more deterministic. make bar_bad test more deterministic, attempt two |
Description
Mark Wielaard
2016-01-19 15:13:20 UTC
Created attachment 96738 [details]
pthread_barrier-vs-newer-glibc-implementation
These changes make the bar_bad testcase PASS against both the old and new implementation.
The patch to the testscase only works for helgrind, but the test program is also used under drd. It needs some different changes or you will get two failures: drd/tests/bar_bad (stderr) drd/tests/bar_bad_xml (stderr) The problem fixing this for drd is that DRD_(barrier_destroy) is called after pthread_barrier_destroy. But that call might now hang and then bar_bad invokes a watchdog to exit the program, so the post handler is never called. Created attachment 96765 [details]
pthread_barrier-vs-newer-glibc-implementation-drd-helgrind
Simplest option to solve the drd issue is to do like in the helgrind case and have two variants of the exp files. One of which will have pthread_barrier_destroy hang, and so then drd will just not see that it wasn't a barrier. Which isn't ideal, but since it is the last test in the batch it seems it doesn't impact things too much.
Note that patch is missing the drd/test/Makefile.am addition of the two extra exp file: diff --git a/drd/tests/Makefile.am b/drd/tests/Makefile.am index 2885391..cfd74d0 100644 --- a/drd/tests/Makefile.am +++ b/drd/tests/Makefile.am @@ -81,8 +81,10 @@ EXTRA_DIST = \ atomic_var.stderr.exp \ atomic_var.vgtest \ bar_bad.stderr.exp \ + bar_bad.stderr.exp-nohang \ bar_bad.vgtest \ bar_bad_xml.stderr.exp \ + bar_bad_xml.stderr.exp-nohang \ bar_bad_xml.vgtest \ bar_trivial.stderr.exp \ bar_trivial.stdout.exp \ I checked in a workaround for the hang based on the attachement as valgrind svn r15962. This does make sure that the tests don't hang indefenitely. But they do introduce (more) non-determinism that occassionally causes these tests to fail or even trigger an internal drd assert (drd_barrier.c:352 (vgDrd_barrier_pre_wait): Assertion 'p' failed.) Should we close this now? (In reply to Julian Seward from comment #7) > Should we close this now? No, I don't think it should. There is now a workaround in place that makes sure the test doesn't hang. But now that test (non-deterministically) fails or even crashes valgrind itself. Created attachment 102144 [details]
Make bar_bad test more deterministic.
At some MIPS boards, thread slp2 may end before ext1 ends and that makes the test fail again. One of my colleagues suggested a patch in which the main thread waits for slp2 termination. This makes the test more deterministic.
Created attachment 102247 [details]
make bar_bad test more deterministic, attempt two
The previous patch will break DRD. Instead of pthread_join(), we can experiment with pthread_cancel().
(In reply to Petar Jovanovic from comment #10) > Created attachment 102247 [details] > make bar_bad test more deterministic, attempt two > > The previous patch will break DRD. Instead of pthread_join(), we can > experiment with pthread_cancel(). This patch, slightly modified, was committed in r16154. Was fixed in r16154, fix announced in NEWS revision 16165 Sadly I am still seeing the sporadic failures in bar_bad/bar_bad_xml. Reported here: https://bugs.kde.org/show_bug.cgi?id=430321 |