Bug 430321 - drd/tests/bar_bad and drd/tests/bar_bad_xml are non-deterministic
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: drd
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Bart Van Assche
Reported: 2020-12-12 21:42 UTC by Alexander Kanavin
Modified: 2021-05-03 15:43 UTC

Description Alexander Kanavin 2020-12-12 21:42:00 UTC
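When I run these tests under qemu, they pass or fail non-deterministically. Note the difference in the "destroy a barrier that has waiting threads" section: Run 1 reports "Destruction of a barrier with active waiters" there, while Run 2 reports nothing for that section.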

Run 1:

root@qemux86-64:/usr/lib/valgrind/ptest# valgrind --tool=drd --fair-sched=yes ./helgrind/tests/bar_bad
==31394== drd, a thread error detector
==31394== Copyright (C) 2006-2017, and GNU GPL'd, by Bart Van Assche.
==31394== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==31394== Command: ./helgrind/tests/bar_bad
==31394== 

initialise a barrier with zero count
==31394== pthread_barrier_init: 'count' argument is zero: barrier 0x4852030
==31394==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31394==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31394==    by 0x401273: main (bar_bad.c:43)
==31394== 

initialise a barrier twice
==31394== Barrier reinitialization: barrier 0x4852070
==31394==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31394==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31394==    by 0x4012CB: main (bar_bad.c:49)
==31394== barrier 0x4852070 was first observed at:
==31394==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31394==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31394==    by 0x4012B5: main (bar_bad.c:48)
==31394== 

initialise a barrier which has threads waiting on it
==31394== Barrier reinitialization: barrier 0x48520b0
==31394==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31394==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31394==    by 0x401367: main (bar_bad.c:64)
==31394== barrier 0x48520b0 was first observed at:
==31394==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31394==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31394==    by 0x40130D: main (bar_bad.c:55)
==31394== 

destroy a barrier that has waiting threads
==31394== Destruction of a barrier with active waiters: barrier 0x4852350
==31394==    at 0x4826C71: pthread_barrier_destroy_intercept (drd_pthread_intercepts.c:1323)
==31394==    by 0x4826C71: pthread_barrier_destroy (drd_pthread_intercepts.c:1328)
==31394==    by 0x4013F9: main (bar_bad.c:82)
==31394== barrier 0x4852350 was first observed at:
==31394==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31394==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31394==    by 0x4013A9: main (bar_bad.c:70)
==31394== 

destroy a barrier that was never initialised
==31394== Not a barrier
==31394==    at 0x4826C71: pthread_barrier_destroy_intercept (drd_pthread_intercepts.c:1323)
==31394==    by 0x4826C71: pthread_barrier_destroy (drd_pthread_intercepts.c:1328)
==31394==    by 0x40148E: main (bar_bad.c:98)
==31394== 
==31394== 
==31394== For lists of detected and suppressed errors, rerun with: -s
==31394== ERROR SUMMARY: 6 errors from 5 contexts (suppressed: 138 from 15)


Run 2:

root@qemux86-64:/usr/lib/valgrind/ptest# valgrind --tool=drd --fair-sched=yes ./helgrind/tests/bar_bad
==31400== drd, a thread error detector
==31400== Copyright (C) 2006-2017, and GNU GPL'd, by Bart Van Assche.
==31400== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==31400== Command: ./helgrind/tests/bar_bad
==31400== 

initialise a barrier with zero count
==31400== pthread_barrier_init: 'count' argument is zero: barrier 0x4852030
==31400==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31400==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31400==    by 0x401273: main (bar_bad.c:43)
==31400== 

initialise a barrier twice
==31400== Barrier reinitialization: barrier 0x4852070
==31400==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31400==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31400==    by 0x4012CB: main (bar_bad.c:49)
==31400== barrier 0x4852070 was first observed at:
==31400==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31400==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31400==    by 0x4012B5: main (bar_bad.c:48)
==31400== 

initialise a barrier which has threads waiting on it
==31400== Barrier reinitialization: barrier 0x48520b0
==31400==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31400==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31400==    by 0x401367: main (bar_bad.c:64)
==31400== barrier 0x48520b0 was first observed at:
==31400==    at 0x48262D2: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1302)
==31400==    by 0x48262D2: pthread_barrier_init (drd_pthread_intercepts.c:1310)
==31400==    by 0x40130D: main (bar_bad.c:55)
==31400== 

destroy a barrier that has waiting threads

destroy a barrier that was never initialised
==31400== Not a barrier
==31400==    at 0x4826C71: pthread_barrier_destroy_intercept (drd_pthread_intercepts.c:1323)
==31400==    by 0x4826C71: pthread_barrier_destroy (drd_pthread_intercepts.c:1328)
==31400==    by 0x40148E: main (bar_bad.c:98)
==31400== 
==31400== 
==31400== For lists of detected and suppressed errors, rerun with: -s
==31400== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 138 from 15)


This is observed with valgrind 3.16.1 under qemu x86_64.
I believe this was also reported previously in https://bugs.kde.org/show_bug.cgi?id=358213, but it seems it was never properly fixed.
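
To compare two such runs section by section, a small script can count the error reports printed under each test heading. This is only a rough sketch, not part of Valgrind: the function name `errors_per_section` is made up for illustration, and it assumes the log layout seen in the runs above (Valgrind lines carry a `==PID==` prefix, test headings do not, and each error report starts after a heading or an empty Valgrind line and is followed by an `at 0x...` frame).

```python
import re

# Lines written by Valgrind start with "==PID== "; lines written by the test
# program itself (the section headings) do not.
VG_PREFIX = re.compile(r"^==\d+== ?")
# First stack frame of an error report, e.g. "   at 0x4826C71: ..."
FRAME_START = re.compile(r"^\s+at 0x")


def errors_per_section(log):
    """Return {heading: number of error reports printed under that heading}.

    Hypothetical helper: relies on the log layout described above and
    does not try to handle every Valgrind message type.
    """
    counts = {}
    section = None
    at_boundary = False  # True right after a heading or an empty Valgrind line
    lines = log.splitlines()
    for i, raw in enumerate(lines):
        m = VG_PREFIX.match(raw)
        if m is None:
            heading = raw.strip()
            if heading:
                section = heading
                counts.setdefault(section, 0)
                at_boundary = True
            continue
        text = raw[m.end():]
        if not text.strip():
            at_boundary = True
            continue
        nxt = VG_PREFIX.sub("", lines[i + 1]) if i + 1 < len(lines) else ""
        starts_report = (
            at_boundary
            and section is not None
            and not text[0].isspace()
            and FRAME_START.match(nxt) is not None
        )
        if starts_report:
            counts[section] += 1
        at_boundary = False
    return counts
```

Running this on the stderr of each run and comparing the two dictionaries should show a count of 1 versus 0 for "destroy a barrier that has waiting threads", provided the logs follow the layout assumed here.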
Comment 1 Yi Fan Yu 2021-05-03 15:43:10 UTC
Here is the diff I collected for the test failure:

```
--- bar_bad.stderr.exp	2018-03-09 12:34:56.000000000 +0000
+++ bar_bad.stderr.out	2021-04-23 21:34:56.654000000 +0000
@@ -25,14 +25,11 @@
 
 
 destroy a barrier that has waiting threads
-Destruction of a barrier with active waiters: barrier 0x........
+
+destroy a barrier that was never initialised
+Not a barrier
    at 0x........: pthread_barrier_destroy (drd_pthread_intercepts.c:?)
    by 0x........: main (bar_bad.c:?)
-barrier 0x........ was first observed at:
-   at 0x........: pthread_barrier_init (drd_pthread_intercepts.c:?)
-   by 0x........: main (bar_bad.c:?)
 
 
-destroy a barrier that was never initialised
-
-ERROR SUMMARY: 5 errors from 4 contexts (suppressed: 0 from 0)
+ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
```
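
The diff above compares filtered output: the `==PID==` prefix is gone, addresses appear as `0x........` and source line numbers as `?`. To bring a raw log line into roughly that shape before diffing by hand, something like the following could be used. It is an approximation written for illustration (the name `normalise` is hypothetical); Valgrind's regression suite uses its own filter scripts, which do more than this, for example dropping the `_intercept` frames, and which this sketch does not reproduce.

```python
import re


def normalise(line):
    """Approximate the filtered form of one raw Valgrind log line (sketch only)."""
    line = re.sub(r"^==\d+== ?", "", line)                 # drop the PID prefix
    line = re.sub(r"0x[0-9A-Fa-f]+", "0x........", line)   # mask addresses
    line = re.sub(r"\(([^():]+):\d+\)", r"(\1:?)", line)   # mask source line numbers
    return line


print(normalise("==31394==    by 0x40148E: main (bar_bad.c:98)"))
```

With the regular expressions as written, the example line becomes `   by 0x........: main (bar_bad.c:?)`, which has the same form as the frame lines in the diff.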