Bug 444568 - drd/tests/pth_barrier_thr_cr Number of concurrent pthread_barrier_wait() calls exceeds the barrier count
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: drd (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Bart Van Assche
Reported: 2021-10-28 22:14 UTC by Mark Wielaard
Modified: 2022-12-24 23:38 UTC
Description Mark Wielaard 2021-10-28 22:14:23 UTC
On both ppc64le and arm64 I am seeing drd/tests/pth_barrier_thr_cr fail:

--- pth_barrier_thr_cr.stderr.exp       2021-10-28 17:52:34.497619030 -0400
+++ pth_barrier_thr_cr.stderr.out       2021-10-28 18:04:55.051452953 -0400
@@ -1,4 +1,15 @@
 
+Thread 15:
+Number of concurrent pthread_barrier_wait() calls exceeds the barrier count: barrier 0x........
+   at 0x........: pthread_barrier_wait (drd_pthread_intercepts.c:?)
+   by 0x........: thread (pth_barrier_thr_cr.c:?)
+   by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?)
+   by 0x........: start_thread
+   by 0x........: clone (in /...libc...)
+barrier 0x........ was first observed at:
+   at 0x........: pthread_barrier_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_barrier_thr_cr.c:?)
+
 Done.
 
-ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+ERROR SUMMARY: 39 errors from 1 contexts (suppressed: 0 from 0)

I have seen this fail in the past on x86_64 too, but cannot replicate it there now.
Comment 1 Paul Floyd 2021-11-19 10:02:58 UTC
Can you reproduce this with --keep-unfiltered, and does the unfiltered version help at all?
Comment 2 Paul Floyd 2022-12-23 21:29:05 UTC
I also see intermittent failures on amd64 Fedora 36.

==158787== Thread 55:
==158787== Number of concurrent pthread_barrier_wait() calls exceeds the barrier count: barrier 0x4aa8030
==158787==    at 0x4865883: pthread_barrier_wait_intercept (drd_pthread_intercepts.c:1414)
==158787==    by 0x4865883: pthread_barrier_wait@* (drd_pthread_intercepts.c:1421)
==158787==    by 0x4011F4: thread (pth_barrier_thr_cr.c:21)
==158787==    by 0x48518B4: vgDrd_thread_wrapper (drd_pthread_intercepts.c:491)
==158787==    by 0x492FDEC: start_thread (in /usr/lib64/libc.so.6)
==158787==    by 0x49B4523: clone (in /usr/lib64/libc.so.6)

I don't really see why this has changed. There were changes to libc/libpthread back in May/July 2021, when there was a big move from libpthread to libc. We're still intercepting OK.

This wasn't long after landing the FreeBSD port. I can't see anything that could be wrong.

With --trace-barriers I see

.==170352== [52] barrier_pre_wait  pthread barrier 0x4aa8030 iteration 25
==170352== [50] barrier_post_wait pthread barrier 0x4aa8030 iteration 24
.==170352== [53] barrier_pre_wait  pthread barrier 0x4aa8030 iteration 25
==170352== [53] barrier_post_wait pthread barrier 0x4aa8030 iteration 25 (serializing)
.==170352== [54] barrier_pre_wait  pthread barrier 0x4aa8030 iteration 26
.==170352== [55] barrier_pre_wait  pthread barrier 0x4aa8030 iteration 26
==170352== [55] barrier_post_wait pthread barrier 0x4aa8030 iteration 25 (serializing)
==170352== Thread 55:
==170352== Number of concurrent pthread_barrier_wait() calls exceeds the barrier count: barrier 0x4aa8030
==170352==    at 0x4865883: pthread_barrier_wait_intercept (drd_pthread_intercepts.c:1414)
==170352==    by 0x4865883: pthread_barrier_wait@* (drd_pthread_intercepts.c:1421)
==170352==    by 0x4011F4: thread (pth_barrier_thr_cr.c:21)
==170352==    by 0x48518B4: vgDrd_thread_wrapper (drd_pthread_intercepts.c:491)
==170352==    by 0x492FDEC: start_thread (pthread_create.c:442)
==170352==    by 0x49B4523: clone (clone.S:100)
==170352== barrier 0x4aa8030 was first observed at:
==170352==    at 0x4864615: pthread_barrier_init_intercept (drd_pthread_intercepts.c:1376)
==170352==    by 0x4864615: pthread_barrier_init@* (drd_pthread_intercepts.c:1384)
==170352==    by 0x401266: main (pth_barrier_thr_cr.c:34)
==170352== 
==170352== [54] barrier_post_wait pthread barrier 0x4aa8030 iteration 26
==170352== [52] barrier_post_wait pthread barrier 0x4aa8030 iteration 26
.==170352== [56] barrier_pre_wait  pthread barrier 0x4aa8030 iteration 27
.==170352== [57] barrier_pre_wait  pthread barrier 0x4aa8030 iteration 27
==170352== [56] barrier_post_wait pthread barrier 0x4aa8030 iteration 27
==170352== [57] barrier_post_wait pthread barrier 0x4aa8030 iteration 27 (serializing)

The testcase creates 100 threads and a barrier with a count of 2. Every time there are 2 waiting threads, the barrier lets 2 through, one at a time. These are the 50 iterations in the traces above.

There are two iteration 25s then an error and then an iteration 27. No iteration 26.
Comment 3 Bart Van Assche 2022-12-24 23:38:08 UTC
Commit 72b556ab15f1 ("drd: Improve barrier support") should fix this bug.