Bug 327548

Summary:	false positive while destroying mutex
Product:	[Developer tools] valgrind	Reporter:	Samuel Thibault <samuel.thibault>
Component:	helgrind	Assignee:	Julian Seward <jseward>
Status:	RESOLVED FIXED
Severity:	normal	CC:	philippe.waroquiers, pjfloyd, saurabh
Priority:	NOR
Version First Reported In:	3.9.0
Target Milestone:	---
Platform:	unspecified
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	testcase false positive log

Description Samuel Thibault 2013-11-13 09:33:24 UTC

Hello,

The newly-added my_memcmp in pthread_mutex_destroy brings false positives. See the attached testcase. Helgrind says the read is not synchronized with a previous mutex_unlock write, but it is synchronized, through the condition variable.

Unfortunately this doesn't happen very often, I have to run the testcase for a long time before getting the false positive.

Samuel

Comment 1 Samuel Thibault 2013-11-13 09:35:34 UTC

Created attachment 83543 [details]
testcase

Comment 2 Samuel Thibault 2013-11-13 09:47:55 UTC

Created attachment 83544 [details]
false positive log

Comment 3 Philippe Waroquiers 2013-11-13 22:09:20 UTC

The same bug was discussed on valusers see
http://sourceforge.net/mailarchive/message.php?msg_id=31611446

A patch was proposed to solve it (just disabling/enabling checking around the my_memcmp
call).
However, it is not clear to me if that is the correct fix to do.
I see in the default suppression files that there are already a bunch of suppressions
entries for helgrind, matching only one function.
So maybe we better add an additional supp entry there.

Comment 4 Paul Floyd 2022-12-27 17:53:55 UTC

I don't get an error on FreeBSD. I'll check if the problem still exists on Linux.

Comment 5 Paul Floyd 2022-12-30 19:49:29 UTC

I ran this for a couple of hours without any errors. It could be related to the hardware.

Comment 6 Paul Floyd 2022-12-31 09:33:09 UTC

Reopen this if anyone else can reproduces the problem.

Comment 7 Samuel Thibault 2023-01-03 01:08:25 UTC

To provide various datapoints, I have tried it 

valgrind --tool=helgrind ./test

on:

- Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz
- AMD EPYC 7352 24-Core Processor
- AMD EPYC 7502 32-Core Processor

with glibc 2.35

I got the hit

==420467== Helgrind, a thread error detector
==420467== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==420467== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==420467== Command: ./test
==420467== 
==420467== ---Thread-Announcement------------------------------------------
==420467== 
==420467== Thread #1 is the program's root thread
==420467== 
==420467== ---Thread-Announcement------------------------------------------
==420467== 
==420467== Thread #2 was created
==420467==    at 0x49792FF: clone (clone.S:76)
==420467==    by 0x497A146: __clone_internal (clone-internal.c:83)
==420467==    by 0x48F6484: create_thread (pthread_create.c:295)
==420467==    by 0x48F6F78: pthread_create@@GLIBC_2.34 (pthread_create.c:828)
==420467==    by 0x484E5D7: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x1092DA: main (in /net/inria/home/sthibault/test)
==420467== 
==420467== ----------------------------------------------------------------
==420467== 
==420467== Possible data race during read of size 1 at 0x10C128 by thread #1
==420467== Locks held: none
==420467==    at 0x484B225: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x484B4CA: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x10937C: main (in /net/inria/home/sthibault/test)
==420467== 
==420467== This conflicts with a previous write of size 4 by thread #2
==420467== Locks held: none
==420467==    at 0x48FB4D8: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==420467==    by 0x484BCE8: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==420467==    by 0x484E7D6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x48F6849: start_thread (pthread_create.c:442)
==420467==    by 0x497930F: clone (clone.S:100)
==420467==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==420467== 

immediately on almost all of them, only AMD EPYC 7352 24-Core Processor took 1m to reproduce.

On an AMD Opteron(tm) Processor 6174 with glibc 2.28 I couldn't reproduce it within an hour.

On a Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, I tried with various glibcs: 2.19, 2.24, 2.28, 2.31, 2.36, with all versions it happens within half an hour, sometimes within a minute.

On an arm64 with glibc 2.36 it appeared immediately

On a mips64el Cavium Octeon III V0.2  FPU V0.0 with glibc 2.36 it appeared immediately

On a ppc64el POWER8 with glibc 2.36 it appeared within a minute.

On an s390x it appeared in 8 minutes.

Comment 8 Samuel Thibault 2023-01-03 01:09:13 UTC

(with the various versions of valgrind between 3.9 and 3.19)

Comment 9 Paul Floyd 2023-01-03 11:09:48 UTC

I can reproduce this on a RHEL 7.6 machine.

Comment 10 Paul Floyd 2023-01-03 14:03:56 UTC

Can you try this patch please?

diff --git a/helgrind/hg_intercepts.c b/helgrind/hg_intercepts.c
index 5a83996..8c98e4e 100644
--- a/helgrind/hg_intercepts.c
+++ b/helgrind/hg_intercepts.c
@@ -879,7 +879,9 @@ static int mutex_destroy_WRK(pthread_mutex_t *mutex)
 
    if (mutex != NULL) {
       static const pthread_mutex_t mutex_init = PTHREAD_MUTEX_INITIALIZER;
+      VALGRIND_HG_DISABLE_CHECKING(mutex, sizeof(*mutex));
       mutex_is_init = my_memcmp(mutex, &mutex_init, sizeof(*mutex)) == 0;
+      VALGRIND_HG_ENABLE_CHECKING(mutex, sizeof(*mutex));
    } else {
       mutex_is_init = 0;
    }
@@ -1785,7 +1787,9 @@ static int pthread_cond_destroy_WRK(pthread_cond_t* cond)
 
    if (cond != NULL) {
       const pthread_cond_t cond_init = PTHREAD_COND_INITIALIZER;
+      VALGRIND_HG_DISABLE_CHECKING(cond, sizeof(*cond));
       cond_is_init = my_memcmp(cond, &cond_init, sizeof(*cond)) == 0;
+      VALGRIND_HG_ENABLE_CHECKING(cond, sizeof(*cond));
    } else {
      cond_is_init = 0;
    }

Comment 11 Paul Floyd 2023-01-03 14:31:52 UTC

This is the same as what Philippe suggested in the e-mail thread, but with similar protection for cond vars.

Comment 12 Samuel Thibault 2023-01-03 16:26:52 UTC

Yes that avoids that exact issue, but then a flurry of new ones pop up, I was not getting them before:

==2511392== Possible data race during read of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F31A2: pthread_mutex_lock@@GLIBC_2.2.5 (pthread_mutex_lock.c:94)
==2511392==    by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==    by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x48EF26D: __pthread_cond_wait_common (pthread_cond_wait.c:419)
==2511392==    by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== ----------------------------------------------------------------
==2511392== 
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F31B9: pthread_mutex_lock@@GLIBC_2.2.5 (pthread_mutex_lock.c:182)
==2511392==    by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==    by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x48EF26D: __pthread_cond_wait_common (pthread_cond_wait.c:419)
==2511392==    by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== ----------------------------------------------------------------
==2511392== 
==2511392==  Lock at 0x10C120 was first observed
==2511392==    at 0x484CFE9: pthread_mutex_init (hg_intercepts.c:818)
==2511392==    by 0x1092EE: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c120 is 0 bytes inside data symbol "mutex"
==2511392== 
==2511392== Possible data race during write of size 8 at 0x10C0E8 by thread #2
==2511392== Locks held: 1, at address 0x10C120
==2511392==    at 0x48EEE29: __atomic_wide_counter_add_relaxed (atomic_wide_counter.h:57)
==2511392==    by 0x48EEE29: __condvar_add_g1_start_relaxed (pthread_cond_common.c:52)
==2511392==    by 0x48EEE29: __condvar_quiesce_and_switch_g1 (pthread_cond_common.c:294)
==2511392==    by 0x48EEE29: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.c:77)
==2511392==    by 0x4848FB8: pthread_cond_signal_WRK (hg_intercepts.c:1570)
==2511392==    by 0x10928B: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous read of size 8 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48EF39E: __atomic_wide_counter_load_relaxed (atomic_wide_counter.h:30)
==2511392==    by 0x48EF39E: __condvar_load_g1_start_relaxed (pthread_cond_common.c:46)
==2511392==    by 0x48EF39E: __pthread_cond_wait_common (pthread_cond_wait.c:486)
==2511392==    by 0x48EF39E: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c0e8 is 8 bytes inside data symbol "cond"
==2511392== 
==2511392== ----------------------------------------------------------------
==2511392== 
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x48EF26D: __pthread_cond_wait_common (pthread_cond_wait.c:419)
==2511392==    by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== ----------------------------------------------------------------
==2511392== 
==2511392==  Lock at 0x10C120 was first observed
==2511392==    at 0x484CFE9: pthread_mutex_init (hg_intercepts.c:818)
==2511392==    by 0x1092EE: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c120 is 0 bytes inside data symbol "mutex"
==2511392== 
==2511392== Possible data race during read of size 8 at 0x10C0E8 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48EF2C1: __atomic_wide_counter_load_relaxed (atomic_wide_counter.h:30)
==2511392==    by 0x48EF2C1: __condvar_load_g1_start_relaxed (pthread_cond_common.c:46)
==2511392==    by 0x48EF2C1: __pthread_cond_wait_common (pthread_cond_wait.c:539)
==2511392==    by 0x48EF2C1: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392== 
==2511392== This conflicts with a previous write of size 8 by thread #2
==2511392== Locks held: 1, at address 0x10C120
==2511392==    at 0x48EEE29: __atomic_wide_counter_add_relaxed (atomic_wide_counter.h:57)
==2511392==    by 0x48EEE29: __condvar_add_g1_start_relaxed (pthread_cond_common.c:52)
==2511392==    by 0x48EEE29: __condvar_quiesce_and_switch_g1 (pthread_cond_common.c:294)
==2511392==    by 0x48EEE29: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.c:77)
==2511392==    by 0x4848FB8: pthread_cond_signal_WRK (hg_intercepts.c:1570)
==2511392==    by 0x10928B: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==  Address 0x10c0e8 is 8 bytes inside data symbol "cond"
==2511392== 
==2511392== ----------------------------------------------------------------
==2511392== 
==2511392== Possible data race during read of size 4 at 0x10C128 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F2472: __pthread_mutex_cond_lock (pthread_mutex_lock.c:94)
==2511392==    by 0x48EF2EF: __pthread_cond_wait_common (pthread_cond_wait.c:607)
==2511392==    by 0x48EF2EF: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392==
==2511392== ----------------------------------------------------------------
==2511392==
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F2485: __pthread_mutex_cond_lock (pthread_mutex_lock.c:182)
==2511392==    by 0x48EF2EF: __pthread_cond_wait_common (pthread_cond_wait.c:607)
==2511392==    by 0x48EF2EF: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==
==2511392== This conflicts with a previous write of size 4 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392==
==2511392== ----------------------------------------------------------------
==2511392==
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x109354: main (in /net/inria/home/sthibault/test)
==2511392==
==2511392== This conflicts with a previous write of size 4 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392==
==2511392== ----------------------------------------------------------------
==2511392==
==2511392== Possible data race during read of size 4 at 0x10C12C by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A52: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:65)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x109354: main (in /net/inria/home/sthibault/test)
==2511392==
==2511392== This conflicts with a previous write of size 4 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A52: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:65)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==  Address 0x10c12c is 12 bytes inside data symbol "mutex"
==2511392==
==2511392== ----------------------------------------------------------------
==2511392==
==2511392== Possible data race during write of size 4 at 0x10C12C by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A52: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:65)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x109354: main (in /net/inria/home/sthibault/test)
==2511392==
==2511392== This conflicts with a previous write of size 4 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A52: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:65)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==  Address 0x10c12c is 12 bytes inside data symbol "mutex"
==2511392==

Comment 13 Paul Floyd 2023-01-03 16:30:13 UTC

Looking at the glibc source

pthread_mutex_destroy is fairly simple. It does an atomic read of the kind and a plain read of nusers and returns EBUSY if nusers is not 0.

It then sets kind to an invalid value, -1.

pthread_mutex_unlock is a lot more complicated. It also does atomic reads of the kind, decrements nusers and checks the lock then does a cas on the lock.

If I run with --default-suppressions=no then there huge numbers of hazards, but none related to pthread_mutex_destroy. I think that is because Helgrind sees the lock order is such that pthread_mutex_destroy is called in a serial section.

Comment 14 Samuel Thibault 2023-01-03 16:35:43 UTC

FYI, it's glibc 2.36:

- pthread_mutex_lock.c:94 is when it checks assert (mutex->__data.__owner == 0); after obtaining the lock
- pthread_mutex_lock.c:182 is when it sets mutex->__data.__owner after obtaining the lock
- pthread_mutex_unlock.c:62 is when it clears mutex->__data.__owner before releasing the lock
- pthread_mutex_unlock.c:65 is when it --mutex->__data.__nusers; before releasing the lock

The other cond-related lines are more complex to describe, but they'll probably get fixed the same way as mutex would get fixed.

I guess the disabling of checking somehow perturbates helgrind's history record (I guess that's part of the "This puts it in the same state as new memory allocated by this thread -- that is, basically owned exclusively by this thread." comment for the VALGRIND_HG_ENABLE_CHECKING macro)

Comment 15 Paul Floyd 2023-01-03 16:39:31 UTC

(In reply to Samuel Thibault from comment #12)
> Yes that avoids that exact issue, but then a flurry of new ones pop up, I
> was not getting them before:
> 
> ==2511392== Possible data race during read of size 4 at 0x10C128 by thread #2
> ==2511392== Locks held: none
> ==2511392==    at 0x48F31A2: pthread_mutex_lock@@GLIBC_2.2.5
> (pthread_mutex_lock.c:94)

[snip]

I would expect these to be covered by the default suppressions:
{
   helgrind-glibc2X-004
   Helgrind:Race
   obj:*/lib*/libc-2.*so*
}

and

{
   helgrind-glibc2X-005
   Helgrind:Race
   obj:*/lib*/libpthread-2.*so*
}

which are very broad and cover most of the pthread internals.

I'll try to see if I can test on 2.36

Comment 16 Samuel Thibault 2023-01-03 16:48:20 UTC

> I would expect these to be covered by the default suppressions:
> {
>    helgrind-glibc2X-004
>    Helgrind:Race
>    obj:*/lib*/libc-2.*so*
> }

Ah, but glibc renamed libc-2.*so* into just libc.so.6, so that suppression won't take effect. Duplicating the entry with libc.so.* does indeed seem to avoid the warnings. I'll try to test on more machines and archs.

Comment 17 Samuel Thibault 2023-01-03 16:50:30 UTC

Ah, configure.ac already detects which case that is. So we need a valgrind built against the proper glibc, that's why I hadn't the proper name.

Comment 18 Samuel Thibault 2023-01-03 19:53:57 UTC

I have run the reproducer on all the various systems previously mentioned, with no issue so far, so it seems we're good with that fix.

Comment 19 Samuel Thibault 2023-01-03 19:54:17 UTC

(ran it for an hour)

Comment 20 Paul Floyd 2023-01-03 20:30:28 UTC

Now fixed

commit 74e20007467b930fb14264519f74a45179b5f740 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Tue Jan 3 21:28:42 2023 +0100

    Bug 327548 - false positive while destroying mutex