Bug 451327 - Potential false positive when posting to stack allocated semaphore
Summary: Potential false positive when posting to stack allocated semaphore
Status: REOPENED
Alias: None
Product: valgrind
Classification: Developer tools
Component: helgrind (other bugs)
Version First Reported In: 3.18.1
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-09 18:08 UTC by Godmar Back
Modified: 2022-04-28 15:37 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
semtest.c that shows the potential false positive (957 bytes, text/x-csrc)
2022-03-09 18:08 UTC, Godmar Back
Details
test program that shows spurious warning even when threads are joined (1.60 KB, text/x-csrc)
2022-04-28 14:32 UTC, Godmar Back
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Godmar Back 2022-03-09 18:08:20 UTC
Created attachment 147408 [details]
semtest.c that shows the potential false positive

SUMMARY

Helgrind 3.18.1 appears to flag what I believe to be a false positive when a stack-allocated semaphore is signaled to and the stack space on which it is allocated is then reused.


STEPS TO REPRODUCE
1. Compile the attached program with gcc -g -pthread semtest.c -o semtest (gcc 9.4.0 on Ubuntu 20.04 with GNU libc 2.2.5 I believe)
2. Run it with valgrind --tool=helgrind ./semtest


OBSERVED RESULT

==2965601== Helgrind, a thread error detector
==2965601== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2965601== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2965601== Command: ./semtest
==2965601== 
==2965601== ---Thread-Announcement------------------------------------------
==2965601== 
==2965601== Thread #1 is the program's root thread
==2965601== 
==2965601== ---Thread-Announcement------------------------------------------
==2965601== 
==2965601== Thread #2 was created
==2965601==    at 0x49BC152: clone (clone.S:71)
==2965601==    by 0x48812EB: create_thread (createthread.c:101)
==2965601==    by 0x4882E0F: pthread_create@@GLIBC_2.2.5 (pthread_create.c:817)
==2965601==    by 0x4846FBA: pthread_create_WRK (hg_intercepts.c:445)
==2965601==    by 0x48480BD: pthread_create@* (hg_intercepts.c:478)
==2965601==    by 0x1092C9: use_stack_allocated_semaphore (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601==    by 0x10931C: main (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601== 
==2965601== ----------------------------------------------------------------
==2965601== 
==2965601== Possible data race during write of size 8 at 0x1FFEFFFDE0 by thread #1
==2965601== Locks held: none
==2965601==    at 0x484BCD9: memset (vg_replace_strmem.c:1358)
==2965601==    by 0x10923F: touch_the_stack (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601==    by 0x109317: main (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601== 
==2965601== This conflicts with a previous read of size 8 by thread #2
==2965601== Locks held: none
==2965601==    at 0x488CA37: sem_post@@GLIBC_2.2.5 (sem_post.c:44)
==2965601==    by 0x4847ED1: sem_post_WRK (hg_intercepts.c:2993)
==2965601==    by 0x4848AB6: sem_post@* (hg_intercepts.c:3013)
==2965601==    by 0x109272: post_thread (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601==    by 0x48471B2: mythread_wrapper (hg_intercepts.c:406)
==2965601==    by 0x4882608: start_thread (pthread_create.c:477)
==2965601==    by 0x49BC162: clone (clone.S:95)
==2965601==  Address 0x1ffefffde0 is on thread #1's stack
==2965601==  in frame #1, created by touch_the_stack (???:)
==2965601== 
==2965601== ----------------------------------------------------------------
==2965601== 
==2965601== Possible data race during write of size 4 at 0x1FFEFFFDE8 by thread #1
==2965601== Locks held: none
==2965601==    at 0x484BCDC: memset (vg_replace_strmem.c:1358)
==2965601==    by 0x10923F: touch_the_stack (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601==    by 0x109317: main (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601== 
==2965601== This conflicts with a previous read of size 4 by thread #2
==2965601== Locks held: none
==2965601==    at 0x488CA34: sem_post@@GLIBC_2.2.5 (sem_post.c:36)
==2965601==    by 0x4847ED1: sem_post_WRK (hg_intercepts.c:2993)
==2965601==    by 0x4848AB6: sem_post@* (hg_intercepts.c:3013)
==2965601==    by 0x109272: post_thread (in /home/gback/cs3214/dutchblitz.sem/tests/semtest)
==2965601==    by 0x48471B2: mythread_wrapper (hg_intercepts.c:406)
==2965601==    by 0x4882608: start_thread (pthread_create.c:477)
==2965601==    by 0x49BC162: clone (clone.S:95)
==2965601==  Address 0x1ffefffde8 is on thread #1's stack
==2965601==  in frame #1, created by touch_the_stack (???:)
==2965601== 
==2965601== 
==2965601== Use --history-level=approx or =none to gain increased speed, at
==2965601== the cost of reduced accuracy of conflicting-access information
==2965601== For lists of detected and suppressed errors, rerun with: -s
==2965601== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)


EXPECTED RESULT

There should be no errors shown.

SOFTWARE/OS VERSIONS
- Valgrind 3.18.1 (built from source)
- gcc 9.4.0
- Ubuntu 20.04.4


ADDITIONAL INFORMATION

The error is not shown if the semaphore is dynamically allocated.
Comment 1 Paul Floyd 2022-04-28 09:50:07 UTC
I'll see if I can reproduce this with clang.

Does joining the thread change anything? I'm not a threading expert, but I would expect the thread to be created detached or joined.
Comment 2 Paul Floyd 2022-04-28 10:11:42 UTC
Answering my own question. On Linux, if I add pthread_join:

static void
use_stack_allocated_semaphore(void)
{
    sem_t sem;
    sem_init(&sem, 0, 0);
    pthread_t t;
    pthread_create(&t, NULL, post_thread, &sem);
    sem_wait(&sem);
    sem_destroy(&sem);
    pthread_join(t, NULL);
}

then I get no error

==17867== Helgrind, a thread error detector
==17867== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==17867== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==17867== Command: ./451327
==17867==
==17867==
==17867== Use --history-level=approx or =none to gain increased speed, at
==17867== the cost of reduced accuracy of conflicting-access information
==17867== For lists of detected and suppressed errors, rerun with: -s
==17867== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Comment 3 Paul Floyd 2022-04-28 13:24:33 UTC
And the manual says

https://valgrind.org/docs/manual/hg-manual.html

 7.5. Hints and Tips for Effective Use of Helgrind

6. Round up all finished threads using pthread_join. Avoid detaching threads: don't create threads in the detached state, and don't call pthread_detach on existing threads.
Comment 4 Godmar Back 2022-04-28 13:49:08 UTC
The actual application in which the false positive occurs uses pthread_join properly. I created a minimum reproducible test case here for you to see the issue. Adding pthread_join introduces additional edges in the happens-before graph, which influences Helgrind's race detection.
I'll prepare a larger example that uses pthread_join.
Comment 5 Godmar Back 2022-04-28 14:32:25 UTC
Created attachment 148434 [details]
test program that shows spurious warning even when threads are joined

I've attached a larger example from the actual code to show the potential issue.
Comment 6 Paul Floyd 2022-04-28 14:37:34 UTC
OK, I've reopened this.

The error is intermittent for me so I assume that there is some race.
Comment 7 Godmar Back 2022-04-28 15:36:34 UTC
It's more reliable if you `NTHREADS` and `n`
Comment 8 Godmar Back 2022-04-28 15:37:48 UTC
(In reply to Godmar Back from comment #7)
> It's more reliable if you `NTHREADS` and `n`

if you *increase*