Created attachment 177557
Test case

The attached test case creates a POSIX timer that should deliver SIGALRM every 100 microseconds, creates a background thread, blocks SIGALRM in the main thread, and waits for the SIGALRM handler to fire 100 times. This runs fine on bare metal, but on valgrind it hangs forever:

```
$ gcc -g -Wall vgalrm.c -o vgalrm
$ ./vgalrm
$ valgrind ./vgalrm
==2019420== Memcheck, a memory error detector
==2019420== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==2019420== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==2019420== Command: ./vgalrm
==2019420==
```
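The attachment itself isn't reproduced in this thread. To give a rough idea of what such a test looks like, here is a minimal sketch written from the description above. It is a hypothetical reconstruction, not the actual vgalrm.c, and the names `run_alarm_test`, `on_sigalrm` and `background` are made up. It assumes Linux/glibc, built with `-pthread` (on glibc older than 2.34, `timer_create` may also need `-lrt`).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

// Hypothetical reconstruction of the described test, not the attached file.
static atomic_int alarm_count;

static void on_sigalrm(int sig) {
    (void)sig;
    // Updating a lock-free atomic counter from a signal handler is safe.
    atomic_fetch_add(&alarm_count, 1);
}

static void *background(void *arg) {
    (void)arg;
    for (;;)
        pause();  // SIGALRM is unblocked here, so the handler runs on this thread
    return NULL;
}

// Returns the number of SIGALRMs counted (>= target), or -1 if setup fails.
int run_alarm_test(int target) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigalrm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    // Created before main blocks SIGALRM, so this thread inherits an unblocked mask.
    pthread_t thread;
    if (pthread_create(&thread, NULL, background, NULL) != 0)
        return -1;

    // Block SIGALRM in the main thread only; the process-directed signal
    // must then be delivered to the background thread.
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    // POSIX timer that raises SIGALRM every 100 microseconds.
    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    timer_t timer;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value.tv_nsec = 100 * 1000;     // first expiry after 100 us
    its.it_interval.tv_nsec = 100 * 1000;  // then every 100 us
    timer_settime(timer, 0, &its, NULL);

    // Busy-wait in the main thread until the handler has fired `target` times.
    while (atomic_load(&alarm_count) < target)
        ;

    timer_delete(timer);
    return atomic_load(&alarm_count);
}
```

Natively, calling `run_alarm_test(100)` from `main` should return after roughly 10 ms. The busy-wait in the main thread is presumably the same kind of spin that, per the later comments here, can keep the other thread from ever getting to run under Valgrind's default scheduler.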
Okay, here's a more reduced test case:

```
#include <pthread.h>
#include <stdatomic.h>

atomic_bool start, stop;

void *work(void *ptr) {
    start = 1;      // tell main that the new thread is running
    while (!stop);  // spin until main says stop
    return NULL;
}

int main(void) {
    pthread_t thread;
    pthread_create(&thread, NULL, work, NULL);
    while (!start);  // spin until the new thread has started
    stop = 1;
    pthread_join(thread, NULL);
    return 0;
}
```

When I run this with valgrind, it hangs. Attaching GDB to valgrind itself, it looks like the new thread is blocked acquiring a lock (`vgPlain_acquire_BigLock`, frame #7 of thread 2):

```
(gdb) bt
#0 0x000000005801e839 in vgMemCheck_helperc_LOADV8 (a=1097769) at /usr/src/debug/valgrind/valgrind-3.24.0/memcheck/mc_main.c:5637
#1 0x0000001002e8d7d9 in ?? ()
#2 0x0000001002da9e80 in ?? ()
#3 0x0000000059a98040 in ?? ()
#4 0x0000000059a98038 in vgPlain_brk_limit ()
#5 0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (LWP 2026357)]
#0 0x00000000580010e2 in do_syscall_WRK ()
(gdb) bt
#0 0x00000000580010e2 in do_syscall_WRK ()
#1 0x000000005804d1b8 in vgPlain_do_syscall (a7=0, a8=0, sysno=0, a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=0, a5=0, a6=0) at ../coregrind/m_syscall.c:1183
#2 vgPlain_read (fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at ../coregrind/m_libcfile.c:338
#3 0x00000000580d3914 in vgModuleLocal_sema_down (as_LL=0 '\000', sema=0x10020063f0) at ../coregrind/m_scheduler/sema.c:107
#4 acquire_sched_lock (p=0x10020063f0) at ../coregrind/m_scheduler/sched-lock-generic.c:69
#5 0x0000000058099d77 in vgModuleLocal_acquire_sched_lock (p=<optimized out>) at ../coregrind/m_scheduler/sched-lock.c:88
#6 vgPlain_acquire_BigLock_LL (who=0x0) at ../coregrind/m_scheduler/scheduler.c:422
#7 vgPlain_acquire_BigLock (tid=2, who=0x5827ec58 "thread_wrapper(starting new thread)") at ../coregrind/m_scheduler/scheduler.c:346
#8 0x00000000580db229 in thread_wrapper (tidW=2) at ../coregrind/m_syswrap/syswrap-linux.c:83
#9 run_a_thread_NORETURN (tidW=2) at ../coregrind/m_syswrap/syswrap-linux.c:155
#10 0x00000000580db6bf in vgModuleLocal_start_thread_NORETURN (arg=<optimized out>) at ../coregrind/m_syswrap/syswrap-linux.c:329
#11 0x0000000058001181 in do_syscall_clone_amd64_linux ()
#12 0xdeadbeefdeadbeef in ?? ()
#13 0xdeadbeefdeadbeef in ?? ()
#14 0xdeadbeefdeadbeef in ?? ()
#15 0xdeadbeefdeadbeef in ?? ()
#16 0x0000000000000000 in ?? ()
```
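If changing the program is an option, a version of the reduced test case that doesn't depend on the scheduler handing the lock over can block instead of spin. The sketch below does the same start/stop handshake with a mutex and a condition variable; the names `run_handshake`, `lock` and `cond` are hypothetical and not from the original report. It assumes that a thread sleeping in the kernel (here, inside `pthread_cond_wait`) lets Valgrind give its lock to another thread; that is an assumption, not something tested in this report.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical variant of the reduced test case: same handshake,
// but each side sleeps on a condition variable instead of spinning.
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool start, stop;

static void *work(void *ptr) {
    (void)ptr;
    pthread_mutex_lock(&lock);
    start = true;
    pthread_cond_broadcast(&cond);        // wake main, which waits for `start`
    while (!stop)
        pthread_cond_wait(&cond, &lock);  // sleep until main sets `stop`
    pthread_mutex_unlock(&lock);
    return NULL;
}

// Returns true once the worker has started, been told to stop, and been joined.
bool run_handshake(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, work, NULL) != 0)
        return false;

    pthread_mutex_lock(&lock);
    while (!start)
        pthread_cond_wait(&cond, &lock);  // sleep until the worker sets `start`
    stop = true;
    pthread_cond_broadcast(&cond);        // wake the worker
    pthread_mutex_unlock(&lock);

    return pthread_join(thread, NULL) == 0;
}
```

One condition variable with `pthread_cond_broadcast` serves both directions here; the `while` loops around `pthread_cond_wait` handle spurious wakeups and the case where the flag was already set before waiting.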
--fair-sched=yes seems to work around it
(In reply to Tavian Barnes from comment #2)
> --fair-sched=yes seems to work around it

OK, can we close this item?
I guess? I'm surprised that valgrind will totally starve a thread by default.

This may be a dupe of https://bugs.kde.org/show_bug.cgi?id=343357 anyway.
(In reply to Tavian Barnes from comment #4)
> I guess? I'm surprised that valgrind will totally starve a thread by
> default.
>
> This may be a dupe of https://bugs.kde.org/show_bug.cgi?id=343357 anyway.

It can happen. The default Valgrind scheduler is, well, no scheduling. There's just a global lock (using FIFO reads and writes). When the running thread releases the lock, it's pot luck as to which thread gets to take it. And I suspect that there are cases where the running thread is hot in the cache and so it keeps taking the lock.
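To make the "global lock using FIFO reads and writes" idea concrete: the stuck thread in the comment #1 backtrace is sitting in `vgPlain_read` called from `sema_down`, i.e. waiting to read from a file descriptor. A minimal sketch of that style of lock, assuming a pipe that holds a single token byte, looks like the following. The `pipe_lock` names are hypothetical and this is not Valgrind's actual sema.c.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical token-in-a-pipe lock; not Valgrind's implementation.
typedef struct {
    int fds[2];  // fds[0] = read end, fds[1] = write end
} pipe_lock;

// The lock starts free: exactly one token byte sits in the pipe.
bool pipe_lock_init(pipe_lock *l) {
    if (pipe(l->fds) != 0)
        return false;
    char token = 'L';
    return write(l->fds[1], &token, 1) == 1;
}

// Acquire: block in read() until the single token can be taken.
void pipe_lock_acquire(pipe_lock *l) {
    char token;
    while (read(l->fds[0], &token, 1) != 1)
        ;  // retry (e.g. after EINTR)
}

// Release: put the token back. Nothing here decides which waiter gets it
// next; whichever thread's read() completes first takes it.
void pipe_lock_release(pipe_lock *l) {
    char token = 'L';
    while (write(l->fds[1], &token, 1) != 1)
        ;  // retry (e.g. after EINTR)
}
```

Release only writes the token back. If the releasing thread loops straight back into `pipe_lock_acquire`, it can read that byte before a woken waiter is ever scheduled, which matches the "hot in the cache" case described above.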