Created attachment 143727 [details] Example to reproduce the issue. SUMMARY *** The provided example shows two threads synchronizing on a mutex, which has been initialized with the PTHREAD_PRIO_INHERIT protocol. While the contender is blocked on pthread_mutex_lock() the owner of the mutex sends a signal to the contender via pthread_kill(). While running under valgrind supervision with --tool=drd --trace-mutex=yes "The impossible happened: mutex is locked simultaneously by two threads" is reported and the example terminates unexpectedly. *** STEPS TO REPRODUCE Compile the example with -O0 and run with valgrind with --tool=drd --trace-mutex=yes enabled. OBSERVED RESULT When run outside valgrind the example does not terminate. This is expected behavior. However, when run with valgrind's memcheck tool the program terminates. This seems to be already wrong. Furthermore, when run with valgrind's drd tool the impossible happens: $ valgrind --tool=drd ./mutex_signal ==264795== drd, a thread error detector ==264795== Copyright (C) 2006-2017, and GNU GPL'd, by Bart Van Assche. ==264795== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info ==264795== Command: ./mutex_signal ==264795== mutex initialized thread attributes initialized contender trying to lock mutex thread created sleeping signalling joining thread nullHandler running ==264795== Thread 2: ==264795== The impossible happened: mutex is locked simultaneously by two threads: mutex 0x1ffefff850, recursion count 1, owner 1. ==264795== at 0x4C3F298: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:893) ==264795== by 0x4C3F298: pthread_mutex_lock (drd_pthread_intercepts.c:903) ==264795== by 0x400BE3: contender_start (mutex_signal.c:16) ==264795== by 0x4C3BB48: vgDrd_thread_wrapper (drd_pthread_intercepts.c:449) ==264795== by 0x4E5E149: start_thread (in /usr/lib64/libpthread-2.28.so) ==264795== by 0x5172DC2: clone (in /usr/lib64/libc-2.28.so) ==264795== mutex 0x1ffefff850 was first observed at: ==264795== at 0x4C3E2C4: pthread_mutex_init_intercept (drd_pthread_intercepts.c:826) ==264795== by 0x4C3E2C4: pthread_mutex_init (drd_pthread_intercepts.c:835) ==264795== by 0x400D34: main (mutex_signal.c:48) ==264795== contender locked mutex contender unlocking mutex contender unlocked mutex ==264795== Thread 1: ==264795== Mutex still locked at thread exit: mutex 0x1ffefff850, recursion count 1, owner 2. ==264795== at 0x4C3CB26: pthread_join_intercept (drd_pthread_intercepts.c:716) ==264795== by 0x4C3CB26: pthread_join (drd_pthread_intercepts.c:723) ==264795== by 0x400E91: main (mutex_signal.c:74) ==264795== mutex 0x1ffefff850 was first observed at: ==264795== at 0x4C3E2C4: pthread_mutex_init_intercept (drd_pthread_intercepts.c:826) ==264795== by 0x4C3E2C4: pthread_mutex_init (drd_pthread_intercepts.c:835) ==264795== by 0x400D34: main (mutex_signal.c:48) ==264795== ==264795== ==264795== For lists of detected and suppressed errors, rerun with: -s ==264795== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 158 from 148) EXPECTED RESULT When run outside valgrind the example does not terminate. When run outside valgrind the mutex is not locked by both threads when the signal is sent. $ ./mutex_signal mutex initialized thread attributes initialized thread created sleeping contender trying to lock mutex signalling joining thread nullHandler running SOFTWARE/OS VERSIONS Linux 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Wed Sep 15 15:39:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux CentOS Linux release 8.4.2105 ADDITIONAL INFORMATION This issue was reproduced also with latest valgrind-3.19.0.GIT
Interesting. With Helgrind I get ==13857== Thread #2: Bug in libpthread: write lock granted on mutex/rwlock which is currently wr-held by a different thread ==13857== at 0x403586F: mutex_lock_WRK (hg_intercepts.c:942) ==13857== by 0x4012E2: contender_start (sched.c:16) ==13857== by 0x40384A6: mythread_wrapper (hg_intercepts.c:406) ==13857== by 0x4A2BDD4: start_thread (in /usr/lib64/libpthread-2.17.so) ==13857== by 0x4D3DEAC: clone (in /usr/lib64/libc-2.17.so) This only happens when the mutex has the PTHREAD_PRIO_INHERIT attribute. This doesn't seem to be a new problem - I can reproduce with Valgrind 3.13. The Helgrind code also implies that there could be a race condition between the intercept passing the call on to pthread_mutex_lock and Helgrind registering the return from pthread_mutex_lock. That could be affected by changing the priority. The other possibility is that DRD uses its own CV and mutex to ensure that the thread is joinable before it runs. There could be interference between the guest mutex with elevated priority and this CV/mutex.
Created attachment 143748 [details] reproducer with DRD extras I tried to merge the extras (wrapper + mutex + CV) that DRD adds. However this does not reproduce the problem.
I tried running both the the standalone and DRD under strace. Under DRD I see the main thread getting as far as the kill, then I see futex(0x4a814c0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) (which is one of DRD's mutex lock calls I think) Need to debug more.
I can also reproduce the problem on FreeBSD. What I see so far is that main runs all the way to pthread_join. contender does not run at all, despite the sleep. Only after main is blocked in the pthread_join does contender run. When it does run, the pthread_mutex_lock fails. If I change contender to be void *contender_start(void *arg) { pthread_mutex_t *mutex = arg; fprintf(stderr, "contender trying to lock mutex\n"); while (pthread_mutex_lock(mutex)) { perror("contender error locking mutex:"); } fprintf(stderr, "contender locked mutex\n"); fprintf(stderr, "contender unlocking mutex\n"); pthread_mutex_unlock(mutex); fprintf(stderr, "contender unlocked mutex\n"); return NULL; } then I get mutex initialized main mutex locked thread attributes initialized contender trying to lock mutex thread created sleeping signalling joining thread nullHandler running contender error locking mutex:: Invalid argument with the perror printed just once I'll need to do a fairly long debugging session inside DRD and Valgrind core to see what is happening there. with --trace-mutex=yes ==5143== [2] post_mutex_lock error checking mutex 0x7fc000290 rc 1 owner 1 (locking failed) ... mutex for perror ... ==5143== [2] mutex_trylock error checking mutex 0x7fc000290 rc 1 owner 1
And this post on SO covers why the contender mutex lock fails. https://stackoverflow.com/questions/12781944/pthread-mutex-lock-returns-invalid-argument
Unfortunately this doesn't seem to help on Linux. The problem seems to be that after the signal contender doesn't return to waiting.
Please take a look at commit bf0579a44a44 ("drd: Add a test program that interrupts pthread_mutex_lock()"). If I run the new test program (the program attached to this bug with minor changes) natively then the text "unlock" appears before the text "contender locked mutex". If I run that program under memcheck then the order of these two output lines is reversed. So I think this indicates a bug in the Valgrind core.
On FreeBSD I get mutex initialized thread attributes initialized thread created sleeping signalling sleeping nullHandler running Assertion failed: (ret == 0), function contender_start, file pth_mutex_signal.c, line 23. I'll move to the valgrind developers list for further discussion - will update here as necessary.
Yocto project started getting this too with valgrind 3.19.0 on linux: drd/tests/pth_mutex_signal.stderr.diff ************ --- pth_mutex_signal.stderr.exp 2018-03-09 12:34:56.000000000 +0000 +++ pth_mutex_signal.stderr.out 2022-04-27 07:32:55.310000000 +0000 @@ -6,10 +6,42 @@ signalling sleeping nullHandler running -unlocking +Thread 2: +The impossible happened: mutex is locked simultaneously by two threads: mutex 0x........, recursion count 1, owner 1. + at 0x........: pthread_mutex_lock (drd_pthread_intercepts.c:?) + by 0x........: contender_start (pth_mutex_signal.c:?) + by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?) + by 0x........: start_thread + by 0x........: clone (in /...libc...) +mutex 0x........ was first observed at: + at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x........: main (pth_mutex_signal.c:?) + contender locked mutex contender unlocking mutex contender unlocked mutex +unlocking +Thread 1: +Mutex not locked by calling thread: mutex 0x........, recursion count 1, owner 2. + at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?) + by 0x........: main (pth_mutex_signal.c:?) +mutex 0x........ was first observed at: + at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x........: main (pth_mutex_signal.c:?) + joining thread +Mutex still locked at thread exit: mutex 0x........, recursion count 1, owner 2. + at 0x........: pthread_join (drd_pthread_intercepts.c:?) + by 0x........: main (pth_mutex_signal.c:?) +mutex 0x........ was first observed at: + at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x........: main (pth_mutex_signal.c:?) + +Destroying locked mutex: mutex 0x........, recursion count 1, owner 0. + at 0x........: main (pth_mutex_signal.c:?) +mutex 0x........ was first observed at: + at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?) + by 0x........: main (pth_mutex_signal.c:?) + -ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) +ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
On FreeBSD, single stepping out after the user interrupt I get here sleep_in_kernel: if (abstime == NULL) ret = __thr_umutex_lock(&m->m_lock, id); // RETURN HERE UNDER VG else if (__predict_false(abstime->tv_nsec < 0 || abstime->tv_nsec >= 1000000000)) ret = EINVAL; else ret = __thr_umutex_timedlock(&m->m_lock, id, abstime); done: if (ret == 0 || ret == EOWNERDEAD) { // RETURN HERE STANDALONE enqueue_mutex(curthread, m, ret); if (ret == EOWNERDEAD) m->m_lock.m_flags |= UMUTEX_NONCONSISTENT; } return (ret); } Retcode is 4, invalid.
I think that I understand the problem, at least on Linux. PTHREAD_PRIO_INHERIT causes the type of mutex to change so that it will return ERESTARTNOINTR if interrupted. I see, with strace 14177 <... futex resumed> ) = ? ERESTARTNOINTR (To be restarted) Standalone I see 20015 rt_sigreturn({mask=[]}) = 202 20015 futex(0x7fffffffd070, FUTEX_LOCK_PI_PRIVATE, 1, NULL <unfinished ...> 20014 <... nanosleep resumed> NULL) = 0 The problem is that the Valgrind sigreturn mechanism hijacks the return address so we no longer return to the guest futex syscall. That means that the futex doesn't get restarted. Even worse, since the futex was supposed to be restarted the return code to userland is zero, so we don't even see that it failed. I don't see any way to fix this at the moment on Linux.
Looks similar on FreeBSD. truss output, standalone 78094: thr_kill(133780,SIGRT59) = 0 (0x0) 78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) ERESTART 78094: SIGNAL 124 (SIGRT59) code=SI_LWP pid=78094 uid=501 78094: write(2,"sleeping\n",9) = 9 (0x9) 78094: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0) 78094: write(2,"nullHandler running\n",20) = 20 (0x14) 78094: sigreturn(0x7fffdfffd670) EJUSTRETURN 78094: nanosleep({ 1.000000000 }) = 0 (0x0) 78094: write(2,"unlocking\n",10) = 10 (0xa) 78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_UNLOCK,0x0,0x0,0x0) = 0 (0x0) 78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) ERESTART 78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) = 0 (0x0) 78094: write(2,"contender locked mutex\n",23) = 23 (0x17) Without pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT); 78129: thr_kill(133828,SIGRT59) = 0 (0x0) 78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) ERR#4 'Interrupted system call' 78129: SIGNAL 124 (SIGRT59) code=SI_LWP pid=78129 uid=501 78129: write(2,"sleeping\n",9) = 9 (0x9) 78129: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0) 78129: write(2,"nullHandler running\n",20) = 20 (0x14) 78129: sigreturn(0x7fffdfffd640) EJUSTRETURN 78129: nanosleep({ 1.000000000 }) = 0 (0x0) 78129: write(2,"unlocking\n",10) = 10 (0xa) 78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0) 78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0) 78129: write(2,"contender locked mutex\n",23) = 23 (0x17) This time under DRD the mutex does return an error code - invalid argument.
Created attachment 149549 [details] Patch for FreeBSD This patch fixes the drd/tests/pth_mutex_signal testcase on FreeBSD. I think that it should extend easily to Linux. I need to do more testing and it would also be good to find out a list of all syscalls that use ERESTARTNOINTR (Linux) and ERESTART (FreeBSD).
If I add to the previous patch *flags |= SfMayBlock; * start add* if ((ARG2 & (VKI_FUTEX_PRIVATE_FLAG|VKI_FUTEX_LOCK_PI)) == (VKI_FUTEX_PRIVATE_FLAG|VKI_FUTEX_LOCK_PI)) { *flags |= SfKernelRestart; } *end add * to static void futex_pre_helper ( ThreadId tid, SyscallArgLayout* layout, SyscallArgs* arrghs, SyscallStatus* status, UWord* flags, Bool is_time64 ) then I get paulf> ../../vg-in-place --tool=drd -q ./pth_mutex_signal mutex initialized thread attributes initialized thread created sleeping signalling sleeping nullHandler running unlocking contender locked mutex contender unlocking mutex contender unlocked mutex joining thread No error
Also note: drd/tests/pth_mutex_signal when run under simple memcheck, fails because pthread_mutex_lock() returns EINTR, while POSIX says of it (and related functions) "These functions shall not return an error code of [EINTR]."
(In reply to Nick Briggs from comment #15) > Also note: drd/tests/pth_mutex_signal when run under simple memcheck, fails > because pthread_mutex_lock() returns EINTR, while POSIX says of it (and > related functions) "These functions shall not return an error code of > [EINTR]." The big question is why Linux and FreeBSD aren't restarting their respective _umtx_op/futex system calls. Both are using internal kernel values of errno that indicate that the sycall should be restarted, ERESTARTNOINTR on Linux and ERESTART on FreeBSD. When running standalone, the signal handler will create a sigframe with the return trampoline doing a sigreturn to the syscall. When running under Valgrind, things are a lot different. We will have replaced the client signal handler by async_signalhandler. So when the signal arrives, this is what gets called, with a sigframe for sigreturn back to the syscall. But we will never return from async_signalhandler. Instead async_signalhandler calls fixup_guest_state_after_syscall_interrupted *** which does not handle kernel restartable syscalls *** async_signalhandler then calls deliver_signal which synthesizes an extended sigframe with out own return trampoline that does a (fake) sigreturn. async_sighandler jumps back to the scheduler, which will then run the guest signal handler. That will return via our trampoline to our (fake) sigreturn. This fixes up the Virtual CPU to make it look like the client is returning to the syscall but then (fake) sigreturn just sets the status (so there is no real syscall) and returns back to the Valgrind scheduler. That all takes us back to where fixup_guest_state_after_syscall_interrupted said to go after the sigreturn. We'd like it to restart the syscall, like when running standalone. But that doesn't seem to be the case at the moment.
One thing that I'm not sure about is whether restartable syscalls re-read their userland arguments or whether they reuse the copyin()'d arguments.
In the FreeBSD kernel it looks like the mutex functions are only restartable if they don't have a timeout. The 3 cases that I see are UMTX_OP_MUTEX_LOCK UMTX_OP_MUTEX_TRYLOCK and UMTX_OP_MUTEX_WAIT when the timeout argument is null The commit below seems to fix this issue on FreeBSD. commit d8cc70f6d5a37b9b773c85f8666d6c5f7f65a29b (HEAD -> master, origin/master, origin/HEAD) Author: Paul Floyd <pjfloyd@wanadoo.fr> Date: Sun Dec 25 10:18:51 2022 +0100 FreeBSD only: fix 445743 Restart 3 of the umtx_op mutex operations if they are interrupted.
c.f. restart_syscall(2) in the Linux kernel -- https://man7.org/linux/man-pages/man2/restart_syscall.2.html
(In reply to Nick Briggs from comment #19) > c.f. restart_syscall(2) in the Linux kernel -- > https://man7.org/linux/man-pages/man2/restart_syscall.2.html That might be useful for Linux if/when someone implements it. I don't know if it will help with the way that we handle syscalls and signals, but it looks like it will be useful for timed syscalls.
commit 94a41e050c92a504515c66ebdaf84cad91c869bb (HEAD -> master, origin/master, origin/HEAD) Author: Paul Floyd <pjfloyd@wanadoo.fr> Date: Tue Feb 28 22:17:25 2023 +0100