Bug 445743 - "The impossible happened: mutex is locked simultaneously by two threads" while using mutexes with priority inheritance and signals
Summary: "The impossible happened: mutex is locked simultaneously by two threads" whil...
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: drd (show other bugs)
Version: 3.16.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-19 13:05 UTC by Vladimir Nikolov
Modified: 2023-02-28 21:20 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Example to reproduce the issue. (2.18 KB, text/x-csrc)
2021-11-19 13:05 UTC, Vladimir Nikolov
Details
reproducer with DRD extras (3.29 KB, text/x-csrc)
2021-11-19 21:09 UTC, Paul Floyd
Details
Patch for FreeBSD (3.24 KB, patch)
2022-06-08 07:52 UTC, Paul Floyd
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Vladimir Nikolov 2021-11-19 13:05:25 UTC
Created attachment 143727 [details]
Example to reproduce the issue.

SUMMARY
***
The provided example shows two threads synchronizing on a mutex, which has been initialized with the PTHREAD_PRIO_INHERIT protocol. While the contender is blocked on pthread_mutex_lock() the owner of the mutex sends a signal to the contender via pthread_kill(). While running under valgrind supervision with --tool=drd --trace-mutex=yes "The impossible happened: mutex is locked simultaneously by two threads" is reported and the example terminates unexpectedly.
***


STEPS TO REPRODUCE

Compile the example with -O0 and run with valgrind with --tool=drd --trace-mutex=yes enabled.

OBSERVED RESULT

When run outside valgrind the example does not terminate. This is expected behavior.
However, when run with valgrind's memcheck tool the program terminates. This seems to be already wrong. 
Furthermore, when run with valgrind's drd tool the impossible happens:

$ valgrind --tool=drd ./mutex_signal 
==264795== drd, a thread error detector
==264795== Copyright (C) 2006-2017, and GNU GPL'd, by Bart Van Assche.
==264795== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
==264795== Command: ./mutex_signal
==264795== 
mutex initialized
thread attributes initialized
contender trying to lock mutex
thread created
sleeping
signalling
joining thread
nullHandler running
==264795== Thread 2:
==264795== The impossible happened: mutex is locked simultaneously by two threads: mutex 0x1ffefff850, recursion count 1, owner 1.
==264795==    at 0x4C3F298: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:893)
==264795==    by 0x4C3F298: pthread_mutex_lock (drd_pthread_intercepts.c:903)
==264795==    by 0x400BE3: contender_start (mutex_signal.c:16)
==264795==    by 0x4C3BB48: vgDrd_thread_wrapper (drd_pthread_intercepts.c:449)
==264795==    by 0x4E5E149: start_thread (in /usr/lib64/libpthread-2.28.so)
==264795==    by 0x5172DC2: clone (in /usr/lib64/libc-2.28.so)
==264795== mutex 0x1ffefff850 was first observed at:
==264795==    at 0x4C3E2C4: pthread_mutex_init_intercept (drd_pthread_intercepts.c:826)
==264795==    by 0x4C3E2C4: pthread_mutex_init (drd_pthread_intercepts.c:835)
==264795==    by 0x400D34: main (mutex_signal.c:48)
==264795== 
contender locked mutex
contender unlocking mutex
contender unlocked mutex
==264795== Thread 1:
==264795== Mutex still locked at thread exit: mutex 0x1ffefff850, recursion count 1, owner 2.
==264795==    at 0x4C3CB26: pthread_join_intercept (drd_pthread_intercepts.c:716)
==264795==    by 0x4C3CB26: pthread_join (drd_pthread_intercepts.c:723)
==264795==    by 0x400E91: main (mutex_signal.c:74)
==264795== mutex 0x1ffefff850 was first observed at:
==264795==    at 0x4C3E2C4: pthread_mutex_init_intercept (drd_pthread_intercepts.c:826)
==264795==    by 0x4C3E2C4: pthread_mutex_init (drd_pthread_intercepts.c:835)
==264795==    by 0x400D34: main (mutex_signal.c:48)
==264795== 
==264795== 
==264795== For lists of detected and suppressed errors, rerun with: -s
==264795== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 158 from 148)

EXPECTED RESULT

When run outside valgrind the example does not terminate. 
When run outside valgrind the mutex is not locked by both threads when the signal is sent. 

$ ./mutex_signal 
mutex initialized
thread attributes initialized
thread created
sleeping
contender trying to lock mutex
signalling
joining thread
nullHandler running

SOFTWARE/OS VERSIONS

Linux 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Wed Sep 15 15:39:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 8.4.2105

ADDITIONAL INFORMATION

This issue was reproduced also with latest valgrind-3.19.0.GIT
Comment 1 Paul Floyd 2021-11-19 16:28:08 UTC
Interesting.

With Helgrind I get 

==13857== Thread #2: Bug in libpthread: write lock granted on mutex/rwlock which is currently wr-held by a different thread
==13857==    at 0x403586F: mutex_lock_WRK (hg_intercepts.c:942)                                                            
==13857==    by 0x4012E2: contender_start (sched.c:16)                                                                     
==13857==    by 0x40384A6: mythread_wrapper (hg_intercepts.c:406)                                                          
==13857==    by 0x4A2BDD4: start_thread (in /usr/lib64/libpthread-2.17.so)                                                 
==13857==    by 0x4D3DEAC: clone (in /usr/lib64/libc-2.17.so) 

This only happens when the mutex has the PTHREAD_PRIO_INHERIT attribute.

This doesn't seem to be a new problem - I can reproduce with Valgrind 3.13.

The Helgrind code also implies that there could be a race condition between the intercept passing the call on to pthread_mutex_lock and Helgrind registering the return from pthread_mutex_lock. That could be affected by changing the priority.

The other possibility is that DRD uses its own CV and mutex to ensure that the thread is joinable before it runs. There could be interference between the guest mutex with elevated priority and this CV/mutex.
Comment 2 Paul Floyd 2021-11-19 21:09:40 UTC
Created attachment 143748 [details]
reproducer with DRD extras

I tried to merge the extras (wrapper + mutex + CV) that DRD adds.

However this does not reproduce the problem.
Comment 3 Paul Floyd 2021-11-19 22:19:36 UTC
I tried running both the the standalone and DRD under strace.

Under DRD I see the main thread getting as far as the kill, then I see

futex(0x4a814c0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
(which is one of DRD's mutex lock calls I think)


Need to debug more.
Comment 4 Paul Floyd 2021-11-20 08:03:09 UTC
I can also reproduce the problem on FreeBSD.

What I see so far is that main runs all the way to pthread_join. contender does not run at all, despite the sleep. Only after main is blocked in the pthread_join does contender run. When it does run, the pthread_mutex_lock fails.

If I change contender to be 

void *contender_start(void *arg)
{
  pthread_mutex_t *mutex = arg;
  fprintf(stderr, "contender trying to lock mutex\n");
  while (pthread_mutex_lock(mutex))
  {
     perror("contender error locking mutex:");
  }
  fprintf(stderr, "contender locked mutex\n");
  fprintf(stderr, "contender unlocking mutex\n");
  pthread_mutex_unlock(mutex);
  fprintf(stderr, "contender unlocked mutex\n");
  return NULL;
}

then I get

mutex initialized
main mutex locked
thread attributes initialized
contender trying to lock mutex
thread created
sleeping
signalling
joining thread
nullHandler running
contender error locking mutex:: Invalid argument

with the perror printed just once

I'll need to do a fairly long debugging session inside DRD and Valgrind core to see what is happening there.

with --trace-mutex=yes

==5143== [2] post_mutex_lock error checking mutex 0x7fc000290 rc 1 owner 1 (locking failed)
... mutex for perror ...
==5143== [2] mutex_trylock   error checking mutex 0x7fc000290 rc 1 owner 1
Comment 5 Paul Floyd 2021-11-20 09:53:52 UTC
And this post on SO covers why the contender mutex lock fails.

https://stackoverflow.com/questions/12781944/pthread-mutex-lock-returns-invalid-argument
Comment 6 Paul Floyd 2021-11-20 13:59:31 UTC
Unfortunately this doesn't seem to help on Linux.

The problem seems to be that after the signal contender doesn't return to waiting.
Comment 7 Bart Van Assche 2021-11-20 22:32:17 UTC
Please take a look at commit bf0579a44a44 ("drd: Add a test program that interrupts pthread_mutex_lock()"). If I run the new test program (the program attached to this bug with minor changes) natively then the text "unlock" appears before the text "contender locked mutex". If I run that program under memcheck then the order of these two output lines is reversed. So I think this indicates a bug in the Valgrind core.
Comment 8 Paul Floyd 2021-11-21 07:43:11 UTC
On FreeBSD I get

mutex initialized
thread attributes initialized
thread created
sleeping
signalling
sleeping
nullHandler running
Assertion failed: (ret == 0), function contender_start, file pth_mutex_signal.c, line 23.

I'll move to the valgrind developers list for further discussion - will update here as necessary.
Comment 9 Alexander Kanavin 2022-04-27 08:06:14 UTC
Yocto project started getting this too with valgrind 3.19.0 on linux:

drd/tests/pth_mutex_signal.stderr.diff
************
--- pth_mutex_signal.stderr.exp	2018-03-09 12:34:56.000000000 +0000
+++ pth_mutex_signal.stderr.out	2022-04-27 07:32:55.310000000 +0000
@@ -6,10 +6,42 @@
 signalling
 sleeping
 nullHandler running
-unlocking
+Thread 2:
+The impossible happened: mutex is locked simultaneously by two threads: mutex 0x........, recursion count 1, owner 1.
+   at 0x........: pthread_mutex_lock (drd_pthread_intercepts.c:?)
+   by 0x........: contender_start (pth_mutex_signal.c:?)
+   by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?)
+   by 0x........: start_thread
+   by 0x........: clone (in /...libc...)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_mutex_signal.c:?)
+
 contender locked mutex
 contender unlocking mutex
 contender unlocked mutex
+unlocking
+Thread 1:
+Mutex not locked by calling thread: mutex 0x........, recursion count 1, owner 2.
+   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_mutex_signal.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_mutex_signal.c:?)
+
 joining thread
+Mutex still locked at thread exit: mutex 0x........, recursion count 1, owner 2.
+   at 0x........: pthread_join (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_mutex_signal.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_mutex_signal.c:?)
+
+Destroying locked mutex: mutex 0x........, recursion count 1, owner 0.
+   at 0x........: main (pth_mutex_signal.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (pth_mutex_signal.c:?)
+
 
-ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
+ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Comment 10 Paul Floyd 2022-05-10 21:15:37 UTC
On FreeBSD, single stepping out after the user interrupt I get here

sleep_in_kernel:
        if (abstime == NULL)
                ret = __thr_umutex_lock(&m->m_lock, id); // RETURN HERE UNDER VG
        else if (__predict_false(abstime->tv_nsec < 0 ||
            abstime->tv_nsec >= 1000000000))
                ret = EINVAL;
        else
                ret = __thr_umutex_timedlock(&m->m_lock, id, abstime);
done:
        if (ret == 0 || ret == EOWNERDEAD) { // RETURN HERE STANDALONE
                enqueue_mutex(curthread, m, ret);
                if (ret == EOWNERDEAD)
                        m->m_lock.m_flags |= UMUTEX_NONCONSISTENT;
        }
        return (ret);
}

Retcode is 4, invalid.
Comment 11 Paul Floyd 2022-06-07 15:16:11 UTC
I think that I understand the problem, at least on Linux.

PTHREAD_PRIO_INHERIT causes the type of mutex to change so that it will return ERESTARTNOINTR if interrupted.

I see, with strace

14177 <... futex resumed> )             = ? ERESTARTNOINTR (To be restarted)

Standalone I see

20015 rt_sigreturn({mask=[]})           = 202
20015 futex(0x7fffffffd070, FUTEX_LOCK_PI_PRIVATE, 1, NULL <unfinished ...>
20014 <... nanosleep resumed> NULL)     = 0


The problem is that the Valgrind sigreturn mechanism hijacks the return address so we no longer return to the guest futex syscall. That means that the futex doesn't get restarted.

Even worse, since the futex was supposed to be restarted the return code to userland is zero, so we don't even see that it failed. 

I don't see any way to fix this at the moment on Linux.
Comment 12 Paul Floyd 2022-06-07 22:20:39 UTC
Looks similar on FreeBSD. truss output, standalone

78094: thr_kill(133780,SIGRT59)			 = 0 (0x0)
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) ERESTART
78094: SIGNAL 124 (SIGRT59) code=SI_LWP pid=78094 uid=501
78094: write(2,"sleeping\n",9)			 = 9 (0x9)
78094: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0)
78094: write(2,"nullHandler running\n",20)	 = 20 (0x14)
78094: sigreturn(0x7fffdfffd670)		 EJUSTRETURN
78094: nanosleep({ 1.000000000 })		 = 0 (0x0)
78094: write(2,"unlocking\n",10)		 = 10 (0xa)
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_UNLOCK,0x0,0x0,0x0) = 0 (0x0)
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) ERESTART
78094: _umtx_op(0x800684008,UMTX_OP_MUTEX_LOCK,0x0,0x0,0x0) = 0 (0x0)
78094: write(2,"contender locked mutex\n",23)	 = 23 (0x17)

Without pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);   


78129: thr_kill(133828,SIGRT59)			 = 0 (0x0)
78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) ERR#4 'Interrupted system call'
78129: SIGNAL 124 (SIGRT59) code=SI_LWP pid=78129 uid=501
78129: write(2,"sleeping\n",9)			 = 9 (0x9)
78129: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },0x0) = 0 (0x0)
78129: write(2,"nullHandler running\n",20)	 = 20 (0x14)
78129: sigreturn(0x7fffdfffd640)		 EJUSTRETURN
78129: nanosleep({ 1.000000000 })		 = 0 (0x0)
78129: write(2,"unlocking\n",10)		 = 10 (0xa)
78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
78129: _umtx_op(0x800684008,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
78129: write(2,"contender locked mutex\n",23)	 = 23 (0x17)

This time under DRD the mutex does return an error code - invalid argument.
Comment 13 Paul Floyd 2022-06-08 07:52:36 UTC
Created attachment 149549 [details]
Patch for FreeBSD

This patch fixes the drd/tests/pth_mutex_signal testcase on FreeBSD. I think that it should extend easily to Linux.

I need to do more testing and it would also be good to find out a list of all syscalls that use ERESTARTNOINTR  (Linux) and ERESTART (FreeBSD).
Comment 14 Paul Floyd 2022-06-08 20:33:45 UTC
If I add to the previous patch

     *flags |= SfMayBlock;
* start add*
   if ((ARG2 & (VKI_FUTEX_PRIVATE_FLAG|VKI_FUTEX_LOCK_PI)) == (VKI_FUTEX_PRIVATE_FLAG|VKI_FUTEX_LOCK_PI)) {
       *flags |= SfKernelRestart;
   }
*end add *

to static void futex_pre_helper ( ThreadId tid, SyscallArgLayout* layout,
                               SyscallArgs* arrghs, SyscallStatus* status,
                               UWord* flags, Bool is_time64 )

then I get

paulf> ../../vg-in-place --tool=drd -q ./pth_mutex_signal
mutex initialized
thread attributes initialized
thread created
sleeping
signalling
sleeping
nullHandler running
unlocking
contender locked mutex
contender unlocking mutex
contender unlocked mutex
joining thread

No error
Comment 15 Nick Briggs 2022-06-15 01:50:43 UTC
Also note:  drd/tests/pth_mutex_signal when run under simple memcheck, fails because pthread_mutex_lock() returns EINTR, while POSIX says of it (and related functions) "These functions shall not return an error code of [EINTR]."
Comment 16 Paul Floyd 2022-06-15 12:21:27 UTC
(In reply to Nick Briggs from comment #15)
> Also note:  drd/tests/pth_mutex_signal when run under simple memcheck, fails
> because pthread_mutex_lock() returns EINTR, while POSIX says of it (and
> related functions) "These functions shall not return an error code of
> [EINTR]."

The big question is why Linux and FreeBSD aren't restarting their respective _umtx_op/futex system calls. Both are using internal kernel values of errno that indicate that the sycall should be restarted, ERESTARTNOINTR  on Linux and ERESTART on FreeBSD.

When running standalone, the signal handler will create a sigframe with the return trampoline doing a sigreturn to the syscall.

When running under Valgrind, things are a lot different. We will have replaced the client signal handler by async_signalhandler. So when the signal arrives, this is what gets called, with a sigframe for sigreturn back to the syscall. But we will never return from async_signalhandler.

Instead async_signalhandler calls fixup_guest_state_after_syscall_interrupted *** which does not handle kernel restartable syscalls ***

async_signalhandler  then calls deliver_signal which synthesizes an extended sigframe with out own return trampoline that does a (fake) sigreturn. async_sighandler jumps back to the scheduler, which will then run the guest signal handler. That will return via our trampoline to our (fake) sigreturn. This fixes up the Virtual CPU to make it look like the client is returning to the syscall but then (fake) sigreturn just sets the status (so there is no real syscall) and returns back to the Valgrind scheduler.

That all takes us back to where fixup_guest_state_after_syscall_interrupted  said to go after the sigreturn. We'd like it to restart the syscall, like when running standalone. But that doesn't seem to be the case at the moment.
Comment 17 Paul Floyd 2022-06-15 14:44:25 UTC
One thing that I'm not sure about is whether restartable syscalls re-read their userland arguments or whether they reuse the copyin()'d arguments.
Comment 18 Paul Floyd 2022-12-25 09:21:51 UTC
In the FreeBSD kernel it looks like the mutex functions are only restartable if they don't have a timeout.

The 3 cases that I see are
UMTX_OP_MUTEX_LOCK
UMTX_OP_MUTEX_TRYLOCK
and
UMTX_OP_MUTEX_WAIT when the timeout argument is null

The commit below seems to fix this issue on FreeBSD.

commit d8cc70f6d5a37b9b773c85f8666d6c5f7f65a29b (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sun Dec 25 10:18:51 2022 +0100

    FreeBSD only: fix 445743
    
    Restart 3 of the umtx_op mutex operations if they are interrupted.
Comment 19 Nick Briggs 2022-12-25 17:53:40 UTC
c.f. restart_syscall(2) in the Linux kernel -- https://man7.org/linux/man-pages/man2/restart_syscall.2.html
Comment 20 Paul Floyd 2022-12-26 19:51:55 UTC
(In reply to Nick Briggs from comment #19)
> c.f. restart_syscall(2) in the Linux kernel --
> https://man7.org/linux/man-pages/man2/restart_syscall.2.html

That might be useful for Linux if/when someone implements it. I don't know if it will help with the way that we handle syscalls and signals, but it looks like it will be useful for timed syscalls.
Comment 21 Paul Floyd 2023-02-28 21:20:24 UTC
commit 94a41e050c92a504515c66ebdaf84cad91c869bb (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Tue Feb 28 22:17:25 2023 +0100