A few of the DRD tests are failing on OI hipster 2024.10. For instance, hold_lock_1:

paulf@openindiana:~/valgrind$ cat drd/tests/hold_lock_1.stderr.diff
--- hold_lock_1.stderr.exp 2023-09-10 09:26:27.606842684 +0200
+++ hold_lock_1.stderr.out 2025-03-14 07:30:48.347271974 +0100
@@ -1,27 +1,61 @@
 Locking mutex ...
-Acquired at:
+The object at address 0x........ is not a mutex.
    at 0x........: pthread_mutex_lock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-Lock on mutex 0x........ was held during ... ms (threshold: 500 ms).
-   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (hold_lock.c:?)
+
+The object at address 0x........ is not a mutex.
+   at 0x........: pthread_mutex_lock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
 mutex 0x........ was first observed at:
    at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-Locking rwlock exclusively ...
-Acquired at:
-   at 0x........: pthread_rwlock_wrlock (drd_pthread_intercepts.c:?)
+Mutex type changed: mutex 0x........, recursion count 2, owner 1.
+   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-Lock on rwlock 0x........ was held during ... ms (threshold: 500 ms).
-   at 0x........: pthread_rwlock_unlock (drd_pthread_intercepts.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-rwlock 0x........ was first observed at:
-   at 0x........: pthread_rwlock_init (drd_pthread_intercepts.c:?)
+
+
+drd: drd_mutex.c:405 (vgDrd_mutex_unlock): Assertion 'p->mutex_type == mutex_type' failed.
+
+host stacktrace:
+   at 0x........: show_sched_status_wrk (m_libcassert.c:?)
+   by 0x........: report_and_quit (m_libcassert.c:?)
+   by 0x........: vgPlain_assert_fail (m_libcassert.c:?)
+   by 0x........: vgDrd_mutex_unlock (drd_mutex.c:?)
+   by 0x........: handle_thr_client_request (drd_clientreq.c:?)
+   by 0x........: handle_client_request (drd_clientreq.c:?)
+   by 0x........: wrap_tool_handle_client_request (m_tooliface.c:?)
+   by 0x........: do_client_request (scheduler.c:?)
+   by 0x........: vgPlain_scheduler (scheduler.c:?)
+   by 0x........: thread_wrapper (syswrap-solaris.c:134)
+   by 0x........: run_a_thread_NORETURN (syswrap-solaris.c:182)
+
+sched status:
+  running_tid=1
+
+Thread 1: status = VgTs_Runnable (lwpid 1)
+   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
+client stack range: [0x........ 0x........] client SP: 0x........
+valgrind stack range: [0x........ 0x........] top usage: 10664 of 1048576

The code is

pthread_mutexattr_init(&mutexattr);
pthread_mutexattr_settype(&mutexattr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(&mutex, &mutexattr);
pthread_mutexattr_destroy(&mutexattr);
pthread_mutex_lock(&mutex); // error here on line 51

DRD contains two wrappers for pthread_mutex_init: one for the function itself and one, for Solaris (and Illumos) only, for mutex_init. The same goes for pthread_mutex_destroy and mutex_destroy. The two 'init' functions are different. However, for 'destroy' a weak alias is used.

I'm not too sure how or why this ever worked properly. My suspicion is that at some time 'pthread_mutex_init' made a sibling call to 'mutex_init' (see the changes here https://code.illumos.org/c/illumos-gate/+/3255/3/usr/src/lib/libc/port/threads/pthr_mutex.c#b245). That would hide the call to mutex_init, so DRD would only see one 'init' call and one 'destroy' call. After the change it would be seeing two inits and one destroy. I don't know if the 'type' is different between the two.

Solaris 11.3 and 11.4 don't use a sibling call.
Anyway, my initial debugging in gdb shows that I see

- intercepted pthread_mutex_init with type mt equal to zero
- intercepted mutex_init with type equal to 6
- intercepted mutex_lock

I'm not certain, but I think that the second 'init' is failing to record the init with the right type (because it has already been recorded), and then the lock looks for the mutex with type 6 and fails to find it.

I don't see much difference compared to Solaris 11. Need to debug the mutex kind more.
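To make the hypothesis above concrete, here is a toy model of it. It is not DRD's code or data structures: the table, toy_record_init and toy_lookup are all invented for illustration, and the raw values 0 and 6 are just the numbers seen in gdb. The only point is that if the first 'init' wins and the second is ignored, a later lookup by (address, type 6) cannot match.

```c
#include <stddef.h>

/* Toy registry, invented for illustration only (NOT DRD's structures). */
#define TOY_MAX 8

struct toy_mutex_entry {
    const void *addr;   /* address of the client mutex */
    int type;           /* type recorded at first init */
    int in_use;
};

static struct toy_mutex_entry toy_table[TOY_MAX];

/* Record a mutex the first time it is seen. A second init on the same
 * address keeps the type already stored. Returns the stored type, or -1
 * if the table is full. */
int toy_record_init(const void *addr, int type)
{
    size_t i;

    for (i = 0; i < TOY_MAX; i++) {
        if (toy_table[i].in_use && toy_table[i].addr == addr)
            return toy_table[i].type;   /* already recorded: type unchanged */
    }
    for (i = 0; i < TOY_MAX; i++) {
        if (!toy_table[i].in_use) {
            toy_table[i].addr = addr;
            toy_table[i].type = type;
            toy_table[i].in_use = 1;
            return type;
        }
    }
    return -1;
}

/* Returns 1 if an entry with this address AND this type exists, else 0. */
int toy_lookup(const void *addr, int type)
{
    size_t i;

    for (i = 0; i < TOY_MAX; i++) {
        if (toy_table[i].in_use && toy_table[i].addr == addr &&
            toy_table[i].type == type)
            return 1;
    }
    return 0;
}
```

In this model, recording type 0 and then type 6 for the same address leaves 0 stored, so toy_lookup with type 6 fails. Whether DRD really behaves like this is exactly what still needs checking.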
Looking at this call

pthread_mutex_init(&mutex, &mutexattr);

The type for OI and Solaris 11.4 means the same thing but has a different value. OI is 8 (DEFAULT) and S11.4 is 0 (DEFAULT and NORMAL). We treat both of these the same.

Then there is the call to mutex_init. Here the type is 6 (ERRORCHECK|RECURSIVE). DRD_(thread_to_drd_mutex_type) treats this as RECURSIVE (it tests only one flag of the OR at a time).
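A sketch of the kind of flag test described above, to show why 6 ends up as RECURSIVE. The names are placeholders and the flag values 0x2 and 0x4 are assumptions chosen so that ERRORCHECK|RECURSIVE gives the observed 6; this is not copied from drd_pthread_intercepts.c.

```c
/* Assumed flag values (so that ERRORCHECK | RECURSIVE == 6, as observed).
 * All names here are placeholders, not DRD or libc identifiers. */
enum {
    SKETCH_LOCK_ERRORCHECK = 0x2,
    SKETCH_LOCK_RECURSIVE  = 0x4
};

enum sketch_mutex_type {
    SKETCH_TYPE_DEFAULT,
    SKETCH_TYPE_ERRORCHECK,
    SKETCH_TYPE_RECURSIVE
};

/* Tests one flag at a time. RECURSIVE is tested first, so a combined
 * ERRORCHECK|RECURSIVE value is reported as recursive. */
enum sketch_mutex_type sketch_thread_to_drd_type(int type)
{
    if (type & SKETCH_LOCK_RECURSIVE)
        return SKETCH_TYPE_RECURSIVE;
    if (type & SKETCH_LOCK_ERRORCHECK)
        return SKETCH_TYPE_ERRORCHECK;
    return SKETCH_TYPE_DEFAULT;
}
```

With this ordering the mutex_init side yields RECURSIVE for 6, while the pthread_mutex_init side saw a DEFAULT value, which would explain the two wrappers disagreeing about the type.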
There is a bug in Illumos pthread_mutexattr_gettype. See

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/threads/pthr_mutex.c#L370

It just reads an uninitialised local variable, which gets optimised to a literal zero.
commit 4d578ef2153133c8480489f2f88a3dc1ffddef85 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date: Fri Mar 14 21:56:35 2025 +0100

    Bug 501479 - Illumos DRD pthread_mutex_init wrapper errors