Bug 503668 - DRD regtest failures on Fedora 42 amd64
Status: RESOLVED MOVED
Alias: None
Product: valgrind
Classification: Developer tools
Component: drd (other bugs)
Version First Reported In: 3.25 GIT
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-05-02 17:03 UTC by Paul Floyd
Modified: 2025-08-19 08:02 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Paul Floyd 2025-05-02 17:03:25 UTC
I get the following 4 failures:

drd/tests/fork-parallel                  (stderr)
drd/tests/fork-serial                    (stderr)
drd/tests/threaded-fork-vcs              (stderr)
drd/tests/threaded-fork                  (stderr)

They all look like lock reinitialization issues. The diff for fork-serial is:

+Thread 2:
+Reader-writer lock reinitialization: rwlock 0x.........
+   at 0x........: pthread_rwlock_init (drd_pthread_intercepts.c:?)
+   by 0x........: fork (in /...libc...)
+   by 0x........: startproc (fork.c:?)
+   by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?)
+   by 0x........: start_thread
+   by 0x........: clone (in /...libc...)
+rwlock 0x........ was first observed at:
+   at 0x........: pthread_rwlock_rdlock (drd_pthread_intercepts.c:?)
+   by 0x........: _Fork (in /...libc...)
+   by 0x........: fork (in /...libc...)
+   by 0x........: startproc (fork.c:?)
+   by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?)
+   by 0x........: start_thread
+   by 0x........: clone (in /...libc...)
+
 /bin/ls
+Thread 3:
+Reader-writer lock reinitialization: rwlock 0x.........
+   at 0x........: pthread_rwlock_init (drd_pthread_intercepts.c:?)
+   by 0x........: fork (in /...libc...)
+   by 0x........: startproc (fork.c:?)
+   by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?)
+   by 0x........: start_thread
+   by 0x........: clone (in /...libc...)
+rwlock 0x........ was first observed at:
+   at 0x........: pthread_rwlock_rdlock (drd_pthread_intercepts.c:?)
+   by 0x........: _Fork (in /...libc...)
+   by 0x........: fork (in /...libc...)
+   by 0x........: startproc (fork.c:?)
+   by 0x........: vgDrd_thread_wrapper (drd_pthread_intercepts.c:?)
+   by 0x........: start_thread
+   by 0x........: clone (in /...libc...)
+

My guess is that this is a glibc 2.41 issue.
Comment 1 Paul Floyd 2025-05-23 15:14:25 UTC
Looking at this, it looks like a regression in glibc. Version 2.41 added some locking around fork().

Without filtering, the errors look like

==101957== Thread 3:
==101957== Reader-writer lock reinitialization: rwlock 0x4a71b60.
==101957==    at 0x486191A: pthread_rwlock_init_intercept (drd_pthread_intercepts.c:1734)
==101957==    by 0x486191A: pthread_rwlock_init@* (drd_pthread_intercepts.c:1742)
==101957==    by 0x494CA6F: fork (fork.c:88)
==101957==    by 0x4004EE: startproc (fork.c:18)
==101957==    by 0x4848562: vgDrd_thread_wrapper (drd_pthread_intercepts.c:512)
==101957==    by 0x48F91D3: start_thread (pthread_create.c:448)
==101957==    by 0x497BB13: clone (clone.S:100)
==101957== rwlock 0x4a71b60 was first observed at:
==101957==    at 0x4862ABA: pthread_rwlock_rdlock_intercept (drd_pthread_intercepts.c:1798)
==101957==    by 0x4862ABA: pthread_rwlock_rdlock@* (drd_pthread_intercepts.c:1812)
==101957==    by 0x4946CE4: _Fork (_Fork.c:31)
==101957==    by 0x494CA1F: fork (fork.c:75)
==101957==    by 0x4004EE: startproc (fork.c:18)
==101957==    by 0x4848562: vgDrd_thread_wrapper (drd_pthread_intercepts.c:512)
==101957==    by 0x48F91D3: start_thread (pthread_create.c:448)
==101957==    by 0x497BB13: clone (clone.S:100)

Looking at the source for that, the creation of the pthread_rwlock looks like

  internal_sigset_t original_sigmask;
  __abort_lock_rdlock (&original_sigmask);

which is calling into this code

__libc_rwlock_define_initialized (static, lock);

void
__abort_lock_rdlock (internal_sigset_t *set)
{
  internal_signal_block_all (set);
  __libc_rwlock_rdlock (lock);
}



As always, navigating the macros is non-obvious, but I guess that comes from

#define __libc_rwlock_define_initialized(CLASS,NAME) \
  CLASS __libc_rwlock_t NAME = PTHREAD_RWLOCK_INITIALIZER;

So the summary here is that pthread_rwlock_rdlock is being called using a file static that is correctly initialized.

Then where the error occurs the code is

	  call_function_static_weak (__abort_fork_reset_child);

which does

void
__abort_fork_reset_child (void)
{
  __libc_rwlock_init (lock);
}

Note that I don't see any intervening calls to pthread_rwlock_destroy.

The man page says

 Results are undefined if pthread_rwlock_init() is called specifying an already initialized read-write lock.
Comment 2 Paul Floyd 2025-05-23 15:25:36 UTC
I've opened a glibc bugzilla item for this:

https://sourceware.org/bugzilla/show_bug.cgi?id=32994
Comment 3 Mark Wielaard 2025-06-27 14:24:09 UTC
This looks like a glibc bug. At least the proposed fix for glibc in https://sourceware.org/bugzilla/show_bug.cgi?id=32994 works for me.
Comment 4 Mark Wielaard 2025-08-01 14:24:37 UTC
(In reply to Mark Wielaard from comment #3)
> This looks like a glibc bug. At least the proposed fix for glibc in
> https://sourceware.org/bugzilla/show_bug.cgi?id=32994 works for me.

That patch is now in glibc git and on the 2.41 and 2.42 release branches, so once distros pick it up these drd failures should be gone.
Comment 5 Paul Floyd 2025-08-19 08:02:36 UTC
With Fedora 42 I now get

-- Finished tests in drd/tests (in 241 sec) ----------------------------

== 136 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==