When running the program below under valgrind 2.4.0 rc2, valgrind aborts with an assertion failure:

cc atfork.c -o atfork -lpthread
valgrind --tool=none ./atfork
[...]
valgrind: vg_scheduler.c:471 (run_thread_for_a_while): Assertion `!_qq_tst->sched_jmpbuf_valid' failed.
==7903==    at 0xB0035AA7: vgPlain_skin_assert_fail (vg_mylibc.c:1170)
==7903==    by 0xB0035AA6: assert_fail (vg_mylibc.c:1166)
==7903==    by 0xB0035B01: vgPlain_core_assert_fail (vg_mylibc.c:1177)
==7903==    by 0xB0019B32: run_thread_for_a_while (vg_scheduler.c:483)
==7903==    by 0xB001A16A: vgPlain_scheduler (vg_scheduler.c:712)
==7903==    by 0xB007E3DD: vgArch_thread_wrapper (core_os.c:69)
==7903==    by 0xB007BFAB: start_thread (syscalls.c:240)
==7903==    by 0xB007BB2F: (within /Projects/software/IA32-LIN/valgrind-2.4.0-rc2/libc6-2.3.2/lib/valgrind/stage2)

sched status:
  running_tid=2

Thread 1: status = VgTs_Yielding
==7903==    at 0x3AA7EE7C: clone (in /lib/tls/libc-2.3.2.so)
==7903==    by 0x3A997497: create_thread (in /lib/tls/libpthread-0.60.so)
==7903==    by 0x3A996F80: pthread_create@@GLIBC_2.1 (in /lib/tls/libpthread-0.60.so)
==7903==    by 0x80486D2: main (in /Projects/psp/pohly/src/ict/tracing/vampirtrace/test/atfork)

Removing the pthread_create/join from "case 0" lets the program run normally under valgrind.

System: x86 + RH EL3.0 (glibc 2.3.2)
uname -a: Linux knscsl004.ikn.intel.com 2.4.21-15.ELsmp #1 SMP Thu Apr 22 00:18:24 EDT 2004 i686 i686 i386 GNU/Linux
valgrind --version: valgrind-2.4.0.rc2

Another remark: the same program also shows a problem with valgrind 2.2.0's pthread library. It only calls afterfork() in the first child process, not in the second one. I don't think this needs any further attention, with 2.4.0 just around the corner...
--------------- atfork.c ------------------------
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <signal.h>

/* Child-side pthread_atfork() handler. */
static void afterfork( void )
{
    fprintf( stderr, "child with pid %d: after fork\n", getpid() );
}

/* Thread body; the argument carries the sleep time in seconds. */
static void *threadmain( void *dummy )
{
    fprintf( stderr, "thread alive: pid %d\n", getpid() );
    sleep( (int)(long)dummy );
    return NULL;
}

int main( int argc, char **argv )
{
    pid_t childpid;
    pthread_t childthread;
    int i;
    void *res;

    fprintf( stderr, "master: pid %d\n", getpid() );
    /* Extra thread in the parent, so fork() is called from a multi-threaded process. */
    pthread_create( &childthread, NULL, threadmain, (void *)60 );
    pthread_atfork( NULL, NULL, afterfork );

    for( i = 0; i < 2; i++ ) {
        childpid = fork();
        switch( childpid ) {
        case 0:
            fprintf( stderr, "child %d: I'm alive\n", i );
            /* Removing this pthread_create/join avoids the assertion. */
            pthread_create( &childthread, NULL, threadmain, 0 );
            pthread_join( childthread, &res );
            exit(0);
            break;
        case -1:
            fprintf( stderr, "fork %d failed\n", i );
            break;
        default:
            fprintf( stderr, "child %d: pid %d\n", i, childpid );
            break;
        }
    }

    pthread_kill( childthread, SIGHUP );
    pthread_join( childthread, &res );
    return 0;
}
--------------- atfork.c ------------------------
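On the 2.2.0 remark above: pthread_atfork() registers its child handler once, and that handler is expected to run in every child produced by a later fork(), not only the first. A minimal check of just that behavior, independent of atfork.c, might look like the sketch below. The function name count_child_handler_runs() is hypothetical; it forks the children one at a time and uses each child's exit status to report whether the handler ran. Link with -lpthread as for atfork.c.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set only in a child, by the atfork child handler; stays 0 in the parent. */
static int child_handler_ran = 0;

static void child_handler(void)
{
    child_handler_ran = 1;
}

/* Hypothetical helper: forks nchildren children sequentially. Each child
   exits with status 1 if the atfork child handler ran in it, else 0.
   Returns how many children reported that the handler ran, or -1 on error. */
int count_child_handler_runs(int nchildren)
{
    static int registered = 0;
    int ran = 0;
    int i;

    /* Register once; repeated pthread_atfork() calls would stack handlers. */
    if (!registered) {
        if (pthread_atfork(NULL, NULL, child_handler) != 0)
            return -1;
        registered = 1;
    }

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: report via exit status; _exit() skips stdio flushing. */
            _exit(child_handler_ran ? 1 : 0);
        } else if (pid > 0) {
            int status;
            if (waitpid(pid, &status, 0) == pid
                && WIFEXITED(status) && WEXITSTATUS(status) == 1)
                ran++;
        } else {
            return -1;   /* fork() failed */
        }
    }
    return ran;
}
```

Called as count_child_handler_runs(2), this should return 2 on a conforming pthread implementation; a result of 1 would correspond to the 2.2.0 behavior described in the report.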
Interesting. Looking at it.
OK, I think this checkin should fix it:

When a multi-threaded program calls fork(), only the thread that actually called fork() exists in the child. The child Valgrind, however, inherits a VG_(threads) array which still describes the other threads. The code in vg_scheduler:sched_fork_cleanup is responsible for cleaning up those entries, but it was only "killing" the other threads by setting their statuses to VgTs_Empty. This caused confusion if the child later created new threads and found partially initialized thread structures.

This change makes sched_fork_cleanup fully reinitialize the other thread slots in VG_(threads).
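The difference between the old and the new cleanup can be illustrated with a simplified thread table. This is a sketch under assumptions, not Valgrind's code: ThreadSlot, TS_EMPTY, init_slot(), fork_cleanup_status_only(), fork_cleanup_full() and alloc_slot() are hypothetical stand-ins for VG_(threads), VgTs_Empty and sched_fork_cleanup, and the jmpbuf_valid field only mimics the sched_jmpbuf_valid flag named in the assertion.

```c
#include <string.h>

#define MAX_THREADS 8   /* hypothetical table size */

/* Hypothetical stand-ins for the scheduler's thread statuses. */
typedef enum { TS_EMPTY, TS_RUNNABLE, TS_YIELDING } ThreadStatus;

/* Hypothetical, much-reduced thread slot. */
typedef struct {
    ThreadStatus status;
    int          jmpbuf_valid;  /* mimics sched_jmpbuf_valid */
    long         os_tid;        /* some other per-thread state */
} ThreadSlot;

/* Put a slot into the state of a never-used slot. */
void init_slot(ThreadSlot *t)
{
    memset(t, 0, sizeof *t);
    t->status = TS_EMPTY;
}

/* Old approach: only mark the other threads' slots as empty,
   leaving all their other fields as the parent left them. */
void fork_cleanup_status_only(ThreadSlot *tab, int n, int me)
{
    for (int i = 0; i < n; i++)
        if (i != me)
            tab[i].status = TS_EMPTY;
}

/* New approach: fully reinitialize every slot except the one
   belonging to the thread that called fork(). */
void fork_cleanup_full(ThreadSlot *tab, int n, int me)
{
    for (int i = 0; i < n; i++)
        if (i != me)
            init_slot(&tab[i]);
}

/* Hand out the first empty slot, as a later pthread_create() in the
   child would; other fields are used as found. Returns -1 if full. */
int alloc_slot(ThreadSlot *tab, int n)
{
    for (int i = 0; i < n; i++) {
        if (tab[i].status == TS_EMPTY) {
            tab[i].status = TS_RUNNABLE;
            return i;
        }
    }
    return -1;
}
```

With the status-only cleanup, a slot handed out by alloc_slot() after the fork can still carry the parent's jmpbuf_valid and os_tid values; with the full reinitialization it starts from the same zeroed state as a never-used slot.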