The valgrind code attempts to terminate the program in a way that is consistent with the disposition of a signal that is not handled by the instrumented program. Unfortunately, the results depend upon whether the unhandled signal is received by the main thread, or another thread. Reproducible: Always Steps to Reproduce: Running the following test script (which compiles and runs the test program) illustrates the problem: ------------------------------------------------------- #!/bin/sh test() { rm -f foo.pid "$@" > foo.pid 2>/dev/null & PID=$! while [ "$(wc -l < foo.pid)" -lt 2 ] ; do sleep 1 done head -1 foo.pid kill $(tail -1 foo.pid) wait $PID echo $? } gcc -o foo foo.c -lpthread echo Plain test ./foo test ./foo x echo '' valgrind --version test valgrind ./foo test valgrind ./foo x echo '' valgrind/bin/valgrind --version test valgrind/bin/valgrind ./foo test valgrind/bin/valgrind ./foo x echo '' ------------------------------------------------------- #include <stdio.h> #include <unistd.h> #include <signal.h> #include <pthread.h> void * slavethread(void *arg) { while (1) sleep(1); } int main(int argc, char **argv) { pthread_t slave; if (argc > 1) printf("unblocked\n"); else { if (pthread_create(&slave, 0, &slavethread, 0)) return 255; sigset_t sigmask; if (sigfillset(&sigmask)) return 255; if (sigprocmask(SIG_BLOCK, &sigmask, 0)) return 255; printf("blocked\n"); } printf("%d\n", getpid()); fflush(stdout); while (1) sleep(1); return 0; } Actual Results: $ sh foo.sh Plain blocked 143 unblocked 143 valgrind-3.10.0.SVN blocked 137 unblocked 143 valgrind-3.11.0 blocked 137 unblocked 143 Expected Results: Ideally, all the test cases should show: -------------------- blocked 143 unblocked 143 -------------------- In the absence of that, the exit codes between the blocked and unblocked cases should be the same (and not show 137 in the blocked case, and 143 in the unblocked case).
Created attachment 98326 [details] Candidate patch to propagate termination signal Here is an attempt to propagate the signal that terminates a thread into the entire program.
Do you have a small test case which demonstrates that Valgrind's behaviour at present, differs from when the program is run "natively" ? I'd be happier about this if you have something directly that demonstrates that V's behaviour differs from that of the kernel, in this respect.
(In reply to Julian Seward from comment #2) > Do you have a small test case which demonstrates that Valgrind's behaviour at > present, differs from when the program is run "natively" ? In the bug description, I pasted a small C program and accompanying shell script that demonstrates the difference between a natively run program, and the same running under valgrind. Are you looking for something different ?
Created attachment 100145 [details] Self contained test case
I compiled and ran the submitted test case both with and without valgrind. This shows that under valgrind, the child process terminates with SIGKILL rather than the expected SIGTERM. $ gcc -o test test.c -lpthread && ./test Signalled with 15 $ gcc -o test test.c -lpthread && valgrind ./test ==14625== Memcheck, a memory error detector ==14625== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==14625== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info ==14625== Command: ./test ==14625== ==14626== ==14626== Process terminating with default action of signal 15 (SIGTERM) ==14626== at 0x4124B96: ??? (syscall-template.S:81) ==14626== by 0x412493C: sleep (sleep.c:137) ==14626== by 0x804868E: slavethread (in /tmp/test) ==14626== by 0x4058F71: start_thread (pthread_create.c:312) ==14626== by 0x415AF8D: clone (clone.S:129) ==14626== ==14626== HEAP SUMMARY: ==14626== in use at exit: 136 bytes in 1 blocks ==14626== total heap usage: 1 allocs, 0 frees, 136 bytes allocated ==14626== ==14626== LEAK SUMMARY: ==14626== definitely lost: 0 bytes in 0 blocks ==14626== indirectly lost: 0 bytes in 0 blocks ==14626== possibly lost: 136 bytes in 1 blocks ==14626== still reachable: 0 bytes in 0 blocks ==14626== suppressed: 0 bytes in 0 blocks ==14626== Rerun with --leak-check=full to see details of leaked memory ==14626== ==14626== For counts of detected and suppressed errors, rerun with: -v ==14626== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Signalled with 9 test: test.c:52: main: Assertion `15 == (((__extension__ (((union { __typeof(status) __in; int __i; }) { .__in = (status) }).__i))) & 0x7f)' failed. ==14625== ==14625== Process terminating with default action of signal 6 (SIGABRT) ==14625== at 0x409D687: raise (raise.c:56) ==14625== by 0x40A0AB2: abort (abort.c:89) ==14625== by 0x40967C6: __assert_fail_base (assert.c:92) ==14625== by 0x4096876: __assert_fail (assert.c:101) ==14625== by 0x8048816: main (in /tmp/test) ==14625== ==14625== HEAP SUMMARY: ==14625== in use at exit: 0 bytes in 0 blocks ==14625== total heap usage: 3 allocs, 3 frees, 550 bytes allocated ==14625== ==14625== All heap blocks were freed -- no leaks are possible ==14625== ==14625== For counts of detected and suppressed errors, rerun with: -v ==14625== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Aborted
Philippe, didn't you fix something like this recently?
(In reply to Julian Seward from comment #6) > Philippe, didn't you fix something like this recently? Not recently. Related to thread termination, I did something some years ago, to fix bug 324227. I can take a look at this bug and see if I find something.
(In reply to earl_chew from comment #4) > Created attachment 100145 [details] > Self contained test case It seems there was an attachment problem. Could you re-attach the self contained test case ? Thanks
Comment on attachment 100145 [details] Self contained test case .
As Philippe pointed out, there is a problem with the existing attachment. Please have a look what is there inside and re-attach.
(In reply to Ivo Raisr from comment #10) > As Philippe pointed out, there is a problem with the existing attachment. > Please have a look what is there inside and re-attach. Yes, that's obviously wrong. Let me resurrect the intended content.
The content of attachment 100145 [details] has been deleted for the following reason: Removing non-relevant content
Created attachment 101149 [details] Standalone test case I could not find the previously mentioned standalone test case, so I've uploaded a reconstructed version that illustrates the same behaviour. Note also that I have included a candidate patch with this ticket: https://bugs.kde.org/attachment.cgi?id=98326 When run outside valgrind, the test program prints: --------------------------------------------------- $ gcc -o test test.c -lpthread && ./test ; echo $? Signal 15 0 --------------------------------------------------- When run under valgrind, the result is different: --------------------------------------------------- $ gcc -o test test.c -lpthread && valgrind ./test ; echo $? ==3351== Memcheck, a memory error detector ==3351== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==3351== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info ==3351== Command: ./test ==3351== ==3352== ==3352== Process terminating with default action of signal 15 (SIGTERM) ==3352== at 0x405DF3F: pthread_sigmask (pthread_sigmask.c:48) ==3352== by 0x804875E: slavethread (in /home/earl/test) ==3352== by 0x4058F71: start_thread (pthread_create.c:312) ==3352== by 0x415AF8D: clone (clone.S:129) ==3352== ==3352== HEAP SUMMARY: ==3352== in use at exit: 136 bytes in 1 blocks ==3352== total heap usage: 1 allocs, 0 frees, 136 bytes allocated ==3352== ==3352== LEAK SUMMARY: ==3352== definitely lost: 0 bytes in 0 blocks ==3352== indirectly lost: 0 bytes in 0 blocks ==3352== possibly lost: 136 bytes in 1 blocks ==3352== still reachable: 0 bytes in 0 blocks ==3352== suppressed: 0 bytes in 0 blocks ==3352== Rerun with --leak-check=full to see details of leaked memory ==3352== ==3352== For counts of detected and suppressed errors, rerun with: -v ==3352== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Signal 9 test: test.c:88: main: Assertion `(((__extension__ (((union { __typeof(status) __in; int __i; }) { .__in = (status) }).__i))) & 0x7f) == 15' failed. ==3351== ==3351== Process terminating with default action of signal 6 (SIGABRT) ==3351== at 0x409D687: raise (raise.c:56) ==3351== by 0x40A0AB2: abort (abort.c:89) ==3351== by 0x40967C6: __assert_fail_base (assert.c:92) ==3351== by 0x4096876: __assert_fail (assert.c:101) ==3351== by 0x80489E5: main (in /home/earl/test) ==3351== ==3351== HEAP SUMMARY: ==3351== in use at exit: 0 bytes in 0 blocks ==3351== total heap usage: 3 allocs, 3 frees, 550 bytes allocated ==3351== ==3351== All heap blocks were freed -- no leaks are possible ==3351== ==3351== For counts of detected and suppressed errors, rerun with: -v ==3351== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Aborted 134 ---------------------------------------------------
Fixed in revision 15982. Note that the fix committed consists in calling VG_(reap_threads)(tid); after nuke all threads. Thanks for the bug report and the test case.