When valgrind reports an error and you attach gdb to check, gdb starts up just fine and you can do whatever you want. However, when you quit gdb you get the following:
    (gdb) q
    The program is running. Quit anyway (and detach it)? (y or n) y
    /nevyn/local/gdb/gdb-6.3/gdb/linux-nat.c:1007: internal-error: linux_nat_detach: Assertion `num_lwps == 1' failed.
    A problem internal to GDB has been detected, further debugging may prove unreliable.
    Quit this debugging session? (y or n) y
    /nevyn/local/gdb/gdb-6.3/gdb/linux-nat.c:1007: internal-error: linux_nat_detach: Assertion `num_lwps == 1' failed.
    A problem internal to GDB has been detected, further debugging may prove unreliable.
    Create a core file of GDB? (y or n) n
    ==20774==
    ==20774== Debugger has detached. Valgrind regains control. We continue.
I now have gdb 6.3 and valgrind 3.0, and this problem has been around for quite some time. gdb is in error, correct, but should valgrind then fail to resume as it claims to? The only workaround is to killall -9 valgrind; neither Ctrl-C nor Ctrl-Z works. Please help!
I suspect this is a kernel bug: the gdb process has died but the forked copy of valgrind is not responding to a KILL signal. Either that, or it has responded but has the wrong parent and so doesn't report its exit status properly. I assume that if you strace the valgrind process it is stuck in waitpid? How many valgrind processes are there, one or two? You could try inserting this line before the VG_(kill) at line 173 of m_debugger.c:
    ptrace(PTRACE_KILL, pid, NULL, 0)
I'm not sure the kernel will accept that, however, as valgrind won't be attached as the current ptracer, and it possibly can't attach if the kernel still thinks gdb is attached.
On Saturday 13 August 2005 01:14, Tom Hughes wrote:
[bugs.kde.org quoted mail]
The kernel I run is 2.6.9-1. I suppose you have more connections to figure out whether it IS a bug and/or whether it is fixed in some higher version of the kernel?
> I assume if you strace the valgrind process is it stuck in waitpid? how
> many valgrind processes are there - one? or two?
Two processes:
    21220 : parent valgrind, strace shows wait4( 21312, ...)
    21312 : child valgrind, strace shows nothing
In fact the second process is terminated. ps -axef shows:
    21390 pts/1 Ss 0:01 | \_ /bin/bash SSH_AGENT_PID=25244 DM_CONTROL=/var/run/xdmctl SHELL=/bin/bash XDM_MANAGED=/var/run/xdmctl/xdmctl-:0,maysd,mayf
    21220 pts/1 Sl+ 0:18 | | \_ valgrind --suppressions=/home/u19809/valgrind.supp --db-attach=yes --num-callers=20 AP_QtApp MSPA -c /home/u19809/pro
    21312 pts/1 T+ 0:00 | | \_ valgrind --suppressions=/home/u19809/valgrind.supp --db-attach=yes --num-callers=20 AP_QtApp MSPA -c /home/u19809
> You could try inserting this line before the VG_(kill) at line 173 of
> m_debugger.c:
>
> ptrace(PTRACE_KILL, pid, NULL, 0)
Tried it, but it does not help.
FWIW, I also have this problem using valgrind 3.1.1 on Debian Testing (Linux hostname 2.6.11-9-amd64-k8 #1 Wed Jun 29 17:33:01 CEST 2005 x86_64 GNU/Linux), and on Ubuntu Breezy (don't have a uname -a handy, but it's a Pentium III). If there's anything I can do to help, please let me know. Thanks.
Note that the child valgrind process is sitting in the T state, because gdb has quit abruptly without trying to detach from it. If you send it a SIGCONT (e.g. with kill(1)) then the valgrind session continues normally. So I don't think this is a kernel bug. If you wanted valgrind to be able to recover from this kind of debugger bug, I suppose you could have it check whether the child valgrind is sitting in the T state and just send it a SIGCONT if so. More interesting would be to find out what valgrind is doing that causes gdb's threading support to assert.
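For anyone stuck with a hung session, the "check for the T state, then send SIGCONT" idea can be sketched as a small external helper. This is only an illustration, assuming a Linux /proc filesystem; `process_state` and `resume_if_stopped` are hypothetical names and not part of valgrind. A real fix would live inside valgrind's own C code.

```python
import os
import signal


def process_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    return stat.rsplit(")", 1)[1].split()[0]


def resume_if_stopped(pid: int) -> bool:
    """Send SIGCONT to pid if it is in the stopped (T) state.

    Returns True if a signal was sent, False otherwise.
    """
    if process_state(pid) == "T":
        os.kill(pid, signal.SIGCONT)
        return True
    return False
```

With the PIDs from the ps listing earlier in this report, calling `resume_if_stopped(21312)` on the child valgrind would be the scripted equivalent of running `kill -CONT 21312` by hand.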
The implication is that a SIGKILL has failed to terminate the child valgrind because it is stuck in the T state. Is that not a kernel bug? SIGKILL is supposed to kill just about anything with extreme prejudice. Which process did you send SIGCONT to? The child valgrind? Shouldn't the kernel have released the T state anyway when the tracing process (gdb) died? If sending a SIGCONT to the child (in addition to the SIGKILL we already send) will help then I can easily add that; it is a one-line fix.
The --db-attach feature is deprecated as of valgrind 3.10.0. It will be removed in the next valgrind feature release. The built-in GDB server capabilities are superior and should be used instead. Therefore, this bug will not be fixed.