Bug 110669

Summary: valgrind attach to gdb and quitting gdb hangs valgrind
Product: [Developer tools] valgrind Reporter: Wim Delvaux <wim.delvaux>
Component: memcheck Assignee: Julian Seward <jseward>
Status: RESOLVED INTENTIONAL    
Severity: normal CC: flo2030
Priority: NOR    
Version: 3.0.0   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   

Description Wim Delvaux 2005-08-13 00:35:16 UTC
When valgrind reports an error and you attach gdb to investigate, gdb starts up
just fine and you can do whatever you want.

However, when you quit gdb you get the following:
 
(gdb) q 
The program is running.  Quit anyway (and detach it)? (y or n) y 
 
/nevyn/local/gdb/gdb-6.3/gdb/linux-nat.c:1007: internal-error: 
linux_nat_detach: Assertion `num_lwps == 1' failed. 
A problem internal to GDB has been detected, 
further debugging may prove unreliable. 
Quit this debugging session? (y or n) y 
/nevyn/local/gdb/gdb-6.3/gdb/linux-nat.c:1007: internal-error: 
linux_nat_detach: Assertion `num_lwps == 1' failed. 
A problem internal to GDB has been detected, 
further debugging may prove unreliable. 
Create a core file of GDB? (y or n) n 
==20774== 
==20774== Debugger has detached.  Valgrind regains control.  We continue. 
 
I am now running gdb 6.3 and valgrind 3.0, and this problem has been around for
quite some time. gdb is in error, granted, but should valgrind then fail to
resume as it claims to?

The only workaround is killall -9 valgrind. Neither Ctrl-C nor Ctrl-Z works.

Please help!
Comment 1 Tom Hughes 2005-08-13 01:14:16 UTC
I suspect this is a kernel bug - the gdb process has died but the forked copy of valgrind is not responding to a KILL signal. Either that, or it has responded but it has the wrong parent and so doesn't report its exit status properly.

I assume that if you strace the valgrind process it is stuck in waitpid? How many valgrind processes are there - one or two?

You could try inserting this line before the VG_(kill) at line 173 of m_debugger.c:

  ptrace(PTRACE_KILL, pid, NULL, 0);

I'm not sure the kernel will accept that, however, as valgrind won't be attached as the current ptracer, and possibly can't attach if the kernel still thinks gdb is attached.
Comment 2 Wim Delvaux 2005-08-13 02:27:18 UTC
On Saturday 13 August 2005 01:14, Tom Hughes wrote:

	The kernel I run is 2.6.9-1.

	I suppose you are better placed to figure out whether it IS a bug and/or
	whether it is fixed in some higher version of the kernel?

>
> I assume if you strace the valgrind process is it stuck in waitpid? how
> many valgrind processes are there - one? or two?


	Two processes:
	21220: parent valgrind, strace shows wait4(21312, ...)

	21312: child valgrind, strace shows nothing. In fact the
	       second process is terminated.

	ps -axef shows

21390 pts/1    Ss     0:01  |   \_ /bin/bash SSH_AGENT_PID=25244 
DM_CONTROL=/var/run/xdmctl SHELL=/bin/bash 
XDM_MANAGED=/var/run/xdmctl/xdmctl-:0,maysd,mayf
21220 pts/1    Sl+    0:18  |   |   \_ valgrind 
--suppressions=/home/u19809/valgrind.supp --db-attach=yes --num-callers=20 
AP_QtApp MSPA -c /home/u19809/pro
21312 pts/1    T+     0:00  |   |       \_ valgrind 
--suppressions=/home/u19809/valgrind.supp --db-attach=yes --num-callers=20 
AP_QtApp MSPA -c /home/u19809

>
> You could try inserting this line before the VG_(kill) at line 173 of
> m_debugger.c:
>
>   ptrace(PTRACE_KILL, pid, NULL, 0)


	Tried it, but it does not help.
Comment 3 Joel Dice 2006-05-25 18:24:43 UTC
FWIW, I also have this problem using valgrind 3.1.1 on Debian Testing (Linux hostname 2.6.11-9-amd64-k8 #1 Wed Jun 29 17:33:01 CEST 2005 x86_64 GNU/Linux), and on Ubuntu Breezy (don't have a uname -a handy, but it's a Pentium III).  If there's anything I can do to help, please let me know.  Thanks.
Comment 4 Peter Maydell 2006-09-13 13:14:37 UTC
Note that the child valgrind process is sat in the T state, because gdb has quit abruptly without trying to detach from it. If you send it a SIGCONT (eg with kill(1)) then the valgrind session continues normally. So I don't think this is a kernel bug.

If you wanted valgrind to be able to recover from this kind of debugger bug I suppose you could have it check to see whether the child valgrind was sat in the T state and just send it a SIGCONT if so.
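
As a rough illustration of that check (not Valgrind's actual code), the sketch below reads the state letter from /proc/<pid>/stat on Linux and sends SIGCONT only when the process is stopped. The names proc_state, resume_if_stopped and demo_resume_stopped_child are hypothetical, it uses plain libc calls rather than Valgrind's internal wrappers, and the demo child is stopped with SIGSTOP rather than abandoned by a debugger, so it only approximates the situation in this report.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state from /proc/<pid>/stat,
   or '?' on failure. Assumes a Linux /proc filesystem. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    FILE *f;
    size_t n;
    char *close_paren;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is wrapped in parentheses and may itself contain
       spaces or ')', so find the last ')' and read the field after it. */
    close_paren = strrchr(buf, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '?';
    return close_paren[2];
}

/* Hypothetical helper: send SIGCONT only if the process is stopped.
   Returns 1 if SIGCONT was sent, 0 if not stopped, -1 on error. */
static int resume_if_stopped(pid_t pid)
{
    char state = proc_state(pid);

    if (state == '?')
        return -1;
    if (state != 'T' && state != 't')   /* 't' is tracing stop on newer kernels */
        return 0;
    return kill(pid, SIGCONT) == 0 ? 1 : -1;
}

/* Demo: fork a child that stops itself, then resume it from the parent.
   Returns the child's exit code, or -1 if anything went wrong. */
static int demo_resume_stopped_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);   /* child parks itself in the T state */
        _exit(0);         /* reached only after SIGCONT arrives */
    }
    /* Wait until the child has actually stopped. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    if (resume_if_stopped(pid) != 1)
        return -1;
    /* Now wait for the resumed child to exit. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling demo_resume_stopped_child() from a main() exercises the whole path. Whether a child abandoned by a crashed gdb responds the same way on the kernels discussed here is not something this sketch establishes.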

More interesting would be to find out what valgrind is doing that is causing gdb's threading support to assert.
Comment 5 Tom Hughes 2006-09-13 13:28:00 UTC
The implication is that a SIGKILL has failed to terminate the child valgrind because it is stuck in the T state - is that not a kernel bug? SIGKILL is supposed to kill just about anything with extreme prejudice.

Which process did you send SIGCONT to? The child valgrind?

Shouldn't the kernel have released the T state anyway when the tracing process (gdb) died?

If sending a SIGCONT to the child (in addition to the SIGKILL we already send) will help then I can easily add that - it is a one line fix.
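
For illustration only, the SIGKILL-plus-SIGCONT sequence might look roughly like the sketch below. It uses plain libc kill() and waitpid() rather than Valgrind's VG_(kill), and the names kill_and_reap and demo_kill_stopped_child are hypothetical. The demo child is stopped with SIGSTOP, not left behind by a debugger, so it shows the sequence itself rather than reproducing the hang.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: kill a child and reap it, adding a SIGCONT in case
   the child is sitting in a stopped state. Returns 0 on success, -1 on error. */
static int kill_and_reap(pid_t pid, int *status)
{
    if (kill(pid, SIGKILL) != 0)
        return -1;
    /* The extra signal discussed above; its result is deliberately ignored. */
    (void)kill(pid, SIGCONT);

    /* Retry waitpid if it is interrupted by a signal. */
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Demo: fork a child that stops itself, then kill and reap it.
   Returns the signal that terminated the child, or -1 on failure. */
static int demo_kill_stopped_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);   /* child parks itself in the T state */
        _exit(0);
    }
    /* Wait until the child has actually stopped. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    if (kill_and_reap(pid, &status) != 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With an ordinary stopped child the SIGKILL alone is enough, so the SIGCONT here is harmless rather than necessary. Its value would only show in the debugger-abandoned case described in this report.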
Comment 6 Florian Krohm 2014-09-25 08:11:43 UTC
The --db-attach feature is deprecated as of valgrind 3.10.0.
It will be removed in the next valgrind feature release.
The built-in GDB server capabilities are superior and should be
used instead. 
Therefore, this bug will not be fixed.