110669 – valgrind attach to gdb and quitting gdb hangs valgrind

Summary: valgrind attach to gdb and quitting gdb hangs valgrind

Status:	RESOLVED INTENTIONAL

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck
Version:	3.0.0
Platform:	Debian testing Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-08-13 00:35 UTC by Wim Delvaux
Modified:	2014-09-25 08:11 UTC
CC List:	1 user

See Also:
Latest Commit:
Version Fixed In:

Description Wim Delvaux 2005-08-13 00:35:16 UTC

When valgrind reports an error and you attach gdb to check, gdb starts up 
just fine and you can do whatever you want. 
 
However, when you quit gdb you get the following: 
 
(gdb) q 
The program is running.  Quit anyway (and detach it)? (y or n) y 
 
/nevyn/local/gdb/gdb-6.3/gdb/linux-nat.c:1007: internal-error: linux_nat_detach: Assertion `num_lwps == 1' failed. 
A problem internal to GDB has been detected, 
further debugging may prove unreliable. 
Quit this debugging session? (y or n) y 
/nevyn/local/gdb/gdb-6.3/gdb/linux-nat.c:1007: internal-error: linux_nat_detach: Assertion `num_lwps == 1' failed. 
A problem internal to GDB has been detected, 
further debugging may prove unreliable. 
Create a core file of GDB? (y or n) n 
==20774== 
==20774== Debugger has detached.  Valgrind regains control.  We continue. 
 
I now have gdb 6.3 and valgrind 3.0, and this problem has been around for 
quite some time. gdb is in error, granted, but should valgrind then fail to 
resume when it claims it is continuing? 
 
The only workaround is killall -9 valgrind; neither Ctrl-C nor Ctrl-Z works. 
 
Please help!

Comment 1 Tom Hughes 2005-08-13 01:14:16 UTC

I suspect this is a kernel bug: the gdb process has died, but the forked copy of valgrind is not responding to a KILL signal. Either that, or it has responded but has the wrong parent, so it doesn't report its exit status properly.

If you strace the valgrind process, I assume it is stuck in waitpid? How many valgrind processes are there: one or two?

You could try inserting this line before the VG_(kill) at line 173 of m_debugger.c:

  ptrace(PTRACE_KILL, pid, NULL, 0);

I'm not sure the kernel will accept that, however, as valgrind won't be attached as the current ptracer, and possibly can't attach if the kernel still thinks gdb is attached.

Comment 2 Wim Delvaux 2005-08-13 02:27:18 UTC

On Saturday 13 August 2005 01:14, Tom Hughes wrote:
[bugs.kde.org quoted mail]

	The kernel I run is 2.6.9-1.

	I suppose you have better connections for figuring out whether it IS a bug
	and/or whether it is fixed in some later version of the kernel?

>
> I assume if you strace the valgrind process is it stuck in waitpid? how
> many valgrind processes are there - one? or two?


	Two processes:
	21220: parent valgrind; strace shows wait4(21312, ...)

	21312: child valgrind; strace shows nothing. In fact, the
	second process is terminated.

	ps -axef shows:

21390 pts/1    Ss     0:01  |   \_ /bin/bash SSH_AGENT_PID=25244 DM_CONTROL=/var/run/xdmctl SHELL=/bin/bash XDM_MANAGED=/var/run/xdmctl/xdmctl-:0,maysd,mayf
21220 pts/1    Sl+    0:18  |   |   \_ valgrind --suppressions=/home/u19809/valgrind.supp --db-attach=yes --num-callers=20 AP_QtApp MSPA -c /home/u19809/pro
21312 pts/1    T+     0:00  |   |       \_ valgrind --suppressions=/home/u19809/valgrind.supp --db-attach=yes --num-callers=20 AP_QtApp MSPA -c /home/u19809
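The ps listing is consistent with a simple mechanism: the parent (21220) is blocked in wait4 on a child (21312) whose state is T (stopped). Assuming the parent waits without asking for stop notifications (no WUNTRACED), a stopped child has no exit status to report, so the wait blocks indefinitely. The minimal POSIX sketch below probes that condition with WNOHANG; wait_would_block is a hypothetical helper, not part of valgrind.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if a plain waitpid() on pid (no WUNTRACED)
 * would currently block, i.e. the child still exists but has no exit
 * status to report. A stopped child falls into this category. */
static int wait_would_block(pid_t pid)
{
    int status;
    /* WNOHANG: return 0 immediately instead of blocking when the child
     * has not changed state in a way these flags report. */
    return waitpid(pid, &status, WNOHANG) == 0;
}
```

For a child that has stopped itself with raise(SIGSTOP), the probe returns 1; only after the child is continued and exits, or is killed, does a blocking waitpid return.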

>
> You could try inserting this line before the VG_(kill) at line 173 of
> m_debugger.c:
>
>   ptrace(PTRACE_KILL, pid, NULL, 0)


	Tried it, but it does not help.

Comment 3 Joel Dice 2006-05-25 18:24:43 UTC

FWIW, I also have this problem using valgrind 3.1.1 on Debian Testing (Linux hostname 2.6.11-9-amd64-k8 #1 Wed Jun 29 17:33:01 CEST 2005 x86_64 GNU/Linux), and on Ubuntu Breezy (don't have a uname -a handy, but it's a Pentium III).  If there's anything I can do to help, please let me know.  Thanks.

Comment 4 Peter Maydell 2006-09-13 13:14:37 UTC

Note that the child valgrind process is sat in the T state, because gdb has quit abruptly without trying to detach from it. If you send it a SIGCONT (e.g. with kill(1)), the valgrind session continues normally. So I don't think this is a kernel bug.

If you wanted valgrind to be able to recover from this kind of debugger bug, I suppose you could have it check whether the child valgrind was sat in the T state and just send it a SIGCONT if so.
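A minimal sketch of that recovery check, assuming Linux and that the parent can read /proc/<pid>/stat for the child. The names proc_state and resume_if_stopped are hypothetical; valgrind never shipped this.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter process state of pid from /proc/<pid>/stat
 * (e.g. 'R', 'S', 'T'), or '?' if it cannot be read. Linux-specific. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line looks like "pid (comm) S ...". The command name may itself
     * contain spaces or ')', so locate the LAST ')' and read the character
     * two positions after it. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Hypothetical recovery step: if the child is sitting in the stopped ('T')
 * state, wake it with SIGCONT. Returns 1 if SIGCONT was sent, else 0. */
static int resume_if_stopped(pid_t pid)
{
    if (proc_state(pid) != 'T')
        return 0;
    return kill(pid, SIGCONT) == 0;
}
```

A parent would run resume_if_stopped on the child before waiting for it to exit; a child that is not stopped is left alone.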

More interesting would be to find out what valgrind is doing that is causing gdb's threading support to assert.

Comment 5 Tom Hughes 2006-09-13 13:28:00 UTC

The implication is that a SIGKILL has failed to terminate the child valgrind because it is stuck in the T state. Is that not a kernel bug? SIGKILL is supposed to kill just about anything with extreme prejudice.

Which process did you send SIGCONT to? The child valgrind?

Shouldn't the kernel have released the T state anyway when the tracing process (gdb) died?

If sending a SIGCONT to the child (in addition to the SIGKILL we already send) will help, then I can easily add that; it is a one-line fix.
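A sketch of what that one-line fix could look like, written with plain POSIX kill() and waitpid() instead of valgrind's internal VG_(kill)/VG_(waitpid) wrappers; kill_and_reap is a hypothetical name, and whether the extra SIGCONT is actually needed depends on the kernel behaviour debated above.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical cleanup helper: the existing SIGKILL plus the proposed extra
 * SIGCONT, in case the child has been left in the stopped (T) state.
 * Returns the child's wait status, or -1 if waitpid fails. */
static int kill_and_reap(pid_t pid)
{
    kill(pid, SIGKILL);
    kill(pid, SIGCONT);   /* the proposed one-line addition */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

Sending SIGCONT to a process that is not stopped, or that is already dying, has no further effect, so the extra call is safe to add unconditionally.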

Comment 6 Florian Krohm 2014-09-25 08:11:43 UTC

The --db-attach feature is deprecated as of valgrind 3.10.0.
It will be removed in the next valgrind feature release.
The built-in GDB server capabilities are superior and should be
used instead. 
Therefore, this bug will not be fixed.