77824 – --db-attach does not work

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck
Version:	2.1.1
Platform:	Unlisted Binaries Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Tom Hughes

Duplicates (2):	77851 80139

Reported:	2004-03-17 12:02 UTC by Gottfried.Ganssauge
Modified:	2004-04-26 10:51 UTC
CC List:	2 users

Attachments
Hack to make db-attach work again (701 bytes, patch) 2004-04-06 20:11 UTC, Peter Seiderer
Fix typo in previous patch (703 bytes, patch) 2004-04-13 17:41 UTC, Peter Seiderer

Description Gottfried.Ganssauge 2004-03-17 12:02:31 UTC

--db-attach appears not to work at all.
Consider the following console session:
$ uname -a
Linux gglinux 2.4.22 #1 SMP Mon Nov 3 11:40:28 CET 2003 i686 unknown unknown GNU/Linux
$ cat x.c
void main () {
* (char *) 0 = 0;
}
$ cc -g -o x x.c
x.c: In function `main':
x.c:1: warning: return type of `main' is not `int'
$ valgrind --db-attach=yes ./x
==23026== Memcheck, a memory error detector for x86-linux.
==23026== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward.
==23026== Using valgrind-2.1.1, a program supervision framework for x86-linux.
==23026== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==23026== For more details, rerun with: -v
==23026== 
==23026== Invalid write of size 1
==23026==    at 0x8048324: main (x.c:2)
==23026==  Address 0x0 is not stack'd, malloc'd or free'd
==23026== 
==23026== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
starting debugger
==23026== starting debugger with cmd: /usr/bin/gdb -nw /proc/23029/fd/822 23029

valgrind: vg_signals.c:1587 (vg_sync_signalhandler): Assertion `info->si_code <= 0' failed.
==23029==    at 0xB802F9F0: vgPlain_skin_assert_fail (vg_mylibc.c:1211)
==23029==    by 0xB802F9EF: assert_fail (vg_mylibc.c:1207)
==23029==    by 0xB802FA5D: vgPlain_core_assert_fail (vg_mylibc.c:1218)
==23029==    by 0xB80360E1: vg_sync_signalhandler (vg_signals.c:1630)

sched status:

Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==23029==    at 0x8048314: main (x.c:1)


Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.

If that doesn't help, please report this bug to: valgrind.kde.org

In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using.  Thanks.

GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux".../proc/23029/fd/822: Permission denied.

Attaching to process 23029
ptrace: Operation not permitted.
/tmp/23029: No such file or directory.
(gdb) ==23026== 
==23026== Debugger has detached.  Valgrind regains control.  We continue.
==23026== 
==23026== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==23026==  Access not within mapped region at address 0x0
==23026==    at 0x8048324: main (x.c:2)
==23026== 
==23026== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- 
==23026== 
==23026== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 11 from 1)
==23026== malloc/free: in use at exit: 0 bytes in 0 blocks.
==23026== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==23026== For a detailed leak analysis,  rerun with: --leak-check=yes
==23026== For counts of detected errors, rerun with: -v
$

Comment 1 Aleksander Salwa 2004-03-24 13:07:56 UTC

The same behaviour occurs with the latest version from CVS (2004-03-24) on RedHat 8.0.
uname -a:
Linux asalwa.osmosys.tv 2.4.25 #2 SMP Tue Mar 2 14:13:26 CET 2004 i686 i686 i386 GNU/Linux

Comment 2 Tom Hughes 2004-03-24 14:07:20 UTC

I can reproduce this on RedHat 8.0, but not on RedHat 9 or Fedora Core 1. Obviously the ptrace(PTRACE_DETACH, pid, NULL, SIGSTOP) call manages to work with those kernels, although it isn't obvious why. Then again, I can't quite work out from the kernel source how the signal from a PTRACE_CONT or PTRACE_DETACH is ever delivered...

Comment 3 Aleksander Salwa 2004-03-30 11:23:33 UTC

I can reproduce it on RedHat 9.0 too (kernel 2.4.24, SMP machine, glibc-2.3.2-27.9).

BTW, bug 77851 should be resolved as a duplicate of this one.

Comment 4 Tom Hughes 2004-03-30 11:37:14 UTC

Well, I'm using kernel 2.4.20-24.9smp on an SMP machine and it seems to work for me, which is rather odd...

Comment 5 Tom Hughes 2004-03-30 11:37:58 UTC

*** Bug 77851 has been marked as a duplicate of this bug. ***

Comment 6 Gottfried.Ganssauge 2004-03-30 11:46:39 UTC

In the meantime I had to switch my machine.
It runs Debian testing with kernel
Linux helios 2.4.25-1-686-smp #1 SMP Tue Feb 24 12:07:16 EST 2004 i686 GNU/Linux
and libc-2.3.2.ds1
The bug still happens.

Comment 7 Peter Seiderer 2004-04-06 20:11:15 UTC

Created attachment 5557
Hack to make db-attach work again

The problem with --db-attach=yes seems to be that in vg_main.c:start_debugger
the child resumes executing after the 'ptrace(PTRACE_DETACH, ...)' call of
the parent (maybe kernel related).

A crude fix for this is sending an additional SIGSTOP before
calling PTRACE_DETACH (see attached patch).

Comment 8 Tom Hughes 2004-04-06 20:25:12 UTC

In message <20040406181117.27139.qmail@ktown.kde.org> you wrote:

> The problem with --db-attach=yes seems to be that in vg_main.c:start_debugger
> the child resumes executing after the 'ptrace(PTRACE_DETACH, ...)' call of
> the parent (maybe kernel related).
> 
> A crude fix for this is sending an additional SIGSTOP before
> calling PTRACE_DETACH (see attached patch).

Is there any guarantee that doing that will work though? I'm not sure
what happens if a process catches a signal while it is stopped in the
debugger...

I guess the signal will most likely be queued and processed when the
debugger continues the process, which is good. The only question then
is whether the process can execute at all before the signal is delivered
to it.

Tom

Comment 9 Peter Seiderer 2004-04-13 17:41:46 UTC

Created attachment 5621
Fix typo in previous patch

The line
   kill(pid, SIGSTOP) == 0 &
should read
   kill(pid, SIGSTOP) == 0 &&

Peter

Comment 10 Tom Hughes 2004-04-21 17:33:50 UTC

Well, applying that actually breaks things on my RH9 system: the child process appears to get two STOP signals, one from the kill and one from the PTRACE_CONT call, and valgrind then hangs after you exit the debugger, waiting for the child to exit. Sending SIGCONT to the child lets things continue.
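
The last point can be tried outside valgrind. The sketch below is only an illustration, not valgrind code: resume_stopped_child is a hypothetical helper, and the child stopping itself with raise(SIGSTOP) merely stands in for the extra STOP described above. The parent waits until the child is reported stopped, sends SIGCONT, and only then sees it exit.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of valgrind).
 * Returns 1 if a stopped child ran to completion after SIGCONT,
 * 0 if it did not exit cleanly, -1 on error. */
int resume_stopped_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(SIGSTOP);   /* assumption: stands in for the stray STOP */
        _exit(0);         /* reached only once the child is continued */
    }

    /* WUNTRACED makes waitpid() report the stop instead of blocking on it. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;

    /* Without this SIGCONT the waitpid() below would block indefinitely. */
    if (kill(pid, SIGCONT) != 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A parent that simply waits for exit, as valgrind does here after the debugger quits, has no way to make progress while the child sits in that stopped state.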

Comment 11 Tom Hughes 2004-04-21 17:40:01 UTC

CVS commit by thughes: 

Change the debugger attachment code to send the STOP signal to the
forked process before using ptrace() to continue it, instead of asking
ptrace to deliver it, as that doesn't seem to work on some versions
of Linux.

CCMAIL: 77824-done@bugs.kde.org


  M +2 -1      vg_main.c   1.150


--- valgrind/coregrind/vg_main.c  #1.149:1.150
@@ -351,5 +351,6 @@ void VG_(start_debugger) ( Int tid )
           WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP &&
           ptrace(PTRACE_SETREGS, pid, NULL, &regs) == 0 &&
-          ptrace(PTRACE_DETACH, pid, NULL, SIGSTOP) == 0) {
+          kill(pid, SIGSTOP) == 0 &&
+          ptrace(PTRACE_DETACH, pid, NULL, 0) == 0) {
          Char pidbuf[15];
          Char file[30];
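
For anyone wanting to try the new ordering outside valgrind, here is a minimal sketch. It assumes a child that requests tracing with PTRACE_TRACEME and then stops itself, since the diff only shows the parent side. stop_then_detach is a hypothetical name, not a valgrind function. The parent queues SIGSTOP with kill() while the child is still in its ptrace stop, detaches without asking ptrace to deliver a signal, and then checks whether the child is reported as stopped.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of valgrind).
 * Returns 1 if the child is left stopped after the detach,
 * 0 if it was not, -1 on error. */
int stop_then_detach(void)
{
    int status;
    int result = -1;
    int alive = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Assumption: the child asks to be traced, then stops itself. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(2);
        raise(SIGSTOP);
        _exit(0);   /* reached only if the child is allowed to run on */
    }

    /* Wait for the traced child to report its stop. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;  /* child exited or was never traced */

    /* Same order as the commit: send SIGSTOP first, then detach
     * with a data argument of zero so ptrace injects no signal. */
    if (WSTOPSIG(status) == SIGSTOP &&
        kill(pid, SIGSTOP) == 0 &&
        ptrace(PTRACE_DETACH, pid, NULL, NULL) == 0 &&
        waitpid(pid, &status, WUNTRACED) == pid) {
        alive = WIFSTOPPED(status);
        result = alive && WSTOPSIG(status) == SIGSTOP;
    }

    /* Clean up: a child that still exists is killed and reaped. */
    if (alive) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
    }
    return result;
}
```

A return value of 1 means the child was left stopped after the detach, which is the state a debugger attaching afterwards would expect; 0 means it was not, for example because it ran on and exited.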

Comment 12 Tom Hughes 2004-04-22 20:02:57 UTC

*** Bug 80139 has been marked as a duplicate of this bug. ***

Comment 13 Gottfried.Ganssauge 2004-04-23 13:15:32 UTC

It works now on my system.
Good work!

Comment 14 Stephan Wefing 2004-04-26 10:51:31 UTC

Works for me, too. Thanks!

Stephan