Bug 214909 - valgrind should implement gdbserver remote protocol
Summary: valgrind should implement gdbserver remote protocol
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: unspecified
Platform: Compiled Sources Linux
: NOR wishlist
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-17 02:29 UTC by Paul Pluzhnikov
Modified: 2011-06-28 08:30 UTC (History)
7 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
patch that implements a gdbserver in valgrind (84.70 KB, patch)
2010-03-24 00:43 UTC, Philippe Waroquiers
Details
patch that implements a gdbserver in valgrind (84.70 KB, application/x-bzip2)
2010-03-24 01:16 UTC, Philippe Waroquiers
Details
patch that implements a gdbserver in valgrind (87.75 KB, application/x-bzip2)
2010-03-28 21:12 UTC, Philippe Waroquiers
Details
patch that implements a gdbserver in valgrind (with VEX parts now) (88.67 KB, application/x-bzip2)
2010-03-30 00:19 UTC, Philippe Waroquiers
Details
patch that implements a gdbserver in valgrind (99.31 KB, application/x-bzip2)
2010-04-17 18:48 UTC, Philippe Waroquiers
Details
patch that implements a gdbserver in valgrind (105.32 KB, application/x-bzip2)
2010-04-27 23:53 UTC, Philippe Waroquiers
Details
patch for a gdbserver in valgrind ("1.0 release ?") (108.22 KB, application/x-bzip2)
2010-05-09 22:09 UTC, Philippe Waroquiers
Details
new version : based on GPLv2+ code, handled Tom Hughes comments (91.07 KB, application/x-bzip2)
2010-08-09 20:52 UTC, Philippe Waroquiers
Details
gdbserver patch, with regression tests and user manual (113.86 KB, application/x-bzip2)
2011-03-20 15:52 UTC, Philippe Waroquiers
Details
patch for s390x compile (1.00 KB, patch)
2011-03-20 17:28 UTC, Florian Krohm
Details
gdbserver+test+documentation. x86+amd64+arm+ppc32+ppc64+s390x (120.55 KB, application/x-bzip2)
2011-04-11 21:21 UTC, Philippe Waroquiers
Details
Upgraded to rev 11700 + compiles on Darwin + small changes (e.g. new ppc registers, doc, ..) (121.27 KB, application/x-bzip2)
2011-04-16 13:42 UTC, Philippe Waroquiers
Details
handled review comments + upgraded to rev 11705 (116.68 KB, application/x-bzip2)
2011-04-21 15:12 UTC, Philippe Waroquiers
Details
upgraded to rev 11713 + some fixes/improvements (118.17 KB, application/x-bzip2)
2011-04-28 02:24 UTC, Philippe Waroquiers
Details
filter_gdb (6.40 KB, application/octet-stream)
2011-04-28 23:47 UTC, Philippe Waroquiers
Details
upgraded to rev 11719, integrated filter_gdb, retested, fix ppc64 bug (118.23 KB, application/x-bzip2)
2011-05-01 10:48 UTC, Philippe Waroquiers
Details
tarfile of gdb_server *.out files for ppc32-linux (Debian 6.0) (40.00 KB, application/octet-stream)
2011-05-07 12:26 UTC, Julian Seward
Details
patch to improve testing on various platforms (24.95 KB, text/plain)
2011-05-09 01:34 UTC, Philippe Waroquiers
Details
tarfile of gdb_server *.out files for ppc32-linux with c66 diff (Debian 6.0) (20.00 KB, application/octet-stream)
2011-05-09 12:57 UTC, Julian Seward
Details
patch to improve testing on various platforms + PTRACE_GETREGS better conditional compilation (31.62 KB, text/plain)
2011-05-10 00:08 UTC, Philippe Waroquiers
Details
gdbsrv cleanup: --vgdb=full for mcsig[no]pass, fix 2 bugs (detected by IBM beam), (3.45 KB, text/plain)
2011-05-11 00:50 UTC, Philippe Waroquiers
Details
fix arm thumb by transforming an address to its thumb form when needed (4.43 KB, text/plain)
2011-05-13 00:42 UTC, Philippe Waroquiers
Details
fix some tests on ppc-debian6,s390x + handled Nick Nethercote, Josef Weidendorfer comments (24.02 KB, text/plain)
2011-05-14 17:16 UTC, Philippe Waroquiers
Details
ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec (6.72 KB, text/plain)
2011-05-15 17:34 UTC, Philippe Waroquiers
Details
implement delta leak reporting (43.24 KB, text/plain)
2011-05-15 17:45 UTC, Philippe Waroquiers
Details
add cleanup: line to none/tests/require-text-symbol-2.vgtest (496 bytes, text/plain)
2011-05-17 21:32 UTC, Philippe Waroquiers
Details
fix gdbsrv arm thumbness (16.89 KB, text/plain)
2011-05-22 08:23 UTC, Philippe Waroquiers
Details
fix gdbsrv arm thumbness : imark delta based solution (20.47 KB, text/plain)
2011-05-27 00:54 UTC, Philippe Waroquiers
Details
fix mcsig(no)pass on arm Ubuntu10, arm thumb internal doc, improve simulate_control_c (12.84 KB, text/plain)
2011-05-29 23:07 UTC, Philippe Waroquiers
Details
fix safe_fd exhaustion in fork chain caused by non closing of shared_mem_fd (3.56 KB, text/plain)
2011-06-11 18:34 UTC, Philippe Waroquiers
Details
implement delta leak reporting (updated to new rev 11821, minor doc updates) (45.35 KB, text/plain)
2011-06-17 23:56 UTC, Philippe Waroquiers
Details
minor documentation fixes (9.41 KB, text/plain)
2011-06-18 01:35 UTC, Philippe Waroquiers
Details
fix error in usability msg (--vgdb= must be --vgdb-prefix=) + make a test more deterministic (3.07 KB, text/plain)
2011-06-21 00:10 UTC, Philippe Waroquiers
Details
Support for PIC executables (e.g. firefox on Ubuntu 11) by adding the "auxv" protocol packet to gdbsrv. (8.76 KB, text/plain)
2011-06-21 00:22 UTC, Philippe Waroquiers
Details
fix stupid bug (False instead of True) + SIGSTOP iso SIGTRAP (avoid race cond) + 2 new tests (12.26 KB, text/plain)
2011-06-23 01:07 UTC, Philippe Waroquiers
Details
valgrind and tool mon. cmds prefixes changes + doc fixes + new vgdb option --cmd-time-out (79.92 KB, text/plain)
2011-06-27 00:30 UTC, Philippe Waroquiers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Pluzhnikov 2009-11-17 02:29:02 UTC
Version:           SVN (using Devel)
OS:                Linux
Installed from:    Compiled sources

This is related to (but supersedes) bug 84348.

It would be very nice if VG implemented GDB remote protocol stub.

This would allow seamless integration with GDB:
- no more clumsy detach/attach (see bugs 106238, 110669, 167212)
- if you get repeated errors, GDB does not need to reread debug info for each error (for large executable this could be minutes for each GDB attach!).
- set breakpoints and examine values while under VG supervision (how did I get to the bad state VG eventually reports?)
- query VG (memcheck) about the sate of memory it has been tracking (Is this memory free()d or still allocated? Has this value been initialized? Etc.)

There has been recent work on this (good info towards the end of this thread):
http://thread.gmane.org/gmane.comp.debugging.valgrind.devel/9626
Comment 1 Philippe Waroquiers 2010-03-24 00:43:03 UTC
Created attachment 42213 [details]
patch that implements a gdbserver in valgrind

patch on valgrind revision 11096, VEX revision 1964.
Comment 2 Philippe Waroquiers 2010-03-24 00:44:40 UTC
Attached a patch based on revision 11096 VEX revision 1964

Example of usage:
   valgrind --vgctl-check=500 --vgctl-errors=0 ./prog

gdb ./prog
target remote | vgctl

After that, info thread, break, next, continue, C-c, ... works.

Known limitations: 
   1 cannot attach to a process blocked in system calls
   2 does not support the gdb "handle" command
   3 does not support calling code e.g.  p hello_world()
Unknown limitations:
   none, otherwise they would be in the previous section :)
   More seriously: 
     only limited tests done with a small test program + firefox on fedora12
     x86 32 bits
   a previous version of this patch was compiled and moderately tested
   on fedora12 amd64, on ubuntu x86+amd64

ppc32 and ppc64 not done yet. AIX and Darwin not done yet.
(I have no access to such systems).

I will try to enhance the patch in the recent future (e.g. to release
limitations 2 and 3).

More info in file  coregrind/m_gdbserver/README_DEVELOPPERS

Any feedback/suggestions/criticism/... welcome
Comment 3 Philippe Waroquiers 2010-03-24 01:16:08 UTC
Created attachment 42215 [details]
patch that implements a gdbserver in valgrind

Previous attachment attached with a wrong content type
Comment 4 Tom Tromey 2010-03-24 23:01:40 UTC
I applied this to svn valgrind today and gave it a try.
I'm using x86 Fedora 11.

The modified valgrind crashed when I ran it on gdb:

valgrind: m_hashtable.c:222 (vgPlain_HT_ResetIter): Assertion 'table' failed.
==28597==    at 0x38028D85: report_and_quit (m_libcassert.c:191)
==28597==    by 0x38028F57: vgPlain_assert_fail (m_libcassert.c:265)
==28597==    by 0x380270D4: vgPlain_HT_ResetIter (m_hashtable.c:222)
==28597==    by 0x3806580E: vgPlain_is_watched (m_gdbserver.c:278)
==28597==    by 0x38022FE2: vgMemCheck_record_address_error (mc_errors.c:835)
==28597==    by 0x38014B9B: mc_LOADVn_slow (mc_main.c:1233)
==28597==    by 0x38014D7E: vgMemCheck_helperc_LOADV32le (mc_main.c:4150)
==28597==    by 0x64131A22: ???


Also, it would be nice if vgctl supported the extended-remote mode,
so in gdb I could:

(gdb) target extended-remote | vgctl
(gdb) run

... and have a new inferior start up under valgrind.

It may even be worthwhile to have a special "target valgrind"
in gdb.  That way we could support I/O redirection and other
things not supported (AFAIK) by extended-remote.
Comment 5 Philippe Waroquiers 2010-03-25 20:53:09 UTC
> valgrind: m_hashtable.c:222 (vgPlain_HT_ResetIter): Assertion 'table' failed.

I have reproduced the bug in the following circumstances:
  run valgrind without activating gdbserver on an executable which dereferences
  a non-addressable address.
A bypass is to give a --vgctl-check= argument.
The proper fix is to add the following 2 lines at the beginning of the
function VG_(is_watched) (in m_gdbserver.c):

   if (!gdbserver_called)
      return False;



> Also, it would be nice if vgctl supported the extended-remote mode,
> so in gdb I could:
> 
> (gdb) target extended-remote | vgctl
> (gdb) run
> 
> ... and have a new inferior start up under valgrind.
It looks not trivial or not possible to support extended-remote mode.
I understand the extended-remote allows to start a gdbserver and 
tell it afterward to attach to a process or to start an executable.
But as the gdbserver is embedded inside valgrind, there is no gdbserver
before you have started valgrind. Even if it would be possible to run
valgrind without specifying an executable to run, gdbserver would then have
to indicate to its "caller" which executable to run.


> 
> It may even be worthwhile to have a special "target valgrind"
> in gdb.  That way we could support I/O redirection and other
> things not supported (AFAIK) by extended-remote.

What do you mean exactly by I/O redirection and other things not supported by
extended-remote ?
Comment 6 Tom Tromey 2010-03-26 16:59:46 UTC
(In reply to comment #5)
> It looks not trivial or not possible to support extended-remote mode.

Oops, yeah, sorry about that.

What I'm really looking for is a way to make it very easy for
gdb users to run their program using valgrind without leaving
gdb.  Something like "target valgrind; run".
Comment 7 Tom Tromey 2010-03-26 21:46:06 UTC
Thanks for the follow-up patch.  I tried it and it seems to work great!
I think this is very cool and useful.

I did run into a few more bugs.

1.  In one window I ran valgrind --vgctl-check=1 ..../gdb
In other window I ran plain "gdb" then "target remote | vgctl".
This fails to do anything useful:

(gdb) target remote | vgctl
Remote debugging using | vgctl
relaying data between gdb and process 14124
0x0093f37b in ?? ()

I have to tell gdb about the file, like "gdb .../gdb" before
setting the target; otherwise gdb can't seem to figure out what
valgrind is running.

In this situation, if I then exit gdb, valgrind just uses a lot
of CPU but otherwise does nothing.  I have to kill it from the 
command line.

2. If I kill a valgrind which is hung in this way, it leaves its
fifo in /tmp.  Then gdb complains at startup:

target remote | vgctl(gdb) target remote | vgctl
Remote debugging using | vgctl
read directory: No such process
vgctl error: reading directory /tmp/ for vgctl fifo

This doesn't appear to affect operation, but it is potentially
confusing.

3. Another way to wedge valgrind is to "detach".  The valgrind
process reports:

Detaching from process 14136

... which is a little weird, since it is itself process 14136.
Again the only recourse is to kill valgrind.
Comment 8 Paul Pluzhnikov 2010-03-26 21:58:06 UTC
I should mention that I too played with Philippe's patch, and I think it will make VG 10x more useful when debugging non-obvious bugs.

I urge core VG developers to help get this patch into shape for submission!
Comment 9 Philippe Waroquiers 2010-03-26 23:26:40 UTC
(In reply to comment #7)

> 1.  In one window I ran valgrind --vgctl-check=1 ..../gdb
> In other window I ran plain "gdb" then "target remote | vgctl".
> This fails to do anything useful:
From what I can see, this is a basic limitation of gdb+gdbserver: when
trying with the non embedded gdbserver, gdb also can't figure out the
executable.

> In this situation, if I then exit gdb, valgrind just uses a lot
> of CPU but otherwise does nothing.  I have to kill it from the 
> command line.
I suspect that everything works properly (i.e. valgrind continues to
execute the program) but *extremely* slowly due to the --vgctl-check=1
argument: this means that every valgrind scheduler return causes a check
to see if gdbserver has something to do. This check is done by a select system call. So, basically, what you have is a select system call every few instructions. Try the same with e.g. --vgctl-check=5000.
Note that I intend to replace this use of select by checking a counter
in a piece of shared memory between vgctl and valgrind. The check will then
be quite cheap.



> vgctl error: reading directory /tmp/ for vgctl fifo
I will fix this. This is effectively harmless, it is just wrong error
reporting.

>
> 3. Another way to wedge valgrind is to "detach".  The valgrind
> process reports:
> 
> Detaching from process 14136
> 
> ... which is a little weird, since it is itself process 14136.
> Again the only recourse is to kill valgrind.
I think this is also caused by the too small value of --vgctl-check.
Can you retry with a bigger value ? (e.g. 5000) ?

Thanks for reporting these bugs.
Comment 10 Philippe Waroquiers 2010-03-28 21:12:47 UTC
Created attachment 42319 [details]
patch that implements a gdbserver in valgrind

new version of the patch
   * fixing the problems reported by Tom
   * replaced polling using "poll" by looking at a shared counter between vgctl
     and valgrind
   * implemented calling functions from gdb
         e.g.  p printf("hello from gdb\n")   works
     !!!! if VG detects errors during this call and so calls back
        gdbserver, it can sometimes go bezerk 
         so it is better to first do        
            mo vg.s v 100000

tested on fedora 12 x86, somewhat tested on fedora 12 amd64
Comment 11 Tom Tromey 2010-03-29 21:39:30 UTC
> From what I can see, this is a basic limitation of gdb+gdbserver: when
> trying with the non embedded gdbserver, gdb also can't figure out the
> executable.

I am not very familiar with gdbserver or the remote protocol, but it
seems to me that if we can transfer the shared library list, then we
can somehow also transfer the executable's name -- at the very least via
a new remote protocol extension.

The new patch does fix the bugs I found.  Thanks!

I found a couple more problems.

First, if I valgrind --vgctl-error=1, then when valgrind hits the
first error I can't interrupt it from the tty.  I think maybe C-c
should still kill it.

Second, if I use valgrind --vgctl-error=0, I get a valgrind crash
when I "target remote | vgctl" and then "cont" in gdb:

opsy. valgrind --vgctl-error=0 ~/gnu/baseline-gdb/build/gdb/gdb 
==30105== Memcheck, a memory error detector
==30105== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==30105== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==30105== Command: /home/tromey/gnu/baseline-gdb/build/gdb/gdb
==30105== 

IR SANITY CHECK FAILURE

IRSB {
   t0:I32   t1:I32   t2:I32   t3:I32   t4:I32   t5:I32   t6:I32   t7:I32
   t8:I32   t9:I32   t10:I32   t11:I32   t12:I32   t13:I32   t14:I32   t15:I32
   t16:I32   t17:I32   t18:I32   t19:I32   t20:I32   t21:I32   t22:I1   t23:I32
   t24:I32   t25:I32   t26:I32   t27:I1   t28:I32   t29:I32   t30:I32   t31:I32
   t32:I1   t33:I32   t34:I32   t35:I32   t36:I1   t37:I32   t38:I32   t39:I1
   t40:I32   t41:I32   t42:I32   t43:I32   t44:I1   t45:I32   t46:I32   t47:I1
   t48:I32   t49:I32   t50:I32   t51:I1   t52:I32   t53:I32   t54:I1   t55:I32
   t56:I1   t57:I1   t58:I32   t59:I32   t60:I32   t61:I32   t62:I32   t63:I32
   t64:I32   t65:I1   t66:I1   t67:I32   t68:I32   t69:I32   t70:I32   t71:I32
   t72:I32   t73:I1   t74:I32   

   ------ IMark(0x8F16A1, 6) ------
   PUT(60) = 0x8F16A1:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8F16A1:I32)
   t21 = GET:I32(60)
   t22 = CmpNE32(0x8F16A1:I32,t21)
   if (t22) goto {Boring} 0x3C:I16
   t23 = GET:I32(356)
   t10 = GET:I32(20)
   t24 = Or32(t23,0x0:I32)
   t25 = Left32(t24)
   t26 = t25
   t9 = Add32(t10,0xFFFFFF5C:I32)
   t27 = CmpNEZ32(t26)
   DIRTY t27 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t28 = 0x0:I32
   t29 = DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_LOADV32le)[rp=1]{0x38014d40}(t9)
   t30 = t29
   t11 = LDle:I32(t9)
   PUT(364) = t30
   PUT(28) = t11
   ------ IMark(0x8F16A7, 7) ------
   PUT(60) = 0x8F16A7:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8F16A7:I32)
   t31 = GET:I32(60)
   t32 = CmpNE32(0x8F16A7:I32,t31)
   if (t32) goto {Boring} 0x3C:I16
   PUT(60) = 0x8F16A7:I32
   t33 = Or32(t30,0x0:I32)
   t34 = Left32(t33)
   t35 = t34
   t12 = Add32(t11,0xC:I32)
   t36 = CmpNEZ32(t35)
   DIRTY t36 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t37 = 0x0:I32
   DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_STOREV32le)[rp=2]{0x38014690}(t12,0x0:I32)
   STle(t12) = 0x1:I32
   ------ IMark(0x8F16AE, 5) ------
   PUT(60) = 0x8F16AE:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8F16AE:I32)
   t38 = GET:I32(60)
   t39 = CmpNE32(0x8F16AE:I32,t38)
   if (t39) goto {Boring} 0x3C:I16
   PUT(60) = 0x8F16AE:I32
   t40 = GET:I32(352)
   t15 = GET:I32(16)
   t41 = Or32(t40,0x0:I32)
   t42 = Left32(t41)
   t43 = t42
   t14 = Sub32(t15,0x4:I32)
   PUT(352) = t43
   DIRTY 1:I1 RdFX-gst(16,4) ::: track_new_mem_stack_4[rp=1]{0x38008eb0}(t14)
   PUT(16) = t14
   t44 = CmpNEZ32(t43)
   DIRTY t44 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t45 = 0x0:I32
   DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_STOREV32le)[rp=2]{0x38014690}(t14,0x0:I32)
   STle(t14) = 0x8F16B3:I32
   ------ IMark(0x8FDC60, 1) ------
   PUT(60) = 0x8FDC60:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC60:I32)
   t46 = GET:I32(60)
   t47 = CmpNE32(0x8FDC60:I32,t46)
   if (t47) goto {Boring} 0x3C:I16
   PUT(60) = 0x8FDC60:I32
   t48 = Or32(t45,0x0:I32)
   t49 = Left32(t48)
   t50 = t49
   t16 = Sub32(t14,0x4:I32)
   PUT(352) = t50
   DIRTY 1:I1 RdFX-gst(16,4) ::: track_new_mem_stack_4[rp=1]{0x38008eb0}(t16)
   PUT(16) = t16
   t51 = CmpNEZ32(t50)
   DIRTY t51 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t52 = 0x0:I32
   DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_STOREV32le)[rp=2]{0x38014690}(t16,t23)
   STle(t16) = t10
   ------ IMark(0x8FDC61, 2) ------
   PUT(60) = 0x8FDC61:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC61:I32)
   t53 = GET:I32(60)
   t54 = CmpNE32(0x8FDC61:I32,t53)
   if (t54) goto {Boring} 0x3C:I16
   PUT(356) = t52
   PUT(20) = t16
   ------ IMark(0x8FDC63, 1) ------
   PUT(60) = 0x8FDC63:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC63:I32)
   t55 = GET:I32(60)
   t56 = CmpNE32(0x8FDC63:I32,t55)
   if (t56) goto {Boring} 0x3C:I16
   PUT(60) = 0x8FDC63:I32
   t57 = CmpNEZ32(t52)
   DIRTY t57 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t58 = 0x0:I32
   t59 = DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_LOADV32le)[rp=1]{0x38014d40}(t16)
   t60 = t59
   t5 = LDle:I32(t16)
   t61 = Or32(t58,0x0:I32)
   t62 = Left32(t61)
   t63 = t62
   t19 = Add32(t16,0x4:I32)
   PUT(352) = t63
   DIRTY 1:I1 RdFX-gst(16,4) ::: track_die_mem_stack_4[rp=1]{0x38012d60}(t19)
   PUT(16) = t19
   PUT(356) = t60
   PUT(20) = t5
   ------ IMark(0x8FDC64, 1) ------
   PUT(60) = 0x8FDC64:I32
   DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC64:I32)
   t64 = GET:I32(60)
   t65 = CmpNE32(0x8FDC64:I32,t64)
   if (t65) goto {Boring} 0x3C:I16
   PUT(60) = 0x8FDC64:I32
   t66 = CmpNEZ32(t63)
   DIRTY t66 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t67 = 0x0:I32
   t68 = DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_LOADV32le)[rp=1]{0x38014d40}(t19)
   t69 = t68
   t8 = LDle:I32(t19)
   t70 = Or32(t67,0x0:I32)
   t71 = Left32(t70)
   t72 = t71
   t20 = Add32(t19,0x4:I32)
   PUT(352) = t72
   DIRTY 1:I1 RdFX-gst(16,4) ::: track_die_mem_stack_4[rp=1]{0x38012d60}(t20)
   PUT(16) = t20
   DIRTY 1:I1 ::: VG_(helperc_InvalidateIfNotGdbserved)[rp=1]{0x380649f0}(t8)
   t73 = CmpNEZ32(t69)
   DIRTY t73 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
   t74 = 0x0:I32
   goto {Return} t8
}

IN STATEMENT:

if (t22) goto {Boring} 0x3C:I16

ERROR = IRStmt.Exit.dst: not :: guest word type


vex: the `impossible' happened:
   sanityCheckFail: exiting due to bad IR
vex storage: T total 12935320 bytes allocated
vex storage: P total 576 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().
==30105==    at 0x38028DC5: report_and_quit (m_libcassert.c:191)
==30105==    by 0x38028E26: panic (m_libcassert.c:275)
==30105==    by 0x38028E7D: vgPlain_core_panic_at (m_libcassert.c:280)
==30105==    by 0x38028E98: vgPlain_core_panic (m_libcassert.c:285)
==30105==    by 0x38043E06: failure_exit (m_translate.c:674)
==30105==    by 0x380CA97E: vpanic (main_util.c:237)
==30105==    by 0x380CE3CC: sanityCheckFail (ir_defs.c:2245)
==30105==    by 0x380CFA98: sanityCheckIRSB (ir_defs.c:2793)
==30105==    by 0x380C9307: LibVEX_Translate (main_main.c:549)
==30105==    by 0x38041581: vgPlain_translate (m_translate.c:1518)
==30105==    by 0x38072351: vgPlain_scheduler (scheduler.c:866)
==30105==    by 0x3809F3C4: run_a_thread_NORETURN (syswrap-linux.c:94)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable
==30105==    at 0x8F16A1: dl_main (rtld.c:1627)
==30105==    by 0x903DAD: _dl_sysdep_start (dl-sysdep.c:243)
==30105==    by 0x8EF25B: _dl_start (rtld.c:328)
==30105==    by 0x8EE856: ??? (in /lib/ld-2.10.2.so)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.
Comment 12 Philippe Waroquiers 2010-03-30 00:14:13 UTC
(In reply to comment #11)
> > From what I can see, this is a basic limitation of gdb+gdbserver: when
> > trying with the non embedded gdbserver, gdb also can't figure out the
> > executable.
> 
> I am not very familiar with gdbserver or the remote protocol, but it
> seems to me that if we can transfer the shared library list, then we
> can somehow also transfer the executable's name -- at the very least via
> a new remote protocol extension.
Yes, probably changing gdb and/or the protocol will allow to do that.
I suppose that the limitation is at gdb side, in the "target remote":
probably this command assumes that it knows the executable file and so does
not search it on the side of gdbserver.



> First, if I valgrind --vgctl-error=1, then when valgrind hits the
> first error I can't interrupt it from the tty.  I think maybe C-c
> should still kill it.
The handling of signals is still a limitation but in any case, I think
this particular thing cannot be done:
I think what is happening is that valgrind is  blocking the signals 
while it is not executing client code.
When the gdbserver code is called, this is valgrind code being executed.
Once valgrind executes client code (so, after gdb has attached and told it
to continue), the signal will be delivered and will interrupt the process.
It is not clear to me there is a way to handle the "client" signals
while valgrind code is running. So, I believe this will have to stay
like that (but if a valgrind developper reads this, he/she might correct my
interpretation)


> 
> Second, if I use valgrind --vgctl-error=0, I get a valgrind crash
> when I "target remote | vgctl" and then "cont" in gdb:
...
> if (t22) goto {Boring} 0x3C:I16
> 
> ERROR = IRStmt.Exit.dst: not :: guest word type
That was a real mystery to me, till I checked the patch content.
I thought "svn diff" in valgrind directory would give the diffs for valgrind
and VEX, but it only gives the diff for valgrind, not for the VEX subdir.
So, I will attach a new patch with the VEX part included.
I am amazed that the half-patch is working at all.

Thanks for the tests
Comment 13 Philippe Waroquiers 2010-03-30 00:19:03 UTC
Created attachment 42364 [details]
patch that implements a gdbserver in valgrind (with VEX parts now)

no functional changes compared to previous patch, only including VEX part
+ better msg in call to sanityCheckFail.
Comment 14 Tom Tromey 2010-03-30 21:48:42 UTC
Thanks, this patch is much better :-)

I have been thinking a little about how to make the
gdb experience nicer.  In particular I am thinking
of adding a new "target valgrind", which will reuse
the remote code under the hood.

With a real "target valgrind", we could support "attach"
and "run" properly.  I think the latter may require a
small change to vgctl and valgrind (to make it possible
to run vgctl and have it start valgrind and attach to it
directly).

I'd appreciate feedback on this idea.  I could start work
on it next week.
Comment 15 Philippe Waroquiers 2010-03-31 00:08:56 UTC
(In reply to comment #14)
> With a real "target valgrind", we could support "attach"
> and "run" properly.  I think the latter may require a
> small change to vgctl and valgrind (to make it possible
> to run vgctl and have it start valgrind and attach to it
> directly).
> 
> I'd appreciate feedback on this idea.  I could start work
> on it next week.

If gdb could somewhat better understand the valgrind target, that
would be an excellent idea. 

Note however that up to now, there was no feedback on this work
from the (real :) valgrind developpers.

I think that *if* the approach/patch is found relevant, that there will be
some serious additional work to do before  gdbserver can be added in 
valgrind code basis:
* some bugs to chase and/or nice to have and/or mandatory : 
     1. a break is "encountered" twice if you 
       'print hello_world()' from gdb when the break is encountered.
        This has something to do with the addition of breaks by gdb at
        a place which is not an instruction boundary.
        I am busy investigating that.
     2. if you 'print hello_world()' from gdb, and the execution of
        this code gives valgrind errors to be reported, then the whole
        stuff crashes. The 'mo vg.s v 1000000' bypass works
        (seems to work?) but is quite horrible. Understanding why it
        crashes would be a good thing.
     3. Mandatory: the tests are for the moment quite basic and manual:
        only documented in the README_DEVELOPPERS. IMO, we must have
        automatic tests for all that.
     4. Mandatory: writing documentation.
     5. Signal handling is "nice to have".

So, the work to adapt gdb to a valgrind gdbserver which has
not yet been added in valgrind might be somewhat premature :).

But for sure, a good thing to put in the "to do list", so here it is:
     6. Have a better valgrind target support in gdb, allowing e.g.
        run or attach.

At this phase of the work, have feedback from the valgrind masters
(in particular, on the changes in VEX, scheduler.c, aspacemgr.c and the
global approach) would be an encouraging sign to start updating gdb.

Philippe
Comment 16 Tom Hughes 2010-03-31 00:51:42 UTC
I am reading all the discussion, but I was waiting for some indication that the work was complete before reviewing it and there seems to still be active discussion.

It sounds like there is stuff here that will need Julian's input as well though.
Comment 17 Philippe Waroquiers 2010-03-31 07:09:23 UTC
(In reply to comment #16)
> I am reading all the discussion, but I was waiting for some indication that the
> work was complete before reviewing it and there seems to still be active
> discussion.
The work is for sure not yet complete (see todo list in previous post).
But I think it is in a state at which feedback would be very useful.

Note that fully completing the work implies to have access 
to Darwin/AIX/ppc/(arm?) platforms. I only have access to an x86/fedora
and an intermittent access to an amd64/fedora.
Comment 18 Julian Seward 2010-04-16 12:08:19 UTC
Could someone please add a short summary of what a GDB remote debug
server stub does, and what basic functionality is required of the
process/system into which the stub is embedded?  I know nothing about
this.  And in particular I'd like to know how the stub interacts with
Valgrind's basic control flow for the simulated program.

I ask because I would like to figure out if this breaks any underlying
structural assumptions in Valgrind's virtual cpu, or in its handling
of threads, syscalls and signals.
Comment 19 Philippe Waroquiers 2010-04-17 18:34:05 UTC
(In reply to comment #18)

> I ask because I would like to figure out if this breaks any underlying
> structural assumptions in Valgrind's virtual cpu, or in its handling
> of threads, syscalls and signals.
Ultra short answer: when valgrind wants to be debugged by a gdb (typically
when an error is reported by m_errormgr.c), it does
read system calls to get commands from gdb, and gives replies using write
system call. At selected places in the valgrind code, some calls to gdbserver
are inserted to ask gdb what to do for some events (e.g. a signal).

Longer answer (the below is information which is in README_DEVELOPPERS of the
next patch version I will put here)
Functional and technical description of gdbserver in valgrind
-------------------------------------------------------------
gdb debugger typically is used to debug a process running
on the same machine : gdb uses system calls (such as ptrace) 
to fetch data from the process being debugged
or to change data in the process 
or interrupt the process 
or ...

gdb can also debug processes running in a different computer
(e.g. it can debug a process running on a small real time
boards).
gdb does this by sending some commands (e.g. using tcp/ip)
to a piece of code running on the remote computer. This piece
of code  (called a gdb stub in small boards, or gdbserver when
the remote computer runs an OS such as GNU/linux) will 
provide a set of commands allowing gdb to remotely debug the process.
Examples of commands are: "get the registers", "get the
list of running threads", "read len bytes at address xxxx", etc.
The definition of all these commands and the associated replies
the gdb remote serial protocol, which is documented in
Appendix D of gdb user manual.

gdb has a standalone gdbserver (a small executable) which 
implements this protocol and the needed system calls to
allow gdb to remotely debug process running on a linux or
MacOS or ...

The gdbserver code has been modified so as to link it with
valgrind and allow the valgrind guest process to be debugged
by a gdb speaking to this gdbserver embedded in valgrind.

The gdbserver functionality is activated with valgrind command
line options. If these options are not given, then the
impact on valgrind runtime is minimal: basically just check
the command line option to see that there is nothing to do
for what concerns gdb server: there is a 
"if gdbserver is active" check in the instrument function 
of the tool (if the tool is made "breakable and stepable",
and in the valgrind scheduler.

The gdbserver can be activated of two (non-exclusive) manners:
   * --vgdb-error=xxxx
       allows gdb to debug the guest when an error is reported
     The xxxx gives the error nr from which valgrind will call
     the gdbserver code. This allows to debug either right from
     the begin (give 0 as xxxx) or from a specific error
     onwards.
 and/or
   * --vgdb-check=yyyyy
     allows gdb to "interrupt" the guest so as to debug it.
     The scheduler will check every yyyyy blocks if gdbserver
     wants to interrupt the guest to debug it.
     Note that the check is not doing a system call: it verifies
     a counter in shared memory. Whenever gdb writes something
     to valgrind, this counter is incremented.
     The same mechanism can be used without gdb:
     a user request (e.g. "search leaks", "dump callgrind file", ...
     can be given to the guest using a small standalone executable.

How does gdbserver code interacts with valgrind ?
When an error is reported, the gdbserver code is called.
It reads commands from gdb using read system call on a FIFO
(e.g. a command such as "get the registers").
It executes the command (i.e. fetches the registers from the
guest state) and writes the reply (a packet containing
the register data).
When gdb instructs gdbserver to "continue", the control is returned
to valgrind, which then continues to execute guest code.

How are signals "handled" ? When a signal is to be given to the
guest, valgrind first calls gdbserver. If gdb instructs to give
the signal to the process, the signal is delivered to the guest.
Otherwise, the signal is ignored (not given to the guest).

How are "break/step/stepi/next/..." implemented ?
When a break is put by gdb on an instruction, a command
is sent to the gdbserver in valgrind. This causes the basic block
of this instruction is re-instrumented so as to insert calls
to dirty helper which call the gdb server code.
When a block is instrumented for gdbserver, all the "jump targets"
of this block are invalidated, so as to allow step/stepi/next to
properly work: these blocks will themselves automatically be
re-instrumented for gdbserver if they are jumped to.

The embedded gdbserver will read gdb commands on a named pipe having
(by default) the name   /tmp/vgdb-pipe-from-gdb-to-%d
where %d will be replaced by the pid.
The embedded gdbserver will reply to gdb commands on a named pipe
/tmp/vgdb-pipe-to-gdb-from-%d

gdb does not speak directly with gdbserver in valgrind: a relay application
called vgdb is needed between gdb and the valgrind-ified process.
gdb writes commands on the stdin of vgdb. vgdb reads these
commands and writes them on FIFO /tmp/vgdb-pipe-from-gdb-to-%d.
vgdb reads replies on FIFO /tmp/vgdb-pipe-to-gdb-from-%d and writes
them on its stdout. 
    Note: The solution of named pipes was preferred to tcp ip connections as it allows
    a discovery of which valgrind-ified processes are ready to accept command
    by looking at the /tmp. Also, the usual unix protections are protecting 
    the valgrind process against other users sending commands.
gdb side is all setup up by the following gdb command:
   (gdb) target remote | vgdb --pid=1234
assuming the valgrind-ified process you want to debug/attach to has 1234 as pid.
If there is only one valgrind-ified process to be debugged, vgdb can find it
so it is enough to do:
   (gdb) target remote | vgdb



Implementation is based on the gdbserver code from gdb
------------------------------------------------------
This directory and sub-directories contain files needed for a gdbserver
embedded in valgrind. This embedded gdbserver allows a valgrind-ified process
to be debugged with an external gdb.

The gdbserver implementation is derived from the gdbserver included
in the gdb distribution. The modifications to the gdbserver are quite
trivial and have been done so as to minimise the differences implied 
for integrating gdbserver in valgrind. This allows either to include 
the changes in the official gdbserver upstream or at least allows an
easier comparison (and merge) between a newer gdb gdbserver and the 
version in valgrind.
Such minimal changes have been ensured by using the preprocessor
to replace calls by valgrind equivalent, e.g. #define read(...) VG_(read) (...).
Minimal changes have also be ensured by using "goto" at 3 places, rather
than split a big function in different functions.
The only non-trivial changes are due to the fact that gdbserver inside valgrind
must return the control to valgrind when the 'debugged' process has to run,
while in a classical gdbserver usage, the gdbserver process waits for a debugged
process to stop on a break or similar.
This has implied to have some variables to remember the state of gdbserver
before returning to valgrind (search for int resume_phase in server.c)
and "goto" the place where gdbserver expects a stopped process to return
control to gdbserver.
The above allowed to obtain relatively easily a working gdbserver stub
in valgrind with relatively few modifs in the gdbserver code itself.
However, this has also revealed that there is quite some significant
infrastructure framework useful for the standalone gdbserver of gdb, but which
is not useful for a gdbserver embedded in valgrind.
So, I will see if we can remove some big pieces of code which provides
things not useful for valgrind.
Comment 20 Philippe Waroquiers 2010-04-17 18:48:17 UTC
Created attachment 42850 [details]
patch that implements a gdbserver in valgrind

Functional changes:
 * signal handling properly implemented (i.e. gdb "handle" command)
 * vgctl replaced everywhere by vgdb (shorter to type and more clear)
 * info threads gives additional info about the state of valgrind threads
 * massif has been made breakable/stepable + use command to take a snapshot
 * gdb can look and modify the shadow registers
      (this implies to have the svn version of gdb)
Comment 21 Josef Weidendorfer 2010-04-20 20:08:34 UTC
> * --vgdb-check=yyyyy
>   allows gdb to "interrupt" the guest so as to debug it.
>   The scheduler will check every yyyyy blocks if gdbserver
>   wants to interrupt the guest to debug it.
>   Note that the check is not doing a system call: it verifies
>   a counter in shared memory. Whenever gdb writes something
>   to valgrind, this counter is incremented.
>   The same mechanism can be used without gdb:
>   a user request (e.g. "search leaks", "dump callgrind file", ...
>   can be given to the guest using a small standalone executable.

Ah, so gdbserver is another feature needing to handle asynchronous
requests from the outside. You found a slightly better way to handle
this than callgrind does now (using shared memory without system call).

However, this does not solve my basic issue with the current
implementation: if VG is sleeping in some system call, no BBs
are executed, thus VG is unable to react on user requests from
the outside.

Perhaps it is possible to abuse a real time signal for this together with
incrementing of the counter? If the counter is not increased, the signal
would be forwarded to the client application.
Comment 22 Philippe Waroquiers 2010-04-20 21:48:51 UTC
(In reply to comment #21)

> Ah, so gdbserver is another feature needing to handle asynchronous
> requests from the outside. You found a slightly better way to handle
> this than callgrind does now (using shared memory without system call).
> 
> However, this does not solve my basic issue with the current
> implementation: if VG is sleeping in some system call, no BBs
> are executed, thus VG is unable to react on user requests from
> the outside.
Effectively, there is no big difference between the callgrind way to
be "interrupted" and what is used for gdbserver. The two main differences are:
   * the polling done in scheduler is just checking a counter in shared mem
   * the mechanism is done at a central place, allowing "user commands"
     to be written to any tool (via the tool "handle_client_request")
     (currently, I have added external command handling in memcheck
      and massif).

> 
> Perhaps it is possible to abuse a real time signal for this together with
> incrementing of the counter? If the counter is not increased, the signal
> would be forwarded to the client application.
Currently, there is only one counter: the counter is increased by gdb (more
precisely by the vgdb relay application) and is only looked by valgrind
scheduler. Sending a signal to wake up the process if it does not
react fast enough would imply to have two counters: 
one written by vgdb and read by valgrind scheduler,
the other counter written by valgrind scheduler and read by vgdb.
This is quite easy to do.
The tricky part looks to be the signal: if vgdb sends a signal to
valgrind, it will interrupt the "user system call". Valgrind has then
to ensure that the system call is properly restarted.
At least for the initial implementation of gdbserver, I wanted to limit the
interaction between valgrind and gdbserver to the strict mininum.
In particular, signal handling looks somewhat adventurous.

Note that if we can manage to wake up valgrind with a signal, I am not sure
we still need the shared memory stuff.

But clearly, the "blocked in system call" is an annoying limitation.

If/when this gdbserver stuff is integrated in valgrind, then it would
be a good idea to try to make the "interrupt the tool" better.
Comment 23 Philippe Waroquiers 2010-04-27 23:53:18 UTC
Created attachment 43068 [details]
patch that implements a gdbserver in valgrind

Functional changes:
  * gdb/vgdb can now properly "interrupt" a valgrind process that has
    all its threads blocked in a syscall (NB: this is not using a signal
    so guest system calls should not be influenced/EINTR-upted
  * callgrind user commands to dump/zero callgrind statistics
    (so, Josef, you might try to dump stats using:
              vgdb ct.dump 
       or use vgdb ct.zero to zero_all_cost)
       and that should properly dump the stats of a syscall blocked process)

(you need a recent linux kernel to have the above working.
See coregrind/m_gdbserver/README_DEVELOPPERS for some more info)
Comment 24 Philippe Waroquiers 2010-05-09 22:09:06 UTC
Created attachment 43412 [details]
patch for a gdbserver in valgrind ("1.0 release ?")

Functionally, I think this patch version can be a "1.0" release.
(see below a short description of the latest changes).
Technically, it has been tested using small tests, and on two
big applications (a.o. firefox 3.5).

If found relevant to integrate in valgrind, to improve the technical
aspects, it would be nice to have

   * feedback about the GPL3 versus GPL2 (to see if I must or not 
     rework on the basis of gdbserver 6.6).

   * a review (e.g. of the changes in the valgrind tools
     and coregrind). Review of the gdbserver protocol files is probably
     less critical : I will further cleanup all this once the GPL3/GPL2
     is clarified.

   * feedback about the idea to automate the gdb/valgrind tests in vgreg_test
     using the perl expect module (see unanswered post on the 13 April
     on valgrind developpers list).

--------------------------------------
User visible changes in the latest patch:
* valgrind vgdb args somewhat reworked
    --vgdb=no|yes             activate external command and gdb server? [no]
    --vgdb-poll=<number>      poll for gdb/vgdb every <number> basic blocks [5000] 
    --vgdb-error=<number>     wait for gdb/vgdb after <number> errors [1] 
    --vgdb-shadow-registers=no|yes   let gdb see the shadow registers [no]
    --vgdb-prefix=<prefix>    prefix for vgdb FIFOs [/tmp/vgdb-pipe]
* default length value of 1 byte for mc user memory related commands
   (e.g. get_vbits)
* mc.check_memory now describes the address being checked

Most significant technical changes
* simplified the interface to instrument for gdbserver: only one
  function call needed in the tool instrument function.
* replaced the read/write threads in vgdb.c by poll system call.
* various bug fixes
Comment 25 Tom Hughes 2010-06-15 15:56:58 UTC
I've had a (fairly) quick look over the core stuff - that is to say the changes to coregrind mainly but not including the actual m_gdbserver code.

Some major questions:

* I don't know what Julian thinks, but I think the idea of a "string request" is
  probably a bit too generic - a "gdb command" client request might be better.
* I think we need to think about the select/poll interface in m_libcfile as
  the select interface in particular is a pretty kludgy clone of the system
  header files. In fact, is it even used? Only GDBSERVER_USE_POLL seems to
  be defined which makes everything use poll instead of select?
* I'm not sure exposing VG_(vgdb_next_poll) outside the scheduler is a good
  idea - maybe have VG_(disable_vgdb_polling) and VG_(force_vgdb_poll) routines
  in scheduler instead to hide details?
* What do the changed in VG_(show_all_errors) have to do with gdb support?
* Test is is_sig_ign should be reversed as calling gdb to ask about the signal
  is much more expensive that checking a local variable.

Minor points:

* Both VG_(clo_vgdb_shadow_registers) and VGDB_DEFAULT are defined in
  pub_tool_options.h but neither is used by any tools so pub_core_options.h
  would be more appropriate.
* VGDB_DEFAULT should probably be VG_CLO_VGDB_PREFIX_DEFAULT or something.
* You seem to have removed the full stop from the end of the comment that
  appears before VG_(strncpy_safely) in pub_tool_libcbase.h
* I'm not sure it's necessary to explain VG_(strtok) and VG_(strtok_r) in
  quite such detail in pub_tool_libcbase.h if they behave the same as the
  normal C library versions.
* The "is this the good place" comment in valgrind.h needs to be removed.
* If VG_(mknod) needs to be able to return error details it should return
  a SysRes object, otherwise just -1 on any error.
* There seems to be no need for VG_(fd_open) as it just calls VG_(open).
* Both VG_(strtoull10) and VG_(sttrtoull16) should return ULong not UWord
  so that they match the unsigned versions.
* In VG_(synth_sigill) etc there is no need to have the resume_scheduler
  call in both branches of the if - just put it before the if.
Comment 26 Philippe Waroquiers 2010-06-15 23:17:37 UTC
(In reply to comment #25)
> 
> * I don't know what Julian thinks, but I think the idea of a "string request" is
>   probably a bit too generic - a "gdb command" client request might be better.
    VG_USERREQ__STRING_REQUEST = 0x1202, /* GDBTD??? is this the good place */
could be changed to
    VG_USERREQ__GDB_MONITOR_COMMAND = 0x1202,

> * I think we need to think about the select/poll interface in m_libcfile as
>   the select interface in particular is a pretty kludgy clone of the system
>   header files. In fact, is it even used? Only GDBSERVER_USE_POLL seems to
>   be defined which makes everything use poll instead of select?
1st implementation was based on select, was replaced by poll as the select
gave problem on a "google modified ubuntu" (which has a huuuuge nr of fd).
As poll system call is available on MacOS, I will remove the select interface
and the GDBSERVER_USE_POLL config symbol.

> * I'm not sure exposing VG_(vgdb_next_poll) outside the scheduler is a good
>   idea - maybe have VG_(disable_vgdb_polling) and VG_(force_vgdb_poll) routines
>   in scheduler instead to hide details?
Good idea, I will look at this.

> * What do the changed in VG_(show_all_errors) have to do with gdb support?
The vgdb/gdb monitor command "vg.info all_errors" is implemented by calling
VG_(show_all_errors). But this function was up to now only called once
(at the end of the run of VG). If called multiple times (interactively
in gdb or by vgdb), after the 1st call, VG_(show_all_errors) does not
output the errors anymore. The changes done allows VG_(show_all_errors) to be
called more than once by ensuring the call has no side effect on the errors.

> * Test is is_sig_ign should be reversed as calling gdb to ask about the signal
>   is much more expensive that checking a local variable.
I will reverse the condition order.
(note that unless a gdb is connected and is really interested in seeing
a specific signal, the call to VG_(gdbserver_report_signal) will not do any interaction
with gdb).

 
> Minor points:
All minor points looks clear to me, except the below.

> * In VG_(synth_sigill) etc there is no need to have the resume_scheduler
>   call in both branches of the if - just put it before the if.
The call to resume_scheduler will (at least sometimes) do a longjmp. 
So I think the signal check with gdbserver must be done before the call to 
resume_scheduler.
But re-reading m_signals.c, I am not sure I understand how all this works:
sometimes, we first have a call to deliver_signal, followed by resume_scheduler,
and sometimes it is the other way around. I do not understand the logic for choosing
one order or the other.
Comment 27 Philippe Waroquiers 2010-08-09 20:52:34 UTC
Created attachment 49953 [details]
new version : based on GPLv2+ code, handled Tom Hughes comments

new version of the patch
  * handled Tom Hughes comments
  * done GPLv2+ switch asked by Julian  (so, based on gdb 6.6 code)
  * some code clean up and fixes
Comment 28 Philippe Waroquiers 2011-03-20 15:52:06 UTC
Created attachment 58192 [details]
gdbserver patch, with regression tests and user manual
Comment 29 Philippe Waroquiers 2011-03-20 15:55:28 UTC
New version of the gdbserver patch (most important changes
are regression tests, user manual (read e.g. sections 2.7, 2.8, 2.9,
and the 'tool monitor commands' e.g. 4.6).

Regression tests have been run on a fedora 12 32 bits, fedora 14 64 bits,
ubuntu 10.10 32 bits, red-hat 5.2 64 bits. Tests done with gdb 7.2,
some tests redone with gdb 7.0.
A previous version was compiled and manually tested on MacOS (regression tests
not run).

ARM, s390x, ppc32/ppc64 : some additional work needed to make it work
(mostly a "switch" mapping gdb registers to valgrind registers + dummy call push
in vgdb).

Read user manual and coregrind/m_gdbserver/README_DEVELOPPERS and
gdbserver_tests/README_DEVELOPPERS for more info.

How to use the patch:
mkdir patch_trial
cd patch_trial
svn co -r 11657 svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
patch -p0 -i /tmp/patch.txt
./autogen.sh 
./configure --prefix=`pwd`/install
make 2>&1 | tee m.out
cd docs
make html-docs
cd ..
make install 2>&1 | tee -a m.out
# can use the installed valgrind having gdbserver patch e.g.
install/bin/valgrind --help
# and read the installed doc:
firefox install/share/doc/valgrind/html/index.html
# e.g. Valgrind gdbserver related user manual sections 2.7, 2.8, 2.9, 
# and the 'tool monitor commands' e.g. 4.6

# to run the new gdbserver_tests, empty files are not
# properly created by svn diff + patch (bug in patch for empty files)
# So, create them by touch all the below files (possibly svn add them)
cd gdbserver_tests
touch \
./mcbreak.stderr.exp \
./mchelp.stderr.exp \
./mcinfcallRU.stderrB.exp \
./mcinvokeRU.stderr.exp \
./mcinvokeWS.stderr.exp \
./mcsignopass.stderrB.exp \
./mcsigpass.stderrB.exp \
./mcvabits.stderr.exp \
./mcwatchpoints.stderrB.exp \
./mssnapshot.stderr.exp \
./nlcontrolc.stderrB.exp
# it is also needed to make some files executables 
# (possibly svn propset the executable property)
chmod +x filter_* make_local_links simulate_control_c invoker
cd ..
make regtest 2>&1 | tee r.out
Comment 30 Florian Krohm 2011-03-20 17:28:09 UTC
Created attachment 58193 [details]
patch for s390x compile

Patch that is needed to get a successful compilation on s390x.
Comment 31 Philippe Waroquiers 2011-04-11 21:21:08 UTC
Created attachment 58809 [details]
gdbserver+test+documentation. x86+amd64+arm+ppc32+ppc64+s390x

* ported + tested on linux + x86/amd64/arm/ppc32/ppc64/s390x

* Darwin: not compiled recently, never reg-tested.
          basic functionalities should work (were tested manually)
Comment 32 Philippe Waroquiers 2011-04-16 13:42:23 UTC
Created attachment 59040 [details]
Upgraded to rev 11700 + compiles on Darwin + small changes (e.g. new ppc registers, doc, ..)

All linux platforms should be ok.

Darwin: it compiled on Darwin 10.7.0.
I had to disable the fixup_macho_loadcmds which crashed and seemed
not necessary (compiled with gcc on darwin):
link_tool_exe_darwin: ../coregrind/fixup_macho_loadcmds 0x134000000 0x800000 memcheck-amd64-darwin
fixup_macho_loadcmds: requested stack_addr (top) 0x134000000, stack_size 0x800000
fixup_macho_loadcmds: examining tool exe: memcheck-amd64-darwin
fixup_macho_loadcmds: fail: unexpected load command in Mach header


Limited manual test of gdbserver shows it seems ok.
However, many gdbserver reg tests are not ok on Darwin (behaviour differs
from linux systems for e.g. stack trace in gdb) but I have only a very
limited access to a Mac, so not time to update the tests for Darwin.
Comment 33 Julian Seward 2011-04-19 17:33:46 UTC
(In reply to comment #32)
> Created an attachment (id=59040) [details]
> Upgraded to rev 11700 + compiles on Darwin

This is a big changeset.  I started out by looking at only the changed
files, not the new ones, by applying the patch to trunk, then doing
"svn diff -rHEAD".  Here are some comments related to the changed
files only.  I did not yet look at the new files.


Larger concerns
---------------

Mostly looks OK.  My only large concern is that it looks like this
patch also adds some Memcheck leak checker functionality and therefore
is not directly related to implementing a GDB server.  Can you clarify
about this?  If this does add leak checker functionality, it would be
better to add that as a separate patch/bug report that can be
reviewed/applied separately.


Smaller comments
----------------

callgrind/main.c
 #include <pub_tool_threadstate.h>
+#include "pub_tool_gdbserver.h"

first one looks like it was always wrong -- < > vs " "
(no problem w/ the patch)


mc_main.c:
why did parse_range get globalised (--> VG_(parse_range) ) ?
probably need to rename parse_range to something better


mc_get_or_set_vbits_for_client: check this isn't
performance critical (on load/store/sarp paths)
(I don't think it is)
change param name client_request to is_client_request


+static Bool handle_gdb_monitor_command (ThreadId tid, Char *req)
+{
+   Char* wcmd;
+   Char s[VG_(strlen(req))]; /* copy for strtok_r */
dangerous; possible DOS if strlen(req) is huge?  overflow check?


+      case VG_USERREQ__GDB_MONITOR_COMMAND: {
+         /* we need to avoid calling gdbserver when handling the 
+            gdb_monitor_command so we temporarily disable clo_vgctl_error
+            effect */
+         Int   save_clo_vgdb_error = VG_(clo_vgdb_error); 
+         VG_(clo_vgdb_error) = 999999999;
+         Bool handled = handle_gdb_monitor_command (tid, (Char*)arg[1]);
+         VG_(clo_vgdb_error) = save_clo_vgdb_error;
seems like a nasty kludge (passing a parameter as a global
variable?)  can it be changed to be a parameter
to handle_gdb_monitor_cmd ?


memcheck/mc_translate.c:
changes are unnecessary (look at the diff!)


include/pub_tool_libcbase.h
new fn VG_(memmem) ?  what does it do??  At least add some
comment.
rm __builtin_expect, use LIKELY



coregrind/m_aspacemgr.c
Why are changes needed here?  m_aspacemgr is quite fragile; I
prefer not to change it unless absolutely necessary.



coregrind/m_scheduler
+   /* do vgdb initialization (but once). Only the first (main) task
+      starting up will do the below. */
+   if (!vgdb_startup_action_done) {
Urr .. couldn't this initialisation be moved into one of the
functions that already exist, to initialise the scheduler?
VG_(scheduler_init_phase1) or VG_(scheduler_init_phase2)



coregrind/m_libcproc.c
+Int VG_(prctl) (Int option, 
+                ULong arg2, ULong arg3, ULong arg4, ULong arg5)
What's prctl used for (in the GDB server)?  IOW, why do we
need it?
Comment 34 Philippe Waroquiers 2011-04-20 00:20:43 UTC
(In reply to comment #33)

> Larger concerns
> ---------------
> 
> Mostly looks OK.  My only large concern is that it looks like this
> patch also adds some Memcheck leak checker functionality and therefore
> is not directly related to implementing a GDB server.  Can you clarify
> about this?  If this does add leak checker functionality, it would be
> better to add that as a separate patch/bug report that can be
> reviewed/applied separately.
An objective of the gdbserver is to provide an interactive usage of Valgrind.
Interactive leak "delta" searching (at least on big applications) is
only usable when it provides a "delta leak search", or "delta malloc"
i.e. showing new leaks or new malloc since the previous leak search (see user
manual 4.6 Memcheck Monitor Commands).
The leak search algorithm itself is not touched, only the reporting of results.
But if you deem it necessary, it can for sure be splitted and looked at
it later (but it implies to remove the first arg of mc.leak_check command).


> callgrind/main.c
>  #include <pub_tool_threadstate.h>
> +#include "pub_tool_gdbserver.h"
> 
> first one looks like it was always wrong -- < > vs " "
> (no problem w/ the patch)

I can fix the < > (unless you prefer to do that in a separate patch).

> mc_main.c:
> why did parse_range get globalised (--> VG_(parse_range) ) ?
> probably need to rename parse_range to something better
I thought globalising together parse_Addr and parse_range was a good idea,
but there is only one caller of parse_range so globalising is not needed.

I can either unglobalise it
             or rename it VG_(parse_Addr_range).
As you prefer ...

> 
> 
> mc_get_or_set_vbits_for_client: check this isn't
> performance critical (on load/store/sarp paths)
> (I don't think it is)
There are now 3 calls (non performance critical) to this function : 
the previous 2 calls (client user requests get/set vbits)
+ a new call (gdbserver mc.get_vbits monitor command).

> change param name client_request to is_client_request
done.

> 
> 
> +static Bool handle_gdb_monitor_command (ThreadId tid, Char *req)
> +{
> +   Char* wcmd;
> +   Char s[VG_(strlen(req))]; /* copy for strtok_r */
> dangerous; possible DOS if strlen(req) is huge?  overflow check?

"real" DOS implies collaboration of the VG user (as the access to
Valgrind gdbserver is protected using the FIFO rw permissions, which by
the way is better than the "classical" gdbserver, which is not protected).

gdb checks the packet length before sending it (max slightly above 8Kb).
I just added a packet length check in vgdb.
These commands are normally small 
(e.g.  "mc.leak_check increased leakpossible").


> +         Int   save_clo_vgdb_error = VG_(clo_vgdb_error); 
> +         VG_(clo_vgdb_error) = 999999999;
> +         Bool handled = handle_gdb_monitor_command (tid, (Char*)arg[1]);
> +         VG_(clo_vgdb_error) = save_clo_vgdb_error;
> seems like a nasty kludge (passing a parameter as a global
> variable?)  can it be changed to be a parameter
> to handle_gdb_monitor_cmd ?

The idea behind that is that if a monitor command causes an error to be
reported, that it does not causes gdbserver to be called again (too
confusing).
But all the various paths that can lead to an error being reported can't all
get an additional argument.
In any case, it is better to have this clo_vgb_error (somewhat nasty
kludge) be centralised in a place common to all tools (which I will do).

> memcheck/mc_translate.c:
> changes are unnecessary (look at the diff!)
Cleaned up.

> include/pub_tool_libcbase.h
> new fn VG_(memmem) ?  what does it do??  At least add some
> comment.
I have removed it (a previous version of gdbserver needed a "locate substring").


> coregrind/m_aspacemgr.c
> Why are changes needed here?  m_aspacemgr is quite fragile; I
> prefer not to change it unless absolutely necessary.

This allows to map the shared memory used to communicate with vgdb.c.
There is no change in the logic of the address space manager.
The changes are :
   - allow to map a file MAP_SHARED (while it was just allowing MAP_PRIVATE)
   - make it visible by the core (to be called by a gdbserver file).



> coregrind/m_scheduler
> +   /* do vgdb initialization (but once). Only the first (main) task
> +      starting up will do the below. */
> +   if (!vgdb_startup_action_done) {
> Urr .. couldn't this initialisation be moved into one of the
> functions that already exist, to initialise the scheduler?
> VG_(scheduler_init_phase1) or VG_(scheduler_init_phase2)
From my memory: trying to init gdbserver earlier was not ok, as
it would "show" to gdb a process without a valid thread.
I will however look at this more in depth, and add a comment
explaining the reason (or initialize it earlier if I see it is ok)



> coregrind/m_libcproc.c
> +Int VG_(prctl) (Int option, 
> +                ULong arg2, ULong arg3, ULong arg4, ULong arg5)
> What's prctl used for (in the GDB server)?  IOW, why do we
> need it?

This is needed to (as best as possible) handle the recent ubuntu
ptrace_scope restriction.
See user manual 2.7.7 limitations of Valgrind gdbserver 
  (the bullet about ptrace).
Comment 35 Julian Seward 2011-04-20 12:40:45 UTC
(In reply to comment #34)
> Interactive leak "delta" searching (at least on big applications) is
> only usable when it provides a "delta leak search", or "delta malloc"
> i.e. showing new leaks or new malloc since the previous leak search (see user
> manual 4.6 Memcheck Monitor Commands).

Yes .. it is useful, and something similar is proposed for Massif (bug
269067).  

> But if you deem it necessary, it can for sure be splitted and looked at
> it later

I would prefer it to be split and made into a separate patch (which
for sure we should also integrate before 3.7.0).


> > first one looks like it was always wrong -- < > vs " "
> > (no problem w/ the patch)

I'll fix this in trunk.


> > mc_main.c:
> > why did parse_range get globalised (--> VG_(parse_range) ) ?
> > probably need to rename parse_range to something better
> I thought globalising together parse_Addr and parse_range was a good idea,
> but there is only one caller of parse_range so globalising is not needed.
> 
> I can either unglobalise it

Unglobalise pls.  Having stuff global, that doesn't need to be global,
is confusing.


> > mc_get_or_set_vbits_for_client: check this isn't
> > performance critical (on load/store/sarp paths)
> > (I don't think it is)
> There are now 3 calls (non performance critical) to this function : 
> the previous 2 calls (client user requests get/set vbits)
> + a new call (gdbserver mc.get_vbits monitor command).

ok.


> > +         Int   save_clo_vgdb_error = VG_(clo_vgdb_error); 
> > +         VG_(clo_vgdb_error) = 999999999;
> > +         Bool handled = handle_gdb_monitor_command (tid, (Char*)arg[1]);
> > +         VG_(clo_vgdb_error) = save_clo_vgdb_error;
> > seems like a nasty kludge (passing a parameter as a global
> > variable?)  can it be changed to be a parameter
> > to handle_gdb_monitor_cmd ?
> 
> The idea behind that is that if a monitor command causes an error to be
> reported, that it does not causes gdbserver to be called again (too
> confusing).
> But all the various paths that can lead to an error being reported can't all
> get an additional argument.
> In any case, it is better to have this clo_vgb_error (somewhat nasty
> kludge) be centralised in a place common to all tools (which I will
> > do).

The VG_(clo_xxx) variables are intended to hold information from the
command line options, set once at startup, and not changed after that.

Can you do this by exporting from m_gdbserver (in
pub_core_gdbserver.h) a global variable that has this role?  It's not
good for the clo_ variables to hold state that changes as the system
runs.


> > coregrind/m_aspacemgr.c
> > Why are changes needed here?  m_aspacemgr is quite fragile; I
> > prefer not to change it unless absolutely necessary.
> 
> This allows to map the shared memory used to communicate with vgdb.c.
> There is no change in the logic of the address space manager.
> The changes are :
>    - allow to map a file MAP_SHARED (while it was just allowing MAP_PRIVATE)
>    - make it visible by the core (to be called by a gdbserver file).

ok.

> 
> 
> 
> > coregrind/m_scheduler
> > +   /* do vgdb initialization (but once). Only the first (main) task
> > +      starting up will do the below. */
> > +   if (!vgdb_startup_action_done) {
> > Urr .. couldn't this initialisation be moved into one of the
> > functions that already exist, to initialise the scheduler?
> > VG_(scheduler_init_phase1) or VG_(scheduler_init_phase2)
> From my memory: trying to init gdbserver earlier was not ok, as
> it would "show" to gdb a process without a valid thread.
> I will however look at this more in depth, and add a comment
> explaining the reason (or initialize it earlier if I see it is ok)

Initialize earlier is my preference.  Does it work if you put it
in VG_(scheduler_init_phase2) ?  I think it should.


> > coregrind/m_libcproc.c
> > +Int VG_(prctl) (Int option, 
> > +                ULong arg2, ULong arg3, ULong arg4, ULong arg5)
> > What's prctl used for (in the GDB server)?  IOW, why do we
> > need it?
> 
> This is needed to (as best as possible) handle the recent ubuntu
> ptrace_scope restriction.
> See user manual 2.7.7 limitations of Valgrind gdbserver 
>   (the bullet about ptrace).

ok.
Comment 36 Julian Seward 2011-04-20 13:09:57 UTC
I just looked briefly though the files in coregrind/m_gdbserver.
No significant comments, only the following:

valgrind_low.h

pls add copyright notice


m_gdbserver.c
/* True means gdbserver is not interested in this signal, so
   client signal action can be done without involving gdbserver.
   Otherwise, gdbserver should be called */
extern int pass_signals[];

move to a header file?
Comment 37 Philippe Waroquiers 2011-04-21 15:12:46 UTC
Created attachment 59184 [details]
handled review comments + upgraded to rev 11705

* Updated to revision 11705, VEX 2127.
* Changed argname client_request of mc_get_or_set_vbits_for_client 
  to is_client_request
* centralised disabling of VG_(clo_vgdb_error) during monitor command handling
  + defined VG_(dyn_vgdb_error) to avoid modifying VG_(clo_vgdb_error)
* removed the mc_translate.c irrelevant diffs (useless include, white space)
* removed useless VG_(memmem) function
* unglobalised parse_range back to mc_main.c
* some improvements and fixes in comments, READMEs, doc.
* added copyright header to valgrind_low.h
* pass_signals array defined in server.h
* extracted the "delta" leak (to be put in a subsequent patch)
* added a comment in VG_(scheduler) explaining why the gdbserver
  is initialized there.
* implemented protection in vgdb.c against huge monitor commands
Comment 38 Julian Seward 2011-04-26 13:33:51 UTC
(In reply to comment #37)
> Created an attachment (id=59184) [details]
> handled review comments + upgraded to rev 11705

Thanks for the respin.  Looks better.

One thing though: after applying the patch, doing

   chmod +x ./gdbserver_tests/make_local_links
   chmod +x gdbserver_tests/filter_stderr
   chmod +x gdbserver_tests/filter_gdb 
   chmod +x gdbserver_tests/filter_make_empty 
   chmod +x gdbserver_tests/filter_memcheck_monitor 
   chmod +x gdbserver_tests/filter_vgdb 

rebuilding and then doing "make regtest", I get a 
failure like this

$ make regtest
[...]
make[1]: Leaving directory `/home/sewardj/VgTRUNK/gdbsr'
gdbserver_tests/make_local_links /usr/bin/gdb
gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2:  GNU gdb (GDB) 7.1-ubuntu
/usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat
-- Running  tests in gdbserver_tests -----------------------------------
mcbreak:         valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t  (progB: ./gdb --quiet -l 10 --nx ./t)
Could not find `mcbreak.stderr.exp*'
make: *** [regtest] Error 255

There's no file mcbreak.stderr.exp* in the patch; did it get
left out maybe?
Comment 39 Julian Seward 2011-04-26 13:41:53 UTC
(In reply to comment #37) (more)
> Created an attachment (id=59184) [details]
> handled review comments + upgraded to rev 11705

Looks good.  There are various minor cleanups needed, but I can do
them.  Two comments though:


(1) I saw this

  coregrind/m_gdbserver/valgrind-low-ppc64.c
  +#include "libvex_guest_ppc64.h"
  +
  +struct reg regs[] = {
  +  { "r0", 0, 64 },
  +  { "r1", 64, 64 },
  +  { "r2", 128, 64 },

  (and equivalents for other architectures)

The magic numbers 0, 64, 128, etc .. are they bit offsets inside the
struct "VexGuestPPC64State"?  Because if so, this will break if/when
the elements of that struct change offset.  Or .. do I misunderstand?


(2) --vgdb-shadow-registers=no|yes   let gdb see the shadow registers [no]

Is this necessary (to have as a command line option) ?
Why not just let gdb always see the shadow registers?
Comment 40 Philippe Waroquiers 2011-04-26 14:27:46 UTC
(In reply to comment #38)
There is a bunch of files that have to be chmod-ed +x
(you already did a part of it) + some empty files are not created
due to a bug in the toolset "svn diff & patch".

See in comment 29 the list of all files to chmod +x or touch (to create
empty files).

Note also that in the below, one test is disabled (as it needs
gdb 7.2 to run). I have however not tested with gdb 7.1, so
the filter_* files might need some changes to properly filter out
gdb 7.1 output.

> rebuilding and then doing "make regtest", I get a 
> failure like this
> 
> $ make regtest
> [...]
> make[1]: Leaving directory `/home/sewardj/VgTRUNK/gdbsr'
> gdbserver_tests/make_local_links /usr/bin/gdb
> gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2:  GNU gdb
> (GDB) 7.1-ubuntu
> /usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind
> massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat
> -- Running  tests in gdbserver_tests -----------------------------------
> mcbreak:         valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0
> --vgdb-prefix=./vgdb-prefix-mcbreak ./t  (progB: ./gdb --quiet -l 10 --nx ./t)
> Could not find `mcbreak.stderr.exp*'
> make: *** [regtest] Error 255
Comment 41 Philippe Waroquiers 2011-04-26 15:07:40 UTC
(In reply to comment #39)
>
> Looks good.  There are various minor cleanups needed, but I can do
> them. Two comments though:

For what concerns needed cleanup: one thing I was intending to do
as part of the next patch (to re-add delta leak) is to avoid using
VG_(clo_show_*) for mc.leak_check monitor command (memcheck/mc_main.c).

Double checking with grep the places where VG_(clo_*) are modified,
I saw that the call to VG_(show_all_errors) in coregrind/m_gdbserver/server.c
also should have arguments rather than depend on changing VG_(clo_*).



>   +struct reg regs[] = {
>   +  { "r0", 0, 64 },
...
> The magic numbers 0, 64, 128, etc .. are they bit offsets inside the
> struct "VexGuestPPC64State"?  Because if so, this will break if/when
> the elements of that struct change offset.  Or .. do I misunderstand?

The mapping between the VexGuest state and gdb is implemented in the
valgrind-low-ppc64.c "static void transfer_register" (and equivalent
for other archs).
The above numbers are offsets and size in the gdbserver "registers packet"
and "registers buffer" (see coregrind/m_gdbserver/regdef.h).
How to create or get these offsets is documented in
coregrind/m_gdbserver/README_DEVELOPPERS (search for hal9000).

> 
> 
> (2) --vgdb-shadow-registers=no|yes   let gdb see the shadow registers [no]
> 
> Is this necessary (to have as a command line option) ?
> Why not just let gdb always see the shadow registers?
Showing shadow registers is optional as:
1. shadow registers implies to have a gdb >= 7.1  and this gdb
   must be compiled with xml support.
2. the gdb command 'info registers' is not very ergonomic in case
  there are a lot of registers.
Comment 42 Julian Seward 2011-04-26 18:14:14 UTC
(In reply to comment #40)
> See in comment 29 the list of all files to chmod +x or touch (to create
> empty files).

Oh, sorry.  Missed that.

Next q is: how long should I expect the test "mcinfcallWSRU" to take?
So far it (memcheck-amd64-linux) has accumulated almost 10 CPU minutes
on a 3.46 GHz Core i5, and still not finished.
Comment 43 Julian Seward 2011-04-26 18:25:00 UTC
Also, mcinvokeWS seemed to hang at 0% cpu and I had to kill it.
Same with nlcontrolc.  This is on Ubuntu 10.04.2, x86_64.
Comment 44 Julian Seward 2011-04-26 18:49:27 UTC
Also on Ubuntu 10.04.2 x86_64, I get memcheck/tests/addressable and
memcheck/tests/atomic_incs failing like this

+valgrind: m_gdbserver/m_gdbserver.c:519 (call_gdbserver): Assertion 'gs_addresses == NULL' failed.
+   at 0x........: report_and_quit (m_libcassert.c:210)
+   by 0x........: vgPlain_assert_fail (m_libcassert.c:284)
+   by 0x........: call_gdbserver (m_gdbserver.c:519)
+   by 0x........: vgPlain_gdbserver_activity (m_gdbserver.c:747)
+   by 0x........: run_thread_for_a_while (scheduler.c:800)
+   by 0x........: vgPlain_scheduler (scheduler.c:1154)
+   by 0x........: run_a_thread_NORETURN (syswrap-linux.c:95)
+
Comment 45 Philippe Waroquiers 2011-04-26 23:38:08 UTC
(In reply to comment #43)
> Also, mcinvokeWS seemed to hang at 0% cpu and I had to kill it.
> Same with nlcontrolc.  This is on Ubuntu 10.04.2, x86_64.

Have you chmod-ed +x all the needed files in gdbserver_tests ?
i.e. chmod +x filter_* make_local_links simulate_control_c invoker
I believe the blockage at 0% cpu or at 100% cpu are due to
simulate_control_c and/or invoker not being executable.
Comment 46 Philippe Waroquiers 2011-04-27 00:06:51 UTC
(In reply to comment #44)
> Also on Ubuntu 10.04.2 x86_64, I get memcheck/tests/addressable and
> memcheck/tests/atomic_incs failing like this
> 
> +valgrind: m_gdbserver/m_gdbserver.c:519 (call_gdbserver): Assertion
> 'gs_addresses == NULL' failed.

This bug is triggered when a process forks after having initialized
gdbserver, and the child does not do an exec. If VG_(scheduler) does a
call to gdbserver in the child, the child will initialize gdbserver
but the 'at fork cleanup' in m_gdbserver.c did not cleanup enough.
This insufficient cleanup is detected by the failing assert.

I have done a fix, tested that both addressable and atomic_incs tests are running ok now.
These tests are however only partially testing the fix,
so I think it is better to have a specific test for this bug.

During investigations, I also run the tests with gdb 7.1, and a (complex)
new sed expression is needed in filter_gdb as gdb 7.1 is producing
two lines instead of one in mcinfcallWSRU.

Do you prefer an new "global" patch with the 3 changes (the fix + the
new test + the changed filter_gdb) ?
Or do you prefer to work incrementally from now on ? (e.g. if you have
started to do some cleanups ?)
Comment 47 Julian Seward 2011-04-27 00:12:31 UTC
(Posting this despite the fact that it is partially redundant
following a mid-air collision with comment #46)

(In reply to comment #45)
> i.e. chmod +x filter_* make_local_links simulate_control_c invoker

Yes, that gets rid of all the hanging.  That means the regtest
failures are now as follows:

== 584 tests, 12 stderr failures, 2 stdout failures, 1 stderrB failure, 1
stdoutB failure, 0 post failures ==
gdbserver_tests/mcinfcallWSRU            (stderrB)
gdbserver_tests/nlcontrolc               (stdoutB)
memcheck/tests/addressable               (stdout)
memcheck/tests/addressable               (stderr)
memcheck/tests/atomic_incs               (stdout)
memcheck/tests/atomic_incs               (stderr)
memcheck/tests/linux/stack_switch        (stderr)
none/tests/async-sigs                    (stderr)
none/tests/fdleak_cmsg                   (stderr)
none/tests/fdleak_ipv4                   (stderr)
none/tests/shell                         (stderr)
helgrind/tests/pth_barrier3              (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
drd/tests/pth_barrier3                   (stderr)
drd/tests/threaded-fork                  (stderr)
exp-ptrcheck/tests/bad_percentify        (stderr)

Of these, the last 5 (helgrind/, drd/, exp-ptrcheck/) also fail
pre-patch, so are not interesting.  But the rest are new:

  memcheck/tests/addressable
  memcheck/tests/atomic_incs
  memcheck/tests/linux/stack_switch
  none/tests/async-sigs
  none/tests/fdleak_cmsg
  none/tests/fdleak_ipv4
  none/tests/shell

all fail with the same assertion as comment #44.

The two gdb failures are shown below.  If it's of any use, the
GDB version on this box is "GNU gdb (GDB) 7.1-ubuntu".

--------------------------------------------------------

$ cat gdbserver_tests/mcinfcallWSRU.stderrB.diff 
--- mcinfcallWSRU.stderrB.exp    2011-04-26 13:19:44.000000000 +0200
+++ mcinfcallWSRU.stderrB.out    2011-04-26 23:52:41.000000000 +0200
@@ -22,10 +22,14 @@
 Program received signal SIGTRAP, Trace/breakpoint trap.
 0x........ in do_burn () at sleepers.c:39
 39       for (i = 0; i < burn; i++) loopnr++;
-[Switching to thread 1 (Thread ....)]#0  0x........ in do_burn () at
sleepers.c:39
+[Switching to thread 1 (Thread ....)]#0  0x........ in do_burn ()
+    at sleepers.c:39
 39       for (i = 0; i < burn; i++) loopnr++;
 $1 = void
 [Switching to thread 2 (Thread ....)]#0  0x........ in syscall ...
+    at ../sysdeps/unix/syscall-template.S:82
+82    ../sysdeps/unix/syscall-template.S: No such file or directory.
+    in ../sysdeps/unix/syscall-template.S
 Could not write register  "xxx"; remote failure reply 'E.
 ERROR changing register  xxx regno y
 gdb commands changing registers (pc, sp, ...) (e.g. 'jump',
@@ -34,6 +38,8 @@
 Thread status is VgTs_WaitSys
 '
 [Switching to thread 3 (Thread ....)]#0  0x........ in syscall ...
+    at ../sysdeps/unix/syscall-template.S:82
+82    in ../sysdeps/unix/syscall-template.S
 Could not write register  "xxx"; remote failure reply 'E.
 ERROR changing register  xxx regno y
 gdb commands changing registers (pc, sp, ...) (e.g. 'jump',
@@ -41,7 +47,8 @@
 can only be accepted if the thread is VgTs_Runnable or VgTs_Yielding state
 Thread status is VgTs_WaitSys
 '
-[Switching to thread 4 (Thread ....)]#0  0x........ in do_burn () at
sleepers.c:39
+[Switching to thread 4 (Thread ....)]#0  0x........ in do_burn ()
+    at sleepers.c:39
 39       for (i = 0; i < burn; i++) loopnr++;
 $2 = void
 monitor command request to kill this process


--------------------------------------------------------

$ cat gdbserver_tests/nlcontrolc.stdoutB.diff 
--- nlcontrolc.stdoutB.exp    2011-04-26 13:19:44.000000000 +0200
+++ nlcontrolc.stdoutB.out    2011-04-26 23:53:09.000000000 +0200
@@ -2,14 +2,19 @@
 0x........ in _start () from ...start file...
 Continuing.
 Program received signal SIGTRAP, Trace/breakpoint trap.
-0x........ in syscall ...
+0x........ in select () at ../sysdeps/unix/syscall-template.S:82
+    in ../sysdeps/unix/syscall-template.S
 [New Thread ....]
 [New Thread ....]
 [New Thread ....]
   4 Thread .... (tid 4 VgTs_WaitSys)  0x........ in syscall ...
+    at ../sysdeps/unix/syscall-template.S:82
   3 Thread .... (tid 3 VgTs_WaitSys)  0x........ in syscall ...
+    at ../sysdeps/unix/syscall-template.S:82
   2 Thread .... (tid 2 VgTs_WaitSys)  0x........ in syscall ...
+    at ../sysdeps/unix/syscall-template.S:82
 * 1 Thread .... (tid 1 VgTs_WaitSys)  0x........ in syscall ...
+    at ../sysdeps/unix/syscall-template.S:82
 $1 = 0
 $2 = 0
 $3 = 0
Comment 48 Julian Seward 2011-04-27 00:16:23 UTC
(In reply to comment #46)
> I have done a fix,

Ok good.  You found VG_(atfork), yes?

> These tests are however only partially testing the fix,
> so I think it is better to have a specific test for this bug.

Yes.

> Do you prefer an new "global" patch with the 3 changes (the fix + the
> new test + the changed filter_gdb) ?

A new global patch would be good.  So far I only did reading and
testing, but no cleanups.
Comment 49 Philippe Waroquiers 2011-04-27 14:07:02 UTC
(In reply to comment #48)
> (In reply to comment #46)
> > I have done a fix,
> 
> Ok good.  You found VG_(atfork), yes?
yes, the cleanup is based on atfork.
Fix and test is done, but (this time :), I will re-run all tests
on multiple platforms (and not only on my main dev. platform, i.e. fedora 12
x86, which did not triggered this bug).

I will also run the tests with multiple gdb versions (at least 7.1 and 7.2).

> A new global patch would be good.  So far I only did reading and
> testing, but no cleanups.

To offload you as much as I can, if you have any specific cleanup
easy to describe I can do, I can take them in charge (this evening :).

A.o. if that suits you, I will change 
  void VG_(show_all_errors) ( void )
to
  void VG_(show_all_errors) ( Int verbosity, Bool xml)
to avoid setting+re-setting the equivalent VG_(clo_*)
Comment 50 Julian Seward 2011-04-27 20:59:53 UTC
(In reply to comment #49)
> To offload you as much as I can, if you have any specific cleanup
> easy to describe I can do, I can take them in charge (this evening
> :).

One thing I wrote down, that would be useful, is to print exact
commands needed to start and give to the gdb to make it attach.  What I
mean is, normally when it starts, it prints

==852== (action at startup) vgdb me ... 

and waits for me to figure out the magic commands to start
gdb and do the "target remote | .." thing (by looking in the
manual).

As a user I would find it much easier if (for example) it
printed out the exact gdb invokation and "target remote | .."
text required, so I can just copy/paste them in to the shell,
eg:

==852== (action at startup) vgdb me ..
==852== In a different shell, run this:
==852==    gdb /home/sewardj/VgTRUNK/gdbsr/memcheck/tests/errs1
==852== and once gdb is started, give it this command:
==852==    target remote | /home/sewardj/VgTRUNK/gdbsr/Inst/bin/vgdb

(assuming the program under test is
 /home/sewardj/VgTRUNK/gdbsr/memcheck/tests/errs1
and the V installation is at
 /home/sewardj/VgTRUNK/gdbsr/Inst
)

Makes sense?  I think it would be useful.  But only if it is
easy to do.  Don't stall producing a new patch if not easy.


> A.o. if that suits you, I will change 
>   void VG_(show_all_errors) ( void )
> to
>   void VG_(show_all_errors) ( Int verbosity, Bool xml)
> to avoid setting+re-setting the equivalent VG_(clo_*)

Yes, please do that.  For sure we don't want VG_(clo_*) variables
changing as we run; they are single-assignment things.


Last item of concern is, when I tried running a from-source
Firefox build, it ran successfully with "normal" memcheck but
hung after a while (circa 20 cpu seconds) when running attached
to GDB.  However, I am not sure about this -- I need to test 
it more.
Comment 51 Philippe Waroquiers 2011-04-28 02:24:40 UTC
Created attachment 59380 [details]
upgraded to rev 11713 + some fixes/improvements

* fixed bug in cleanup after fork + added a test
* cleaned up call to VG_(show_all_errors) to avoid touching VG_(clo*)
* tests redone and/or updated for
   fedora 12 x86, with gdb 7.0/7.1/7.2
   debian amd64, with gdb 7.1/7.2
* updated filter_gdb to handle the differences in behaviour reported by Julian
   in comment 47
  (having no access to an Ubuntu 64 bits, it might be that the sed expressions
   added in filter_gdb are not properly filtering the irrelevant diffs)
Comment 52 Philippe Waroquiers 2011-04-28 02:31:24 UTC
(In reply to comment #50)

> ==852== (action at startup) vgdb me ..
> ==852== In a different shell, run this:
> ==852==    gdb /home/sewardj/VgTRUNK/gdbsr/memcheck/tests/errs1
> ==852== and once gdb is started, give it this command:
> ==852==    target remote | /home/sewardj/VgTRUNK/gdbsr/Inst/bin/vgdb
> Makes sense?  I think it would be useful.  But only if it is
> easy to do.  Don't stall producing a new patch if not easy.
This would be easier for the user. However, not directly straightforward.

> >   void VG_(show_all_errors) ( Int verbosity, Bool xml)
> Yes, please do that.  For sure we don't want VG_(clo_*) variables
> changing as we run; they are single-assignment things.
Done as part of the last uploaded patch.
> Last item of concern is, when I tried running a from-source
> Firefox build, it ran successfully with "normal" memcheck but
> hung after a while (circa 20 cpu seconds) when running attached
> to GDB.  However, I am not sure about this -- I need to test 
> it more.
I have put firefox 4.0 on f12/x86 under memcheck + gdb, no problem encountered.

As firefox does quite a lot of fork/exec, maybe a consequence of the
wrong cleanup after fork ?
If this is the case, the last patch version should solve this.

NB: the last patch still need the same list of chmod+touch of comment 29.
Comment 53 Philippe Waroquiers 2011-04-28 23:47:14 UTC
Created attachment 59404 [details]
filter_gdb

I booted an Ubuntu 64 bits 10.10, and the sed expressions were effectively
not ok (as the real output contains tab characters, while the sed expressions
were expecting only space).
I have only attached filter_gdb, but can attach a new full global patch
(after an svn update) if preferrable
Comment 54 Philippe Waroquiers 2011-05-01 10:48:32 UTC
Created attachment 59478 [details]
upgraded to rev 11719, integrated filter_gdb, retested, fix ppc64 bug

Updated to Revision 11719
Re-validated on various linux platforms (x86,amd64,arm,ppc64,s390x)
increased gdb timeout values in tests (were too low for overloaded machines)
fixed a ppc64 bug in vgdb.c
modified a test to be less sensitive to scheduling unfairness
   (sched_setaffinity in the test helps, but does not fully resolve this)
modified filter_gdb to handle a ubuntu specific diff in syscall backtrace
Comment 55 Paul Pluzhnikov 2011-05-02 01:48:07 UTC
(In reply to comment #51)

>   (having no access to an Ubuntu 64 bits, it might be that the sed expressions
>    added in filter_gdb are not properly filtering the irrelevant diffs)

I've tested the latest patch (vg-gdbserver-patch-20110501) on 64-bit Ubuntu 10.04.1 LTS with GDB 7.2. My failures are:

== 586 tests, 21 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
memcheck/tests/linux/stack_switch        (stderr)
cachegrind/tests/chdir                   (stderr)
cachegrind/tests/clreq                   (stderr)
cachegrind/tests/dlclose                 (stderr)
cachegrind/tests/notpower2               (stderr)
cachegrind/tests/wrap5                   (stderr)
cachegrind/tests/x86/fpu-28-108          (stderr)
callgrind/tests/notpower2-hwpref         (stderr)
callgrind/tests/notpower2-use            (stderr)
callgrind/tests/notpower2-wb             (stderr)
callgrind/tests/notpower2                (stderr)
callgrind/tests/simwork-both             (stderr)
callgrind/tests/simwork-cache            (stderr)
callgrind/tests/simwork1                 (stderr)
callgrind/tests/simwork2                 (stderr)
callgrind/tests/simwork3                 (stderr)
callgrind/tests/threads-use              (stderr)
helgrind/tests/pth_barrier3              (stderr)
helgrind/tests/tc06_two_races_xml        (stderr)
drd/tests/pth_barrier3                   (stderr)
exp-ptrcheck/tests/bad_percentify        (stderr)

These are exact same failures I get without the patch.
Comment 56 Paul Pluzhnikov 2011-05-02 02:06:06 UTC
Also tested using current GDB snapshot (7.3.50.20110502-cvs).

One failure in gdbserver_tests due to GDB splitting 'Switching to thread' message into two lines:

--- mcinfcallWSRU.stderrB.exp   2011-05-01 17:02:18.000000000 -0700
+++ mcinfcallWSRU.stderrB.out   2011-05-01 17:05:39.000000000 -0700
@@ -22,10 +22,12 @@
 Program received signal SIGTRAP, Trace/breakpoint trap.
 0x........ in do_burn () at sleepers.c:39
 39        for (i = 0; i < burn; i++) loopnr++;
-[Switching to thread 1 (Thread ....)]#0  0x........ in do_burn () at sleepers.c:39
+[Switching to thread 1 (Thread ....)]
+#0  0x........ in do_burn () at sleepers.c:39
 39        for (i = 0; i < burn; i++) loopnr++;
 $1 = void
Comment 57 Julian Seward 2011-05-06 22:52:23 UTC
(In reply to comment #54)
> Created an attachment (id=59478) [details]
> upgraded to rev 11719, integrated filter_gdb, retested, fix ppc64 bug

Committed as r11727.
Comment 58 Julian Seward 2011-05-06 23:52:51 UTC
Builds and regtests OK on 64-bit Ubuntu 10.04.2 (x86_64).  Builds
ok on ppc32-linux and arm-linux, unclear what regtest status is on
those platforms yet (tests are still running).  I haven't tried
Darwin yet.
Comment 59 Philippe Waroquiers 2011-05-07 01:19:11 UTC
(In reply to comment #58)
> Builds and regtests OK on 64-bit Ubuntu 10.04.2 (x86_64).  Builds
> ok on ppc32-linux and arm-linux, unclear what regtest status is on
> those platforms yet (tests are still running).  I haven't tried
> Darwin yet.

On Darwin, I expect that it should compile properly, but there will for
sure be some failing tests : I had only access to a Darwin platform
during a few hours, and could only make it compile + tested manually
+ disable two tests which were blocking.

Note also that on the Darwin I had access to, the gdb version was very 
old (gdb 6.3 IIRC). The tests have only been run with a gdb >= 7.0. 
I believe that a gdb >= 6.5 is needed for the target remote | command.

I might obtain some additional hours of access to a Mac to fix the tests
on Darwin.
Comment 60 Julian Seward 2011-05-07 09:46:43 UTC
(In reply to comment #58)
> Builds ok on ppc32-linux and arm-linux, unclear what regtest status is on
> those platforms yet (tests are still running).

There are failures and (apparent) hangs.  What information do you
need, to diagnose these?


---------------

On arm-linux (Ubuntu 10.04), I get this:

[...]
gdbserver_tests/make_local_links /usr/bin/gdb
gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2:  GNU gdb (GDB) 7.1-ubuntu
/usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat
-- Running  tests in gdbserver_tests -----------------------------------
mcbreak:         valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t  (progB: ./gdb --quiet -l 60 --nx ./t)
*** mcbreak failed (stdout) ***
*** mcbreak failed (stdoutB) ***
*** mcbreak failed (stderrB) ***
mcclean_after_fork: valgrind   --tool=memcheck --vgdb=full --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcclean_after_fork ./clean_after_fork  (progB: ./gdb --quiet -l 60 --nx ./clean_after_fork)
*** mcclean_after_fork failed (stdout) ***
*** mcclean_after_fork failed (stderr) ***
*** mcclean_after_fork failed (stdoutB) ***
*** mcclean_after_fork failed (stderrB) ***
mchelp:          valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mchelp ./t  (progB: ./vgdb --wait=60 --vgdb-prefix=./vgdb-prefix-mchelp -c help -c help debug -c vg.kill)
mcinfcallRU:     valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcinfcallRU ./sleepers 1 0 2000000000 B-B-B-B- (progB: ./gdb --quiet -l 60 --nx ./sleepers)

and it hangs there at 100% cpu for memcheck-arm-linux.  I waited maybe
20 minutes but no progress.

sewardj@u1004arm:~/VgTRUNK/trunk$ uname -a
Linux u1004arm 2.6.31-608-imx51 #15-Ubuntu Fri Jun 18 10:30:45 UTC
2010 armv7l GNU/Linux

sewardj@u1004arm:~/VgTRUNK/trunk$ gdb --version

GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.

sewardj@u1004arm:~/VgTRUNK/trunk$ gcc --version
gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3

---------------

On ppc32-linux, Debian 6.0, GNU gdb (GDB) 7.0.1-debian, the
GDB tests are very slow but don't hang.

gdbserver_tests/make_local_links /usr/bin/gdb
gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2:  GNU gdb (GDB) 7.0.1-debian
/usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat
-- Running  tests in gdbserver_tests -----------------------------------
mcbreak:         valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t  (progB: ./gdb --quiet -l 60 --nx ./t)
*** mcbreak failed (stdoutB) ***
*** mcbreak failed (stderrB) ***
mcclean_after_fork: valgrind   --tool=memcheck --vgdb=full --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcclean_after_fork ./clean_after_fork  (progB: ./gdb --quiet -l 60 --nx ./clean_after_fork)
*** mcclean_after_fork failed (stdoutB) ***
*** mcclean_after_fork failed (stderrB) ***
mchelp:          valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mchelp ./t  (progB: ./vgdb --wait=60 --vgdb-prefix=./vgdb-prefix-mchelp -c help -c help debug -c vg.kill)
mcinfcallRU:     valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcinfcallRU ./sleepers 1 0 2000000000 B-B-B-B- (progB: ./gdb --quiet -l 60 --nx ./sleepers)
*** mcinfcallRU failed (stderr) ***
mcinfcallWSRU:   valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcinfcallWSRU ./sleepers 100 100000000 1000000000 -S-SB-B- (progB: ./gdb --quiet -l 60 --nx 1>&2 ./sleepers)
*** mcinfcallWSRU failed (stderrB) ***
mcinvokeRU:      valgrind   --tool=memcheck --vgdb=yes --vgdb-prefix=./vgdb-prefix-mcinvokeRU ./sleepers 1 0 1000000000 B-B-B-B- (progB: ./invoker 10 --vgdb-prefix=./vgdb-prefix-mcinvokeRU --wait=60 -c vg.wait 0)
mcinvokeWS:      valgrind   --tool=memcheck --vgdb=yes --vgdb-prefix=./vgdb-prefix-mcinvokeWS ./sleepers 1 10000000 0 -S-S-S-S (progB: ./invoker 10 --vgdb-prefix=./vgdb-prefix-mcinvokeWS --wait=60 -c vg.wait 0)
mcleak:          (skipping, prereq failed: test -x ../memcheck/tests/leak-delta)
mcsignopass:     valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcsignopass ./../none/tests/faultstatus  (progB: ./gdb --quiet -l 60 --nx ../none/tests/faultstatus)
*** mcsignopass failed (stderr) ***
*** mcsignopass failed (stdoutB) ***
mcsigpass:       valgrind   --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcsigpass ./../none/tests/faultstatus  (progB: ./gdb --quiet -l 60 --nx ../none/tests/faultstatus)
*** mcsigpass failed (stdoutB) ***
mcvabits:        (skipping, prereq failed: test -e ./gdb.eval)
mcwatchpoints:   valgrind   --tool=memcheck --vgdb=full --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcwatchpoints ./watchpoints  (progB: ./gdb --quiet -l 60 --nx ./watchpoints)
*** mcwatchpoints failed (stdoutB) ***
mssnapshot:      valgrind   --tool=massif --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mssnapshot ./t  (progB: ./gdb --quiet -l 60 --nx ./t)
*** mssnapshot failed (stdoutB) ***
*** mssnapshot failed (stderrB) ***
nlcontrolc:      valgrind   --tool=none --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-nlcontrolc ./sleepers 1000000000 1000000000 1000000000 BSBSBSBS (progB: ./gdb --quiet -l 60 --nx ./sleepers)
*** nlcontrolc failed (stdoutB) ***
-- Finished tests in gdbserver_tests -----------------------------------
Comment 61 Philippe Waroquiers 2011-05-07 10:10:56 UTC
(In reply to comment #60)
> (In reply to comment #58)
> > Builds ok on ppc32-linux and arm-linux, unclear what regtest status is on
> > those platforms yet (tests are still running).
> 
> There are failures and (apparent) hangs.  What information do you
> need, to diagnose these?
a tar.gz of the *.out files for the failing or hanging tests is for
sure a good start.

In any case, I will launch  a rebuild on the arm and ppc systems I have access to.
Comment 62 Julian Seward 2011-05-07 12:26:34 UTC
Created attachment 59721 [details]
tarfile of gdb_server *.out files for ppc32-linux (Debian 6.0)

(In reply to comment #61)
The associated failure summary is:

gdbserver_tests/mcbreak                  (stdoutB)
gdbserver_tests/mcbreak                  (stderrB)
gdbserver_tests/mcclean_after_fork       (stdoutB)
gdbserver_tests/mcclean_after_fork       (stderrB)
gdbserver_tests/mcinfcallRU              (stderr)
gdbserver_tests/mcinfcallWSRU            (stderrB)
gdbserver_tests/mcsignopass              (stderr)
gdbserver_tests/mcsignopass              (stdoutB)
gdbserver_tests/mcsigpass                (stdoutB)
gdbserver_tests/mcwatchpoints            (stdoutB)
gdbserver_tests/mssnapshot               (stdoutB)
gdbserver_tests/mssnapshot               (stderrB)
gdbserver_tests/nlcontrolc               (stdoutB)
Comment 63 Philippe Waroquiers 2011-05-07 12:58:28 UTC
(In reply to comment #62)
> Created an attachment (id=59721) [details]
> tarfile of gdb_server *.out files for ppc32-linux (Debian 6.0)

In parallel, I have recompiled last svn and retested

on arm, debian 5.0, gdb 7.0 and gdb 7.2 : all ok.
   I will rebuild a gdb 7.1, but that will take hours (the arm system is very
    old/slow).
   For the hangs: this might be due to the scheduler (un-)fairness we discussed.
   Again here, the *.out files should give hints about what is happening.
   Waiting for fairness, I can simplify the test (one single thread
   burning cpu rather than several).
   In parallel, I can also try to enhance vg_regtest perl to kill a test
   after a specified timeout: line in the xxxxxx.vgregtest file
   (default 5 minutes).
   What do you think about this ? This avoids blocking all tests due to
   a single hanging tests (at work, on red-hat 5.2, atomic_incs loops often
   forever, so it will bypass all such problems).

on ppc32/ppc64, debian 5.0, gdb 7.0 and gdb 7.2:
   all tests ok, except on ppc64/gdb 7.2: nlcontrolc and mcinvokeWSRU 
   which needs a filter_gdb sed expression to filter a specific stack trace.
  
Sorry for all these problems, I was expecting the various platform tests I did
to cover enough but the behaviour of combinations of valgrind+gdb+glibc are
"richer" than expected.
Comment 64 Julian Seward 2011-05-08 09:10:08 UTC
(In reply to comment #63)
>    Waiting for fairness, I can simplify the test (one single thread
>    burning cpu rather than several).
That sounds like a good thing to do.  Is it important to have multiple
threads using CPU, or would one thread be enough?

>    In parallel, I can also try to enhance vg_regtest perl to kill a test
>    after a specified timeout: line in the xxxxxx.vgregtest file
>    (default 5 minutes).
>    What do you think about this ?
I would prefer to make all test guaranteed non-hanging.

> This avoids blocking all tests due to
>    a single hanging tests (at work, on red-hat 5.2, atomic_incs loops often
>    forever, so it will bypass all such problems).
memcheck/tests/atomic_incs.c, you mean?  I am surprised to hear this:
it does not have any infinite loop, only a for loop from 0 to NNN
where
#define NNN 3456987
Comment 65 Philippe Waroquiers 2011-05-08 09:54:20 UTC
(In reply to comment #64)
> (In reply to comment #63)
> >    Waiting for fairness, I can simplify the test (one single thread
> >    burning cpu rather than several).
> That sounds like a good thing to do.  Is it important to have multiple
> threads using CPU, or would one thread be enough?
I think it is preferrable to have multiple threads for such tests
but a test with a single thread burning cpu should provide a reasonable
coverage of the "burning" case. I will look deeper into this.

> 
> >    In parallel, I can also try to enhance vg_regtest perl to kill a test
> >    after a specified timeout: line in the xxxxxx.vgregtest file
> >    (default 5 minutes).
> >    What do you think about this ?
> I would prefer to make all test guaranteed non-hanging.
Yes, for sure, that is the target. The idea however is to avoid blocking
all tests due to a single wrong test. Such "fail after timeout"
feature is e.g. used in the gdb dejagnu test framework.
But in any case, at this stage, I will work on having only one single
thread burning cpu, which should then avoid the dependency on fairness.

BTW, can you also attach a tar of the *.out of the arm failures ?

I have started looking the ppc32/debian6.0 failures. Most of these are
failing due to some additional gdb output I will filter.


> memcheck/tests/atomic_incs.c, you mean?  I am surprised to hear this:
My memory about the looping test name might be wrong. Next time I see it, I will
dig deeper. Note that IIRC, the test was also looping forever outside
of valgrind.
Comment 66 Philippe Waroquiers 2011-05-09 01:34:29 UTC
Created attachment 59785 [details]
patch to improve testing on various platforms

The changes in this patch should improve the situation on various platforms
and gdb combinations (but there is still things to look at).

* In all gdbserver_tests using gdb:
  Made a more general way to remove the initial start message.

* tests using threads burning cpu modified to have only 1 thread.
  This makes them independent of the scheduler fairness.

* filter_gdb and filter_vgdb enhanced to anonymise
    some debian 6.0/ppc specific things
    some s390x/gdb 7.0, gdb 7.1  specific things

* vgdb.c: added an #include <linux/ptrace.h> to fix compilation
  on s390x fedora and suse. (Christian Boerntrager)

* fixed a bug in valgrind-low.c debug log :
  when a register size is 0, its image cannot be output (and register
  should not be transferred).

* added a parameter --keep-unfiltered to vg_regtest.in
  This will make it easier to update filter_gdb:
  in case gdbserver_tests are failing due to "artificial"
  differences to be filtered, re-run the tests using:
     perl tests/vg_regtest --keep-unfiltered gdbserver_tests
  Then a tar file with all the *.out in gdbserver_tests
  will allow me to better/faster update the filter_gdb.


Retested the above on:

x86         * f12         * (gdb 7.0, 7.1, 7.2)
ppc32/ppc64 * debian 5.0  * (gdb 7.0, 7.1, 7.2)
amd64       * debian 5.0  * (gdb 7.0, 7.1, 7.2)
s390x       * RHEL 4      * (gdb 7.0, 7.1, 7.2)


Still to do:
------------
* There are still problems on arm, but could not advance on that.
  It looks like some of the problems are triggered when
  the computer is very slow.
* mcsigpass/mcsignopass tests failure on debian 6.0/ppc:
  still to be looked at (but I have only access to a debian 5, where
  these are working).
Comment 67 Julian Seward 2011-05-09 12:57:46 UTC
Created attachment 59798 [details]
tarfile of gdb_server *.out files for ppc32-linux with c66 diff (Debian 6.0)

(In reply to comment #66)
That improves it a lot on ppc32-linux (Debian 6.0).  No more
long delays.  As you said in c66, the remaining failures are
 
gdbserver_tests/mcsignopass              (stderr)
gdbserver_tests/mcsignopass              (stdoutB)
gdbserver_tests/mcsigpass                (stdoutB)

Will test on arm-linux next.
Comment 68 Philippe Waroquiers 2011-05-09 21:34:04 UTC
(In reply to comment #67)
> Created an attachment (id=59798) [details]
> tarfile of gdb_server *.out files for ppc32-linux with c66 diff (Debian 6.0)
> 
> (In reply to comment #66)
> That improves it a lot on ppc32-linux (Debian 6.0).  No more
> long delays.  As you said in c66, the remaining failures are
Good to see it improves.
This evening, I will work on generalising the fix in vgdb.c so
as to unbreak the build on older distro (e.g. ppc SUSE, ppc RHEL 5).
I should upload soon a new version of the patch (cfr comment #66).
(unless you prefer to commit in the current state of comment 66).

> 
> gdbserver_tests/mcsignopass              (stderr)
> gdbserver_tests/mcsignopass              (stdoutB)
> gdbserver_tests/mcsigpass                (stdoutB)
With a quick reading, not clear to me what is happening.
Is none/tests/faultstatus.vgtest working properly on this setup ?
(mssignopass and mcsigpass are using none/tests/faultstatus
to test gdbserver signal handling).

> 
> Will test on arm-linux next.
At least on an extremely loaded arm system, I have got failures
in the tests. It looks to be time dependent, because it goes much better
when I increase --max-invoke-ms to 500 ms or it works properly when
I type the gdb commands myself, rather than using a script file.
Comment 69 Philippe Waroquiers 2011-05-10 00:08:57 UTC
Created attachment 59820 [details]
patch to improve testing on various platforms + PTRACE_GETREGS better conditional compilation

* made a better detection of a working PTRACE_GETREGS at compile time
  and/or at run-time.
* updated to revision 11735

Apart of this, it is the same as 59785.

Recompiled and re-run gdbserver tests on:
f12/x86, arm/ubuntu9.10, amd64/debian5.0, ppc32/64/debian5.0, s390x/RHEL4
Comment 70 Julian Seward 2011-05-10 13:01:43 UTC
(In reply to comment #69)
> Created an attachment (id=59820) [details]
> patch to improve testing on various platforms + PTRACE_GETREGS better
> conditional compilation

Committed, r11740.
Comment 71 Philippe Waroquiers 2011-05-11 00:50:23 UTC
Created attachment 59887 [details]
gdbsrv cleanup: --vgdb=full for mcsig[no]pass, fix 2 bugs (detected by IBM beam),

* fixed two bugs reported by the IBM BEAM checker:
     fd leak in vgdb.c
     break missing in valgrind-low-s390x.c

* use option --vgdb=full for the tests mcsigpass.vgtest and mcsignopass.vgtest
  This might improve these tests on ppc32/debian 6.0

* added a paragraph in gdbserver_tests/README_DEVELOPPERS to indicate
  how to report problems about failing gdbserver tests.
Comment 72 Josef Weidendorfer 2011-05-11 19:33:25 UTC
Testing on Ubuntu 11.04. Just a few remarks:

* running "vgdb" standalone (ok, makes no sense, but could be a typo),
I can not terminate the relaying by pressing ctrl+c.

* running a kernel from kernel-ppa which does not have
the ptrace scoping compiled in, I get at end of "vgdb -h":
"Could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope"
Running any monitor command such as "vgdb foo" gives the same warning in the terminal of the VG run. Why this warning at all? BTW, I did not know about
this security thing up to now ;-)

* I think it is useful to talk about "vgdb help" in the output of "vgdb -h"

* With callgrind, vgdb help gives
 callgrind monitor commands:
    ct.dump [<dump_hint>]
        dump counters
But when running "vgdb ct.dump bla", I get
 sending command cg.dump bla to pid 11064
 command 'cg.dump bla' not recognised
Comment 73 Philippe Waroquiers 2011-05-11 23:55:01 UTC
(In reply to comment #72)
> Testing on Ubuntu 11.04. Just a few remarks:
> 
> * running "vgdb" standalone (ok, makes no sense, but could be a typo),
> I can not terminate the relaying by pressing ctrl+c.
When used as relay application, vgdb must resist to a SIGINT (as gdb might
send a SIGINT to terminate, but vgdb must ensure a clean termination,
e.g. must ptrace detach if needed). 
Maybe what I can do is to check if stdin is a tty, and then report an error
to the user if no args have been given.

> * running a kernel from kernel-ppa which does not have
> the ptrace scoping compiled in, I get at end of "vgdb -h":
> "Could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope"
> Running any monitor command such as "vgdb foo" gives the same warning in the
> terminal of the VG run. Why this warning at all? BTW, I did not know about
> this security thing up to now ;-)
vgdb.c and m_gdbserver/remote-utils.c are guessing ptrace_scope is to be
looked at if the pre-processor symbol PR_SET_PTRACER is defined.
In this case, the ptrace_scope setting will be read by vgdb and/or remote-utils.c

If it cannot be read, an error message is given back, and it is assumed
PR_SET_PTRACER is not needed.

Do you have a suggestion for an alternative behaviour you would prefer ?
The msg could be changed to:
   warning: could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope
   warning: a process blocked in a system call might not be commanded.

Note however that this message is only supposed to be output by vgdb
when asking for help or when an erroneous usage is done.

The valgrind gdbserver side will however output the warning all the time if it
can't read the ptrace scope and PR_SET_PTRACER is defined.
It does this for all "new connections" (as this means there is a new vgdb which
must become the ptracer).
I might put a static local to output the warning only once.


> 
> * I think it is useful to talk about "vgdb help" in the output of "vgdb -h"
Yes, that looks a good idea.

> 
> * With callgrind, vgdb help gives
>  callgrind monitor commands:
>     ct.dump [<dump_hint>]
>         dump counters
> But when running "vgdb ct.dump bla", I get
>  sending command cg.dump bla to pid 11064
>  command 'cg.dump bla' not recognised
It looks to be a typo (cg. instead of ct.).

Here, it works:
[philippe@soleil valgrind]$ coregrind/vgdb help
sending command help to pid 2320
general valgrind monitor commands:
 ....
callgrind monitor commands:
  ct.dump [<dump_hint>]
        dump counters
  ct.zero
        zero counters

[philippe@soleil valgrind]$ ls -lrt | tail -4
drwxrwxr-x.  6 philippe philippe   4096 2011-05-10 23:36 exp-bbv
drwxrwxr-x.  6 philippe philippe   4096 2011-05-10 23:36 exp-dhat
drwxrwxr-x.  4 philippe philippe  12288 2011-05-11 00:43 gdbserver_tests
-rw-------.  1 philippe philippe      0 2011-05-11 23:34 callgrind.out.2320
[philippe@soleil valgrind]$ coregrind/vgdb ct.dump
sending command ct.dump to pid 2320
[philippe@soleil valgrind]$ ls -lrt | tail -4
drwxrwxr-x.  6 philippe philippe   4096 2011-05-10 23:36 exp-dhat
drwxrwxr-x.  4 philippe philippe  12288 2011-05-11 00:43 gdbserver_tests
-rw-------.  1 philippe philippe      0 2011-05-11 23:34 callgrind.out.2320
-rw-------.  1 philippe philippe  53559 2011-05-11 23:34 callgrind.out.2320.1
[philippe@soleil valgrind]$
Comment 74 Julian Seward 2011-05-12 00:55:41 UTC
(In reply to comment #71)
> Created an attachment (id=59887) [details]
> gdbsrv cleanup: --vgdb=full for mcsig[no]pass, fix 2 bugs (detected by IBM
> beam),

Committed, r11748.
Comment 75 Josef Weidendorfer 2011-05-12 14:04:10 UTC
(In reply to comment #73)
> > * running "vgdb" standalone (ok, makes no sense, but could be a typo),
> > I can not terminate the relaying by pressing ctrl+c.
> When used as relay application, vgdb must resist to a SIGINT (as gdb might
> send a SIGINT to terminate, but vgdb must ensure a clean termination,
> e.g. must ptrace detach if needed).

Isn't this the same situation? When pressing ctrl+c after starting
vgdb in a shell, it also should do a clean termination...

> Maybe what I can do is to check if stdin is a tty, and then report an error
> to the user if no args have been given.

Sounds good. Another possibility would be to have to specify e.g. "--relay"
for vgdb to go into relay mode.

> > * running a kernel from kernel-ppa which does not have
> > the ptrace scoping compiled in, I get at end of "vgdb -h":
> > "Could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope"
> > Running any monitor command such as "vgdb foo" gives the same warning in the
> > terminal of the VG run. Why this warning at all? BTW, I did not know about
> > this security thing up to now ;-)
> vgdb.c and m_gdbserver/remote-utils.c are guessing ptrace_scope is to be
> looked at if the pre-processor symbol PR_SET_PTRACER is defined.
> In this case, the ptrace_scope setting will be read by vgdb and/or
> remote-utils.c
> 
> If it cannot be read, an error message is given back, and it is assumed
> PR_SET_PTRACER is not needed.
> 
> Do you have a suggestion for an alternative behaviour you would prefer ?

Perhaps my setup is a little bit strange, but I was wondering why printing
out a warning when everything actually works.

> > * With callgrind, vgdb help gives
> >  callgrind monitor commands:
> >     ct.dump [<dump_hint>]
> >         dump counters
> > But when running "vgdb ct.dump bla", I get
> >  sending command cg.dump bla to pid 11064
> >  command 'cg.dump bla' not recognised
> It looks to be a typo (cg. instead of ct.).

Oops. You are right, thanks!
Comment 76 Philippe Waroquiers 2011-05-13 00:42:11 UTC
Created attachment 59948 [details]
fix arm thumb by transforming an address to its thumb form when needed

* added a function thumb_pc transforming a pc to its thumb form if needed
  (using an heuristic to guess if this is a thumb address)
* when program counter is modified by gdb, use thumb_pc
* use thumb_pc in monitor command vg.translate

(I was able to check that this improves inferior call on a small
thumb compiled executable + mcinfcallRU test) but I could not compile
all tests with thumb).
Comment 77 Philippe Waroquiers 2011-05-14 17:16:08 UTC
Created attachment 59992 [details]
fix some tests on ppc-debian6,s390x + handled Nick Nethercote, Josef Weidendorfer comments

* improved testing & related doc
  - added option --vex-iropt-precise-memory-exns=yes to mcsig(no)pass.vgtest
    + updated manual-core.xml
  - cleanup some comments in *.vgtest 
  - modified filter_gdb and filter_memcheck_monitor to
    handle specific ppc/debian6.0 mcsig(no)pass output
    handle specific s390x 'missing debug info'
  - added more information in README_DEVELOPPERS on how to
    investigate failing gdbserver tests.

* handled Nick Nethercote comment:
  Replaced kludgy ms.snapshot detailed
              by  ms.detailed_snaphot
  Updated documentation and test.

* handled Josef Weindendorfer comments:
   - do not report an error if ptrace_scope file can't be read.
     Instead, a debug trace is done if -d (debug) option given
   - added an option -l to give the list of active Valgrind
     gdbserver. Useful a.o. to support callgrind_control.
     Updated documentation
   - added ref. to  vgdb help in the vgdb --help message
Comment 78 Bart Van Assche 2011-05-14 18:11:23 UTC
Hello Philippe,

Running the regression tests (trunk r11750) creates a large number of vgdb-* files in /tmp that do not get cleaned up. That seems like a bug to me.
Comment 79 Philippe Waroquiers 2011-05-15 01:36:15 UTC
(In reply to comment #78)
> Hello Philippe,
> 
> Running the regression tests (trunk r11750) creates a large number of vgdb-*
> files in /tmp that do not get cleaned up. That seems like a bug to me.
Hello Bart,

These files are left if the Valgrind process calls exec.
I thought it was not a big deal but in fact, it is better to do something:
If ever the pid nr is recycled, this could cause collision, and a new
process with the same pid could have a problem creating its resources.

If a process P1 running under Valgrind execs an executable E1, depending on
--trace-children, E1 will or will not run under Valgrind.
If it does not run under Valgrind, then it is better to cleanup (as otherwise
it leaves these files).
If it runs under valgrind, I think a cleanup before the exec is preferrable
to let a fresh gdbserver start after exec. Depending on --vgdb-error,
the freshly started gdbserver will stop at the beginning of execution, to
let the user put breakpoints before startup.

Now, there is another case to look at in this area.
If process P1 forks, giving a child P2.
Should P2 activate its gdbserver ?
I suggest the following logic:
   if --trace-children=no  : do not activate the gdbserver
   else activate gdbserver + if dyn_vgdb_error == 0, gdbserver waits for
        gdb to attach just after fork.
If after fork, the process P2 execs another executable, we are back to
the initial case.
(NB: dyn_vgdb_error gets its initial value from --vgdb-error, but can after
be controlled by the user).

What is slightly worrying me with the above is that the classical case
of a fork-exec implies that the user has to attach twice for each
fork/exec if --vgdb-error=0 is given.
But not activating the gdbserver in the forked process does not let the user
debug a forked process that does not exec something else.

What do you think about this ?
Comment 80 Bart Van Assche 2011-05-15 10:03:16 UTC
(In reply to comment #79)
> (In reply to comment #78)
> > Running the regression tests (trunk r11750) creates a large number of vgdb-*
> > files in /tmp that do not get cleaned up. That seems like a bug to me.
> 
> These files are left if the Valgrind process calls exec.
> I thought it was not a big deal but in fact, it is better to do something:
> If ever the pid nr is recycled, this could cause collision, and a new
> process with the same pid could have a problem creating its resources.
> 
> If a process P1 running under Valgrind execs an executable E1, depending on
> --trace-children, E1 will or will not run under Valgrind.
> If it does not run under Valgrind, then it is better to cleanup (as otherwise
> it leaves these files).
> If it runs under valgrind, I think a cleanup before the exec is preferrable
> to let a fresh gdbserver start after exec. Depending on --vgdb-error,
> the freshly started gdbserver will stop at the beginning of execution, to
> let the user put breakpoints before startup.
> 
> Now, there is another case to look at in this area.
> If process P1 forks, giving a child P2.
> Should P2 activate its gdbserver ?
> I suggest the following logic:
>    if --trace-children=no  : do not activate the gdbserver
>    else activate gdbserver + if dyn_vgdb_error == 0, gdbserver waits for
>         gdb to attach just after fork.
> If after fork, the process P2 execs another executable, we are back to
> the initial case.
> (NB: dyn_vgdb_error gets its initial value from --vgdb-error, but can after
> be controlled by the user).
> 
> What is slightly worrying me with the above is that the classical case
> of a fork-exec implies that the user has to attach twice for each
> fork/exec if --vgdb-error=0 is given.
> But not activating the gdbserver in the forked process does not let the user
> debug a forked process that does not exec something else.
> 
> What do you think about this ?

I'm not sure what to do with this. Have you considered to use so-called abstract socket addresses on Linux instead of file system path names ? If I'm not mistaken these sockets disappear automatically after the last process in which the socket was bound is gone. For more information, see e.g. http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html.
Comment 81 Philippe Waroquiers 2011-05-15 10:32:37 UTC
(In reply to comment #80)
> (In reply to comment #79)

> > What is slightly worrying me with the above is that the classical case
> > of a fork-exec implies that the user has to attach twice for each
> > fork/exec if --vgdb-error=0 is given.
> > But not activating the gdbserver in the forked process does not let the user
> > debug a forked process that does not exec something else.
> 
> I'm not sure what to do with this. Have you considered to use so-called
> abstract socket addresses on Linux instead of file system path names ? If I'm
> not mistaken these sockets disappear automatically after the last process in
> which the socket was bound is gone. For more information, see e.g.
> http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html.

I do not think this can help: from what I understand, either the socket has
a name in the file system, and then it must be unlinked similarly to the 
FIFOs, or it would not have a name, but then that does not let vgdb discovers
these processes. And that would in any case still let the pathname of the
shared memory to handle.

But I think I have now a change which cleans up the files,
except when Valgrind process is killed -9 (or similar).

For what concerns debugging (or not) a forked process : I think the proposal
above looks the good compromise : I would not like to add a new command line
option, and the above still allows the user to decide to debug or not
a forked process using --trace-children and/or setting dyn_vgdb_error.

I am testing on various platforms, and should attach a patch soon.

Thanks for the problem report (and thanks for the cleanup/enhancements of
the gdbserver tests).
Comment 82 Philippe Waroquiers 2011-05-15 17:34:31 UTC
Created attachment 60033 [details]
ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec

* syswrap-{generic|darwin|aix5}.c : in PRE(sys_execve) : terminate gdbserver
* pub_core_gdbserver.h and m_gdbserver.c : add VG_(gdbserver_prerun_action),
  factorising the actions to do by gdbserver at "startup" (i.e. a traced
  fork or a traced exec).
* scheduler.c : implement startup action using VG_(gdbserver_prerun_action)
Comment 83 Philippe Waroquiers 2011-05-15 17:45:01 UTC
Created attachment 60034 [details]
implement delta leak reporting 

* memcheck.h: add client requests to produce delta leaks
* mc_include.h: add typedef LeakCheckDeltaMode
                add old_* values in lossrecord (to implement delta)

* mc_errors.c: add MC_(snprintf_delta) to print a delta if delta leak requested
               MC_(pp_Error) : use MC_(snprintf_delta) to print leak errors
Comment 84 Philippe Waroquiers 2011-05-15 17:54:05 UTC
(In reply to comment #83)
> Created an attachment (id=60034) [details]
Manipulation error caused a part of the description to not be given.
Here is the complete description:
* memcheck.h: add client requests to produce delta leaks
* mc_include.h: add typedef LeakCheckDeltaMode
                add old_* values in lossrecord (to implement delta)
                MC_(detect_memory_leaks) : added a LeakCheckParams struct
                 with LeakCheckDeltaMode + avoid touching VG_(clo_*)
* mc_errors.c: add MC_(snprintf_delta) to print a delta if delta leak requested
               MC_(pp_Error) : use MC_(snprintf_delta) to print leak errors
* mc_leakcheck.c: implemented delta leak report
* mc_main.c : updated to LeakCheckParam
              updated gdbsrv help and mc.leak_check monitor command
* mc-manual.xml : changed mc.leak_check documentation
                  documented the 2 new client requests
* memcheck/tests/leak-delta.vgtest : new test
* gdbserver_tests/mcleak.vgtest : test re-activated and updated
Comment 85 Bart Van Assche 2011-05-16 20:08:11 UTC
(In reply to comment #82)
> Created an attachment (id=60033) [details]
> ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec
> 
> * syswrap-{generic|darwin|aix5}.c : in PRE(sys_execve) : terminate gdbserver
> * pub_core_gdbserver.h and m_gdbserver.c : add VG_(gdbserver_prerun_action),
>   factorising the actions to do by gdbserver at "startup" (i.e. a traced
>   fork or a traced exec).
> * scheduler.c : implement startup action using VG_(gdbserver_prerun_action)

The patch helps but apparently does not yet clean up all vgdb pipe instances:
$ ls /tmp/vgdb*
$ rebuild-valgrind # clean source tree, run ./configure, build and run regtests
$ ls /tmp/vgdb*
/tmp/vgdb-pipe-from-vgdb-to-17722     /tmp/vgdb-pipe-to-vgdb-from-17722
/tmp/vgdb-pipe-shared-mem-vgdb-17722
Comment 86 Bart Van Assche 2011-05-16 20:11:30 UTC
Has anyone already tried to run the gdbserver regtests on Ubuntu ? On Ubuntu I see all these tests fail with a diff similar to this:
$ cat mcclean_after_fork.stderr.diff
--- mcclean_after_fork.stderr.exp       2011-05-16 19:54:47.181087996 +0200
+++ mcclean_after_fork.stderr.out       2011-05-16 20:00:05.031087984 +0200
@@ -1,5 +1,7 @@
 
 (action at startup) vgdb me ... 
+error 2 No such file or directory
+error VG_(open) /proc/sys/kernel/yama/ptrace_scope
 
 HEAP SUMMARY:
     in use at exit: 0 bytes in 0 blocks

Note: this might be related to the fact that Ubuntu's default shell (/bin/sh) is dash instead of bash. More info can be found here: http://mywiki.wooledge.org/Bashism.
Comment 87 Julian Seward 2011-05-17 18:35:39 UTC
(In reply to comment #76)
> Created an attachment (id=59948) [details]
> fix arm thumb by transforming an address to its thumb form when needed

Committed, r11768.
Comment 88 Julian Seward 2011-05-17 19:15:39 UTC
(In reply to comment #77)
> Created an attachment (id=59992) [details]
> fix some tests on ppc-debian6,s390x + handled Nick Nethercote, Josef
> Weidendorfer comments

Committed, r11770.
Comment 89 Julian Seward 2011-05-17 20:15:32 UTC
(In reply to comment #82)
> Created an attachment (id=60033) [details]
> ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec

Committed, r11771.
Comment 90 Julian Seward 2011-05-17 20:18:06 UTC
(In reply to comment #85)
> The patch helps but apparently does not yet clean up all vgdb pipe instances:
> $ ls /tmp/vgdb*
> $ rebuild-valgrind # clean source tree, run ./configure, build and run regtests
> $ ls /tmp/vgdb*
> /tmp/vgdb-pipe-from-vgdb-to-17722     /tmp/vgdb-pipe-to-vgdb-from-17722
> /tmp/vgdb-pipe-shared-mem-vgdb-17722

Yes, I see that too -- 3 files remaining.  (Ubuntu 10.04 x86_64).
Comment 91 Philippe Waroquiers 2011-05-17 21:32:02 UTC
Created attachment 60090 [details]
add cleanup: line to none/tests/require-text-symbol-2.vgtest

(In reply to comment #85)
> The patch helps but apparently does not yet clean up all vgdb pipe instances:

In case Valgrind terminates abnormally, then no cleanup is done.
In this case, the abnormal termination is in the test which checks
--require-text-symbol=:*libc.so*:doesntexist
I attach a patch which adds a cleanup: line to this test.

It would be possible to implement this cleanup inside Valgrind,
but there are about 60 places where VG_(exit) is called.
If this solution would be preferrable, I can do the patch (e.g. replacing
all calls of VG_(exit) by a function VG_(cleanup_and_exit).

The attached patch adds a cleanup: line to the test to ensure the 2 FIFOs
and the shared mem file are removed.
Comment 92 Philippe Waroquiers 2011-05-17 21:41:46 UTC
(In reply to comment #86)
> Has anyone already tried to run the gdbserver regtests on Ubuntu ?
The tests have been run on an Ubuntu 10.04, 32 and 64 bits,
but only with ptrace_scope file configured.


The commit r11770 should fix these failures, as the below messages
related to unexisting ptrace_scope will be output only when
-d debug option(s) is given.

>  (action at startup) vgdb me ... 
> +error 2 No such file or directory
> +error VG_(open) /proc/sys/kernel/yama/ptrace_scope
Comment 93 Julian Seward 2011-05-17 23:14:13 UTC
(In reply to comment #91)
> Created an attachment (id=60090) [details]
> add cleanup: line to none/tests/require-text-symbol-2.vgtest

Philippe .. a general comment re cleaning up.  Do you know the
open-then-unlink trick?  eg at coregrind/m_main.c:1808 ibid.  The idea
is, for a temporary file that should be deleted at program exit:
create the file, then open it (to get a file descriptor), then unlink
it.  Because the file is open the kernel cannot delete it, but as soon
as the process disappears, the kernel deletes the file because there
are no more references to it.  The deletion happens regardless of how
the process exits (normally via syscall, or with a fatal signal) so
this is very convenient and clean.  Maybe it is useful here?
Comment 94 Philippe Waroquiers 2011-05-17 23:22:54 UTC
(In reply to comment #93)
> (In reply to comment #91)
> > Created an attachment (id=60090) [details] [details]
> > add cleanup: line to none/tests/require-text-symbol-2.vgtest
> 
> Philippe .. a general comment re cleaning up.  Do you know the
> open-then-unlink trick?  eg at coregrind/m_main.c:1808 ibid.  The idea

Yes I am aware of this. However, the files cannot be unlinked, otherwise
vgdb can't list the active Valgrind gdbservers (and after that can't open
the files to contact these V gdbsrv).
Comment 95 Julian Seward 2011-05-17 23:37:17 UTC
(In reply to comment #91)
> Created an attachment (id=60090) [details]
> add cleanup: line to none/tests/require-text-symbol-2.vgtest

Committed, r11772.
Comment 96 Philippe Waroquiers 2011-05-22 08:23:22 UTC
Created attachment 60211 [details]
fix gdbsrv arm thumbness

Fix  gdbsrv behaviour with ARM thumb code.

* Disabled several tests on ARM when gdb version < 7.1
  gdb 7.0 has problems with next/step/... in ARM thumb code.
* Documented in manual-core.xml that ARM thumb code implies
  a gdb version >= 7.1
* m_translate.c: don't chase if chasing will cause a transition arm/thumb.
* m_gdbserver.h/.c : take into account the thumb bit at several places
* Tests redone on f12/x86, debian5/x86_64,Ubuntu9.10/arm
  with gdb 7.0/7.1/7.2/7.3(svn)
  Updated filter_gdb and templates for gdb 7.3
Comment 97 Paul Pluzhnikov 2011-05-22 17:45:15 UTC
I just spent several hours chasing a tricky "use of uninitialized value" bug.

Ability to set breakpoints and query memory state is absolutely AWESOME! Philippe, thank you, thank you, thank you!

At several points in my debug session, while trying to understand whether I am "upstream" or "downstream" from the bug, I wanted to know the defined-ness of a register, e.g. something like "monitor mc.get_vbits $rax" or "monitor mc.check defined $rax".

Don't know how difficult this would be to implement, but I believe the capability would be very handy on architectures that pass parameters in registers.
Comment 98 Philippe Waroquiers 2011-05-22 19:49:18 UTC
(In reply to comment #97)
> register, e.g. something like "monitor mc.get_vbits $rax" or "monitor mc.check
> defined $rax".

If you give --vgdb-shadow-registers=yes at startup, then you can look
at the 2 Valgrind shadow register sets by post-fixing them with s1 or s2.

E.g. print $raxs1 should give you the value of the shadow register
giving the definedness of the register rax.
Comment 99 Philippe Waroquiers 2011-05-27 00:54:36 UTC
Created attachment 60364 [details]
fix gdbsrv arm thumbness : imark delta based solution

Fix  gdbsrv behaviour with ARM thumb code.
This is an alternative approach to attachment 60211 [details].
Instead of having a SB being fully arm or thumb (by not chasing between
arm/thumb), each IMark gets a 'Int delta' which gives the delta
to add to the address to have the real thumb PC.
This then allows gdbsrv to differentiate arm from thumb instructions.
Note that to properly test the arm/thumb behaviour, it is better
to use a gdb version >= 7.1. To run all tests, a version >= 7.2 is needed.

* Disabled several tests on ARM when gdb version < 7.1
  gdb 7.0 has problems with next/step/... in ARM thumb code.
* Documented in manual-core.xml that ARM thumb code implies
  a gdb version >= 7.1
* libvex_ir.h : add a Int delta in IMark.
* ir_defs.c : add Int delta argument to fn IRStmt_IMark
              ppIRStmt : for IMark, print a +d or -d after the addr,
                         if the delta != 0
                       (for a 0 delta, nothing is printed so that
                        the pretty print stays the same for all archs
                        except for thumb instructions)
* ir_opt.c : calls to IRStmt_IMark use the new arg
* guest_generic_bb_to_IR.c : when building IMark, give a delta 0 except for
                             thumb code.
  (NB: I suppose that the new 3rd element in an IMark does not have a
   major impact on performance, but I did not check)
* m_gdbserver.h/.c : take into account the thumb bit at several places
* All tests redone on debian5/x86_64,  gdbserver tests redone on Ubuntu9.10/arm
  with gdb 7.0/7.1/7.2/7.3(svn)
  Updated filter_gdb and templates for gdb 7.3
  Note: on a heavily loaded system, simulate_control_c timeout arg might be
  too small. I will try to make simulate_control_c working better on
  heavily loaded system, and will provide a subsequent patch for that.
Comment 100 Julian Seward 2011-05-27 15:29:17 UTC
(In reply to comment #99)
> Created an attachment (id=60364) [details]
> fix gdbsrv arm thumbness : imark delta based solution

Committed, r2153, r11779.  Thanks.
Comment 101 Philippe Waroquiers 2011-05-29 23:07:42 UTC
Created attachment 60451 [details]
fix mcsig(no)pass on arm Ubuntu10, arm thumb internal doc, improve simulate_control_c

* new file docs/internals/arm_thumb_notes_gdbserver.txt
  documentation about the subtilities of the thumb bit handling in gdbsrv.
* made the SIGFPE backtrace filtering less dependent on gdb/os/libc/...
* improved simulate_control_c : runs faster/less dependent on timeout value

Retested on f12/x86, debian5/amd64, debian5/ppc64, rh4/s390x, Ubuntu9/arm
Comment 102 Philippe Waroquiers 2011-06-11 18:24:44 UTC
Comment on attachment 60451 [details]
fix mcsig(no)pass on arm Ubuntu10, arm thumb internal doc, improve simulate_control_c

patch comment 101 integrated as r11792
Comment 103 Philippe Waroquiers 2011-06-11 18:34:34 UTC
Created attachment 60909 [details]
fix safe_fd exhaustion in fork chain caused by non closing of shared_mem_fd 

Patch that fixes the problem reported by Christian Borntraeger.
The problem was created by keeping the shared memory mapped file opened
without reason till the process does an exec.
In case of a chain of forked processes (without exec), the range of safe_fd
reserved for Valgrind own usage becomes exhausted.

* coregrind/m_gdbserver/remote-utils.c :
   do not VG_(safe_fd) shared_mem_fd (as it is now closed directly)
   close shared_mem_fd once file is mmap-ed and written.
* gdbserver_tests/nlfork_chain.stderr.exp,nlfork_chain.vgtest,
               fork_chain.c,nlfork_chain.stdout.exp:
     new files
* gdbserver_tests/Makefile.am:
  modified for new nlfork_chain test
Comment 104 Julian Seward 2011-06-15 23:36:30 UTC
(In reply to comment #103)
> Created an attachment (id=60909) [details]
> fix safe_fd exhaustion in fork chain caused by non closing of shared_mem_fd 

Committed, r11818.  Thanks for the fix.
Comment 105 Philippe Waroquiers 2011-06-17 23:56:18 UTC
Created attachment 61100 [details]
implement delta leak reporting (updated to new rev 11821, minor doc updates)

* memcheck.h: add client requests to produce delta leaks
* mc_include.h: add typedef LeakCheckDeltaMode
                add old_* values in lossrecord (to implement delta)
                MC_(detect_memory_leaks) : added a LeakCheckParams struct
                 with LeakCheckDeltaMode + avoid modifying VG_(clo_*)
* mc_errors.c: add MC_(snprintf_delta) to print a delta if delta leak requested
               MC_(pp_Error) : use MC_(snprintf_delta) to print leak errors
* mc_leakcheck.c: implemented delta leak report
* mc_main.c : updated to LeakCheckParam
              updated gdbsrv help and mc.leak_check monitor command
* mc-manual.xml : changed mc.leak_check documentation
                  documented the 2 new client requests
* manual-core-adv.xml :  minor update (order of mc.leak_check args)
* memcheck/tests/leak-delta.vgtest : new test
* gdbserver_tests/mcleak.vgtest : test re-activated and updated
Comment 106 Philippe Waroquiers 2011-06-18 01:35:52 UTC
Created attachment 61102 [details]
minor documentation fixes
Comment 107 Philippe Waroquiers 2011-06-21 00:10:08 UTC
Created attachment 61187 [details]
fix error in usability msg (--vgdb= must be --vgdb-prefix=) + make a test more deterministic

* Fix the typo in usability message prefix part (i.e. replaced --vgdb= by
  --vgdb-prefix=)
* make test clean_after_fork more deterministic by executing enough basic
  block in the child to trigger gdbsrv usability msg on amd64 platform.
  (on x86/ppc/s390x, the gdbsrv msg was triggered, making the test fail).

Tested on f12/x86, debian5/amd64

NB: no other change done to the usability msg, I start to get used to it :).
Comment 108 Philippe Waroquiers 2011-06-21 00:22:13 UTC
Created attachment 61188 [details]
Support for PIC executables (e.g. firefox on Ubuntu 11) by adding the "auxv" protocol packet to gdbsrv.

* Implemented the transmission of the auxv from gdbsrv to gdb.
* Added VG_(client_auxv) in pub_core_clientstate.h and m_clientstate.c
* initimg-linux.c
  setup of VG_(client_auxv)
  AT_BASE set to ignore. Note: the AT_BASE in auxv value sent
  by the classical gdbserver is 0 => I have ignored it when setting up.
  But if needed, it could be set to ignored only by gdbsrv
  when sending the auxv rather than when preparing the auxv.
  However, I suspect that setting it to 0 is a better value in general.

Tested on f12/x86, Ubuntu11.04/amd64, debian5/amd64, debian5/ppc64,
rhel4/s390x
Comment 109 Philippe Waroquiers 2011-06-23 01:07:44 UTC
Created attachment 61262 [details]
fix stupid bug (False instead of True) + SIGSTOP iso SIGTRAP (avoid race cond) + 2 new tests

* new tests
  - nlpasssigalrm.vgtest, testing the Qpasssignals signal handling packet
  - nlsigvgdb.vgtest, testing vgdb forcing invoke while gdbserver is busy
* m_gdbserver.c
  - fix stupid bug (return False instead of return True)
  - used SIGSTOP instead of SIGTRAP to "speak" with vgdb.c, 
    as SIGTRAP is masked while gdbserver is running, and we
    need a non-maskable signal to avoid race conditions.
* vgdb.c
  - Similarly, used SIGSTOP instead of SIGTRAP
Comment 110 Julian Seward 2011-06-26 12:50:29 UTC
All outstanding patches committed:

c106 (Minor documentation fixes for the GDB server) as r11834
c107 (fix error in usability msg, etc)              as r11835
c108 (Add support for PIC executables)              as r11836
c109 (Fix bug in logic related to signal-passing)   as r11837
c105 (add delta leak checking functionality)        as r11838
Comment 111 Philippe Waroquiers 2011-06-27 00:30:37 UTC
Created attachment 61353 [details]
valgrind and tool mon. cmds prefixes changes + doc fixes + new vgdb option --cmd-time-out

* changed prefixes of Valgrind core monitor commands from vg. to v.
* removed prefixes of Tool monitor commands
* memcheck leak_check 'leakpossible' arg renamed to 'possibleleak'
* memcheck make_memory 'ifaddressabledefined' arg renamed to 'Definedifaddressable'
    (with uppercase D to avoid confusion with 'defined' arg).
* vgdb options
  - Some doc updates : more logical option order documentation,
       specify 'standalone' for options aimed at standalone usage.
  - added option --cmd-time-out for standalone vgdb
     (comment of Josef Weindendorfer, needed to interface with a callgrind GUI)
* gdbserver_tests/Makefile.am : added some missing files to EXTRA_DIST
* updated tests according to the above.
* updated documentation according to the above.
* some additional minor doc fixes/clarifications

Tested on f12/x86, debian5/amd64
Comment 112 Julian Seward 2011-06-28 08:29:09 UTC
(In reply to comment #111)
> Created an attachment (id=61353) [details]
> valgrind and tool mon. cmds prefixes changes + doc fixes + new vgdb option
> --cmd-time-out

Committed, r11844.
Comment 113 Julian Seward 2011-06-28 08:30:30 UTC
Closing.  Any followup changes can go in new bug reports of their own.