Description
Paul Pluzhnikov
2009-11-17 02:29:02 UTC
Created attachment 42213 [details]
patch that implements a gdbserver in valgrind
patch on valgrind revision 11096, VEX revision 1964.
Attached a patch based on revision 11096 VEX revision 1964 Example of usage: valgrind --vgctl-check=500 --vgctl-errors=0 ./prog gdb ./prog target remote | vgctl After that, info thread, break, next, continue, C-c, ... works. Known limitations: 1 cannot attach to a process blocked in system calls 2 does not support the gdb "handle" command 3 does not support calling code e.g. p hello_world() Unknown limitations: none, otherwise they would be in the previous section :) More seriously: only limited tests done with a small test program + firefox on fedora12 x86 32 bits a previous version of this patch was compiled and moderately tested on fedora12 amd64, on ubuntu x86+amd64 ppc32 and ppc64 not done yet. AIX and Darwin not done yet. (I have no access to such systems). I will try to enhance the patch in the recent future (e.g. to release limitations 2 and 3). More info in file coregrind/m_gdbserver/README_DEVELOPPERS Any feedback/suggestions/criticism/... welcome Created attachment 42215 [details]
patch that implements a gdbserver in valgrind
Previous attachment attached with a wrong content type
I applied this to svn valgrind today and gave it a try. I'm using x86 Fedora 11. The modified valgrind crashed when I ran it on gdb: valgrind: m_hashtable.c:222 (vgPlain_HT_ResetIter): Assertion 'table' failed. ==28597== at 0x38028D85: report_and_quit (m_libcassert.c:191) ==28597== by 0x38028F57: vgPlain_assert_fail (m_libcassert.c:265) ==28597== by 0x380270D4: vgPlain_HT_ResetIter (m_hashtable.c:222) ==28597== by 0x3806580E: vgPlain_is_watched (m_gdbserver.c:278) ==28597== by 0x38022FE2: vgMemCheck_record_address_error (mc_errors.c:835) ==28597== by 0x38014B9B: mc_LOADVn_slow (mc_main.c:1233) ==28597== by 0x38014D7E: vgMemCheck_helperc_LOADV32le (mc_main.c:4150) ==28597== by 0x64131A22: ??? Also, it would be nice if vgctl supported the extended-remote mode, so in gdb I could: (gdb) target extended-remote | vgctl (gdb) run ... and have a new inferior start up under valgrind. It may even be worthwhile to have a special "target valgrind" in gdb. That way we could support I/O redirection and other things not supported (AFAIK) by extended-remote. > valgrind: m_hashtable.c:222 (vgPlain_HT_ResetIter): Assertion 'table' failed. I have reproduced the bug in the following circumstances: run valgrind without activating gdbserver on an executable which dereferences a non-addressable address. A bypass is to give a --vgctl-check= argument. The proper fix is to add the following 2 lines at the beginning of the function VG_(is_watched) (in m_gdbserver.c): if (!gdbserver_called) return False; > Also, it would be nice if vgctl supported the extended-remote mode, > so in gdb I could: > > (gdb) target extended-remote | vgctl > (gdb) run > > ... and have a new inferior start up under valgrind. It looks not trivial or not possible to support extended-remote mode. I understand the extended-remote allows to start a gdbserver and tell it afterward to attach to a process or to start an executable. But as the gdbserver is embedded inside valgrind, there is no gdbserver before you have started valgrind. Even if it would be possible to run valgrind without specifying an executable to run, gdbserver would then have to indicate to its "caller" which executable to run. > > It may even be worthwhile to have a special "target valgrind" > in gdb. That way we could support I/O redirection and other > things not supported (AFAIK) by extended-remote. What do you mean exactly by I/O redirection and other things not supported by extended-remote ? (In reply to comment #5) > It looks not trivial or not possible to support extended-remote mode. Oops, yeah, sorry about that. What I'm really looking for is a way to make it very easy for gdb users to run their program using valgrind without leaving gdb. Something like "target valgrind; run". Thanks for the follow-up patch. I tried it and it seems to work great! I think this is very cool and useful. I did run into a few more bugs. 1. In one window I ran valgrind --vgctl-check=1 ..../gdb In other window I ran plain "gdb" then "target remote | vgctl". This fails to do anything useful: (gdb) target remote | vgctl Remote debugging using | vgctl relaying data between gdb and process 14124 0x0093f37b in ?? () I have to tell gdb about the file, like "gdb .../gdb" before setting the target; otherwise gdb can't seem to figure out what valgrind is running. In this situation, if I then exit gdb, valgrind just uses a lot of CPU but otherwise does nothing. I have to kill it from the command line. 2. If I kill a valgrind which is hung in this way, it leaves its fifo in /tmp. Then gdb complains at startup: target remote | vgctl(gdb) target remote | vgctl Remote debugging using | vgctl read directory: No such process vgctl error: reading directory /tmp/ for vgctl fifo This doesn't appear to affect operation, but it is potentially confusing. 3. Another way to wedge valgrind is to "detach". The valgrind process reports: Detaching from process 14136 ... which is a little weird, since it is itself process 14136. Again the only recourse is to kill valgrind. I should mention that I too played with Philippe's patch, and I think it will make VG 10x more useful when debugging non-obvious bugs. I urge core VG developers to help get this patch into shape for submission! (In reply to comment #7) > 1. In one window I ran valgrind --vgctl-check=1 ..../gdb > In other window I ran plain "gdb" then "target remote | vgctl". > This fails to do anything useful: From what I can see, this is a basic limitation of gdb+gdbserver: when trying with the non embedded gdbserver, gdb also can't figure out the executable. > In this situation, if I then exit gdb, valgrind just uses a lot > of CPU but otherwise does nothing. I have to kill it from the > command line. I suspect that everything works properly (i.e. valgrind continues to execute the program) but *extremely* slowly due to the --vgctl-check=1 argument: this means that every valgrind scheduler return causes a check to see if gdbserver has something to do. This check is done by a select system call. So, basically, what you have is a select system call every few instructions. Try the same with e.g. --vgctl-check=5000. Note that I intend to replace this use of select by checking a counter in a piece of shared memory between vgctl and valgrind. The check will then be quite cheap. > vgctl error: reading directory /tmp/ for vgctl fifo I will fix this. This is effectively harmless, it is just wrong error reporting. > > 3. Another way to wedge valgrind is to "detach". The valgrind > process reports: > > Detaching from process 14136 > > ... which is a little weird, since it is itself process 14136. > Again the only recourse is to kill valgrind. I think this is also caused by the too small value of --vgctl-check. Can you retry with a bigger value ? (e.g. 5000) ? Thanks for reporting these bugs. Created attachment 42319 [details]
patch that implements a gdbserver in valgrind
new version of the patch
* fixing the problems reported by Tom
* replaced polling using "poll" by looking at a shared counter between vgctl
and valgrind
* implemented calling functions from gdb
e.g. p printf("hello from gdb\n") works
!!!! if VG detects errors during this call and so calls back
gdbserver, it can sometimes go bezerk
so it is better to first do
mo vg.s v 100000
tested on fedora 12 x86, somewhat tested on fedora 12 amd64
> From what I can see, this is a basic limitation of gdb+gdbserver: when
> trying with the non embedded gdbserver, gdb also can't figure out the
> executable.
I am not very familiar with gdbserver or the remote protocol, but it
seems to me that if we can transfer the shared library list, then we
can somehow also transfer the executable's name -- at the very least via
a new remote protocol extension.
The new patch does fix the bugs I found. Thanks!
I found a couple more problems.
First, if I valgrind --vgctl-error=1, then when valgrind hits the
first error I can't interrupt it from the tty. I think maybe C-c
should still kill it.
Second, if I use valgrind --vgctl-error=0, I get a valgrind crash
when I "target remote | vgctl" and then "cont" in gdb:
opsy. valgrind --vgctl-error=0 ~/gnu/baseline-gdb/build/gdb/gdb
==30105== Memcheck, a memory error detector
==30105== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==30105== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==30105== Command: /home/tromey/gnu/baseline-gdb/build/gdb/gdb
==30105==
IR SANITY CHECK FAILURE
IRSB {
t0:I32 t1:I32 t2:I32 t3:I32 t4:I32 t5:I32 t6:I32 t7:I32
t8:I32 t9:I32 t10:I32 t11:I32 t12:I32 t13:I32 t14:I32 t15:I32
t16:I32 t17:I32 t18:I32 t19:I32 t20:I32 t21:I32 t22:I1 t23:I32
t24:I32 t25:I32 t26:I32 t27:I1 t28:I32 t29:I32 t30:I32 t31:I32
t32:I1 t33:I32 t34:I32 t35:I32 t36:I1 t37:I32 t38:I32 t39:I1
t40:I32 t41:I32 t42:I32 t43:I32 t44:I1 t45:I32 t46:I32 t47:I1
t48:I32 t49:I32 t50:I32 t51:I1 t52:I32 t53:I32 t54:I1 t55:I32
t56:I1 t57:I1 t58:I32 t59:I32 t60:I32 t61:I32 t62:I32 t63:I32
t64:I32 t65:I1 t66:I1 t67:I32 t68:I32 t69:I32 t70:I32 t71:I32
t72:I32 t73:I1 t74:I32
------ IMark(0x8F16A1, 6) ------
PUT(60) = 0x8F16A1:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8F16A1:I32)
t21 = GET:I32(60)
t22 = CmpNE32(0x8F16A1:I32,t21)
if (t22) goto {Boring} 0x3C:I16
t23 = GET:I32(356)
t10 = GET:I32(20)
t24 = Or32(t23,0x0:I32)
t25 = Left32(t24)
t26 = t25
t9 = Add32(t10,0xFFFFFF5C:I32)
t27 = CmpNEZ32(t26)
DIRTY t27 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t28 = 0x0:I32
t29 = DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_LOADV32le)[rp=1]{0x38014d40}(t9)
t30 = t29
t11 = LDle:I32(t9)
PUT(364) = t30
PUT(28) = t11
------ IMark(0x8F16A7, 7) ------
PUT(60) = 0x8F16A7:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8F16A7:I32)
t31 = GET:I32(60)
t32 = CmpNE32(0x8F16A7:I32,t31)
if (t32) goto {Boring} 0x3C:I16
PUT(60) = 0x8F16A7:I32
t33 = Or32(t30,0x0:I32)
t34 = Left32(t33)
t35 = t34
t12 = Add32(t11,0xC:I32)
t36 = CmpNEZ32(t35)
DIRTY t36 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t37 = 0x0:I32
DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_STOREV32le)[rp=2]{0x38014690}(t12,0x0:I32)
STle(t12) = 0x1:I32
------ IMark(0x8F16AE, 5) ------
PUT(60) = 0x8F16AE:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8F16AE:I32)
t38 = GET:I32(60)
t39 = CmpNE32(0x8F16AE:I32,t38)
if (t39) goto {Boring} 0x3C:I16
PUT(60) = 0x8F16AE:I32
t40 = GET:I32(352)
t15 = GET:I32(16)
t41 = Or32(t40,0x0:I32)
t42 = Left32(t41)
t43 = t42
t14 = Sub32(t15,0x4:I32)
PUT(352) = t43
DIRTY 1:I1 RdFX-gst(16,4) ::: track_new_mem_stack_4[rp=1]{0x38008eb0}(t14)
PUT(16) = t14
t44 = CmpNEZ32(t43)
DIRTY t44 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t45 = 0x0:I32
DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_STOREV32le)[rp=2]{0x38014690}(t14,0x0:I32)
STle(t14) = 0x8F16B3:I32
------ IMark(0x8FDC60, 1) ------
PUT(60) = 0x8FDC60:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC60:I32)
t46 = GET:I32(60)
t47 = CmpNE32(0x8FDC60:I32,t46)
if (t47) goto {Boring} 0x3C:I16
PUT(60) = 0x8FDC60:I32
t48 = Or32(t45,0x0:I32)
t49 = Left32(t48)
t50 = t49
t16 = Sub32(t14,0x4:I32)
PUT(352) = t50
DIRTY 1:I1 RdFX-gst(16,4) ::: track_new_mem_stack_4[rp=1]{0x38008eb0}(t16)
PUT(16) = t16
t51 = CmpNEZ32(t50)
DIRTY t51 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t52 = 0x0:I32
DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_STOREV32le)[rp=2]{0x38014690}(t16,t23)
STle(t16) = t10
------ IMark(0x8FDC61, 2) ------
PUT(60) = 0x8FDC61:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC61:I32)
t53 = GET:I32(60)
t54 = CmpNE32(0x8FDC61:I32,t53)
if (t54) goto {Boring} 0x3C:I16
PUT(356) = t52
PUT(20) = t16
------ IMark(0x8FDC63, 1) ------
PUT(60) = 0x8FDC63:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC63:I32)
t55 = GET:I32(60)
t56 = CmpNE32(0x8FDC63:I32,t55)
if (t56) goto {Boring} 0x3C:I16
PUT(60) = 0x8FDC63:I32
t57 = CmpNEZ32(t52)
DIRTY t57 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t58 = 0x0:I32
t59 = DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_LOADV32le)[rp=1]{0x38014d40}(t16)
t60 = t59
t5 = LDle:I32(t16)
t61 = Or32(t58,0x0:I32)
t62 = Left32(t61)
t63 = t62
t19 = Add32(t16,0x4:I32)
PUT(352) = t63
DIRTY 1:I1 RdFX-gst(16,4) ::: track_die_mem_stack_4[rp=1]{0x38012d60}(t19)
PUT(16) = t19
PUT(356) = t60
PUT(20) = t5
------ IMark(0x8FDC64, 1) ------
PUT(60) = 0x8FDC64:I32
DIRTY 1:I1 RdFX-gst(16,4) MoFX-gst(60,4) ::: VG_(helperc_CallDebugger)[rp=1]{0x380654e0}(0x8FDC64:I32)
t64 = GET:I32(60)
t65 = CmpNE32(0x8FDC64:I32,t64)
if (t65) goto {Boring} 0x3C:I16
PUT(60) = 0x8FDC64:I32
t66 = CmpNEZ32(t63)
DIRTY t66 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t67 = 0x0:I32
t68 = DIRTY 1:I1 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_LOADV32le)[rp=1]{0x38014d40}(t19)
t69 = t68
t8 = LDle:I32(t19)
t70 = Or32(t67,0x0:I32)
t71 = Left32(t70)
t72 = t71
t20 = Add32(t19,0x4:I32)
PUT(352) = t72
DIRTY 1:I1 RdFX-gst(16,4) ::: track_die_mem_stack_4[rp=1]{0x38012d60}(t20)
PUT(16) = t20
DIRTY 1:I1 ::: VG_(helperc_InvalidateIfNotGdbserved)[rp=1]{0x380649f0}(t8)
t73 = CmpNEZ32(t69)
DIRTY t73 RdFX-gst(16,4) RdFX-gst(60,4) ::: MC_(helperc_value_check4_fail_no_o){0x38014070}()
t74 = 0x0:I32
goto {Return} t8
}
IN STATEMENT:
if (t22) goto {Boring} 0x3C:I16
ERROR = IRStmt.Exit.dst: not :: guest word type
vex: the `impossible' happened:
sanityCheckFail: exiting due to bad IR
vex storage: T total 12935320 bytes allocated
vex storage: P total 576 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
==30105== at 0x38028DC5: report_and_quit (m_libcassert.c:191)
==30105== by 0x38028E26: panic (m_libcassert.c:275)
==30105== by 0x38028E7D: vgPlain_core_panic_at (m_libcassert.c:280)
==30105== by 0x38028E98: vgPlain_core_panic (m_libcassert.c:285)
==30105== by 0x38043E06: failure_exit (m_translate.c:674)
==30105== by 0x380CA97E: vpanic (main_util.c:237)
==30105== by 0x380CE3CC: sanityCheckFail (ir_defs.c:2245)
==30105== by 0x380CFA98: sanityCheckIRSB (ir_defs.c:2793)
==30105== by 0x380C9307: LibVEX_Translate (main_main.c:549)
==30105== by 0x38041581: vgPlain_translate (m_translate.c:1518)
==30105== by 0x38072351: vgPlain_scheduler (scheduler.c:866)
==30105== by 0x3809F3C4: run_a_thread_NORETURN (syswrap-linux.c:94)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==30105== at 0x8F16A1: dl_main (rtld.c:1627)
==30105== by 0x903DAD: _dl_sysdep_start (dl-sysdep.c:243)
==30105== by 0x8EF25B: _dl_start (rtld.c:328)
==30105== by 0x8EE856: ??? (in /lib/ld-2.10.2.so)
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
(In reply to comment #11) > > From what I can see, this is a basic limitation of gdb+gdbserver: when > > trying with the non embedded gdbserver, gdb also can't figure out the > > executable. > > I am not very familiar with gdbserver or the remote protocol, but it > seems to me that if we can transfer the shared library list, then we > can somehow also transfer the executable's name -- at the very least via > a new remote protocol extension. Yes, probably changing gdb and/or the protocol will allow to do that. I suppose that the limitation is at gdb side, in the "target remote": probably this command assumes that it knows the executable file and so does not search it on the side of gdbserver. > First, if I valgrind --vgctl-error=1, then when valgrind hits the > first error I can't interrupt it from the tty. I think maybe C-c > should still kill it. The handling of signals is still a limitation but in any case, I think this particular thing cannot be done: I think what is happening is that valgrind is blocking the signals while it is not executing client code. When the gdbserver code is called, this is valgrind code being executed. Once valgrind executes client code (so, after gdb has attached and told it to continue), the signal will be delivered and will interrupt the process. It is not clear to me there is a way to handle the "client" signals while valgrind code is running. So, I believe this will have to stay like that (but if a valgrind developper reads this, he/she might correct my interpretation) > > Second, if I use valgrind --vgctl-error=0, I get a valgrind crash > when I "target remote | vgctl" and then "cont" in gdb: ... > if (t22) goto {Boring} 0x3C:I16 > > ERROR = IRStmt.Exit.dst: not :: guest word type That was a real mystery to me, till I checked the patch content. I thought "svn diff" in valgrind directory would give the diffs for valgrind and VEX, but it only gives the diff for valgrind, not for the VEX subdir. So, I will attach a new patch with the VEX part included. I am amazed that the half-patch is working at all. Thanks for the tests Created attachment 42364 [details]
patch that implements a gdbserver in valgrind (with VEX parts now)
no functional changes compared to previous patch, only including VEX part
+ better msg in call to sanityCheckFail.
Thanks, this patch is much better :-) I have been thinking a little about how to make the gdb experience nicer. In particular I am thinking of adding a new "target valgrind", which will reuse the remote code under the hood. With a real "target valgrind", we could support "attach" and "run" properly. I think the latter may require a small change to vgctl and valgrind (to make it possible to run vgctl and have it start valgrind and attach to it directly). I'd appreciate feedback on this idea. I could start work on it next week. (In reply to comment #14) > With a real "target valgrind", we could support "attach" > and "run" properly. I think the latter may require a > small change to vgctl and valgrind (to make it possible > to run vgctl and have it start valgrind and attach to it > directly). > > I'd appreciate feedback on this idea. I could start work > on it next week. If gdb could somewhat better understand the valgrind target, that would be an excellent idea. Note however that up to now, there was no feedback on this work from the (real :) valgrind developpers. I think that *if* the approach/patch is found relevant, that there will be some serious additional work to do before gdbserver can be added in valgrind code basis: * some bugs to chase and/or nice to have and/or mandatory : 1. a break is "encountered" twice if you 'print hello_world()' from gdb when the break is encountered. This has something to do with the addition of breaks by gdb at a place which is not an instruction boundary. I am busy investigating that. 2. if you 'print hello_world()' from gdb, and the execution of this code gives valgrind errors to be reported, then the whole stuff crashes. The 'mo vg.s v 1000000' bypass works (seems to work?) but is quite horrible. Understanding why it crashes would be a good thing. 3. Mandatory: the tests are for the moment quite basic and manual: only documented in the README_DEVELOPPERS. IMO, we must have automatic tests for all that. 4. Mandatory: writing documentation. 5. Signal handling is "nice to have". So, the work to adapt gdb to a valgrind gdbserver which has not yet been added in valgrind might be somewhat premature :). But for sure, a good thing to put in the "to do list", so here it is: 6. Have a better valgrind target support in gdb, allowing e.g. run or attach. At this phase of the work, have feedback from the valgrind masters (in particular, on the changes in VEX, scheduler.c, aspacemgr.c and the global approach) would be an encouraging sign to start updating gdb. Philippe I am reading all the discussion, but I was waiting for some indication that the work was complete before reviewing it and there seems to still be active discussion. It sounds like there is stuff here that will need Julian's input as well though. (In reply to comment #16) > I am reading all the discussion, but I was waiting for some indication that the > work was complete before reviewing it and there seems to still be active > discussion. The work is for sure not yet complete (see todo list in previous post). But I think it is in a state at which feedback would be very useful. Note that fully completing the work implies to have access to Darwin/AIX/ppc/(arm?) platforms. I only have access to an x86/fedora and an intermittent access to an amd64/fedora. Could someone please add a short summary of what a GDB remote debug server stub does, and what basic functionality is required of the process/system into which the stub is embedded? I know nothing about this. And in particular I'd like to know how the stub interacts with Valgrind's basic control flow for the simulated program. I ask because I would like to figure out if this breaks any underlying structural assumptions in Valgrind's virtual cpu, or in its handling of threads, syscalls and signals. (In reply to comment #18) > I ask because I would like to figure out if this breaks any underlying > structural assumptions in Valgrind's virtual cpu, or in its handling > of threads, syscalls and signals. Ultra short answer: when valgrind wants to be debugged by a gdb (typically when an error is reported by m_errormgr.c), it does read system calls to get commands from gdb, and gives replies using write system call. At selected places in the valgrind code, some calls to gdbserver are inserted to ask gdb what to do for some events (e.g. a signal). Longer answer (the below is information which is in README_DEVELOPPERS of the next patch version I will put here) Functional and technical description of gdbserver in valgrind ------------------------------------------------------------- gdb debugger typically is used to debug a process running on the same machine : gdb uses system calls (such as ptrace) to fetch data from the process being debugged or to change data in the process or interrupt the process or ... gdb can also debug processes running in a different computer (e.g. it can debug a process running on a small real time boards). gdb does this by sending some commands (e.g. using tcp/ip) to a piece of code running on the remote computer. This piece of code (called a gdb stub in small boards, or gdbserver when the remote computer runs an OS such as GNU/linux) will provide a set of commands allowing gdb to remotely debug the process. Examples of commands are: "get the registers", "get the list of running threads", "read len bytes at address xxxx", etc. The definition of all these commands and the associated replies the gdb remote serial protocol, which is documented in Appendix D of gdb user manual. gdb has a standalone gdbserver (a small executable) which implements this protocol and the needed system calls to allow gdb to remotely debug process running on a linux or MacOS or ... The gdbserver code has been modified so as to link it with valgrind and allow the valgrind guest process to be debugged by a gdb speaking to this gdbserver embedded in valgrind. The gdbserver functionality is activated with valgrind command line options. If these options are not given, then the impact on valgrind runtime is minimal: basically just check the command line option to see that there is nothing to do for what concerns gdb server: there is a "if gdbserver is active" check in the instrument function of the tool (if the tool is made "breakable and stepable", and in the valgrind scheduler. The gdbserver can be activated of two (non-exclusive) manners: * --vgdb-error=xxxx allows gdb to debug the guest when an error is reported The xxxx gives the error nr from which valgrind will call the gdbserver code. This allows to debug either right from the begin (give 0 as xxxx) or from a specific error onwards. and/or * --vgdb-check=yyyyy allows gdb to "interrupt" the guest so as to debug it. The scheduler will check every yyyyy blocks if gdbserver wants to interrupt the guest to debug it. Note that the check is not doing a system call: it verifies a counter in shared memory. Whenever gdb writes something to valgrind, this counter is incremented. The same mechanism can be used without gdb: a user request (e.g. "search leaks", "dump callgrind file", ... can be given to the guest using a small standalone executable. How does gdbserver code interacts with valgrind ? When an error is reported, the gdbserver code is called. It reads commands from gdb using read system call on a FIFO (e.g. a command such as "get the registers"). It executes the command (i.e. fetches the registers from the guest state) and writes the reply (a packet containing the register data). When gdb instructs gdbserver to "continue", the control is returned to valgrind, which then continues to execute guest code. How are signals "handled" ? When a signal is to be given to the guest, valgrind first calls gdbserver. If gdb instructs to give the signal to the process, the signal is delivered to the guest. Otherwise, the signal is ignored (not given to the guest). How are "break/step/stepi/next/..." implemented ? When a break is put by gdb on an instruction, a command is sent to the gdbserver in valgrind. This causes the basic block of this instruction is re-instrumented so as to insert calls to dirty helper which call the gdb server code. When a block is instrumented for gdbserver, all the "jump targets" of this block are invalidated, so as to allow step/stepi/next to properly work: these blocks will themselves automatically be re-instrumented for gdbserver if they are jumped to. The embedded gdbserver will read gdb commands on a named pipe having (by default) the name /tmp/vgdb-pipe-from-gdb-to-%d where %d will be replaced by the pid. The embedded gdbserver will reply to gdb commands on a named pipe /tmp/vgdb-pipe-to-gdb-from-%d gdb does not speak directly with gdbserver in valgrind: a relay application called vgdb is needed between gdb and the valgrind-ified process. gdb writes commands on the stdin of vgdb. vgdb reads these commands and writes them on FIFO /tmp/vgdb-pipe-from-gdb-to-%d. vgdb reads replies on FIFO /tmp/vgdb-pipe-to-gdb-from-%d and writes them on its stdout. Note: The solution of named pipes was preferred to tcp ip connections as it allows a discovery of which valgrind-ified processes are ready to accept command by looking at the /tmp. Also, the usual unix protections are protecting the valgrind process against other users sending commands. gdb side is all setup up by the following gdb command: (gdb) target remote | vgdb --pid=1234 assuming the valgrind-ified process you want to debug/attach to has 1234 as pid. If there is only one valgrind-ified process to be debugged, vgdb can find it so it is enough to do: (gdb) target remote | vgdb Implementation is based on the gdbserver code from gdb ------------------------------------------------------ This directory and sub-directories contain files needed for a gdbserver embedded in valgrind. This embedded gdbserver allows a valgrind-ified process to be debugged with an external gdb. The gdbserver implementation is derived from the gdbserver included in the gdb distribution. The modifications to the gdbserver are quite trivial and have been done so as to minimise the differences implied for integrating gdbserver in valgrind. This allows either to include the changes in the official gdbserver upstream or at least allows an easier comparison (and merge) between a newer gdb gdbserver and the version in valgrind. Such minimal changes have been ensured by using the preprocessor to replace calls by valgrind equivalent, e.g. #define read(...) VG_(read) (...). Minimal changes have also be ensured by using "goto" at 3 places, rather than split a big function in different functions. The only non-trivial changes are due to the fact that gdbserver inside valgrind must return the control to valgrind when the 'debugged' process has to run, while in a classical gdbserver usage, the gdbserver process waits for a debugged process to stop on a break or similar. This has implied to have some variables to remember the state of gdbserver before returning to valgrind (search for int resume_phase in server.c) and "goto" the place where gdbserver expects a stopped process to return control to gdbserver. The above allowed to obtain relatively easily a working gdbserver stub in valgrind with relatively few modifs in the gdbserver code itself. However, this has also revealed that there is quite some significant infrastructure framework useful for the standalone gdbserver of gdb, but which is not useful for a gdbserver embedded in valgrind. So, I will see if we can remove some big pieces of code which provides things not useful for valgrind. Created attachment 42850 [details]
patch that implements a gdbserver in valgrind
Functional changes:
* signal handling properly implemented (i.e. gdb "handle" command)
* vgctl replaced everywhere by vgdb (shorter to type and more clear)
* info threads gives additional info about the state of valgrind threads
* massif has been made breakable/stepable + use command to take a snapshot
* gdb can look and modify the shadow registers
(this implies to have the svn version of gdb)
> * --vgdb-check=yyyyy
> allows gdb to "interrupt" the guest so as to debug it.
> The scheduler will check every yyyyy blocks if gdbserver
> wants to interrupt the guest to debug it.
> Note that the check is not doing a system call: it verifies
> a counter in shared memory. Whenever gdb writes something
> to valgrind, this counter is incremented.
> The same mechanism can be used without gdb:
> a user request (e.g. "search leaks", "dump callgrind file", ...
> can be given to the guest using a small standalone executable.
Ah, so gdbserver is another feature needing to handle asynchronous
requests from the outside. You found a slightly better way to handle
this than callgrind does now (using shared memory without system call).
However, this does not solve my basic issue with the current
implementation: if VG is sleeping in some system call, no BBs
are executed, thus VG is unable to react on user requests from
the outside.
Perhaps it is possible to abuse a real time signal for this together with
incrementing of the counter? If the counter is not increased, the signal
would be forwarded to the client application.
(In reply to comment #21) > Ah, so gdbserver is another feature needing to handle asynchronous > requests from the outside. You found a slightly better way to handle > this than callgrind does now (using shared memory without system call). > > However, this does not solve my basic issue with the current > implementation: if VG is sleeping in some system call, no BBs > are executed, thus VG is unable to react on user requests from > the outside. Effectively, there is no big difference between the callgrind way to be "interrupted" and what is used for gdbserver. The two main differences are: * the polling done in scheduler is just checking a counter in shared mem * the mechanism is done at a central place, allowing "user commands" to be written to any tool (via the tool "handle_client_request") (currently, I have added external command handling in memcheck and massif). > > Perhaps it is possible to abuse a real time signal for this together with > incrementing of the counter? If the counter is not increased, the signal > would be forwarded to the client application. Currently, there is only one counter: the counter is increased by gdb (more precisely by the vgdb relay application) and is only looked by valgrind scheduler. Sending a signal to wake up the process if it does not react fast enough would imply to have two counters: one written by vgdb and read by valgrind scheduler, the other counter written by valgrind scheduler and read by vgdb. This is quite easy to do. The tricky part looks to be the signal: if vgdb sends a signal to valgrind, it will interrupt the "user system call". Valgrind has then to ensure that the system call is properly restarted. At least for the initial implementation of gdbserver, I wanted to limit the interaction between valgrind and gdbserver to the strict mininum. In particular, signal handling looks somewhat adventurous. Note that if we can manage to wake up valgrind with a signal, I am not sure we still need the shared memory stuff. But clearly, the "blocked in system call" is an annoying limitation. If/when this gdbserver stuff is integrated in valgrind, then it would be a good idea to try to make the "interrupt the tool" better. Created attachment 43068 [details]
patch that implements a gdbserver in valgrind
Functional changes:
* gdb/vgdb can now properly "interrupt" a valgrind process that has
all its threads blocked in a syscall (NB: this is not using a signal
so guest system calls should not be influenced/EINTR-upted
* callgrind user commands to dump/zero callgrind statistics
(so, Josef, you might try to dump stats using:
vgdb ct.dump
or use vgdb ct.zero to zero_all_cost)
and that should properly dump the stats of a syscall blocked process)
(you need a recent linux kernel to have the above working.
See coregrind/m_gdbserver/README_DEVELOPPERS for some more info)
Created attachment 43412 [details]
patch for a gdbserver in valgrind ("1.0 release ?")
Functionally, I think this patch version can be a "1.0" release.
(see below a short description of the latest changes).
Technically, it has been tested using small tests, and on two
big applications (a.o. firefox 3.5).
If found relevant to integrate in valgrind, to improve the technical
aspects, it would be nice to have
* feedback about the GPL3 versus GPL2 (to see if I must or not
rework on the basis of gdbserver 6.6).
* a review (e.g. of the changes in the valgrind tools
and coregrind). Review of the gdbserver protocol files is probably
less critical : I will further cleanup all this once the GPL3/GPL2
is clarified.
* feedback about the idea to automate the gdb/valgrind tests in vgreg_test
using the perl expect module (see unanswered post on the 13 April
on valgrind developpers list).
--------------------------------------
User visible changes in the latest patch:
* valgrind vgdb args somewhat reworked
--vgdb=no|yes activate external command and gdb server? [no]
--vgdb-poll=<number> poll for gdb/vgdb every <number> basic blocks [5000]
--vgdb-error=<number> wait for gdb/vgdb after <number> errors [1]
--vgdb-shadow-registers=no|yes let gdb see the shadow registers [no]
--vgdb-prefix=<prefix> prefix for vgdb FIFOs [/tmp/vgdb-pipe]
* default length value of 1 byte for mc user memory related commands
(e.g. get_vbits)
* mc.check_memory now describes the address being checked
Most significant technical changes
* simplified the interface to instrument for gdbserver: only one
function call needed in the tool instrument function.
* replaced the read/write threads in vgdb.c by poll system call.
* various bug fixes
I've had a (fairly) quick look over the core stuff - that is to say the changes to coregrind mainly but not including the actual m_gdbserver code. Some major questions: * I don't know what Julian thinks, but I think the idea of a "string request" is probably a bit too generic - a "gdb command" client request might be better. * I think we need to think about the select/poll interface in m_libcfile as the select interface in particular is a pretty kludgy clone of the system header files. In fact, is it even used? Only GDBSERVER_USE_POLL seems to be defined which makes everything use poll instead of select? * I'm not sure exposing VG_(vgdb_next_poll) outside the scheduler is a good idea - maybe have VG_(disable_vgdb_polling) and VG_(force_vgdb_poll) routines in scheduler instead to hide details? * What do the changed in VG_(show_all_errors) have to do with gdb support? * Test is is_sig_ign should be reversed as calling gdb to ask about the signal is much more expensive that checking a local variable. Minor points: * Both VG_(clo_vgdb_shadow_registers) and VGDB_DEFAULT are defined in pub_tool_options.h but neither is used by any tools so pub_core_options.h would be more appropriate. * VGDB_DEFAULT should probably be VG_CLO_VGDB_PREFIX_DEFAULT or something. * You seem to have removed the full stop from the end of the comment that appears before VG_(strncpy_safely) in pub_tool_libcbase.h * I'm not sure it's necessary to explain VG_(strtok) and VG_(strtok_r) in quite such detail in pub_tool_libcbase.h if they behave the same as the normal C library versions. * The "is this the good place" comment in valgrind.h needs to be removed. * If VG_(mknod) needs to be able to return error details it should return a SysRes object, otherwise just -1 on any error. * There seems to be no need for VG_(fd_open) as it just calls VG_(open). * Both VG_(strtoull10) and VG_(sttrtoull16) should return ULong not UWord so that they match the unsigned versions. * In VG_(synth_sigill) etc there is no need to have the resume_scheduler call in both branches of the if - just put it before the if. (In reply to comment #25) > > * I don't know what Julian thinks, but I think the idea of a "string request" is > probably a bit too generic - a "gdb command" client request might be better. VG_USERREQ__STRING_REQUEST = 0x1202, /* GDBTD??? is this the good place */ could be changed to VG_USERREQ__GDB_MONITOR_COMMAND = 0x1202, > * I think we need to think about the select/poll interface in m_libcfile as > the select interface in particular is a pretty kludgy clone of the system > header files. In fact, is it even used? Only GDBSERVER_USE_POLL seems to > be defined which makes everything use poll instead of select? 1st implementation was based on select, was replaced by poll as the select gave problem on a "google modified ubuntu" (which has a huuuuge nr of fd). As poll system call is available on MacOS, I will remove the select interface and the GDBSERVER_USE_POLL config symbol. > * I'm not sure exposing VG_(vgdb_next_poll) outside the scheduler is a good > idea - maybe have VG_(disable_vgdb_polling) and VG_(force_vgdb_poll) routines > in scheduler instead to hide details? Good idea, I will look at this. > * What do the changed in VG_(show_all_errors) have to do with gdb support? The vgdb/gdb monitor command "vg.info all_errors" is implemented by calling VG_(show_all_errors). But this function was up to now only called once (at the end of the run of VG). If called multiple times (interactively in gdb or by vgdb), after the 1st call, VG_(show_all_errors) does not output the errors anymore. The changes done allows VG_(show_all_errors) to be called more than once by ensuring the call has no side effect on the errors. > * Test is is_sig_ign should be reversed as calling gdb to ask about the signal > is much more expensive that checking a local variable. I will reverse the condition order. (note that unless a gdb is connected and is really interested in seeing a specific signal, the call to VG_(gdbserver_report_signal) will not do any interaction with gdb). > Minor points: All minor points looks clear to me, except the below. > * In VG_(synth_sigill) etc there is no need to have the resume_scheduler > call in both branches of the if - just put it before the if. The call to resume_scheduler will (at least sometimes) do a longjmp. So I think the signal check with gdbserver must be done before the call to resume_scheduler. But re-reading m_signals.c, I am not sure I understand how all this works: sometimes, we first have a call to deliver_signal, followed by resume_scheduler, and sometimes it is the other way around. I do not understand the logic for choosing one order or the other. Created attachment 49953 [details]
new version : based on GPLv2+ code, handled Tom Hughes comments
new version of the patch
* handled Tom Hughes comments
* done GPLv2+ switch asked by Julian (so, based on gdb 6.6 code)
* some code clean up and fixes
Created attachment 58192 [details]
gdbserver patch, with regression tests and user manual
New version of the gdbserver patch (most important changes are regression tests, user manual (read e.g. sections 2.7, 2.8, 2.9, and the 'tool monitor commands' e.g. 4.6). Regression tests have been run on a fedora 12 32 bits, fedora 14 64 bits, ubuntu 10.10 32 bits, red-hat 5.2 64 bits. Tests done with gdb 7.2, some tests redone with gdb 7.0. A previous version was compiled and manually tested on MacOS (regression tests not run). ARM, s390x, ppc32/ppc64 : some additional work needed to make it work (mostly a "switch" mapping gdb registers to valgrind registers + dummy call push in vgdb). Read user manual and coregrind/m_gdbserver/README_DEVELOPPERS and gdbserver_tests/README_DEVELOPPERS for more info. How to use the patch: mkdir patch_trial cd patch_trial svn co -r 11657 svn://svn.valgrind.org/valgrind/trunk valgrind cd valgrind patch -p0 -i /tmp/patch.txt ./autogen.sh ./configure --prefix=`pwd`/install make 2>&1 | tee m.out cd docs make html-docs cd .. make install 2>&1 | tee -a m.out # can use the installed valgrind having gdbserver patch e.g. install/bin/valgrind --help # and read the installed doc: firefox install/share/doc/valgrind/html/index.html # e.g. Valgrind gdbserver related user manual sections 2.7, 2.8, 2.9, # and the 'tool monitor commands' e.g. 4.6 # to run the new gdbserver_tests, empty files are not # properly created by svn diff + patch (bug in patch for empty files) # So, create them by touch all the below files (possibly svn add them) cd gdbserver_tests touch \ ./mcbreak.stderr.exp \ ./mchelp.stderr.exp \ ./mcinfcallRU.stderrB.exp \ ./mcinvokeRU.stderr.exp \ ./mcinvokeWS.stderr.exp \ ./mcsignopass.stderrB.exp \ ./mcsigpass.stderrB.exp \ ./mcvabits.stderr.exp \ ./mcwatchpoints.stderrB.exp \ ./mssnapshot.stderr.exp \ ./nlcontrolc.stderrB.exp # it is also needed to make some files executables # (possibly svn propset the executable property) chmod +x filter_* make_local_links simulate_control_c invoker cd .. make regtest 2>&1 | tee r.out Created attachment 58193 [details]
patch for s390x compile
Patch that is needed to get a successful compilation on s390x.
Created attachment 58809 [details]
gdbserver+test+documentation. x86+amd64+arm+ppc32+ppc64+s390x
* ported + tested on linux + x86/amd64/arm/ppc32/ppc64/s390x
* Darwin: not compiled recently, never reg-tested.
basic functionalities should work (were tested manually)
Created attachment 59040 [details]
Upgraded to rev 11700 + compiles on Darwin + small changes (e.g. new ppc registers, doc, ..)
All linux platforms should be ok.
Darwin: it compiled on Darwin 10.7.0.
I had to disable the fixup_macho_loadcmds which crashed and seemed
not necessary (compiled with gcc on darwin):
link_tool_exe_darwin: ../coregrind/fixup_macho_loadcmds 0x134000000 0x800000 memcheck-amd64-darwin
fixup_macho_loadcmds: requested stack_addr (top) 0x134000000, stack_size 0x800000
fixup_macho_loadcmds: examining tool exe: memcheck-amd64-darwin
fixup_macho_loadcmds: fail: unexpected load command in Mach header
Limited manual test of gdbserver shows it seems ok.
However, many gdbserver reg tests are not ok on Darwin (behaviour differs
from linux systems for e.g. stack trace in gdb) but I have only a very
limited access to a Mac, so not time to update the tests for Darwin.
(In reply to comment #32) > Created an attachment (id=59040) [details] > Upgraded to rev 11700 + compiles on Darwin This is a big changeset. I started out by looking at only the changed files, not the new ones, by applying the patch to trunk, then doing "svn diff -rHEAD". Here are some comments related to the changed files only. I did not yet look at the new files. Larger concerns --------------- Mostly looks OK. My only large concern is that it looks like this patch also adds some Memcheck leak checker functionality and therefore is not directly related to implementing a GDB server. Can you clarify about this? If this does add leak checker functionality, it would be better to add that as a separate patch/bug report that can be reviewed/applied separately. Smaller comments ---------------- callgrind/main.c #include <pub_tool_threadstate.h> +#include "pub_tool_gdbserver.h" first one looks like it was always wrong -- < > vs " " (no problem w/ the patch) mc_main.c: why did parse_range get globalised (--> VG_(parse_range) ) ? probably need to rename parse_range to something better mc_get_or_set_vbits_for_client: check this isn't performance critical (on load/store/sarp paths) (I don't think it is) change param name client_request to is_client_request +static Bool handle_gdb_monitor_command (ThreadId tid, Char *req) +{ + Char* wcmd; + Char s[VG_(strlen(req))]; /* copy for strtok_r */ dangerous; possible DOS if strlen(req) is huge? overflow check? + case VG_USERREQ__GDB_MONITOR_COMMAND: { + /* we need to avoid calling gdbserver when handling the + gdb_monitor_command so we temporarily disable clo_vgctl_error + effect */ + Int save_clo_vgdb_error = VG_(clo_vgdb_error); + VG_(clo_vgdb_error) = 999999999; + Bool handled = handle_gdb_monitor_command (tid, (Char*)arg[1]); + VG_(clo_vgdb_error) = save_clo_vgdb_error; seems like a nasty kludge (passing a parameter as a global variable?) can it be changed to be a parameter to handle_gdb_monitor_cmd ? memcheck/mc_translate.c: changes are unnecessary (look at the diff!) include/pub_tool_libcbase.h new fn VG_(memmem) ? what does it do?? At least add some comment. rm __builtin_expect, use LIKELY coregrind/m_aspacemgr.c Why are changes needed here? m_aspacemgr is quite fragile; I prefer not to change it unless absolutely necessary. coregrind/m_scheduler + /* do vgdb initialization (but once). Only the first (main) task + starting up will do the below. */ + if (!vgdb_startup_action_done) { Urr .. couldn't this initialisation be moved into one of the functions that already exist, to initialise the scheduler? VG_(scheduler_init_phase1) or VG_(scheduler_init_phase2) coregrind/m_libcproc.c +Int VG_(prctl) (Int option, + ULong arg2, ULong arg3, ULong arg4, ULong arg5) What's prctl used for (in the GDB server)? IOW, why do we need it? (In reply to comment #33) > Larger concerns > --------------- > > Mostly looks OK. My only large concern is that it looks like this > patch also adds some Memcheck leak checker functionality and therefore > is not directly related to implementing a GDB server. Can you clarify > about this? If this does add leak checker functionality, it would be > better to add that as a separate patch/bug report that can be > reviewed/applied separately. An objective of the gdbserver is to provide an interactive usage of Valgrind. Interactive leak "delta" searching (at least on big applications) is only usable when it provides a "delta leak search", or "delta malloc" i.e. showing new leaks or new malloc since the previous leak search (see user manual 4.6 Memcheck Monitor Commands). The leak search algorithm itself is not touched, only the reporting of results. But if you deem it necessary, it can for sure be splitted and looked at it later (but it implies to remove the first arg of mc.leak_check command). > callgrind/main.c > #include <pub_tool_threadstate.h> > +#include "pub_tool_gdbserver.h" > > first one looks like it was always wrong -- < > vs " " > (no problem w/ the patch) I can fix the < > (unless you prefer to do that in a separate patch). > mc_main.c: > why did parse_range get globalised (--> VG_(parse_range) ) ? > probably need to rename parse_range to something better I thought globalising together parse_Addr and parse_range was a good idea, but there is only one caller of parse_range so globalising is not needed. I can either unglobalise it or rename it VG_(parse_Addr_range). As you prefer ... > > > mc_get_or_set_vbits_for_client: check this isn't > performance critical (on load/store/sarp paths) > (I don't think it is) There are now 3 calls (non performance critical) to this function : the previous 2 calls (client user requests get/set vbits) + a new call (gdbserver mc.get_vbits monitor command). > change param name client_request to is_client_request done. > > > +static Bool handle_gdb_monitor_command (ThreadId tid, Char *req) > +{ > + Char* wcmd; > + Char s[VG_(strlen(req))]; /* copy for strtok_r */ > dangerous; possible DOS if strlen(req) is huge? overflow check? "real" DOS implies collaboration of the VG user (as the access to Valgrind gdbserver is protected using the FIFO rw permissions, which by the way is better than the "classical" gdbserver, which is not protected). gdb checks the packet length before sending it (max slightly above 8Kb). I just added a packet length check in vgdb. These commands are normally small (e.g. "mc.leak_check increased leakpossible"). > + Int save_clo_vgdb_error = VG_(clo_vgdb_error); > + VG_(clo_vgdb_error) = 999999999; > + Bool handled = handle_gdb_monitor_command (tid, (Char*)arg[1]); > + VG_(clo_vgdb_error) = save_clo_vgdb_error; > seems like a nasty kludge (passing a parameter as a global > variable?) can it be changed to be a parameter > to handle_gdb_monitor_cmd ? The idea behind that is that if a monitor command causes an error to be reported, that it does not causes gdbserver to be called again (too confusing). But all the various paths that can lead to an error being reported can't all get an additional argument. In any case, it is better to have this clo_vgb_error (somewhat nasty kludge) be centralised in a place common to all tools (which I will do). > memcheck/mc_translate.c: > changes are unnecessary (look at the diff!) Cleaned up. > include/pub_tool_libcbase.h > new fn VG_(memmem) ? what does it do?? At least add some > comment. I have removed it (a previous version of gdbserver needed a "locate substring"). > coregrind/m_aspacemgr.c > Why are changes needed here? m_aspacemgr is quite fragile; I > prefer not to change it unless absolutely necessary. This allows to map the shared memory used to communicate with vgdb.c. There is no change in the logic of the address space manager. The changes are : - allow to map a file MAP_SHARED (while it was just allowing MAP_PRIVATE) - make it visible by the core (to be called by a gdbserver file). > coregrind/m_scheduler > + /* do vgdb initialization (but once). Only the first (main) task > + starting up will do the below. */ > + if (!vgdb_startup_action_done) { > Urr .. couldn't this initialisation be moved into one of the > functions that already exist, to initialise the scheduler? > VG_(scheduler_init_phase1) or VG_(scheduler_init_phase2) From my memory: trying to init gdbserver earlier was not ok, as it would "show" to gdb a process without a valid thread. I will however look at this more in depth, and add a comment explaining the reason (or initialize it earlier if I see it is ok) > coregrind/m_libcproc.c > +Int VG_(prctl) (Int option, > + ULong arg2, ULong arg3, ULong arg4, ULong arg5) > What's prctl used for (in the GDB server)? IOW, why do we > need it? This is needed to (as best as possible) handle the recent ubuntu ptrace_scope restriction. See user manual 2.7.7 limitations of Valgrind gdbserver (the bullet about ptrace). (In reply to comment #34) > Interactive leak "delta" searching (at least on big applications) is > only usable when it provides a "delta leak search", or "delta malloc" > i.e. showing new leaks or new malloc since the previous leak search (see user > manual 4.6 Memcheck Monitor Commands). Yes .. it is useful, and something similar is proposed for Massif (bug 269067). > But if you deem it necessary, it can for sure be splitted and looked at > it later I would prefer it to be split and made into a separate patch (which for sure we should also integrate before 3.7.0). > > first one looks like it was always wrong -- < > vs " " > > (no problem w/ the patch) I'll fix this in trunk. > > mc_main.c: > > why did parse_range get globalised (--> VG_(parse_range) ) ? > > probably need to rename parse_range to something better > I thought globalising together parse_Addr and parse_range was a good idea, > but there is only one caller of parse_range so globalising is not needed. > > I can either unglobalise it Unglobalise pls. Having stuff global, that doesn't need to be global, is confusing. > > mc_get_or_set_vbits_for_client: check this isn't > > performance critical (on load/store/sarp paths) > > (I don't think it is) > There are now 3 calls (non performance critical) to this function : > the previous 2 calls (client user requests get/set vbits) > + a new call (gdbserver mc.get_vbits monitor command). ok. > > + Int save_clo_vgdb_error = VG_(clo_vgdb_error); > > + VG_(clo_vgdb_error) = 999999999; > > + Bool handled = handle_gdb_monitor_command (tid, (Char*)arg[1]); > > + VG_(clo_vgdb_error) = save_clo_vgdb_error; > > seems like a nasty kludge (passing a parameter as a global > > variable?) can it be changed to be a parameter > > to handle_gdb_monitor_cmd ? > > The idea behind that is that if a monitor command causes an error to be > reported, that it does not causes gdbserver to be called again (too > confusing). > But all the various paths that can lead to an error being reported can't all > get an additional argument. > In any case, it is better to have this clo_vgb_error (somewhat nasty > kludge) be centralised in a place common to all tools (which I will > > do). The VG_(clo_xxx) variables are intended to hold information from the command line options, set once at startup, and not changed after that. Can you do this by exporting from m_gdbserver (in pub_core_gdbserver.h) a global variable that has this role? It's not good for the clo_ variables to hold state that changes as the system runs. > > coregrind/m_aspacemgr.c > > Why are changes needed here? m_aspacemgr is quite fragile; I > > prefer not to change it unless absolutely necessary. > > This allows to map the shared memory used to communicate with vgdb.c. > There is no change in the logic of the address space manager. > The changes are : > - allow to map a file MAP_SHARED (while it was just allowing MAP_PRIVATE) > - make it visible by the core (to be called by a gdbserver file). ok. > > > > > coregrind/m_scheduler > > + /* do vgdb initialization (but once). Only the first (main) task > > + starting up will do the below. */ > > + if (!vgdb_startup_action_done) { > > Urr .. couldn't this initialisation be moved into one of the > > functions that already exist, to initialise the scheduler? > > VG_(scheduler_init_phase1) or VG_(scheduler_init_phase2) > From my memory: trying to init gdbserver earlier was not ok, as > it would "show" to gdb a process without a valid thread. > I will however look at this more in depth, and add a comment > explaining the reason (or initialize it earlier if I see it is ok) Initialize earlier is my preference. Does it work if you put it in VG_(scheduler_init_phase2) ? I think it should. > > coregrind/m_libcproc.c > > +Int VG_(prctl) (Int option, > > + ULong arg2, ULong arg3, ULong arg4, ULong arg5) > > What's prctl used for (in the GDB server)? IOW, why do we > > need it? > > This is needed to (as best as possible) handle the recent ubuntu > ptrace_scope restriction. > See user manual 2.7.7 limitations of Valgrind gdbserver > (the bullet about ptrace). ok. I just looked briefly though the files in coregrind/m_gdbserver. No significant comments, only the following: valgrind_low.h pls add copyright notice m_gdbserver.c /* True means gdbserver is not interested in this signal, so client signal action can be done without involving gdbserver. Otherwise, gdbserver should be called */ extern int pass_signals[]; move to a header file? Created attachment 59184 [details]
handled review comments + upgraded to rev 11705
* Updated to revision 11705, VEX 2127.
* Changed argname client_request of mc_get_or_set_vbits_for_client
to is_client_request
* centralised disabling of VG_(clo_vgdb_error) during monitor command handling
+ defined VG_(dyn_vgdb_error) to avoid modifying VG_(clo_vgdb_error)
* removed the mc_translate.c irrelevant diffs (useless include, white space)
* removed useless VG_(memmem) function
* unglobalised parse_range back to mc_main.c
* some improvements and fixes in comments, READMEs, doc.
* added copyright header to valgrind_low.h
* pass_signals array defined in server.h
* extracted the "delta" leak (to be put in a subsequent patch)
* added a comment in VG_(scheduler) explaining why the gdbserver
is initialized there.
* implemented protection in vgdb.c against huge monitor commands
(In reply to comment #37) > Created an attachment (id=59184) [details] > handled review comments + upgraded to rev 11705 Thanks for the respin. Looks better. One thing though: after applying the patch, doing chmod +x ./gdbserver_tests/make_local_links chmod +x gdbserver_tests/filter_stderr chmod +x gdbserver_tests/filter_gdb chmod +x gdbserver_tests/filter_make_empty chmod +x gdbserver_tests/filter_memcheck_monitor chmod +x gdbserver_tests/filter_vgdb rebuilding and then doing "make regtest", I get a failure like this $ make regtest [...] make[1]: Leaving directory `/home/sewardj/VgTRUNK/gdbsr' gdbserver_tests/make_local_links /usr/bin/gdb gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2: GNU gdb (GDB) 7.1-ubuntu /usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat -- Running tests in gdbserver_tests ----------------------------------- mcbreak: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t (progB: ./gdb --quiet -l 10 --nx ./t) Could not find `mcbreak.stderr.exp*' make: *** [regtest] Error 255 There's no file mcbreak.stderr.exp* in the patch; did it get left out maybe? (In reply to comment #37) (more) > Created an attachment (id=59184) [details] > handled review comments + upgraded to rev 11705 Looks good. There are various minor cleanups needed, but I can do them. Two comments though: (1) I saw this coregrind/m_gdbserver/valgrind-low-ppc64.c +#include "libvex_guest_ppc64.h" + +struct reg regs[] = { + { "r0", 0, 64 }, + { "r1", 64, 64 }, + { "r2", 128, 64 }, (and equivalents for other architectures) The magic numbers 0, 64, 128, etc .. are they bit offsets inside the struct "VexGuestPPC64State"? Because if so, this will break if/when the elements of that struct change offset. Or .. do I misunderstand? (2) --vgdb-shadow-registers=no|yes let gdb see the shadow registers [no] Is this necessary (to have as a command line option) ? Why not just let gdb always see the shadow registers? (In reply to comment #38) There is a bunch of files that have to be chmod-ed +x (you already did a part of it) + some empty files are not created due to a bug in the toolset "svn diff & patch". See in comment 29 the list of all files to chmod +x or touch (to create empty files). Note also that in the below, one test is disabled (as it needs gdb 7.2 to run). I have however not tested with gdb 7.1, so the filter_* files might need some changes to properly filter out gdb 7.1 output. > rebuilding and then doing "make regtest", I get a > failure like this > > $ make regtest > [...] > make[1]: Leaving directory `/home/sewardj/VgTRUNK/gdbsr' > gdbserver_tests/make_local_links /usr/bin/gdb > gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2: GNU gdb > (GDB) 7.1-ubuntu > /usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind > massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat > -- Running tests in gdbserver_tests ----------------------------------- > mcbreak: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 > --vgdb-prefix=./vgdb-prefix-mcbreak ./t (progB: ./gdb --quiet -l 10 --nx ./t) > Could not find `mcbreak.stderr.exp*' > make: *** [regtest] Error 255 (In reply to comment #39) > > Looks good. There are various minor cleanups needed, but I can do > them. Two comments though: For what concerns needed cleanup: one thing I was intending to do as part of the next patch (to re-add delta leak) is to avoid using VG_(clo_show_*) for mc.leak_check monitor command (memcheck/mc_main.c). Double checking with grep the places where VG_(clo_*) are modified, I saw that the call to VG_(show_all_errors) in coregrind/m_gdbserver/server.c also should have arguments rather than depend on changing VG_(clo_*). > +struct reg regs[] = { > + { "r0", 0, 64 }, ... > The magic numbers 0, 64, 128, etc .. are they bit offsets inside the > struct "VexGuestPPC64State"? Because if so, this will break if/when > the elements of that struct change offset. Or .. do I misunderstand? The mapping between the VexGuest state and gdb is implemented in the valgrind-low-ppc64.c "static void transfer_register" (and equivalent for other archs). The above numbers are offsets and size in the gdbserver "registers packet" and "registers buffer" (see coregrind/m_gdbserver/regdef.h). How to create or get these offsets is documented in coregrind/m_gdbserver/README_DEVELOPPERS (search for hal9000). > > > (2) --vgdb-shadow-registers=no|yes let gdb see the shadow registers [no] > > Is this necessary (to have as a command line option) ? > Why not just let gdb always see the shadow registers? Showing shadow registers is optional as: 1. shadow registers implies to have a gdb >= 7.1 and this gdb must be compiled with xml support. 2. the gdb command 'info registers' is not very ergonomic in case there are a lot of registers. (In reply to comment #40) > See in comment 29 the list of all files to chmod +x or touch (to create > empty files). Oh, sorry. Missed that. Next q is: how long should I expect the test "mcinfcallWSRU" to take? So far it (memcheck-amd64-linux) has accumulated almost 10 CPU minutes on a 3.46 GHz Core i5, and still not finished. Also, mcinvokeWS seemed to hang at 0% cpu and I had to kill it. Same with nlcontrolc. This is on Ubuntu 10.04.2, x86_64. Also on Ubuntu 10.04.2 x86_64, I get memcheck/tests/addressable and memcheck/tests/atomic_incs failing like this +valgrind: m_gdbserver/m_gdbserver.c:519 (call_gdbserver): Assertion 'gs_addresses == NULL' failed. + at 0x........: report_and_quit (m_libcassert.c:210) + by 0x........: vgPlain_assert_fail (m_libcassert.c:284) + by 0x........: call_gdbserver (m_gdbserver.c:519) + by 0x........: vgPlain_gdbserver_activity (m_gdbserver.c:747) + by 0x........: run_thread_for_a_while (scheduler.c:800) + by 0x........: vgPlain_scheduler (scheduler.c:1154) + by 0x........: run_a_thread_NORETURN (syswrap-linux.c:95) + (In reply to comment #43) > Also, mcinvokeWS seemed to hang at 0% cpu and I had to kill it. > Same with nlcontrolc. This is on Ubuntu 10.04.2, x86_64. Have you chmod-ed +x all the needed files in gdbserver_tests ? i.e. chmod +x filter_* make_local_links simulate_control_c invoker I believe the blockage at 0% cpu or at 100% cpu are due to simulate_control_c and/or invoker not being executable. (In reply to comment #44) > Also on Ubuntu 10.04.2 x86_64, I get memcheck/tests/addressable and > memcheck/tests/atomic_incs failing like this > > +valgrind: m_gdbserver/m_gdbserver.c:519 (call_gdbserver): Assertion > 'gs_addresses == NULL' failed. This bug is triggered when a process forks after having initialized gdbserver, and the child does not do an exec. If VG_(scheduler) does a call to gdbserver in the child, the child will initialize gdbserver but the 'at fork cleanup' in m_gdbserver.c did not cleanup enough. This insufficient cleanup is detected by the failing assert. I have done a fix, tested that both addressable and atomic_incs tests are running ok now. These tests are however only partially testing the fix, so I think it is better to have a specific test for this bug. During investigations, I also run the tests with gdb 7.1, and a (complex) new sed expression is needed in filter_gdb as gdb 7.1 is producing two lines instead of one in mcinfcallWSRU. Do you prefer an new "global" patch with the 3 changes (the fix + the new test + the changed filter_gdb) ? Or do you prefer to work incrementally from now on ? (e.g. if you have started to do some cleanups ?) (Posting this despite the fact that it is partially redundant following a mid-air collision with comment #46) (In reply to comment #45) > i.e. chmod +x filter_* make_local_links simulate_control_c invoker Yes, that gets rid of all the hanging. That means the regtest failures are now as follows: == 584 tests, 12 stderr failures, 2 stdout failures, 1 stderrB failure, 1 stdoutB failure, 0 post failures == gdbserver_tests/mcinfcallWSRU (stderrB) gdbserver_tests/nlcontrolc (stdoutB) memcheck/tests/addressable (stdout) memcheck/tests/addressable (stderr) memcheck/tests/atomic_incs (stdout) memcheck/tests/atomic_incs (stderr) memcheck/tests/linux/stack_switch (stderr) none/tests/async-sigs (stderr) none/tests/fdleak_cmsg (stderr) none/tests/fdleak_ipv4 (stderr) none/tests/shell (stderr) helgrind/tests/pth_barrier3 (stderr) helgrind/tests/tc06_two_races_xml (stderr) drd/tests/pth_barrier3 (stderr) drd/tests/threaded-fork (stderr) exp-ptrcheck/tests/bad_percentify (stderr) Of these, the last 5 (helgrind/, drd/, exp-ptrcheck/) also fail pre-patch, so are not interesting. But the rest are new: memcheck/tests/addressable memcheck/tests/atomic_incs memcheck/tests/linux/stack_switch none/tests/async-sigs none/tests/fdleak_cmsg none/tests/fdleak_ipv4 none/tests/shell all fail with the same assertion as comment #44. The two gdb failures are shown below. If it's of any use, the GDB version on this box is "GNU gdb (GDB) 7.1-ubuntu". -------------------------------------------------------- $ cat gdbserver_tests/mcinfcallWSRU.stderrB.diff --- mcinfcallWSRU.stderrB.exp 2011-04-26 13:19:44.000000000 +0200 +++ mcinfcallWSRU.stderrB.out 2011-04-26 23:52:41.000000000 +0200 @@ -22,10 +22,14 @@ Program received signal SIGTRAP, Trace/breakpoint trap. 0x........ in do_burn () at sleepers.c:39 39 for (i = 0; i < burn; i++) loopnr++; -[Switching to thread 1 (Thread ....)]#0 0x........ in do_burn () at sleepers.c:39 +[Switching to thread 1 (Thread ....)]#0 0x........ in do_burn () + at sleepers.c:39 39 for (i = 0; i < burn; i++) loopnr++; $1 = void [Switching to thread 2 (Thread ....)]#0 0x........ in syscall ... + at ../sysdeps/unix/syscall-template.S:82 +82 ../sysdeps/unix/syscall-template.S: No such file or directory. + in ../sysdeps/unix/syscall-template.S Could not write register "xxx"; remote failure reply 'E. ERROR changing register xxx regno y gdb commands changing registers (pc, sp, ...) (e.g. 'jump', @@ -34,6 +38,8 @@ Thread status is VgTs_WaitSys ' [Switching to thread 3 (Thread ....)]#0 0x........ in syscall ... + at ../sysdeps/unix/syscall-template.S:82 +82 in ../sysdeps/unix/syscall-template.S Could not write register "xxx"; remote failure reply 'E. ERROR changing register xxx regno y gdb commands changing registers (pc, sp, ...) (e.g. 'jump', @@ -41,7 +47,8 @@ can only be accepted if the thread is VgTs_Runnable or VgTs_Yielding state Thread status is VgTs_WaitSys ' -[Switching to thread 4 (Thread ....)]#0 0x........ in do_burn () at sleepers.c:39 +[Switching to thread 4 (Thread ....)]#0 0x........ in do_burn () + at sleepers.c:39 39 for (i = 0; i < burn; i++) loopnr++; $2 = void monitor command request to kill this process -------------------------------------------------------- $ cat gdbserver_tests/nlcontrolc.stdoutB.diff --- nlcontrolc.stdoutB.exp 2011-04-26 13:19:44.000000000 +0200 +++ nlcontrolc.stdoutB.out 2011-04-26 23:53:09.000000000 +0200 @@ -2,14 +2,19 @@ 0x........ in _start () from ...start file... Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. -0x........ in syscall ... +0x........ in select () at ../sysdeps/unix/syscall-template.S:82 + in ../sysdeps/unix/syscall-template.S [New Thread ....] [New Thread ....] [New Thread ....] 4 Thread .... (tid 4 VgTs_WaitSys) 0x........ in syscall ... + at ../sysdeps/unix/syscall-template.S:82 3 Thread .... (tid 3 VgTs_WaitSys) 0x........ in syscall ... + at ../sysdeps/unix/syscall-template.S:82 2 Thread .... (tid 2 VgTs_WaitSys) 0x........ in syscall ... + at ../sysdeps/unix/syscall-template.S:82 * 1 Thread .... (tid 1 VgTs_WaitSys) 0x........ in syscall ... + at ../sysdeps/unix/syscall-template.S:82 $1 = 0 $2 = 0 $3 = 0 (In reply to comment #46) > I have done a fix, Ok good. You found VG_(atfork), yes? > These tests are however only partially testing the fix, > so I think it is better to have a specific test for this bug. Yes. > Do you prefer an new "global" patch with the 3 changes (the fix + the > new test + the changed filter_gdb) ? A new global patch would be good. So far I only did reading and testing, but no cleanups. (In reply to comment #48) > (In reply to comment #46) > > I have done a fix, > > Ok good. You found VG_(atfork), yes? yes, the cleanup is based on atfork. Fix and test is done, but (this time :), I will re-run all tests on multiple platforms (and not only on my main dev. platform, i.e. fedora 12 x86, which did not triggered this bug). I will also run the tests with multiple gdb versions (at least 7.1 and 7.2). > A new global patch would be good. So far I only did reading and > testing, but no cleanups. To offload you as much as I can, if you have any specific cleanup easy to describe I can do, I can take them in charge (this evening :). A.o. if that suits you, I will change void VG_(show_all_errors) ( void ) to void VG_(show_all_errors) ( Int verbosity, Bool xml) to avoid setting+re-setting the equivalent VG_(clo_*) (In reply to comment #49) > To offload you as much as I can, if you have any specific cleanup > easy to describe I can do, I can take them in charge (this evening > :). One thing I wrote down, that would be useful, is to print exact commands needed to start and give to the gdb to make it attach. What I mean is, normally when it starts, it prints ==852== (action at startup) vgdb me ... and waits for me to figure out the magic commands to start gdb and do the "target remote | .." thing (by looking in the manual). As a user I would find it much easier if (for example) it printed out the exact gdb invokation and "target remote | .." text required, so I can just copy/paste them in to the shell, eg: ==852== (action at startup) vgdb me .. ==852== In a different shell, run this: ==852== gdb /home/sewardj/VgTRUNK/gdbsr/memcheck/tests/errs1 ==852== and once gdb is started, give it this command: ==852== target remote | /home/sewardj/VgTRUNK/gdbsr/Inst/bin/vgdb (assuming the program under test is /home/sewardj/VgTRUNK/gdbsr/memcheck/tests/errs1 and the V installation is at /home/sewardj/VgTRUNK/gdbsr/Inst ) Makes sense? I think it would be useful. But only if it is easy to do. Don't stall producing a new patch if not easy. > A.o. if that suits you, I will change > void VG_(show_all_errors) ( void ) > to > void VG_(show_all_errors) ( Int verbosity, Bool xml) > to avoid setting+re-setting the equivalent VG_(clo_*) Yes, please do that. For sure we don't want VG_(clo_*) variables changing as we run; they are single-assignment things. Last item of concern is, when I tried running a from-source Firefox build, it ran successfully with "normal" memcheck but hung after a while (circa 20 cpu seconds) when running attached to GDB. However, I am not sure about this -- I need to test it more. Created attachment 59380 [details] upgraded to rev 11713 + some fixes/improvements * fixed bug in cleanup after fork + added a test * cleaned up call to VG_(show_all_errors) to avoid touching VG_(clo*) * tests redone and/or updated for fedora 12 x86, with gdb 7.0/7.1/7.2 debian amd64, with gdb 7.1/7.2 * updated filter_gdb to handle the differences in behaviour reported by Julian in comment 47 (having no access to an Ubuntu 64 bits, it might be that the sed expressions added in filter_gdb are not properly filtering the irrelevant diffs) (In reply to comment #50) > ==852== (action at startup) vgdb me .. > ==852== In a different shell, run this: > ==852== gdb /home/sewardj/VgTRUNK/gdbsr/memcheck/tests/errs1 > ==852== and once gdb is started, give it this command: > ==852== target remote | /home/sewardj/VgTRUNK/gdbsr/Inst/bin/vgdb > Makes sense? I think it would be useful. But only if it is > easy to do. Don't stall producing a new patch if not easy. This would be easier for the user. However, not directly straightforward. > > void VG_(show_all_errors) ( Int verbosity, Bool xml) > Yes, please do that. For sure we don't want VG_(clo_*) variables > changing as we run; they are single-assignment things. Done as part of the last uploaded patch. > Last item of concern is, when I tried running a from-source > Firefox build, it ran successfully with "normal" memcheck but > hung after a while (circa 20 cpu seconds) when running attached > to GDB. However, I am not sure about this -- I need to test > it more. I have put firefox 4.0 on f12/x86 under memcheck + gdb, no problem encountered. As firefox does quite a lot of fork/exec, maybe a consequence of the wrong cleanup after fork ? If this is the case, the last patch version should solve this. NB: the last patch still need the same list of chmod+touch of comment 29. Created attachment 59404 [details]
filter_gdb
I booted an Ubuntu 64 bits 10.10, and the sed expressions were effectively
not ok (as the real output contains tab characters, while the sed expressions
were expecting only space).
I have only attached filter_gdb, but can attach a new full global patch
(after an svn update) if preferrable
Created attachment 59478 [details]
upgraded to rev 11719, integrated filter_gdb, retested, fix ppc64 bug
Updated to Revision 11719
Re-validated on various linux platforms (x86,amd64,arm,ppc64,s390x)
increased gdb timeout values in tests (were too low for overloaded machines)
fixed a ppc64 bug in vgdb.c
modified a test to be less sensitive to scheduling unfairness
(sched_setaffinity in the test helps, but does not fully resolve this)
modified filter_gdb to handle a ubuntu specific diff in syscall backtrace
(In reply to comment #51) > (having no access to an Ubuntu 64 bits, it might be that the sed expressions > added in filter_gdb are not properly filtering the irrelevant diffs) I've tested the latest patch (vg-gdbserver-patch-20110501) on 64-bit Ubuntu 10.04.1 LTS with GDB 7.2. My failures are: == 586 tests, 21 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == memcheck/tests/linux/stack_switch (stderr) cachegrind/tests/chdir (stderr) cachegrind/tests/clreq (stderr) cachegrind/tests/dlclose (stderr) cachegrind/tests/notpower2 (stderr) cachegrind/tests/wrap5 (stderr) cachegrind/tests/x86/fpu-28-108 (stderr) callgrind/tests/notpower2-hwpref (stderr) callgrind/tests/notpower2-use (stderr) callgrind/tests/notpower2-wb (stderr) callgrind/tests/notpower2 (stderr) callgrind/tests/simwork-both (stderr) callgrind/tests/simwork-cache (stderr) callgrind/tests/simwork1 (stderr) callgrind/tests/simwork2 (stderr) callgrind/tests/simwork3 (stderr) callgrind/tests/threads-use (stderr) helgrind/tests/pth_barrier3 (stderr) helgrind/tests/tc06_two_races_xml (stderr) drd/tests/pth_barrier3 (stderr) exp-ptrcheck/tests/bad_percentify (stderr) These are exact same failures I get without the patch. Also tested using current GDB snapshot (7.3.50.20110502-cvs). One failure in gdbserver_tests due to GDB splitting 'Switching to thread' message into two lines: --- mcinfcallWSRU.stderrB.exp 2011-05-01 17:02:18.000000000 -0700 +++ mcinfcallWSRU.stderrB.out 2011-05-01 17:05:39.000000000 -0700 @@ -22,10 +22,12 @@ Program received signal SIGTRAP, Trace/breakpoint trap. 0x........ in do_burn () at sleepers.c:39 39 for (i = 0; i < burn; i++) loopnr++; -[Switching to thread 1 (Thread ....)]#0 0x........ in do_burn () at sleepers.c:39 +[Switching to thread 1 (Thread ....)] +#0 0x........ in do_burn () at sleepers.c:39 39 for (i = 0; i < burn; i++) loopnr++; $1 = void (In reply to comment #54) > Created an attachment (id=59478) [details] > upgraded to rev 11719, integrated filter_gdb, retested, fix ppc64 bug Committed as r11727. Builds and regtests OK on 64-bit Ubuntu 10.04.2 (x86_64). Builds ok on ppc32-linux and arm-linux, unclear what regtest status is on those platforms yet (tests are still running). I haven't tried Darwin yet. (In reply to comment #58) > Builds and regtests OK on 64-bit Ubuntu 10.04.2 (x86_64). Builds > ok on ppc32-linux and arm-linux, unclear what regtest status is on > those platforms yet (tests are still running). I haven't tried > Darwin yet. On Darwin, I expect that it should compile properly, but there will for sure be some failing tests : I had only access to a Darwin platform during a few hours, and could only make it compile + tested manually + disable two tests which were blocking. Note also that on the Darwin I had access to, the gdb version was very old (gdb 6.3 IIRC). The tests have only been run with a gdb >= 7.0. I believe that a gdb >= 6.5 is needed for the target remote | command. I might obtain some additional hours of access to a Mac to fix the tests on Darwin. (In reply to comment #58) > Builds ok on ppc32-linux and arm-linux, unclear what regtest status is on > those platforms yet (tests are still running). There are failures and (apparent) hangs. What information do you need, to diagnose these? --------------- On arm-linux (Ubuntu 10.04), I get this: [...] gdbserver_tests/make_local_links /usr/bin/gdb gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2: GNU gdb (GDB) 7.1-ubuntu /usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat -- Running tests in gdbserver_tests ----------------------------------- mcbreak: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t (progB: ./gdb --quiet -l 60 --nx ./t) *** mcbreak failed (stdout) *** *** mcbreak failed (stdoutB) *** *** mcbreak failed (stderrB) *** mcclean_after_fork: valgrind --tool=memcheck --vgdb=full --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcclean_after_fork ./clean_after_fork (progB: ./gdb --quiet -l 60 --nx ./clean_after_fork) *** mcclean_after_fork failed (stdout) *** *** mcclean_after_fork failed (stderr) *** *** mcclean_after_fork failed (stdoutB) *** *** mcclean_after_fork failed (stderrB) *** mchelp: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mchelp ./t (progB: ./vgdb --wait=60 --vgdb-prefix=./vgdb-prefix-mchelp -c help -c help debug -c vg.kill) mcinfcallRU: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcinfcallRU ./sleepers 1 0 2000000000 B-B-B-B- (progB: ./gdb --quiet -l 60 --nx ./sleepers) and it hangs there at 100% cpu for memcheck-arm-linux. I waited maybe 20 minutes but no progress. sewardj@u1004arm:~/VgTRUNK/trunk$ uname -a Linux u1004arm 2.6.31-608-imx51 #15-Ubuntu Fri Jun 18 10:30:45 UTC 2010 armv7l GNU/Linux sewardj@u1004arm:~/VgTRUNK/trunk$ gdb --version GNU gdb (GDB) 7.1-ubuntu Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "arm-linux-gnueabi". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. sewardj@u1004arm:~/VgTRUNK/trunk$ gcc --version gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3 --------------- On ppc32-linux, Debian 6.0, GNU gdb (GDB) 7.0.1-debian, the GDB tests are very slow but don't hang. gdbserver_tests/make_local_links /usr/bin/gdb gdbserver eval tests suppressed as /usr/bin/gdb version is not >= 7.2: GNU gdb (GDB) 7.0.1-debian /usr/bin/perl tests/vg_regtest gdbserver_tests memcheck cachegrind callgrind massif lackey none helgrind drd exp-ptrcheck exp-bbv exp-dhat -- Running tests in gdbserver_tests ----------------------------------- mcbreak: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcbreak ./t (progB: ./gdb --quiet -l 60 --nx ./t) *** mcbreak failed (stdoutB) *** *** mcbreak failed (stderrB) *** mcclean_after_fork: valgrind --tool=memcheck --vgdb=full --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcclean_after_fork ./clean_after_fork (progB: ./gdb --quiet -l 60 --nx ./clean_after_fork) *** mcclean_after_fork failed (stdoutB) *** *** mcclean_after_fork failed (stderrB) *** mchelp: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mchelp ./t (progB: ./vgdb --wait=60 --vgdb-prefix=./vgdb-prefix-mchelp -c help -c help debug -c vg.kill) mcinfcallRU: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcinfcallRU ./sleepers 1 0 2000000000 B-B-B-B- (progB: ./gdb --quiet -l 60 --nx ./sleepers) *** mcinfcallRU failed (stderr) *** mcinfcallWSRU: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcinfcallWSRU ./sleepers 100 100000000 1000000000 -S-SB-B- (progB: ./gdb --quiet -l 60 --nx 1>&2 ./sleepers) *** mcinfcallWSRU failed (stderrB) *** mcinvokeRU: valgrind --tool=memcheck --vgdb=yes --vgdb-prefix=./vgdb-prefix-mcinvokeRU ./sleepers 1 0 1000000000 B-B-B-B- (progB: ./invoker 10 --vgdb-prefix=./vgdb-prefix-mcinvokeRU --wait=60 -c vg.wait 0) mcinvokeWS: valgrind --tool=memcheck --vgdb=yes --vgdb-prefix=./vgdb-prefix-mcinvokeWS ./sleepers 1 10000000 0 -S-S-S-S (progB: ./invoker 10 --vgdb-prefix=./vgdb-prefix-mcinvokeWS --wait=60 -c vg.wait 0) mcleak: (skipping, prereq failed: test -x ../memcheck/tests/leak-delta) mcsignopass: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcsignopass ./../none/tests/faultstatus (progB: ./gdb --quiet -l 60 --nx ../none/tests/faultstatus) *** mcsignopass failed (stderr) *** *** mcsignopass failed (stdoutB) *** mcsigpass: valgrind --tool=memcheck --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcsigpass ./../none/tests/faultstatus (progB: ./gdb --quiet -l 60 --nx ../none/tests/faultstatus) *** mcsigpass failed (stdoutB) *** mcvabits: (skipping, prereq failed: test -e ./gdb.eval) mcwatchpoints: valgrind --tool=memcheck --vgdb=full --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mcwatchpoints ./watchpoints (progB: ./gdb --quiet -l 60 --nx ./watchpoints) *** mcwatchpoints failed (stdoutB) *** mssnapshot: valgrind --tool=massif --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-mssnapshot ./t (progB: ./gdb --quiet -l 60 --nx ./t) *** mssnapshot failed (stdoutB) *** *** mssnapshot failed (stderrB) *** nlcontrolc: valgrind --tool=none --vgdb=yes --vgdb-error=0 --vgdb-prefix=./vgdb-prefix-nlcontrolc ./sleepers 1000000000 1000000000 1000000000 BSBSBSBS (progB: ./gdb --quiet -l 60 --nx ./sleepers) *** nlcontrolc failed (stdoutB) *** -- Finished tests in gdbserver_tests ----------------------------------- (In reply to comment #60) > (In reply to comment #58) > > Builds ok on ppc32-linux and arm-linux, unclear what regtest status is on > > those platforms yet (tests are still running). > > There are failures and (apparent) hangs. What information do you > need, to diagnose these? a tar.gz of the *.out files for the failing or hanging tests is for sure a good start. In any case, I will launch a rebuild on the arm and ppc systems I have access to. Created attachment 59721 [details] tarfile of gdb_server *.out files for ppc32-linux (Debian 6.0) (In reply to comment #61) The associated failure summary is: gdbserver_tests/mcbreak (stdoutB) gdbserver_tests/mcbreak (stderrB) gdbserver_tests/mcclean_after_fork (stdoutB) gdbserver_tests/mcclean_after_fork (stderrB) gdbserver_tests/mcinfcallRU (stderr) gdbserver_tests/mcinfcallWSRU (stderrB) gdbserver_tests/mcsignopass (stderr) gdbserver_tests/mcsignopass (stdoutB) gdbserver_tests/mcsigpass (stdoutB) gdbserver_tests/mcwatchpoints (stdoutB) gdbserver_tests/mssnapshot (stdoutB) gdbserver_tests/mssnapshot (stderrB) gdbserver_tests/nlcontrolc (stdoutB) (In reply to comment #62) > Created an attachment (id=59721) [details] > tarfile of gdb_server *.out files for ppc32-linux (Debian 6.0) In parallel, I have recompiled last svn and retested on arm, debian 5.0, gdb 7.0 and gdb 7.2 : all ok. I will rebuild a gdb 7.1, but that will take hours (the arm system is very old/slow). For the hangs: this might be due to the scheduler (un-)fairness we discussed. Again here, the *.out files should give hints about what is happening. Waiting for fairness, I can simplify the test (one single thread burning cpu rather than several). In parallel, I can also try to enhance vg_regtest perl to kill a test after a specified timeout: line in the xxxxxx.vgregtest file (default 5 minutes). What do you think about this ? This avoids blocking all tests due to a single hanging tests (at work, on red-hat 5.2, atomic_incs loops often forever, so it will bypass all such problems). on ppc32/ppc64, debian 5.0, gdb 7.0 and gdb 7.2: all tests ok, except on ppc64/gdb 7.2: nlcontrolc and mcinvokeWSRU which needs a filter_gdb sed expression to filter a specific stack trace. Sorry for all these problems, I was expecting the various platform tests I did to cover enough but the behaviour of combinations of valgrind+gdb+glibc are "richer" than expected. (In reply to comment #63) > Waiting for fairness, I can simplify the test (one single thread > burning cpu rather than several). That sounds like a good thing to do. Is it important to have multiple threads using CPU, or would one thread be enough? > In parallel, I can also try to enhance vg_regtest perl to kill a test > after a specified timeout: line in the xxxxxx.vgregtest file > (default 5 minutes). > What do you think about this ? I would prefer to make all test guaranteed non-hanging. > This avoids blocking all tests due to > a single hanging tests (at work, on red-hat 5.2, atomic_incs loops often > forever, so it will bypass all such problems). memcheck/tests/atomic_incs.c, you mean? I am surprised to hear this: it does not have any infinite loop, only a for loop from 0 to NNN where #define NNN 3456987 (In reply to comment #64) > (In reply to comment #63) > > Waiting for fairness, I can simplify the test (one single thread > > burning cpu rather than several). > That sounds like a good thing to do. Is it important to have multiple > threads using CPU, or would one thread be enough? I think it is preferrable to have multiple threads for such tests but a test with a single thread burning cpu should provide a reasonable coverage of the "burning" case. I will look deeper into this. > > > In parallel, I can also try to enhance vg_regtest perl to kill a test > > after a specified timeout: line in the xxxxxx.vgregtest file > > (default 5 minutes). > > What do you think about this ? > I would prefer to make all test guaranteed non-hanging. Yes, for sure, that is the target. The idea however is to avoid blocking all tests due to a single wrong test. Such "fail after timeout" feature is e.g. used in the gdb dejagnu test framework. But in any case, at this stage, I will work on having only one single thread burning cpu, which should then avoid the dependency on fairness. BTW, can you also attach a tar of the *.out of the arm failures ? I have started looking the ppc32/debian6.0 failures. Most of these are failing due to some additional gdb output I will filter. > memcheck/tests/atomic_incs.c, you mean? I am surprised to hear this: My memory about the looping test name might be wrong. Next time I see it, I will dig deeper. Note that IIRC, the test was also looping forever outside of valgrind. Created attachment 59785 [details]
patch to improve testing on various platforms
The changes in this patch should improve the situation on various platforms
and gdb combinations (but there is still things to look at).
* In all gdbserver_tests using gdb:
Made a more general way to remove the initial start message.
* tests using threads burning cpu modified to have only 1 thread.
This makes them independent of the scheduler fairness.
* filter_gdb and filter_vgdb enhanced to anonymise
some debian 6.0/ppc specific things
some s390x/gdb 7.0, gdb 7.1 specific things
* vgdb.c: added an #include <linux/ptrace.h> to fix compilation
on s390x fedora and suse. (Christian Boerntrager)
* fixed a bug in valgrind-low.c debug log :
when a register size is 0, its image cannot be output (and register
should not be transferred).
* added a parameter --keep-unfiltered to vg_regtest.in
This will make it easier to update filter_gdb:
in case gdbserver_tests are failing due to "artificial"
differences to be filtered, re-run the tests using:
perl tests/vg_regtest --keep-unfiltered gdbserver_tests
Then a tar file with all the *.out in gdbserver_tests
will allow me to better/faster update the filter_gdb.
Retested the above on:
x86 * f12 * (gdb 7.0, 7.1, 7.2)
ppc32/ppc64 * debian 5.0 * (gdb 7.0, 7.1, 7.2)
amd64 * debian 5.0 * (gdb 7.0, 7.1, 7.2)
s390x * RHEL 4 * (gdb 7.0, 7.1, 7.2)
Still to do:
------------
* There are still problems on arm, but could not advance on that.
It looks like some of the problems are triggered when
the computer is very slow.
* mcsigpass/mcsignopass tests failure on debian 6.0/ppc:
still to be looked at (but I have only access to a debian 5, where
these are working).
Created attachment 59798 [details] tarfile of gdb_server *.out files for ppc32-linux with c66 diff (Debian 6.0) (In reply to comment #66) That improves it a lot on ppc32-linux (Debian 6.0). No more long delays. As you said in c66, the remaining failures are gdbserver_tests/mcsignopass (stderr) gdbserver_tests/mcsignopass (stdoutB) gdbserver_tests/mcsigpass (stdoutB) Will test on arm-linux next. (In reply to comment #67) > Created an attachment (id=59798) [details] > tarfile of gdb_server *.out files for ppc32-linux with c66 diff (Debian 6.0) > > (In reply to comment #66) > That improves it a lot on ppc32-linux (Debian 6.0). No more > long delays. As you said in c66, the remaining failures are Good to see it improves. This evening, I will work on generalising the fix in vgdb.c so as to unbreak the build on older distro (e.g. ppc SUSE, ppc RHEL 5). I should upload soon a new version of the patch (cfr comment #66). (unless you prefer to commit in the current state of comment 66). > > gdbserver_tests/mcsignopass (stderr) > gdbserver_tests/mcsignopass (stdoutB) > gdbserver_tests/mcsigpass (stdoutB) With a quick reading, not clear to me what is happening. Is none/tests/faultstatus.vgtest working properly on this setup ? (mssignopass and mcsigpass are using none/tests/faultstatus to test gdbserver signal handling). > > Will test on arm-linux next. At least on an extremely loaded arm system, I have got failures in the tests. It looks to be time dependent, because it goes much better when I increase --max-invoke-ms to 500 ms or it works properly when I type the gdb commands myself, rather than using a script file. Created attachment 59820 [details]
patch to improve testing on various platforms + PTRACE_GETREGS better conditional compilation
* made a better detection of a working PTRACE_GETREGS at compile time
and/or at run-time.
* updated to revision 11735
Apart of this, it is the same as 59785.
Recompiled and re-run gdbserver tests on:
f12/x86, arm/ubuntu9.10, amd64/debian5.0, ppc32/64/debian5.0, s390x/RHEL4
(In reply to comment #69) > Created an attachment (id=59820) [details] > patch to improve testing on various platforms + PTRACE_GETREGS better > conditional compilation Committed, r11740. Created attachment 59887 [details]
gdbsrv cleanup: --vgdb=full for mcsig[no]pass, fix 2 bugs (detected by IBM beam),
* fixed two bugs reported by the IBM BEAM checker:
fd leak in vgdb.c
break missing in valgrind-low-s390x.c
* use option --vgdb=full for the tests mcsigpass.vgtest and mcsignopass.vgtest
This might improve these tests on ppc32/debian 6.0
* added a paragraph in gdbserver_tests/README_DEVELOPPERS to indicate
how to report problems about failing gdbserver tests.
Testing on Ubuntu 11.04. Just a few remarks: * running "vgdb" standalone (ok, makes no sense, but could be a typo), I can not terminate the relaying by pressing ctrl+c. * running a kernel from kernel-ppa which does not have the ptrace scoping compiled in, I get at end of "vgdb -h": "Could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope" Running any monitor command such as "vgdb foo" gives the same warning in the terminal of the VG run. Why this warning at all? BTW, I did not know about this security thing up to now ;-) * I think it is useful to talk about "vgdb help" in the output of "vgdb -h" * With callgrind, vgdb help gives callgrind monitor commands: ct.dump [<dump_hint>] dump counters But when running "vgdb ct.dump bla", I get sending command cg.dump bla to pid 11064 command 'cg.dump bla' not recognised (In reply to comment #72) > Testing on Ubuntu 11.04. Just a few remarks: > > * running "vgdb" standalone (ok, makes no sense, but could be a typo), > I can not terminate the relaying by pressing ctrl+c. When used as relay application, vgdb must resist to a SIGINT (as gdb might send a SIGINT to terminate, but vgdb must ensure a clean termination, e.g. must ptrace detach if needed). Maybe what I can do is to check if stdin is a tty, and then report an error to the user if no args have been given. > * running a kernel from kernel-ppa which does not have > the ptrace scoping compiled in, I get at end of "vgdb -h": > "Could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope" > Running any monitor command such as "vgdb foo" gives the same warning in the > terminal of the VG run. Why this warning at all? BTW, I did not know about > this security thing up to now ;-) vgdb.c and m_gdbserver/remote-utils.c are guessing ptrace_scope is to be looked at if the pre-processor symbol PR_SET_PTRACER is defined. In this case, the ptrace_scope setting will be read by vgdb and/or remote-utils.c If it cannot be read, an error message is given back, and it is assumed PR_SET_PTRACER is not needed. Do you have a suggestion for an alternative behaviour you would prefer ? The msg could be changed to: warning: could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope warning: a process blocked in a system call might not be commanded. Note however that this message is only supposed to be output by vgdb when asking for help or when an erroneous usage is done. The valgrind gdbserver side will however output the warning all the time if it can't read the ptrace scope and PR_SET_PTRACER is defined. It does this for all "new connections" (as this means there is a new vgdb which must become the ptracer). I might put a static local to output the warning only once. > > * I think it is useful to talk about "vgdb help" in the output of "vgdb -h" Yes, that looks a good idea. > > * With callgrind, vgdb help gives > callgrind monitor commands: > ct.dump [<dump_hint>] > dump counters > But when running "vgdb ct.dump bla", I get > sending command cg.dump bla to pid 11064 > command 'cg.dump bla' not recognised It looks to be a typo (cg. instead of ct.). Here, it works: [philippe@soleil valgrind]$ coregrind/vgdb help sending command help to pid 2320 general valgrind monitor commands: .... callgrind monitor commands: ct.dump [<dump_hint>] dump counters ct.zero zero counters [philippe@soleil valgrind]$ ls -lrt | tail -4 drwxrwxr-x. 6 philippe philippe 4096 2011-05-10 23:36 exp-bbv drwxrwxr-x. 6 philippe philippe 4096 2011-05-10 23:36 exp-dhat drwxrwxr-x. 4 philippe philippe 12288 2011-05-11 00:43 gdbserver_tests -rw-------. 1 philippe philippe 0 2011-05-11 23:34 callgrind.out.2320 [philippe@soleil valgrind]$ coregrind/vgdb ct.dump sending command ct.dump to pid 2320 [philippe@soleil valgrind]$ ls -lrt | tail -4 drwxrwxr-x. 6 philippe philippe 4096 2011-05-10 23:36 exp-dhat drwxrwxr-x. 4 philippe philippe 12288 2011-05-11 00:43 gdbserver_tests -rw-------. 1 philippe philippe 0 2011-05-11 23:34 callgrind.out.2320 -rw-------. 1 philippe philippe 53559 2011-05-11 23:34 callgrind.out.2320.1 [philippe@soleil valgrind]$ (In reply to comment #71) > Created an attachment (id=59887) [details] > gdbsrv cleanup: --vgdb=full for mcsig[no]pass, fix 2 bugs (detected by IBM > beam), Committed, r11748. (In reply to comment #73) > > * running "vgdb" standalone (ok, makes no sense, but could be a typo), > > I can not terminate the relaying by pressing ctrl+c. > When used as relay application, vgdb must resist to a SIGINT (as gdb might > send a SIGINT to terminate, but vgdb must ensure a clean termination, > e.g. must ptrace detach if needed). Isn't this the same situation? When pressing ctrl+c after starting vgdb in a shell, it also should do a clean termination... > Maybe what I can do is to check if stdin is a tty, and then report an error > to the user if no args have been given. Sounds good. Another possibility would be to have to specify e.g. "--relay" for vgdb to go into relay mode. > > * running a kernel from kernel-ppa which does not have > > the ptrace scoping compiled in, I get at end of "vgdb -h": > > "Could not determine ptrace scope from /proc/sys/kernel/yama/ptrace_scope" > > Running any monitor command such as "vgdb foo" gives the same warning in the > > terminal of the VG run. Why this warning at all? BTW, I did not know about > > this security thing up to now ;-) > vgdb.c and m_gdbserver/remote-utils.c are guessing ptrace_scope is to be > looked at if the pre-processor symbol PR_SET_PTRACER is defined. > In this case, the ptrace_scope setting will be read by vgdb and/or > remote-utils.c > > If it cannot be read, an error message is given back, and it is assumed > PR_SET_PTRACER is not needed. > > Do you have a suggestion for an alternative behaviour you would prefer ? Perhaps my setup is a little bit strange, but I was wondering why printing out a warning when everything actually works. > > * With callgrind, vgdb help gives > > callgrind monitor commands: > > ct.dump [<dump_hint>] > > dump counters > > But when running "vgdb ct.dump bla", I get > > sending command cg.dump bla to pid 11064 > > command 'cg.dump bla' not recognised > It looks to be a typo (cg. instead of ct.). Oops. You are right, thanks! Created attachment 59948 [details]
fix arm thumb by transforming an address to its thumb form when needed
* added a function thumb_pc transforming a pc to its thumb form if needed
(using an heuristic to guess if this is a thumb address)
* when program counter is modified by gdb, use thumb_pc
* use thumb_pc in monitor command vg.translate
(I was able to check that this improves inferior call on a small
thumb compiled executable + mcinfcallRU test) but I could not compile
all tests with thumb).
Created attachment 59992 [details]
fix some tests on ppc-debian6,s390x + handled Nick Nethercote, Josef Weidendorfer comments
* improved testing & related doc
- added option --vex-iropt-precise-memory-exns=yes to mcsig(no)pass.vgtest
+ updated manual-core.xml
- cleanup some comments in *.vgtest
- modified filter_gdb and filter_memcheck_monitor to
handle specific ppc/debian6.0 mcsig(no)pass output
handle specific s390x 'missing debug info'
- added more information in README_DEVELOPPERS on how to
investigate failing gdbserver tests.
* handled Nick Nethercote comment:
Replaced kludgy ms.snapshot detailed
by ms.detailed_snaphot
Updated documentation and test.
* handled Josef Weindendorfer comments:
- do not report an error if ptrace_scope file can't be read.
Instead, a debug trace is done if -d (debug) option given
- added an option -l to give the list of active Valgrind
gdbserver. Useful a.o. to support callgrind_control.
Updated documentation
- added ref. to vgdb help in the vgdb --help message
Hello Philippe, Running the regression tests (trunk r11750) creates a large number of vgdb-* files in /tmp that do not get cleaned up. That seems like a bug to me. (In reply to comment #78) > Hello Philippe, > > Running the regression tests (trunk r11750) creates a large number of vgdb-* > files in /tmp that do not get cleaned up. That seems like a bug to me. Hello Bart, These files are left if the Valgrind process calls exec. I thought it was not a big deal but in fact, it is better to do something: If ever the pid nr is recycled, this could cause collision, and a new process with the same pid could have a problem creating its resources. If a process P1 running under Valgrind execs an executable E1, depending on --trace-children, E1 will or will not run under Valgrind. If it does not run under Valgrind, then it is better to cleanup (as otherwise it leaves these files). If it runs under valgrind, I think a cleanup before the exec is preferrable to let a fresh gdbserver start after exec. Depending on --vgdb-error, the freshly started gdbserver will stop at the beginning of execution, to let the user put breakpoints before startup. Now, there is another case to look at in this area. If process P1 forks, giving a child P2. Should P2 activate its gdbserver ? I suggest the following logic: if --trace-children=no : do not activate the gdbserver else activate gdbserver + if dyn_vgdb_error == 0, gdbserver waits for gdb to attach just after fork. If after fork, the process P2 execs another executable, we are back to the initial case. (NB: dyn_vgdb_error gets its initial value from --vgdb-error, but can after be controlled by the user). What is slightly worrying me with the above is that the classical case of a fork-exec implies that the user has to attach twice for each fork/exec if --vgdb-error=0 is given. But not activating the gdbserver in the forked process does not let the user debug a forked process that does not exec something else. What do you think about this ? (In reply to comment #79) > (In reply to comment #78) > > Running the regression tests (trunk r11750) creates a large number of vgdb-* > > files in /tmp that do not get cleaned up. That seems like a bug to me. > > These files are left if the Valgrind process calls exec. > I thought it was not a big deal but in fact, it is better to do something: > If ever the pid nr is recycled, this could cause collision, and a new > process with the same pid could have a problem creating its resources. > > If a process P1 running under Valgrind execs an executable E1, depending on > --trace-children, E1 will or will not run under Valgrind. > If it does not run under Valgrind, then it is better to cleanup (as otherwise > it leaves these files). > If it runs under valgrind, I think a cleanup before the exec is preferrable > to let a fresh gdbserver start after exec. Depending on --vgdb-error, > the freshly started gdbserver will stop at the beginning of execution, to > let the user put breakpoints before startup. > > Now, there is another case to look at in this area. > If process P1 forks, giving a child P2. > Should P2 activate its gdbserver ? > I suggest the following logic: > if --trace-children=no : do not activate the gdbserver > else activate gdbserver + if dyn_vgdb_error == 0, gdbserver waits for > gdb to attach just after fork. > If after fork, the process P2 execs another executable, we are back to > the initial case. > (NB: dyn_vgdb_error gets its initial value from --vgdb-error, but can after > be controlled by the user). > > What is slightly worrying me with the above is that the classical case > of a fork-exec implies that the user has to attach twice for each > fork/exec if --vgdb-error=0 is given. > But not activating the gdbserver in the forked process does not let the user > debug a forked process that does not exec something else. > > What do you think about this ? I'm not sure what to do with this. Have you considered to use so-called abstract socket addresses on Linux instead of file system path names ? If I'm not mistaken these sockets disappear automatically after the last process in which the socket was bound is gone. For more information, see e.g. http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html. (In reply to comment #80) > (In reply to comment #79) > > What is slightly worrying me with the above is that the classical case > > of a fork-exec implies that the user has to attach twice for each > > fork/exec if --vgdb-error=0 is given. > > But not activating the gdbserver in the forked process does not let the user > > debug a forked process that does not exec something else. > > I'm not sure what to do with this. Have you considered to use so-called > abstract socket addresses on Linux instead of file system path names ? If I'm > not mistaken these sockets disappear automatically after the last process in > which the socket was bound is gone. For more information, see e.g. > http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html. I do not think this can help: from what I understand, either the socket has a name in the file system, and then it must be unlinked similarly to the FIFOs, or it would not have a name, but then that does not let vgdb discovers these processes. And that would in any case still let the pathname of the shared memory to handle. But I think I have now a change which cleans up the files, except when Valgrind process is killed -9 (or similar). For what concerns debugging (or not) a forked process : I think the proposal above looks the good compromise : I would not like to add a new command line option, and the above still allows the user to decide to debug or not a forked process using --trace-children and/or setting dyn_vgdb_error. I am testing on various platforms, and should attach a patch soon. Thanks for the problem report (and thanks for the cleanup/enhancements of the gdbserver tests). Created attachment 60033 [details]
ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec
* syswrap-{generic|darwin|aix5}.c : in PRE(sys_execve) : terminate gdbserver
* pub_core_gdbserver.h and m_gdbserver.c : add VG_(gdbserver_prerun_action),
factorising the actions to do by gdbserver at "startup" (i.e. a traced
fork or a traced exec).
* scheduler.c : implement startup action using VG_(gdbserver_prerun_action)
Created attachment 60034 [details]
implement delta leak reporting
* memcheck.h: add client requests to produce delta leaks
* mc_include.h: add typedef LeakCheckDeltaMode
add old_* values in lossrecord (to implement delta)
* mc_errors.c: add MC_(snprintf_delta) to print a delta if delta leak requested
MC_(pp_Error) : use MC_(snprintf_delta) to print leak errors
(In reply to comment #83) > Created an attachment (id=60034) [details] Manipulation error caused a part of the description to not be given. Here is the complete description: * memcheck.h: add client requests to produce delta leaks * mc_include.h: add typedef LeakCheckDeltaMode add old_* values in lossrecord (to implement delta) MC_(detect_memory_leaks) : added a LeakCheckParams struct with LeakCheckDeltaMode + avoid touching VG_(clo_*) * mc_errors.c: add MC_(snprintf_delta) to print a delta if delta leak requested MC_(pp_Error) : use MC_(snprintf_delta) to print leak errors * mc_leakcheck.c: implemented delta leak report * mc_main.c : updated to LeakCheckParam updated gdbsrv help and mc.leak_check monitor command * mc-manual.xml : changed mc.leak_check documentation documented the 2 new client requests * memcheck/tests/leak-delta.vgtest : new test * gdbserver_tests/mcleak.vgtest : test re-activated and updated (In reply to comment #82) > Created an attachment (id=60033) [details] > ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec > > * syswrap-{generic|darwin|aix5}.c : in PRE(sys_execve) : terminate gdbserver > * pub_core_gdbserver.h and m_gdbserver.c : add VG_(gdbserver_prerun_action), > factorising the actions to do by gdbserver at "startup" (i.e. a traced > fork or a traced exec). > * scheduler.c : implement startup action using VG_(gdbserver_prerun_action) The patch helps but apparently does not yet clean up all vgdb pipe instances: $ ls /tmp/vgdb* $ rebuild-valgrind # clean source tree, run ./configure, build and run regtests $ ls /tmp/vgdb* /tmp/vgdb-pipe-from-vgdb-to-17722 /tmp/vgdb-pipe-to-vgdb-from-17722 /tmp/vgdb-pipe-shared-mem-vgdb-17722 Has anyone already tried to run the gdbserver regtests on Ubuntu ? On Ubuntu I see all these tests fail with a diff similar to this: $ cat mcclean_after_fork.stderr.diff --- mcclean_after_fork.stderr.exp 2011-05-16 19:54:47.181087996 +0200 +++ mcclean_after_fork.stderr.out 2011-05-16 20:00:05.031087984 +0200 @@ -1,5 +1,7 @@ (action at startup) vgdb me ... +error 2 No such file or directory +error VG_(open) /proc/sys/kernel/yama/ptrace_scope HEAP SUMMARY: in use at exit: 0 bytes in 0 blocks Note: this might be related to the fact that Ubuntu's default shell (/bin/sh) is dash instead of bash. More info can be found here: http://mywiki.wooledge.org/Bashism. (In reply to comment #76) > Created an attachment (id=59948) [details] > fix arm thumb by transforming an address to its thumb form when needed Committed, r11768. (In reply to comment #77) > Created an attachment (id=59992) [details] > fix some tests on ppc-debian6,s390x + handled Nick Nethercote, Josef > Weidendorfer comments Committed, r11770. (In reply to comment #82) > Created an attachment (id=60033) [details] > ensure proper cleanup of gdbsrv FIFOs/shmem files with untraced fork/exec Committed, r11771. (In reply to comment #85) > The patch helps but apparently does not yet clean up all vgdb pipe instances: > $ ls /tmp/vgdb* > $ rebuild-valgrind # clean source tree, run ./configure, build and run regtests > $ ls /tmp/vgdb* > /tmp/vgdb-pipe-from-vgdb-to-17722 /tmp/vgdb-pipe-to-vgdb-from-17722 > /tmp/vgdb-pipe-shared-mem-vgdb-17722 Yes, I see that too -- 3 files remaining. (Ubuntu 10.04 x86_64). Created attachment 60090 [details] add cleanup: line to none/tests/require-text-symbol-2.vgtest (In reply to comment #85) > The patch helps but apparently does not yet clean up all vgdb pipe instances: In case Valgrind terminates abnormally, then no cleanup is done. In this case, the abnormal termination is in the test which checks --require-text-symbol=:*libc.so*:doesntexist I attach a patch which adds a cleanup: line to this test. It would be possible to implement this cleanup inside Valgrind, but there are about 60 places where VG_(exit) is called. If this solution would be preferrable, I can do the patch (e.g. replacing all calls of VG_(exit) by a function VG_(cleanup_and_exit). The attached patch adds a cleanup: line to the test to ensure the 2 FIFOs and the shared mem file are removed. (In reply to comment #86) > Has anyone already tried to run the gdbserver regtests on Ubuntu ? The tests have been run on an Ubuntu 10.04, 32 and 64 bits, but only with ptrace_scope file configured. The commit r11770 should fix these failures, as the below messages related to unexisting ptrace_scope will be output only when -d debug option(s) is given. > (action at startup) vgdb me ... > +error 2 No such file or directory > +error VG_(open) /proc/sys/kernel/yama/ptrace_scope (In reply to comment #91) > Created an attachment (id=60090) [details] > add cleanup: line to none/tests/require-text-symbol-2.vgtest Philippe .. a general comment re cleaning up. Do you know the open-then-unlink trick? eg at coregrind/m_main.c:1808 ibid. The idea is, for a temporary file that should be deleted at program exit: create the file, then open it (to get a file descriptor), then unlink it. Because the file is open the kernel cannot delete it, but as soon as the process disappears, the kernel deletes the file because there are no more references to it. The deletion happens regardless of how the process exits (normally via syscall, or with a fatal signal) so this is very convenient and clean. Maybe it is useful here? (In reply to comment #93) > (In reply to comment #91) > > Created an attachment (id=60090) [details] [details] > > add cleanup: line to none/tests/require-text-symbol-2.vgtest > > Philippe .. a general comment re cleaning up. Do you know the > open-then-unlink trick? eg at coregrind/m_main.c:1808 ibid. The idea Yes I am aware of this. However, the files cannot be unlinked, otherwise vgdb can't list the active Valgrind gdbservers (and after that can't open the files to contact these V gdbsrv). (In reply to comment #91) > Created an attachment (id=60090) [details] > add cleanup: line to none/tests/require-text-symbol-2.vgtest Committed, r11772. Created attachment 60211 [details]
fix gdbsrv arm thumbness
Fix gdbsrv behaviour with ARM thumb code.
* Disabled several tests on ARM when gdb version < 7.1
gdb 7.0 has problems with next/step/... in ARM thumb code.
* Documented in manual-core.xml that ARM thumb code implies
a gdb version >= 7.1
* m_translate.c: don't chase if chasing will cause a transition arm/thumb.
* m_gdbserver.h/.c : take into account the thumb bit at several places
* Tests redone on f12/x86, debian5/x86_64,Ubuntu9.10/arm
with gdb 7.0/7.1/7.2/7.3(svn)
Updated filter_gdb and templates for gdb 7.3
I just spent several hours chasing a tricky "use of uninitialized value" bug. Ability to set breakpoints and query memory state is absolutely AWESOME! Philippe, thank you, thank you, thank you! At several points in my debug session, while trying to understand whether I am "upstream" or "downstream" from the bug, I wanted to know the defined-ness of a register, e.g. something like "monitor mc.get_vbits $rax" or "monitor mc.check defined $rax". Don't know how difficult this would be to implement, but I believe the capability would be very handy on architectures that pass parameters in registers. (In reply to comment #97) > register, e.g. something like "monitor mc.get_vbits $rax" or "monitor mc.check > defined $rax". If you give --vgdb-shadow-registers=yes at startup, then you can look at the 2 Valgrind shadow register sets by post-fixing them with s1 or s2. E.g. print $raxs1 should give you the value of the shadow register giving the definedness of the register rax. Created attachment 60364 [details] fix gdbsrv arm thumbness : imark delta based solution Fix gdbsrv behaviour with ARM thumb code. This is an alternative approach to attachment 60211 [details]. Instead of having a SB being fully arm or thumb (by not chasing between arm/thumb), each IMark gets a 'Int delta' which gives the delta to add to the address to have the real thumb PC. This then allows gdbsrv to differentiate arm from thumb instructions. Note that to properly test the arm/thumb behaviour, it is better to use a gdb version >= 7.1. To run all tests, a version >= 7.2 is needed. * Disabled several tests on ARM when gdb version < 7.1 gdb 7.0 has problems with next/step/... in ARM thumb code. * Documented in manual-core.xml that ARM thumb code implies a gdb version >= 7.1 * libvex_ir.h : add a Int delta in IMark. * ir_defs.c : add Int delta argument to fn IRStmt_IMark ppIRStmt : for IMark, print a +d or -d after the addr, if the delta != 0 (for a 0 delta, nothing is printed so that the pretty print stays the same for all archs except for thumb instructions) * ir_opt.c : calls to IRStmt_IMark use the new arg * guest_generic_bb_to_IR.c : when building IMark, give a delta 0 except for thumb code. (NB: I suppose that the new 3rd element in an IMark does not have a major impact on performance, but I did not check) * m_gdbserver.h/.c : take into account the thumb bit at several places * All tests redone on debian5/x86_64, gdbserver tests redone on Ubuntu9.10/arm with gdb 7.0/7.1/7.2/7.3(svn) Updated filter_gdb and templates for gdb 7.3 Note: on a heavily loaded system, simulate_control_c timeout arg might be too small. I will try to make simulate_control_c working better on heavily loaded system, and will provide a subsequent patch for that. (In reply to comment #99) > Created an attachment (id=60364) [details] > fix gdbsrv arm thumbness : imark delta based solution Committed, r2153, r11779. Thanks. Created attachment 60451 [details]
fix mcsig(no)pass on arm Ubuntu10, arm thumb internal doc, improve simulate_control_c
* new file docs/internals/arm_thumb_notes_gdbserver.txt
documentation about the subtilities of the thumb bit handling in gdbsrv.
* made the SIGFPE backtrace filtering less dependent on gdb/os/libc/...
* improved simulate_control_c : runs faster/less dependent on timeout value
Retested on f12/x86, debian5/amd64, debian5/ppc64, rh4/s390x, Ubuntu9/arm
Comment on attachment 60451 [details] fix mcsig(no)pass on arm Ubuntu10, arm thumb internal doc, improve simulate_control_c patch comment 101 integrated as r11792 Created attachment 60909 [details]
fix safe_fd exhaustion in fork chain caused by non closing of shared_mem_fd
Patch that fixes the problem reported by Christian Borntraeger.
The problem was created by keeping the shared memory mapped file opened
without reason till the process does an exec.
In case of a chain of forked processes (without exec), the range of safe_fd
reserved for Valgrind own usage becomes exhausted.
* coregrind/m_gdbserver/remote-utils.c :
do not VG_(safe_fd) shared_mem_fd (as it is now closed directly)
close shared_mem_fd once file is mmap-ed and written.
* gdbserver_tests/nlfork_chain.stderr.exp,nlfork_chain.vgtest,
fork_chain.c,nlfork_chain.stdout.exp:
new files
* gdbserver_tests/Makefile.am:
modified for new nlfork_chain test
(In reply to comment #103) > Created an attachment (id=60909) [details] > fix safe_fd exhaustion in fork chain caused by non closing of shared_mem_fd Committed, r11818. Thanks for the fix. Created attachment 61100 [details]
implement delta leak reporting (updated to new rev 11821, minor doc updates)
* memcheck.h: add client requests to produce delta leaks
* mc_include.h: add typedef LeakCheckDeltaMode
add old_* values in lossrecord (to implement delta)
MC_(detect_memory_leaks) : added a LeakCheckParams struct
with LeakCheckDeltaMode + avoid modifying VG_(clo_*)
* mc_errors.c: add MC_(snprintf_delta) to print a delta if delta leak requested
MC_(pp_Error) : use MC_(snprintf_delta) to print leak errors
* mc_leakcheck.c: implemented delta leak report
* mc_main.c : updated to LeakCheckParam
updated gdbsrv help and mc.leak_check monitor command
* mc-manual.xml : changed mc.leak_check documentation
documented the 2 new client requests
* manual-core-adv.xml : minor update (order of mc.leak_check args)
* memcheck/tests/leak-delta.vgtest : new test
* gdbserver_tests/mcleak.vgtest : test re-activated and updated
Created attachment 61102 [details]
minor documentation fixes
Created attachment 61187 [details]
fix error in usability msg (--vgdb= must be --vgdb-prefix=) + make a test more deterministic
* Fix the typo in usability message prefix part (i.e. replaced --vgdb= by
--vgdb-prefix=)
* make test clean_after_fork more deterministic by executing enough basic
block in the child to trigger gdbsrv usability msg on amd64 platform.
(on x86/ppc/s390x, the gdbsrv msg was triggered, making the test fail).
Tested on f12/x86, debian5/amd64
NB: no other change done to the usability msg, I start to get used to it :).
Created attachment 61188 [details]
Support for PIC executables (e.g. firefox on Ubuntu 11) by adding the "auxv" protocol packet to gdbsrv.
* Implemented the transmission of the auxv from gdbsrv to gdb.
* Added VG_(client_auxv) in pub_core_clientstate.h and m_clientstate.c
* initimg-linux.c
setup of VG_(client_auxv)
AT_BASE set to ignore. Note: the AT_BASE in auxv value sent
by the classical gdbserver is 0 => I have ignored it when setting up.
But if needed, it could be set to ignored only by gdbsrv
when sending the auxv rather than when preparing the auxv.
However, I suspect that setting it to 0 is a better value in general.
Tested on f12/x86, Ubuntu11.04/amd64, debian5/amd64, debian5/ppc64,
rhel4/s390x
Created attachment 61262 [details]
fix stupid bug (False instead of True) + SIGSTOP iso SIGTRAP (avoid race cond) + 2 new tests
* new tests
- nlpasssigalrm.vgtest, testing the Qpasssignals signal handling packet
- nlsigvgdb.vgtest, testing vgdb forcing invoke while gdbserver is busy
* m_gdbserver.c
- fix stupid bug (return False instead of return True)
- used SIGSTOP instead of SIGTRAP to "speak" with vgdb.c,
as SIGTRAP is masked while gdbserver is running, and we
need a non-maskable signal to avoid race conditions.
* vgdb.c
- Similarly, used SIGSTOP instead of SIGTRAP
All outstanding patches committed: c106 (Minor documentation fixes for the GDB server) as r11834 c107 (fix error in usability msg, etc) as r11835 c108 (Add support for PIC executables) as r11836 c109 (Fix bug in logic related to signal-passing) as r11837 c105 (add delta leak checking functionality) as r11838 Created attachment 61353 [details]
valgrind and tool mon. cmds prefixes changes + doc fixes + new vgdb option --cmd-time-out
* changed prefixes of Valgrind core monitor commands from vg. to v.
* removed prefixes of Tool monitor commands
* memcheck leak_check 'leakpossible' arg renamed to 'possibleleak'
* memcheck make_memory 'ifaddressabledefined' arg renamed to 'Definedifaddressable'
(with uppercase D to avoid confusion with 'defined' arg).
* vgdb options
- Some doc updates : more logical option order documentation,
specify 'standalone' for options aimed at standalone usage.
- added option --cmd-time-out for standalone vgdb
(comment of Josef Weindendorfer, needed to interface with a callgrind GUI)
* gdbserver_tests/Makefile.am : added some missing files to EXTRA_DIST
* updated tests according to the above.
* updated documentation according to the above.
* some additional minor doc fixes/clarifications
Tested on f12/x86, debian5/amd64
(In reply to comment #111) > Created an attachment (id=61353) [details] > valgrind and tool mon. cmds prefixes changes + doc fixes + new vgdb option > --cmd-time-out Committed, r11844. Closing. Any followup changes can go in new bug reports of their own. |