gdb has the ability to start a "gdbserver"-like program and use its stdio to communicate. This is documented here: https://sourceware.org/gdb/current/onlinedocs/gdb/Connecting.html but essentially it looks like: (gdb) target remote | some-server ... It would be nice if valgrind supported this mode. This would make it simpler to run it under gdb.
Note that valgrind gdbserver is very special: it is not a 'separate' process that e.g. do ptrace system calls to debug an inferior. The valgrind gdbserver is embedded in valgrind itself. Due to that, when the valgrind guest process is blocked in a syscall, the valgrind gdbserver cannot 'read' packets from gdb. The intermediate vgdb process is reading the packets from gdb, and then wakes up (in a very special way) the valgrind runtime to get the valgrind guest process out of the blocking system call. As such, a direct connnection between gdb and the valgrind gdbserver will not work properly. A possible implementation might be a new vgdb option such as: --valgrind [VALGRIND_OPTIONS ...] PROGRAM_TO_RUN_UNDER_VALGRIND [PROGRAM_ARGS ...] When vgdb gets this argument, it would launch valgrind itself and connect to it. That avoids thus to have to launch valgrind in one window, and have gdb/vgdb launched in another window.
So having the full gdbserver embedded in valgrind makes things a bit tricky. vgdb is a "simple" (but not really simple) relay between gdb and the valgrind embedded gdbserver. But would it be possible to make vgdb a smart rely? So vgdb would handle all "extended remote" packets (and start, restart, stop valgrind when requested) and only rely non-extended packets to the embedded valgrind gdbserver? I haven't really studied the remote protocol, so I don't really know if this can be practically done.
(In reply to Philippe Waroquiers from comment #1) > A possible implementation might be a new vgdb option such as: > --valgrind [VALGRIND_OPTIONS ...] PROGRAM_TO_RUN_UNDER_VALGRIND > [PROGRAM_ARGS ...] > > When vgdb gets this argument, it would launch valgrind itself and connect to > it. An alternative, or extension to the idea, would be to have a "vgdb --multi" (similar to gdbserver --multi, but name it however you wish), and make vgdb not start valgrind until the user types "run". You'd likely want to decouple this from 'socket vs stdio', so you'd have something like: (gdb) target extended-remote | vgdb --stdio --multi PROGRAM_TO_RUN_UNDER_VALGRIND (gdb) set args foo bar (gdb) run this way, re-running the program is just another "run".
We have been looking at this and for the valgrind side it would involve something like Pedro suggests in comment #3. So vgdb would intercept the extended-remote packages (which aren't currently implemented in the valgrind gdbserver) and on an Run packet it would launch valgrind (and connect to it). As a bonus one could implement the multiprocess extensions so it would be easier to follow child processes when valgrind uses --track-children=yes. This is only half of the work though. Although you can change the way gdb and vgdb communicate (it doesn't need to be through stdio, but we could also add a "normal" socket communication channel). We also need a way for gdb and the valgrind process launched through vgdb to share a tty stdin/stdout. This needs an extension to the way a "local" remote process is launched. We cannot use stdin/stdout for the gdb/vgdb communication if we want to start valgrind within vgdb, and have valgrind's in/out share with gdb's tty. So this probably means a special way to start an extended-remote target on the gdb side.
(In reply to Mark Wielaard from comment #4) > This is only half of the work though. Although you can change the way gdb > and vgdb communicate (it doesn't need to be through stdio, but we could also > add a "normal" socket communication channel). We also need a way for gdb and > the valgrind process launched through vgdb to share a tty stdin/stdout. This > needs an extension to the way a "local" remote process is launched. We > cannot use stdin/stdout for the gdb/vgdb communication if we want to start > valgrind within vgdb, and have valgrind's in/out share with gdb's tty. So > this probably means a special way to start an extended-remote target on the > gdb side. Filed https://sourceware.org/bugzilla/show_bug.cgi?id=28916 "tty sharing mode for target remote" against gdb for this part.
There is a fork on sourcehut that contains the start of work on a --multi option for vgdb that implements extended-remote mode so that vgdb can start a valgrind process in response to a vrun packet. Currently the vgdb part gets as far as launching valgrind --gdb-errors=0 but the work on the embedded valgrind gdbserver hasn't started yet. https://git.sr.ht/~sasshka/valgrind gdb_valgrind_integration branch
Created attachment 153923 [details] patch vgdb: Add multi mode In this mode, Vgdb uses extended remote protocol to communicate with Gdb. in the extended-remote protocol Vgdb has to interpret the initial packets to setup the connection till it gets a vRun packet. Then it has to actually start up valgrind, connect to Valgrind's internal gdbserver, reinitialize the connection (setup noack-mode for example) before it can start simply forwarding packets. This commit also adds new options to Vgdb: - --valgrind - allows to specify the path to Valgrind, when not specified, system Valgrind is used - --vargs - allows to specify options to run Valgrind with More details in my post about the gdb/valgrind integration project: http://sasshkas.blogspot.com/search/label/gdb%0Avalgrind
We reviewed (about half) of the patch and found a couple of things to cleanup: - There is a sleep(3) after fork, before execve in the parent (vgdb) to wait for valgrind to fully startup This should be replaced with looking for the shared_mem file to appear - write_to_gdb should retry short writes. - send_packet is badly indented. - receive_packet should try again if the checksum is bad (very unlikely since we only use reliable transports) - do_multi_mode () closes down too soon. When the run is over (the valgrind side closes), it should restart listening on the gdb side. That way the user can run again without setting up the target again. - QEnvironmentHexEncoded, QEnvironmentReset, QEnvironmentUnset and QSetWorkingDir aren't really handled. Specifically split_hexdecode only checks the arguments can be decoded and split. But the arguments aren't actually returned. This might be fine if setting the environment and cwd will be handled differently (on the gdb side). But could use a comment in that case.
Created attachment 156128 [details] patch The new updated version
Created attachment 157529 [details] patch The updated version.
When I saw this +/* For accept4 and execvpe. */ +#define _GNU_SOURCE #include "vgdb.h" I had a bit of a bad feeling, which was confirmed when I tried to build. vgdb.c:1185:16: warning: implicit declaration of function 'execvpe' is invalid in C99 [-Wimplicit-function-declaration] res = execvpe ("valgrind", ^ vgdb.c:1186:35: error: use of undeclared identifier 'environ' val_argv, environ); ^ vgdb.c:1190:16: warning: implicit declaration of function 'execvpe' is invalid in C99 [-Wimplicit-function-declaration] res = execvpe (val_argv[0], val_argv, environ); ^ vgdb.c:1190:48: error: use of undeclared identifier 'environ' res = execvpe (val_argv[0], val_argv, environ); MUSL has a manpage for execvpe but not macOS, FreeBSD or Illumos. There is 'execle' it's POSIX and has been around forever, but there's no clean way to use it with the unknown number of args. macOS has "execve", MUSL also has "execvpe". FreeBSD does have exect, I gave that a quick try but I get execve valgrind: No such file or directory 09:05:24.560189 Running "./malloc1" on the remote target failed (gdb) OOPS! couldn't launch valgrind No such file or directory I tried plain execvp (since environ isn't uyet being used) and I could get as far as a gdb prompt but couldn't do much, getting errors like relaying data between gdb and process 57204 Protocol error: qXfer:features:read (target-features) conflicting enabled responses. Cannot find bounds of current function + DEBUG(1, "packed recieved: '%s'\n", buf); ^^^ That should be "packet received" Will the option "--launched-with-multi" ever be used by humans? If not no worry, but if it is then it should be more self explanatory.
(In reply to Paul Floyd from comment #11) > When I saw this > > +/* For accept4 and execvpe. */ > +#define _GNU_SOURCE > #include "vgdb.h" > > I had a bit of a bad feeling, which was confirmed when I tried to build. > > vgdb.c:1185:16: warning: implicit declaration of function 'execvpe' is > invalid in C99 [-Wimplicit-function-declaration] > res = execvpe ("valgrind", > ^ > vgdb.c:1186:35: error: use of undeclared identifier 'environ' > val_argv, environ); > ^ Aha, yeah execvpe and environ are GNU extensions. But since we aren't really handling changes in environment at the moment (although there is some code there in case gdb tells us about it) we can use execvp for now. I see accept4 (which is used to set SOCK_CLOEXEC) is also a GNU extension. But that does also work on other systems? > I tried plain execvp (since environ isn't uyet being used) and I could get > as far as a gdb prompt but couldn't do much, getting errors like > > relaying data between gdb and process 57204 > Protocol error: qXfer:features:read (target-features) conflicting enabled > responses. > > Cannot find bounds of current function That is interesting. We have an XXX for propagating the qSupported response and it might be this is caused by that missing. > + DEBUG(1, "packed recieved: '%s'\n", buf); > ^^^ > That should be "packet received" Thanks, fixed. > Will the option "--launched-with-multi" ever be used by humans? If not no > worry, but if it is then it should be more self explanatory. --launched-with-multi is only supposed to be used by vgdb when running valgrind, so it is only documented with valgrind --help-debug: --launched-with-multi=no|yes valgrind launched in vgdb multi mode [no]
> I see accept4 (which is used to set SOCK_CLOEXEC) is also a GNU extension. > But that does also work on other systems? It compiles and there is a manpage for accept4 which is always a good start. I thought the new arg would be 'hidden' so no real need to make it self explanatory.
Created attachment 157862 [details] patch Thank you for the review! We tried to address the concerns.
Created attachment 157873 [details] patch - Handle qfThreadInfo by replying there are no threads - Propagate ackmode and qsupported in poll loop - Sync qsupported replies, explicitly add --vgdb-shadow-registers=yes - Make sure len is always freed and that decoded_string[i] can be freed
Thanks, last patch looks pretty nice. I threw it at the try builder and it at least compiles cleanly: https://builder.sourceware.org/buildbot/#/changes/23263 That of course is all different GNU/Linux setups, maybe Paul can try it on FreeBSD. P.S. It is probably more accurate to make you the main author and add me as Co-authored-by:
I'll have a go over the weekend.
Builds cleanly on FreeBSD But I still get errors like (gdb) start Temporary breakpoint 4 at 0x201ab8: file new_delete_mismatch_size.cpp, line 12. Starting program: /usr/home/paulf/scratch/valgrind/memcheck/tests/new_delete_mismatch_size valgrind: $PWD/../valgrind/memcheck/tests/new_delete_mismatch_size valgrind: : No such file or directory syscall failed: No such file or directory error opening /tmp/vgdb-pipe-shared-mem-vgdb-14834-by-paulf-on-euler shared memory file Remote connection closed
I tried again (with lots of -d / -v) and it worked.
I wonder if there is still a timing issue if it works with extra verbose/debug output. We tried to remove any explicit waiting/spinning. Could you show a full session (gdb command line plus arguments and gdb prompts you did) where things fail, and where you added the extra -v and -d to make it work?
It seems quite intermittent. I've only reproduced it once, and without debug/verbose. I had a very quick look on Illumos and Darwin and Illumos has a man page for accept4 but not Darwin. I also tried to make the startup even easier and after searching a bit and getting some info on SO I came up with define set-program-name set logging file tmp.gdb set logging overwrite on set logging redirect on set logging enabled on python print(f"set $programname = \"{gdb.current_progspace().filename}\"") set logging enabled off set logging redirect off set logging overwrite off source tmp.gdb end define start_vg set-program-name eval "set remote exec-file %s", $programname show remote exec-file set sysroot / target extended-remote | vgdb --multi --vargs -q start end I can't find an easy way to get the inferior in gdb (in general there isn't always just one). Again I don't know how to do it, but this gdb function would be nicer if there's a way to pass args to the vgdb line to pass on to valgrind. Something like if $argc >= 2 while $i < $argc eval "set $args = $args arg%d", $i set $i = $i + 1 end eval "target extended-remote | vgdb --multi --vargs -q %s", $args
(In reply to Paul Floyd from comment #21) > I also tried to make the startup even easier and after searching a bit and > getting some info on SO I came up with I don't know the exact problem here -- I don't use "extended-remote" much if at all -- but we could write a new "target valgrind" command in Python (shipped with valgrind) that automates some of this. > eval "set remote exec-file %s", $programname Like this bit could be done automatically. Though it occurs to me that it might be even nicer to give gdb some mode like "gdbserver but we know it shares the filesystem", so that "remote exec-file" and "sysroot" aren't needed at all. > I can't find an easy way to get the inferior in gdb (in general there isn't > always just one). Shenanigans with "pipe" and "info inferiors", or gdb.selected_inferior() in Python. > Again I don't know how to do it, but this gdb function would be nicer if > there's a way to pass args to the vgdb line to pass on to valgrind. I would have thought this was handled by extended-remote? Unless you mean some kind of valgrind-specific args, not args to the inferior? The latter could be done by the putative "target valgrind" easily - it could parse its own arguments and do whatever it likes.
To give a bit more context, the example in the xml doc is # gdb prog (gdb) set remote exec-file prog (gdb) set sysroot / (gdb) target extended-remote | vgdb --multi --vargs -q (gdb) start And what I was trying to to was to simplify that to something more like # gdb prog (gdb) start-valgrind [valgrind args]
(In reply to Paul Floyd from comment #21) > It seems quite intermittent. I've only reproduced it once, and without > debug/verbose. Too bad, those are the worst bugs to debug. > I had a very quick look on Illumos and Darwin and Illumos has a man page for > accept4 but not Darwin. Does that also mean it doesn't build on Darwin? We can use accept on Darwin in that case. It just means it will possibly leak a file descriptor on exec.
(In reply to Tom Tromey from comment #22) > (In reply to Paul Floyd from comment #21) > > > I also tried to make the startup even easier and after searching a bit and > > getting some info on SO I came up with > > I don't know the exact problem here -- I don't use "extended-remote" much > if at all The basic difference between target remote and target extended-remote is whether the target can (re)start the program. Basically with target remote gdb connects to an already running/starting program and the target disconnects when the program ends, while with target extended-remote gdb connects to a target that can (re)start a program and keeps connected to the target even after to program ends, so it can restart the program or run another. > -- but we could write a new "target valgrind" command in Python > (shipped with valgrind) that automates some of this. Doing it as a python script is an interesting idea. The next valgrind will ship with some gdb python scripts already. The question is how to auto-load it when gdb starts up. > > eval "set remote exec-file %s", $programname > > Like this bit could be done automatically. > Though it occurs to me that it might be even nicer to give > gdb some mode like "gdbserver but we know it shares the filesystem", > so that "remote exec-file" and "sysroot" aren't needed at all. Yes, that plus a way to share the tty or have stdin/stdout redirected to some alternative file descriptors (currently they are used by the pipe to talk the so you cannot directly get input/output to/from the program) https://sourceware.org/bugzilla/show_bug.cgi?id=28916 > > Again I don't know how to do it, but this gdb function would be nicer if > > there's a way to pass args to the vgdb line to pass on to valgrind. > > I would have thought this was handled by extended-remote? Unless you mean > some kind of valgrind-specific args, not args to the inferior? > > The latter could be done by the putative "target valgrind" easily - it could > parse its own arguments and do whatever it likes. The idea is that any arguments passed to vgdb --vargs .... will be passed to valgrind when it starts up.
(In reply to Mark Wielaard from comment #24) > (In reply to Paul Floyd from comment #21) > Does that also mean it doesn't build on Darwin? > We can use accept on Darwin in that case. It just means it will possibly > leak a file descriptor on exec. No ti doesn't build checking for the kernel version... Darwin 17.x (17.7.0) / macOS 10.13 High Sierra gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../include -I../include -I../VEX/pub -I../VEX/pub -DVGA_amd64=1 -DVGO_darwin=1 -DVGP_amd64_darwin=1 -DVGPV_amd64_darwin_vanilla=1 -I../coregrind -DVG_LIBDIR="\"/usr/local/libexec/valgrind"\" -DVG_PLATFORM="\"amd64-darwin\"" -arch x86_64 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-security -Wignored-qualifiers -Wenum-conversion -finline-functions -fno-stack-protector -fno-strict-aliasing -fno-builtin -Wno-cast-align -Wno-self-assign -Wno-tautological-compare -mmacosx-version-min=10.6 -MT vgdb-vgdb.o -MD -MP -MF .deps/vgdb-vgdb.Tpo -c -o vgdb-vgdb.o `test -f 'vgdb.c' || echo './'`vgdb.c mv -f .deps/valgrind-launcher-darwin.Tpo .deps/valgrind-launcher-darwin.Po gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../include -I../include -I../VEX/pub -I../VEX/pub -DVGA_amd64=1 -DVGO_darwin=1 -DVGP_amd64_darwin=1 -DVGPV_amd64_darwin_vanilla=1 -I../coregrind -DVG_LIBDIR="\"/usr/local/libexec/valgrind"\" -DVG_PLATFORM="\"amd64-darwin\"" -arch x86_64 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-security -Wignored-qualifiers -Wenum-conversion -finline-functions -fno-stack-protector -fno-strict-aliasing -fno-builtin -Wno-cast-align -Wno-self-assign -Wno-tautological-compare -mmacosx-version-min=10.6 -MT vgdb-vgdb-invoker-none.o -MD -MP -MF .deps/vgdb-vgdb-invoker-none.Tpo -c -o vgdb-vgdb-invoker-none.o `test -f 'vgdb-invoker-none.c' || echo './'`vgdb-invoker-none.c vgdb.c:91:32: warning: format specifies type 'long' but the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat] sprintf(ptr, ".%6.6ld ", dbgtv.tv_usec); ~~~~~~ ^~~~~~~~~~~~~ %6.6d /usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf' __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__) ^~~~~~~~~~~ vgdb.c:511:51: error: use of undeclared identifier 'SOCK_CLOEXEC' int listen_gdb = socket(PF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); ^ vgdb.c:540:19: warning: implicit declaration of function 'accept4' is invalid in C99 [-Wimplicit-function-declaration] gdb_connect = accept4(listen_gdb, NULL, NULL, SOCK_CLOEXEC); vgdb.c:1119:8: warning: implicit declaration of function 'pipe2' is invalid in C99 [-Wimplicit-function-declaration] (and a few more format warnings)
Created attachment 158030 [details] patch this version updates: - fix Darwin compilation - Correct vargv arguments, last element should be NULL - cleanup_fifos_and_shared_mem - read_from_gdb_write_to_pid let gdb know it there was an error writing
(In reply to Alexandra Hajkova from comment #27) > Created attachment 158030 [details] > patch > > this version updates: > - fix Darwin compilation > - Correct vargv arguments, last element should be NULL > - cleanup_fifos_and_shared_mem > - read_from_gdb_write_to_pid let gdb know it there was an error writing This looks good. I like to integrate it now. I'll fix up the Author, the commit message and add a NEWS entry.
commit 0432ce486d61f84f6fcbeab0d812bb1f61c57561 Author: Alexandra Petlanova Hajkova <ahajkova@redhat.com> Date: Thu Mar 3 04:46:03 2022 -0500 vgdb: implement the extended-remote protocol Executing vgdb --multi makes vgdb talk the gdb extended-remote protocol. This means that the gdb run command is supported and vgdb will start up the program under valgrind. Which means you don't need to run gdb and valgrind from different terminals. Also vgdb keeps being connected to gdb after valgrind exits. So you can easily rerun the program with the same breakpoints in place. vgdb now implements a minimal gdbserver that just recognizes a few extended-remote protocol packets. Once it starts up valgrind it sets up noack and qsupported then it will forward packets between gdb and valgrind gdbserver. After valgrind shutsdown it resumes handling gdb packets itself. https://bugs.kde.org/show_bug.cgi?id=434057 Co-authored-by: Mark Wielaard <mark@klomp.org>