In coregrind/vgdb-invoker-ptrace.c we have this code:

         /* pid received a signal which is not the signal we are waiting for.
            If we have not (yet) changed the registers of the inferior
            or we have (already) reset them, we can transmit the signal.

            If we have already set the registers of the inferior, we cannot
            transmit the signal, as this signal would arrive when the
            gdbserver code runs. And valgrind only expects signals to
            arrive in a small code portion around
            client syscall logic, where signal are unmasked (see e.g.
            m_syswrap/syscall-x86-linux.S ML_(do_syscall_for_client_WRK).

            As ptrace is forcing a call to gdbserver by jumping
            'out of this region', signals are not masked, but
            will arrive outside of the allowed/expected code region.
            So, if we have changed the registers of the inferior, we
            rather queue the signal to transmit them when detaching,
            after having restored the registers to the initial values. */
         if (pid_of_save_regs) {
            siginfo_t *newsiginfo;

            // realloc a bigger queue, and store new signal at the end.
            // This is not very efficient but we assume not many sigs are queued.
            if (signal_queue_sz >= 64) {
               DEBUG(0, "too many queued signals while waiting for SIGSTOP\n");
               return False;
            }
            signal_queue_sz++;
            signal_queue = vrealloc(signal_queue,
                                    sizeof(siginfo_t) * signal_queue_sz);
            newsiginfo = signal_queue + (signal_queue_sz - 1);

            res = ptrace (PTRACE_GETSIGINFO, pid, NULL, newsiginfo);

This is inside a while (1) loop and could run infinitely when valgrind itself is crashing (getting a SIGSEGV over and over again).
I haven't identified precisely why valgrind is failing (it is only on s390x during the gdbserver_tests/nlvgdbsigqueue testcase), but I propose to limit this loop and bail out after having seen 64 non-SIGSTOP signals, so that vgdb isn't stuck inside this loop slowly eating all memory:

diff --git a/coregrind/vgdb-invoker-ptrace.c b/coregrind/vgdb-invoker-ptrace.c
index 389748960..07f3400f9 100644
--- a/coregrind/vgdb-invoker-ptrace.c
+++ b/coregrind/vgdb-invoker-ptrace.c
@@ -300,6 +300,10 @@ Bool waitstopped (pid_t pid, int signal_expected, const char *msg)
             // realloc a bigger queue, and store new signal at the end.
             // This is not very efficient but we assume not many sigs are queued.
+            if (signal_queue_sz >= 64) {
+               DEBUG(0, "too many queued signals while waiting for SIGSTOP\n");
+               return False;
+            }
             signal_queue_sz++;
             signal_queue = vrealloc(signal_queue,
                                     sizeof(siginfo_t) * signal_queue_sz);

Note that this is different from bug #434035, since that one involved a fatal signal. In this case the signal (SIGSEGV) isn't fatal, since valgrind tries to handle it (but fails).
(In reply to Mark Wielaard from comment #0)
> [...] I haven't
> identified precisely why valgrind is failing (it is only on s390x during the
> gdbserver_tests/nlvgdbsigqueue testcase), [...]

Does this mean the test case is failing for you? It isn't for me. If you have more information, I'd look into that.
(In reply to Andreas Arnez from comment #1)
> (In reply to Mark Wielaard from comment #0)
> > [...] I haven't
> > identified precisely why valgrind is failing (it is only on s390x during the
> > gdbserver_tests/nlvgdbsigqueue testcase), [...]
> Does this mean the test case is failing for you? It isn't for me. If you
> have more information, I'd look into that.

Unfortunately the issue occurs on a remote test machine that checks against the latest gcc and glibc. Before this workaround it blows up the machine, because vgdb eats up all memory, and afterwards nlvgdbsigqueue does indeed FAIL. So I would at least like to get this workaround in to not break the testing setup. I am trying to get access to an s390x setup where this happens.

It might be similar to the other issue I have seen with the latest glibc, where if we get a fatal signal, try to terminate, and call __libc_freeres, we get a SIGSEGV.
I pushed the workaround:

commit 970820852e542506dd7a4c722fecd73e34363fde
Author: Mark Wielaard <mark@klomp.org>
Date:   Tue Oct 12 23:25:32 2021 +0200

    vgdb: only queue up to 64 pending signals when waiting for SIGSTOP

    We should not queue infinite pending signals so we won't run out of
    memory when the SIGSTOP never arrives.

But keep this bug open because the root cause isn't known yet.
(In reply to Mark Wielaard from comment #3)
> I pushed the workaround:
>
> commit 970820852e542506dd7a4c722fecd73e34363fde
> Author: Mark Wielaard <mark@klomp.org>
> Date:   Tue Oct 12 23:25:32 2021 +0200
>
>     vgdb: only queue up to 64 pending signals when waiting for SIGSTOP
>
>     We should not queue infinite pending signals so we won't run out of
>     memory when the SIGSTOP never arrives.
>
> But keep this bug open because the root cause isn't known yet.

Here is a bug with more logs about failing gdb_server tests on s390x:
https://bugs.kde.org/show_bug.cgi?id=444481

Let's use that one to track s390x gdb_server issues and close this one, since the specific workaround for the "eat all memory" issue has been pushed.