With valgrind 3.18.1 on Fedora 34 s390x with:

glibc-2.33-20.fc34.s390x
gdb-10.2-3.fc34.s390x
gcc-11.2.1-1.fc34.s390x
binutils-2.35.2-6.fc34.s390x

I am seeing the following failures:

gdbserver_tests/hgtls (stdoutB)
gdbserver_tests/nlsigvgdb (stderr)
gdbserver_tests/nlsigvgdb (stderrB)
gdbserver_tests/nlvgdbsigqueue (stderr)
gdbserver_tests/nlvgdbsigqueue (stdoutB)

Which I don't observe on RHEL8 s390x with:

glibc-2.28-167.el8.s390x
gdb-8.2-16.el8.s390x
gcc-8.5.0-3.el8.s390x
binutils-2.30-108.el8.s390x

The f34 log files:

--- hgtls.stdoutB.exp 2021-10-26 12:44:44.997207954 -0400
+++ hgtls.stdoutB.out 2021-10-26 12:49:28.717217935 -0400
@@ -9,37 +9,5 @@
 Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
 55 int here = 0;
 test local tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test local tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test global tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test global tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test static_extern tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test static_extern tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test so_extern tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test so_extern tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test so_local tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test so_local tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test so_global tls_ip 0x........ ip 0x........ equal 1
-Breakpoint 1, tls_ptr (p=0x........) at tls.c:55
-55 int here = 0;
-test so_global tls_ip 0x........ ip 0x........ equal 1
-Program exited normally.
+Program terminated with signal 0, Signal 0.
+The program no longer exists.

--- nlsigvgdb.stderr.exp 2021-10-26 12:44:44.997207954 -0400
+++ nlsigvgdb.stderr.out 2021-10-26 12:50:13.507232568 -0400
@@ -3,4 +3,3 @@
 (action at startup) vgdb me ...
-Reset valgrind output to log (orderly_finish)
gdbserver_tests/nlsigvgdb.stderr.diff (END)

--- nlsigvgdb.stderrB.exp 2021-10-26 12:44:44.997207954 -0400
+++ nlsigvgdb.stderrB.out 2021-10-26 12:50:13.767232568 -0400
@@ -1,5 +1,6 @@
 vgdb-error value changed from 0 to 999999
 gdbserver: continuing in 5000 ms ...
-gdbserver: continuing after wait ...
-monitor command request to kill this process
+syscall failed: No such process
+invoke_gdbserver_in_valgrind: check for pid .... existence failed
 Remote connection closed
+"monitor" command not supported by this target.

--- nlvgdbsigqueue.stderr.exp 2021-10-26 12:44:44.997207954 -0400
+++ nlvgdbsigqueue.stderr.out 2021-10-26 12:50:16.967232568 -0400
@@ -8,4 +8,64 @@
 London ready to sleep and/or burn
 Petaouchnok ready to sleep and/or burn
 main ready to sleep and/or burn
-Gdb request to kill this process
+VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
+si_code=1; Faulting address: 0x3FFBAF7E000; sp: 0x1002e401e8
+
+valgrind: the 'impossible' happened:
+ Killed by fatal signal
+
+host stacktrace:
+   at 0x3FFBAF7E480: ???
+   by 0x800027FE7: vgPlain_poll (m_libcfile.c:765)
+   by 0xFFFFFFFFFFFFFFFF: ???
+
+sched status:
+ running_tid=0
+
+Thread 1: status = VgTs_WaitSys syscall 301 (lwpid 47398)
+   at 0x495A8E8: select (in /...libc...)
+   by 0x1001163: sleeper_or_burner (sleepers.c:85)
+   by 0x1001849: main (sleepers.c:193)
+client stack range: [0x1FFEFFD000 0x1FFF000FFF] client SP: 0x1FFEFFF890
+valgrind stack range: [0x1002D41000 0x1002E40FFF] top usage: 12384 of 1048576
+
+Thread 2: status = VgTs_WaitSys syscall 301 (lwpid 47428)
+   at 0x495A8E8: select (in /...libc...)
+   by 0x1001163: sleeper_or_burner (sleepers.c:85)
+   by 0x4847175: start_thread (in /usr/lib64/libpthread-2.33.so)
+   by 0x49629D5: ??? (in /...libc...)
+   by 0xFFFFFFFFFFFFFFFF: ???
+client stack range: [0x4A0D000 0x520CFFF] client SP: 0x520BDD0
+valgrind stack range: [0x1003FD2000 0x10040D1FFF] top usage: 5016 of 1048576
+
+Thread 3: status = VgTs_WaitSys syscall 301 (lwpid 47429)
+   at 0x495A8E8: select (in /...libc...)
+   by 0x1001163: sleeper_or_burner (sleepers.c:85)
+   by 0x4847175: start_thread (in /usr/lib64/libpthread-2.33.so)
+   by 0x49629D5: ??? (in /...libc...)
+   by 0xFFFFFFFFFFFFFFFF: ???
+client stack range: [0x520E000 0x5A0DFFF] client SP: 0x5A0CDD0
+valgrind stack range: [0x10040D6000 0x10041D5FFF] top usage: 4872 of 1048576
+
+Thread 4: status = VgTs_WaitSys syscall 301 (lwpid 47430)
+   at 0x495A8E8: select (in /...libc...)
+   by 0x1001163: sleeper_or_burner (sleepers.c:85)
+   by 0x4847175: start_thread (in /usr/lib64/libpthread-2.33.so)
+   by 0x49629D5: ??? (in /...libc...)
+   by 0xFFFFFFFFFFFFFFFF: ???
+client stack range: [0x5A0F000 0x620EFFF] client SP: 0x620DDD0
+valgrind stack range: [0x10041DA000 0x10042D9FFF] top usage: 2232 of 1048576

--- nlvgdbsigqueue.stdoutB.exp 2021-10-26 12:44:44.997207954 -0400
+++ nlvgdbsigqueue.stdoutB.out 2021-10-26 12:50:20.447232568 -0400
@@ -7,10 +7,3 @@
 sending signal
 continuing to receive first SIGUSR1
 Continuing.
-Program received signal SIGUSR1, User defined signal 1.
-0x........ in syscall ...
-continuing to receive second SIGUSR1
-Continuing.
-Program received signal SIGUSR1, User defined signal 1.
-0x........ in syscall ...
-Kill the program being debugged? (y or n) [answered Y; input not from terminal]
I've investigated the failure with nlvgdbsigqueue, and here's what I found out so far. The failure happens when trying to continue execution from an unmapped address:

+host stacktrace:
+   at 0x3FFBAF7E480: ???
+   by 0x800027FE7: vgPlain_poll (m_libcfile.c:765)
+   by 0xFFFFFFFFFFFFFFFF: ???

It turns out that the failing address, in this case 0x3FFBAF7E480, lies within the address range that previously held the vDSO. The kernel seems to jump there when restarting a "poll" syscall after a signal occurred. But Valgrind doesn't keep the vDSO mapping, so the syscall restart doesn't work and causes a SIGSEGV instead.

So, why and since when does the kernel jump to the vDSO? This has obviously been introduced by this s390-specific patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df29a7440c4b5c65765c8f60396b3b13063e24e9

Considering this change in the kernel's behavior, it now becomes necessary for all user space processes to keep the vDSO mapping intact, and Valgrind currently violates that.
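For anyone who wants to repeat this kind of check, here is a minimal Python sketch that compares an address against a process's memory map. It is not part of Valgrind or of any patch in this report. The helper names `find_mapping` and `address_is_mapped` are made up for illustration, and the code assumes the usual Linux /proc/<pid>/maps text format (address range first, optional pathname as the sixth field).

```python
import re
from typing import Optional, Tuple

# Matches the hexadecimal "start-end" range that begins each maps line.
_MAPS_RANGE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s")


def find_mapping(maps_text: str, name: str = "[vdso]") -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first mapping whose pathname equals `name`.

    Returns None if no such mapping is listed, e.g. because it was unmapped.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        # Expected fields: range, perms, offset, dev, inode, [pathname]
        if len(fields) >= 6 and fields[5] == name:
            m = _MAPS_RANGE.match(line)
            if m:
                return int(m.group(1), 16), int(m.group(2), 16)
    return None


def address_is_mapped(maps_text: str, addr: int) -> bool:
    """Return True if `addr` lies inside any mapping (end address exclusive)."""
    for line in maps_text.splitlines():
        m = _MAPS_RANGE.match(line)
        if m and int(m.group(1), 16) <= addr < int(m.group(2), 16):
            return True
    return False
```

Both functions take the maps contents as a string, so they can be fed a saved copy of /proc/<pid>/maps from the affected process. If the diagnosis above holds, one would expect `find_mapping(text)` to return None for a process whose vDSO mapping was dropped, and `address_is_mapped(text, addr)` to be False for the address the kernel tried to resume at.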
Created attachment 144394
Keep vDSO mapping on s390x

In my testing this patch fixes the issue. Some other architectures already have similar logic to keep the vDSO mapping intact. Note that this shouldn't affect the AUXV entry for the vDSO, which should still be removed from the AUXV.
(In reply to Andreas Arnez from comment #2)
> [...] Note that this shouldn't
> affect the AUXV entry for the vDSO, which should still be removed from the
> AUXV.

Oops, that's not true. This version of the patch leaves the AUXV entry intact as well. Sorry for the confusion.
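To see whether a vDSO entry is present in a given auxiliary vector, a raw auxv dump can be decoded along these lines. This is only a sketch under assumptions: `parse_auxv` is a hypothetical helper, the defaults assume a 64-bit big-endian target such as s390x, and how the raw bytes are obtained for a client running under Valgrind is left open here. The constant AT_SYSINFO_EHDR (33) is the standard auxv type for the vDSO ELF header address.

```python
import struct
from typing import Dict

AT_NULL = 0            # terminator entry of the auxiliary vector
AT_SYSINFO_EHDR = 33   # address of the vDSO ELF header


def parse_auxv(data: bytes, byteorder: str = "big", word_size: int = 8) -> Dict[int, int]:
    """Decode raw auxv bytes into a {type: value} dict.

    Each entry is a (type, value) pair of machine words; decoding stops at
    the AT_NULL terminator. Trailing bytes that do not form a full pair
    are ignored.
    """
    fmt = (">" if byteorder == "big" else "<") + ("QQ" if word_size == 8 else "II")
    pair_size = struct.calcsize(fmt)
    usable = len(data) - len(data) % pair_size
    entries: Dict[int, int] = {}
    for a_type, a_val in struct.iter_unpack(fmt, data[:usable]):
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries
```

With the resulting dict, `AT_SYSINFO_EHDR in parse_auxv(raw)` tells whether the entry is present, and the stored value can be compared against the [vdso] range from the memory map.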
Applied as commit 99bf5dabf7865aaea7f2192373633e026c6fb16e.
Thanks. Tests look good.