The regression testing for ppc64 reports a significant number of failed tests with commit: commit 3bac39a10abf292d332bb20ab58c6dd5c28f9108 Author: Eugene Syromyatnikov <evgsyr@gmail.com> Date: Fri Mar 8 04:07:00 2019 +0100 include/vki: fix vki_siginfo_t definition on amd64, arm64, and ppc64 As it turned out, the size of vki_siginfo_t is incorrect on these 64-bit architectures: (gdb) p sizeof(vki_siginfo_t) $1 = 136 (gdb) ptype struct vki_siginfo type = struct vki_siginfo { int si_signo; int si_errno; int si_code; union { int _pad[29]; struct {...} _kill; struct {...} _timer; struct {...} _rt; struct {...} _sigchld; struct {...} _sigfault; struct {...} _sigpoll; } _sifields; } It looks like that for this architecture, __VKI_ARCH_SI_PREAMBLE_SIZE hasn't been defined properly, which resulted in incorrect VKI_SI_PAD_SIZE calculation (29 instead of 28). <6a9e4> DW_AT_name : (indirect string, offset: 0xcf59): _sifields <6a9ef> DW_AT_data_member_location: 16 This issue has been discovered with strace's "make check-valgrind-memcheck", which produced false out-of-bounds writes on ptrace(PTRACE_GETSIGINFO) calls: SYSCALL[24264,1](101) sys_ptrace ( 16898, 24283, 0x0, 0x606bd40 ) ==24264== Syscall param ptrace(getsiginfo) points to unaddressable byte(s) ==24264== at 0x575C06E: ptrace (ptrace.c:45) ==24264== by 0x443244: next_event (strace.c:2431) ==24264== by 0x443D30: main (strace.c:2845) ==24264== Address 0x606bdc0 is 0 bytes after a block of size 144 alloc'd (Note that the address passed is 0x606bd40 and the address reported is 0x606bdc0). After the patch, no such errors observed. * include/vki/vki-amd64-linux.h [__x86_64__ && __ILP32__] (__vki_kernel_si_clock_t): New typedef. [__x86_64__ && __ILP32__] (__VKI_ARCH_SI_CLOCK_T, __VKI_ARCH_SI_ATTRIBUTES): New macros. [__x86_64__ && !__ILP32__] (__VKI_ARCH_SI_PREAMBLE_SIZE): New macro, define to 4 ints. * include/vki/vki-arm64-linux.h (__VKI_ARCH_SI_PREAMBLE_SIZE): Likewise. * include/vki/vki-ppc64-linux.h [__powerpc64__] (__VKI_ARCH_SI_PREAMBLE_SIZE): Likewise. * include/vki/vki-linux.h [!__VKI_ARCH_SI_CLOCK_T] (__VKI_ARCH_SI_CLOCK_T): New macro, define to vki_clock_t. [!__VKI_ARCH_SI_ATTRIBUTES] (__VKI_ARCH_SI_ATTRIBUTES): New macro, define to nil. (struct vki_siginfo): Use __VKI_ARCH_SI_CLOCK_T type for _utime and _stime fields. Add __VKI_ARCH_SI_ATTRIBUTES. Resolves: https://bugs.kde.org/show_bug.cgi?id=405201 Reported-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com> Prior to the patch the number of failed tests was: == 647 tests, 3 stderr failures, 0 stdout failures, 0 stderrB failures, 1 stdou\ tB failure, 2 post failures == gdbserver_tests/nlcontrolc (stdoutB) memcheck/tests/bug340392 (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/rfcomm (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) Currently we have == 649 tests, 38 stderr failures, 13 stdout failures, 1 stderrB failure, 5 stdo\ utB failures, 2 post failures == gdbserver_tests/mcinfcallRU (stderr) gdbserver_tests/mcsignopass (stderr) gdbserver_tests/mcsignopass (stdoutB) gdbserver_tests/mcsigpass (stderr) gdbserver_tests/mcsigpass (stdoutB) gdbserver_tests/nlcontrolc (stdoutB) gdbserver_tests/nlpasssigalrm (stderr) gdbserver_tests/nlpasssigalrm (stdoutB) gdbserver_tests/nlpasssigalrm (stderrB) gdbserver_tests/nlvgdbsigqueue (stderr) gdbserver_tests/nlvgdbsigqueue (stdoutB) memcheck/tests/badjump2 (stderr) memcheck/tests/bug340392 (stderr) memcheck/tests/descr_belowsp (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/post-syscall (stderr) memcheck/tests/sigaltstack (stderr) memcheck/tests/signal2 (stdout) memcheck/tests/signal2 (stderr) memcheck/tests/vcpu_fnfns (stdout) memcheck/tests/vcpu_fnfns (stderr) helgrind/tests/tc18_semabuse (stderr) helgrind/tests/tc20_verifywrap (stderr) drd/tests/pth_cancel_locked (stderr) drd/tests/sigalrm (stderr) drd/tests/sigaltstack (stderr) drd/tests/tc18_semabuse (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) none/tests/async-sigs (stderr) none/tests/bug234814 (stdout) none/tests/bug234814 (stderr) none/tests/coolo_sigaction (stdout) none/tests/coolo_sigaction (stderr) none/tests/faultstatus (stderr) none/tests/linux/pthread-stack (stderr) none/tests/pending (stdout) none/tests/pending (stderr) none/tests/ppc64/test_isa_2_06_part3 (stdout) none/tests/ppc64/test_isa_2_06_part3 (stderr) none/tests/ppc64/test_isa_2_07_part2 (stdout) none/tests/ppc64/test_isa_2_07_part2 (stderr) none/tests/ppc64/tw_td (stdout) none/tests/ppc64/tw_td (stderr) none/tests/ppc64/twi_tdi (stdout) none/tests/ppc64/twi_tdi (stderr) none/tests/pth_cancel1 (stdout) none/tests/pth_cancel1 (stderr) none/tests/pth_cancel2 (stderr) none/tests/scripts/shell (stdout) none/tests/scripts/shell (stderr) none/tests/scripts/shell_valid4 (stderr) none/tests/sigstackgrowth (stdout) none/tests/sigstackgrowth (stderr) none/tests/syscall-restart1 (stderr) none/tests/syscall-restart2 (stderr) none/tests/thread-exits (stdout) none/tests/thread-exits (stderr)
Would like to get this issue fixed before the next Valgrind release. I will look at the commit further to help determine the root cause of the breakage.
This is also https://bugs.kde.org/show_bug.cgi?id=416760 The original bug fixed was https://bugs.kde.org/show_bug.cgi?id=405201
The following patch seems to clean up most of the issues: Fix for regression error --- coregrind/m_sigframe/sigframe-ppc64-linux.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/coregrind/m_sigframe/sigframe-ppc64-linux.c b/coregrind/m_sigframe/ sigframe-ppc64-linux.c index 0406f3c..b54c4e0 100644 --- a/coregrind/m_sigframe/sigframe-ppc64-linux.c +++ b/coregrind/m_sigframe/sigframe-ppc64-linux.c @@ -112,7 +112,7 @@ struct rt_sigframe { vki_siginfo_t info; struct vg_sig_private priv; UChar abigap[288]; // unused -}; +} __attribute__ ((aligned (16))); #define SET_SIGNAL_LR(zztst, zzval) \ do { tst->arch.vex.guest_LR = (zzval); \ -- 2.7.4 With this patch the number of regression errors is reduced to: == 649 tests, 6 stderr failures, 3 stdout failures, 0 stderrB failures, 2 stdou\ tB failures, 2 post failures == gdbserver_tests/nlgone_abrt (stdoutB) gdbserver_tests/nlpasssigalrm (stdoutB) memcheck/tests/bug340392 (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/vcpu_fnfns (stdout) memcheck/tests/vcpu_fnfns (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) none/tests/ppc64/test_isa_2_06_part3 (stdout) none/tests/ppc64/test_isa_2_06_part3 (stderr) none/tests/ppc64/test_isa_2_07_part2 (stdout) none/tests/ppc64/test_isa_2_07_part2 (stderr) Continuing to look at what additional changes are needed. But it looks like a start.
Rolled back to Eugene's commit 3bac39a10abf292d332bb20ab58c6dd5c28f9108 and applied my patch. That does fix all the regression error at that point. == 647 tests, 3 stderr failures, 0 stdout failures, 0 stderrB failures, 2 stdou\ tB failures, 2 post failures == gdbserver_tests/nlgone_abrt (stdoutB) gdbserver_tests/nlpasssigalrm (stdoutB) memcheck/tests/bug340392 (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/rfcomm (stderr) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) So, it looks like there is another commit that breaks a few things later. Looking....
Looks like the few additional errors that the fix given for this bug is for an unrelated commit after Eugene's changes. If we can get some testing of the fix on other platforms that would be helpful to decide if this is a good fix or not. I am a little concerned how universally the __attribute__ ((aligned (16))) works. Thanks.
Created attachment 126280 [details] PPC64 alginment fix for struct rt_sigframe PPC64 alginment fix for struct rt_sigframe The attached patch fixes the alignment for the rt_sigframe structure on PPC64. This fixes numerous regresion errors. On power 7, 8 and 9, it fixes 31 stderr failures, 10 stdout failures.
The patch was committed to Valgrind mainline commit 6f8920fd8f9381cf75f5dd0c3c65ac5d86c0a537 Author: Carl Love <carll@us.ibm.com> Date: Fri Feb 21 17:22:26 2020 -0600 PPC64, fix for alignment of the rt_sigframe data structure. The PPC64 implementation checks that the data structure is aligned. The changes in commit listed below breaks the alignment. This patch adds an explicit alignment directive to ensure the data structure is allocated with the required alignment. This fixes 31 stderr failures, 10 stdout failures on the Power 7, Power 8 and Power 9 platforms. commit 3bac39a10abf292d332bb20ab58c6dd5c28f9108 Author: Eugene Syromyatnikov <evgsyr@gmail.com> Date: Fri Mar 8 04:07:00 2019 +0100
Nightly regression tests errors were reduced as expected. No unexpected issues/errors were seen. Closing.