valgrind cannot run glibc binaries with SCV support on current kernels: ==3891888== Command: /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/ld-2.32.9000.so --library-path /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64 /usr/bin/true ==3891888== dis_syslink(ppc)(theInstr) disInstr(ppc): unhandled instruction: 0x44000001 primary 17(0x11), secondary 1(0x1) ==3891888== valgrind: Unrecognised instruction at address 0x4999f80. ==3891888== at 0x4999F80: _Exit (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so) ==3891888== by 0x48ED76B: __run_exit_handlers (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so) ==3891888== by 0x48C9FBB: generic_start_main.constprop.0.isra.0 (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so) ==3891888== by 0x48CA19F: (below main) (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so) AT_HWCAP2 contains the SCV bit even though valgrind does not support the SCV instruction. Seen with valgrind-3.16.1-11.fc33.ppc64le.
Created attachment 134552 [details] patch to add support for the scv instruction Please test the attached patch to see if it fixes your issue. Thanks.
I think this does not work because the calling convention for “sc” and “scv 0” are different. The glibc test case misc/test-errno-linux is a good way to exercise syscall errors. Here's how you can run it under an valgrind binary which is found via $PATH. Use bash -x to see the valgrind invocation if you have to tweak the invocation. The --direct option tells the test harness to skip the outer fork. git clone --depth 1 git://sourceware.org/git/glibc.git git mkdir build cd build # Build glibc. The --prefix=/usr is important. ../git/configure --prefix=/usr make -j`nproc` -O # Build the test and run it without valgrind. make t=misc/test-errno-linux test # Run the built test under valgrind. bash testrun.sh --tool=valgrind misc/test-errno-linux --direct
So this is basically two bugs. 1) Newer kernels set the auxv AT_HWCAP2 bit which indicates SCV syscall support and could potentially set other capability bits for cpu instructions/features valgrind doesn't support, so we might want to proactively filter/clean AT_HWCAPS2 as is don't for some other architectures in coregrind/m_initimg/initimg-linux.c (setup_client_stack). 2) We want to support SCV syscall support. Which is the attached patch, but since the calling convention/return/error value reporting is different between sc and scv 0, the attached patch is not a complete implementation. When we have a complete implementation, then the filtering for 1) would change.
Created attachment 134659 [details] 0001-ppc64-Mask-unrecognized-AT_HWCAP2-values.patch This is what I came up with. It filters out DARN as well, which I think is actually required because there is no valgrind support for it yet.
That looks plausible: + /* Mask unrecognized HWCAP bits. Only keep the bits that have + * been mentioned above. + */ + auxv->u.a_val &= 0xfec40000ULL; So if I understand correctly that means (of those not mentioned above) explicitly filtered out: PPC_FEATURE2_HTM_NOSC 0x01000000 PPC_FEATURE2_DARN 0x00200000 PPC_FEATURE2_SCV 0x00100000 explicitly included: PPC_FEATURE2_ARCH_3_1 0x00040000 It might make sense to have defines of these in include/vki/vki-ppc64-linux.h as VKI_PPC_FEATURE2_* so that we can more easily see which ones we include/exclude.
Hmm. If we filter HTM_NOSC, we should also filter HTM.
For now I have backported Florian's patch to the Fedora rawhide valgrind package to get things unstuck: https://koji.fedoraproject.org/koji/buildinfo?buildID=1666682 I had to tweak the patch a little to remove the references to isa 3.1 (power10) plus the PREFIX checks. I did remove/filter the PPC_FEATURE2_ARCH_3_1 flag, but didn't touch any other (I believe valgrind does support HTM on power, but I don't know what HTM_NOSC is). Also, see https://bugs.kde.org/show_bug.cgi?id=422683 do we also need to filter AT_PLATFORM/AT_BASE_PLATFORM ?
For reference, this explains the (newer) powerpc syscall conventions of the linux kernel: https://www.kernel.org/doc/html/v5.10/powerpc/syscall64-abi.html
I think including the HTM bit is wrong, it should always be cleared along with HTM_NOSC. I doubt valgrind supports HTM, so this ought to be the right thing to do. It's definitely better than what we had before, though. I will try another rawhide glibc build. Yes, rewriting AT_PLATFORM would be desirable as well.
What's the status here? The first glibc release with scv support is coming up. Thanks.
(In reply to Florian Weimer from comment #10) > What's the status here? The first glibc release with scv support is coming > up. Thanks. The status is still as in comment #7. There is a workaround (filtering out the HWCAP2 bits) in the valgrind Fedora package, but nothing yet is upstream. Carl, what do you think? Should we go with the HWCAP2 filtering, can we make the SCV syscall working and/or both at the same time.
The Power support for Valgrind treats the Hardware Transactional Memory (HTM) instructions effectively as NOPs. The transaction begin instruction (tbegin) is always failed in Valgrind causing the code to execute the failure path. Given that, I don't see any issues with filtering out the HAS_HTM bit. From what I can find, the HTM_NOSC is for use by the Kernel to indicate it will abort the transaction before entering the kernel. Valgrind only works with User code so again there should not be any issues with filtering it out. I see no issues at this point with enabling the scv instruction and filtering out the HTM hardware caps bits.
From discussions with IBM and RedHat, the plan for the short term is mask the HWCAPS2 to force the use of the sc instruction instead of the newer scv0 instruction. I will work on a proper fix to support the scv0 instruction. The proper fix will consist of the existing scv0 patch I created plus the needed changes to the PPC Valgrind system call interface to handle the sc and scv0 instructions and 64-bit Linux system call ABI, as given in https://www.kernel.org/doc/html/v5.10/powerpc/syscall64-abi.html
Created attachment 135358 [details] patch to add support for the scv instruction I have updated the scv support patch. The new patch adds the needed support to parse the scv instruction in guest_ppc_toIR.c. It adds the needed support to call the sc or the scv instruction in the power assembly system call wrapper function, fixes up the setting of the at_restart and in_complete_to_committed variables and finally adds a test case for the sc and scv instructions. The patch has been tested on Power 8 BE, Power 8 LE, Power 9 (ISA 3.0) with kernel support for the scv instruction, Power 9 (ISA 3.0) without kernel support for the scv instruction and on a prototype ISA 3.1 hardware. The patch works without adding new regression errors. Julian can you please review the patch. Thanks.
Created attachment 135371 [details] patch to add support for the scv instruction Cleaned up formatting, spacing and a minor functional fix.
(In reply to Carl Love from comment #15) > Created attachment 135371 [details] > patch to add support for the scv instruction > > Cleaned up formatting, spacing and a minor functional fix. Unfortunately this patch doesn't seem to work correctly on a ppc64le setup with linux 5.11.0 and glibc 2.32.9000, which uses SCV: LD_SHOW_AUXV=1 /bin/true AT_DCACHEBSIZE: 0x80 AT_ICACHEBSIZE: 0x80 AT_UCACHEBSIZE: 0x0 AT_SYSINFO_EHDR: 0x7fff905c0000 AT_L1I_CACHESIZE: 32768 AT_L1I_CACHEGEOMETRY: 128B line size, 32-way set associative AT_L1D_CACHESIZE: 32768 AT_L1D_CACHEGEOMETRY: 128B line size, 32-way set associative AT_L2_CACHESIZE: 524288 AT_L2_CACHEGEOMETRY: 32B line size, 2048-way set associative AT_L3_CACHESIZE: 10485760 AT_L3_CACHEGEOMETRY: 32B line size, 40960-way set associative AT_HWCAP: true_le archpmu vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32 AT_PAGESZ: 65536 AT_CLKTCK: 100 AT_PHDR: 0x10f1b0040 AT_PHENT: 56 AT_PHNUM: 9 AT_BASE: 0x7fff905d0000 AT_FLAGS: 0x0 AT_ENTRY: 0x10f1b1ad8 AT_UID: 0 AT_EUID: 0 AT_GID: 0 AT_EGID: 0 AT_SECURE: 0 AT_RANDOM: 0x7fffcd3e3272 AT_HWCAP2: scv darn ieee128 arch_3_00 vcrypto tar isel ebb dscr arch_2_07 AT_EXECFN: /bin/true AT_PLATFORM: power9 AT_BASE_PLATFORM: power9 A simple testcase: # LANG=en.UTF-8 ./vg-in-place /bin/true --help ==108495== Memcheck, a memory error detector ==108495== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==108495== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info ==108495== Command: /bin/true --help ==108495== ==108495== Warning: invalid file descriptor -2 in syscall openat() ==108495== Invalid read of size 4 ==108495== at 0x492B7C0: _nl_load_locale_from_archive (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x4929ECF: _nl_find_locale (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x492945F: setlocale (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x181963: ??? (in /usr/bin/true) ==108495== by 0x49195B3: generic_start_main.constprop.0.isra.0 (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x491978F: (below main) (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== Address 0x26 is not stack'd, malloc'd or (recently) free'd ==108495== ==108495== ==108495== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==108495== Access not within mapped region at address 0x26 ==108495== at 0x492B7C0: _nl_load_locale_from_archive (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x4929ECF: _nl_find_locale (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x492945F: setlocale (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x181963: ??? (in /usr/bin/true) ==108495== by 0x49195B3: generic_start_main.constprop.0.isra.0 (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== by 0x491978F: (below main) (in /usr/lib64/power9/libc-2.32.9000.so) ==108495== If you believe this happened as a result of a stack ==108495== overflow in your program's main thread (unlikely but ==108495== possible), you can try to increase the size of the ==108495== main thread stack using the --main-stacksize= flag. ==108495== The main thread stack size used in this run was 8388608. ==108495== ==108495== HEAP SUMMARY: ==108495== in use at exit: 0 bytes in 0 blocks ==108495== total heap usage: 1 allocs, 1 frees, 5 bytes allocated ==108495== ==108495== All heap blocks were freed -- no leaks are possible ==108495== ==108495== For lists of detected and suppressed errors, rerun with: -s ==108495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ./vg-in-place: line 31: 108495 Segmentation fault (core dumped) VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@" make regtest also shows various failures: == 638 tests, 44 stderr failures, 14 stdout failures, 0 stderrB failures, 0 stdoutB failures, 3 post failures == memcheck/tests/buflen_check (stderr) memcheck/tests/bug340392 (stderr) memcheck/tests/erringfds (stdout) memcheck/tests/erringfds (stderr) memcheck/tests/file_locking (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/memfd (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/linux/sys-openat (stderr) memcheck/tests/linux/sys-preadv_pwritev (stderr) memcheck/tests/linux/timerfd-syscall (stderr) memcheck/tests/null_socket (stdout) memcheck/tests/post-syscall (stderr) memcheck/tests/sigkill (stderr) memcheck/tests/supp_unknown (stderr) memcheck/tests/writev1 (stderr) drd/tests/sem_open (stderr) drd/tests/sem_open2 (stderr) drd/tests/sem_open3 (stderr) drd/tests/sem_open_traced (stderr) drd/tests/std_list (stderr) drd/tests/swapcontext (stderr) massif/tests/deep-D (post) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) none/tests/coolo_sigaction (stdout) none/tests/coolo_sigaction (stderr) none/tests/empty-exe (stderr) none/tests/linux/mremap (stderr) none/tests/linux/mremap2 (stdout) none/tests/linux/mremap2 (stderr) none/tests/map_unaligned (stderr) none/tests/nocwd (stdout) none/tests/nocwd (stderr) none/tests/pth_cancel2 (stderr) none/tests/rlimit64_nofile (stderr) none/tests/rlimit_nofile (stderr) none/tests/scripts/bug231357 (stdout) none/tests/scripts/bug231357 (stderr) none/tests/scripts/nointerp1 (stdout) none/tests/scripts/nointerp1 (stderr) none/tests/scripts/nointerp2 (stdout) none/tests/scripts/nointerp2 (stderr) none/tests/scripts/nointerp3 (stdout) none/tests/scripts/nointerp3 (stderr) none/tests/scripts/nointerp4 (stderr) none/tests/scripts/nointerp5 (stderr) none/tests/scripts/relative1 (stdout) none/tests/scripts/relative1 (stderr) none/tests/scripts/relative2 (stdout) none/tests/scripts/relative2 (stderr) none/tests/scripts/shell (stdout) none/tests/scripts/shell (stderr) none/tests/scripts/shell_valid1 (stderr) none/tests/scripts/shell_valid4 (stdout) none/tests/scripts/shell_valid4 (stderr) none/tests/scripts/shell_zerolength (stderr) none/tests/syscall-restart1 (stderr) none/tests/syscall-restart2 (stderr) none/tests/threadederrno (stdout)
For now I have pushed this commit: commit ea98cccb4d50a8740708507c4c72cfb1e6c88e38 Author: Mark Wielaard <mark@klomp.org> Date: Tue Feb 23 16:19:26 2021 +0100 Filter out unsupported instructions from HWCAP2 on powerpc. Valgrind currently doesn't support the DARN random number instruction and the SCV syscall instruction. Filter them out of HWCAP2 so glibc and applications don't try to use them when running under valgrind. Also suppress printing a log message for scv instructions in the instruction stream. Reported by: Florian Weimer <fweimer@redhat.com> DARN bug: https://bugs.kde.org/show_bug.cgi?id=411189 SCV bug: https://bugs.kde.org/show_bug.cgi?id=431157 When SCV is implemented it can be added to HWCAPS2 again.
Created attachment 136175 [details] patch to add support for the scv instruction Rebased patch on Vaglrind mainline, enabled SCV which was masked out in Mark's committed patch. Tested on Power 8 LE, Power BE, Power 9 and prototype hardware using latest glibc that I can get. The patch seems to be working and regression tests pass. Note, with the latest glibc there are a number of new regression test failures specifically in helgrind without applying the scv patch. The SCV patch doesn't seem to introduce any additional issues. Will look into the new baseline regression test failures.
(In reply to Carl Love from comment #18) > Created attachment 136175 [details] > patch to add support for the scv instruction > > Rebased patch on Vaglrind mainline, enabled SCV which was masked out in > Mark's committed patch. Tested on Power 8 LE, Power BE, Power 9 and > prototype hardware using latest glibc that I can get. The patch seems to be > working and regression tests pass. Tested on a linux 5.11.0 and glibc 2.33 p9 machine (basically what Fedora 34 has). With current valgrind git, commit 9e9d1a171 (so SCV masked out): $ make nonexp-regtest == 633 tests, 6 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 3 post failures == memcheck/tests/bug340392 (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/supp_unknown (stderr) drd/tests/swapcontext (stderr) massif/tests/deep-D (post) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) With the attached patch (so with SCV enabled) on the same setup: == 634 tests, 44 stderr failures, 14 stdout failures, 0 stderrB failures, 0 stdoutB failures, 3 post failures == memcheck/tests/buflen_check (stderr) memcheck/tests/bug340392 (stderr) memcheck/tests/erringfds (stdout) memcheck/tests/erringfds (stderr) memcheck/tests/file_locking (stderr) memcheck/tests/leak_cpp_interior (stderr) memcheck/tests/linux/memfd (stderr) memcheck/tests/linux/rfcomm (stderr) memcheck/tests/linux/sys-execveat (stderr) memcheck/tests/linux/sys-openat (stderr) memcheck/tests/linux/sys-preadv_pwritev (stderr) memcheck/tests/linux/timerfd-syscall (stderr) memcheck/tests/null_socket (stdout) memcheck/tests/post-syscall (stderr) memcheck/tests/sigkill (stderr) memcheck/tests/supp_unknown (stderr) memcheck/tests/writev1 (stderr) drd/tests/sem_open (stderr) drd/tests/sem_open2 (stderr) drd/tests/sem_open3 (stderr) drd/tests/sem_open_traced (stderr) drd/tests/std_list (stderr) drd/tests/swapcontext (stderr) massif/tests/deep-D (post) massif/tests/new-cpp (post) massif/tests/overloaded-new (post) none/tests/coolo_sigaction (stdout) none/tests/coolo_sigaction (stderr) none/tests/empty-exe (stderr) none/tests/linux/mremap (stderr) none/tests/linux/mremap2 (stdout) none/tests/linux/mremap2 (stderr) none/tests/map_unaligned (stderr) none/tests/nocwd (stdout) none/tests/nocwd (stderr) none/tests/pth_cancel2 (stderr) none/tests/rlimit64_nofile (stderr) none/tests/rlimit_nofile (stderr) none/tests/scripts/bug231357 (stdout) none/tests/scripts/bug231357 (stderr) none/tests/scripts/nointerp1 (stdout) none/tests/scripts/nointerp1 (stderr) none/tests/scripts/nointerp2 (stdout) none/tests/scripts/nointerp2 (stderr) none/tests/scripts/nointerp3 (stdout) none/tests/scripts/nointerp3 (stderr) none/tests/scripts/nointerp4 (stderr) none/tests/scripts/nointerp5 (stderr) none/tests/scripts/relative1 (stdout) none/tests/scripts/relative1 (stderr) none/tests/scripts/relative2 (stdout) none/tests/scripts/relative2 (stderr) none/tests/scripts/shell (stdout) none/tests/scripts/shell (stderr) none/tests/scripts/shell_valid1 (stderr) none/tests/scripts/shell_valid4 (stdout) none/tests/scripts/shell_valid4 (stderr) none/tests/scripts/shell_zerolength (stderr) none/tests/syscall-restart1 (stderr) none/tests/syscall-restart2 (stderr) none/tests/threadederrno (stdout) So there still seem to be some issue when SCV is enabled under valgrind. Here is a syscall trace of /bin/true --help the simplest program I found that crashes under valgrind with SCV support enabled: # ./vg-in-place -q --trace-syscalls=yes --tool=none /bin/true --help SYSCALL[279836,1](45) sys_brk ( 0x0 ) --> [pre-success] Success(0x4070000) SYSCALL[279836,1](122) sys_newuname ( 0x1fff00e0b0 )[sync] --> Success(0x0) SYSCALL[279836,1](286) sys_openat ( -100, 0x4061a40(/root/valgrind/./.in_place/vgpreload_core-ppc64le-linux.so), 524288 ) --> [async] ... SYSCALL[279836,1](286) ... [async] --> Success(0x3) SYSCALL[279836,1](3) sys_read ( 3, 0x1fff00c938, 832 ) --> [async] ... SYSCALL[279836,1](3) ... [async] --> Success(0x340) SYSCALL[279836,1](291) sys_newfstatat ( 3, 0x403fc08(), 0x1fff00c750 )[sync] --> Success(0x0) SYSCALL[279836,1](90) sys_mmap ( 0x0, 131128, 5, 2050, 3, 0 ) --> [pre-success] Success(0x4870000) SYSCALL[279836,1](90) sys_mmap ( 0x4880000, 65536, 3, 2066, 3, 0 ) --> [pre-success] Success(0x4880000) SYSCALL[279836,1](90) sys_mmap ( 0x4890000, 56, 3, 50, 18446744073709551615, 0 ) --> [pre-success] Success(0x4890000) SYSCALL[279836,1](6) sys_close ( 3 )[sync] --> Success(0x0) SYSCALL[279836,1](33) sys_access ( 0x403c9d0(/etc/ld.so.preload), 4 )[sync] --> Failure(0x2) SYSCALL[279836,1](286) sys_openat ( -100, 0x403ea78(/etc/ld.so.cache), 524288 ) --> [async] ... SYSCALL[279836,1](286) ... [async] --> Success(0x3) SYSCALL[279836,1](291) sys_newfstatat ( 3, 0x403fc08(), 0x1fff00d230 )[sync] --> Success(0x0) SYSCALL[279836,1](90) sys_mmap ( 0x0, 25160, 1, 2, 3, 0 ) --> [pre-success] Success(0x48a0000) SYSCALL[279836,1](6) sys_close ( 3 )[sync] --> Success(0x0) SYSCALL[279836,1](286) sys_openat ( -100, 0x4061fb0(/lib64/power9/libc.so.6), 524288 ) --> [async] ... SYSCALL[279836,1](286) ... [async] --> Success(0x3) SYSCALL[279836,1](3) sys_read ( 3, 0x1fff00d4e8, 832 ) --> [async] ... SYSCALL[279836,1](3) ... [async] --> Success(0x340) SYSCALL[279836,1](291) sys_newfstatat ( 3, 0x403fc08(), 0x1fff00d300 )[sync] --> Success(0x0) SYSCALL[279836,1](90) sys_mmap ( 0x0, 2272232, 5, 2050, 3, 0 ) --> [pre-success] Success(0x48b0000) SYSCALL[279836,1](125) sys_mprotect ( 0x4ab0000, 65536, 0 )[sync] --> Success(0x0) SYSCALL[279836,1](90) sys_mmap ( 0x4ac0000, 131072, 3, 2066, 3, 2097152 ) --> [pre-success] Success(0x4ac0000) SYSCALL[279836,1](6) sys_close ( 3 )[sync] --> Success(0x0) SYSCALL[279836,1](125) sys_mprotect ( 0x4ac0000, 65536, 1 )[sync] --> Success(0x0) SYSCALL[279836,1](125) sys_mprotect ( 0x4880000, 65536, 1 )[sync] --> Success(0x0) SYSCALL[279836,1](125) sys_mprotect ( 0x190000, 65536, 1 )[sync] --> Success(0x0) SYSCALL[279836,1](125) sys_mprotect ( 0x4050000, 65536, 1 )[sync] --> Success(0x0) SYSCALL[279836,1](91) sys_munmap ( 0x48a0000, 25160 )[sync] --> Success(0x0) SYSCALL[279836,1](45) sys_brk ( 0x0 ) --> [pre-success] Success(0x4070000) SYSCALL[279836,1](45) sys_brk ( 0x40a0000 ) --> [pre-success] Success(0x40a0000) SYSCALL[279836,1](286) sys_openat ( -100, 0x4a6ee28(/usr/lib/locale/locale-archive), 524288 ) --> [async] ... SYSCALL[279836,1](286) ... [async] --> Success(0xfffffffffffffffe) SYSCALL[279836,1](291) sys_newfstatat ( 24, 0x4a73980(), 0x4ad2578 )[sync] --> Failure(0x9) SYSCALL[279836,1](90) sys_mmap ( 0x0, 0, 1, 2, 24, 0 ) --> [pre-fail] Failure(0x16) ==279836== ==279836== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==279836== Access not within mapped region at address 0x26 ==279836== at 0x48F1D00: _nl_load_locale_from_archive (in /usr/lib64/power9/libc-2.33.so) ==279836== by 0x48F03DF: _nl_find_locale (in /usr/lib64/power9/libc-2.33.so) ==279836== by 0x48EF95F: setlocale (in /usr/lib64/power9/libc-2.33.so) ==279836== by 0x181963: ??? (in /usr/bin/true) ==279836== by 0x48DF953: generic_start_main.constprop.0.isra.0 (in /usr/lib64/power9/libc-2.33.so) ==279836== by 0x48DFB2F: (below main) (in /usr/lib64/power9/libc-2.33.so) ==279836== If you believe this happened as a result of a stack ==279836== overflow in your program's main thread (unlikely but ==279836== possible), you can try to increase the size of the ==279836== main thread stack using the --main-stacksize= flag. ==279836== The main thread stack size used in this run was 8388608. ./vg-in-place: line 31: 279836 Segmentation fault (core dumped) VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"
Looking at the end of the log there: SYSCALL[279836,1](291) sys_newfstatat ( 24, 0x4a73980(), 0x4ad2578 )[sync] --> Failure(0x9) SYSCALL[279836,1](90) sys_mmap ( 0x0, 0, 1, 2, 24, 0 ) --> [pre-fail] Failure(0x16) ==279836== ==279836== Process terminating with default action of signal 11 (SIGSEGV): dumping core The mmap call is marked "[pre-fail]", which means that V never passed it to the kernel. Instead it decided to "fail" it itself, for the reason 0x16, whatever that is. Might be worth investigating. Also I wonder if the failure of the immediately preceding sys_newfstatat syscall is related.
Well 0x16 is EINVAL which doesn't tell us much. I notice that the mmap is on FD 24 though and the previous line is a newfstatat on the same descriptor that failed with EBADF so it looks like it was trying to map a file that wasn't open?
This just before that is pretty odd: SYSCALL[279836,1](286) sys_openat ( -100, 0x4a6ee28(/usr/lib/locale/locale-archive), 524288 ) --> [async] ... SYSCALL[279836,1](286) ... [async] --> Success(0xfffffffffffffffe) So if thinks the open was a success with a negative file descriptor returned?
(In reply to Tom Hughes from comment #22) > So if thinks the open was a success with a negative file descriptor returned? Ah, maybe it's trying to tell us something :-) I vaguely remember reading that this new syscall convention denotes failure by returning a value between -1 and -4096 [bizarrely, like x86-32 in the earliest days of Linux]. If that's true, it would imply that the call was passed to the kernel using the new convention, it failed, but we mistakenly believe, in the post-syscall handling, that it succeeded.
Indeed if that is -2 that would be errno 2 aka ENOENT which would make sense.
(In reply to Tom Hughes from comment #24) > Indeed if that is -2 that would be errno 2 aka ENOENT which would make sense. Agreed. So then it would seem to me that the problem is that the patch doesn't update getSyscallStatusFromGuestState. How this works is: the assembly code in syscall-ppc64{le,be}-linux.S hands the syscall to the host, and when it returns, it copies the result back into the guest state: /* put the result back in the threadstate */ 3: std 3,OFFSET_ppc64_GPR3(30) /* gst->GPR3 = sc result */ /* copy cr0.so back to simulated state */ mfcr 5 /* r5 = CR */ rlwinm 5,5,4,31,31 /* r5 = (CR >> 28) & 1 */ stb 5,OFFSET_ppc64_CR0_0(30) /* gst->CR0.SO = cr0.so */ So far so good. No problem there. The problem occurs slightly later, when the main driver logic in syswrap-main.c calls getSyscallStatusFromGuestState to find out whether the syscall failed or not, in a target-independent way. And I am guessing this needs to be updated too. That currently computes the result as canonical->sres = VG_(mk_SysRes_ppc64_linux)( gst->guest_GPR3, cr0so ); meaning, it is using the legacy (R3 + CR0.S0) scheme to figure out whether the call failed, regardless of which scheme (sc or scv) the guest originally requested. Hence also, the syscall appears to fail or succeed depending on what junk the kernel left for us in CR0.S0, hence the apparently random failures that have been observed. Fixing it would be pretty simple if getSyscallStatusFromGuestState can find out for sure which syscall scheme was requested. That is probably safest done by passing it sci->orig_args. Another complication (sigh) is that this only covers the cases where we run the syscall async: if (sci->flags & SfMayBlock) { /* Syscall may block, so run it asynchronously */ vki_sigset_t mask; But this case will also need fixing } else { /* run the syscall directly */ and hence it looks like convert_SysRes_to_SyscallStatus will need fixing too.
Julian: Yes, that is correct, the return value for an error is negative. The condition code register is not used. On Fri, 2021-03-05 at 09:57 +0000, Julian Seward wrote: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_show-5Fbug.cgi-3Fid-3D431157&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=RFEmMkZAk--_wFGN5tkM_A&m=LDVfEmUd-H_cdiL0yAJIDD4f9GlVpf4oXZSJe8xr5xU&s=xZmlKz03m2k_wUJhzUNSKgExgb7_Drdzz2_0WAG3yrg&e= > > --- Comment #23 from Julian Seward <jseward@acm.org> --- > (In reply to Tom Hughes from comment #22) > > So if thinks the open was a success with a negative file descriptor > > returned? > > Ah, maybe it's trying to tell us something :-) I vaguely remember > reading > that this new syscall convention denotes failure by returning a value > between > -1 and -4096 [bizarrely, like x86-32 in the earliest days of > Linux]. If that's > true, it would imply that the call was passed to the kernel using the > new > convention, it failed, but we mistakenly believe, in the post-syscall > handling, > that it succeeded. >
Created attachment 137063 [details] patch to add support for the scv instruction Updated the scv instruction support patch. The scv instruction requires Linux kernel 5.10.11 or newer. The initial patch development was on an older kernel so wasn't actually exercising the scv paths in the original patch. That is why Mark ran into issues with the patch. One issue with the patch is the SC_FLAG and SCV_FLAG defines. These #defines were created to set a flag indicating which system call instruction is to be used. Unfortunately I could not find an include file that is visible in VEX as well as coregrind/m_syswrap and in coregrind/pub_core_syscall.h. As a result, I had to add the defines for these values in three different places with a comment pointing at the other locations. Rather ugly. Wasn't sure if we wanted to create an include file for these that would then be globally available?? Suggestions on how to better was to make these defines more globally visible would be nice. The patch has been tested on Power 8LE, Power 8BE, Power 9 (with scv support) and prototype ISA 3.1 hardware.
Mark, I was wondering if you have had a chance to try out the updated version of the scv patch. The updated patch takes care of the synchronous and asynchronous call paths. You do need to make sure you are testing with a kernel that is new enough to support scv and is not a partitioned system. Currently scv is not supported on partitioned systems from what I understand. Please let me know if see any issues with the updated patch. Thanks.
(In reply to Carl Love from comment #27) > Created attachment 137063 [details] > patch to add support for the scv instruction This looks OK to me. > One issue with the patch is the SC_FLAG and SCV_FLAG defines. These > #defines were created to set a flag indicating which system call instruction > is to be used. Unfortunately I could not find an include file that is > visible in VEX as well as coregrind/m_syswrap and in > coregrind/pub_core_syscall.h. As a result, I had to add the defines for > these values in three different places with a comment pointing at the other > locations. Rather ugly. Wasn't sure if we wanted to create an include file > for these that would then be globally available?? Suggestions on how to > better was to make these defines more globally visible would be nice. Nothing better immediately springs to mind.
patch committed commit 8f69756a2215e396b93573c9568cab00cc6c116d (HEAD -> master, origin/master, origin/HEAD) Author: Carl Love <cel@us.ibm.com> Date: Tue May 4 14:49:49 2021 -0500 PPC64: add support for the vectored system call instruction scv.
closing