431157 – PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2

Bug 431157 - PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2

Summary: PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2

Status:	CLOSED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex (show other bugs)
Version:	unspecified
Platform:	Fedora RPMs Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-01-04 19:57 UTC by Florian Weimer
Modified:	2021-05-04 20:36 UTC (History)
CC List:	5 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
patch to add support for the scv instruction (1.46 KB, patch) 2021-01-04 21:52 UTC, Carl Love	Details
0001-ppc64-Mask-unrecognized-AT_HWCAP2-values.patch (2.13 KB, text/plain) 2021-01-08 14:11 UTC, Florian Weimer	Details
patch to add support for the scv instruction (20.45 KB, patch) 2021-02-01 23:57 UTC, Carl Love	Details
patch to add support for the scv instruction (18.99 KB, patch) 2021-02-02 17:20 UTC, Carl Love	Details
patch to add support for the scv instruction (18.97 KB, patch) 2021-02-26 01:08 UTC, Carl Love	Details
patch to add support for the scv instruction (36.54 KB, patch) 2021-03-25 18:01 UTC, Carl Love	Details
Show Obsolete (4) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Florian Weimer 2021-01-04 19:57:22 UTC

valgrind cannot run glibc binaries with SCV support on current kernels:

==3891888== Command: /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/ld-2.32.9000.so --library-path /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64 /usr/bin/true
==3891888== 
dis_syslink(ppc)(theInstr)
disInstr(ppc): unhandled instruction: 0x44000001
                 primary 17(0x11), secondary 1(0x1)
==3891888== valgrind: Unrecognised instruction at address 0x4999f80.
==3891888==    at 0x4999F80: _Exit (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so)
==3891888==    by 0x48ED76B: __run_exit_handlers (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so)
==3891888==    by 0x48C9FBB: generic_start_main.constprop.0.isra.0 (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so)
==3891888==    by 0x48CA19F: (below main) (in /builddir/build/BUILDROOT/glibc-2.32.9000-24.fc34.ppc64le/lib64/libc-2.32.9000.so)

AT_HWCAP2 contains the SCV bit even though valgrind does not support the SCV instruction.

Seen with valgrind-3.16.1-11.fc33.ppc64le.

Comment 1 Carl Love 2021-01-04 21:52:38 UTC

Created attachment 134552 [details]
patch to add support for the scv instruction

Please test the attached patch to see if it fixes your issue.  Thanks.

Comment 2 Florian Weimer 2021-01-05 10:12:29 UTC

I think this does not work because the calling convention for “sc” and “scv 0” are different.

The glibc test case misc/test-errno-linux is a good way to exercise syscall errors.  Here's how you can run it under an valgrind binary which is found via $PATH.  Use bash -x to see the valgrind invocation if you have to tweak the invocation.  The --direct option tells the test harness to skip the outer fork.

git clone --depth 1 git://sourceware.org/git/glibc.git git
mkdir build
cd build
# Build glibc. The --prefix=/usr is important.
../git/configure --prefix=/usr
make -j`nproc` -O
# Build the test and run it without valgrind.
make t=misc/test-errno-linux test
# Run the built test under valgrind.
bash testrun.sh --tool=valgrind misc/test-errno-linux --direct

Comment 3 Mark Wielaard 2021-01-08 12:44:46 UTC

So this is basically two bugs.

1) Newer kernels set the auxv AT_HWCAP2 bit which indicates SCV syscall support and could potentially set other capability bits for cpu instructions/features valgrind doesn't support, so we might want to proactively filter/clean AT_HWCAPS2 as is don't for some other architectures in coregrind/m_initimg/initimg-linux.c (setup_client_stack).

2) We want to support SCV syscall support. Which is the attached patch, but since the calling convention/return/error value reporting is different between sc and scv 0, the attached patch is not a complete implementation. When we have a complete implementation, then the filtering for 1) would change.

Comment 4 Florian Weimer 2021-01-08 14:11:44 UTC

Created attachment 134659 [details]
0001-ppc64-Mask-unrecognized-AT_HWCAP2-values.patch

This is what I came up with.

It filters out DARN as well, which I think is actually required because there is no valgrind support for it yet.

Comment 5 Mark Wielaard 2021-01-08 15:06:55 UTC

That looks plausible:

+            /* Mask unrecognized HWCAP bits.  Only keep the bits that have
+             * been mentioned above.
+             */
+            auxv->u.a_val &= 0xfec40000ULL;

So if I understand correctly that means (of those not mentioned above)

explicitly filtered out:

PPC_FEATURE2_HTM_NOSC 0x01000000
PPC_FEATURE2_DARN     0x00200000
PPC_FEATURE2_SCV      0x00100000

explicitly included:

PPC_FEATURE2_ARCH_3_1 0x00040000

It might make sense to have defines of these in include/vki/vki-ppc64-linux.h as VKI_PPC_FEATURE2_* so that we can more easily see which ones we include/exclude.

Comment 6 Florian Weimer 2021-01-08 15:13:30 UTC

Hmm. If we filter HTM_NOSC, we should also filter HTM.

Comment 7 Mark Wielaard 2021-01-08 16:18:56 UTC

For now I have backported Florian's patch to the Fedora rawhide valgrind package to get things unstuck: https://koji.fedoraproject.org/koji/buildinfo?buildID=1666682

I had to tweak the patch a little to remove the references to isa 3.1 (power10) plus the PREFIX checks.

I did remove/filter the PPC_FEATURE2_ARCH_3_1 flag, but didn't touch any other (I believe valgrind does support HTM on power, but I don't know what HTM_NOSC is).

Also, see https://bugs.kde.org/show_bug.cgi?id=422683 do we also need to filter AT_PLATFORM/AT_BASE_PLATFORM ?

Comment 8 Mark Wielaard 2021-01-08 16:23:37 UTC

For reference, this explains the (newer) powerpc syscall conventions of the linux kernel: https://www.kernel.org/doc/html/v5.10/powerpc/syscall64-abi.html

Comment 9 Florian Weimer 2021-01-08 16:25:18 UTC

I think including the HTM bit is wrong, it should always be cleared along with HTM_NOSC. I doubt valgrind supports HTM, so this ought to be the right thing to do. It's definitely better than what we had before, though. I will try another rawhide glibc build.

Yes, rewriting AT_PLATFORM would be desirable as well.

Comment 10 Florian Weimer 2021-01-22 10:41:40 UTC

What's the status here? The first glibc release with scv support is coming up. Thanks.

Comment 11 Mark Wielaard 2021-01-22 12:51:16 UTC

(In reply to Florian Weimer from comment #10)
> What's the status here? The first glibc release with scv support is coming
> up. Thanks.

The status is still as in comment #7. There is a workaround (filtering out the HWCAP2 bits) in the valgrind Fedora package, but nothing yet is upstream.

Carl, what do you think?
Should we go with the HWCAP2 filtering, can we make the SCV syscall working and/or both at the same time.

Comment 12 Carl Love 2021-01-22 16:36:10 UTC

The Power support for Valgrind treats the Hardware Transactional Memory (HTM) instructions effectively as NOPs.  The transaction begin instruction (tbegin) is always failed in Valgrind causing the code to execute the failure path.  

Given that, I don't see any issues with filtering out the HAS_HTM bit.  From what I can find, the HTM_NOSC is for use by the Kernel to indicate it will abort the transaction before entering the kernel.  Valgrind only works with User code so again there should not be any issues with filtering it out.

I see no issues at this point with enabling the scv instruction and filtering out the HTM hardware caps bits.

Comment 13 Carl Love 2021-01-25 19:45:30 UTC

From discussions with IBM and RedHat, the plan for the short term is mask the HWCAPS2 to force the use of the sc instruction instead of the newer scv0 instruction.  I will work on a proper fix to support the scv0 instruction.  The proper fix will consist of the existing scv0 patch I created plus the needed changes to the PPC Valgrind system call interface to handle the sc and scv0 instructions and 64-bit Linux system call ABI, as given in 

https://www.kernel.org/doc/html/v5.10/powerpc/syscall64-abi.html

Comment 14 Carl Love 2021-02-01 23:57:38 UTC

Created attachment 135358 [details]
patch to add support for the scv instruction

I have updated the scv support patch.  The new patch adds the needed support to parse the scv instruction in guest_ppc_toIR.c.  It adds the needed support to call the sc or the scv instruction in the power assembly system call wrapper function, fixes up the setting of the at_restart and in_complete_to_committed variables and finally adds a test case for the sc and scv instructions.  

The patch has been tested on Power 8 BE, Power 8 LE, Power 9 (ISA 3.0) with kernel support for the scv instruction, Power 9 (ISA 3.0) without kernel support for the scv instruction and on a prototype ISA 3.1 hardware.  The patch works without adding new regression errors.

Julian can you please review the patch.  Thanks.

Comment 15 Carl Love 2021-02-02 17:20:39 UTC

Created attachment 135371 [details]
patch to add support for the scv instruction

Cleaned up formatting, spacing and a minor functional fix.

Comment 16 Mark Wielaard 2021-02-23 12:10:05 UTC

(In reply to Carl Love from comment #15)
> Created attachment 135371 [details]
> patch to add support for the scv instruction
> 
> Cleaned up formatting, spacing and a minor functional fix.

Unfortunately this patch doesn't seem to work correctly on a ppc64le setup with linux 5.11.0 and glibc 2.32.9000, which uses SCV:

 LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE:       0x80
AT_ICACHEBSIZE:       0x80
AT_UCACHEBSIZE:       0x0
AT_SYSINFO_EHDR:      0x7fff905c0000
AT_L1I_CACHESIZE:     32768
AT_L1I_CACHEGEOMETRY: 128B line size, 32-way set associative
AT_L1D_CACHESIZE:     32768
AT_L1D_CACHEGEOMETRY: 128B line size, 32-way set associative
AT_L2_CACHESIZE:      524288
AT_L2_CACHEGEOMETRY:  32B line size, 2048-way set associative
AT_L3_CACHESIZE:      10485760
AT_L3_CACHEGEOMETRY:  32B line size, 40960-way set associative
AT_HWCAP:             true_le archpmu vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_PAGESZ:            65536
AT_CLKTCK:            100
AT_PHDR:              0x10f1b0040
AT_PHENT:             56
AT_PHNUM:             9
AT_BASE:              0x7fff905d0000
AT_FLAGS:             0x0
AT_ENTRY:             0x10f1b1ad8
AT_UID:               0
AT_EUID:              0
AT_GID:               0
AT_EGID:              0
AT_SECURE:            0
AT_RANDOM:            0x7fffcd3e3272
AT_HWCAP2:            scv darn ieee128 arch_3_00 vcrypto tar isel ebb dscr arch_2_07
AT_EXECFN:            /bin/true
AT_PLATFORM:          power9
AT_BASE_PLATFORM:     power9

A simple testcase:

# LANG=en.UTF-8 ./vg-in-place /bin/true --help
==108495== Memcheck, a memory error detector
==108495== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==108495== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==108495== Command: /bin/true --help
==108495== 
==108495== Warning: invalid file descriptor -2 in syscall openat()
==108495== Invalid read of size 4
==108495==    at 0x492B7C0: _nl_load_locale_from_archive (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x4929ECF: _nl_find_locale (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x492945F: setlocale (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x181963: ??? (in /usr/bin/true)
==108495==    by 0x49195B3: generic_start_main.constprop.0.isra.0 (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x491978F: (below main) (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==  Address 0x26 is not stack'd, malloc'd or (recently) free'd
==108495== 
==108495== 
==108495== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==108495==  Access not within mapped region at address 0x26
==108495==    at 0x492B7C0: _nl_load_locale_from_archive (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x4929ECF: _nl_find_locale (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x492945F: setlocale (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x181963: ??? (in /usr/bin/true)
==108495==    by 0x49195B3: generic_start_main.constprop.0.isra.0 (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==    by 0x491978F: (below main) (in /usr/lib64/power9/libc-2.32.9000.so)
==108495==  If you believe this happened as a result of a stack
==108495==  overflow in your program's main thread (unlikely but
==108495==  possible), you can try to increase the size of the
==108495==  main thread stack using the --main-stacksize= flag.
==108495==  The main thread stack size used in this run was 8388608.
==108495== 
==108495== HEAP SUMMARY:
==108495==     in use at exit: 0 bytes in 0 blocks
==108495==   total heap usage: 1 allocs, 1 frees, 5 bytes allocated
==108495== 
==108495== All heap blocks were freed -- no leaks are possible
==108495== 
==108495== For lists of detected and suppressed errors, rerun with: -s
==108495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
./vg-in-place: line 31: 108495 Segmentation fault      (core dumped) VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

make regtest also shows various failures:

== 638 tests, 44 stderr failures, 14 stdout failures, 0 stderrB failures, 0 stdoutB failures, 3 post failures ==
memcheck/tests/buflen_check              (stderr)
memcheck/tests/bug340392                 (stderr)
memcheck/tests/erringfds                 (stdout)
memcheck/tests/erringfds                 (stderr)
memcheck/tests/file_locking              (stderr)
memcheck/tests/leak_cpp_interior         (stderr)
memcheck/tests/linux/memfd               (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/linux/sys-openat          (stderr)
memcheck/tests/linux/sys-preadv_pwritev  (stderr)
memcheck/tests/linux/timerfd-syscall     (stderr)
memcheck/tests/null_socket               (stdout)
memcheck/tests/post-syscall              (stderr)
memcheck/tests/sigkill                   (stderr)
memcheck/tests/supp_unknown              (stderr)
memcheck/tests/writev1                   (stderr)
drd/tests/sem_open                       (stderr)
drd/tests/sem_open2                      (stderr)
drd/tests/sem_open3                      (stderr)
drd/tests/sem_open_traced                (stderr)
drd/tests/std_list                       (stderr)
drd/tests/swapcontext                    (stderr)
massif/tests/deep-D                      (post)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)
none/tests/coolo_sigaction               (stdout)
none/tests/coolo_sigaction               (stderr)
none/tests/empty-exe                     (stderr)
none/tests/linux/mremap                  (stderr)
none/tests/linux/mremap2                 (stdout)
none/tests/linux/mremap2                 (stderr)
none/tests/map_unaligned                 (stderr)
none/tests/nocwd                         (stdout)
none/tests/nocwd                         (stderr)
none/tests/pth_cancel2                   (stderr)
none/tests/rlimit64_nofile               (stderr)
none/tests/rlimit_nofile                 (stderr)
none/tests/scripts/bug231357             (stdout)
none/tests/scripts/bug231357             (stderr)
none/tests/scripts/nointerp1             (stdout)
none/tests/scripts/nointerp1             (stderr)
none/tests/scripts/nointerp2             (stdout)
none/tests/scripts/nointerp2             (stderr)
none/tests/scripts/nointerp3             (stdout)
none/tests/scripts/nointerp3             (stderr)
none/tests/scripts/nointerp4             (stderr)
none/tests/scripts/nointerp5             (stderr)
none/tests/scripts/relative1             (stdout)
none/tests/scripts/relative1             (stderr)
none/tests/scripts/relative2             (stdout)
none/tests/scripts/relative2             (stderr)
none/tests/scripts/shell                 (stdout)
none/tests/scripts/shell                 (stderr)
none/tests/scripts/shell_valid1          (stderr)
none/tests/scripts/shell_valid4          (stdout)
none/tests/scripts/shell_valid4          (stderr)
none/tests/scripts/shell_zerolength      (stderr)
none/tests/syscall-restart1              (stderr)
none/tests/syscall-restart2              (stderr)
none/tests/threadederrno                 (stdout)

Comment 17 Mark Wielaard 2021-02-23 15:24:36 UTC

For now I have pushed this commit:

commit ea98cccb4d50a8740708507c4c72cfb1e6c88e38
Author: Mark Wielaard <mark@klomp.org>
Date:   Tue Feb 23 16:19:26 2021 +0100

    Filter out unsupported instructions from HWCAP2 on powerpc.
    
    Valgrind currently doesn't support the DARN random number instruction
    and the SCV syscall instruction. Filter them out of HWCAP2 so glibc
    and applications don't try to use them when running under valgrind.
    
    Also suppress printing a log message for scv instructions in the
    instruction stream.
    
    Reported by: Florian Weimer <fweimer@redhat.com>
    
    DARN bug: https://bugs.kde.org/show_bug.cgi?id=411189
    SCV bug: https://bugs.kde.org/show_bug.cgi?id=431157

When SCV is implemented it can be added to HWCAPS2 again.

Comment 18 Carl Love 2021-02-26 01:08:40 UTC

Created attachment 136175 [details]
patch to add support for the scv instruction

Rebased patch on Vaglrind mainline, enabled SCV which was masked out in Mark's committed patch.   Tested on Power 8 LE, Power BE, Power 9 and prototype hardware using latest glibc that I can get.  The patch seems to be working and regression tests pass.  Note, with the latest glibc there are a number of new regression test failures specifically in helgrind without applying the scv patch.  The SCV patch doesn't seem to introduce any additional issues.  Will look into the new baseline regression test failures.

Comment 19 Mark Wielaard 2021-02-28 22:10:16 UTC

(In reply to Carl Love from comment #18)
> Created attachment 136175 [details]
> patch to add support for the scv instruction
> 
> Rebased patch on Vaglrind mainline, enabled SCV which was masked out in
> Mark's committed patch.   Tested on Power 8 LE, Power BE, Power 9 and
> prototype hardware using latest glibc that I can get.  The patch seems to be
> working and regression tests pass.

Tested on a linux 5.11.0 and glibc 2.33 p9 machine (basically what Fedora 34 has).

With current valgrind git, commit 9e9d1a171 (so SCV masked out):

$ make nonexp-regtest

== 633 tests, 6 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 3 post failures ==
memcheck/tests/bug340392                 (stderr)
memcheck/tests/leak_cpp_interior         (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/supp_unknown              (stderr)
drd/tests/swapcontext                    (stderr)
massif/tests/deep-D                      (post)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)

With the attached patch (so with SCV enabled) on the same setup:

== 634 tests, 44 stderr failures, 14 stdout failures, 0 stderrB failures, 0 stdoutB failures, 3 post failures ==
memcheck/tests/buflen_check              (stderr)
memcheck/tests/bug340392                 (stderr)
memcheck/tests/erringfds                 (stdout)
memcheck/tests/erringfds                 (stderr)
memcheck/tests/file_locking              (stderr)
memcheck/tests/leak_cpp_interior         (stderr)
memcheck/tests/linux/memfd               (stderr)
memcheck/tests/linux/rfcomm              (stderr)
memcheck/tests/linux/sys-execveat        (stderr)
memcheck/tests/linux/sys-openat          (stderr)
memcheck/tests/linux/sys-preadv_pwritev  (stderr)
memcheck/tests/linux/timerfd-syscall     (stderr)
memcheck/tests/null_socket               (stdout)
memcheck/tests/post-syscall              (stderr)
memcheck/tests/sigkill                   (stderr)
memcheck/tests/supp_unknown              (stderr)
memcheck/tests/writev1                   (stderr)
drd/tests/sem_open                       (stderr)
drd/tests/sem_open2                      (stderr)
drd/tests/sem_open3                      (stderr)
drd/tests/sem_open_traced                (stderr)
drd/tests/std_list                       (stderr)
drd/tests/swapcontext                    (stderr)
massif/tests/deep-D                      (post)
massif/tests/new-cpp                     (post)
massif/tests/overloaded-new              (post)
none/tests/coolo_sigaction               (stdout)
none/tests/coolo_sigaction               (stderr)
none/tests/empty-exe                     (stderr)
none/tests/linux/mremap                  (stderr)
none/tests/linux/mremap2                 (stdout)
none/tests/linux/mremap2                 (stderr)
none/tests/map_unaligned                 (stderr)
none/tests/nocwd                         (stdout)
none/tests/nocwd                         (stderr)
none/tests/pth_cancel2                   (stderr)
none/tests/rlimit64_nofile               (stderr)
none/tests/rlimit_nofile                 (stderr)
none/tests/scripts/bug231357             (stdout)
none/tests/scripts/bug231357             (stderr)
none/tests/scripts/nointerp1             (stdout)
none/tests/scripts/nointerp1             (stderr)
none/tests/scripts/nointerp2             (stdout)
none/tests/scripts/nointerp2             (stderr)
none/tests/scripts/nointerp3             (stdout)
none/tests/scripts/nointerp3             (stderr)
none/tests/scripts/nointerp4             (stderr)
none/tests/scripts/nointerp5             (stderr)
none/tests/scripts/relative1             (stdout)
none/tests/scripts/relative1             (stderr)
none/tests/scripts/relative2             (stdout)
none/tests/scripts/relative2             (stderr)
none/tests/scripts/shell                 (stdout)
none/tests/scripts/shell                 (stderr)
none/tests/scripts/shell_valid1          (stderr)
none/tests/scripts/shell_valid4          (stdout)
none/tests/scripts/shell_valid4          (stderr)
none/tests/scripts/shell_zerolength      (stderr)
none/tests/syscall-restart1              (stderr)
none/tests/syscall-restart2              (stderr)
none/tests/threadederrno                 (stdout)

So there still seem to be some issue when SCV is enabled under valgrind.

Here is a syscall trace of /bin/true --help the simplest program I found that crashes under valgrind with SCV support enabled:

# ./vg-in-place -q --trace-syscalls=yes --tool=none /bin/true --help
SYSCALL[279836,1](45) sys_brk ( 0x0 ) --> [pre-success] Success(0x4070000) 
SYSCALL[279836,1](122) sys_newuname ( 0x1fff00e0b0 )[sync] --> Success(0x0) 
SYSCALL[279836,1](286) sys_openat ( -100, 0x4061a40(/root/valgrind/./.in_place/vgpreload_core-ppc64le-linux.so), 524288 ) --> [async] ... 
SYSCALL[279836,1](286) ... [async] --> Success(0x3) 
SYSCALL[279836,1](3) sys_read ( 3, 0x1fff00c938, 832 ) --> [async] ... 
SYSCALL[279836,1](3) ... [async] --> Success(0x340) 
SYSCALL[279836,1](291) sys_newfstatat ( 3, 0x403fc08(), 0x1fff00c750 )[sync] --> Success(0x0) 
SYSCALL[279836,1](90) sys_mmap ( 0x0, 131128, 5, 2050, 3, 0 ) --> [pre-success] Success(0x4870000) 
SYSCALL[279836,1](90) sys_mmap ( 0x4880000, 65536, 3, 2066, 3, 0 ) --> [pre-success] Success(0x4880000) 
SYSCALL[279836,1](90) sys_mmap ( 0x4890000, 56, 3, 50, 18446744073709551615, 0 ) --> [pre-success] Success(0x4890000) 
SYSCALL[279836,1](6) sys_close ( 3 )[sync] --> Success(0x0) 
SYSCALL[279836,1](33) sys_access ( 0x403c9d0(/etc/ld.so.preload), 4 )[sync] --> Failure(0x2) 
SYSCALL[279836,1](286) sys_openat ( -100, 0x403ea78(/etc/ld.so.cache), 524288 ) --> [async] ... 
SYSCALL[279836,1](286) ... [async] --> Success(0x3) 
SYSCALL[279836,1](291) sys_newfstatat ( 3, 0x403fc08(), 0x1fff00d230 )[sync] --> Success(0x0) 
SYSCALL[279836,1](90) sys_mmap ( 0x0, 25160, 1, 2, 3, 0 ) --> [pre-success] Success(0x48a0000) 
SYSCALL[279836,1](6) sys_close ( 3 )[sync] --> Success(0x0) 
SYSCALL[279836,1](286) sys_openat ( -100, 0x4061fb0(/lib64/power9/libc.so.6), 524288 ) --> [async] ... 
SYSCALL[279836,1](286) ... [async] --> Success(0x3) 
SYSCALL[279836,1](3) sys_read ( 3, 0x1fff00d4e8, 832 ) --> [async] ... 
SYSCALL[279836,1](3) ... [async] --> Success(0x340) 
SYSCALL[279836,1](291) sys_newfstatat ( 3, 0x403fc08(), 0x1fff00d300 )[sync] --> Success(0x0) 
SYSCALL[279836,1](90) sys_mmap ( 0x0, 2272232, 5, 2050, 3, 0 ) --> [pre-success] Success(0x48b0000) 
SYSCALL[279836,1](125) sys_mprotect ( 0x4ab0000, 65536, 0 )[sync] --> Success(0x0) 
SYSCALL[279836,1](90) sys_mmap ( 0x4ac0000, 131072, 3, 2066, 3, 2097152 ) --> [pre-success] Success(0x4ac0000) 
SYSCALL[279836,1](6) sys_close ( 3 )[sync] --> Success(0x0) 
SYSCALL[279836,1](125) sys_mprotect ( 0x4ac0000, 65536, 1 )[sync] --> Success(0x0) 
SYSCALL[279836,1](125) sys_mprotect ( 0x4880000, 65536, 1 )[sync] --> Success(0x0) 
SYSCALL[279836,1](125) sys_mprotect ( 0x190000, 65536, 1 )[sync] --> Success(0x0) 
SYSCALL[279836,1](125) sys_mprotect ( 0x4050000, 65536, 1 )[sync] --> Success(0x0) 
SYSCALL[279836,1](91) sys_munmap ( 0x48a0000, 25160 )[sync] --> Success(0x0) 
SYSCALL[279836,1](45) sys_brk ( 0x0 ) --> [pre-success] Success(0x4070000) 
SYSCALL[279836,1](45) sys_brk ( 0x40a0000 ) --> [pre-success] Success(0x40a0000) 
SYSCALL[279836,1](286) sys_openat ( -100, 0x4a6ee28(/usr/lib/locale/locale-archive), 524288 ) --> [async] ... 
SYSCALL[279836,1](286) ... [async] --> Success(0xfffffffffffffffe) 
SYSCALL[279836,1](291) sys_newfstatat ( 24, 0x4a73980(), 0x4ad2578 )[sync] --> Failure(0x9) 
SYSCALL[279836,1](90) sys_mmap ( 0x0, 0, 1, 2, 24, 0 ) --> [pre-fail] Failure(0x16) 
==279836== 
==279836== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==279836==  Access not within mapped region at address 0x26
==279836==    at 0x48F1D00: _nl_load_locale_from_archive (in /usr/lib64/power9/libc-2.33.so)
==279836==    by 0x48F03DF: _nl_find_locale (in /usr/lib64/power9/libc-2.33.so)
==279836==    by 0x48EF95F: setlocale (in /usr/lib64/power9/libc-2.33.so)
==279836==    by 0x181963: ??? (in /usr/bin/true)
==279836==    by 0x48DF953: generic_start_main.constprop.0.isra.0 (in /usr/lib64/power9/libc-2.33.so)
==279836==    by 0x48DFB2F: (below main) (in /usr/lib64/power9/libc-2.33.so)
==279836==  If you believe this happened as a result of a stack
==279836==  overflow in your program's main thread (unlikely but
==279836==  possible), you can try to increase the size of the
==279836==  main thread stack using the --main-stacksize= flag.
==279836==  The main thread stack size used in this run was 8388608.
./vg-in-place: line 31: 279836 Segmentation fault      (core dumped) VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

Comment 20 Julian Seward 2021-03-05 09:26:42 UTC

Looking at the end of the log there:

SYSCALL[279836,1](291) sys_newfstatat ( 24, 0x4a73980(), 0x4ad2578 )[sync] --> Failure(0x9) 
SYSCALL[279836,1](90) sys_mmap ( 0x0, 0, 1, 2, 24, 0 ) --> [pre-fail] Failure(0x16) 
==279836== 
==279836== Process terminating with default action of signal 11 (SIGSEGV): dumping core

The mmap call is marked "[pre-fail]", which means that V never passed it
to the kernel.  Instead it decided to "fail" it itself, for the reason 0x16,
whatever that is.  Might be worth investigating.

Also I wonder if the failure of the immediately preceding sys_newfstatat
syscall is related.

Comment 21 Tom Hughes 2021-03-05 09:41:01 UTC

Well 0x16 is EINVAL which doesn't tell us much.

I notice that the mmap is on FD 24 though and the previous line is a newfstatat on the same descriptor that failed with EBADF so it looks like it was trying to map a file that wasn't open?

Comment 22 Tom Hughes 2021-03-05 09:42:59 UTC

This just before that is pretty odd:

SYSCALL[279836,1](286) sys_openat ( -100, 0x4a6ee28(/usr/lib/locale/locale-archive), 524288 ) --> [async] ... 
SYSCALL[279836,1](286) ... [async] --> Success(0xfffffffffffffffe) 

So if thinks the open was a success with a negative file descriptor returned?

Comment 23 Julian Seward 2021-03-05 09:57:57 UTC

(In reply to Tom Hughes from comment #22)
> So if thinks the open was a success with a negative file descriptor returned?

Ah, maybe it's trying to tell us something :-)  I vaguely remember reading
that this new syscall convention denotes failure by returning a value between
-1 and -4096 [bizarrely, like x86-32 in the earliest days of Linux].  If that's
true, it would imply that the call was passed to the kernel using the new 
convention, it failed, but we mistakenly believe, in the post-syscall handling,
that it succeeded.

Comment 24 Tom Hughes 2021-03-05 10:05:35 UTC

Indeed if that is -2 that would be errno 2 aka ENOENT which would make sense.

Comment 25 Julian Seward 2021-03-05 10:24:10 UTC

(In reply to Tom Hughes from comment #24)
> Indeed if that is -2 that would be errno 2 aka ENOENT which would make sense.

Agreed.

So then it would seem to me that the problem is that the patch doesn't
update getSyscallStatusFromGuestState.  How this works is: the assembly code
in syscall-ppc64{le,be}-linux.S hands the syscall to the host, and when
it returns, it copies the result back into the guest state:

        /* put the result back in the threadstate  */
3:	std     3,OFFSET_ppc64_GPR3(30)     /* gst->GPR3 = sc result */
	/* copy cr0.so back to simulated state */
	mfcr    5                           /* r5 = CR               */
	rlwinm	5,5,4,31,31                 /* r5 = (CR >> 28) & 1   */
        stb     5,OFFSET_ppc64_CR0_0(30)    /* gst->CR0.SO = cr0.so  */

So far so good.  No problem there.  The problem occurs slightly later, when
the main driver logic in syswrap-main.c calls getSyscallStatusFromGuestState
to find out whether the syscall failed or not, in a target-independent way.
And I am guessing this needs to be updated too.  That currently computes
the result as

   canonical->sres = VG_(mk_SysRes_ppc64_linux)( gst->guest_GPR3, cr0so );

meaning, it is using the legacy (R3 + CR0.S0) scheme to figure out whether
the call failed, regardless of which scheme (sc or scv) the guest originally
requested.  Hence also, the syscall appears to fail or succeed depending on
what junk the kernel left for us in CR0.S0, hence the apparently random
failures that have been observed.

Fixing it would be pretty simple if getSyscallStatusFromGuestState can find
out for sure which syscall scheme was requested.  That is probably safest done
by passing it sci->orig_args.

Another complication (sigh) is that this only covers the cases where we
run the syscall async:

      if (sci->flags & SfMayBlock) {

         /* Syscall may block, so run it asynchronously */
         vki_sigset_t mask;

But this case will also need fixing

      } else {

         /* run the syscall directly */

and hence it looks like convert_SysRes_to_SyscallStatus will need fixing too.

Comment 26 Carl Love 2021-03-05 16:32:51 UTC

Julian:

Yes, that is correct, the return value for an error is negative.  The
condition code register is not used.  

On Fri, 2021-03-05 at 09:57 +0000, Julian Seward wrote:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_show-5Fbug.cgi-3Fid-3D431157&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=RFEmMkZAk--_wFGN5tkM_A&m=LDVfEmUd-H_cdiL0yAJIDD4f9GlVpf4oXZSJe8xr5xU&s=xZmlKz03m2k_wUJhzUNSKgExgb7_Drdzz2_0WAG3yrg&e= 
> 
> --- Comment #23 from Julian Seward <jseward@acm.org> ---
> (In reply to Tom Hughes from comment #22)
> > So if thinks the open was a success with a negative file descriptor
> > returned?
> 
> Ah, maybe it's trying to tell us something :-)  I vaguely remember
> reading
> that this new syscall convention denotes failure by returning a value
> between
> -1 and -4096 [bizarrely, like x86-32 in the earliest days of
> Linux].  If that's
> true, it would imply that the call was passed to the kernel using the
> new 
> convention, it failed, but we mistakenly believe, in the post-syscall 
> handling,
> that it succeeded.
>

Comment 27 Carl Love 2021-03-25 18:01:42 UTC

Created attachment 137063 [details]
patch to add support for the scv instruction

Updated the scv instruction support patch.  The scv instruction requires Linux kernel  5.10.11 or newer.  The initial patch development was on an older kernel so wasn't actually exercising the scv paths in the original patch.  That is why Mark ran into issues with the patch.

One issue with the patch is the SC_FLAG and SCV_FLAG defines.  These #defines were created to set a flag indicating which system call instruction is to be used.  Unfortunately I could not find an include file that is visible in VEX as well as coregrind/m_syswrap and in coregrind/pub_core_syscall.h.   As a result, I had to add the defines for these values in three different places with a comment pointing at the other locations.  Rather ugly.  Wasn't sure if we wanted to create an include file for these that would then be globally available??  Suggestions on how to better was to make these defines more globally visible would be nice.

The patch has been tested on Power 8LE, Power 8BE, Power 9 (with scv support) and prototype ISA 3.1 hardware.

Comment 28 Carl Love 2021-04-16 16:29:45 UTC

Mark,

I was wondering if you have had a chance to try out the updated version of the scv patch.  The updated patch takes care of the synchronous and asynchronous call paths.  You do need to make sure you are testing with a kernel that is new enough to support scv and is not a partitioned system.  Currently scv is not supported on partitioned systems from what I understand.

Please let me know if see any issues with the updated patch.  Thanks.

Comment 29 Julian Seward 2021-05-04 09:29:36 UTC

(In reply to Carl Love from comment #27)
> Created attachment 137063 [details]
> patch to add support for the scv instruction

This looks OK to me.

> One issue with the patch is the SC_FLAG and SCV_FLAG defines.  These
> #defines were created to set a flag indicating which system call instruction
> is to be used.  Unfortunately I could not find an include file that is
> visible in VEX as well as coregrind/m_syswrap and in
> coregrind/pub_core_syscall.h.   As a result, I had to add the defines for
> these values in three different places with a comment pointing at the other
> locations.  Rather ugly.  Wasn't sure if we wanted to create an include file
> for these that would then be globally available??  Suggestions on how to
> better was to make these defines more globally visible would be nice.

Nothing better immediately springs to mind.

Comment 30 Carl Love 2021-05-04 20:35:55 UTC

patch committed

commit 8f69756a2215e396b93573c9568cab00cc6c116d (HEAD -> master, origin/master, origin/HEAD)
Author: Carl Love <cel@us.ibm.com>
Date:   Tue May 4 14:49:49 2021 -0500

    PPC64: add support for the vectored system call instruction scv.

Comment 31 Carl Love 2021-05-04 20:36:06 UTC

closing