Bug 444399 - disInstr(arm64): unhandled instruction 0xC87F2D89 (LD{,A}XP and ST{,L}XP)
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Duplicates: 434283
Depends on: 445354
Blocks:
Reported: 2021-10-26 00:07 UTC by Felix Klock
Modified: 2022-07-05 10:25 UTC (History)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
WIP patch that will possibly get you back on the road. DO NOT LAND. (37.91 KB, patch)
2021-11-08 07:23 UTC, Julian Seward
Final proposed patch (55.84 KB, patch)
2021-11-12 08:01 UTC, Julian Seward

Description Felix Klock 2021-10-26 00:07:23 UTC
SUMMARY

I tried to run valgrind on the following Rust program on AArch64:

```rust
fn main() { let _n = std::time::Instant::now(); }
```

I ran `valgrind` with no flags, as `/usr/local/bin/valgrind ./target/debug/instant`, and got this error:

```
/usr/local/bin/valgrind ./target/debug/instant
==16560== Memcheck, a memory error detector
==16560== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16560== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==16560== Command: ./target/debug/instant
==16560==
ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0xC87F2D89
disInstr(arm64): 1100'1000 0111'1111 0010'1101 1000'1001
==16560== valgrind: Unrecognised instruction at address 0x11ffa8.
==16560==    at 0x11FFA8: std::time::Instant::now (atomic.rs:2574)
==16560==    by 0x10EB7B: instant::main (main.rs:2)
==16560==    by 0x10ECA3: core::ops::function::FnOnce::call_once (function.rs:227)
==16560==    by 0x10EC23: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
==16560==    by 0x10ED83: std::rt::lang_start::{{closure}} (rt.rs:63)
==16560==    by 0x12219F: std::rt::lang_start_internal (function.rs:259)
==16560==    by 0x10ED4B: std::rt::lang_start (rt.rs:62)
==16560==    by 0x10EBB7: main (in /local/home/pnkfelix/instant/target/debug/instant)
==16560== Your program just tried to execute an instruction that Valgrind
==16560== did not recognise.  There are two possible reasons for this.
==16560== 1. Your program has a bug and erroneously jumped to a non-code
==16560==    location.  If you are running Memcheck and you just saw a
==16560==    warning about a bad jump, it's probably your program's fault.
==16560== 2. The instruction is legitimate but Valgrind doesn't handle it,
==16560==    i.e. it's Valgrind's fault.  If you think this is the case or
==16560==    you are not sure, please let us know and we'll try to fix it.
==16560== Either way, Valgrind will now raise a SIGILL signal which will
==16560== probably kill your program.

```

OBSERVED RESULT
A disInstr failure: Valgrind reports an unhandled instruction and raises SIGILL.


EXPECTED RESULT
The program runs to completion with no unhandled instructions.

Comment 1 Felix Klock 2021-10-26 00:08:18 UTC
Oh, this is probably a duplicate of https://bugs.kde.org/show_bug.cgi?id=434283 ?
Comment 2 Julian Seward 2021-10-27 16:31:53 UTC
I can't reproduce this, testing on Fedora 33 on Parallels Workstation
running on an M1 Mac Mini, with either rustc-1.55 or rustc-1.56.

I suspect this is some kind of hardware capabilities problem, in that
rustc-generated code is using instructions that V doesn't claim to
support.  From IRC I see that the hardware you used here was a
"Graviton 2", which has Neoverse N1 cores; those implement Armv8.2-A plus
a few extensions from later revisions.

The M1 is AArch64-v8.6 I think, although which of those extensions are
available within Parallels I don't know.  That said, I'm still surprised
it fails for you, given that it doesn't fail here, *and* given that V
doesn't even fully support v8.2 of the instruction set.

The output of /usr/bin/lscpu for F33-on-Parallels-on-M1 is below.
Can you show the output on the failing target?

Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           0
Stepping:                        r0p0
BogoMIPS:                        48.00
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
Comment 3 Hans Kratz 2021-10-28 11:58:00 UTC
This seems to be a bug in valgrind. Support for some ld*p instructions (which are ARMv8.0) is not implemented. Simple reproducer in C++ for stxp:
--- snip ---
#include <atomic>

int main() {
    std::atomic<__int128_t> x;
    x.store(23, std::memory_order_relaxed);
}
--- snip ---

$ clang++-12 main.cxx -o main
$ valgrind ./main
==25164== Memcheck, a memory error detector
==25164== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==25164== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==25164== Command: ./main
==25164==
ARM64 front end: load_store
disInstr(arm64): unhandled instruction 0xC87F3168
disInstr(arm64): 1100'1000 0111'1111 0011'0001 0110'1000
==25164== valgrind: Unrecognised instruction at address 0x400690.
==25164==    at 0x400690: std::atomic<__int128>::store(__int128, std::memory_order) (in /data/dev/clangtest/main)
==25164==    by 0x4005F7: main (in /data/dev/clangtest/main)
==25164== Your program just tried to execute an instruction that Valgrind
==25164== did not recognise.  There are two possible reasons for this.
==25164== 1. Your program has a bug and erroneously jumped to a non-code
==25164==    location.  If you are running Memcheck and you just saw a
==25164==    warning about a bad jump, it's probably your program's fault.
==25164== 2. The instruction is legitimate but Valgrind doesn't handle it,
==25164==    i.e. it's Valgrind's fault.  If you think this is the case or
==25164==    you are not sure, please let us know and we'll try to fix it.
==25164== Either way, Valgrind will now raise a SIGILL signal which will
==25164== probably kill your program.
==25164==
==25164== Process terminating with default action of signal 4 (SIGILL)
==25164==  Illegal opcode at address 0x400690
==25164==    at 0x400690: std::atomic<__int128>::store(__int128, std::memory_order) (in /data/dev/clangtest/main)
==25164==    by 0x4005F7: main (in /data/dev/clangtest/main)
==25164==
==25164== HEAP SUMMARY:
==25164==     in use at exit: 0 bytes in 0 blocks
==25164==   total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
==25164==
==25164== All heap blocks were freed -- no leaks are possible
==25164==
==25164== For lists of detected and suppressed errors, rerun with: -s
==25164== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)
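
For anyone who wants to check which instruction a reported word actually is, here is a minimal C++17 sketch that decodes only the load/store-exclusive-pair group. The function name `decode_exclusive_pair` is made up for illustration and has nothing to do with Valgrind's own decoder; the field positions follow the A64 load/store exclusive encoding, and the Rs field of loads is not validated.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Decode one A64 instruction word as LDXP/LDAXP/STXP/STLXP.
// Returns std::nullopt for anything outside that group.
// (Hypothetical helper for illustration, not part of Valgrind.)
std::optional<std::string> decode_exclusive_pair(std::uint32_t insn) {
    // Extract bits hi..lo (inclusive) of the instruction word.
    auto bits = [insn](unsigned hi, unsigned lo) -> std::uint32_t {
        assert(hi >= lo && hi < 32 && hi - lo < 31);
        return (insn >> lo) & ((1u << (hi - lo + 1)) - 1u);
    };

    // Fixed fields of the exclusive-pair group:
    // bit 31 = 1, bits 29:24 = 001000, o2 (bit 23) = 0, o1 (bit 21) = 1.
    if (bits(31, 31) != 1 || bits(29, 24) != 0b001000u ||
        bits(23, 23) != 0 || bits(21, 21) != 1) {
        return std::nullopt;
    }

    const bool is64    = bits(30, 30) != 0;  // sz: 64-bit or 32-bit data registers
    const bool is_load = bits(22, 22) != 0;  // L: load or store
    const bool ordered = bits(15, 15) != 0;  // o0: acquire/release variant

    // Register 31 means the zero register for data and status operands,
    // and the stack pointer for the base address.
    auto data = [is64](std::uint32_t r) -> std::string {
        const std::string prefix = is64 ? "x" : "w";
        return r == 31 ? prefix + "zr" : prefix + std::to_string(r);
    };
    auto status = [](std::uint32_t r) -> std::string {
        return r == 31 ? std::string("wzr") : "w" + std::to_string(r);
    };
    auto base = [](std::uint32_t r) -> std::string {
        return r == 31 ? std::string("sp") : "x" + std::to_string(r);
    };

    std::string text = is_load ? (ordered ? "ldaxp" : "ldxp")
                               : (ordered ? "stlxp" : "stxp");
    text += " ";
    if (!is_load) {
        text += status(bits(20, 16)) + ", ";  // Rs receives the store status
    }
    text += data(bits(4, 0)) + ", " + data(bits(14, 10)) +
            ", [" + base(bits(9, 5)) + "]";
    return text;
}
```

With this sketch, 0xC87F2D89 from the description decodes to `ldxp x9, x11, [x12]` and 0xC87F3168 above decodes to `ldxp x8, x12, [x11]`, so both reports fault on the load half of the exclusive pair.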
Comment 4 Felix Klock 2021-10-28 20:58:02 UTC
(In reply to Julian Seward from comment #2)
> The output of /usr/bin/lscpu for F33-on-Parallels-on-M1 are below.
> Can you show the output on the failing target?

$ /usr/bin/lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              1
Core(s) per socket:              32
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           1
Model name:                      Neoverse-N1
Stepping:                        r3p1
BogoMIPS:                        243.75
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
L3 cache:                        32 MiB
NUMA node0 CPU(s):               0-31
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
Comment 5 Julian Seward 2021-11-02 08:18:58 UTC
I looked into this a bit.  It does indeed appear that LD{A}XP and ST{L}XP
exist in AArch64 8.0 but are not implemented in V.  I am somewhat surprised by
this since I distinctly remember carefully making a list of all instructions
that needed to be implemented, when doing the initial AArch64 port, so I'm not
sure how these got forgotten.

I will fix it, but it may not be an immediate fix.  VEX's intermediate
representation has a way to represent doubleword CAS, but can only represent
single word load-exclusive / store-check, so it will need to be extended
accordingly, and that may have some minor knock-on effect on other
architectures.
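
To make the "load-exclusive / store-check" pairing concrete for a two-register access, here is a single-threaded C++ model. It is only a sketch under stated assumptions: the `Pair` and `Monitor` types and all function names are hypothetical, it is not VEX IR and not the code in any patch on this bug, and it checks the remembered value in the style of a compare-and-swap, whereas real hardware monitors track an address reservation.

```cpp
#include <cassert>
#include <cstdint>

// A 128-bit memory cell, viewed as the two 64-bit halves that
// LDXP/STXP transfer. (Hypothetical type for illustration.)
struct Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical model of an exclusive monitor: it remembers where the
// last load-exclusive happened and what it observed there.
struct Monitor {
    const Pair* addr = nullptr;  // nullptr means no outstanding reservation
    Pair seen{0, 0};
};

// Load-exclusive pair: read both halves and take a reservation.
Pair load_exclusive_pair(Monitor& m, const Pair& cell) {
    m.addr = &cell;
    m.seen = cell;
    return cell;
}

// Store-exclusive pair: returns 0 on success and 1 on failure, which is
// the status convention STXP uses for its Ws result register.
std::uint32_t store_exclusive_pair(Monitor& m, Pair& cell, Pair value) {
    const bool ok = m.addr == &cell &&
                    cell.lo == m.seen.lo && cell.hi == m.seen.hi;
    m.addr = nullptr;  // the reservation is consumed either way
    if (!ok) {
        return 1;
    }
    cell = value;
    return 0;
}

// The retry loop a compiler typically emits for a 128-bit atomic store:
// load-exclusive (result discarded), store-exclusive, repeat on failure.
void atomic_store_pair(Monitor& m, Pair& cell, Pair value) {
    do {
        (void)load_exclusive_pair(m, cell);
    } while (store_exclusive_pair(m, cell, value) != 0);
    assert(cell.lo == value.lo && cell.hi == value.hi);
}
```

The point of the model is that both halves must be checked and written as one unit, which is why a representation limited to a single word is not enough for the pair forms.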

I would guess that the immediate cause of the failure is that LLVM 12 has
started generating these instructions.  That would explain why rustc shows the
problem in comment 0 -- presumably that is rustc nightly -- and also why
clang++ 12 shows the problem in comment 3.
Comment 6 Julian Seward 2021-11-08 07:16:35 UTC
*** Bug 434283 has been marked as a duplicate of this bug. ***
Comment 7 Julian Seward 2021-11-08 07:23:19 UTC
Created attachment 143328 [details]
WIP patch that will possibly get you back on the road.  DO NOT LAND.

Fixing this is a whole trip because the various IR and arm64 frameworks
were not really designed to accommodate it.  Anyways, here is a WIP 
patch.  It seems to work for simple tests (in the patch) but is not fully
tested.  It will not work if you run with `--sim-hints=fallback-llsc` or if the
fallback LL/SC implementation is auto-selected, based on your processor,
at startup.  It applies against the head and also against a vanilla 3.18.1
tarball, although I haven't tested it in the latter case.

If anyone wants to test it and let me know whether it works, that would be
appreciated.  I will try to finish it up properly this coming week.
Comment 8 Julian Seward 2021-11-12 08:01:47 UTC
Created attachment 143474 [details]
Final proposed patch

Final proposed patch.  This includes the fix for blocking bug 445354, 
which is small and which I will land separately and first.
Comment 9 Julian Seward 2021-11-12 11:42:37 UTC
Landed as 530df882b8f60ecacaf2b9b8a719f7ea1c1d1650.  I think it's
OK, and the patch contains test cases, but please test.