Looking into the code I noticed that the ARMv8.1 LSE instructions are not supported so any code compiled with -mcpu=thunderx+lse crashes (well we ran into 369459 first but that is a different story). Reproducible: Always Steps to Reproduce: Compile glibc with -mcpu=thunderx+lse and run any resulting binary with this ld.so. We get a crash with unknown instructions
Created attachment 123537 [details] Patch implements and tests Arm v8.1 LSE atomics This patch implements ARMv8.1 memory atomics in VEX/priv/guest_arm64_toIR.c and tests in none/tests/arm64/atomics_v81* LDADD{,A}{,L}<sz> <Rs>, <Rt>, LDCLR{,A}{,L}<sz> <Rs>, <Rt>, LDEOR{,A}{,L}<sz> <Rs>, <Rt>, LDSET{,A}{,L}<sz> <Rs>, <Rt>, LDSMAX{,A}{,L}<sz> <Rs>, <Rt>, LDSMIN{,A}{,L}<sz> <Rs>, <Rt>, LDUMAX{,A}{,L}<sz> <Rs>, <Rt>, LDUMIN{,A}{,L}<sz> <Rs>, <Rt>, SWP{,A}{,L}<sz> <Rs>, <Rt> Version: valgrind-3.16.0.GIT (based on Git development head SHA 78054a6c0) OS: Linux The regression tests in none/tests/arm64 do not check for existence of hardware opcode features yet as the atomics implementations are emulated in order to accommodate execution on v8.0 CPUs.
Mostly this looks fine. Thanks for the attention to detail and for following the house style. My only big concern here is the lack of hwcaps support in Vex/Valgrind. That could be done in a followup bug, but it needs to happen fairly soon. I can provide guidance. Let me ask: are the 8.1 extensions a single all-or-nothing thing (meaning, they have to be all implemented, or none?) Or are they split up into smaller subsets of instructions, and each group of instructions can be implemented or not implemented? There are also some smaller points directly with the patch: > + && (INSN(15,15) == 0 || INSN(15,12) == BITS4(1,0,0,0)) I'd prefer this just to be INSN(15,12) <= BITS4(1,0,0,0) + stmt( IRStmt_Exit( + binop(Iop_CmpNE64, + widenUto64(ty, mkexpr(old)), + widenUto64(ty, mkexpr(orig))), + Ijk_Boring, nia, OFFB_PC )); You need to use CasCmpNE64, not CmpNE64 here. For the reason see libvex_ir.h defn of CasCmpNEXX. +# Does the C compiler support the armv81 features flag +# Note, this doesn't generate a C-level symbol. It generates a +# automake-level symbol (BUILD_ARMV81_TESTS), used in test Makefile.am's This is important, but it's not enough. We also need to check that the assembler can assemble the instructions. The way to do that is to change the test program (from "int main() { return 0; }") so that it has a bit of inline assembly containing one of the relevant instructions. For an example, see (eg) AC_MSG_CHECKING([for VSX support in the assembler ]) in that file.
Created attachment 123930 [details] Patch implements and tests Arm v8.1 LSE atomics (addresses first set of review comments) Thanks for reviewing. New patch attached with all but the Iop_CasCmpNE64 change, see why below. > My only big concern here is the lack of hwcaps support in Vex/Valgrind. > That could be done in a followup bug, but it needs to happen fairly > soon. Sorry, I'm not clear about this. Do you mean we need to add new IR primitives to Valgrind to support new hardware capabilities or to be able to detect the current hardware capabilities in order to determine if the subject binary can be executed? If it's the latter I've posted a patch for detecting AArch64 hardware capabilities to https://bugs.kde.org/show_bug.cgi?id=413547 > Let me ask: are the 8.1 extensions a single all-or-nothing thing > (meaning, they have to be all implemented, or none?) Or are they > split up into smaller subsets of instructions, and each group of > instructions can be implemented or > not implemented? It's the latter. They're split up into smaller subsets. I'm planning to post a patch for the v8.1 arithmetic instructions shortly, see https://bugs.kde.org/show_bug.cgi?id=413634 which will be the last patch for v8.1. > I'd prefer this just to be INSN(15,12) <= BITS4(1,0,0,0) Done. > We also need to check that the assembler can assemble the instructions. Done. > You need to use CasCmpNE64, not CmpNE64 here. I tried that but the first test fails with a vpanic("iselCondCode"): - - - snip CasCmpNE64(t21,t20) vex: the `impossible' happened: iselCondCode vex storage: T total 36020704 bytes allocated vex storage: P total 0 bytes allocated valgrind: the 'impossible' happened: LibVEX called failure_exit(). host stacktrace: at 0x........: show_sched_status_wrk (m_libcassert.c:406) by 0x........: report_and_quit (m_libcassert.c:477) by 0x........: vgPlain_core_panic_at (m_libcassert.c:553) by 0x........: vgPlain_core_panic (m_libcassert.c:563) by 0x........: failure_exit (m_translate.c:749) by 0x........: vpanic (main_util.c:253) by 0x........: iselCondCode_wrk (host_arm64_isel.c:1450) <--- by 0x........: iselStmt (host_arm64_isel.c:1293) by 0x........: iselSB_ARM64 (host_arm64_isel.c:4209) by 0x........: LibVEX_Translate (main_main.c:1099) by 0x........: vgPlain_translate (m_translate.c:1816) by 0x........: vgPlain_scheduler (scheduler.c:1138) by 0x........: run_a_thread_NORETURN (syswrap-linux.c:101) by 0x........: ??? . . . - - - snip It looks like the conditonal instruction selector, iselCondCode_wrk(), doesn't handle Iop_CasCmpNE64 at all! Is that missing functionality in iselCondCode_wrk() or is there a valid reason why it's intentionally not there? As to the reason for the change, does switching Iop_CmpNE64 to Iop_CasCmpNE64 mean that with Iop_CmpNE64 memcheck would not have instrumented these instructions correctly? The comment in libvex_ir.h seems to imply memcheck needs to know that there are two instances of the same representative value: /* Exactly like CmpEQ8/16/32/64, but carrying the additional hint that these compute the success/failure of a CAS operation, and hence are almost certainly applied to two copies of the same value, which in turn has implications for Memcheck's instrumentation. */ Presumably, memcheck then treats them as effectively a single value?
(In reply to ahashmi from comment #3) I'll reply about the hwcaps a bit later. Regarding CasCmpNE64: the underlying problem is like this. If the location is not contended (which is almost always the case), then the comparison will compare the same value against itself. If that value is undefined, and we use normal CmpNE64, then Memcheck will complain that we're taking a branch on undefined data, which in fact the program is correct. So it's a false positive. CasCmpNE64 is therefore defined to behave identically to CmpNE64, but provides a hint to Memcheck not to check the definedness of the operands. In that way, the false positive is avoid in the common case that the value is not contended. This is at the cost of not reporting the case where the value is contended and is also undefined. But it seems a small price to pay. > CasCmpNE64(t21,t20) > vex: the `impossible' happened: > iselCondCode Ah, I forgot about that. Given that, from a program-execution viewpoint, it is identical to CmpNE64, just add a "case CasCmpNE64:" at the obvious place in the insn selector.
Created attachment 123934 [details] Patch implements and tests Arm v8.1 LSE atomics (addresses second set of review comments) Thanks for the explanation. I think I understand that now. > Given that, from a program-execution viewpoint, it is identical to > CmpNE64, just add a "case CasCmpNE64:" at the obvious place in the > insn selector. Done, also for 32, 16 and 8 bits.
> > My only big concern here is the lack of hwcaps support in Vex/Valgrind. > > That could be done in a followup bug, but it needs to happen fairly > > soon. > > Sorry, I'm not clear about this. Do you mean we need to add new IR > primitives to Valgrind to support new hardware capabilities or to be > able to detect the current hardware capabilities in order to determine > if the subject binary can be executed? If it's the latter I've posted > a patch for detecting AArch64 hardware capabilities to > https://bugs.kde.org/show_bug.cgi?id=413547 The 413547 patch does the hw capabilities detection needed to decide which regtests to run, and it looks like you've successfully added the relevant bits for AArch64. But it doesn't do those tests for Valgrind itself, hence V has no way to know whether the host has (eg) 8.1 support and so whether guest_arm64_toIR.c should decode 8.1 instructions. So far you've gotten away with "down translating" 8.1 on the guest side to 8.0 on the host side. But that's not a long-term solution. What needs to happen is: * add VEX_HWCAPS_ARM64_whatever constants (see existing definitions) * add code in VG_(machine_get_hwcaps) to query the host's capabilities. The relevant capabilities description is eventually passed to disInstr_ARM64, which can use it to choose whether to decode or not decode an instruction. For the definitions of VEX_HWCAPS_ARM64_* that you want, I suggest looking at VEX_HWCAPS_AMD64_* as examples. But for actually detecting capabilities in VG_(machine_get_hwcaps), I suggest you use the signal-longjmp scheme that is used by the existing arm32 code. Given that your tests/arm64_features.c lists 40+ capability subsets for 8.1, I suggest you only add VEX_HWCAPS_ARM64_* for the ones you want to implement. Also, it might be worth considering whether it's possible to merge some of those classes -- maybe some of the features are implemented in groups? All of this should be done in a new patch. This patch (v8.1-a LSE instructions) looks ready to go. Finally .. I'm unclear what the top level set of capabilities you're aiming to implement, is. Could you please open a meta-bug which simply links to all the other bugs involved? So as to have a single "starting point".
> The 413547 patch does the hw capabilities detection needed to decide > which regtests to run, and it looks like you've successfully added the > relevant bits for AArch64. But it doesn't do those tests for Valgrind > itself, hence V has no way to know whether the host has (eg) 8.1 > support and so whether guest_arm64_toIR.c should decode 8.1 > instructions. So far you've gotten away with "down translating" 8.1 on > the guest side to 8.0 on the host side. But that's not a long-term > solution. That's fine. I agree that "down translating" to v8.0 is not a long-term solution but for now it's adequate. I'm assuming that in "down translating" or ("lowering" in the LLVM compiler's codegen terminology), we don't lose any functionality as far as tools are concerned, i.e. lowering IR for v8.0 code generation does not mask or disable any tools features AS LONG AS we lower each instruction correctly. > What needs to happen is: > * add VEX_HWCAPS_ARM64_whatever constants (see existing definitions) > * add code in VG_(machine_get_hwcaps) to query the host's capabilities. > The relevant capabilities description is eventually passed to > disInstr_ARM64, which can use it to choose whether to decode or not > decode an instruction. That's fine. I can't see any VEX_HWCAPS_ARM64_whatever just VEX_HWCAPS_ARM_whatever so these will be all new for AArch64. > For the definitions of VEX_HWCAPS_ARM64_* that you want, I suggest > looking at VEX_HWCAPS_AMD64_* as examples. But for actually detecting > capabilities in VG_(machine_get_hwcaps), I suggest you use the > signal-longjmp scheme that is used by the existing arm32 code. Using SIGILLs to detect features is fine but what about (possibly stupid question) using getauxval() for AArch64 in VG_(machine_get_hwcaps)? It's called once from valgrind_main() on startup. I notice for x86 (VGA_x86) cpuid is used. > Given that your tests/arm64_features.c lists 40+ capability subsets > for 8.1, I suggest you only add VEX_HWCAPS_ARM64_* for the ones you > want to implement. Also, it might be worth considering whether it's > possible to merge some of those classes -- maybe some of the features > are implemented in groups? Good idea. The features could be grouped by ISA version, e.g. v8.1 has atomics and asimdrdm (SQRDMLAH/SQRDMLSH) v8.2 has dcpop, SHA and dot product instructions > All of this should be done in a new patch. I've created https://bugs.kde.org/show_bug.cgi?id=414268 > This patch (v8.1-a LSE instructions) looks ready to go. Thanks! What's the next step to get it merged into the repo? Can we merge https://bugs.kde.org/show_bug.cgi?id=413547 as well? > Finally .. I'm unclear what the top level set of capabilities you're > aiming to implement, is. Could you please open a meta-bug which simply > links to all the other bugs involved? So as to have a single > "starting point". I've created https://bugs.kde.org/show_bug.cgi?id=414270 for v8.1 I'll create another meta-bug for v8.2 work when I start on that.
Feature identification via SIGILL is definitely not recommended for either 32-bit or 64-bit Arm. There are cases where it will give you the wrong answer (eg if the hardware supports the instruction but support is disabled by the kernel because of some erratum or because the kernel has had support compiled out). You have two recommended methods of detection: (1) hwcaps; or (2) for 64-bit only, you can directly access the ID registers via MRS and rely on the kernel to emulate this as documented in https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt. (You might want some kind of fallback for older kernels which don't support that; I forget when it came in.) PS: does valgrind implement 'emulate the kernel emulation of those ID registers' for an arm64 guest binary?
Thanks for the SIGILL warning. I forgot the erratum/kernel dependencies! I don't know how often that's an issue in practical terms but it's best to be safe. Alex Benee also pointed me to the MRS CPU features register method when reviewing https://bugs.kde.org/show_bug.cgi?id=413547 for the reason that you don't have to rely on the latest libc to ship if there's a lag in s/w support for new h/w. However, in recent practice it appears that by the time new Arm h/w becomes available, s/w support is up-to-date. Which brings me back to the question I asked in a previous posting: why not just use getauxval(), especially if you know the kernel, compiler and libraries support the features you're checking for and implementing.
I think we should move further discussion about h/w capabilities detection to https://bugs.kde.org/show_bug.cgi?id=414268
Landed, f1cf734554320e92f0dfc80125dfc2a7e5356ff2. Thanks for the patch!
*** Bug 409391 has been marked as a duplicate of this bug. ***