Bug 369509 - ARMv8.1-a LSE instructions are not supported
Summary: ARMv8.1-a LSE instructions are not supported
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.14.0
Platform: Compiled Sources Linux
: NOR grave
Target Milestone: ---
Assignee: ahashmi
URL:
Keywords:
: 409391 (view as bug list)
Depends on:
Blocks: 414270
  Show dependency treegraph
 
Reported: 2016-09-29 04:54 UTC by Andrew Pinski
Modified: 2021-07-27 18:00 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Patch implements and tests Arm v8.1 LSE atomics (92.55 KB, text/plain)
2019-10-28 11:28 UTC, ahashmi
Details
Patch implements and tests Arm v8.1 LSE atomics (addresses first set of review comments) (92.66 KB, text/plain)
2019-11-14 15:48 UTC, ahashmi
Details
Patch implements and tests Arm v8.1 LSE atomics (addresses second set of review comments) (96.21 KB, patch)
2019-11-14 18:18 UTC, ahashmi
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2016-09-29 04:54:13 UTC
Looking into the code I noticed that the ARMv8.1 LSE instructions are not supported so any code compiled with -mcpu=thunderx+lse crashes (well we ran into 369459 first  but that is a different story).

Reproducible: Always

Steps to Reproduce:
Compile glibc with -mcpu=thunderx+lse and run any resulting binary with this ld.so.  We get a crash with unknown instructions
Comment 1 ahashmi 2019-10-28 11:28:47 UTC
Created attachment 123537 [details]
Patch implements and tests Arm v8.1 LSE atomics

This patch implements ARMv8.1 memory atomics in VEX/priv/guest_arm64_toIR.c and tests in none/tests/arm64/atomics_v81*
LDADD{,A}{,L}<sz>  <Rs>, <Rt>,
LDCLR{,A}{,L}<sz>  <Rs>, <Rt>,
LDEOR{,A}{,L}<sz>  <Rs>, <Rt>,
LDSET{,A}{,L}<sz>  <Rs>, <Rt>,
LDSMAX{,A}{,L}<sz> <Rs>, <Rt>,
LDSMIN{,A}{,L}<sz> <Rs>, <Rt>,
LDUMAX{,A}{,L}<sz> <Rs>, <Rt>,
LDUMIN{,A}{,L}<sz> <Rs>, <Rt>,
SWP{,A}{,L}<sz>    <Rs>, <Rt>

Version:           valgrind-3.16.0.GIT (based on Git development head SHA 78054a6c0)
OS:                Linux

The regression tests in none/tests/arm64 do not check for existence of hardware opcode features yet as the atomics implementations are emulated in order to accommodate execution on v8.0 CPUs.
Comment 2 Julian Seward 2019-11-14 08:29:37 UTC
Mostly this looks fine.  Thanks for the attention to detail and for following
the house style.

My only big concern here is the lack of hwcaps support in Vex/Valgrind.  That
could be done in a followup bug, but it needs to happen fairly soon.  I can
provide guidance.  Let me ask: are the 8.1 extensions a single all-or-nothing
thing (meaning, they have to be all implemented, or none?)  Or are they split
up into smaller subsets of instructions, and each group of instructions can be
implemented or not implemented?

There are also some smaller points directly with the patch:

> +       && (INSN(15,15) == 0 || INSN(15,12) == BITS4(1,0,0,0))

I'd prefer this just to be  INSN(15,12) <= BITS4(1,0,0,0)

+      stmt( IRStmt_Exit(
+                binop(Iop_CmpNE64,
+                      widenUto64(ty, mkexpr(old)),
+                      widenUto64(ty, mkexpr(orig))),
+                Ijk_Boring, nia, OFFB_PC ));

You need to use CasCmpNE64, not CmpNE64 here.  For the reason see libvex_ir.h
defn of CasCmpNEXX.

+# Does the C compiler support the armv81 features flag
+# Note, this doesn't generate a C-level symbol.  It generates a
+# automake-level symbol (BUILD_ARMV81_TESTS), used in test Makefile.am's

This is important, but it's not enough.  We also need to check that the
assembler can assemble the instructions.  The way to do that is to change the
test program (from "int main() { return 0; }") so that it has a bit of inline
assembly containing one of the relevant instructions.  For an example, see
(eg)

AC_MSG_CHECKING([for VSX support in the assembler ])

in that file.
Comment 3 ahashmi 2019-11-14 15:48:30 UTC
Created attachment 123930 [details]
Patch implements and tests Arm v8.1 LSE atomics (addresses first set of review comments)

Thanks for reviewing. New patch attached with all but the Iop_CasCmpNE64
change, see why below.

> My only big concern here is the lack of hwcaps support in Vex/Valgrind.
> That could be done in a followup bug, but it needs to happen fairly
> soon.
Sorry, I'm not clear about this. Do you mean we need to add new IR
primitives to Valgrind to support new hardware capabilities or to be
able to detect the current hardware capabilities in order to determine
if the subject binary can be executed? If it's the latter I've posted
a patch for detecting AArch64 hardware capabilities to
https://bugs.kde.org/show_bug.cgi?id=413547

> Let me ask: are the 8.1 extensions a single all-or-nothing thing
> (meaning, they have to be all implemented, or none?)  Or are they
> split up into smaller subsets of instructions, and each group of
> instructions can be implemented or > not implemented?
It's the latter. They're split up into smaller subsets. I'm planning
to post a patch for the v8.1 arithmetic instructions shortly, see
https://bugs.kde.org/show_bug.cgi?id=413634 which will be the last
patch for v8.1.

> I'd prefer this just to be  INSN(15,12) <= BITS4(1,0,0,0)
Done.

> We also need to check that the assembler can assemble the instructions.
Done.

> You need to use CasCmpNE64, not CmpNE64 here.
I tried that but the first test fails with a vpanic("iselCondCode"):
- - - snip
CasCmpNE64(t21,t20)
vex: the `impossible' happened:
   iselCondCode
vex storage: T total 36020704 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
   at 0x........: show_sched_status_wrk (m_libcassert.c:406)
   by 0x........: report_and_quit (m_libcassert.c:477)
   by 0x........: vgPlain_core_panic_at (m_libcassert.c:553)
   by 0x........: vgPlain_core_panic (m_libcassert.c:563)
   by 0x........: failure_exit (m_translate.c:749)
   by 0x........: vpanic (main_util.c:253)
   by 0x........: iselCondCode_wrk (host_arm64_isel.c:1450)    <---
   by 0x........: iselStmt (host_arm64_isel.c:1293)
   by 0x........: iselSB_ARM64 (host_arm64_isel.c:4209)
   by 0x........: LibVEX_Translate (main_main.c:1099)
   by 0x........: vgPlain_translate (m_translate.c:1816)
   by 0x........: vgPlain_scheduler (scheduler.c:1138)
   by 0x........: run_a_thread_NORETURN (syswrap-linux.c:101)
   by 0x........: ???
. . .
- - - snip

It looks like the conditonal instruction selector, iselCondCode_wrk(),
doesn't handle Iop_CasCmpNE64 at all! Is that missing functionality in
iselCondCode_wrk() or is there a valid reason why it's intentionally not
there?

As to the reason for the change, does switching Iop_CmpNE64 to
Iop_CasCmpNE64 mean that with Iop_CmpNE64 memcheck would not have
instrumented these instructions correctly? The comment in libvex_ir.h
seems to imply memcheck needs to know that there are two instances of
the same representative value:

      /* Exactly like CmpEQ8/16/32/64, but carrying the additional
         hint that these compute the success/failure of a CAS
         operation, and hence are almost certainly applied to two
         copies of the same value, which in turn has implications for
         Memcheck's instrumentation. */

Presumably, memcheck then treats them as effectively a single value?
Comment 4 Julian Seward 2019-11-14 16:19:27 UTC
(In reply to ahashmi from comment #3)

I'll reply about the hwcaps a bit later.

Regarding CasCmpNE64: the underlying problem is like this.  If the
location is not contended (which is almost always the case), then
the comparison will compare the same value against itself.  If that
value is undefined, and we use normal CmpNE64, then Memcheck will
complain that we're taking a branch on undefined data, which in fact
the program is correct.  So it's a false positive.

CasCmpNE64 is therefore defined to behave identically to CmpNE64, but
provides a hint to Memcheck not to check the definedness of the operands.
In that way, the false positive is avoid in the common case that the
value is not contended.  This is at the cost of not reporting the case
where the value is contended and is also undefined.  But it seems a
small price to pay.


> CasCmpNE64(t21,t20)
> vex: the `impossible' happened:
>    iselCondCode

Ah, I forgot about that.  Given that, from a program-execution 
viewpoint, it is identical to CmpNE64, just add a "case CasCmpNE64:"
at the obvious place in the insn selector.
Comment 5 ahashmi 2019-11-14 18:18:58 UTC
Created attachment 123934 [details]
Patch implements and tests Arm v8.1 LSE atomics (addresses second set of review comments)

Thanks for the explanation. I think I understand that now.

> Given that, from a program-execution viewpoint, it is identical to
> CmpNE64, just add a "case CasCmpNE64:" at the obvious place in the
> insn selector.
Done, also for 32, 16 and 8 bits.
Comment 6 Julian Seward 2019-11-18 07:39:20 UTC
> > My only big concern here is the lack of hwcaps support in Vex/Valgrind.
> > That could be done in a followup bug, but it needs to happen fairly
> > soon.
>
> Sorry, I'm not clear about this. Do you mean we need to add new IR
> primitives to Valgrind to support new hardware capabilities or to be
> able to detect the current hardware capabilities in order to determine
> if the subject binary can be executed? If it's the latter I've posted
> a patch for detecting AArch64 hardware capabilities to
> https://bugs.kde.org/show_bug.cgi?id=413547

The 413547 patch does the hw capabilities detection needed to decide which
regtests to run, and it looks like you've successfully added the relevant bits
for AArch64.  But it doesn't do those tests for Valgrind itself, hence V has
no way to know whether the host has (eg) 8.1 support and so whether
guest_arm64_toIR.c should decode 8.1 instructions.  So far you've gotten away
with "down translating" 8.1 on the guest side to 8.0 on the host side.  But
that's not a long-term solution.

What needs to happen is:

* add VEX_HWCAPS_ARM64_whatever constants (see existing definitions)

* add code in VG_(machine_get_hwcaps) to query the host's capabilities.

The relevant capabilities description is eventually passed to disInstr_ARM64,
which can use it to choose whether to decode or not decode an instruction.

For the definitions of VEX_HWCAPS_ARM64_* that you want, I suggest looking at
VEX_HWCAPS_AMD64_* as examples.  But for actually detecting capabilities in
VG_(machine_get_hwcaps), I suggest you use the signal-longjmp scheme that is
used by the existing arm32 code.

Given that your tests/arm64_features.c lists 40+ capability subsets for 8.1, I
suggest you only add VEX_HWCAPS_ARM64_* for the ones you want to implement.
Also, it might be worth considering whether it's possible to merge some of
those classes -- maybe some of the features are implemented in groups?

All of this should be done in a new patch.  This patch (v8.1-a LSE
instructions) looks ready to go.

Finally .. I'm unclear what the top level set of capabilities you're aiming to
implement, is.  Could you please open a meta-bug which simply links to all the
other bugs involved?  So as to have a single "starting point".
Comment 7 ahashmi 2019-11-18 17:30:18 UTC
> The 413547 patch does the hw capabilities detection needed to decide
> which regtests to run, and it looks like you've successfully added the
> relevant bits for AArch64. But it doesn't do those tests for Valgrind
> itself, hence V has no way to know whether the host has (eg) 8.1
> support and so whether guest_arm64_toIR.c should decode 8.1
> instructions. So far you've gotten away with "down translating" 8.1 on
> the guest side to 8.0 on the host side. But that's not a long-term
> solution.
That's fine. I agree that "down translating" to v8.0 is not a long-term
solution but for now it's adequate.

I'm assuming that in "down translating" or ("lowering" in the LLVM
compiler's codegen terminology), we don't lose any functionality as far
as tools are concerned, i.e. lowering IR for v8.0 code generation does
not mask or disable any tools features AS LONG AS we lower each
instruction correctly.

> What needs to happen is:
> * add VEX_HWCAPS_ARM64_whatever constants (see existing definitions)
> * add code in VG_(machine_get_hwcaps) to query the host's capabilities.
> The relevant capabilities description is eventually passed to
> disInstr_ARM64, which can use it to choose whether to decode or not
> decode an instruction.
That's fine. I can't see any VEX_HWCAPS_ARM64_whatever just
VEX_HWCAPS_ARM_whatever so these will be all new for AArch64.

> For the definitions of VEX_HWCAPS_ARM64_* that you want, I suggest
> looking at VEX_HWCAPS_AMD64_* as examples. But for actually detecting
> capabilities in VG_(machine_get_hwcaps), I suggest you use the
> signal-longjmp scheme that is used by the existing arm32 code.
Using SIGILLs to detect features is fine but what about (possibly stupid
question) using getauxval() for AArch64 in VG_(machine_get_hwcaps)? It's
called once from valgrind_main() on startup. I notice for x86 (VGA_x86)
cpuid is used.

> Given that your tests/arm64_features.c lists 40+ capability subsets
> for 8.1, I suggest you only add VEX_HWCAPS_ARM64_* for the ones you
> want to implement. Also, it might be worth considering whether it's
> possible to merge some of those classes -- maybe some of the features
> are implemented in groups?
Good idea. The features could be grouped by ISA version, e.g.
v8.1 has atomics and asimdrdm (SQRDMLAH/SQRDMLSH)
v8.2 has dcpop, SHA and dot product instructions

> All of this should be done in a new patch.
I've created https://bugs.kde.org/show_bug.cgi?id=414268

> This patch (v8.1-a LSE instructions) looks ready to go.
Thanks! What's the next step to get it merged into the repo?
Can we merge https://bugs.kde.org/show_bug.cgi?id=413547 as well?

> Finally .. I'm unclear what the top level set of capabilities you're
> aiming to implement, is. Could you please open a meta-bug which simply
> links to all the other bugs involved? So as to have a single
> "starting point".
I've created https://bugs.kde.org/show_bug.cgi?id=414270 for v8.1
I'll create another meta-bug for v8.2 work when I start on that.
Comment 8 Peter Maydell 2019-11-18 17:36:20 UTC
Feature identification via SIGILL is definitely not recommended for either 32-bit or 64-bit Arm. There are cases where it will give you the wrong answer (eg if the hardware supports the instruction but support is disabled by the kernel because of some erratum or because the kernel has had support compiled out). You have two recommended methods of detection: (1) hwcaps; or (2) for 64-bit only, you can directly access the ID registers via MRS and rely on the kernel to emulate this as documented in https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt. (You might want some kind of fallback for older kernels which don't support that; I forget when it came in.)

PS: does valgrind implement 'emulate the kernel emulation of those ID registers' for an arm64 guest binary?
Comment 9 ahashmi 2019-11-19 11:38:26 UTC
Thanks for the SIGILL warning. I forgot the erratum/kernel dependencies!
I don't know how often that's an issue in practical terms but it's best
to be safe.

Alex Benee also pointed me to the MRS CPU features register method when
reviewing https://bugs.kde.org/show_bug.cgi?id=413547 for the reason
that you don't have to rely on the latest libc to ship if there's a lag
in s/w support for new h/w. However, in recent practice it appears that
by the time new Arm h/w becomes available, s/w support is up-to-date.

Which brings me back to the question I asked in a previous posting: why
not just use getauxval(), especially if you know the kernel, compiler
and libraries support the features you're checking for and implementing.
Comment 10 ahashmi 2019-11-19 11:41:02 UTC
I think we should move further discussion about h/w capabilities
detection to https://bugs.kde.org/show_bug.cgi?id=414268
Comment 11 Julian Seward 2019-11-20 11:50:08 UTC
Landed, f1cf734554320e92f0dfc80125dfc2a7e5356ff2.  Thanks for the patch!
Comment 12 Petr Pavlu 2021-07-27 18:00:20 UTC
*** Bug 409391 has been marked as a duplicate of this bug. ***