Bug 414268 - Enable AArch64 feature detection and decoding for v8.x instructions (where x>0)
Summary: Enable AArch64 feature detection and decoding for v8.x instructions (where x>0)
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.14.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: ahashmi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-18 16:20 UTC by ahashmi
Modified: 2022-02-02 09:53 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Patch adds arm64 feature detection for v8.x instructions (16.76 KB, text/plain)
2020-11-13 18:15 UTC, ahashmi
Details
Patch adds arm64 feature detection for v8.x instructions (16.01 KB, text/plain)
2020-11-26 12:41 UTC, ahashmi
Details
Adds arm64 feature detection for v8.x instructions (13.95 KB, text/plain)
2020-11-30 14:39 UTC, ahashmi
Details
Adds arm64 feature detection for v8.x instructions (13.98 KB, text/plain)
2020-12-08 18:32 UTC, ahashmi
Details

Note You need to log in before you can comment on or make changes to this bug.
Description ahashmi 2019-11-18 16:20:55 UTC
SUMMARY
There is no AArch64 HWCAPs detection in VG_(machine_get_hwcaps). This is required by disInstr_ARM64() to decode for v8.1, v8.2 etc hosts.

STEPS TO IMPLEMENT (See discussion in 369509)
- Add VEX_HWCAPS_ARM64_<feature(s)) constants.
- Add code in VG_(machine_get_hwcaps) to query the host's capabilities using signal-longjmp/SIGILL scheme as implemented by the existing AArch32 code.
- Host capabilities will be passed to disInstr_ARM64() which should use these to decide whether to decode v8.x instructions.
- Try and group VEX_HWCAPS_ARM64_ features by ISA version.
Comment 1 ahashmi 2019-11-19 11:43:45 UTC
See also comments in https://bugs.kde.org/show_bug.cgi?id=369509 about
h/w capabilities detection when deciding how to implement.
Comment 2 ahashmi 2020-11-13 18:15:05 UTC
Created attachment 133310 [details]
Patch adds arm64 feature detection for v8.x instructions

This patch adds hwcaps support for arm64. It uses the arm recommended method of hwcaps feature detection where a reliable auxilliary vector's AT_HWCAP may not be available. Instruction set attribute and feature registers are read to establish the v8 architecture level as well as optional and mandatory ISA features.

This patch can be tested with the patch which fixes https://bugs.kde.org/show_bug.cgi?id=413547

This patch must be committed before https://bugs.kde.org/show_bug.cgi?id=413547
Comment 3 Mark Wielaard 2020-11-23 14:57:26 UTC
There are two machines this seems to work correctly on:
Arch and hwcaps: ARM64, LittleEndian, v8.1-rdm-atomics
Arch and hwcaps: ARM64, LittleEndian, v8.2-dpcvap-rdm-atomics-fp16-vfp16

But one where I get Arch and hwcaps: ARM64, LittleEndian, v8.1 and then the impossible happens:

VEX: SQRDML{AH|SH} instructions are mandatory from v8.1 onwards.
     Found: v8.1
Cannot continue. Good-bye

This machine has 6 cores, but 2 are slightly different from the other:

$ cat /proc/cpuinfo 
processor	: 0
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 1
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 2
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd09
CPU revision	: 2

processor	: 3
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd09
CPU revision	: 2

processor	: 4
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd09
CPU revision	: 2

processor	: 5
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd09
CPU revision	: 2
Comment 4 Peter Maydell 2020-11-23 15:17:46 UTC
In the attachment of comment #2, this looks odd:

+     #define ID_AA64ISAR0_RDM_MASK             0x1

The ID register fields are all 4 bits wide, so looking at just the least significant bit will produce wrong answers if in future field value 2 is defined.

The right way to test fields in these ID registers is to look at the entire 4-bit number and test that it is greater than or equal to the value representing the feature you're looking for. (See D13.1.3 "Principles of the ID scheme for fields in ID registers" in the v8A Arm ARM.)

For A-profile my recommendation is as far as you possibly can to avoid having any "is this v8.1?" or "is this v8.2?" tests and instead to stick strictly to "is feature X present?". The architecture permits a v8.x implementation to include any arbitrary set of v8.(x+1) features as well, and pretty much everything has an ID register feature field you can check for, so it works better to just look at those. (FWIW QEMU's emulation does not need v8.1, v8.2, etc feature flags so I would be surprised if valgrind needed to do anything based on the architecture point version.)
Comment 5 ahashmi 2020-11-26 12:41:24 UTC
Created attachment 133666 [details]
Patch adds arm64 feature detection for v8.x instructions

Patch update after first set of review comments.
Comment 6 ahashmi 2020-11-26 12:44:32 UTC
Thanks for reviewing/testing Mark Wielaard and Peter Maydell.

> There are two machines this seems to work correctly on:
> Arch and hwcaps: ARM64, LittleEndian, v8.1-rdm-atomics
> Arch and hwcaps: ARM64, LittleEndian, v8.2-dpcvap-rdm-atomics-fp16-vfp16
> 
> But one where I get Arch and hwcaps: ARM64, LittleEndian, v8.1 and then the impossible happens:
> 
> VEX: SQRDML{AH|SH} instructions are mandatory from v8.1 onwards.
>      Found: v8.1
> Cannot continue. Good-bye
Fixed.
Just goes to show I need to test on a wider range of hardware before submitting!

> The right way to test fields in these ID registers is to look at the entire
> 4-bit number and test that it is greater than or equal to the value
> representing the feature you're looking for. (See D13.1.3 "Principles of the ID
> scheme for fields in ID registers" in the v8A Arm ARM.)
Fixed. Good spot, it was staring me in the face!

> For A-profile my recommendation is as far as you possibly can to avoid having
> any "is this v8.1?" or "is this v8.2?" tests and instead to stick strictly to
> "is feature X present?". The architecture permits a v8.x implementation to
I thought about this disconnect between version and supported feature set when implementing the initial patch and considered omitting versions altogether. I decided against it because I don't yet have enough experience with Valgrind, particularly the decode-to-IR logic and the tools pipeline. On the basis that the version is set when its mandatory option(s) are established and it may simplify things down the line, I prefer to leave this for now. If it transpires versions are redundant, I'll remove them at the earliest sign.
Comment 7 Peter Maydell 2020-11-26 14:06:18 UTC
Your logic attempting to identify v8.1, v8.2, etc isn't right. For instance:

+     /* Must be at least v8.2 if DC CVAP instruction available. */
+     if (have_dpbcvap)
+        SET_VEX_ARM64_ARCHLEVEL(vai.hwcaps, VEX_HWCAPS_ARM64_V82);

It's true that if you are v8.2 or better then DC CVAP is present, but this does not mean the converse (that if DC CVAP is present then you are v8.2). It is valid for a v8.1 implementation to provide this feature.

I still think it would be better not to try to identify and base anything on v8.1 vs v8.2 vs v8.3...chances are really high that if some code is using "is this v8.1 ?" as a condition on something then that code is wrong, because it should be checking for a more specific feature field (or it will behave incorrectly on hardware which provides that feature without necessarily being the v8.x version where that feature became mandatory, or whatever).
Comment 8 ahashmi 2020-11-30 14:39:37 UTC
Created attachment 133750 [details]
Adds arm64 feature detection for v8.x instructions

Removed v8.x version detection/setting. Now using feature list only.
Comment 9 ahashmi 2020-11-30 14:40:47 UTC
> It's true that if you are v8.2 or better then DC CVAP is present, but this does not mean the converse (that if DC CVAP is present then you are v8.2). It is valid for a v8.1 implementation to provide this feature.
That makes a difference to my decision. I made an incorrect assumption when reading the spec:
- - - snip
0b0000 DC CVAP not supported.
0b0001 DC CVAP supported.
0b0010 DC CVAP and DC CVADP supported.
...
In Armv8.2, the permitted values are 0b0001 and 0b0010.
From Armv8.5, the only permitted value is 0b0010
- - - snip

"In Armv8.2" != "From Armv8.2"

But some features are more tightly bound to a version, e.g. RDM
- - - snip
0b0000 No RDMA instructions implemented.
0b0001 SQRDMLAH and SQRDMLSH instructions implemented.

All other values are reserved.
From Armv8.1, the only permitted value is 0b0001.
- - - snip

So, Arm publishes a version specification with a set of mandatory and optional features. An implementer builds h/w which states conformance to a version. The implementer's version conformance statement is fairly meaningless without a list of features which are supported.

This answers the question: "why doesn't /proc/cpuinfo give an Arm version?"
Comment 10 Mark Wielaard 2020-12-03 15:19:41 UTC
OK with the latest patch the machine that crashed now simply says:
Arch and hwcaps: ARM64, LittleEndian, v8

It does allow reading ID_AA64ISAR0_EL1 register but doesn't seem to have any features we care for (from /proc/cpuinfo):
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

I know too little about arm64 to know whether or not we should care about these. I see we already filter on these in the auxv AT_HWCAP. Once we support, and not just test, some of these new features we should forget to add them to the AT_HWCAP filter.

Since programs can also check features through the attribute/feature registers, do we also need to filter those when inspected by the guest code? Or do we already do that?
Comment 11 ahashmi 2020-12-03 17:18:21 UTC
> I know too little about arm64 to know whether or not we should care about these.
The current approach assumes that all base v8 h/w has the features you list.
I've not seen any base v8 h/w which is different from that list.
AFAICT:
- We don't need to make any changes to Valgrind's existing base v8 support in terms of establishing and checking features.
- Time has shown that no bugs have been raised to do with lack of base v8 feature support.

> Since programs can also check features through the attribute/feature registers, do we also need to filter those when inspected by the guest code?
Hmm...I don't see why we should. We aim to faithfully support features detected, so if guest code reads those it should expect Valgrind to support it. If Valgrind doesn't, it means we haven't yet implemented support for that feature. (If I understand your question correctly.)
Comment 12 Mark Wielaard 2020-12-03 17:38:49 UTC
(In reply to ahashmi from comment #11)
> > Since programs can also check features through the attribute/feature registers, do we also need to filter those when inspected by the guest code?
> Hmm...I don't see why we should. We aim to faithfully support features
> detected, so if guest code reads those it should expect Valgrind to support
> it. If Valgrind doesn't, it means we haven't yet implemented support for
> that feature. (If I understand your question correctly.)

So the question is not about this particular patch. But how we signal to the guest process which features valgrind/VEX supports itself. If the guest code checks the auxv HW_CAPS we are good because we filter out any features not implemented. But if the guest code checks attribute/feature registers it might get the attributes/features supported by the host. But if valgrind/VEX doesn't support those and the guest tries to use them it will get a SIGILL instead of trying to use fallback code.

So it seems to me we also need a way to filter the attribute/feature registers to only advertise those armv8 instructions we actually implement. That is what other arches do too (see for example the dirtyhelper_CPUID in guest_x86_helpers.c or the s390x_dirtyhelper_vec_op).

IMHO we should only pass through any bits set in these registers if we actually have implemented them (and preemptively zero out any future reserved ones).
Comment 13 ahashmi 2020-12-07 18:05:47 UTC
> So it seems to me we also need a way to filter the attribute/feature registers to only advertise those armv8 instructions we actually implement. That is what other arches do too (see for example the dirtyhelper_CPUID in guest_x86_helpers.c or the s390x_dirtyhelper_vec_op).
ISWYM now, thanks for shining a light on an aspect of Valgrind's transparency I hadn't thought about.

I would prefer to implement such a guest feature support view/filter as a separate patch.

I've created https://bugs.kde.org/show_bug.cgi?id=430117 to track this change.
Comment 14 Julian Seward 2020-12-08 17:33:27 UTC
Thank you all for the high quality discussion and analysis.  This looks
fine to me; please land.

One tiny nit .. please #undef get_cpu_ftr and get_ftr just before the
closing brace of the fn that defines them.

I agree with Peter's and Mark's view that this should be feature-based;
but I see the difficulties in making that work cleanly because (unlike
on x86) there's no "single source of truth" (viz, CPUID) that we can fake
up for the guest, at our own convenience.
Comment 15 Peter Maydell 2020-12-08 18:19:52 UTC
The closest thing to single-source-of-truth is the architectural ID registers, which these days a Linux kernel will make available to userspace via trap-and-emulate and which presumably Valgrind does or should provide to its guest: https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.rst
There's a hwcap that tells you if those are emulated for you.
Comment 16 ahashmi 2020-12-08 18:32:47 UTC
Created attachment 133936 [details]
Adds arm64 feature detection for v8.x instructions

#undef get_cpu_ftr and get_ftr after use.
Comment 17 ahashmi 2020-12-08 18:33:37 UTC
> One tiny nit .. please #undef get_cpu_ftr and get_ftr just before the
> closing brace of the fn that defines them.
Good spot! Fixed in latest patch.
Comment 18 Julian Seward 2020-12-09 12:00:34 UTC
Landed, cb52fee5ddbc2c0e936fd1efe5107a1afcf375cf.