Summary: | Enable AArch64 feature detection and decoding for v8.x instructions (where x>0) | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | ahashmi <assad.hashmi> |
Component: | vex | Assignee: | ahashmi <assad.hashmi> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | assad.hashmi, fweimer, jseward, mark, peter.maydell |
Priority: | NOR | ||
Version: | 3.14.0 | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
See Also: | https://bugs.kde.org/show_bug.cgi?id=413547 | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Patch adds arm64 feature detection for v8.x instructions
Patch adds arm64 feature detection for v8.x instructions Adds arm64 feature detection for v8.x instructions Adds arm64 feature detection for v8.x instructions |
Description
ahashmi
2019-11-18 16:20:55 UTC
See also comments in https://bugs.kde.org/show_bug.cgi?id=369509 about h/w capabilities detection when deciding how to implement. Created attachment 133310 [details] Patch adds arm64 feature detection for v8.x instructions This patch adds hwcaps support for arm64. It uses the arm recommended method of hwcaps feature detection where a reliable auxilliary vector's AT_HWCAP may not be available. Instruction set attribute and feature registers are read to establish the v8 architecture level as well as optional and mandatory ISA features. This patch can be tested with the patch which fixes https://bugs.kde.org/show_bug.cgi?id=413547 This patch must be committed before https://bugs.kde.org/show_bug.cgi?id=413547 There are two machines this seems to work correctly on: Arch and hwcaps: ARM64, LittleEndian, v8.1-rdm-atomics Arch and hwcaps: ARM64, LittleEndian, v8.2-dpcvap-rdm-atomics-fp16-vfp16 But one where I get Arch and hwcaps: ARM64, LittleEndian, v8.1 and then the impossible happens: VEX: SQRDML{AH|SH} instructions are mandatory from v8.1 onwards. Found: v8.1 Cannot continue. Good-bye This machine has 6 cores, but 2 are slightly different from the other: $ cat /proc/cpuinfo processor : 0 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 4 processor : 1 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd03 CPU revision : 4 processor : 2 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd09 CPU revision : 2 processor : 3 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd09 CPU revision : 2 processor : 4 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd09 CPU revision : 2 processor : 5 BogoMIPS : 48.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd09 CPU revision : 2 In the attachment of comment #2, this looks odd: + #define ID_AA64ISAR0_RDM_MASK 0x1 The ID register fields are all 4 bits wide, so looking at just the least significant bit will produce wrong answers if in future field value 2 is defined. The right way to test fields in these ID registers is to look at the entire 4-bit number and test that it is greater than or equal to the value representing the feature you're looking for. (See D13.1.3 "Principles of the ID scheme for fields in ID registers" in the v8A Arm ARM.) For A-profile my recommendation is as far as you possibly can to avoid having any "is this v8.1?" or "is this v8.2?" tests and instead to stick strictly to "is feature X present?". The architecture permits a v8.x implementation to include any arbitrary set of v8.(x+1) features as well, and pretty much everything has an ID register feature field you can check for, so it works better to just look at those. (FWIW QEMU's emulation does not need v8.1, v8.2, etc feature flags so I would be surprised if valgrind needed to do anything based on the architecture point version.) Created attachment 133666 [details]
Patch adds arm64 feature detection for v8.x instructions
Patch update after first set of review comments.
Thanks for reviewing/testing Mark Wielaard and Peter Maydell. > There are two machines this seems to work correctly on: > Arch and hwcaps: ARM64, LittleEndian, v8.1-rdm-atomics > Arch and hwcaps: ARM64, LittleEndian, v8.2-dpcvap-rdm-atomics-fp16-vfp16 > > But one where I get Arch and hwcaps: ARM64, LittleEndian, v8.1 and then the impossible happens: > > VEX: SQRDML{AH|SH} instructions are mandatory from v8.1 onwards. > Found: v8.1 > Cannot continue. Good-bye Fixed. Just goes to show I need to test on a wider range of hardware before submitting! > The right way to test fields in these ID registers is to look at the entire > 4-bit number and test that it is greater than or equal to the value > representing the feature you're looking for. (See D13.1.3 "Principles of the ID > scheme for fields in ID registers" in the v8A Arm ARM.) Fixed. Good spot, it was staring me in the face! > For A-profile my recommendation is as far as you possibly can to avoid having > any "is this v8.1?" or "is this v8.2?" tests and instead to stick strictly to > "is feature X present?". The architecture permits a v8.x implementation to I thought about this disconnect between version and supported feature set when implementing the initial patch and considered omitting versions altogether. I decided against it because I don't yet have enough experience with Valgrind, particularly the decode-to-IR logic and the tools pipeline. On the basis that the version is set when its mandatory option(s) are established and it may simplify things down the line, I prefer to leave this for now. If it transpires versions are redundant, I'll remove them at the earliest sign. Your logic attempting to identify v8.1, v8.2, etc isn't right. For instance: + /* Must be at least v8.2 if DC CVAP instruction available. */ + if (have_dpbcvap) + SET_VEX_ARM64_ARCHLEVEL(vai.hwcaps, VEX_HWCAPS_ARM64_V82); It's true that if you are v8.2 or better then DC CVAP is present, but this does not mean the converse (that if DC CVAP is present then you are v8.2). It is valid for a v8.1 implementation to provide this feature. I still think it would be better not to try to identify and base anything on v8.1 vs v8.2 vs v8.3...chances are really high that if some code is using "is this v8.1 ?" as a condition on something then that code is wrong, because it should be checking for a more specific feature field (or it will behave incorrectly on hardware which provides that feature without necessarily being the v8.x version where that feature became mandatory, or whatever). Created attachment 133750 [details]
Adds arm64 feature detection for v8.x instructions
Removed v8.x version detection/setting. Now using feature list only.
> It's true that if you are v8.2 or better then DC CVAP is present, but this does not mean the converse (that if DC CVAP is present then you are v8.2). It is valid for a v8.1 implementation to provide this feature.
That makes a difference to my decision. I made an incorrect assumption when reading the spec:
- - - snip
0b0000 DC CVAP not supported.
0b0001 DC CVAP supported.
0b0010 DC CVAP and DC CVADP supported.
...
In Armv8.2, the permitted values are 0b0001 and 0b0010.
From Armv8.5, the only permitted value is 0b0010
- - - snip
"In Armv8.2" != "From Armv8.2"
But some features are more tightly bound to a version, e.g. RDM
- - - snip
0b0000 No RDMA instructions implemented.
0b0001 SQRDMLAH and SQRDMLSH instructions implemented.
All other values are reserved.
From Armv8.1, the only permitted value is 0b0001.
- - - snip
So, Arm publishes a version specification with a set of mandatory and optional features. An implementer builds h/w which states conformance to a version. The implementer's version conformance statement is fairly meaningless without a list of features which are supported.
This answers the question: "why doesn't /proc/cpuinfo give an Arm version?"
OK with the latest patch the machine that crashed now simply says: Arch and hwcaps: ARM64, LittleEndian, v8 It does allow reading ID_AA64ISAR0_EL1 register but doesn't seem to have any features we care for (from /proc/cpuinfo): Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid I know too little about arm64 to know whether or not we should care about these. I see we already filter on these in the auxv AT_HWCAP. Once we support, and not just test, some of these new features we should forget to add them to the AT_HWCAP filter. Since programs can also check features through the attribute/feature registers, do we also need to filter those when inspected by the guest code? Or do we already do that? > I know too little about arm64 to know whether or not we should care about these. The current approach assumes that all base v8 h/w has the features you list. I've not seen any base v8 h/w which is different from that list. AFAICT: - We don't need to make any changes to Valgrind's existing base v8 support in terms of establishing and checking features. - Time has shown that no bugs have been raised to do with lack of base v8 feature support. > Since programs can also check features through the attribute/feature registers, do we also need to filter those when inspected by the guest code? Hmm...I don't see why we should. We aim to faithfully support features detected, so if guest code reads those it should expect Valgrind to support it. If Valgrind doesn't, it means we haven't yet implemented support for that feature. (If I understand your question correctly.) (In reply to ahashmi from comment #11) > > Since programs can also check features through the attribute/feature registers, do we also need to filter those when inspected by the guest code? > Hmm...I don't see why we should. We aim to faithfully support features > detected, so if guest code reads those it should expect Valgrind to support > it. If Valgrind doesn't, it means we haven't yet implemented support for > that feature. (If I understand your question correctly.) So the question is not about this particular patch. But how we signal to the guest process which features valgrind/VEX supports itself. If the guest code checks the auxv HW_CAPS we are good because we filter out any features not implemented. But if the guest code checks attribute/feature registers it might get the attributes/features supported by the host. But if valgrind/VEX doesn't support those and the guest tries to use them it will get a SIGILL instead of trying to use fallback code. So it seems to me we also need a way to filter the attribute/feature registers to only advertise those armv8 instructions we actually implement. That is what other arches do too (see for example the dirtyhelper_CPUID in guest_x86_helpers.c or the s390x_dirtyhelper_vec_op). IMHO we should only pass through any bits set in these registers if we actually have implemented them (and preemptively zero out any future reserved ones). > So it seems to me we also need a way to filter the attribute/feature registers to only advertise those armv8 instructions we actually implement. That is what other arches do too (see for example the dirtyhelper_CPUID in guest_x86_helpers.c or the s390x_dirtyhelper_vec_op). ISWYM now, thanks for shining a light on an aspect of Valgrind's transparency I hadn't thought about. I would prefer to implement such a guest feature support view/filter as a separate patch. I've created https://bugs.kde.org/show_bug.cgi?id=430117 to track this change. Thank you all for the high quality discussion and analysis. This looks fine to me; please land. One tiny nit .. please #undef get_cpu_ftr and get_ftr just before the closing brace of the fn that defines them. I agree with Peter's and Mark's view that this should be feature-based; but I see the difficulties in making that work cleanly because (unlike on x86) there's no "single source of truth" (viz, CPUID) that we can fake up for the guest, at our own convenience. The closest thing to single-source-of-truth is the architectural ID registers, which these days a Linux kernel will make available to userspace via trap-and-emulate and which presumably Valgrind does or should provide to its guest: https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.rst There's a hwcap that tells you if those are emulated for you. Created attachment 133936 [details]
Adds arm64 feature detection for v8.x instructions
#undef get_cpu_ftr and get_ftr after use.
> One tiny nit .. please #undef get_cpu_ftr and get_ftr just before the
> closing brace of the fn that defines them.
Good spot! Fixed in latest patch.
Landed, cb52fee5ddbc2c0e936fd1efe5107a1afcf375cf. |