Bug 447989 - Support Armv8.2 SHA-512 instructions
Summary: Support Armv8.2 SHA-512 instructions
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.19 GIT
Platform: unspecified Unspecified
: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-05 19:04 UTC by David Benjamin
Modified: 2024-09-18 11:08 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Patch to implement SHA-512 instructions (12.18 KB, application/mbox)
2022-01-05 19:04 UTC, David Benjamin
Details
Reattached with the right content type (12.18 KB, text/plain)
2022-01-05 19:07 UTC, David Benjamin
Details
[PATCH 1/2] Extract common arm64 SIMD helpers into a single header (2.45 MB, application/mbox)
2024-05-16 18:29 UTC, David Benjamin
Details
[PATCH 2/2] Add support for Armv8.2 SHA-512 instructions (58.30 KB, text/plain)
2024-05-16 18:31 UTC, David Benjamin
Details
[PATCH 1/2] Extract common arm64 SIMD helpers into a single header (2.45 MB, text/plain)
2024-05-16 18:32 UTC, David Benjamin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Benjamin 2022-01-05 19:04:55 UTC
Created attachment 145139 [details]
Patch to implement SHA-512 instructions

Armv8.2 adds a number of new cryptography extensions, including SHA-512 instructions (FEAT_SHA512, which covers SHA512H, SHA512H2, SHA512SU0, and SHA512SU1). I was trying to test some SHA-512 assembly with Valgrind and ran into this.

I've attached a patch which implements them, mostly by adapting surrounding code. I wasn't sure how adding tests worked, so it's missing those. Happy to fill those in with some pointers.
Comment 1 David Benjamin 2022-01-05 19:07:33 UTC
Created attachment 145140 [details]
Reattached with the right content type
Comment 2 David Benjamin 2023-02-01 21:16:01 UTC
Anything needed before this patch is submittable?
Comment 3 Paul Floyd 2023-02-02 10:23:56 UTC
(In reply to David Benjamin from comment #2)
> Anything needed before this patch is submittable?

Someone actively working on the ARM version of Valgrind?
Comment 4 Paul Floyd 2024-05-15 07:23:49 UTC
(In reply to Paul Floyd from comment #3)
> (In reply to David Benjamin from comment #2)
> > Anything needed before this patch is submittable?
> 
> Someone actively working on the ARM version of Valgrind?

^^^ well now that's me, time to look atthis again.
Comment 5 Paul Floyd 2024-05-15 07:40:48 UTC
This will also need a change in arm64g_dirtyhelper_MRS_ID_AA64ISAR0_EL1

This bit will need to be removed

   /* Degredate SHA2 from b0010 to b0001*/
   if ( (w >> 12) & 0x2 ) {
      w ^= (0x2 << 12);
      w |= (0x1 << 12);
   }

And at least one testcase would also be good!
Comment 6 David Benjamin 2024-05-15 13:06:46 UTC
This was quite some time ago, so I'll have to page this back in and remind myself how I tested this. :-)

> This will also need a change in arm64g_dirtyhelper_MRS_ID_AA64ISAR0_EL1

I don't believe that code was there when I first uploaded the patch, but yeah, I can update that.

> And at least one testcase would also be good!

Yup. See the initial comment. :-P

> I wasn't sure how adding tests worked, so it's missing those. Happy to fill those in with some pointers.

Do you have any pointers?
Comment 7 Paul Floyd 2024-05-15 19:50:12 UTC
(In reply to David Benjamin from comment #6)
> This was quite some time ago, so I'll have to page this back in and remind
> myself how I tested this. :-)
> 
> > This will also need a change in arm64g_dirtyhelper_MRS_ID_AA64ISAR0_EL1
> 
> I don't believe that code was there when I first uploaded the patch, but
> yeah, I can update that.

Yes, it only recently got added.

> Do you have any pointers?

I think that in this case I would just add sha512 tests to none/tests/fp_and_simd (or fp_and_simd_v82 - I'm not sure what the minimum version is for sha512).

Otherwise I just added a section to README_DEVELOPERS on writing regression tests.
Comment 8 David Benjamin 2024-05-16 03:10:17 UTC
(In reply to Paul Floyd from comment #7)
> I think that in this case I would just add sha512 tests to
> none/tests/fp_and_simd (or fp_and_simd_v82 - I'm not sure what the minimum
> version is for sha512).
> 
> Otherwise I just added a section to README_DEVELOPERS on writing regression
> tests.

Ah, perfect! Thanks, that's all really helpful! I think I've gotten the start of it working. A couple questions before I finish this up:

Regarding the generating the expected files by running the test, I assume those just record what my patch does. I.e. if my patch were wrong, the wrong things would get recorded in expectations. I've run my test against BoringSSL's use of these instructions, so I'm fairly confident it's correct. But it seems to me we can do better: just synthesize the expected output by running the instructions directly, without involving valgrind at all.

I don't suppose this exists? (If not, no big deal. Like I said, I've already manually tested this against BoringSSL.)

Regarding where to put the tests, these are indeed Armv8.2 instructions, but I noticed fp_and_simd_v82's prerequisite is actually the fphp (floating point half-precision) extension. These instructions are from a separate extension. My test device happens to have both but, in principle, they're orthogonal.

Should I still put them in fp_and_simd_v82 or make a new one? I'm guessing a new one would be preferable, though this would get pretty large fast. Arm, alas, has lots and lots of extensions. E.g. I noticed that the Armv8.0 cryptography instructions are just in the plain fp_and_simd test, with no prereq, but not all Armv8.0 chips have those instructions. (It's pretty rare to omit them, but I think the Raspberry Pis don't?)

If I make a new one, should I just copy all the various helper macros at the top of that file, or try to extract them somewhere? Seems they're in a few places already.
Comment 9 Paul Floyd 2024-05-16 06:12:43 UTC
(In reply to David Benjamin from comment #8)

> Regarding the generating the expected files by running the test, I assume
> those just record what my patch does. I.e. if my patch were wrong, the wrong
> things would get recorded in expectations. I've run my test against
> BoringSSL's use of these instructions, so I'm fairly confident it's correct.
> But it seems to me we can do better: just synthesize the expected output by
> running the instructions directly, without involving valgrind at all.

Yes. Generally for the error detecting tools the stderr.exp file checks that the
errors are detected (and that there are no false positives). The 'none' tests
ensure that the opcodes are handled and they usually 'printf' random values
that get fed into the FP or crypto opcodes. Running the standalone exe to
generate the stdout.exp files is probably the best. There are some rare cases
when it needs to be done with Valgrind (usually when the exe using client
requests to change code paths).
 
> I don't suppose this exists? (If not, no big deal. Like I said, I've already
> manually tested this against BoringSSL.)

There are some perl scripts for testcase generation. For instance see
none/tests/amd64/gen_insn_test.pl

Otherwise I just use a lot of copy and paste.

> Regarding where to put the tests, these are indeed Armv8.2 instructions, but
> I noticed fp_and_simd_v82's prerequisite is actually the fphp (floating
> point half-precision) extension. These instructions are from a separate
> extension. My test device happens to have both but, in principle, they're
> orthogonal.
> 
> Should I still put them in fp_and_simd_v82 or make a new one? I'm guessing a
> new one would be preferable, though this would get pretty large fast. Arm,
> alas, has lots and lots of extensions. E.g. I noticed that the Armv8.0
> cryptography instructions are just in the plain fp_and_simd test, with no
> prereq, but not all Armv8.0 chips have those instructions. (It's pretty rare
> to omit them, but I think the Raspberry Pis don't?)
> 
> If I make a new one, should I just copy all the various helper macros at the
> top of that file, or try to extract them somewhere? Seems they're in a few
> places already.

I'm not really an ARM expert. It sounds like it would be best to create a new test
and refactor the helpers into a single header.

For the extensions and -march (and -mcpu -mthumb -marm) it's all a bit of a mess.
For instance see https://bugs.kde.org/show_bug.cgi?id=454346. All I ran really
say is that it worked for the original environment used for the arm64 port
(qemu afaict) and it works on my Pi 5, the sourceware buildbot farm and the
few other aarch64 machines that I have access to.
Comment 10 David Benjamin 2024-05-16 18:29:52 UTC
Created attachment 169538 [details]
[PATCH 1/2] Extract common arm64 SIMD helpers into a single header

Alright, here's the first of the two patches. This one pulls out some common code for testing SIMD stuff. Git commit message in the patch has some notes.
Comment 11 David Benjamin 2024-05-16 18:31:45 UTC
Created attachment 169539 [details]
[PATCH 2/2] Add support for Armv8.2 SHA-512 instructions

And here's the second one. Test expectations generated by running the test program uninstrumented, so it's unlikely that we're baking in any bugs.

Think that's everything. Let me know if you'd like any more changeS!
Comment 12 David Benjamin 2024-05-16 18:32:46 UTC
Created attachment 169540 [details]
[PATCH 1/2] Extract common arm64 SIMD helpers into a single header

(Reattaching the first one with the right content type.)
Comment 13 Paul Floyd 2024-05-19 05:25:11 UTC
Thanks for the patches. I've applied, tested and pushed the first one. The second patch looks OK so far but I'd like to check it on FreeBSD before pushing the change. It might take me a little while to see if I can get my hands on a suitable machine.
Comment 14 David Benjamin 2024-05-19 18:10:51 UTC
(In reply to Paul Floyd from comment #13)
> Thanks for the patches. I've applied, tested and pushed the first one. The
> second patch looks OK so far but I'd like to check it on FreeBSD before
> pushing the change. It might take me a little while to see if I can get my
> hands on a suitable machine.

Sounds good. Let me know if there's any way I can help! (I've been testing on a Linux VM on one an M1 Mac, which have the extension available. QEMU probably knows how to emulate it too.)
Comment 15 Paul Floyd 2024-05-20 19:21:44 UTC
I got some AWS credits from the FreeBSD dev that makes everything work on ec2, no problems with the patch.

Thanks again.

commit 5bf987891972478376e816fefea2d9abac3cb4fe (origin/master, origin/HEAD)
Author: David Benjamin <davidben@google.com>
Date:   Wed Jan 5 00:22:30 2022 -0500

    Add support for Armv8.2 SHA-512 instructions
    
    Fixes https://bugs.kde.org/show_bug.cgi?id=447989
Comment 16 Paul Floyd 2024-05-21 08:42:29 UTC
I need to add a sha3 feature test. sha512_v82 fails to build on a Cavium thunderx2t99.

diff --git a/configure.ac b/configure.ac
index 8792150..20ff73e 100755
--- a/configure.ac
+++ b/configure.ac
@@ -5618,6 +5618,23 @@ AC_MSG_RESULT([no])
 AM_CONDITIONAL(HAVE_THRD_CREATE, test x$ac_cxx_have_thrd_create = xyes)
 
 
+# Check arm64 sha3
+safe_CFLAGS=$CFLAGS
+CFLAGS="${CFLAGS} -march=armv8.2-a+sha3"
+AC_MSG_CHECKING([for sha3])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[]], [[
+   return 0;
+]])],
+[
+ac_have_sha3=yes
+AC_MSG_RESULT([yes])
+], [
+ac_have_sha3=no
+AC_MSG_RESULT([no])
+])
+
+AM_CONDITIONAL(HAVE_SHA3, test x$ac_have_sha3 = xyes)
+CFLAGS=$safe_CFLAGS
 
 #----------------------------------------------------------------------------
 # Ok.  We're done checking.
diff --git a/none/tests/arm64/Makefile.am b/none/tests/arm64/Makefile.am
index 296000d..071cce0 100644
--- a/none/tests/arm64/Makefile.am
+++ b/none/tests/arm64/Makefile.am
@@ -45,7 +45,11 @@ if BUILD_ARMV81_TESTS
 endif
 
 if BUILD_ARMV82_TESTS
-  check_PROGRAMS += fp_and_simd_v82 sha512_v82
+  check_PROGRAMS += fp_and_simd_v82
+endif
+
+if HAVE_SHA3
+  check_PROGRAMS += sha512_v82
 endif
 
 if HAVE_CXX17
Comment 17 David Benjamin 2024-05-21 14:06:49 UTC
Ah whoops, yeah. TBH we probably don't strictly need to build the test with -march=armv8.2-a+sha3, since it's all in inline assembly anyway. I just carried that over from the other tests. But we do need the assembler to recognize the new instructions, which is probably analogous to checking if -march=armv8.2-a+sha3 works.
Comment 18 Paul Floyd 2024-05-22 06:02:56 UTC
commit 60fca88acc9311fbd910f852f0761110a21cd7e8 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Wed May 22 08:01:40 2024 +0200

    arm64 regtest: add a configure check that the compiler accepts sha3.