The current implementation causes rounding issues as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1844910#c7
Created attachment 132242 [details] arm64 VEX backend support for Iop_M{Add,Sub}F{32,64} to fmadd and fmsub Sassha was experimenting with turning the fmadd/fmsub into Iop_MAdd/Sub only to discover the VEX arm64 backend doesn't accept those Iops. This patch attempts to implement them in the backend. It compiles but it is totally untested otherwise.
Created attachment 132344 [details] tests patch ests for both 32 and 64 bit versions for fmadd and fmsub test results: ./fmadd32 FMADD 32bit: dst = z + x * y 76.246193 = 38.123096 + 55.000000 * 0.693147 /vg-in-place -q ./fmadd32 FMADD 32bit: dst = z + x * y 76.246193 = 38.123096 + 55.000000 * 0.693147 ./fmadd64 FMADD 64bit: dst = z + x * y 76.246190 = 38.123095 + 55.000000 * 0.693147 ./vg-in-place -q ./fmadd64 FMADD 64bit: dst = z + x * y 76.246190 = 38.123095 + 55.000000 * 0.693147 /fmsub32 FMSUB 32bit: dst = z + (-x) * y 0.000001 = 38.123096 + (-55.000000) * 0.693147 dst = 0000000035c00000 = 1.430511474609375e-06 /vg-in-place -q ./fmsub32 FMSUB 32bit: dst = z + (-x) * y 0.000000 = 38.123096 + (-55.000000) * 0.693147 dst = 0000000000000000 = 0 ./fmsub64 FMSUB 64bit: dst = z + (-x) * y -0.000000 = 38.123095 + (-55.000000) * 0.693147 dst = bce9000000000000 = -2.7755575615628914e-15 ./vg-in-place -q ./fmsub64 FMSUB 64bit: dst = z + (-x) * y 0.000000 = 38.123095 + (-55.000000) * 0.693147 dst = 0000000000000000 = 0
Created attachment 132351 [details] ARM64 VEX frontend support for fmadd/fmsub This should implement the frontend support by replacing the separate Iop_Mul and Iop_Add with a single Iop_MAddF(32|64). But looking at the tests something is wrong because all results are garbled. Possibly we are not passing the arguments in the correct order. Will have to investigate with --trace-flags to see what IR is really generated and which fmadd/fmsub instructions are really produced by the backend.
Created attachment 132887 [details] Fix ARM64Instr_VTriS/D argument order Turned out we had one typo assigning arguments (skipping arg2 and using arg3 twice) and the order of the addend, multiplier product. Which obviously caused the result to be totally incorrect. With this fixed the results look sane. There is a question whether the original argument order of the Iop_MAdd/SubF32/64 were correct in the first place. We better make sure we use the same as other arches (s390 and mips) so any VEX IR optimization PASS gets/keeps things in the correct order.
Created attachment 134086 [details] patch
Created attachment 134093 [details] patch Thanks. Cleaned up the patch a little amd added a few more comments in the commit message. - Removed the standalone tests_fmadd/sub32/64.c tests in the top level dir. - Updated the SPEC/IMPL comment to show how we implement it now using fmadd and fmsub - Removed the now unused opADD, opSUB, opMUL and eNxM - Reindented the ix switch statement
Pushed the patch with one small change. I replaced none/tests/arm64/fmadd_sub.stdout.exp with the exact output of running none/tests/arm64/fmadd_sub on bare metal. Now that we fixed the -0.0/+0.0 sign issues the result running under valgrind must be identical to running without. commit 04cdc29b007594a0e58ffef0c9dd87df3ea595ea Author: Mark Wielaard <mark@klomp.org> Date: Wed Oct 14 06:11:34 2020 -0400 arm64 VEX frontend and backend support for Iop_M{Add,Sub}F{32,64} The arm64 frontend used to implement the scalar fmadd, fmsub, fnmadd and fnmsub iinstructions into separate addition/substraction and multiplication instructions, which caused rounding issues. This patch turns them into Iop_M{Add,Sub}F{32,64} instructions (with some arguments negated). And the backend now emits fmadd or fmsub instructions. Alexandra Hajkova <ahajkova@redhat.com> added tests and fixed up the implementation to make sure rounding (and sign) are correct now. https://bugs.kde.org/show_bug.cgi?id=426014