Bug 426014 - arm64: implement fmadd and fmsub as Iop_MAdd/Sub instead of Iop_Add plus Iop_Mul
Summary: arm64: implement fmadd and fmsub as Iop_MAdd/Sub instead of Iop_Add plus Iop_Mul
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal (vote)
Target Milestone: ---
Assignee: Mark Wielaard
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-31 10:51 UTC by Mark Wielaard
Modified: 2020-12-15 15:36 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
arm64 VEX backend support for Iop_M{Add,Sub}F{32,64} to fmadd and fmsub (12.14 KB, text/plain)
2020-10-09 17:33 UTC, Mark Wielaard
Details
tests patch (3.72 KB, patch)
2020-10-14 11:13 UTC, Alexandra Hajkova
Details
ARM64 VEX frontend support for fmadd/fmsub (2.47 KB, text/plain)
2020-10-14 14:42 UTC, Mark Wielaard
Details
Fix ARM64Instr_VTriS/D argument order (2.31 KB, patch)
2020-10-30 11:20 UTC, Mark Wielaard
Details
patch (27.74 KB, patch)
2020-12-15 10:33 UTC, Alexandra Hajkova
Details
patch (25.57 KB, text/plain)
2020-12-15 13:23 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2020-08-31 10:51:24 UTC
The current implementation causes rounding issues as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1844910#c7
Comment 1 Mark Wielaard 2020-10-09 17:33:56 UTC
Created attachment 132242 [details]
arm64 VEX backend support for Iop_M{Add,Sub}F{32,64} to fmadd  and fmsub

Sassha was experimenting with turning the fmadd/fmsub into Iop_MAdd/Sub only to discover the VEX arm64 backend doesn't accept those Iops. This patch attempts to implement them in the backend. It compiles but it is totally untested otherwise.
Comment 2 Alexandra Hajkova 2020-10-14 11:13:39 UTC
Created attachment 132344 [details]
tests patch

ests for both 32 and 64 bit versions for fmadd and fmsub

test results:
./fmadd32
FMADD 32bit: dst = z + x * y
76.246193 = 38.123096 + 55.000000 * 0.693147

/vg-in-place -q ./fmadd32
FMADD 32bit: dst = z + x * y
76.246193 = 38.123096 + 55.000000 * 0.693147

./fmadd64
FMADD 64bit: dst = z + x * y
76.246190 = 38.123095 + 55.000000 * 0.693147

./vg-in-place -q  ./fmadd64
FMADD 64bit: dst = z + x * y
76.246190 = 38.123095 + 55.000000 * 0.693147

/fmsub32
FMSUB 32bit: dst = z + (-x) * y
0.000001 = 38.123096 + (-55.000000) * 0.693147
dst = 0000000035c00000 = 1.430511474609375e-06

/vg-in-place -q ./fmsub32
FMSUB 32bit: dst = z + (-x) * y
0.000000 = 38.123096 + (-55.000000) * 0.693147
dst = 0000000000000000 = 0

./fmsub64
FMSUB 64bit: dst = z + (-x) * y
-0.000000 = 38.123095 + (-55.000000) * 0.693147
dst = bce9000000000000 = -2.7755575615628914e-15

./vg-in-place -q ./fmsub64
FMSUB 64bit: dst = z + (-x) * y
0.000000 = 38.123095 + (-55.000000) * 0.693147
dst = 0000000000000000 = 0
Comment 3 Mark Wielaard 2020-10-14 14:42:35 UTC
Created attachment 132351 [details]
ARM64 VEX frontend support for fmadd/fmsub

This should implement the frontend support by replacing the separate Iop_Mul and Iop_Add with a single Iop_MAddF(32|64). But looking at the tests something is wrong because all results are garbled. Possibly we are not passing the arguments in the correct order. Will have to investigate with --trace-flags to see what IR is really generated and which fmadd/fmsub instructions are really produced by the backend.
Comment 4 Mark Wielaard 2020-10-30 11:20:05 UTC
Created attachment 132887 [details]
Fix ARM64Instr_VTriS/D argument order

Turned out we had one typo assigning arguments (skipping arg2 and using arg3 twice) and the order of the addend, multiplier product. Which obviously caused the result to be totally incorrect. With this fixed the results look sane.

There is a question whether the original argument order of the Iop_MAdd/SubF32/64 were correct in the first place. We better make sure we use the same as other arches (s390 and mips) so any VEX IR optimization PASS gets/keeps things in the correct order.
Comment 5 Alexandra Hajkova 2020-12-15 10:33:22 UTC
Created attachment 134086 [details]
patch
Comment 6 Mark Wielaard 2020-12-15 13:23:31 UTC
Created attachment 134093 [details]
patch

Thanks. Cleaned up the patch a little amd added a few more comments in the commit message.
- Removed the standalone tests_fmadd/sub32/64.c tests in the top level dir.
- Updated the SPEC/IMPL comment to show how we implement it now using fmadd and fmsub
- Removed the now unused opADD, opSUB, opMUL and eNxM
- Reindented the ix switch statement
Comment 7 Mark Wielaard 2020-12-15 15:36:07 UTC
Pushed the patch with one small change. I replaced none/tests/arm64/fmadd_sub.stdout.exp with the exact output of running none/tests/arm64/fmadd_sub on bare metal. Now that we fixed the -0.0/+0.0 sign issues the result running under valgrind must be identical to running without.

commit 04cdc29b007594a0e58ffef0c9dd87df3ea595ea
Author: Mark Wielaard <mark@klomp.org>
Date:   Wed Oct 14 06:11:34 2020 -0400

    arm64 VEX frontend and backend support for Iop_M{Add,Sub}F{32,64}
    
    The arm64 frontend used to implement the scalar fmadd, fmsub, fnmadd
    and fnmsub iinstructions into separate addition/substraction and
    multiplication instructions, which caused rounding issues.
    
    This patch turns them into Iop_M{Add,Sub}F{32,64} instructions
    (with some arguments negated). And the backend now emits fmadd or fmsub
    instructions.
    
    Alexandra Hajkova <ahajkova@redhat.com> added tests and fixed up the
    implementation to make sure rounding (and sign) are correct now.
    
    https://bugs.kde.org/show_bug.cgi?id=426014