406256 – PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

Bug 406256 - PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

Summary: PPC64, vector floating point instructions don't handle subnormal according to...

Status:	CLOSED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex (show other bugs)
Version:	3.15 SVN
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-04-05 15:39 UTC by Carl Love
Modified:	2019-05-28 18:55 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed In:

Attachments
Proposed fix for making the subnormal results track the VSCR[NJ] bit (1.57 MB, patch) 2019-04-05 16:04 UTC, Carl Love	Details
Updated patch to fix issues with dnormal values (1.76 MB, patch) 2019-04-19 15:17 UTC, Carl Love	Details
Updated patch to fix issues with dnormal values v3 (1.77 MB, patch) 2019-05-08 17:03 UTC, Carl Love	Details
Updated patch to fix issues with dnormal values v4 (1.77 MB, patch) 2019-05-08 18:17 UTC, Carl Love	Details
Updated patch to fix issues with dnormal values v5 (1.84 MB, patch) 2019-05-09 22:27 UTC, Carl Love	Details
Updated patch to fix issues with dnormal values v5 (27.44 KB, patch) 2019-05-15 21:27 UTC, Carl Love	Details
Update test case, add new test (1.81 MB, patch) 2019-05-15 21:28 UTC, Carl Love	Details
Show Obsolete (5) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Carl Love 2019-04-05 15:39:39 UTC

The handling of subnormal arguments and results for the various vector floating point instructions is controlled by the VSCR[NJ] bit.  

      VSCR[NJ] = 0  Denormalized values are handled as
                    specified by Java and the IEEE standard.

      VSCR[NJ] = 1  If an element in a source VR contains a denormalized
                    value, the value 0 is used instead. If an instruction
                    causes an Underflow Exception, the corresponding element
                    in the target VR is set to 0.  In both cases the 0 has
                    the same sign as the denormalized or underflowing value.
                    Convert negative zero to positive zero.

On BE systems, the VSCR[NJ] bit is set to 1.  On LE systems, the setting is 0 as required by the ABI.  Valgrind is generating results based on the setting of the bit in the host.  If the user changes the bit in their application, the subnormal results do not match the expected results.  This is due to the fact Valgrind does not track the setting of the bit.

Comment 1 Carl Love 2019-04-05 16:04:47 UTC

Created attachment 119258 [details]
Proposed fix for making the subnormal results track the VSCR[NJ] bit

The attachment is a proposed fix for the issue.  The Iops to implement the various vector floating point instructions map to a subset of the vector floating point instructions.  The underlying host hardware needs to run with the VSCR[NJ] bit set to zero to generate the subnormal results.  The guest state needs to then adjust the arguments/results of the vector floating point instructions as needed based on the guest setting of VSCR[NJ].  Hence the need to setup the host with VSCR[NJ] = 0.  This was done in VEX/priv/host_ppc_defs.c, function getRRegUniverse_PPC ().  The function is actually setting up the guest register state.  The function is called once as part of the initialization process.  That is the right time to set the host configuration.  Ideally there would be a host initialization function that would be called for doing this but I don't see one.  Given that there is not host specific function, I had to put the code into the guest setup function.  I don't consider this to be the ideal place for the code.  I would be interested in ideas on a more appropriate location for this code. *******

The host setup code requires the addition of the -mvsx and -maltivec command line options to be set for Power 7.  These options are on by default when compiling for Power 8, 9.  Hence the Makefile.all.am change.

Finally, the current regression tests do not seem to cover the subnormal cases well enough.  I created an explicit subnormal test which runs with the VSCR[NJ] bit set to zero and one.

Again, would appreciate feedback on where best to do the host initialization.

Comment 2 Will Schmidt 2019-04-08 14:26:10 UTC

I have no input into a different better spot for getRRegUniverse_PPC changes. 

A few other patch comments: 

+++ b/Makefile.all.am
the "-mvsx -maltivec" combo can be shrunk to just "-mvsx" .  (mvsx is a superset that includes maltivec).

+   /* LE API requires NJ be set to 0. */
A few spots.. that API reference should prob be ABI. 

+      vrm[0] = ~(1 << (127 - 111)) & vrm[0];  // Clear NJ bit LE case
+      vrm[1] = ~(1 << (127 - 111)) & vrm[1];  // Clear NJ bit BE case
Preferably wrap this in some logic so we are not writing to reserved bits of the vscr in the non-LE or non-BE cases. 


+static IRExpr* dnorm_adj ( IRExpr* value )
just wondering, should this be instead named dnorm_adj_int or some variation to avoid conflict if we need to do anything similar with Longs or Shorts in the future.

Comment 3 Carl Love 2019-04-19 15:17:33 UTC

Created attachment 119504 [details]
Updated patch to fix issues with dnormal values

Updated patch, needs review by Julian

Comment 4 Carl Love 2019-05-08 17:03:38 UTC

Created attachment 119918 [details]
Updated patch to fix issues with dnormal values v3

Updated the patch.  Moved the code to set VSCR[NJ] to the assembly routines for ppc64le, ppc64be, ppc32.  Tested on P8 LE, P8 BE, P9.  Manually verified the assembly code will jump to .invariant_violation if the VSCR[NJ] bit is set to 1 by replacing the "mfvscr 7" instruction with some instructions that sets contents of register v7 to 0x010000 which is what the value would be if the NJ bit is set.

The routine dnormV32_adj() was rewritten to use vector Iops.  Fixed a couple of bugs that I found in retesting the patch.

Comment 5 Carl Love 2019-05-08 18:17:33 UTC

Created attachment 119920 [details]
Updated patch to fix issues with dnormal values v4

Testing on P7 found that I needed to make sure the system supports altivec or the subnormal_test will fail on illegal inst.  Updated the subnormal_test.vgtest file

Comment 6 Carl Love 2019-05-08 19:08:15 UTC

Additional testing shows that the  mtvsrd  and mfvsrd instructions which are used in the assembly interface functions are not supported on P7 and earlier.  Will need to rework current patch.

Comment 7 Carl Love 2019-05-09 22:27:07 UTC

Created attachment 119940 [details]
Updated patch to fix issues with dnormal values v5

Updated patch after finding issues on Power 7.  The new assembly code was re-written to remove the mtvsrd and mfvsrd so the code will run.  The specific files are:

 coregrind/m_dispatch/dispatch-ppc32-linux.S  
 coregrind/m_dispatch/dispatch-ppc64be-linux.S  
 coregrind/m_dispatch/dispatch-ppc64le-linux.S 

The updated patch has been tested on Power 6, Power 7, Power 8LE, Power 8BE and Power 9.

Comment 8 Julian Seward 2019-05-14 09:04:49 UTC

Thanks for the respin.  I have mostly only minor comments about it.  Is OK to
land provided all the comments below are addressed, except for the one about
vectorising negateVF32, which would be nice to fix if you can do so relatively
easily, but is not essential.

Also, when landing, please split the patch into two parts: the implementation
and the tests, and land the implementation first.


--- a/VEX/priv/guest_ppc_toIR.c
+++ b/VEX/priv/guest_ppc_toIR.c

is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised nicely.  Is
it also possible to do negateVF32 with vectors, rather lane by lane as at
present?  Note, for a vector version of is_NaN, you can see more or less how
to do it by looking at isNan() in host_ppc_isel.c.


+static IRExpr* dnormV32_adj ( IRExpr* src )

nit: maybe rename this to be more consistent with your other vector-helper
function names (is_Zero_Vector etc)


+   assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128,
+                                 unop( Iop_1Sto64,
+                                       mkexpr( VSCR_NJ ) ) ,
+                                 unop( Iop_1Sto64,
+                                       mkexpr( VSCR_NJ ) ) ) );

nit: VSCR_NJ isn't used past this point.  Change its type to Ity_I64 and lift
the 1Sto64 operation into that definition, so it isn't duplicated here.


--- a/coregrind/m_dispatch/dispatch-ppc32-linux.S
+++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S

 LafterFP2:
+	/* set host Vector Status Control Register bit NJ to zero
+	   to ensure the host generate subnormal results for the
+	   vector floating point instructions. */
+        mfvscr  16  /* Clear NJ bit */
+        vspltisw 9,0x1                   /* 4x 0x00000001 */
+        vspltisw 8,0x0                   /* zero */
+        vsldoi   9,8,9,0x2               /* <<2*8 => 4x 0x00010000 */
+        vnor     9,9,9                   /* 4x 0xFFFEFFFF */
+	vand     16,16,9                 /* Mask out NJ bit */
+        mtvscr  16

(1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff in
this file.  Otherwise this will fail when run on a non-Altivec enabled target
(do we still support any of those)?  And the same for the other to assembly
files.

(2) (As a check) is the above sequence runnable even on the lowest level
Altivec subset?  Otherwise (again) it will fail at run time.

(3) I wasn't entirely clear what the changes to the post-run invariant checks
are (after label "postamble:").  IIUC, they already do check that
VSCR[NJ].host == 0, but the comments are wrong, and you've updated the
comments, but not the code?  Can you clarify/re-check?


diff --git a/memcheck/tests/ppc32/vgcore.9687 b/memcheck/tests/ppc32/vgcore.9687
new file mode 100644

This should definitely not be in the patch!

Comment 9 Carl Love 2019-05-15 21:27:52 UTC

Created attachment 120089 [details]
Updated patch to fix issues with dnormal values v5

Updated the patch per latest comments from Julian.

Split patch into VEX patch and test case patch.

Renamed negateVF32(value) to negate_Vector( size, value), vectorized.

Created new function  is_NaN_Vector(size, value)

Renamed dnormV32_adj() to dnorm_adj_Vector().

Lifted VSCR_NJ, made it Ity_I64.

The various assembly routines:  Removed my new code to set VSCR[NJ]=0 as it is redundant given that there is existing code that is already clearing the register.  Updated the comments in the code to make it clear where VSCR[NJ] is set to 0.  Updated comments with regard to the invarent check to make it clear what is being checked and what to do based on the check.  There are now no functional changes to the assembly functions as they already ensure the VSCR[NJ] but is set to 0.

Comment 10 Carl Love 2019-05-15 21:28:48 UTC

Created attachment 120090 [details]
Update test case, add new test

The test case patch that goes with the subnormal changes in VEX.

Comment 11 Carl Love 2019-05-15 21:47:38 UTC

Julian:


> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_sho
> w-5Fbug.cgi-3Fid-3D406256&d=DwIFaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=RFEmMkZAk
> --
> _wFGN5tkM_A&m=5o2Sjz00Tqqxz4dOXWvuGwWsbia9aURAkghSmeY0Sm0&s=1rRXfsCpJ
> Dor_oQ1SSmOvBBYZkeJgD7aT4Iz_gYMxGk&e=
> 
> --- Comment #8 from Julian Seward <jseward@acm.org> ---
> Thanks for the respin.  I have mostly only minor comments about
> it.  Is OK to
> land provided all the comments below are addressed, except for the
> one about
> vectorising negateVF32, which would be nice to fix if you can do so
> relatively
> easily, but is not essential.
> 
> Also, when landing, please split the patch into two parts: the
> implementation
> and the tests, and land the implementation first.

Done.  I had all the changes in one patch to make it easier to move
files around to 5 different machines for testing.

> 
> 
> --- a/VEX/priv/guest_ppc_toIR.c
> +++ b/VEX/priv/guest_ppc_toIR.c
> 
> is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised
> nicely.  Is
> it also possible to do negateVF32 with vectors, rather lane
> by lane as at
> present?  Note, for a vector version of is_NaN, you can see more or
> less how
> to do it by looking at isNan() in host_ppc_isel.c.
> 
Renamed negateVF32(value) to negate_Vector( size, value) to make the
naming more consistent.  Also, structured it to generalize easily to
more vector sizes.  Currently just supporting vector of F32.  This
change requires creating the new function is_NaN_Vector(size, value) as
you eluded to above.  Again, structured the function to easily extend
to more vector sizes.

> 
> +static IRExpr* dnormV32_adj ( IRExpr* src )
> 
> nit: maybe rename this to be more consistent with your other vector-
> helper
> function names (is_Zero_Vector etc)
> 

Yea, consistency is a good thing.  Renamed dnormV32_adj() to
dnorm_adj_Vector().  Note did not restructure the function to easily
extend to other vector sizes.  Left that for the future.
> 
> +   assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128,
> +                                 unop( Iop_1Sto64,
> +                                       mkexpr( VSCR_NJ ) ) ,
> +                                 unop( Iop_1Sto64,
> +                                       mkexpr( VSCR_NJ ) ) ) );
> 
> nit: VSCR_NJ isn't used past this point.  Change its type to Ity_I64
> and lift
> the 1Sto64 operation into that definition, so it isn't duplicated
> here.
> 

Yea, that would be better.  Fixed.

> 
> --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S
> +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S
> 
>  LafterFP2:
> +       /* set host Vector Status Control Register bit NJ to zero
> +          to ensure the host generate subnormal results for the
> +          vector floating point instructions. */
> +        mfvscr  16  /* Clear NJ bit */
> +        vspltisw 9,0x1                   /* 4x 0x00000001 */
> +        vspltisw 8,0x0                   /* zero */
> +        vsldoi   9,8,9,0x2               /* <<2*8 => 4x 0x00010000
> */
> +        vnor     9,9,9                   /* 4x 0xFFFEFFFF */
> +       vand     16,16,9                 /* Mask out NJ bit */
> +        mtvscr  16
> 
> (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff
> in
> this file.  Otherwise this will fail when run on a non-Altivec
> enabled target
> (do we still support any of those)?  And the same for the other to
> assembly
> files.


> (2) (As a check) is the above sequence runnable even on the lowest
> level
> Altivec subset?  Otherwise (again) it will fail at run time.
> 
> (3) I wasn't entirely clear what the changes to the post-run
> invariant checks
> are (after label "postamble:").  IIUC, they already do check that
> VSCR[NJ].host == 0, but the comments are wrong, and you've updated
> the
> comments, but not the code?  Can you clarify/re-check?

When I went in to fix up the code per you comments, I noticed that the
code I added to set VSCR[NJ] = 0 is not doing anything. I had missed
the code a bit lower:

       /* set host AltiVec control word to the default mode expected           
           by VEX-generated code. */                                            
   ld 6,.tocent__vgPlain_machine_ppc64_has_VMX@toc(2)                           
   ld 6,0(6)                                                                    
        cmpldi  6,0                                                             
        beq     .LafterVMX2                                                     
                                                                                
        vspltisw 3,0x0  /* generate zero */                                     
        mtvscr  3    

which is forcing the entire VSCR register to zero.  The instruction
vtvscr moves the contents of the specified register to VSCR.  This is
actually what I had originally been looking for when I was trying to
figure out why Valgrind always generated subnormal results but natively
I wasn't getting subnormal results on BE.  This code to set VSCR to
zero exists on ppc32, ppc64le, ppc64be.  

The comment /* Check VSCR[NJ] == 1 */ really throws me, hence I was
trying to make it clear that we need the NJ bit to be zero.  The code
is checking for the bit to be set 1, if it is, then it calls the
invariant violation.  Chanaged the comment to make this clearer to the
reader what is going on.   

So, I removed my code to set VSCR[NJ] as it is overwritten anyway.  I
updated the code with comments to make it clearer where VSCR[NJ] gets
set to zero and really what is going on with the invarient check.  So,
I claim there is now no functional changes in the assembly code.
  
Retested on Power 6, Power 7, Power 8LE, Power 8BE, Power 9.

> 
> diff --git a/memcheck/tests/ppc32/vgcore.9687
> b/memcheck/tests/ppc32/vgcore.9687
> new file mode 100644
> 
> This should definitely not be in the patch!

Argh!! not sure how I managed that.  Fixed
> 

Given the changes to the assembly functions are not trivial, probably
best for another quick review before committing.

I have updated the the corresponding bugzilla:  Bug 406256 - PPC64,
vector floating point instructions don't handle subnormal according to
VSCR[NJ] bit setting
      https://bugs.kde.org/show_bug.cgi?id=406256

The changes to this patch require the patch for adding support to
vlogefp and vexptefp instructions to change due to the function name
changes.   

https://bugs.kde.org/show_bug.cgi?id=407340

This patch needs your OK on the name of the added Iop.

                        Carl

Comment 12 Julian Seward 2019-05-27 12:13:07 UTC

> Updated patch to fix issues with dnormal values v5 (27.44 KB, patch)
> 2019-05-15 21:27 UTC, Carl Love 	Details
> Update test case, add new test (1.81 MB, patch)
> 2019-05-15 21:28 UTC, Carl Love 

Carl, these look fine to land now.  Thank you for your patience with this.
I am particularly pleased that the supporting functions (is_NaN, etc) all
got vectorised.

Comment 13 Carl Love 2019-05-28 18:54:39 UTC

The VEX and testsuite patch were committed. 

commit d2cbb78a151256290d490fcb7a805884d6406a7e
Author: Carl Love <carll@us.ibm.com>
Date:   Tue May 28 11:33:00 2019 -0500

    PPC64, Subnormal testcase changes
    
    VEX patch fixed issues with generating subnormal results.
    
    This patch adds a specific test case and updates the expected values for
    the existing test case.
    
    Update jm-vmx tests, add subnormal test case.
    
    https://bugs.kde.org/show_bug.cgi?id=406256

commit 991db2a39bcbdbf5cdb4337684c29f96c63070a8
Author: Carl Love <carll@us.ibm.com>
Date:   Tue May 28 11:26:13 2019 -0500

    PPC64, fix issues with dnormal values in the vector fp instructions.
    
    The result of the floating point instructions vmaddfp, vnmsubfp,
    vaddfp, vsubfp, vmaxfp, vminfp, vrefp, vrsqrtefp, vcmpeqfp, vcmpeqfp,
    vcmpgefp, vcmpgtfp are controlled by the setting of the NJ bit in
    the VSCR register.  If VSCR[NJ] = 0; then denormalized values are
    handled as specified by Java and the IEEE standard.  If the bit is
    a 1, then the denormalized element in the vector is replaced with
    a zero.
    
    Valgrind was not properly handling the denormalized case for these
    instructions.  This patch fixes the issue.
    
    https://bugs.kde.org/show_bug.cgi?id=406256

Closing