The handling of subnormal arguments and results for the various vector floating point instructions is controlled by the VSCR[NJ] bit. VSCR[NJ] = 0 Denormalized values are handled as specified by Java and the IEEE standard. VSCR[NJ] = 1 If an element in a source VR contains a denormalized value, the value 0 is used instead. If an instruction causes an Underflow Exception, the corresponding element in the target VR is set to 0. In both cases the 0 has the same sign as the denormalized or underflowing value. Convert negative zero to positive zero. On BE systems, the VSCR[NJ] bit is set to 1. On LE systems, the setting is 0 as required by the ABI. Valgrind is generating results based on the setting of the bit in the host. If the user changes the bit in their application, the subnormal results do not match the expected results. This is due to the fact Valgrind does not track the setting of the bit.
Created attachment 119258 [details] Proposed fix for making the subnormal results track the VSCR[NJ] bit The attachment is a proposed fix for the issue. The Iops to implement the various vector floating point instructions map to a subset of the vector floating point instructions. The underlying host hardware needs to run with the VSCR[NJ] bit set to zero to generate the subnormal results. The guest state needs to then adjust the arguments/results of the vector floating point instructions as needed based on the guest setting of VSCR[NJ]. Hence the need to setup the host with VSCR[NJ] = 0. This was done in VEX/priv/host_ppc_defs.c, function getRRegUniverse_PPC (). The function is actually setting up the guest register state. The function is called once as part of the initialization process. That is the right time to set the host configuration. Ideally there would be a host initialization function that would be called for doing this but I don't see one. Given that there is not host specific function, I had to put the code into the guest setup function. I don't consider this to be the ideal place for the code. I would be interested in ideas on a more appropriate location for this code. ******* The host setup code requires the addition of the -mvsx and -maltivec command line options to be set for Power 7. These options are on by default when compiling for Power 8, 9. Hence the Makefile.all.am change. Finally, the current regression tests do not seem to cover the subnormal cases well enough. I created an explicit subnormal test which runs with the VSCR[NJ] bit set to zero and one. Again, would appreciate feedback on where best to do the host initialization.
I have no input into a different better spot for getRRegUniverse_PPC changes. A few other patch comments: +++ b/Makefile.all.am the "-mvsx -maltivec" combo can be shrunk to just "-mvsx" . (mvsx is a superset that includes maltivec). + /* LE API requires NJ be set to 0. */ A few spots.. that API reference should prob be ABI. + vrm[0] = ~(1 << (127 - 111)) & vrm[0]; // Clear NJ bit LE case + vrm[1] = ~(1 << (127 - 111)) & vrm[1]; // Clear NJ bit BE case Preferably wrap this in some logic so we are not writing to reserved bits of the vscr in the non-LE or non-BE cases. +static IRExpr* dnorm_adj ( IRExpr* value ) just wondering, should this be instead named dnorm_adj_int or some variation to avoid conflict if we need to do anything similar with Longs or Shorts in the future.
Created attachment 119504 [details] Updated patch to fix issues with dnormal values Updated patch, needs review by Julian
Created attachment 119918 [details] Updated patch to fix issues with dnormal values v3 Updated the patch. Moved the code to set VSCR[NJ] to the assembly routines for ppc64le, ppc64be, ppc32. Tested on P8 LE, P8 BE, P9. Manually verified the assembly code will jump to .invariant_violation if the VSCR[NJ] bit is set to 1 by replacing the "mfvscr 7" instruction with some instructions that sets contents of register v7 to 0x010000 which is what the value would be if the NJ bit is set. The routine dnormV32_adj() was rewritten to use vector Iops. Fixed a couple of bugs that I found in retesting the patch.
Created attachment 119920 [details] Updated patch to fix issues with dnormal values v4 Testing on P7 found that I needed to make sure the system supports altivec or the subnormal_test will fail on illegal inst. Updated the subnormal_test.vgtest file
Additional testing shows that the mtvsrd and mfvsrd instructions which are used in the assembly interface functions are not supported on P7 and earlier. Will need to rework current patch.
Created attachment 119940 [details] Updated patch to fix issues with dnormal values v5 Updated patch after finding issues on Power 7. The new assembly code was re-written to remove the mtvsrd and mfvsrd so the code will run. The specific files are: coregrind/m_dispatch/dispatch-ppc32-linux.S coregrind/m_dispatch/dispatch-ppc64be-linux.S coregrind/m_dispatch/dispatch-ppc64le-linux.S The updated patch has been tested on Power 6, Power 7, Power 8LE, Power 8BE and Power 9.
Thanks for the respin. I have mostly only minor comments about it. Is OK to land provided all the comments below are addressed, except for the one about vectorising negateVF32, which would be nice to fix if you can do so relatively easily, but is not essential. Also, when landing, please split the patch into two parts: the implementation and the tests, and land the implementation first. --- a/VEX/priv/guest_ppc_toIR.c +++ b/VEX/priv/guest_ppc_toIR.c is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised nicely. Is it also possible to do negateVF32 with vectors, rather lane by lane as at present? Note, for a vector version of is_NaN, you can see more or less how to do it by looking at isNan() in host_ppc_isel.c. +static IRExpr* dnormV32_adj ( IRExpr* src ) nit: maybe rename this to be more consistent with your other vector-helper function names (is_Zero_Vector etc) + assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128, + unop( Iop_1Sto64, + mkexpr( VSCR_NJ ) ) , + unop( Iop_1Sto64, + mkexpr( VSCR_NJ ) ) ) ); nit: VSCR_NJ isn't used past this point. Change its type to Ity_I64 and lift the 1Sto64 operation into that definition, so it isn't duplicated here. --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S LafterFP2: + /* set host Vector Status Control Register bit NJ to zero + to ensure the host generate subnormal results for the + vector floating point instructions. */ + mfvscr 16 /* Clear NJ bit */ + vspltisw 9,0x1 /* 4x 0x00000001 */ + vspltisw 8,0x0 /* zero */ + vsldoi 9,8,9,0x2 /* <<2*8 => 4x 0x00010000 */ + vnor 9,9,9 /* 4x 0xFFFEFFFF */ + vand 16,16,9 /* Mask out NJ bit */ + mtvscr 16 (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff in this file. Otherwise this will fail when run on a non-Altivec enabled target (do we still support any of those)? And the same for the other to assembly files. (2) (As a check) is the above sequence runnable even on the lowest level Altivec subset? Otherwise (again) it will fail at run time. (3) I wasn't entirely clear what the changes to the post-run invariant checks are (after label "postamble:"). IIUC, they already do check that VSCR[NJ].host == 0, but the comments are wrong, and you've updated the comments, but not the code? Can you clarify/re-check? diff --git a/memcheck/tests/ppc32/vgcore.9687 b/memcheck/tests/ppc32/vgcore.9687 new file mode 100644 This should definitely not be in the patch!
Created attachment 120089 [details] Updated patch to fix issues with dnormal values v5 Updated the patch per latest comments from Julian. Split patch into VEX patch and test case patch. Renamed negateVF32(value) to negate_Vector( size, value), vectorized. Created new function is_NaN_Vector(size, value) Renamed dnormV32_adj() to dnorm_adj_Vector(). Lifted VSCR_NJ, made it Ity_I64. The various assembly routines: Removed my new code to set VSCR[NJ]=0 as it is redundant given that there is existing code that is already clearing the register. Updated the comments in the code to make it clear where VSCR[NJ] is set to 0. Updated comments with regard to the invarent check to make it clear what is being checked and what to do based on the check. There are now no functional changes to the assembly functions as they already ensure the VSCR[NJ] but is set to 0.
Created attachment 120090 [details] Update test case, add new test The test case patch that goes with the subnormal changes in VEX.
Julian: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_sho > w-5Fbug.cgi-3Fid-3D406256&d=DwIFaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RFEmMkZAk > -- > _wFGN5tkM_A&m=5o2Sjz00Tqqxz4dOXWvuGwWsbia9aURAkghSmeY0Sm0&s=1rRXfsCpJ > Dor_oQ1SSmOvBBYZkeJgD7aT4Iz_gYMxGk&e= > > --- Comment #8 from Julian Seward <jseward@acm.org> --- > Thanks for the respin. I have mostly only minor comments about > it. Is OK to > land provided all the comments below are addressed, except for the > one about > vectorising negateVF32, which would be nice to fix if you can do so > relatively > easily, but is not essential. > > Also, when landing, please split the patch into two parts: the > implementation > and the tests, and land the implementation first. Done. I had all the changes in one patch to make it easier to move files around to 5 different machines for testing. > > > --- a/VEX/priv/guest_ppc_toIR.c > +++ b/VEX/priv/guest_ppc_toIR.c > > is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised > nicely. Is > it also possible to do negateVF32 with vectors, rather lane > by lane as at > present? Note, for a vector version of is_NaN, you can see more or > less how > to do it by looking at isNan() in host_ppc_isel.c. > Renamed negateVF32(value) to negate_Vector( size, value) to make the naming more consistent. Also, structured it to generalize easily to more vector sizes. Currently just supporting vector of F32. This change requires creating the new function is_NaN_Vector(size, value) as you eluded to above. Again, structured the function to easily extend to more vector sizes. > > +static IRExpr* dnormV32_adj ( IRExpr* src ) > > nit: maybe rename this to be more consistent with your other vector- > helper > function names (is_Zero_Vector etc) > Yea, consistency is a good thing. Renamed dnormV32_adj() to dnorm_adj_Vector(). Note did not restructure the function to easily extend to other vector sizes. Left that for the future. > > + assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128, > + unop( Iop_1Sto64, > + mkexpr( VSCR_NJ ) ) , > + unop( Iop_1Sto64, > + mkexpr( VSCR_NJ ) ) ) ); > > nit: VSCR_NJ isn't used past this point. Change its type to Ity_I64 > and lift > the 1Sto64 operation into that definition, so it isn't duplicated > here. > Yea, that would be better. Fixed. > > --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S > +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S > > LafterFP2: > + /* set host Vector Status Control Register bit NJ to zero > + to ensure the host generate subnormal results for the > + vector floating point instructions. */ > + mfvscr 16 /* Clear NJ bit */ > + vspltisw 9,0x1 /* 4x 0x00000001 */ > + vspltisw 8,0x0 /* zero */ > + vsldoi 9,8,9,0x2 /* <<2*8 => 4x 0x00010000 > */ > + vnor 9,9,9 /* 4x 0xFFFEFFFF */ > + vand 16,16,9 /* Mask out NJ bit */ > + mtvscr 16 > > (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff > in > this file. Otherwise this will fail when run on a non-Altivec > enabled target > (do we still support any of those)? And the same for the other to > assembly > files. > (2) (As a check) is the above sequence runnable even on the lowest > level > Altivec subset? Otherwise (again) it will fail at run time. > > (3) I wasn't entirely clear what the changes to the post-run > invariant checks > are (after label "postamble:"). IIUC, they already do check that > VSCR[NJ].host == 0, but the comments are wrong, and you've updated > the > comments, but not the code? Can you clarify/re-check? When I went in to fix up the code per you comments, I noticed that the code I added to set VSCR[NJ] = 0 is not doing anything. I had missed the code a bit lower: /* set host AltiVec control word to the default mode expected by VEX-generated code. */ ld 6,.tocent__vgPlain_machine_ppc64_has_VMX@toc(2) ld 6,0(6) cmpldi 6,0 beq .LafterVMX2 vspltisw 3,0x0 /* generate zero */ mtvscr 3 which is forcing the entire VSCR register to zero. The instruction vtvscr moves the contents of the specified register to VSCR. This is actually what I had originally been looking for when I was trying to figure out why Valgrind always generated subnormal results but natively I wasn't getting subnormal results on BE. This code to set VSCR to zero exists on ppc32, ppc64le, ppc64be. The comment /* Check VSCR[NJ] == 1 */ really throws me, hence I was trying to make it clear that we need the NJ bit to be zero. The code is checking for the bit to be set 1, if it is, then it calls the invariant violation. Chanaged the comment to make this clearer to the reader what is going on. So, I removed my code to set VSCR[NJ] as it is overwritten anyway. I updated the code with comments to make it clearer where VSCR[NJ] gets set to zero and really what is going on with the invarient check. So, I claim there is now no functional changes in the assembly code. Retested on Power 6, Power 7, Power 8LE, Power 8BE, Power 9. > > diff --git a/memcheck/tests/ppc32/vgcore.9687 > b/memcheck/tests/ppc32/vgcore.9687 > new file mode 100644 > > This should definitely not be in the patch! Argh!! not sure how I managed that. Fixed > Given the changes to the assembly functions are not trivial, probably best for another quick review before committing. I have updated the the corresponding bugzilla: Bug 406256 - PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting https://bugs.kde.org/show_bug.cgi?id=406256 The changes to this patch require the patch for adding support to vlogefp and vexptefp instructions to change due to the function name changes. https://bugs.kde.org/show_bug.cgi?id=407340 This patch needs your OK on the name of the added Iop. Carl
> Updated patch to fix issues with dnormal values v5 (27.44 KB, patch) > 2019-05-15 21:27 UTC, Carl Love Details > Update test case, add new test (1.81 MB, patch) > 2019-05-15 21:28 UTC, Carl Love Carl, these look fine to land now. Thank you for your patience with this. I am particularly pleased that the supporting functions (is_NaN, etc) all got vectorised.
The VEX and testsuite patch were committed. commit d2cbb78a151256290d490fcb7a805884d6406a7e Author: Carl Love <carll@us.ibm.com> Date: Tue May 28 11:33:00 2019 -0500 PPC64, Subnormal testcase changes VEX patch fixed issues with generating subnormal results. This patch adds a specific test case and updates the expected values for the existing test case. Update jm-vmx tests, add subnormal test case. https://bugs.kde.org/show_bug.cgi?id=406256 commit 991db2a39bcbdbf5cdb4337684c29f96c63070a8 Author: Carl Love <carll@us.ibm.com> Date: Tue May 28 11:26:13 2019 -0500 PPC64, fix issues with dnormal values in the vector fp instructions. The result of the floating point instructions vmaddfp, vnmsubfp, vaddfp, vsubfp, vmaxfp, vminfp, vrefp, vrsqrtefp, vcmpeqfp, vcmpeqfp, vcmpgefp, vcmpgtfp are controlled by the setting of the NJ bit in the VSCR register. If VSCR[NJ] = 0; then denormalized values are handled as specified by Java and the IEEE standard. If the bit is a 1, then the denormalized element in the vector is replaced with a zero. Valgrind was not properly handling the denormalized case for these instructions. This patch fixes the issue. https://bugs.kde.org/show_bug.cgi?id=406256 Closing