Bug 276224

Summary: VEX implementation for PowerPC fp round instructions is incorrect for SNAN argument
Product: [Developer tools] valgrind Reporter: Maynard Johnson <maynardj>
Component: vexAssignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: normal CC: will_schmidt
Priority: LO    
Version First Reported In: 3.7 SVN   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Patch to fix this bug
Version 2 of the patch to fix this bug

Description Maynard Johnson 2011-06-22 00:28:28 UTC
Created attachment 61226 [details]
Patch to fix this bug

When a SNAN argument is given to frim, the VEX implementation for this instruction yields an incorrect result. The Power ISA document has an appendix that documents the floating point rounding algorithm (in ISA 2.06, "A.4 Floating-Point Round to Integer Model").  In this appendix, it describes that SNAN inputs must be treated specially.  The attached patch modifies the VEX implementation accordingly and adds a SNAN testcase.
Comment 1 Maynard Johnson 2011-06-22 20:25:23 UTC
I took the blinders off and realized that this bug affects ALL of the floating point round operations (frin, friz, frip, and frim).  I'll make a new patch and attach it later.
Comment 2 Maynard Johnson 2011-06-22 20:27:30 UTC
Comment on attachment 61226 [details]
Patch to fix this bug

This patch is not complete (see comment #1) since we need to fix the problem for ALL of the floating point round ops.
Comment 3 Maynard Johnson 2011-06-22 21:43:32 UTC
Created attachment 61257 [details]
Version 2 of the patch to fix this bug
Comment 4 Julian Seward 2011-07-05 10:08:13 UTC
This is going to give a substantial performance hit for those
instructions.  Are there any applications for which this inaccuracy
is a problem?
Comment 5 Maynard Johnson 2011-07-05 13:13:42 UTC
(In reply to comment #4)
> This is going to give a substantial performance hit for those
> instructions.  Are there any applications for which this inaccuracy
> is a problem?
Not that I know.  I became aware of the problem when I was implementing the new VSX xsrdpim and xsrdpip instructions (round to DP integer).  My testcase was more exhaustive than the fri* tests, so it encountered the problem with SNAN inputs.  IMHO, I think it's better for Valgrind to be accurate than fast.  But with that said, I'm not overly concerned that this patch gets committed now.  We could keep it in our back pocket to haul out if and when someone actually reports the problem.  But then again, why wait?
Comment 6 Julian Seward 2011-07-06 07:27:40 UTC
Accurate is good, but so is not being excessively expensive.  Try
writing a test program that does a billion frips etc in a loop.  I
would not be surprised to find that the slowdown running even with
--tool=none is around 100x.  FWIW, I usually aim for a
native-to-"none" slowdown of around 5x, and even that's too slow.

I would be more convinced if the implementation was more efficient.
As it stands it is going to generate tons of IR which the IR optimiser
has no hope of folding out, because it has no idea (at JIT time) if
the value in question is or isn't a NaN.

One point is that mkAND1 and mkOR1, in your implementation and helper
functions (is_NaN, is_Zero, etc) generates truly bad code.  Try
running the abovementioned test program with "--tool=none
--profile-flags=10001100".  One trick that can help is to redo them so
that they compute a 32-bit value which is zero to denote false and any
other value to denote true, and see if you can do it just with 32-bit
and/or/xor/not/shifts/constants.  Try to avoid Mux0X, since both
arms are computed even though one will not be used -- it is not lazy
like the C "? :" operator is.  With a bit of ingenuity it might be
possible to come up with a sequence of simple 32-bit and 64-bit
operations which compute a value which is either SNAN_MASK
or zero, which you can indiscriminately Xor64 with frD to get the
final result.
Comment 7 Maynard Johnson 2011-07-08 14:08:04 UTC
OK, I'll look into your suggestions in comment #6.  But for now, I'd like to lower the priority of this bug (I don't seem to have the capability to edit that field) so I can focus on the more important work of finishing up the POWER7 (ISA 2.06) support (i.e., stage 2 patch submitted via bug #276784; and the 3rd and final stage, which is under development).  Thanks.
Comment 8 Julian Seward 2011-07-24 14:44:07 UTC
Change prio to LOW as per comment #7.