Created attachment 61226 [details] Patch to fix this bug When a SNAN argument is given to frim, the VEX implementation for this instruction yields an incorrect result. The Power ISA document has an appendix that documents the floating point rounding algorithm (in ISA 2.06, "A.4 Floating-Point Round to Integer Model"). In this appendix, it describes that SNAN inputs must be treated specially. The attached patch modifies the VEX implementation accordingly and adds a SNAN testcase.
I took the blinders off and realized that this bug affects ALL of the floating point round operations (frin, friz, frip, and frim). I'll make a new patch and attach it later.
Comment on attachment 61226 [details] Patch to fix this bug This patch is not complete (see comment #1) since we need to fix the problem for ALL of the floating point round ops.
Created attachment 61257 [details] Version 2 of the patch to fix this bug
This is going to give a substantial performance hit for those instructions. Are there any applications for which this inaccuracy is a problem?
(In reply to comment #4) > This is going to give a substantial performance hit for those > instructions. Are there any applications for which this inaccuracy > is a problem? Not that I know. I became aware of the problem when I was implementing the new VSX xsrdpim and xsrdpip instructions (round to DP integer). My testcase was more exhaustive than the fri* tests, so it encountered the problem with SNAN inputs. IMHO, I think it's better for Valgrind to be accurate than fast. But with that said, I'm not overly concerned that this patch gets committed now. We could keep it in our back pocket to haul out if and when someone actually reports the problem. But then again, why wait?
Accurate is good, but so is not being excessively expensive. Try writing a test program that does a billion frips etc in a loop. I would not be surprised to find that the slowdown running even with --tool=none is around 100x. FWIW, I usually aim for a native-to-"none" slowdown of around 5x, and even that's too slow. I would be more convinced if the implementation was more efficient. As it stands it is going to generate tons of IR which the IR optimiser has no hope of folding out, because it has no idea (at JIT time) if the value in question is or isn't a NaN. One point is that mkAND1 and mkOR1, in your implementation and helper functions (is_NaN, is_Zero, etc) generates truly bad code. Try running the abovementioned test program with "--tool=none --profile-flags=10001100". One trick that can help is to redo them so that they compute a 32-bit value which is zero to denote false and any other value to denote true, and see if you can do it just with 32-bit and/or/xor/not/shifts/constants. Try to avoid Mux0X, since both arms are computed even though one will not be used -- it is not lazy like the C "? :" operator is. With a bit of ingenuity it might be possible to come up with a sequence of simple 32-bit and 64-bit operations which compute a value which is either SNAN_MASK or zero, which you can indiscriminately Xor64 with frD to get the final result.
OK, I'll look into your suggestions in comment #6. But for now, I'd like to lower the priority of this bug (I don't seem to have the capability to edit that field) so I can focus on the more important work of finishing up the POWER7 (ISA 2.06) support (i.e., stage 2 patch submitted via bug #276784; and the 3rd and final stage, which is under development). Thanks.
Change prio to LOW as per comment #7.