Created attachment 114267 [details] Proposed patch vmovss/vmovsd xmm1, xmm2, xmm3 using opcode 11h was decoded exactly the same as opcode 10h, swapping the xmm1 and xmm3 arguments. This leads to all kinds of floating point software going crazy under Valgrind because of assignments suddenly working in the wrong direction.
The really worrying thing here is not that there is a bug, but that the test suite failed to detect it. Investigating.
Tomas: the test suite (none/tests/amd64/avx-1.c) contains only this GEN_test_Ronly(VMOVSS_REG_XMM, "vmovss %%xmm9, %%xmm7, %%xmm8") and I don't know whether that is the 10h or 11h instruction since the Intel docs give the same assembly syntax for both :-/ Obviously it can't be both, with the effect that the Vex implementation of one or the other must always have been un-tested. Do you know if there is a way to write both variants, using the GNU as syntax? Do you maybe have a .o or .so file that uses both, that you can objdump -d, to see if there is a difference in the assembly syntax?
I don't know if there's any special assembly syntax to force one or the other. I wouldn't be surprised if there's not, because these three-register variants are functionally equivalent (except for the direction). I have found that my GNU as (2.29.1) reliably picks opcode 11h when the first operand is xmm8+ and the third operand is xmm[0-7]. All other combinations seem to use opcode 10h. (No idea why. I just bruteforced all combinations). This is probably the reason why the three-register opcode 11h version is somewhat less common (and thus why most of the FP code out there seems to mostly work). 0: c5 6b 10 c1 vmovsd %xmm1,%xmm2,%xmm8 0: c5 6b 11 c1 vmovsd %xmm8,%xmm2,%xmm1
Landed as 27194eb985b22a66b439925e821a41800bcb881e. Thanks for the patch.
Test cases landed as 529436ff596203a451cf525824b3538d006de0bb.