Bug 397089

Summary: [PATCH] Incorrect decoding of three-register vmovss/vmovsd opcode 11h
Product: [Developer tools] valgrind Reporter: Tomas Trnka <tomastrnka>
Component: vexAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version First Reported In: 3.13.0   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Proposed patch

Description Tomas Trnka 2018-08-02 12:45:29 UTC
Created attachment 114267 [details]
Proposed patch

vmovss/vmovsd xmm1, xmm2, xmm3 using opcode 11h was decoded exactly the same as opcode 10h, swapping the xmm1 and xmm3 arguments. This leads to all kinds of floating point software going crazy under Valgrind because of assignments suddenly working in the wrong direction.
Comment 1 Julian Seward 2018-09-10 10:08:38 UTC
The really worrying thing here is not that there is a bug, but that
the test suite failed to detect it.  Investigating.
Comment 2 Julian Seward 2018-09-10 10:47:44 UTC
Tomas: the test suite (none/tests/amd64/avx-1.c) contains only this

GEN_test_Ronly(VMOVSS_REG_XMM, "vmovss %%xmm9, %%xmm7, %%xmm8")

and I don't know whether that is the 10h or 11h instruction since
the Intel docs give the same assembly syntax for both :-/  Obviously
it can't be both, with the effect that the Vex implementation of one
or the other must always have been un-tested.

Do you know if there is a way to write both variants, using the
GNU as syntax?  Do you maybe have a .o or .so file that uses both,
that you can objdump -d, to see if there is a difference in the
assembly syntax?
Comment 3 Tomas Trnka 2018-09-10 11:22:26 UTC
I don't know if there's any special assembly syntax to force one or the other. I wouldn't be surprised if there's not, because these three-register variants are functionally equivalent (except for the direction).

I have found that my GNU as (2.29.1) reliably picks opcode 11h when the first operand is xmm8+ and the third operand is xmm[0-7]. All other combinations seem to use opcode 10h. (No idea why. I just bruteforced all combinations). This is probably the reason why the three-register opcode 11h version is somewhat less common (and thus why most of the FP code out there seems to mostly work).

   0:   c5 6b 10 c1             vmovsd %xmm1,%xmm2,%xmm8

   0:   c5 6b 11 c1             vmovsd %xmm8,%xmm2,%xmm1
Comment 4 Julian Seward 2018-09-14 11:22:47 UTC
Landed as 27194eb985b22a66b439925e821a41800bcb881e.  Thanks for the patch.
Comment 5 Julian Seward 2018-09-14 11:26:31 UTC
Test cases landed as 529436ff596203a451cf525824b3538d006de0bb.