Summary: | Add support for EXTRACTPS SSE 4.1 instruction in libVEX | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Veselin Georgiev <anrieff> |
Component: | vex | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | NOR | ||
Version: | 3.7 SVN | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | All | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | extractps addition to libvex; extractps testing utility; A patch against VEX |
Description
Veselin Georgiev
2010-12-05 03:40:05 UTC
Fixed, vex r2076. Thanks for the patch. I committed something a bit simpler because I think putIReg32(...) generates the correct semantics all the time.

Created attachment 56144
extractps testing utility

Created attachment 56145
A patch against VEX
(In reply to comment #1)
> Fixed, vex r2076. Thanks for the patch. I committed
> something a bit simpler because I think putIReg32(...)
> generates the correct semantics all the time.

Hello again, thanks for including this. Unfortunately, I still think that the code now in VEX is incomplete, as it doesn't catch the 64-bit destination-operand case. I updated to the latest SVN and, when running my extractps-testing tool under valgrind, it fails with:

vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0 0x48
==4999== valgrind: Unrecognised instruction at address 0x10000092c.
= ...

This is the 64-bit case (e.g., "extractps $0xf0, %%xmm7, %%rax"). The 32-bit cases work fine though.

Also, I don't know why, but under 64-bit Linux the code currently in VEX also works flawlessly. So I'm only experiencing the problem under Apple/64-bit.

I'm attaching the tool I use for testing (to be compiled with any 64-bit gcc). I'm also attaching a patch against the current code in VEX that fixes the 64-bit problem.

(In reply to comment #4)
> vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> 0x48

> Also, I don't know why, but under 64-bit Linux the code currently in VEX also
> works flawlessly.

It's because the Apple assembler inserts a redundant REX.W prefix byte. Compare the bytes in the failure message

0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0

to objdump of the Linux binary

4006a1: 66 0f 3a 17 f8 f0 extractps $0xf0,%xmm7,%eax

There is no 48 after the 66 here.

Instead of the patch in comment 3, can you try this? It is the same as yours, except it also allows a redundant REX.W==1 for the memory case, which I think would be OK.

Index: VEX/priv/guest_amd64_toIR.c
===================================================================
--- VEX/priv/guest_amd64_toIR.c (revision 2079)
+++ VEX/priv/guest_amd64_toIR.c (working copy)
@@ -14649,7 +14649,7 @@
       identical to PEXTRD, except that REX.W appears to be ignored.
    */
    if ( have66noF2noF3( pfx )
-        && sz == 2 /* REX.W == 0; perhaps too strict? */
+        && (sz == 2 || /* ignore redundant REX.W */ sz == 8)
         && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x17 ) {

       Int imm8_10;

(In reply to comment #5)
> (In reply to comment #4)
>
> > vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> > 0x48
>
> > Also, I don't know why, but under 64-bit Linux the code currently in VEX also
> > works flawlessly.
>
> It's because the Apple assembler inserts a redundant REX.W prefix
> byte. Compare the bytes in the failure message
>
> 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
>
> to objdump of the Linux binary
>
> 4006a1: 66 0f 3a 17 f8 f0 extractps $0xf0,%xmm7,%eax
>
> there is no 48 after the 66 here.
>
>
> Instead of the patch in comment 3, can you try this? It is the
> same as yours, except it also allows a redundant REX.W==1 for the
> memory case, which I think would be OK.
>
> Index: VEX/priv/guest_amd64_toIR.c
> ===================================================================
> --- VEX/priv/guest_amd64_toIR.c (revision 2079)
> +++ VEX/priv/guest_amd64_toIR.c (working copy)
> @@ -14649,7 +14649,7 @@
>        identical to PEXTRD, except that REX.W appears to be ignored.
>     */
>     if ( have66noF2noF3( pfx )
> -        && sz == 2 /* REX.W == 0; perhaps too strict? */
> +        && (sz == 2 || /* ignore redundant REX.W */ sz == 8)
>          && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x17 ) {
>
>        Int imm8_10;

Ah! Of course, you are right. I tried your patch - it works now under OS X. I knew it was strange to get different results between OS X and Linux...

Follow-up fix committed as vex r2081.
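
For readers following the thread, here is a minimal standalone sketch of why the strict check rejected the OS X encoding and the relaxed one accepts both. It is not VEX code: `PrefixInfo`, `scan_prefixes`, `is_extractps` and the `allow_rexw` flag are hypothetical names made up for illustration, and the scanner only handles the single 0x66-then-REX ordering seen in the two byte sequences quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the prefix bytes; NOT VEX's real Prefix type. */
typedef struct {
    bool   has66;     /* operand-size override prefix (0x66) seen */
    bool   rexW;      /* REX prefix seen with its W bit set */
    size_t opOffset;  /* index of the first opcode byte */
} PrefixInfo;

/* Scan an optional 0x66 followed by an optional REX byte. */
static PrefixInfo scan_prefixes(const uint8_t *code, size_t len)
{
    PrefixInfo p = { false, false, 0 };
    size_t i = 0;
    if (i < len && code[i] == 0x66) { p.has66 = true; i++; }
    /* In 64-bit mode REX prefixes are 0x40..0x4F; bit 3 is REX.W. */
    if (i < len && (code[i] & 0xF0) == 0x40) {
        p.rexW = (code[i] & 0x08) != 0;
        i++;
    }
    p.opOffset = i;
    return p;
}

/* allow_rexw == false models the strict "sz == 2" test,
   allow_rexw == true models the relaxed "sz == 2 || sz == 8" test. */
static bool is_extractps(const uint8_t *code, size_t len, bool allow_rexw)
{
    PrefixInfo p = scan_prefixes(code, len);
    if (!p.has66) return false;
    if (p.rexW && !allow_rexw) return false;
    if (len - p.opOffset < 3) return false;   /* need 0F 3A 17 */
    const uint8_t *op = code + p.opOffset;
    return op[0] == 0x0F && op[1] == 0x3A && op[2] == 0x17;
}

/* Byte sequences quoted in the comments above. */
static const uint8_t apple_bytes[] = { 0x66, 0x48, 0x0F, 0x3A, 0x17, 0xF8, 0xF0 };
static const uint8_t linux_bytes[] = { 0x66, 0x0F, 0x3A, 0x17, 0xF8, 0xF0 };

/* Checks the four strict/relaxed combinations for both encodings. */
void check_examples(void)
{
    assert(!is_extractps(apple_bytes, sizeof apple_bytes, false));
    assert( is_extractps(apple_bytes, sizeof apple_bytes, true));
    assert( is_extractps(linux_bytes, sizeof linux_bytes, false));
    assert( is_extractps(linux_bytes, sizeof linux_bytes, true));
}
```

Calling `check_examples()` from a test program exercises both encodings. The only difference between the two sequences is the 0x48 byte, which is a REX prefix with W set, so a matcher that insists on REX.W == 0 never reaches the 0F 3A 17 opcode test for the OS X bytes.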