Bug 258870 - Add support for EXTRACTPS SSE 4.1 instruction in libVEX
Summary: Add support for EXTRACTPS SSE 4.1 instruction in libVEX
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.7 SVN
Platform: Compiled Sources All
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-05 03:40 UTC by Veselin Georgiev
Modified: 2011-01-19 13:24 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
extractps addition to libvex (3.85 KB, patch)
2010-12-05 03:40 UTC, Veselin Georgiev
Details
extractps testing utility (3.45 KB, text/x-csrc)
2011-01-17 20:24 UTC, Veselin Georgiev
Details
A patch against VEX (527 bytes, patch)
2011-01-17 20:25 UTC, Veselin Georgiev
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Veselin Georgiev 2010-12-05 03:40:05 UTC
Created attachment 54129 [details]
extractps addition to libvex

Hello,

I am new to this, so please be lenient.

I've implemented the extractps instruction into libvex out of necessity.
I was hunting down a bug in a plugin for Autodesk Maya 2011 for Mac OS X on a Snow Leopard machine. However, while the application seems to start up under valgrind, at some point it halts because of an unhandled instruction, which turned out to be EXTRACTPS. So I got the SVN code and confirmed there is no support for the instruction in libVEX and so I investigated whether I could implement it. It turned out being easy, since the instruction is all very similar to the already present PEXTRD (the only difference between them is that PEXTRD cannot have a 64-bit destination register (with REX.W on, it becomes PEXTRQ); however, EXTRACTPS can output to a 64-bit general purpose register in which case the upper 32bits of the destination are zeroed - see http://www.intel.com/technology/itj/2008/v12i3/3-paper/4-SSE.htm ).

So at this point I have:

* (Probably) A working implementation of EXTRACTPS in libVEX; it borrows code from PEXTRD and handles the 64-bit case. See the attached .diff file.
* A custom test program that is not integrated in any way into the valgrind's test system. I didn't figure out how to run the tests anyway. But I've written this small program which issues EXTACTPS'es with all kinds of parameters and assert()s validity of outputs (tests all output options - r32, r64 and mem32). With real hardware no assert is triggered, same for valgrind, so it seems to work. Maya 2011 also ran after fine with this fix.

So I'm attaching this addition to libVEX here so you can review it. Also, can you point me how can I produce a test case for EXTRACTPS that fits into valgrind nicely? I can also give you my test program so you can adapt it, but that could be on Monday since I forgot it at work.
Comment 1 Julian Seward 2011-01-17 11:40:47 UTC
Fixed, vex r2076.  Thanks for the patch.  I committed 
something a bit simpler because I think putIReg32(...)
generates the correct semantics all the time.
Comment 2 Veselin Georgiev 2011-01-17 20:24:14 UTC
Created attachment 56144 [details]
extractps testing utility
Comment 3 Veselin Georgiev 2011-01-17 20:25:05 UTC
Created attachment 56145 [details]
A patch against VEX
Comment 4 Veselin Georgiev 2011-01-17 20:25:58 UTC
(In reply to comment #1)
> Fixed, vex r2076.  Thanks for the patch.  I committed 
> something a bit simpler because I think putIReg32(...)
> generates the correct semantics all the time.

Hello again,

thanks for including this. Unfortunately, I still think that the code now in VEX is incomplete, as it doesn't catch the 64-bit destination-operand case. I updated the to the latest SVN and, when running my extractps-testing tool under valgrind, it fails with:

vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0 0x48
==4999== valgrind: Unrecognised instruction at address 0x10000092c.
=
...

This is the 64-bit case (e.g., "extractps  $0xf0, %%xmm7, %%rax").
The 32-bit cases work fine though.
Also, I don't know why, but under 64-bit Linux the code currently in VEX also works flawlessly.
So I'm only experiencing the problem under Apple/64-bit.

I'm attaching the tool I use for testing (to be compiled with any 64-bit gcc).
I'm also attaching a patch against the current code in VEX that fixes the 64-bit problem.
Comment 5 Julian Seward 2011-01-17 23:59:22 UTC
(In reply to comment #4)

> vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> 0x48
> Also, I don't know why, but under 64-bit Linux the code currently in VEX also
> works flawlessly.

It's because the Apple assembler inserts a redundant REX.W prefix
byte.  Compare the bytes in the failure message

   0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0

to objdump of the Linux binary

  4006a1:	66 0f 3a 17 f8 f0    	extractps $0xf0,%xmm7,%eax

there is no 48 after the 66 here.


Instead of the patch in comment 3, can you try this?  It is the
same as yours, except it also allows a redundant REX.W==1 for the
memory case, which I think would be OK.

Index: VEX/priv/guest_amd64_toIR.c
===================================================================
--- VEX/priv/guest_amd64_toIR.c	(revision 2079)
+++ VEX/priv/guest_amd64_toIR.c	(working copy)
@@ -14649,7 +14649,7 @@
       identical to PEXTRD, except that REX.W appears to be ignored.
    */
    if ( have66noF2noF3( pfx ) 
-        && sz == 2  /* REX.W == 0; perhaps too strict? */
+        && (sz == 2 || /* ignore redundant REX.W */ sz == 8)
         && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x17 ) {
 
       Int imm8_10;
Comment 6 Veselin Georgiev 2011-01-18 11:53:27 UTC
(In reply to comment #5)
> (In reply to comment #4)
> 
> > vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> > 0x48
> > Also, I don't know why, but under 64-bit Linux the code currently in VEX also
> > works flawlessly.
> 
> It's because the Apple assembler inserts a redundant REX.W prefix
> byte.  Compare the bytes in the failure message
> 
>    0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> 
> to objdump of the Linux binary
> 
>   4006a1:    66 0f 3a 17 f8 f0        extractps $0xf0,%xmm7,%eax
> 
> there is no 48 after the 66 here.
> 
> 
> Instead of the patch in comment 3, can you try this?  It is the
> same as yours, except it also allows a redundant REX.W==1 for the
> memory case, which I think would be OK.
> 
> Index: VEX/priv/guest_amd64_toIR.c
> ===================================================================
> --- VEX/priv/guest_amd64_toIR.c    (revision 2079)
> +++ VEX/priv/guest_amd64_toIR.c    (working copy)
> @@ -14649,7 +14649,7 @@
>        identical to PEXTRD, except that REX.W appears to be ignored.
>     */
>     if ( have66noF2noF3( pfx ) 
> -        && sz == 2  /* REX.W == 0; perhaps too strict? */
> +        && (sz == 2 || /* ignore redundant REX.W */ sz == 8)
>          && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x17 ) {
> 
>        Int imm8_10;

Ah! Of course, you are right. I tried your patch - it works now under OS X.
I knew it was strange to get different results between OS X and Linux...
Comment 7 Julian Seward 2011-01-19 13:24:22 UTC
Followup fixed committed as vex r2081.