Bug 258870

Summary:	Add support for EXTRACTPS SSE 4.1 instruction in libVEX
Product:	[Developer tools] valgrind	Reporter:	Veselin Georgiev <anrieff>
Component:	vex	Assignee:	Julian Seward <jseward>
Status:	RESOLVED FIXED
Severity:	normal
Priority:	NOR
Version:	3.7 SVN
Target Milestone:	---
Platform:	Compiled Sources
OS:	All
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	extractps addition to libvex extractps testing utility A patch against VEX

Description Veselin Georgiev 2010-12-05 03:40:05 UTC

Created attachment 54129 [details]
extractps addition to libvex

Hello,

I am new to this, so please be lenient.

I've implemented the extractps instruction into libvex out of necessity.
I was hunting down a bug in a plugin for Autodesk Maya 2011 for Mac OS X on a Snow Leopard machine. However, while the application seems to start up under valgrind, at some point it halts because of an unhandled instruction, which turned out to be EXTRACTPS. So I got the SVN code and confirmed there is no support for the instruction in libVEX and so I investigated whether I could implement it. It turned out being easy, since the instruction is all very similar to the already present PEXTRD (the only difference between them is that PEXTRD cannot have a 64-bit destination register (with REX.W on, it becomes PEXTRQ); however, EXTRACTPS can output to a 64-bit general purpose register in which case the upper 32bits of the destination are zeroed - see http://www.intel.com/technology/itj/2008/v12i3/3-paper/4-SSE.htm ).

So at this point I have:

* (Probably) A working implementation of EXTRACTPS in libVEX; it borrows code from PEXTRD and handles the 64-bit case. See the attached .diff file.
* A custom test program that is not integrated in any way into the valgrind's test system. I didn't figure out how to run the tests anyway. But I've written this small program which issues EXTACTPS'es with all kinds of parameters and assert()s validity of outputs (tests all output options - r32, r64 and mem32). With real hardware no assert is triggered, same for valgrind, so it seems to work. Maya 2011 also ran after fine with this fix.

So I'm attaching this addition to libVEX here so you can review it. Also, can you point me how can I produce a test case for EXTRACTPS that fits into valgrind nicely? I can also give you my test program so you can adapt it, but that could be on Monday since I forgot it at work.

Comment 1 Julian Seward 2011-01-17 11:40:47 UTC

Fixed, vex r2076.  Thanks for the patch.  I committed 
something a bit simpler because I think putIReg32(...)
generates the correct semantics all the time.

Comment 2 Veselin Georgiev 2011-01-17 20:24:14 UTC

Created attachment 56144 [details]
extractps testing utility

Comment 3 Veselin Georgiev 2011-01-17 20:25:05 UTC

Created attachment 56145 [details]
A patch against VEX

Comment 4 Veselin Georgiev 2011-01-17 20:25:58 UTC

(In reply to comment #1)
> Fixed, vex r2076.  Thanks for the patch.  I committed 
> something a bit simpler because I think putIReg32(...)
> generates the correct semantics all the time.

Hello again,

thanks for including this. Unfortunately, I still think that the code now in VEX is incomplete, as it doesn't catch the 64-bit destination-operand case. I updated the to the latest SVN and, when running my extractps-testing tool under valgrind, it fails with:

vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0 0x48
==4999== valgrind: Unrecognised instruction at address 0x10000092c.
=
...

This is the 64-bit case (e.g., "extractps  $0xf0, %%xmm7, %%rax").
The 32-bit cases work fine though.
Also, I don't know why, but under 64-bit Linux the code currently in VEX also works flawlessly.
So I'm only experiencing the problem under Apple/64-bit.

I'm attaching the tool I use for testing (to be compiled with any 64-bit gcc).
I'm also attaching a patch against the current code in VEX that fixes the 64-bit problem.

Comment 5 Julian Seward 2011-01-17 23:59:22 UTC

(In reply to comment #4)

> vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> 0x48
> Also, I don't know why, but under 64-bit Linux the code currently in VEX also
> works flawlessly.

It's because the Apple assembler inserts a redundant REX.W prefix
byte.  Compare the bytes in the failure message

   0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0

to objdump of the Linux binary

  4006a1:	66 0f 3a 17 f8 f0    	extractps $0xf0,%xmm7,%eax

there is no 48 after the 66 here.


Instead of the patch in comment 3, can you try this?  It is the
same as yours, except it also allows a redundant REX.W==1 for the
memory case, which I think would be OK.

Index: VEX/priv/guest_amd64_toIR.c
===================================================================
--- VEX/priv/guest_amd64_toIR.c	(revision 2079)
+++ VEX/priv/guest_amd64_toIR.c	(working copy)
@@ -14649,7 +14649,7 @@
       identical to PEXTRD, except that REX.W appears to be ignored.
    */
    if ( have66noF2noF3( pfx ) 
-        && sz == 2  /* REX.W == 0; perhaps too strict? */
+        && (sz == 2 || /* ignore redundant REX.W */ sz == 8)
         && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x17 ) {
 
       Int imm8_10;

Comment 6 Veselin Georgiev 2011-01-18 11:53:27 UTC

(In reply to comment #5)
> (In reply to comment #4)
> 
> > vex amd64->IR: unhandled instruction bytes: 0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> > 0x48
> > Also, I don't know why, but under 64-bit Linux the code currently in VEX also
> > works flawlessly.
> 
> It's because the Apple assembler inserts a redundant REX.W prefix
> byte.  Compare the bytes in the failure message
> 
>    0x66 0x48 0xF 0x3A 0x17 0xF8 0xF0
> 
> to objdump of the Linux binary
> 
>   4006a1:    66 0f 3a 17 f8 f0        extractps $0xf0,%xmm7,%eax
> 
> there is no 48 after the 66 here.
> 
> 
> Instead of the patch in comment 3, can you try this?  It is the
> same as yours, except it also allows a redundant REX.W==1 for the
> memory case, which I think would be OK.
> 
> Index: VEX/priv/guest_amd64_toIR.c
> ===================================================================
> --- VEX/priv/guest_amd64_toIR.c    (revision 2079)
> +++ VEX/priv/guest_amd64_toIR.c    (working copy)
> @@ -14649,7 +14649,7 @@
>        identical to PEXTRD, except that REX.W appears to be ignored.
>     */
>     if ( have66noF2noF3( pfx ) 
> -        && sz == 2  /* REX.W == 0; perhaps too strict? */
> +        && (sz == 2 || /* ignore redundant REX.W */ sz == 8)
>          && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x17 ) {
> 
>        Int imm8_10;

Ah! Of course, you are right. I tried your patch - it works now under OS X.
I knew it was strange to get different results between OS X and Linux...

Comment 7 Julian Seward 2011-01-19 13:24:22 UTC

Followup fixed committed as vex r2081.