Bug 124499

Summary: amd64->IR: unhandled instruction bytes: 0xF 0xE 0x48 0x85 (femms)
Product: [Developer tools] valgrind Reporter: Joost VandeVondele <Joost.VandeVondele>
Component: memcheck Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 3.1.1   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   

Description Joost VandeVondele 2006-03-29 16:30:00 UTC
vex amd64->IR: unhandled instruction bytes: 0xF 0xE 0x48 0x85
==21409== Your program just tried to execute an instruction that Valgrind
...
==21409== Process terminating with default action of signal 4 (SIGILL)
==21409==  Illegal opcode at address 0x4B41300
==21409==    at 0x4B41300: dcopy_k (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

Until that point in the execution there were no other warnings. I'm wondering
whether this highly optimised library has a bug or uses some rare instructions.
Comment 1 Tom Hughes 2006-03-29 16:38:09 UTC
That's an FEMMS instruction, which doesn't seem to be supported in either x86 or amd64 mode at the moment, so it must be fairly unusual (it's an MMX "Fast Exit Multimedia State" instruction).
Comment 2 Tom Hughes 2006-03-29 16:41:00 UTC
Ah, it is actually officially a 3DNow! instruction (i.e. an AMD extension to MMX), and valgrind has never supported 3DNow! instructions. The ordinary EMMS instruction (0xF 0x77) is supported by both the x86 and amd64 backends.

According to the amd64 manual, FEMMS and EMMS are identical; FEMMS is only supported for backwards compatibility with older AMD processors, where it was presumably faster for some reason.
Comment 3 Julian Seward 2006-04-12 19:37:35 UTC
Joost, can you try the following patch?
Index: priv/guest-amd64/toIR.c
===================================================================
--- priv/guest-amd64/toIR.c     (revision 1602)
+++ priv/guest-amd64/toIR.c     (working copy)
@@ -13759,11 +13759,12 @@
          break;
       }

+      case 0x0E: /* FEMMS */
       case 0x77: /* EMMS */
          if (sz != 4)
             goto decode_failure;
          do_EMMS_preamble();
-         DIP("emms\n");
+         DIP("{f}emms\n");
          break;

       /* =-=-=-=-=-=-=-=-=- unimp2 =-=-=-=-=-=-=-=-=-=-= */
Comment 4 Joost VandeVondele 2006-04-12 20:08:30 UTC
Yes, this works fine.

However, a bit later I ran into the following:

vex amd64->IR: unhandled instruction bytes: 0x66 0x4C 0xF 0x50

in the same library:

==14700==  Illegal opcode at address 0x4B2D31F
==14700==    at 0x4B2D31F: idamax_ (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

Should I open another PR for this?
Comment 5 Julian Seward 2006-04-12 21:06:11 UTC
> vex amd64->IR: unhandled instruction bytes: 0x66 0x4C 0xF 0x50

I'm not sure what that is. 66 0F 50 is MOVMSKPD, which V supports,
and 4C is a valid amd64 (REX) prefix byte. Looking at V's insn decoder
logic, I think it should have accepted it, so I am a bit mystified.

Can you use objdump -d on the .so to find out what it really is?
(if you can send the few insns before and after it too, so much the
better).
Comment 6 Joost VandeVondele 2006-04-12 21:16:37 UTC
I guess this is the bit you need: it is the right function, and that instruction is there. Let me know if you need more (it is about 300000 lines):

   102cf:       90                      nop
   102d0:       0f 18 8e 00 08 00 00    prefetcht0 0x800(%rsi)
   102d7:       66 0f 28 4e 00          movapd 0x0(%rsi),%xmm1
   102dc:       66 41 0f 54 cf          andpd  %xmm15,%xmm1
   102e1:       66 0f c2 c8 00          cmpeqpd %xmm0,%xmm1
   102e6:       66 0f 28 5e 10          movapd 0x10(%rsi),%xmm3
   102eb:       66 41 0f 54 df          andpd  %xmm15,%xmm3
   102f0:       66 0f c2 d8 00          cmpeqpd %xmm0,%xmm3
   102f5:       66 0f 28 6e 20          movapd 0x20(%rsi),%xmm5
   102fa:       66 41 0f 54 ef          andpd  %xmm15,%xmm5
   102ff:       66 0f c2 e8 00          cmpeqpd %xmm0,%xmm5
   10304:       66 0f 28 7e 30          movapd 0x30(%rsi),%xmm7
   10309:       66 41 0f 54 ff          andpd  %xmm15,%xmm7
   1030e:       66 0f c2 f8 00          cmpeqpd %xmm0,%xmm7
   10313:       66 0f 56 cb             orpd   %xmm3,%xmm1
   10317:       66 0f 56 ef             orpd   %xmm7,%xmm5
   1031b:       66 0f 56 cd             orpd   %xmm5,%xmm1
   1031f:       66 4c 0f 50 d9          rex64X movmskpd %xmm1,%r11d
   10324:       49 f7 c3 03 00 00 00    test   $0x3,%r11
   1032b:       75 13                   jne    10340 <idamax_+0x2d0>
   1032d:       48 83 c6 40             add    $0x40,%rsi
   10331:       48 83 c0 08             add    $0x8,%rax
   10335:       49 ff c8                dec    %r8
   10338:       7f 96                   jg     102d0 <idamax_+0x260>
   1033a:       e9 b9 00 00 00          jmpq   103f8 <idamax_+0x388>
   1033f:       90                      nop
Comment 7 Julian Seward 2006-04-13 01:12:04 UTC
>    1031f:       66 4c 0f 50 d9          rex64X movmskpd %xmm1,%r11d

It seems to me this instruction has REX.W redundantly set to 1 
(hence giving 4c rather than 44) and this is fooling V's instruction
decoder.

Find this in VEX/priv/guest-amd64/toIR.c

   /* 66 0F 50 = MOVMSKPD - move 2 sign bits from 2 x F64 in xmm(E) to
      2 lowest bits of ireg(G) */
   if (have66noF2noF3(pfx) && sz == 2 
       && insn[0] == 0x0F && insn[1] == 0x50) {

(maybe around line 10342), and change  sz == 2  to  (sz == 2 || sz == 8),
rebuild the entire system, and try again.
Comment 8 Joost VandeVondele 2006-04-13 09:40:10 UTC
> change  sz == 2  to  (sz == 2 || sz == 8)

Yes, this seems to work as well. It seems that I can now run my code with this library in place. Thanks!

I'll try to run this lib's testsuite under valgrind to see if there are any further issues.
Comment 9 Joost VandeVondele 2006-04-13 14:54:28 UTC
There is one additional issue, similar to the previous one. I'm now getting:

vex amd64->IR: unhandled instruction bytes: 0x4C 0xF 0x50 0xD9
at 0x4B2EB63: icamax_ (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

which I guess is here:
   11b30:       0f 18 8e 00 04 00 00    prefetcht0 0x400(%rsi)
   11b37:       f2 0f 10 4e 00          movsd  0x0(%rsi),%xmm1
   11b3c:       0f 16 4e 08             movhps 0x8(%rsi),%xmm1
   11b40:       f2 0f 10 56 10          movsd  0x10(%rsi),%xmm2
   11b45:       0f 16 56 18             movhps 0x18(%rsi),%xmm2
   11b49:       0f 28 d9                movaps %xmm1,%xmm3
   11b4c:       0f c6 ca 88             shufps $0x88,%xmm2,%xmm1
   11b50:       0f c6 da dd             shufps $0xdd,%xmm2,%xmm3
   11b54:       41 0f 54 cf             andps  %xmm15,%xmm1
   11b58:       41 0f 54 df             andps  %xmm15,%xmm3
   11b5c:       0f 58 cb                addps  %xmm3,%xmm1
   11b5f:       0f c2 c8 00             cmpeqps %xmm0,%xmm1
   11b63:       4c 0f 50 d9             rex64X movmskps %xmm1,%r11d
   11b67:       49 f7 c3 0f 00 00 00    test   $0xf,%r11
   11b6e:       75 20                   jne    11b90 <icamax_+0x220>
   11b70:       48 83 c6 20             add    $0x20,%rsi
   11b74:       48 83 c0 04             add    $0x4,%rax
   11b78:       49 ff c8                dec    %r8
   11b7b:       7f b3                   jg     11b30 <icamax_+0x1c0>
   11b7d:       e9 9e 00 00 00          jmpq   11c20 <icamax_+0x2b0>
Comment 10 Julian Seward 2006-04-13 15:20:29 UTC
>    11b63:       4c 0f 50 d9             rex64X movmskps %xmm1,%r11d

Find the movmskps case in guest-amd64/toIR.c (line 8841?) and change
'sz == 4' to '(sz == 4 || sz == 8)'.  Does that work?
Comment 11 Joost VandeVondele 2006-04-13 15:35:26 UTC
Yes, that fixes the last issue, the testsuite yields a clean run.
Comment 12 Julian Seward 2006-04-14 00:08:04 UTC
Fixed (vex r1604).