124499 – amd64->IR: unhandled instruction bytes: 0xF 0xE 0x48 0x85 (femms)

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (other bugs)
Version First Reported In:	3.1.1
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

Reported:	2006-03-29 16:30 UTC by Joost VandeVondele
Modified:	2006-04-14 00:08 UTC
CC List:	0 users

Description Joost VandeVondele 2006-03-29 16:30:00 UTC

vex amd64->IR: unhandled instruction bytes: 0xF 0xE 0x48 0x85
==21409== Your program just tried to execute an instruction that Valgrind
...
==21409== Process terminating with default action of signal 4 (SIGILL)
==21409==  Illegal opcode at address 0x4B41300
==21409==    at 0x4B41300: dcopy_k (in
/users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

Up to that point in the execution there were no other warnings. I'm wondering
whether this highly optimised library has a bug or uses some rare instructions.

Comment 1 Tom Hughes 2006-03-29 16:38:09 UTC

That's an FEMMS instruction which doesn't seem to be supported in either x86 or amd64 mode at the moment so it must be fairly unusual (it's an MMX Fast Exit Multimedia State instruction).

Comment 2 Tom Hughes 2006-03-29 16:41:00 UTC

Ah, it is actually officially a 3DNow! instruction (i.e. an AMD extension to MMX), and valgrind has never supported 3DNow! instructions. The ordinary EMMS instruction (0xF 0x77) is supported by both the x86 and amd64 backends.

According to the amd64 manual, FEMMS and EMMS are identical - the FEMMS instruction is only supported for backwards compatibility with older AMD processors where it was presumably faster for some reason.

Comment 3 Julian Seward 2006-04-12 19:37:35 UTC

Joost, can you try the following patch?
Index: priv/guest-amd64/toIR.c
===================================================================
--- priv/guest-amd64/toIR.c     (revision 1602)
+++ priv/guest-amd64/toIR.c     (working copy)
@@ -13759,11 +13759,12 @@
          break;
       }

+      case 0x0E: /* FEMMS */
       case 0x77: /* EMMS */
          if (sz != 4)
             goto decode_failure;
          do_EMMS_preamble();
-         DIP("emms\n");
+         DIP("{f}emms\n");
          break;

       /* =-=-=-=-=-=-=-=-=- unimp2 =-=-=-=-=-=-=-=-=-=-= */

Comment 4 Joost VandeVondele 2006-04-12 20:08:30 UTC

Yes, this works fine.

However, a bit later I run into the following:

vex amd64->IR: unhandled instruction bytes: 0x66 0x4C 0xF 0x50

in the same library:

==14700==  Illegal opcode at address 0x4B2D31F
==14700==    at 0x4B2D31F: idamax_ (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

Should I open another PR for this?

Comment 5 Julian Seward 2006-04-12 21:06:11 UTC

> vex amd64->IR: unhandled instruction bytes: 0x66 0x4C 0xF 0x50

I'm not sure what that is.  66 0F 50 is MOVMSKPD, which V supports,
and 4C is a valid amd64 prefix byte, and looking at V's insn decoder
logic I think it should have accepted it.  So I am a bit mystified.

Can you use objdump -d on the .so to find out what it really is?
(if you can send the few insns before and after it too, so much the
better).

Comment 6 Joost VandeVondele 2006-04-12 21:16:37 UTC

I guess this is the bit you need: it is the right function and that instruction is there. Let me know if you need more (about 300000 lines):

   102cf:       90                      nop
   102d0:       0f 18 8e 00 08 00 00    prefetcht0 0x800(%rsi)
   102d7:       66 0f 28 4e 00          movapd 0x0(%rsi),%xmm1
   102dc:       66 41 0f 54 cf          andpd  %xmm15,%xmm1
   102e1:       66 0f c2 c8 00          cmpeqpd %xmm0,%xmm1
   102e6:       66 0f 28 5e 10          movapd 0x10(%rsi),%xmm3
   102eb:       66 41 0f 54 df          andpd  %xmm15,%xmm3
   102f0:       66 0f c2 d8 00          cmpeqpd %xmm0,%xmm3
   102f5:       66 0f 28 6e 20          movapd 0x20(%rsi),%xmm5
   102fa:       66 41 0f 54 ef          andpd  %xmm15,%xmm5
   102ff:       66 0f c2 e8 00          cmpeqpd %xmm0,%xmm5
   10304:       66 0f 28 7e 30          movapd 0x30(%rsi),%xmm7
   10309:       66 41 0f 54 ff          andpd  %xmm15,%xmm7
   1030e:       66 0f c2 f8 00          cmpeqpd %xmm0,%xmm7
   10313:       66 0f 56 cb             orpd   %xmm3,%xmm1
   10317:       66 0f 56 ef             orpd   %xmm7,%xmm5
   1031b:       66 0f 56 cd             orpd   %xmm5,%xmm1
   1031f:       66 4c 0f 50 d9          rex64X movmskpd %xmm1,%r11d
   10324:       49 f7 c3 03 00 00 00    test   $0x3,%r11
   1032b:       75 13                   jne    10340 <idamax_+0x2d0>
   1032d:       48 83 c6 40             add    $0x40,%rsi
   10331:       48 83 c0 08             add    $0x8,%rax
   10335:       49 ff c8                dec    %r8
   10338:       7f 96                   jg     102d0 <idamax_+0x260>
   1033a:       e9 b9 00 00 00          jmpq   103f8 <idamax_+0x388>
   1033f:       90                      nop

Comment 7 Julian Seward 2006-04-13 01:12:04 UTC

>    1031f:       66 4c 0f 50 d9          rex64X movmskpd %xmm1,%r11d

It seems to me this instruction has REX.W redundantly set to 1 
(hence giving 4c rather than 44) and this is fooling V's instruction
decoder.

Find this in VEX/priv/guest-amd64/toIR.c

   /* 66 0F 50 = MOVMSKPD - move 2 sign bits from 2 x F64 in xmm(E) to
      2 lowest bits of ireg(G) */
   if (have66noF2noF3(pfx) && sz == 2 
       && insn[0] == 0x0F && insn[1] == 0x50) {

(maybe around line 10342), and change  sz == 2  to  (sz == 2 || sz == 8),
rebuild entire system, and try again.
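
For readers following along, the REX.W observation above can be checked by hand. The sketch below is illustrative only and is not code from VEX: the names RexBits, is_rex_prefix, decode_rex and modrm_reg are made up for this example. It assumes the standard amd64 REX layout 0100WRXB and shows that 0x4C and 0x44 differ only in the W bit, while REX.R is what selects r11 from the ModRM byte 0xD9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an amd64 REX prefix byte, laid out as 0100WRXB.
   (Illustrative type; not taken from VEX.) */
typedef struct {
    bool w; /* 64-bit operand size */
    bool r; /* extends ModRM.reg */
    bool x; /* extends SIB.index */
    bool b; /* extends ModRM.rm or SIB.base */
} RexBits;

/* A REX prefix is any byte in the range 0x40..0x4F. */
bool is_rex_prefix(uint8_t byte) {
    return (byte & 0xF0) == 0x40;
}

/* Split a REX prefix byte into its four flag bits. */
RexBits decode_rex(uint8_t byte) {
    assert(is_rex_prefix(byte));
    RexBits rex = {
        (byte & 0x08) != 0,
        (byte & 0x04) != 0,
        (byte & 0x02) != 0,
        (byte & 0x01) != 0,
    };
    return rex;
}

/* Register number selected by ModRM.reg, extended to 4 bits by REX.R. */
unsigned modrm_reg(RexBits rex, uint8_t modrm) {
    return ((modrm >> 3) & 7u) | (rex.r ? 8u : 0u);
}
```

With these helpers, decode_rex(0x4C) reports W and R set, decode_rex(0x44) reports only R set, and modrm_reg applied to 0xD9 yields register 11 in both cases, which matches the %r11d operand in the objdump line.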

Comment 8 Joost VandeVondele 2006-04-13 09:40:10 UTC

> change  sz == 2  to  (sz == 2 || sz == 8)

Yes, this seems to work as well. It seems that I can now run my code with this library in place. Thanks!

I'll try to run this lib's testsuite under valgrind to see if there are any further issues.

Comment 9 Joost VandeVondele 2006-04-13 14:54:28 UTC

There is one additional issue, similar to the previous one. I'm now getting:

vex amd64->IR: unhandled instruction bytes: 0x4C 0xF 0x50 0xD9
at 0x4B2EB63: icamax_ (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

which I guess is here:
   11b30:       0f 18 8e 00 04 00 00    prefetcht0 0x400(%rsi)
   11b37:       f2 0f 10 4e 00          movsd  0x0(%rsi),%xmm1
   11b3c:       0f 16 4e 08             movhps 0x8(%rsi),%xmm1
   11b40:       f2 0f 10 56 10          movsd  0x10(%rsi),%xmm2
   11b45:       0f 16 56 18             movhps 0x18(%rsi),%xmm2
   11b49:       0f 28 d9                movaps %xmm1,%xmm3
   11b4c:       0f c6 ca 88             shufps $0x88,%xmm2,%xmm1
   11b50:       0f c6 da dd             shufps $0xdd,%xmm2,%xmm3
   11b54:       41 0f 54 cf             andps  %xmm15,%xmm1
   11b58:       41 0f 54 df             andps  %xmm15,%xmm3
   11b5c:       0f 58 cb                addps  %xmm3,%xmm1
   11b5f:       0f c2 c8 00             cmpeqps %xmm0,%xmm1
   11b63:       4c 0f 50 d9             rex64X movmskps %xmm1,%r11d
   11b67:       49 f7 c3 0f 00 00 00    test   $0xf,%r11
   11b6e:       75 20                   jne    11b90 <icamax_+0x220>
   11b70:       48 83 c6 20             add    $0x20,%rsi
   11b74:       48 83 c0 04             add    $0x4,%rax
   11b78:       49 ff c8                dec    %r8
   11b7b:       7f b3                   jg     11b30 <icamax_+0x1c0>
   11b7d:       e9 9e 00 00 00          jmpq   11c20 <icamax_+0x2b0>

Comment 10 Julian Seward 2006-04-13 15:20:29 UTC

>    11b63:       4c 0f 50 d9             rex64X movmskps %xmm1,%r11d

Find the movmskps case in guest-amd64/toIR.c (line 8841?) and change
'sz == 4' to '(sz == 4 || sz == 8)'.  Does that work?
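
As a rough illustration of why both relaxations are needed, here is a simplified model of the operand-size check. It is a sketch under assumptions, not the VEX decoder: it assumes sz defaults to 4, becomes 2 with a 0x66 prefix, and becomes 8 whenever REX.W is set, which is consistent with the behaviour described in this thread. The names Prefixes, scan_prefixes, operand_size, accepts_movmskpd and accepts_movmskps are hypothetical, and the "relaxed" flag stands in for the suggested change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the prefixes seen before the 0x0F escape byte. */
typedef struct {
    bool has_66;   /* 0x66 operand-size prefix present */
    bool rex_w;    /* REX prefix present with W = 1 */
    size_t length; /* number of prefix bytes consumed */
} Prefixes;

/* Scan an optional 0x66 prefix followed by an optional REX prefix. */
Prefixes scan_prefixes(const uint8_t *insn, size_t n) {
    assert(insn != NULL);
    Prefixes p = { false, false, 0 };
    size_t i = 0;
    if (i < n && insn[i] == 0x66) {
        p.has_66 = true;
        i++;
    }
    if (i < n && (insn[i] & 0xF0) == 0x40) {
        p.rex_w = (insn[i] & 0x08) != 0;
        i++;
    }
    p.length = i;
    return p;
}

/* Assumed rule: default 4, 0x66 gives 2, REX.W overrides both with 8. */
int operand_size(Prefixes p) {
    int sz = 4;
    if (p.has_66) sz = 2;
    if (p.rex_w) sz = 8;
    return sz;
}

/* MOVMSKPD (66 0F 50): strict check wants sz == 2, relaxed also takes 8. */
bool accepts_movmskpd(Prefixes p, bool relaxed) {
    int sz = operand_size(p);
    return p.has_66 && (sz == 2 || (relaxed && sz == 8));
}

/* MOVMSKPS (0F 50): strict check wants sz == 4, relaxed also takes 8. */
bool accepts_movmskps(Prefixes p, bool relaxed) {
    int sz = operand_size(p);
    return !p.has_66 && (sz == 4 || (relaxed && sz == 8));
}
```

Under this model, both 66 4C 0F 50 D9 and 4C 0F 50 D9 end up with sz == 8, so the strict checks reject them and the relaxed ones accept them.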

Comment 11 Joost VandeVondele 2006-04-13 15:35:26 UTC

Yes, that fixes the last issue; the testsuite yields a clean run.

Comment 12 Julian Seward 2006-04-14 00:08:04 UTC

Fixed (vex r1604).