Summary: | amd64->IR: unhandled instruction bytes: 0xF 0xE 0x48 0x85 (femms) | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Joost VandeVondele <Joost.VandeVondele> |
Component: | memcheck | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | NOR | ||
Version: | 3.1.1 | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Description
Joost VandeVondele
2006-03-29 16:30:00 UTC
That's an FEMMS instruction, which doesn't seem to be supported in either x86 or amd64 mode at the moment, so it must be fairly unusual (it's an MMX Fast Exit Multimedia State instruction).

Ah, it is actually officially a 3DNow! instruction (i.e. an AMD extension to MMX) and valgrind has never supported 3DNow! instructions.

The ordinary EMMS instruction (0xF 0x77) is supported by both the x86 and amd64 backends. According to the amd64 manual, FEMMS and EMMS are identical - the FEMMS instruction is only supported for backwards compatibility with older AMD processors, where it was presumably faster for some reason.

Joost, can you try the following patch?

Index: priv/guest-amd64/toIR.c
===================================================================
--- priv/guest-amd64/toIR.c (revision 1602)
+++ priv/guest-amd64/toIR.c (working copy)
@@ -13759,11 +13759,12 @@
          break;
       }
 
+      case 0x0E: /* FEMMS */
       case 0x77: /* EMMS */
          if (sz != 4)
             goto decode_failure;
          do_EMMS_preamble();
-         DIP("emms\n");
+         DIP("{f}emms\n");
          break;
 
       /* =-=-=-=-=-=-=-=-=- unimp2 =-=-=-=-=-=-=-=-=-=-= */

Yes, this works fine. However, a bit later I run into the following:

vex amd64->IR: unhandled instruction bytes: 0x66 0x4C 0xF 0x50

in the same library:

==14700== Illegal opcode at address 0x4B2D31F
==14700== at 0x4B2D31F: idamax_ (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

Should I open another PR for this?

> vex amd64->IR: unhandled instruction bytes: 0x66 0x4C 0xF 0x50
I'm not sure what that is. 66 0F 50 is MOVMSKPD, which V supports,
and 4C is a valid amd64 prefix byte, and looking at V's insn decoder
logic I think it should have accepted it. So am a bit mystified.
Can you use objdump -d on the .so to find out what it really is?
(if you can send the few insns before and after it too, so much the
better).
I guess this is the bit you need, it is the right function and that instruction is there. Let me know if you need more (about 300000 lines):

102cf: 90                    nop
102d0: 0f 18 8e 00 08 00 00  prefetcht0 0x800(%rsi)
102d7: 66 0f 28 4e 00        movapd 0x0(%rsi),%xmm1
102dc: 66 41 0f 54 cf        andpd %xmm15,%xmm1
102e1: 66 0f c2 c8 00        cmpeqpd %xmm0,%xmm1
102e6: 66 0f 28 5e 10        movapd 0x10(%rsi),%xmm3
102eb: 66 41 0f 54 df        andpd %xmm15,%xmm3
102f0: 66 0f c2 d8 00        cmpeqpd %xmm0,%xmm3
102f5: 66 0f 28 6e 20        movapd 0x20(%rsi),%xmm5
102fa: 66 41 0f 54 ef        andpd %xmm15,%xmm5
102ff: 66 0f c2 e8 00        cmpeqpd %xmm0,%xmm5
10304: 66 0f 28 7e 30        movapd 0x30(%rsi),%xmm7
10309: 66 41 0f 54 ff        andpd %xmm15,%xmm7
1030e: 66 0f c2 f8 00        cmpeqpd %xmm0,%xmm7
10313: 66 0f 56 cb           orpd %xmm3,%xmm1
10317: 66 0f 56 ef           orpd %xmm7,%xmm5
1031b: 66 0f 56 cd           orpd %xmm5,%xmm1
1031f: 66 4c 0f 50 d9        rex64X movmskpd %xmm1,%r11d
10324: 49 f7 c3 03 00 00 00  test $0x3,%r11
1032b: 75 13                 jne 10340 <idamax_+0x2d0>
1032d: 48 83 c6 40           add $0x40,%rsi
10331: 48 83 c0 08           add $0x8,%rax
10335: 49 ff c8              dec %r8
10338: 7f 96                 jg 102d0 <idamax_+0x260>
1033a: e9 b9 00 00 00        jmpq 103f8 <idamax_+0x388>
1033f: 90                    nop

> 1031f: 66 4c 0f 50 d9 rex64X movmskpd %xmm1,%r11d
It seems to me this instruction has REX.W redundantly set to 1
(hence giving 4c rather than 44) and this is fooling V's instruction
decoder.
Find this in VEX/priv/guest-amd64/toIR.c:
   /* 66 0F 50 = MOVMSKPD - move 2 sign bits from 2 x F64 in xmm(E) to
      2 lowest bits of ireg(G) */
   if (have66noF2noF3(pfx) && sz == 2
       && insn[0] == 0x0F && insn[1] == 0x50) {
(maybe around line 10342), and change sz == 2 to (sz == 2 || sz == 8),
rebuild entire system, and try again.
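As a rough illustration of the REX.W point above, here is a minimal C sketch. It is a simplified model, not the VEX code: the `Prefixes` struct and the functions `scan_prefixes`, `operand_size`, `movmskpd_guard_strict` and `movmskpd_guard_relaxed` are hypothetical names, and the size rule (REX.W gives 8, otherwise 0x66 gives 2, otherwise 4) is an assumption about how a decoder might pick `sz`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of amd64 prefix handling.
   It is NOT the VEX source; all names are invented for illustration. */
typedef struct {
    bool has66;  /* 0x66 operand-size prefix present */
    bool rexW;   /* REX.W (bit 3): 64-bit operand size */
    bool rexR;   /* REX.R (bit 2): extends ModRM.reg */
    bool rexX;   /* REX.X (bit 1): extends SIB.index */
    bool rexB;   /* REX.B (bit 0): extends ModRM.rm / SIB.base */
} Prefixes;

/* Consume an optional 0x66 byte followed by an optional REX byte
   (0x40..0x4F). Returns the number of prefix bytes consumed. */
static int scan_prefixes(const uint8_t *code, int len, Prefixes *p)
{
    int i = 0;
    *p = (Prefixes){ false, false, false, false, false };
    if (i < len && code[i] == 0x66) {
        p->has66 = true;
        i++;
    }
    if (i < len && (code[i] & 0xF0) == 0x40) {
        p->rexW = (code[i] & 0x08) != 0;
        p->rexR = (code[i] & 0x04) != 0;
        p->rexX = (code[i] & 0x02) != 0;
        p->rexB = (code[i] & 0x01) != 0;
        i++;
    }
    return i;
}

/* Assumed size rule: REX.W wins over 0x66; the default is 4 bytes. */
static int operand_size(const Prefixes *p)
{
    if (p->rexW) return 8;
    if (p->has66) return 2;
    return 4;
}

/* The original MOVMSKPD size check and the relaxed one suggested above. */
static bool movmskpd_guard_strict(int sz)  { return sz == 2; }
static bool movmskpd_guard_relaxed(int sz) { return sz == 2 || sz == 8; }
```

Under these assumptions, the bytes 66 4c 0f 50 d9 give both 0x66 and REX.W set and a size of 8, so the strict check rejects the instruction while the relaxed one accepts it. With 44 in place of 4c only REX.R is set, the size is 2, and both checks accept.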
> change sz == 2 to (sz == 2 || sz == 8)
Yes, this seems to work as well. It seems that I can now run my code with this library in place. Thanks!
I'll try to run this lib's testsuite under valgrind to see if there are any further issues.
There is one additional issue, similar to the previous one. I'm now getting:

vex amd64->IR: unhandled instruction bytes: 0x4C 0xF 0x50 0xD9
at 0x4B2EB63: icamax_ (in /users/vondele/GOTOBLAS/libgoto_opteron64p-r1.00.so)

which I guess is here:

11b30: 0f 18 8e 00 04 00 00  prefetcht0 0x400(%rsi)
11b37: f2 0f 10 4e 00        movsd 0x0(%rsi),%xmm1
11b3c: 0f 16 4e 08           movhps 0x8(%rsi),%xmm1
11b40: f2 0f 10 56 10        movsd 0x10(%rsi),%xmm2
11b45: 0f 16 56 18           movhps 0x18(%rsi),%xmm2
11b49: 0f 28 d9              movaps %xmm1,%xmm3
11b4c: 0f c6 ca 88           shufps $0x88,%xmm2,%xmm1
11b50: 0f c6 da dd           shufps $0xdd,%xmm2,%xmm3
11b54: 41 0f 54 cf           andps %xmm15,%xmm1
11b58: 41 0f 54 df           andps %xmm15,%xmm3
11b5c: 0f 58 cb              addps %xmm3,%xmm1
11b5f: 0f c2 c8 00           cmpeqps %xmm0,%xmm1
11b63: 4c 0f 50 d9           rex64X movmskps %xmm1,%r11d
11b67: 49 f7 c3 0f 00 00 00  test $0xf,%r11
11b6e: 75 20                 jne 11b90 <icamax_+0x220>
11b70: 48 83 c6 20           add $0x20,%rsi
11b74: 48 83 c0 04           add $0x4,%rax
11b78: 49 ff c8              dec %r8
11b7b: 7f b3                 jg 11b30 <icamax_+0x1c0>
11b7d: e9 9e 00 00 00        jmpq 11c20 <icamax_+0x2b0>

> 11b63: 4c 0f 50 d9 rex64X movmskps %xmm1,%r11d
Find the movmskps case in guest-amd64/toIR.c (line 8841?) and change
'sz == 4' to '(sz == 4 || sz == 8)'. Does that work?
Yes, that fixes the last issue; the testsuite yields a clean run.
Fixed (vex r1604).
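For reference, a small C model of what the two instructions in this report compute. It follows the comment quoted earlier for MOVMSKPD (sign bits of 2 x F64 into the 2 lowest bits of the integer register) and the analogous 4 x F32 behaviour of MOVMSKPS. The function names `movmskpd_model` and `movmskps_model` are made up for illustration, and this is not how VEX implements them.

```c
#include <stdint.h>
#include <string.h>

/* Model of MOVMSKPD: bit i of the result is the sign bit of
   64-bit lane i (2 lanes). Hypothetical helper, not VEX code. */
static uint32_t movmskpd_model(const double lanes[2])
{
    uint32_t mask = 0;
    for (int i = 0; i < 2; i++) {
        uint64_t bits;
        memcpy(&bits, &lanes[i], sizeof bits);   /* raw IEEE-754 bits */
        mask |= (uint32_t)(bits >> 63) << i;
    }
    return mask;
}

/* Model of MOVMSKPS: bit i of the result is the sign bit of
   32-bit lane i (4 lanes). Hypothetical helper, not VEX code. */
static uint32_t movmskps_model(const float lanes[4])
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &lanes[i], sizeof bits);   /* raw IEEE-754 bits */
        mask |= (bits >> 31) << i;
    }
    return mask;
}
```

The result never uses more than 2 or 4 bits, which matches the `test $0x3` and `test $0xf` instructions that follow in the two listings.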