Hi, I am using intel IPP 5.0 within this codebase a magic codesequence is available that crashes valgrind. BTW the same problem also occurs on svn trunk (haven't tried 3.2 branch) The details: vex x86->IR: unhandled instruction bytes: 0x67 0xE3 0x67 0xF ==931== valgrind: Unrecognised instruction at address 0x81DDF2D. ==931== Your program just tried to execute an instruction that Valgrind ==931== did not recognise. There are two possible reasons for this. ==931== 1. Your program has a bug and erroneously jumped to a non-code ==931== location. If you are running Memcheck and you just saw a ==931== warning about a bad jump, it's probably your program's fault. ==931== 2. The instruction is legitimate but Valgrind doesn't handle it, ==931== i.e. it's Valgrind's fault. If you think this is the case or ==931== you are not sure, please let us know and we'll try to fix it. ==931== Either way, Valgrind will now raise a SIGILL signal which will ==931== probably kill your program. ==931== ==931== Process terminating with default action of signal 4 (SIGILL) ==931== Illegal opcode at address 0x81DDF2D ==931== at 0x81DDF2D: (within some ipp program) ==931== ==931== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 31 from 1) ==931== malloc/free: in use at exit: 4,726,536 bytes in 85 blocks. ==931== malloc/free: 2,903 allocs, 2,818 frees, 5,163,866 bytes allocated. ==931== For counts of detected errors, rerun with: -v ==931== searching for pointers to 85 not-freed blocks. ==931== checked 726,116 bytes. ==931== ==931== LEAK SUMMARY: ==931== definitely lost: 68 bytes in 1 blocks. ==931== possibly lost: 4,936 bytes in 2 blocks. ==931== still reachable: 4,721,532 bytes in 82 blocks. ==931== suppressed: 0 bytes in 0 blocks. ==931== Use --leak-check=full to see details of leaked memory. (gdb) x/10xb 0x81DDF2D 0x81ddf2d <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+557>: 0x67 0xe3 0x67 0x0f 0xb7 0x02 0x0f 0xb7 0x81ddf35 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+565>: 0x0e 0x6 Dump of assembler code for function w7_ownInterpolateC_NR_G729_16s_Sfs_W7: <SNIP STUFF REMOVED> 0x081ddf21 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+545>: add $0x4,%esi 0x081ddf24 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+548>: add $0x4,%edx 0x081ddf27 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+551>: add $0x4,%edi 0x081ddf2a <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+554>: add $0xfffffffe,%ecx 0x081ddf2d <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+557>: jcxz 0x81ddf97 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+663> 0x081ddf30 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+560>: movzwl (%edx),%eax 0x081ddf33 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+563>: movzwl (%esi),%ecx 0x081ddf36 <w7_ownInterpolateC_NR_G729_16s_Sfs_W7+566>: movd %eax,%xmm1 <SNIP STUFF REMOVED>
As far as I understand from the code now ASO (Address Size Overide) wasn't implemented for guest-x86 at all. Code below implements it, although I guess it actually is buggy. I used the sz variable but after reinspection of the code I assume that sz is actually the operand size and no variable exists for the address size. Apperantly the ASO isn't used that much ;) Current implementation would have the same effect for a OSO prefix on JCXZ as for an ASO prefix. I guess that is not the behaviour for the real CPU? BTW, as far as the licensing. I hereby waive my rights to these lines of code. Index: toIR.c =================================================================== --- toIR.c (revision 1671) +++ toIR.c (working copy) @@ -7228,6 +7228,9 @@ 0x66 to appear. */ while (getIByte(delta) == 0x66) { sz = 2; delta++; }; + /* Address Size Overide */ + while (getIByte(delta) == 0x67) { sz = 2; delta++; }; + /* segment override prefixes come after the operand-size override, it seems */ switch (getIByte(delta)) { @@ -11157,9 +11160,9 @@ DIP("j%s-8 0x%x\n", name_X86Condcode(opc - 0x70), d32); break; - case 0xE3: /* JECXZ or perhaps JCXZ, depending on OSO ? Intel + case 0xE3: /* JECXZ or perhaps JCXZ, depending on ASO ! Intel manual says it depends on address size override. */ - if (sz != 4) goto decode_failure; + if (sz != 4 && sz != 2) goto decode_failure; d32 = (((Addr32)guest_EIP_bbstart)+delta+1) + getSDisp8(delta); delta++; ty = szToITy(sz);
For general purposes the test app i used to reproduce the problem #include <stdio.h> void main() { int i; asm("movl $0x10001,%ecx"); asm(".byte 0x67 , 0xe3 , 16 "); //16 IS EXACTLY THE LENGTH OG THE PRINTF printf("CX!=0\n"); asm(".byte 0xe3 , 16 "); printf("ECX!=0\n"); }; ~
FYI, just tested: an OSO prefix (0x66) indeed doesn't influence the test: it will still be an ECX test. I guess a better implementation would have both an sz (OSO) and asz (ASO) flag. I will check if can make a patch for this as well.
patch file that holds a seperate variable for ASO: Index: priv/guest-x86/toIR.c =================================================================== --- priv/guest-x86/toIR.c (revision 1671) +++ priv/guest-x86/toIR.c (working copy) @@ -7120,6 +7120,9 @@ /* sz denotes the nominal data-op size of the insn; we change it to 2 if an 0x66 prefix is seen */ Int sz = 4; + /* asz denotes the nominal address size of the insn; we change it to + 2 if an 0x67 prefix is seen only used on J(E)CXZ now */ + Int asz = 4; /* sorb holds the segment-override-prefix byte, if any. Zero if no prefix has been seen, else one of {0x26, 0x3E, 0x64, 0x65} @@ -7228,6 +7231,9 @@ 0x66 to appear. */ while (getIByte(delta) == 0x66) { sz = 2; delta++; }; + /* Address Size Overide */ + while (getIByte(delta) == 0x67) { asz = 2; delta++; }; + /* segment override prefixes come after the operand-size override, it seems */ switch (getIByte(delta)) { @@ -11157,21 +11163,21 @@ DIP("j%s-8 0x%x\n", name_X86Condcode(opc - 0x70), d32); break; - case 0xE3: /* JECXZ or perhaps JCXZ, depending on OSO ? Intel + case 0xE3: /* JECXZ or perhaps JCXZ, depending on ASO ! Intel manual says it depends on address size override. */ - if (sz != 4) goto decode_failure; + if (asz != 4 && asz != 2) goto decode_failure; d32 = (((Addr32)guest_EIP_bbstart)+delta+1) + getSDisp8(delta); delta++; - ty = szToITy(sz); + ty = szToITy(asz); stmt( IRStmt_Exit( binop(mkSizedOp(ty,Iop_CmpEQ8), - getIReg(sz,R_ECX), + getIReg(asz,R_ECX), mkU(ty,0)), Ijk_Boring, IRConst_U32(d32)) ); - DIP("j%sz 0x%x\n", nameIReg(sz, R_ECX), d32); + DIP("j%sz 0x%x\n", nameIReg(asz, R_ECX), d32); break; case 0xE0: /* LOOPNE disp8: decrement count, jump if count != 0 && ZF==0 */
For some reason, when I close it sets it to WORKSFORME. Clearly the patch needs to be reviewew and folded into SVN. Some action is needed by the committers ;)
*** Bug 126261 has been marked as a duplicate of this bug. ***
*** Bug 126184 has been marked as a duplicate of this bug. ***
*** Bug 126238 has been marked as a duplicate of this bug. ***
I'm not at all convinced that your patch is correct, as I would need to understand what effect (if any) the operand size override should be having on this instruction because as things stand you hand removed all that code. It is certainly correct that the ASO controls how much of %ecx is used though. Julian will need to handle this anyway, given that this is a VEX issue. BTW you shouldn't try and close things until they are committed, and it is better to resolve a bug as duplicate rather than add a comment - or were you not able to do that?
I am not a real VEX hacker either so a good review wouldn't hurt... as fas as the OSO prefix is concerned. I validated that my test prog would have identical results (with or without valgrind) when using the OSO prefix. Best regards, Ruud BTW 1: I couldn't know that the problems were real duplicates after troubleshooting VEX. BTW 2: Also it is not possible as bug reporter to mark other bugs as duplicates of your own bug (or mark your own as a duplicate of the other). The bugdb hides this bugzilla feature for me. Tested to be identical are: OSO + $cx=0 asm("movl $0x10000,%ecx"); asm(".byte 0x66 , 0xe3 , 16 "); //16 IS EXACTLY THE LENGTH OG THE PRINTF printf("CX!=0\n"); asm(".byte 0xe3 , 16 "); printf("ECX!=0\n"); OSO + $cx=1 asm("movl $0x10001,%ecx"); asm(".byte 0x66 , 0xe3 , 16 "); //16 IS EXACTLY THE LENGTH OG THE PRINTF printf("CX!=0\n"); asm(".byte 0xe3 , 16 "); printf("ECX!=0\n"); ASO + $cx=0 asm("movl $0x10000,%ecx"); asm(".byte 0x67 , 0xe3 , 16 "); //16 IS EXACTLY THE LENGTH OG THE PRINTF printf("CX!=0\n"); asm(".byte 0xe3 , 16 "); printf("ECX!=0\n"); ASO + $cx=1 asm("movl $0x10001,%ecx"); asm(".byte 0x67 , 0xe3 , 16 "); //16 IS EXACTLY THE LENGTH OG THE PRINTF printf("CX!=0\n"); asm(".byte 0xe3 , 16 "); printf("ECX!=0\n");
Julian fixed this in VEX r1678.