Version: valgrind-SVN.3.3.0-r7204 (vex-r1800) (using KDE 3.5.7)
Installed from: Gentoo Packages
Compiler: gcc version 4.2.2 (Gentoo 4.2.2 p1.0)
Configured with: /var/tmp/portage/sys-devel/gcc-4.2.2/work/gcc-4.2.2/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.2.2 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.2.2/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.2.2 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.2.2/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.2.2/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.2.2/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --enable-libmudflap --disable-libssp --disable-libgcj --with-arch=i686 --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
OS: Linux

I use valgrind to test an x86 emulator that runs various x86 binaries (DOS, MS Windows), checks the validity of instructions and operands, and runs the instructions on the real processor. Some of the binaries contain instructions such as REPZ LODSB. Valgrind crashes on them. (Motivation as in http://bugs.kde.org/show_bug.cgi?id=152501)
What emulator is this? Got a test case for checking correct implementation of "rep lods" ?
Created attachment 22185
A rep lods{b,w,l} test.

Sorry, not repz: REP LODS{b,w,l}. The test source has some explanation of why these insns are needed. Thanks!
Fixed (vex r1801). Please verify.
The (buggy?) test seems to fail on rep lodsw, though lodsw itself works correctly.

-REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = 1234FFFE, EFLAGS = )
-REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 1234FFFF, EFLAGS = )
+REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = FFFEFFFD, EFLAGS = )
+REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 00AAFFFF, EFLAGS = )
-REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 12340001, EFLAGS = )
-REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 12340002, EFLAGS = )
+REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 00020001, EFLAGS = )
+REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = FEFD0003, EFLAGS = )
Valgrind and my Core 2 produce identical results. What CPU are you comparing against? I believe "rep lods" is basically a meaningless (or stupid) instruction. What does it mean? To load %ecx values from %esi into %al/%ax/%eax? I don't really understand. "rep stos" I can understand -- to implement memset etc, but "rep lods" I don't understand. So I believe that because "rep lods" is meaningless, different hardware implements it differently.
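For readers unfamiliar with the instruction, here is a minimal C sketch of what "rep lodsw" does architecturally: it loads a word into %ax once per count, so only the last load survives, and the upper half of %eax is left alone. This is an illustration only, not code from Valgrind or from the emulator under test. The function name `model_rep_lodsw`, the word-indexed stand-in for %esi, and the buffer contents in the usage note are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "rep lodsw" (illustration only).
   mem/len: the 16-bit words being read
   index:   starting position, standing in for %esi (in words, not bytes)
   count:   repeat count, standing in for %ecx
   df:      direction flag; 0 steps upward, nonzero steps downward
   eax:     incoming %eax; only the low 16 bits (%ax) are overwritten */
uint32_t model_rep_lodsw(const uint16_t *mem, size_t len, size_t index,
                         uint32_t count, int df, uint32_t eax)
{
    while (count > 0) {
        assert(index < len);                     /* stay inside the buffer */
        eax = (eax & 0xFFFF0000u) | mem[index];  /* load %ax, keep upper half */
        index = df ? index - 1 : index + 1;      /* step by one word */
        count--;
    }
    return eax;
}
```

With a made-up buffer `{0x0003, 0x0002, 0x0001, 0x00AA, 0xFFFF, 0xFFFE, 0xFFFD}` and a start index of 3, a count of 0 leaves EAX untouched, a count of 1 yields 0x123400AA from 0x12348765, and larger counts simply return whichever word was read last.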
I use a hyperthreaded Pentium 4. Yes, I understand that this insn is meaningless. Some googling on the subject turned up "makes no sense" responses, plus mumblings about short delay implementation and software memory regeneration. I'd agree that it's hardware implementation dependent, but I think the end result should be identical on all hardware. Valgrind, however, shows a less sane result of execution (it messed up EAX in lodsw: %esi = {2,3}). My results are below. Can I look at yours?

original:
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = 123487FE, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 123487FF, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 123487AA, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 123487AA, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 12348701, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 12348702, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = 1234FFFE, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 1234FFFF, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 12340001, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 12340002, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = FFFFFFFE, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = FFFFFFFF, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 000000AA, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 000000AA, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 00000001, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 00000002, EFLAGS = )
=----------------------------------=
under valgrind:
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = 123487FE, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 123487FF, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 123487AA, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 123487AA, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 12348701, EFLAGS = )
REP lodsb (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 12348702, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = FFFEFFFD, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 00AAFFFF, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 00020001, EFLAGS = )
REP lodsw (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = FEFD0003, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = FFFFFFFE, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = FFFFFFFF, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 000000AA, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 000000AA, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 00000001, EFLAGS = )
REP lodsl (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 00000002, EFLAGS = )
> My results are below. Can I look at yours?

Not sure I understand the question. My results on Core 2 natively are identical to the results that Valgrind now gives.
Created attachment 22197
This file exposes rep lodsw bug in my hardware.

Running without valgrind (seems to be okay), ax contains predictable values:
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000004 (EAX = 1234FFFD, d = -8)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000003 (EAX = 1234FFFE, d = -6)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000002 (EAX = 1234FFFF, d = -4)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000001 (EAX = 123400AA, d = -2)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000001 (EAX = 123400AA, d = 2)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000002 (EAX = 12340001, d = 4)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000003 (EAX = 12340002, d = 6)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000004 (EAX = 12340003, d = 8)

(vex-r1801) Running with valgrind (corrupted esi (d param), corrupted EAX):
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000004 (EAX = FFFC0000, d = -14)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000003 (EAX = FFFEFFFD, d = -10)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000002 (EAX = 00AAFFFF, d = -6)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000001 (EAX = 123400AA, d = -2)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000001 (EAX = 123400AA, d = 2)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000002 (EAX = 00020001, d = 6)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000003 (EAX = 00040003, d = 10)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000004 (EAX = 00000000, d = 14)

Processor:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping : 9
cpu MHz : 2999.716
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts sync_rdtsc cid xtpr
bogomips : 6001.60
clflush size : 64
(two times, HT)

Can you post the valgrind result on your Core 2 here? Thanks!
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4000+

Works correctly there. Seems to be a bug in my P4 build environment. Sorry.
I figured out where this bug came from.

When I build the attached test with binutils-2.17, I get:

a5: 66 f3 ad rep lods %ds:(%esi),%ax
(works correctly in valgrind)

When I build the attached test with binutils-2.17, I get:

a5: f3 66 ad rep lods %ds:(%esi),%ax
(prefix order changed! works INcorrectly in valgrind, seems to skip the f3 prefix)

This single difference in the whole test leads to different valgrind emulations. The same problem occurs with rep cmpsw + gas-2.18. Should I open a new bug entry?
Sorry, I messed everything up. Resending a fixed version:

I figured out where this bug came from.

When I build the attached[0] test with binutils-2.17, I get:

a5: f3 66 ad rep lods %ds:(%esi),%ax
(works correctly in valgrind)

When I build the attached[0] test with binutils-2.18, I get:

a5: 66 f3 ad rep lods %ds:(%esi),%ax
(prefix order changed! works INcorrectly in valgrind, seems to skip the f3 prefix)

This single difference in the whole test leads to different valgrind emulations. The same problem occurs with rep cmpsw + binutils-2.18.

So, vanilla binutils-2.18 generates code valgrind can't emulate properly.

Should I open a new bug entry?

--
[0] - http://bugs.kde.org/attachment.cgi?id=22197&action=view
> So, vanilla binutils-2.18 generates code valgrind can't emulate properly.
>
> Should I open new bug entry?

No, but what would be very useful is to send a new version of the test program that tests both prefix orders. I guess you will have to replace __asm__ __volatile__("rep lods") etc. with __asm__ __volatile__(".byte 0x66,0xf3,0xad") and __asm__ __volatile__(".byte 0xF3,0x66,0xad") etc., if you see what I mean.
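A minimal sketch of that suggestion, assuming GCC or Clang extended inline asm on an x86 or x86-64 host. The function names `rep_lodsw_66_f3` and `rep_lodsw_f3_66` and the register constraints are choices made for this illustration and are not taken from the attached test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "rep lodsw" hand-encoded as 66 F3 AD (operand-size prefix first).
   src goes in %esi/%rsi, count in %ecx/%rcx, eax in %eax. */
uint32_t rep_lodsw_66_f3(const uint16_t *src, unsigned long count, uint32_t eax)
{
    assert(src != NULL);
    __asm__ __volatile__(
        "cld\n\t"                 /* DF = 0: walk upward through memory */
        ".byte 0x66,0xf3,0xad"
        : "+S"(src), "+c"(count), "+a"(eax)
        :
        : "memory", "cc");
    return eax;
}

/* The same instruction hand-encoded as F3 66 AD (rep prefix first). */
uint32_t rep_lodsw_f3_66(const uint16_t *src, unsigned long count, uint32_t eax)
{
    assert(src != NULL);
    __asm__ __volatile__(
        "cld\n\t"
        ".byte 0xf3,0x66,0xad"
        : "+S"(src), "+c"(count), "+a"(eax)
        :
        : "memory", "cc");
    return eax;
}
```

Emitting the bytes with .byte takes the assembler version out of the picture, so both encodings can be exercised from one binary. On real hardware the two functions should return the same value for the same inputs, and a run under valgrind can then be compared against the native run.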
Created attachment 24681
bug exposing prefix seqence

Added byte sequences in place of rep lodsw.
The attached test shows a difference between the native and valgrind runs for insn `REP lodsw[rep/addr]`, aka ".byte 0x66,0xf3,0xad".
> a5: f3 66 ad rep lods %ds:(%esi),%ax
> (works correctly in valgrind)
>
> When I build attached[0] test with binutils-2.18 - I have:
>
> a5: 66 f3 ad rep lods %ds:(%esi),%ax
> (prefix order changed! works INcorrectly in valgrind, seems to skip f3
> prefix)

Are you 110% sure this is correct? From looking at the valgrind sources, I would say that "66 F3 AD" is handled correctly but "F3 66 AD" is not.
> Are you 110% sure this is correct?

I've just rerun the attached test and compared the results of ./bench and valgrind ./bench. I think you can easily reproduce it.

> From looking at the valgrind
> sources, I would say that "66 F3 AD" is handled correctly but
> "F3 66 AD" is not.

I'm not familiar with VEX, but:

guest-x86/toIR:13392
   case 0xF3: {
      ...
      abyte = getIByte(delta); delta++;
      ...
      if (abyte == 0x66) { sz = 2; abyte = getIByte(delta); delta++; }
      switch (abyte) {
      ...
      case 0xAD:

This seems correct for "F3 66 AD".

Then I inserted the following code here:

      case 0xAD:
+++      vex_printf ("%s: dasmed 0x%02X 0x%02X 0x%02X sz = %d\n",
+++                  __func__,
+++                  getIByte (delta - 3), getIByte (delta - 2), getIByte (delta - 1),
+++                  sz);

commented out the lodsb and lodsd instructions in the test (so they are not present in the binary), and reran valgrind on it. The results are:

disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 2
disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 4
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 4 (EAX = FFFC0000, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = FFFEFFFD, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 00AAFFFF, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 2
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 4
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 00020001, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 00040003, EFLAGS = )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 4 (EAX = 00000000, EFLAGS = )
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 4 (EAX = 1234FFFD, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 3 (EAX = 1234FFFE, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 2 (EAX = 1234FFFF, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 0, count = 0 (EAX = 12348765, EFLAGS = )
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 0 (EAX = 12348765, EFLAGS = )
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 1 (EAX = 123400AA, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 2 (EAX = 12340001, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 3 (EAX = 12340002, EFLAGS = )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS = ) => DF = 1, count = 4 (EAX = 12340003, EFLAGS = )

Looks odd. Sometimes sz != 2, but I can't figure out why.
> disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 2
> ...
> disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 4

Maybe the trouble is not in decoding, but in code generation?

toIR.c:
   ...
+++ We could backup eip_orig here [0]
   n_prefixes = 0;
   while (True) {
      if (n_prefixes > 7) goto decode_failure;
      pre = getUChar(delta);
      switch (pre) {
         case 0x66:
            sz = 2;
            break;
   ..
   case 0xF3:{
      Addr32 eip_orig = guest_EIP_bbstart + delta - 1;
      if (sorb != 0) goto decode_failure;
      abyte = getIByte(delta); delta++;

      if (abyte == 0x66) { sz = 2; abyte = getIByte(delta); delta++; }

When 0x66 sits before 0xF3, eip_orig points at 0xF3 (invalid), so the code generated by dis_REP_op misses the start of the REP insn when it translates it into a loop, and does not decode 0x66 in the second and later iterations (the op extends to m32).

That's why this guest insn is decoded twice in vex (as 0x66 0xF3 0xAD and as 0xF3 0xAD).

vex could save eip_orig before the general prefix parsing (somewhere around [0]).

In conclusion:
> Are you 110% sure this is correct?

Yes, now I am sure :]

I suspect it's very hard to understand my bad English. Should I try to make a patch for vex and attach it here to clarify what I tried to say?
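The analysis above can be illustrated with a toy model in C. This is not VEX code. The functions `toy_operand_size`, `toy_restart_index` and `toy_bytes_consumed` are hypothetical and only mimic the idea that each repeat iteration after the first is re-decoded from a saved restart address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy decoder: operand size (2 or 4) seen when decoding starts at
   code[start]. It only knows the 0x66 and 0xF3 prefixes followed by
   0xAD (lods); anything else yields 0. */
int toy_operand_size(const uint8_t *code, size_t len, size_t start)
{
    int sz = 4;
    for (size_t i = start; i < len; i++) {
        if (code[i] == 0x66) sz = 2;       /* operand-size override */
        else if (code[i] == 0xF3) continue; /* rep prefix, keep scanning */
        else return code[i] == 0xAD ? sz : 0;
    }
    return 0;
}

/* Index where later iterations restart decoding.
   restart_at_f3 != 0 mimics "delta - 1" (the F3 byte itself);
   restart_at_f3 == 0 mimics restarting at the insn's first byte. */
size_t toy_restart_index(const uint8_t *code, size_t len, int restart_at_f3)
{
    if (!restart_at_f3) return 0;
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0xF3) return i;
    return 0;
}

/* Bytes the source pointer would move over `count` iterations: the first
   iteration sees the full insn, later ones start at the restart index. */
unsigned toy_bytes_consumed(const uint8_t *code, size_t len,
                            unsigned count, int restart_at_f3)
{
    unsigned total = 0;
    size_t restart = toy_restart_index(code, len, restart_at_f3);
    for (unsigned i = 0; i < count; i++) {
        int sz = toy_operand_size(code, len, i == 0 ? 0 : restart);
        assert(sz != 0);                    /* only lods is modelled */
        total += (unsigned)sz;
    }
    return total;
}
```

For the bytes 66 F3 AD with the restart at the F3 byte, the model moves 2, 6, 10 and 14 bytes for counts 1 to 4, which has the same magnitudes as the d values in the valgrind run reported earlier. Restarting at the first byte gives 2, 4, 6 and 8, as in the native run, and F3 66 AD is unaffected either way.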
> When 0x66 sits before 0xF3, eip_orig points at 0xF3 (invalid), so the code
> generated by dis_REP_op misses the start of the REP insn when it translates
> it into a loop, and does not decode 0x66 in the second and later iterations
> (the op extends to m32).
>
> That's why this guest insn is decoded twice in vex (as 0x66 0xF3 0xAD and
> as 0xF3 0xAD).

Ah, I understand. eip_orig is wrong.

> I suspect it's very hard to understand my bad English.

Your English is fine. It's hard to understand the problem because it's complicated.

Can you try this: change

   case 0xF3: {
      Addr32 eip_orig = guest_EIP_bbstart + delta - 1;

to

   case 0xF3: {
      Addr32 eip_orig = guest_EIP_bbstart + delta_orig;

Does that fix it?
> case 0xF3: {
>    Addr32 eip_orig = guest_EIP_bbstart + delta_orig;

priv/guest-x86/toIR.c: In function 'disInstr_X86_WRK':
priv/guest-x86/toIR.c:13393: error: 'delta_orig' undeclared (first use in this function)

BTW, what is the difference between getUChar and getIByte there?
> priv/guest-x86/toIR.c: In function 'disInstr_X86_WRK':
> priv/guest-x86/toIR.c:13393: error: 'delta_orig' undeclared (first use in
> this function)

Sorry. I meant delta_start.
> case 0xF3: {
>    Addr32 eip_orig = guest_EIP_bbstart + delta_start;

Works correctly for lods{b,w,d}/cmps{b,w,d}.
Fixed (vex r1838). Thanks for the analysis of the problem. Is it OK to add "bug exposing prefix seqence" test program to the Valgrind test suite (GPL v2 or later) ?
> Is it OK to add "bug exposing prefix seqence" test program to
> the Valgrind test suite (GPL v2 or later) ?

Sure. Thanks!