Bug 152818

Summary: (repz lodsb) vex x86->IR: unhandled instruction bytes: 0xF3 0xAC 0xFC 0x9C
Product: [Developer tools] valgrind Reporter: Sergei Trofimovich <slyich>
Component: vex Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: crash    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Attachments: An rep lods{b,w,l} test.
This file exposes rep lodsw bug in my hardware.
bug exposing prefix seqence

Description Sergei Trofimovich 2007-11-24 11:22:40 UTC
Version:           valgrind-SVN.3.3.0-r7204 (vex-r1800) (using KDE KDE 3.5.7)
Installed from:    Gentoo Packages
Compiler:          gcc version 4.2.2 (Gentoo 4.2.2 p1.0) Configured with: /var/tmp/portage/sys-devel/gcc-4.2.2/work/gcc-4.2.2/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.2.2 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.2.2/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.2.2 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.2.2/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.2.2/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.2.2/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --enable-libmudflap --disable-libssp --disable-libgcj --with-arch=i686 --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
OS:                Linux

I use valgrind to test an x86 emulator that runs
various x86 binaries (DOS, MS Windows). It checks the
validity of instructions and operands and runs the
instructions on the real processor.
Some of the binaries contain instructions such as REPZ LODSB.

Valgrind crashes on them.

(Motivation as in http://bugs.kde.org/show_bug.cgi?id=152501)
Comment 1 Julian Seward 2007-11-24 16:37:58 UTC
What emulator is this?

Got a test case for checking correct implementation of "rep lods" ?
Comment 2 Sergei Trofimovich 2007-11-25 00:04:33 UTC
Created attachment 22185 [details]
An rep lods{b,w,l} test.

Sorry, not repz, REP LODS{b,w,l}.

Some explanation of why these insns are needed
is in the test source.

Thanks!
Comment 3 Julian Seward 2007-11-25 02:36:08 UTC
Fixed (vex r1801).  Please verify.
Comment 4 Sergei Trofimovich 2007-11-25 10:03:32 UTC
(buggy?) test seems to fail on rep lodsw, though lodsw itself works correctly.


-REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = 1234FFFE, EFLAGS =         )
-REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 1234FFFF, EFLAGS =         )
+REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = FFFEFFFD, EFLAGS =         )
+REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 00AAFFFF, EFLAGS =         )


-REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 12340001, EFLAGS =         )
-REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 12340002, EFLAGS =         )
+REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 00020001, EFLAGS =         )
+REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = FEFD0003, EFLAGS =         )
Comment 5 Julian Seward 2007-11-25 13:00:43 UTC
Valgrind and my Core 2 produce identical results.  What CPU are you
comparing against?

I believe "rep lods" is basically a meaningless (or stupid) instruction.
What does it mean?  To load %ecx values from %esi into %al/%ax/%eax?
I don't really understand.  "rep stos" I can understand -- to implement
memset etc, but "rep lods" I don't understand.

So I believe that because "rep lods" is meaningless, different hardware
implements it differently.
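
For reference, a minimal C sketch of the architectural meaning of "rep lods": the LODS step is simply repeated %ecx times, so only the last element loaded survives in %al/%ax/%eax, while %esi ends up moved by count * size. The Regs struct, the function names and the byte-array "memory" are illustrative assumptions, not code from Valgrind or from the attached test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the guest state touched by "rep lods". */
typedef struct {
    uint32_t eax;  /* accumulator */
    uint32_t ecx;  /* repeat count */
    size_t   esi;  /* source index, modelled as an offset into mem[] */
    int      df;   /* direction flag: 0 = ascending, 1 = descending */
} Regs;

/* One LODS step: load `size` bytes (little-endian) at mem[esi] into
   AL/AX/EAX, keep the upper bits of EAX that the load does not cover,
   then move ESI by `size` in the direction given by DF. */
static void lods(Regs *r, const uint8_t *mem, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    uint32_t v = 0;
    for (unsigned i = 0; i < size; i++)
        v |= (uint32_t)mem[r->esi + i] << (8 * i);
    uint32_t mask = (size == 4) ? 0xFFFFFFFFu : ((1u << (8 * size)) - 1u);
    r->eax = (r->eax & ~mask) | v;
    r->esi = r->df ? r->esi - size : r->esi + size;
}

/* REP prefix: repeat the step while ECX is non-zero, decrementing ECX. */
static void rep_lods(Regs *r, const uint8_t *mem, unsigned size)
{
    while (r->ecx != 0) {
        lods(r, mem, size);
        r->ecx--;
    }
}
```

With EAX = 12348765 and a buffer holding the 16-bit words 00AA, 0001, 0002 from the starting offset upwards, three ascending word iterations leave EAX = 12340002: the upper half of EAX is untouched and only the last word loaded remains in %ax.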
Comment 6 Sergei Trofimovich 2007-11-25 16:57:09 UTC
I use a hyperthreaded Pentium 4.

Yes, I understand that this insn is meaningless.
Some googling on the topic turned up only "makes no sense" responses,
plus mumbling about short delay implementations and software
memory refresh.

But I think the end result should be identical on all hardware.
I could agree that it is hardware implementation dependent, but valgrind
shows a less sane result of execution (it messes up EAX
in lodsw: %esi = {2,3}).

My results are below. Can I look at yours?

original:

REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = 123487FE, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 123487FF, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 123487AA, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 123487AA, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 12348701, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 12348702, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = 1234FFFE, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 1234FFFF, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 12340001, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 12340002, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = FFFFFFFE, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = FFFFFFFF, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 000000AA, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 000000AA, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 00000001, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 00000002, EFLAGS =         )

=----------------------------------=

under valgrind:
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = 123487FE, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 123487FF, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 123487AA, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 123487AA, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 12348701, EFLAGS =         )
REP lodsb (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 12348702, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = FFFEFFFD, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 00AAFFFF, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 00020001, EFLAGS =         )
REP lodsw (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = FEFD0003, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = FFFFFFFE, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = FFFFFFFF, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 000000AA, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 000000AA, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 00000001, EFLAGS =         )
REP lodsl (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 00000002, EFLAGS =         )
Comment 7 Julian Seward 2007-11-25 18:12:31 UTC
> My results are below. Can I look at yours?


Not sure I understand the question.  My results on Core 2 natively are
identical to the results that Valgrind now gives.
Comment 8 Sergei Trofimovich 2007-11-25 22:21:04 UTC
Created attachment 22197 [details]
This file exposes rep lodsw bug in my hardware.

Running without valgrind (seems to be okay), ax contains predictable values:
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000004 (EAX = 1234FFFD, d = -8)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000003 (EAX = 1234FFFE, d = -6)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000002 (EAX = 1234FFFF, d = -4)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000001 (EAX = 123400AA, d = -2)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000001 (EAX = 123400AA, d = 2)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000002 (EAX = 12340001, d = 4)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000003 (EAX = 12340002, d = 6)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000004 (EAX = 12340003, d = 8)

(vex-r1801)
Running with valgrind (corrupted esi (d param), corrupted EAX):

REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000004 (EAX = FFFC0000, d = -14)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000003 (EAX = FFFEFFFD, d = -10)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000002 (EAX = 00AAFFFF, d = -6)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000001 (EAX = 123400AA, d = -2)
REP lodsw (EAX = 12348765) => DF = 0, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000000 (EAX = 12348765, d = 0)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000001 (EAX = 123400AA, d = 2)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000002 (EAX = 00020001, d = 6)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000003 (EAX = 00040003, d = 10)
REP lodsw (EAX = 12348765) => DF = 1, ECX = 00000004 (EAX = 00000000, d = 14)

Processor:
$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping	: 9
cpu MHz 	: 2999.716
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id 	: 0
cpu cores	: 1
fdiv_bug	: no
hlt_bug 	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts sync_rdtsc
cid xtpr
bogomips	: 6001.60
clflush size	: 64
(two times, HT)

Can you post the valgrind results from your Core 2 here?

Thanks!
Comment 9 Sergei Trofimovich 2007-11-26 10:00:22 UTC
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4000+

Works correctly.

Seems to be a bug in my P4 build environment.

Sorry.
Comment 10 Sergei Trofimovich 2008-05-09 13:17:59 UTC
I figured out where this bug came from.
When I build the attached test with binutils-2.17 - I have:
a5:	66 f3 ad             	rep lods %ds:(%esi),%ax
(works correctly in valgrind)

When I build the attached test with binutils-2.17 - I have:
a5:	f3 66 ad             	rep lods %ds:(%esi),%ax
(prefix order changed! works INcorrectly in valgrind, seems to skip f3 prefix)

One single difference in whole test leads to different valgrind emulations.
The same problem in rep cmpsw+gas-2.18.

Should I open new bug entry?
Comment 11 Sergei Trofimovich 2008-05-09 13:37:08 UTC
Sorry, I mixed everything up. Resending a fixed version:

I figured out where this bug came from.

When I build the attached[0] test with binutils-2.17 - I have:

a5: f3 66 ad             rep lods %ds:(%esi),%ax
(works correctly in valgrind)

When I build the attached[0] test with binutils-2.18 - I have:

a5: 66 f3 ad             rep lods %ds:(%esi),%ax
(prefix order changed! works INcorrectly in valgrind, seems to skip f3 prefix)

This single difference in the whole test leads to different valgrind emulation.
The same problem occurs with rep cmpsw + binutils-2.18.

So, vanilla binutils-2.18 generates code valgrind can't emulate properly.

Should I open new bug entry?

--
[0] - http://bugs.kde.org/attachment.cgi?id=22197&action=view
Comment 12 Julian Seward 2008-05-09 14:17:47 UTC
> So, vanilla binutils-2.18 generates code valgrind can't emulate properly.
>
> Should I open new bug entry?


No, but what would be very useful is to send a new version of the test
program that tests both prefix orders.  I guess you will have to
replace __asm__ __volatile__("rep lods") etc. with
__asm__ __volatile__(".byte 0x66,0xf3,0xad") and
__asm__ __volatile__(".byte 0xF3,0x66,0xad") etc.,
if you see what I mean.
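
One way such a test could look, as a hedged sketch rather than the actual attached program: the name rep_lodsw and the prefix_order enum are made up for illustration. The code assumes GCC-style inline asm on an x86 host and a clear direction flag on entry (which the usual calling conventions require); on other hosts a plain C loop stands in so that the file still compiles.

```c
#include <assert.h>
#include <stdint.h>

/* Which raw byte sequence to emit for "rep lodsw". */
enum prefix_order {
    OPSIZE_THEN_REP,  /* 66 F3 AD */
    REP_THEN_OPSIZE   /* F3 66 AD */
};

/* Run "rep lodsw" over `count` words starting at `src`; return final EAX. */
static uint32_t rep_lodsw(enum prefix_order order, const uint16_t *src,
                          unsigned long count, uint32_t eax)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Raw bytes, so the assembler cannot choose the prefix order itself.
       EAX, ESI/RSI and ECX/RCX are both read and written by the insn. */
    if (order == OPSIZE_THEN_REP)
        __asm__ __volatile__(".byte 0x66,0xf3,0xad"
                             : "+a"(eax), "+S"(src), "+c"(count)
                             :
                             : "memory");
    else
        __asm__ __volatile__(".byte 0xf3,0x66,0xad"
                             : "+a"(eax), "+S"(src), "+c"(count)
                             :
                             : "memory");
#else
    /* Assumption: portable stand-in for non-x86 hosts, same end result. */
    (void)order;
    while (count--)
        eax = (eax & 0xFFFF0000u) | *src++;
#endif
    return eax;
}
```

Both prefix orders should give the same EAX natively; comparing a native run against a run under valgrind then shows whether either order is mishandled.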
Comment 13 Sergei Trofimovich 2008-05-09 14:46:57 UTC
Created attachment 24681 [details]
bug exposing prefix seqence

Added byte sequences in place of rep lodsw.
Comment 14 Sergei Trofimovich 2008-05-09 15:14:59 UTC
Attached test shows difference in native/valgrind run for insn
`REP lodsw[rep/addr]` aka ".byte 0x66,0xf3,0xad"
Comment 15 Julian Seward 2008-05-09 15:47:11 UTC
> a5: f3 66 ad             rep lods %ds:(%esi),%ax
> (works correctly in valgrind)
>
> When I build the attached[0] test with binutils-2.18 - I have:
>
> a5: 66 f3 ad             rep lods %ds:(%esi),%ax
> (prefix order changed! works INcorrectly in valgrind, seems to skip f3
> prefix)


Are you 110% sure this is correct?  From looking at the valgrind
sources, I would say that "66 F3 AD" is handled correctly but
"F3 66 AD" is not.
Comment 16 Sergei Trofimovich 2008-05-09 18:30:25 UTC
> Are you 110% sure this is correct?
I've just rerun the attached test and compared the results of ./bench and valgrind ./bench. I think you can easily reproduce it.

> From looking at the valgrind
> sources, I would say that "66 F3 AD" is handled correctly but
> "F3 66 AD" is not. 

I'm not familiar with VEX, but:
guest-x86/toIR:13392

case 0xF3: {
    ...
    abyte = getIByte(delta); delta++;
    ...
    if (abyte == 0x66) { sz = 2; abyte = getIByte(delta); delta++; }

    switch (abyte) {
        ...
        case 0xAD:
Seems correct for "F3 66 AD"

Then I inserted the following code here:

      case 0xAD:
+++         vex_printf ("%s: dasmed 0x%02X 0x%02X 0x%02X sz = %d\n",
+++                     __func__,
+++                     getIByte (delta - 3), getIByte (delta - 2),getIByte (delta - 1),
+++                     sz);

I commented out the lodsb and lodsd instructions in the test (so they are not present in the binary) and reran valgrind on it.
The results are:


disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 2
disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 4
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  4 (EAX = FFFC0000, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = FFFEFFFD, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 00AAFFFF, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 2
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 4
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 00020001, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 00040003, EFLAGS =         )
REP lodsw[rep/addr] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  4 (EAX = 00000000, EFLAGS =         )
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  4 (EAX = 1234FFFD, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  3 (EAX = 1234FFFE, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  2 (EAX = 1234FFFF, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 0, count =  0 (EAX = 12348765, EFLAGS =         )
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  0 (EAX = 12348765, EFLAGS =         )
disInstr_X86_WRK: dasmed 0xF3 0x66 0xAD sz = 2
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  1 (EAX = 123400AA, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  2 (EAX = 12340001, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  3 (EAX = 12340002, EFLAGS =         )
REP lodsw[addr/rep] (EAX = 12348765, EFLAGS =         ) => DF = 1, count =  4 (EAX = 12340003, EFLAGS =         )

Looks odd. Sometimes sz != 2, but I can't figure out why.
Comment 17 Sergei Trofimovich 2008-05-10 19:43:48 UTC
> disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 2
> ...
> disInstr_X86_WRK: dasmed 0x66 0xF3 0xAD sz = 4

Maybe the trouble is not in decoding, but in code generation?

toIR.c:
...
+++ We could backup eip_orig here [0]
   n_prefixes = 0;
   while (True) {
      if (n_prefixes > 7) goto decode_failure;
      pre = getUChar(delta);
      switch (pre) {
         case 0x66:
             sz = 2;
             break;
..
   case 0xF3:{
      Addr32 eip_orig = guest_EIP_bbstart + delta - 1;
      if (sorb != 0) goto decode_failure;
      abyte = getIByte(delta); delta++;
      if (abyte == 0x66) { sz = 2; abyte = getIByte(delta); delta++; }

When 0x66 sits before 0xF3, eip_orig points to the 0xF3 byte (invalid) rather than to the start of the insn. The loop generated by dis_REP_op therefore restarts after the 0x66 prefix, so 0x66 is not decoded in the second and later iterations (the operand extends to m32).

That's why this guest insn is decoded twice in vex (as 0x66 0xF3 0xAD and as 0xF3 0xAD).

vex could save eip_orig before the general prefix parsing (somewhere around [0]).
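
The effect described above can be modelled in a few lines of C. This is a toy model, not VEX code: decode() and run() are hypothetical names, the decoder only understands the two byte sequences from this report, and only the ascending direction is modelled. The flag use_insn_start selects whether the REP loop jumps back to the 0xF3 byte or to the first byte of the insn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of decoding one instruction in this toy model. */
typedef struct {
    unsigned sz;       /* operand size in bytes: 2 with a 0x66 prefix, else 4 */
    size_t   restart;  /* code offset the REP loop jumps back to */
} Decoded;

/* Toy decoder for the byte sequences "66 F3 AD" and "F3 66 AD".
   use_insn_start = 0: restart at the F3 byte (the "delta - 1" behaviour);
   use_insn_start = 1: restart at the first byte of the instruction. */
static Decoded decode(const uint8_t *code, size_t start, int use_insn_start)
{
    Decoded d = { 4, start };
    size_t delta = start;
    while (code[delta] == 0x66) { d.sz = 2; delta++; }  /* general prefix scan */
    assert(code[delta] == 0xF3);
    delta++;                                            /* step past F3 */
    d.restart = use_insn_start ? start : delta - 1;
    if (code[delta] == 0x66) { d.sz = 2; delta++; }     /* 0x66 after F3 */
    assert(code[delta] == 0xAD);                        /* LODS opcode */
    return d;
}

/* Execute REP LODS as a loop that does one load per pass, then jumps
   back to `restart` and decodes again. Returns the final EAX. */
static uint32_t run(const uint8_t *code, int use_insn_start,
                    const uint8_t *mem, size_t esi, uint32_t ecx, uint32_t eax)
{
    size_t pc = 0;
    while (ecx != 0) {
        Decoded d = decode(code, pc, use_insn_start);
        uint32_t v = 0;
        for (unsigned i = 0; i < d.sz; i++)
            v |= (uint32_t)mem[esi + i] << (8 * i);     /* little-endian load */
        eax = (d.sz == 4) ? v : ((eax & 0xFFFF0000u) | v);
        esi += d.sz;
        ecx--;
        pc = d.restart;
    }
    return eax;
}
```

With memory holding the words 00AA, 0001, 0002 and a count of 2, the sequence 66 F3 AD restarted at the 0xF3 byte gives EAX = 00020001 (one 16-bit load followed by one 32-bit load), while restarting at the insn start gives EAX = 12340001. The sequence F3 66 AD gives 12340001 either way, because its restart point already is the insn start.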

In conclusion:
> Are you 110% sure this is correct?
Yes, now I am sure :]

I suspect it's very hard to understand my bad English.
Should I try to make a patch for vex and attach it here to clarify what I tried to say?
Comment 18 Julian Seward 2008-05-11 09:49:44 UTC
> When 0x66 sits before 0xF3, eip_orig points to the 0xF3 byte (invalid)
> rather than to the start of the insn. The loop generated by dis_REP_op
> therefore restarts after the 0x66 prefix, so 0x66 is not decoded in the
> second and later iterations (the operand extends to m32).
>
> That's why this guest insn is decoded twice in vex (as 0x66 0xF3 0xAD
> and as 0xF3 0xAD).


Ah, I understand.  eip_orig is wrong.

> I suspect it's very hard to understand my bad english.


Your English is fine.  It's hard to understand the problem because
it's complicated.

Can you try this: change

   case 0xF3: { 
      Addr32 eip_orig = guest_EIP_bbstart + delta - 1;

to

   case 0xF3: { 
      Addr32 eip_orig = guest_EIP_bbstart + delta_orig;

does that fix it?
Comment 19 Sergei Trofimovich 2008-05-11 10:05:20 UTC
>   case 0xF3: {
>      Addr32 eip_orig = guest_EIP_bbstart + delta_orig;

priv/guest-x86/toIR.c: In function 'disInstr_X86_WRK':
priv/guest-x86/toIR.c:13393: error: 'delta_orig' undeclared (first use in this function)

BTW, what is the difference between getUChar and getIByte there?
Comment 20 Julian Seward 2008-05-11 10:13:51 UTC
> priv/guest-x86/toIR.c: In function 'disInstr_X86_WRK':
> priv/guest-x86/toIR.c:13393: error: 'delta_orig' undeclared (first use in
> this function)


Sorry.  I meant delta_start.
Comment 21 Sergei Trofimovich 2008-05-11 10:31:55 UTC
>   case 0xF3: {
>      Addr32 eip_orig = guest_EIP_bbstart + delta_start;

Works correctly for lods{b,w,d}/cmps{b,w,d}.
Comment 22 Julian Seward 2008-05-11 12:16:51 UTC
Fixed (vex r1838).  Thanks for the analysis of the problem.
Is it OK to add "bug exposing prefix seqence" test program to
the Valgrind test suite (GPL v2 or later) ?
Comment 23 Sergei Trofimovich 2008-05-11 12:21:00 UTC
> Is it OK to add "bug exposing prefix seqence" test program to
> the Valgrind test suite (GPL v2 or later) ?

Sure.
Thanks!