Bug 148447

Summary: x86_64 : new NOP codes: 66 66 66 66 2e 0f 1f nopw %cs:0x0(%rax,%rax,1)
Product: [Developer tools] valgrind
Reporter: Dirk Mueller <mueller>
Component: vex
Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED
Severity: normal
CC: duraid, esigra, j, jos, kde, veaceslav.munteanu90, woebbeking
Priority: NOR    
Version: 3.2.3   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: Linux   

Description Dirk Mueller 2007-08-02 14:54:12 UTC
Hi, 

valgrind x86_64 currently stumbles over this instruction: 

66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
Comment 1 Frans Oliehoek 2007-08-23 13:51:56 UTC
I think I have the same problem?

valgrind -v ./tst_GMAA_FSPC.debug
==1977== Memcheck, a memory error detector.
==1977== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==1977== Using LibVEX rev 1732, a library for dynamic binary translation.
==1977== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==1977== Using valgrind-3.2.3-Debian, a dynamic binary instrumentation framework.
==1977== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==1977==
--1977-- Command line
--1977--    ./tst_GMAA_FSPC.debug
--1977-- Startup, with flags:
--1977--    --suppressions=/usr/lib/valgrind/debian-libc6-dbg.supp
--1977--    -v
--1977-- Contents of /proc/version:
--1977--   Linux version 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12etch2) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri May 4 00:37:33 UTC 2007
--1977-- Arch and hwcaps: AMD64, amd64-sse2
--1977-- Page sizes: currently 4096, max supported 4096
--1977-- Valgrind library directory: /usr/lib/valgrind
--1977-- Reading syms from /home/faolieho/Documents/implementation/madp/trunk/src/tests/tst_GMAA_FSPC.debug (0x400000)
--1977-- Reading syms from /lib/ld-2.6.1.so (0x4000000)
--1977-- Reading debug info from /lib/ld-2.6.1.so...
--1977-- ... CRC mismatch (computed 635CD41D wanted 1F3B7BF3)
--1977--    object doesn't have a symbol table
--1977-- Reading syms from /usr/lib/valgrind/amd64-linux/memcheck (0x38000000)
--1977--    object doesn't have a dynamic symbol table
--1977-- Reading suppressions file: /usr/lib/valgrind/debian-libc6-dbg.supp
--1977-- Reading suppressions file: /usr/lib/valgrind/default.supp
vex amd64->IR: unhandled instruction bytes: 0x66 0x66 0x66 0x66
==1977== valgrind: Unrecognised instruction at address 0x4016321.
==1977== Your program just tried to execute an instruction that Valgrind
==1977== did not recognise.  There are two possible reasons for this.
==1977== 1. Your program has a bug and erroneously jumped to a non-code
==1977==    location.  If you are running Memcheck and you just saw a
==1977==    warning about a bad jump, it's probably your program's fault.
==1977== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1977==    i.e. it's Valgrind's fault.  If you think this is the case or
==1977==    you are not sure, please let us know and we'll try to fix it.
==1977== Either way, Valgrind will now raise a SIGILL signal which will
==1977== probably kill your program.
==1977==
==1977== Process terminating with default action of signal 4 (SIGILL)
==1977==  Illegal opcode at address 0x4016321
==1977==    at 0x4016321: (within /lib/ld-2.6.1.so)
==1977==    by 0x4007CC2: (within /lib/ld-2.6.1.so)
==1977==    by 0x4003329: (within /lib/ld-2.6.1.so)
==1977==    by 0x4014457: (within /lib/ld-2.6.1.so)
==1977==    by 0x400230A: (within /lib/ld-2.6.1.so)
==1977==    by 0x4000A67: (within /lib/ld-2.6.1.so)
==1977==
Comment 2 Frans Oliehoek 2007-08-23 14:09:48 UTC
*** This bug has been confirmed by popular vote. ***
Comment 3 Julian Seward 2007-08-23 14:31:24 UTC
Am looking at this now, but a bit confused because I can't reproduce the
failure on svn trunk or the 3.2 branch.  Maybe 66 66 66 66 2e 0f 1f is
only the initial part of the instruction.  Could one of you please send
the complete objdump -d output for the instruction so I can see what all
the instruction bytes are?
Comment 4 Derick Rethans 2007-08-23 14:37:29 UTC
Sure:

derick@kossu:~$ objdump -d /lib/ld-2.6.1.so | grep "66 66 66 66"
     c13:       66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
    13e3:       66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
    55a1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
    8ed1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
    9aa3:       66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
    d171:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
    e191:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
    e611:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
    ede3:       66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
    f752:       66 66 66 66 66 2e 0f    nopw   %cs:0x0(%rax,%rax,1)
   106f3:       66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
   107d1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   10ae1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   118a1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   13bc2:       66 66 66 66 66 2e 0f    nopw   %cs:0x0(%rax,%rax,1)
   148e2:       66 66 66 66 66 2e 0f    nopw   %cs:0x0(%rax,%rax,1)
   14961:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   15111:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   156c1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   15f93:       66 66 66 66 2e 0f 1f    nopw   %cs:0x0(%rax,%rax,1)
   160e1:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   16321:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   16d12:       66 66 66 66 66 2e 0f    nopw   %cs:0x0(%rax,%rax,1)

Full dump is here:
http://files.derickrethans.nl/ld.dump.txt
Comment 5 Dirk Mueller 2007-08-23 14:41:11 UTC
Whoopsie, indeed. The complete context is:


     ab7:       c3                      retq
     ab8:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
     abf:       00
     ac0:       83 47 04 01             addl   $0x1,0x4(%rdi)
     ac4:       c3                      retq
     ac5:       66 66 2e 0f 1f 84 00    nopw   %cs:0x0(%rax,%rax,1)
     acc:       00 00 00 00
     ad0:       83 6f 04 01             subl   $0x1,0x4(%rdi)
     ad4:       c3                      retq
     ad5:       66 66 2e 0f 1f 84 00    nopw   %cs:0x0(%rax,%rax,1)
Comment 7 Julian Seward 2007-08-23 16:19:16 UTC
Um, ok.  I still can't reproduce it using the program below on amd64.
What am I doing wrong?

int main ( void )
{
  __asm__ __volatile__(
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x2e\n\t"
     ".byte 0x0f\n\t"
     ".byte 0x1f\n\t"
     ".byte 0x84\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n"
  );
  return 0;
}
Comment 8 Derick Rethans 2007-08-23 16:24:21 UTC
I even get it with an empty executable:

int main ( void )
{
  return 0;
}

$ gcc -static -o prog-test prog.c

$ valgrind ./prog-test 
==25513== Memcheck, a memory error detector.
==25513== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==25513== Using LibVEX rev 1732, a library for dynamic binary translation.
==25513== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==25513== Using valgrind-3.2.3-Debian, a dynamic binary instrumentation framework.
==25513== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==25513== For more details, rerun with: -v
==25513== 
vex amd64->IR: unhandled instruction bytes: 0x66 0x66 0x66 0x66
==25513== valgrind: Unrecognised instruction at address 0x451C22.
==25513== Your program just tried to execute an instruction that Valgrind
==25513== did not recognise.  There are two possible reasons for this.
==25513== 1. Your program has a bug and erroneously jumped to a non-code
==25513==    location.  If you are running Memcheck and you just saw a
==25513==    warning about a bad jump, it's probably your program's fault.
==25513== 2. The instruction is legitimate but Valgrind doesn't handle it,
==25513==    i.e. it's Valgrind's fault.  If you think this is the case or
==25513==    you are not sure, please let us know and we'll try to fix it.
==25513== Either way, Valgrind will now raise a SIGILL signal which will
==25513== probably kill your program.
==25513== 
==25513== Process terminating with default action of signal 4 (SIGILL)
==25513==  Illegal opcode at address 0x451C22
==25513==    at 0x451C22: strpbrk (in /tmp/prog-test)
==25513==    by 0x448229: strsep (in /tmp/prog-test)
==25513==    by 0x42C8B0: fillin_rpath (in /tmp/prog-test)
==25513==    by 0x42E6DB: _dl_init_paths (in /tmp/prog-test)
==25513==    by 0x40AFBE: _dl_non_dynamic_init (in /tmp/prog-test)
==25513==    by 0x40B6CA: __libc_init_first (in /tmp/prog-test)
==25513==    by 0x400403: (below main) (in /tmp/prog-test)
==25513== 
==25513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==25513== malloc/free: in use at exit: 0 bytes in 0 blocks.
==25513== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==25513== For counts of detected errors, rerun with: -v
==25513== All heap blocks were freed -- no leaks are possible.
Illegal instruction

$ valgrind --version
valgrind-3.2.3-Debian

I'll try fresh sources now to see if that helps. Anything else I could try?
Comment 9 Derick Rethans 2007-08-23 16:28:43 UTC
That didn't go very far:

configure: error: Valgrind requires glibc version 2.2 - 2.5

$ dpkg -p libc6
Package: libc6
Priority: required
Section: libs
Installed-Size: 11328
Maintainer: GNU Libc Maintainers <debian-glibc@lists.debian.org>
Architecture: amd64
Source: glibc
Version: 2.6.1-1
Provides: glibc-2.6-1
Depends: libgcc1
Suggests: locales, glibc-doc
Conflicts: libterm-readline-gnu-perl (<< 1.15-2), tzdata (<< 2007e-2)
Size: 4911700
Description: GNU C Library: Shared libraries
 Contains the standard libraries that are used by nearly all programs on
 the system. This package includes shared versions of the standard C library
 and the standard math library, as well as many others.
Comment 10 Frans Oliehoek 2007-08-23 16:37:38 UTC
Just to make sure: is everybody with this bug using Debian (testing/lenny) on AMD64, with libc 2.6.1-1?


> dpkg -s libc6
Package: libc6
Status: install ok installed
Priority: required
Section: libs
Installed-Size: 11328
Maintainer: GNU Libc Maintainers <debian-glibc@lists.debian.org>
Architecture: amd64
Source: glibc
Version: 2.6.1-1
Provides: glibc-2.6-1
Depends: libgcc1
Suggests: locales, glibc-doc
...

>  md5sum /lib/ld-2.6.1.so
f68b7e0311528195934658fa43a67cb6  /lib/ld-2.6.1.so
Comment 11 Derick Rethans 2007-08-23 16:43:27 UTC
I also saw it on the suse list:

https://bugzilla.novell.com/show_bug.cgi?id=296803#c1

It says: "Dirk Mueller told me that this was triggered by the new binutils which uses a new way of writing NOPs which is not yet known to valgrind."
Comment 12 Julian Seward 2007-08-23 16:46:20 UTC
What I need is for someone to construct a modified version of the
program I posted, which does cause Valgrind to bomb when it runs
the __asm__ __volatile__ section (and not before that point).
I can't figure out how to do so, although I could be doing something
stupid.
Comment 13 Dirk Mueller 2007-08-23 17:34:46 UTC
int main ( void )
{
  // 66 66 66 66 66 66 2e
  __asm__ __volatile__(
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x66\n\t"
     ".byte 0x2e\n\t"
     ".byte 0x0f\n\t"
     ".byte 0x1f\n\t"
     ".byte 0x84\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n\t"
     ".byte 0x00\n"
  );
  return 0;
}
Comment 14 Dirk Mueller 2007-08-23 17:35:29 UTC
surrounding code is

   14755:       ff c1                   inc    %ecx
   14757:       48 8d 76 01             lea    0x1(%rsi),%rsi
   1475b:       48 8d 7f 01             lea    0x1(%rdi),%rdi
   1475f:       75 ef                   jne    14750 <calloc+0x13e0>
   14761:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   14768:       0f 1f 84 00 00 00 00
   1476f:       00
Comment 15 Dirk Mueller 2007-08-23 17:36:52 UTC
   1475f:       75 ef                   jne    14750 <calloc+0x13e0>
   14761:       66 66 66 66 66 66 2e    nopw   %cs:0x0(%rax,%rax,1)
   14768:       0f 1f 84 00 00 00 00
   1476f:       00
   14770:       48 81 fa 00 04 00 00    cmp    $0x400,%rdx
   14777:       77 77                   ja     147f0 <calloc+0x1480>
   14779:       89 d1                   mov    %edx,%ecx


Better paste. It's very hard to find an application that doesn't use calloc() ;(

Comment 16 Julian Seward 2007-08-23 20:56:40 UTC
Ah, my mistake.  My test case did not have enough 66s.  Now fixed;
vex r1776 - a one byte change :-)

Index: priv/guest-amd64/toIR.c
===================================================================
--- priv/guest-amd64/toIR.c     (revision 1775)
+++ priv/guest-amd64/toIR.c     (working copy)
@@ -8387,7 +8387,7 @@
       as many invalid combinations as possible. */
    n_prefixes = 0;
    while (True) {
-      if (n_prefixes > 5) goto decode_failure;
+      if (n_prefixes > 7) goto decode_failure;
       pre = getUChar(delta);
       switch (pre) {
          case 0x66: pfx |= PFX_66; break;
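
For readers following along, here is a rough, simplified model of why the limit matters. It is not the VEX decoder: count_prefixes and is_legacy_prefix are hypothetical helpers, the set of prefix bytes is a plain list of the legacy x86 prefixes, and the assumption that the counter is bumped once per prefix after the check is inferred from the lines visible in the diff. Under those assumptions, the five-prefix sequence from Comment 7 (four 0x66 plus 0x2e) passes with the old limit, while the seven-prefix sequence from Comment 13 (six 0x66 plus 0x2e) only passes with the new one.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `b` is one of the legacy x86 prefix bytes modelled here:
   operand-size (0x66), address-size (0x67), LOCK/REP (0xF0, 0xF2, 0xF3)
   and the six segment overrides.  REX prefixes are deliberately ignored. */
static int is_legacy_prefix(unsigned char b)
{
   switch (b) {
      case 0x66: case 0x67:
      case 0xF0: case 0xF2: case 0xF3:
      case 0x26: case 0x2E: case 0x36: case 0x3E:
      case 0x64: case 0x65:
         return 1;
      default:
         return 0;
   }
}

/* Hypothetical helper: counts the leading prefix bytes of an instruction,
   or returns -1 where the decoder would take `goto decode_failure`.
   `limit` plays the role of the constant changed in the patch
   (5 before the fix, 7 after it). */
static int count_prefixes(const unsigned char *bytes, size_t len, int limit)
{
   int    n_prefixes = 0;
   size_t delta      = 0;

   assert(bytes != NULL);
   while (1) {
      if (n_prefixes > limit) return -1;  /* same comparison as in the patch */
      if (delta >= len) return -1;        /* ran out of bytes: treated as failure */
      if (!is_legacy_prefix(bytes[delta])) break;
      n_prefixes++;                       /* assumed: one increment per prefix */
      delta++;
   }
   return n_prefixes;
}
```

With limit 5 this model accepts at most five prefixes, because the check fires at the top of the loop once a sixth has been consumed; with limit 7 it accepts up to seven, which covers the longest padding NOP shown in the objdump output in Comment 4.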
Comment 17 Julian Seward 2007-08-23 21:04:15 UTC
> Ah, my mistake.  My test case did not have enough 66s.  Now fixed;
> vex r1776 - a one byte change :-)


And vex r1777 on the 3.2 branch.
Comment 18 Derick Rethans 2007-08-23 21:21:41 UTC
Works great, thanks!
Comment 19 Frans Oliehoek 2007-08-24 15:09:58 UTC
yes, indeed. Great work, thanks!
Comment 20 Julian Seward 2007-08-24 15:33:38 UTC
Fixed in both trunk and 3.2 branch.
Comment 21 Jos van den Oever 2007-10-02 12:59:11 UTC
*** Bug 150408 has been marked as a duplicate of this bug. ***
Comment 22 Veaceslav Munteanu 2014-02-23 19:38:57 UTC
Hello, I have a similar problem when trying to run valgrind with digikam, valgrind version 3.9

http://pastebin.com/G1tEyJEe