Bug 384230 - vex x86->IR: unhandled instruction bytes: 0x67 0xE8 0xAB 0x68
Summary: vex x86->IR: unhandled instruction bytes: 0x67 0xE8 0xAB 0x68
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Unclassified
Component: vex (show other bugs)
Version: 3.13.0
Platform: Other Linux
: NOR normal (vote)
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
: 384156 386115 388407 394903 (view as bug list)
Depends on:
Blocks:
 
Reported: 2017-08-31 14:42 UTC by Laszlo Papp
Modified: 2018-09-03 07:05 UTC (History)
9 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
ld objdump -d (1.56 MB, text/plain)
2017-08-31 14:42 UTC, Laszlo Papp
Details
Patch to implement "addr16 call" aka "call rel16" (1.99 KB, patch)
2017-08-31 14:57 UTC, Tom Hughes
Details
Patch to ignore addr16 prefix on relative call (763 bytes, patch)
2017-09-01 11:11 UTC, Tom Hughes
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Laszlo Papp 2017-08-31 14:42:45 UTC
Created attachment 107620 [details]
ld objdump -d

uname -a
Linux pol-lx-007 4.12.8-2-ARCH #1 SMP PREEMPT Fri Aug 18 14:08:02 UTC 2017 x86_64 GNU/Linux


cat /etc/issue
Arch Linux \r (\l)


cat main.c
int main()
{
        return 0;
}


gcc -m32 main.c


valgrind -v --sym-offsets=yes ./a.out 
==1594== Memcheck, a memory error detector
==1594== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1594== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==1594== Command: ./a.out
==1594== 
--1594-- Valgrind options:
--1594--    -v
--1594--    --sym-offsets=yes
--1594-- Contents of /proc/version:
--1594--   Linux version 4.12.8-2-ARCH (builduser@foutrelis) (gcc version 7.2.0 (GCC) ) #1 SMP PREEMPT Fri Aug 18 14:08:02 UTC 2017
--1594-- 
--1594-- Arch and hwcaps: X86, LittleEndian, x86-mmxext-sse1-sse2-sse3
--1594-- Page sizes: currently 4096, max supported 4096
--1594-- Valgrind library directory: /usr/lib/valgrind
--1594-- Reading syms from /tmp/a.out
--1594-- Reading syms from /usr/lib32/ld-2.25.so
--1594-- Reading syms from /usr/lib/valgrind/memcheck-x86-linux
--1594--    object doesn't have a symbol table
--1594--    object doesn't have a dynamic symbol table
--1594-- Scheduler: using generic scheduler lock implementation.
--1594-- Reading suppressions file: /usr/lib/valgrind/default.supp
==1594== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1594-by-lpapp-on-???
==1594== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-1594-by-lpapp-on-???
==1594== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-1594-by-lpapp-on-???
==1594== 
==1594== TO CONTROL THIS PROCESS USING vgdb (which you probably
==1594== don't want to do, unless you know exactly what you're doing,
==1594== or are doing some strange experiment):
==1594==   /usr/lib/valgrind/../../bin/vgdb --pid=1594 ...command...
==1594== 
==1594== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==1594==   /path/to/gdb ./a.out
==1594== and then give GDB the following command
==1594==   target remote | /usr/lib/valgrind/../../bin/vgdb --pid=1594
==1594== --pid is optional if only one valgrind process is running
==1594== 
--1594-- REDIR: 0x40195a0 (ld-linux.so.2:strlen) redirected to 0x58057063 (???)
vex x86->IR: unhandled instruction bytes: 0x67 0xE8 0xAB 0x68
==1594== valgrind: Unrecognised instruction at address 0x400301f.
==1594==    at 0x400301F: dl_main+4975 (in /usr/lib32/ld-2.25.so)
==1594==    by 0x4016A65: _dl_sysdep_start+1269 (in /usr/lib32/ld-2.25.so)
==1594==    by 0x4001861: _dl_start+513 (in /usr/lib32/ld-2.25.so)
==1594==    by 0x4000AF6: ??? (in /usr/lib32/ld-2.25.so)
==1594== Your program just tried to execute an instruction that Valgrind
==1594== did not recognise.  There are two possible reasons for this.
==1594== 1. Your program has a bug and erroneously jumped to a non-code
==1594==    location.  If you are running Memcheck and you just saw a
==1594==    warning about a bad jump, it's probably your program's fault.
==1594== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1594==    i.e. it's Valgrind's fault.  If you think this is the case or
==1594==    you are not sure, please let us know and we'll try to fix it.
==1594== Either way, Valgrind will now raise a SIGILL signal which will
==1594== probably kill your program.
==1594== 
==1594== Process terminating with default action of signal 4 (SIGILL): dumping core
==1594==  Illegal opcode at address 0x400301F
==1594==    at 0x400301F: dl_main+4975 (in /usr/lib32/ld-2.25.so)
==1594==    by 0x4016A65: _dl_sysdep_start+1269 (in /usr/lib32/ld-2.25.so)
==1594==    by 0x4001861: _dl_start+513 (in /usr/lib32/ld-2.25.so)
==1594==    by 0x4000AF6: ??? (in /usr/lib32/ld-2.25.so)
==1594== 
==1594== HEAP SUMMARY:
==1594==     in use at exit: 0 bytes in 0 blocks
==1594==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1594== 
==1594== All heap blocks were freed -- no leaks are possible
==1594== 
==1594== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1594== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)

ld --version
GNU ld (GNU Binutils) 2.28.0.20170506
Copyright (C) 2017 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
Comment 1 Tom Hughes 2017-08-31 14:57:25 UTC
Created attachment 107621 [details]
Patch to implement "addr16 call" aka "call rel16"

Try building valgrind with this patch applied and see if that helps...
Comment 2 Tom Hughes 2017-09-01 11:11:35 UTC
Created attachment 107639 [details]
Patch to ignore addr16 prefix on relative call

So it seems that this isn't actually (as we first thought) a relative call with a 16 bit offset but rather is still a 32 bit offset but just with an (apparently redundant) addr16 prefix.

This patch just ignores that prefix and has been confirmed to fix the problem.
Comment 3 Laszlo Papp 2017-09-01 11:36:09 UTC
Yes, thank you so much for the quick replies and work.
Comment 4 Tom Hughes 2017-10-23 16:30:25 UTC
*** Bug 386115 has been marked as a duplicate of this bug. ***
Comment 5 Steve Folta 2017-12-05 01:00:32 UTC
I had the same problem when using the ifstream constructor, ONLY when using "-m32".  The patch (107639) does fix it.  (However, the bug as reported -- on dl_main() with a super-minimal main() -- did not occur for me.)

Pardon my ignorance, but why isn't that patch in Git already?

Here's my test case:

$ cat main.cpp
#include <fstream>


int main(int argc, char* argv[])
{
        std::ifstream file(argv[1], std::ios_base::in);

        return 0;
}


$ cat Makefile
test: main.cpp
        g++ -g -m32 -o $@ $^

$ valgrind test test.txt
==7628== Memcheck, a memory error detector
==7628== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7628== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7628== Command: test test.txt
==7628==
vex x86->IR: unhandled instruction bytes: 0x67 0xE8 0x7D 0xFF
==7628== valgrind: Unrecognised instruction at address 0x48eb53d.
==7628==    at 0x48EB53D: std::locale::locale() (locale_init.cc:250)
==7628==    by 0x4940A26: basic_ios (basic_ios.h:462)
==7628==    by 0x4940A26: std::basic_ifstream<char, std::char_traits<char> >::basic_ifstream(char const*, std::_Ios_Openmode) (fstream:496)
==7628==    by 0x1086B0: main (main.cpp:6)
==7628== Your program just tried to execute an instruction that Valgrind
==7628== did not recognise.  There are two possible reasons for this.
==7628== 1. Your program has a bug and erroneously jumped to a non-code
==7628==    location.  If you are running Memcheck and you just saw a
==7628==    warning about a bad jump, it's probably your program's fault.
==7628== 2. The instruction is legitimate but Valgrind doesn't handle it,
==7628==    i.e. it's Valgrind's fault.  If you think this is the case or
==7628==    you are not sure, please let us know and we'll try to fix it.
==7628== Either way, Valgrind will now raise a SIGILL signal which will
==7628== probably kill your program.
==7628==
==7628== Process terminating with default action of signal 4 (SIGILL): dumping core
==7628==  Illegal opcode at address 0x48EB53D
==7628==    at 0x48EB53D: std::locale::locale() (locale_init.cc:250)
==7628==    by 0x4940A26: basic_ios (basic_ios.h:462)
==7628==    by 0x4940A26: std::basic_ifstream<char, std::char_traits<char> >::basic_ifstream(char const*, std::_Ios_Openmode) (fstream:496)
==7628==    by 0x1086B0: main (main.cpp:6)
==7628==
==7628== HEAP SUMMARY:
==7628==     in use at exit: 0 bytes in 0 blocks
==7628==   total heap usage: 1 allocs, 1 frees, 18,944 bytes allocated
==7628==
==7628== All heap blocks were freed -- no leaks are possible
==7628==
==7628== For counts of detected and suppressed errors, rerun with: -v
==7628== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)
Comment 6 Tom Hughes 2017-12-05 07:09:51 UTC
Because we still don't really understand what that instruction means - that patch is just a best guess at how to decode it.

If that patch is correct then the prefix is redundant in which case why is the compiler generating it? Is there some subtle hidden meaning that we are unaware of and which we need to emulate?
Comment 7 Tom Hughes 2018-01-01 09:27:39 UTC
*** Bug 388407 has been marked as a duplicate of this bug. ***
Comment 8 Tom Hughes 2018-03-16 13:53:18 UTC
Committed as fb6157165bb46e9f5e8c4d5fb3287306fc2453b8.
Comment 9 Laszlo Papp 2018-03-16 13:58:29 UTC
So is this now a proper solution or a workaround in the end?
Comment 10 John Reiser 2018-03-16 14:14:46 UTC
(In reply to Laszlo Papp from comment #9)
> So is this now a proper solution or a workaround in the end?

The "proper fix" is for the compiler not to generate totally wasteful code in the first place.

The change in valgrind correctly accommodates poor code.
Comment 11 Laszlo Papp 2018-03-16 14:16:56 UTC
I see; thanks. Is there an issue opened against the compiler then?
Comment 12 Tom Hughes 2018-03-16 14:24:08 UTC
I don't believe anybody has a test case so there wouldn't be any point.

If you can come up some C that will cause the compiler to generate this sequence then please feel free to report it.
Comment 13 Laszlo Papp 2018-03-16 14:31:50 UTC
According to my opening post, "int main() { return 0; }" seems to generate it.
Comment 14 Tom Hughes 2018-03-16 14:51:04 UTC
Not really - the actual bogus instruction was in ld.so which is not part of the code generated by that compilation.

So you would need to isolate the relevant file in the glibc source and try and create a minimised preprocessed test case from that then find a set of compiler options which would cause it to generate that code sequence for the test case.
Comment 15 Jannik Vogel 2018-03-16 20:20:55 UTC
(In reply to John Reiser from comment #10)
> The "proper fix" is for the compiler not to generate totally wasteful code
> in the first place.
> 
> The change in valgrind correctly accommodates poor code.

To me, valgrind emulation always try to be transparent to the user. So if something runs on the host CPU, it should run through valgrind too.
So arguing with a compiler is not the correct thing to do here.

A fix for valgrind was necessary and Toms patch gets the job done.
Wether this might be necessary for other instructions too should be tested to make this patch more generic.

(In reply to Tom Hughes from comment #14)
> Not really - the actual bogus instruction was in ld.so which is not part of
> the code generated by that compilation.

Not 100% true. The code was also found in arch linux libGLdispatch and libopenal. That's at least where I was affected, but other instances probably exist too.

> So you would need to isolate the relevant file in the glibc source and try
> and create a minimised preprocessed test case from that then find a set of
> compiler options which would cause it to generate that code sequence for the
> test case.

When I originally found out about this issue, I reached out to the arch linux openal package maintainer to figure out how packages are build.
While the openal maintainer (heftig) did not respond, other people on arch linux IRC helped me figure out what I needed to know.

Packge https://www.archlinux.org/packages/multilib/x86_64/lib32-openal/ (version 1.18.2-1), which is affected by this bug:

- Was compiled using "gcc-multilib-7.2.0-1" (see arch repos)
- Configured using this: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/lib32-openal
- Systemwide compile flags can be found in makepkg.conf of the devtools helper package

So we at least know which compiler generates such a sequence.

I've also responded in #valgrind-dev (IRC) later, when heftig responded about why this instruction exists (following is a direct quote from heftig):

> the purpose is to make the call instruction a byte longer
> https://sourceware.org/ml/binutils/2016-05/msg00322.html
> it's the linker patching indirect -fno-plt calls (which were 6 bytes) to direct calls (which are 5 bytes)
> since you have an extra byte to fill, you either need to insert a nop (which would be an extra instruction) or add a pointless prefix

so this is likely done to not waste an additional nop cycle or to avoid getting interrupts between those instructions. It might have other implications though.
So while this looks like broken code, it's actually a clever hack to make instructions longer. As a result the patch might be incomplete as this can probably done with other instructions too.

---

If anybody still has doubts about this patch I'd recommend checking the x86 docs to see if that says it's an illegal instruction - if not it should be supported; if it is contact intel about it.

If you want to know more about why this sequence is generated, I'd recommend to contact the people who worked on gcc 7.2.0
Comment 16 Tom Hughes 2018-03-16 20:34:09 UTC
Were not actually daft you know and I did check the Intel manual multiple times. To be clear I believe this quote covers it:

"The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size can be the default; the prefix selects the non-default size. Using this prefix and/or other undefined opcodes when operands for the instruction do not reside in memory is reserved; such use may cause unpredictable behavior."

In this case the only operand to the instruction is an immediate offset encoded as part of the instruction, which does not "reside in memory" and as such this use of the prefix constitutes a reserved instruction and using it may cause "unpredictable behaviour".
Comment 17 Julian Seward 2018-07-25 07:07:37 UTC
*** Bug 384156 has been marked as a duplicate of this bug. ***
Comment 18 Julian Seward 2018-09-03 07:05:02 UTC
*** Bug 394903 has been marked as a duplicate of this bug. ***