Bug 409999

Summary: vex amd64->IR: unhandled instruction bytes: 0x62 0xD1 0xFE 0x8 0x6F 0x84 0x24 0x8 0x0 0x0
Product: [Developer tools] valgrind Reporter: Andras Szabo <andrei.hu>
Component: general    Assignee: Julian Seward <jseward>
Status: RESOLVED DUPLICATE    
Severity: normal CC: gabravier, tom
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Andras Szabo 2019-07-19 13:47:49 UTC
SUMMARY

Most likely g++ generated an instruction that Valgrind does not recognize. Remark: SSE 4.2 is enabled in rocksdb's build process.


STEPS TO REPRODUCE
1. Build rocksdb 6.1.2 (https://github.com/facebook/rocksdb)
2. Link a program with librocksdb
3. Start valgrind with the program.

OBSERVED RESULT

Something along these lines happens:
==93== Memcheck, a memory error detector
==93== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==93== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==93== Command: build/correlation/common/persistence/test/test_persistence
==93== 
vex amd64->IR: unhandled instruction bytes: 0x62 0xD1 0xFE 0x8 0x6F 0x84 0x24 0x8 0x0 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==93== valgrind: Unrecognised instruction at address 0x51e81ce.
==93==    at 0x51E81CE: std::_Hashtable<std::string, std::pair<std::string const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::string const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::string const, rocksdb::OptionTypeInfo> const*>(std::pair<std::string const, rocksdb::OptionTypeInfo> const*, std::pair<std::string const, rocksdb::OptionTypeInfo> const*, unsigned long, std::hash<std::string> const&, std::__detail::_Mod_range_hashing const&, std::__detail::_Default_ranged_hash const&, std::equal_to<std::string> const&, std::__detail::_Select1st const&, std::allocator<std::pair<std::string const, rocksdb::OptionTypeInfo> > const&) (in /usr/lib64/librocksdb.so)
==93==    by 0x519CEBF: __static_initialization_and_destruction_0(int, int) [clone .constprop.642] (in /usr/lib64/librocksdb.so)
==93==    by 0x400F552: _dl_init (in /usr/lib64/ld-2.17.so)
==93==    by 0x40011A9: ??? (in /usr/lib64/ld-2.17.so)
==93== Your program just tried to execute an instruction that Valgrind
==93== did not recognise.  There are two possible reasons for this.
==93== 1. Your program has a bug and erroneously jumped to a non-code
==93==    location.  If you are running Memcheck and you just saw a
==93==    warning about a bad jump, it's probably your program's fault.
==93== 2. The instruction is legitimate but Valgrind doesn't handle it,
==93==    i.e. it's Valgrind's fault.  If you think this is the case or
==93==    you are not sure, please let us know and we'll try to fix it.
==93== Either way, Valgrind will now raise a SIGILL signal which will
==93== probably kill your program.
==93== 
==93== Process terminating with default action of signal 4 (SIGILL): dumping core
==93==  Illegal opcode at address 0x51E81CE
==93==    at 0x51E81CE: std::_Hashtable<std::string, std::pair<std::string const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::string const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::string const, rocksdb::OptionTypeInfo> const*>(std::pair<std::string const, rocksdb::OptionTypeInfo> const*, std::pair<std::string const, rocksdb::OptionTypeInfo> const*, unsigned long, std::hash<std::string> const&, std::__detail::_Mod_range_hashing const&, std::__detail::_Default_ranged_hash const&, std::equal_to<std::string> const&, std::__detail::_Select1st const&, std::allocator<std::pair<std::string const, rocksdb::OptionTypeInfo> > const&) (in /usr/lib64/librocksdb.so)
==93==    by 0x519CEBF: __static_initialization_and_destruction_0(int, int) [clone .constprop.642] (in /usr/lib64/librocksdb.so)
==93==    by 0x400F552: _dl_init (in /usr/lib64/ld-2.17.so)
==93==    by 0x40011A9: ??? (in /usr/lib64/ld-2.17.so)

EXPECTED RESULT

The program runs normally under Valgrind, reporting any memory leaks and undefined behaviour.


SOFTWARE/OS VERSIONS
Linux: RHEL 7 

ADDITIONAL INFORMATION
Comment 1 Tom Hughes 2019-07-19 14:10:27 UTC
In 32 bit mode 0x62 would be BOUND which would win some sort of obscurity contest, but in 64 bit mode it doesn't appear to be valid and as far as I can see it hasn't been replaced by anything else, at least in the version of the Intel manual I am looking at...
Comment 2 Tom Hughes 2019-07-19 14:14:25 UTC
The latest (May 2019) edition seems to agree.

Did you compile this yourself, and if you did what architecture did you target exactly?
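The question of which architecture was targeted can be answered mechanically: GCC predefines a macro such as __SSE4_2__, __PCLMUL__ or __AVX512F__ for each instruction-set extension that the given flags enable, and `g++ <flags> -dM -E -x c++ /dev/null` prints those macros. The Python sketch below filters such a dump. `compiler_command`, `enabled_isa_macros` and the `ISA_MACROS` subset are illustrative names of our own, not part of any tool. With only -msse4.2 -mpclmul, __AVX512F__ should be absent; it would appear with -mavx512f, or with -march=native on an AVX-512-capable host.

```python
import re

# Hand-picked subset of the ISA macros GCC predefines when an extension is enabled.
ISA_MACROS = ("__SSE4_2__", "__PCLMUL__", "__AVX__", "__AVX2__",
              "__AVX512F__", "__AVX512VL__")


def compiler_command(flags):
    """Hypothetical helper: the command whose stdout lists predefined macros."""
    return ["g++", *flags, "-dM", "-E", "-x", "c++", "/dev/null"]


def enabled_isa_macros(dump):
    """Return the ISA macros #define'd in a `-dM -E` dump, in ISA_MACROS order."""
    defined = set(re.findall(r"^#define (\w+)", dump, flags=re.MULTILINE))
    return [m for m in ISA_MACROS if m in defined]
```

To use it, run the command returned by compiler_command with the flags taken from the actual build log and pass its standard output to enabled_isa_macros.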
Comment 3 Andras Szabo 2019-07-22 14:52:24 UTC
Here is the cpuinfo of the build host:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping:              4
CPU MHz:               3099.937
BogoMIPS:              4804.84
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              28160K
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79

Looking at rocksdb's CMakeLists.txt, the flags that stand out are -msse4.2 and -mpclmul. We used g++ 7.3 to build rocksdb. I will collect the build logs to provide more specific information.
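Comments 1 and 2 treat 0x62 as a one-byte opcode. One alternative reading, offered as an assumption rather than a conclusion reached in this thread: in 64-bit mode 0x62 is also the first byte of the four-byte EVEX prefix that encodes AVX-512 instructions, and the Xeon Gold 6148 above (a Skylake-SP part) supports AVX-512. The Python sketch below decodes the ten reported bytes under that assumption. `decode_evex` and its opcode table are hypothetical helpers that cover only the map-0F opcode 0x6F forms, and the ModRM/SIB handling skips special cases such as mod=00 with base=101.

```python
# Sketch: decode an EVEX (AVX-512) prefix plus the ModRM/SIB bytes after it.
# Assumption: the bytes vex reported begin an EVEX-encoded instruction.

GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Hypothetical, deliberately tiny table: map 0F, opcode 0x6F, keyed by (pp, W).
MNEMONICS_0F_6F = {
    ("66", 0): "vmovdqa32", ("66", 1): "vmovdqa64",
    ("F3", 0): "vmovdqu32", ("F3", 1): "vmovdqu64",
    ("F2", 0): "vmovdqu8", ("F2", 1): "vmovdqu16",
}


def decode_evex(code):
    if len(code) < 6 or code[0] != 0x62:
        raise ValueError("not an EVEX-prefixed instruction")
    p0, p1, p2, opcode, modrm = code[1:6]
    f = {
        # P0: R, X, B and R' are stored inverted; low bits select the opcode map.
        "R": 1 - ((p0 >> 7) & 1),
        "X": 1 - ((p0 >> 6) & 1),
        "B": 1 - ((p0 >> 5) & 1),
        "R'": 1 - ((p0 >> 4) & 1),
        "map": {1: "0F", 2: "0F38", 3: "0F3A"}.get(p0 & 0x7, "?"),
        # P1: W, inverted vvvv, a fixed 1 bit, and the implied SIMD prefix pp.
        "W": (p1 >> 7) & 1,
        "vvvv": ((p1 >> 3) & 0xF) ^ 0xF,
        "pp": ["none", "66", "F3", "F2"][p1 & 0x3],
        # P2: zeroing, vector length L'L, broadcast/rounding, inverted V', mask.
        "z": (p2 >> 7) & 1,
        "vector_bits": [128, 256, 512, None][(p2 >> 5) & 0x3],
        "b": (p2 >> 4) & 1,
        "V'": 1 - ((p2 >> 3) & 1),
        "aaa": p2 & 0x7,
        "opcode": opcode,
    }
    # ModRM.reg (extended by R and R') is the destination for the 0x6F load form.
    prefix = {128: "xmm", 256: "ymm", 512: "zmm"}.get(f["vector_bits"], "?mm")
    reg = ((modrm >> 3) & 0x7) | (f["R"] << 3) | (f["R'"] << 4)
    f["reg"] = f"{prefix}{reg}"
    mod, rm = modrm >> 6, modrm & 0x7
    if mod != 3 and rm == 4:  # memory operand addressed through a SIB byte
        sib = code[6]
        index = ((sib >> 3) & 0x7) | (f["X"] << 3)
        f["base"] = GPR64[(sib & 0x7) | (f["B"] << 3)]
        f["index"] = None if index == 4 else GPR64[index]
        f["scale"] = 1 << (sib >> 6)
        # Ignores mod=00/base=101 (disp32, no base). In EVEX a mod=01 disp8 is scaled by N.
        f["disp_size"] = {0: 0, 1: 1, 2: 4}[mod]
    if f["map"] == "0F" and opcode == 0x6F:
        f["mnemonic"] = MNEMONICS_0F_6F.get((f["pp"], f["W"]))
    return f


reported = bytes([0x62, 0xD1, 0xFE, 0x08, 0x6F, 0x84, 0x24, 0x08, 0x00, 0x00])
print(decode_evex(reported))
```

Under this reading the bytes would be vmovdqu64 xmm0, [r12 + disp32], where the disp32 begins 08 00 00 and its fourth byte is not shown in the report. If that holds, the SSE 4.2 flag would not be the trigger; AVX-512 code generation would have to be enabled somewhere in the build.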
Comment 4 Tom Hughes 2019-08-26 06:11:45 UTC

*** This bug has been marked as a duplicate of bug 393351 ***
Comment 5 Julian Seward 2019-12-29 09:35:01 UTC
This bug has been reported 5 times in the past year, as bug numbers 393351,
409999, 414944, 411303 and 414053.  I would like to fix it.  I tried the
steps-to-reproduce shown in bugs 393351 and 414053, but without success: I
can't reproduce it either with the trunk or with 3.15.0.

Without being able to reproduce it, I can't fix it.  The first unhandled byte,
0x62, isn't the start of any known instruction (in 64-bit mode), so I suspect
there has been some failure earlier on.  Maybe Valgrind's instruction decoder
lost track of where it was on the previous instruction.  That's just a guess,
though.

What would be really helpful is if someone could reproduce the failure, and
then use objdump -d to show the instructions around the failure point.  I can
give guidance on how to use objdump if that helps.  If you want to try this, I
suggest you first reproduce the failure while giving --demangle=no
--sym-offsets=yes to Valgrind.  That will make it much easier to relate the
stack trace that Valgrind produces at the failure point, to the output of
objdump -d.
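The objdump step can be partly automated. With --sym-offsets=yes each frame names a symbol plus an offset, so the faulting instruction can be located in the library's own address space (symbol address from nm, plus the offset) without knowing where the library was loaded at run time, and --demangle=no keeps the symbol as a single mangled token. The Python sketch below assumes frames of the form `at 0xADDR: SYMBOL+OFFSET (in OBJECT)`; that format and the helper names `parse_frame`, `symbol_address` and `objdump_command` are our assumptions, so adjust the pattern to what Valgrind actually prints. Disassembly starts at the function entry so that objdump begins on a real instruction boundary rather than in the middle of one.

```python
import re

# Assumed frame shape with --sym-offsets=yes and --demangle=no:
#   "at 0xADDR: SYMBOL+OFFSET (in OBJECT)"; OFFSET may be hex ("0x4e") or decimal.
FRAME_RE = re.compile(
    r"0x[0-9A-Fa-f]+:\s+(?P<sym>[^\s+]+)\+(?P<off>0x[0-9A-Fa-f]+|\d+)"
    r"\s+\(in (?P<obj>[^)]+)\)")


def parse_frame(line):
    """Return (symbol, offset, object path) or None if the line does not match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return m.group("sym"), int(m.group("off"), 0), m.group("obj")


def symbol_address(nm_output, symbol):
    """Find SYMBOL's address in `nm` output rows of the form 'ADDR TYPE NAME'."""
    for row in nm_output.splitlines():
        parts = row.split()
        if len(parts) == 3 and parts[2] == symbol:
            return int(parts[0], 16)
    return None


def objdump_command(obj, sym_addr, offset, after=32):
    # Start at the function entry (a known instruction boundary) and stop
    # a few bytes past the faulting instruction.
    fault = sym_addr + offset
    return ["objdump", "-d", f"--start-address={sym_addr:#x}",
            f"--stop-address={fault + after:#x}", obj]
```

Pass symbol_address the output of nm on the library (nm -D if it has been stripped), then run the command that objdump_command returns.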