387940 – amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x44 (__x86_rdrand)

Status:	RESOLVED DUPLICATE of bug 353370

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex (other bugs)
Version First Reported In:	3.14 SVN
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

Reported:	2017-12-15 17:52 UTC by Edward Yang
Modified:	2018-04-18 11:32 UTC
CC List:	2 users

Description Edward Yang 2017-12-15 17:52:49 UTC

I reproduced with Valgrind HEAD:

commit 3a5c5cecbd44b2daea146eeb5109d2b96353ef6d
Author: Ivo Raisr <ivosh@ivosh.net>
Date:   Wed Dec 13 16:59:03 2017 +0100

    Remove compiler warning about possibly uninitialized variable.
    
    This happened only with quite an old gcc version.
    Anyway, this commit simplifies the situation a bit.

/proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
stepping        : 1
microcode       : 0xb000024
cpu MHz         : 2700.523
cache size      : 46080 KB
physical id     : 0
siblings        : 32
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
bugs            :
bogomips        : 4600.09
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Full crash log:

==27432== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==27432==    This could cause spurious value errors to appear.
==27432==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==27432== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==27432==    This could cause spurious value errors to appear.
==27432==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==27432== Warning: noted but unhandled ioctl 0x7ff with no size/direction hints.
==27432==    This could cause spurious value errors to appear.
==27432==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==27432== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==27432==    This could cause spurious value errors to appear.
==27432==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==27432== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==27432==    This could cause spurious value errors to appear.
==27432==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==27432== Warning: set address range perms: large range [0x1000000000, 0x4e00000000) (noaccess)
==27432== Warning: set address range perms: large range [0x200000000, 0x700000000) (noaccess)
vex amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x44 0x24 0xC 0xF
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==27432== valgrind: Unrecognised instruction at address 0x2e48f009.
==27432==    at 0x2E48F009: std::(anonymous namespace)::__x86_rdrand() (random.cc:69)
==27432==    by 0x2E48F0F2: std::random_device::_M_getval() (random.cc:130)
==27432==    by 0x23D7F5B9: THCRandom_init (in /opt/conda/lib/python2.7/site-packages/torch/lib/libATen.so.1)
==27432==    by 0x23D580AB: THCudaInit (in /opt/conda/lib/python2.7/site-packages/torch/lib/libATen.so.1)
==27432==    by 0x23594EAF: at::Context::doInitCUDA() (in /opt/conda/lib/python2.7/site-packages/torch/lib/libATen.so.1)
==27432==    by 0x5225A98: __pthread_once_slow (pthread_once.c:116)
==27432==    by 0x103777C4: __gthread_once (gthr-default.h:699)
==27432==    by 0x103777C4: call_once<at::Context::lazyInitCUDA()::<lambda()> > (mutex:738)
==27432==    by 0x103777C4: lazyInitCUDA (Context.h:44)
==27432==    by 0x103777C4: THCPModule_initCuda(_object*) (Module.cpp:334)
==27432==    by 0x10377DCC: THCPModule_initExtension(_object*) (Module.cpp:368)
==27432==    by 0x4F1C191: PyEval_EvalFrameEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432==    by 0x4F1CDAB: PyEval_EvalFrameEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432==    by 0x4F1CDAB: PyEval_EvalFrameEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432==    by 0x4F1E4E8: PyEval_EvalCodeEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432== Your program just tried to execute an instruction that Valgrind
==27432== did not recognise.  There are two possible reasons for this.
==27432== 1. Your program has a bug and erroneously jumped to a non-code
==27432==    location.  If you are running Memcheck and you just saw a
==27432==    warning about a bad jump, it's probably your program's fault.
==27432== 2. The instruction is legitimate but Valgrind doesn't handle it,
==27432==    i.e. it's Valgrind's fault.  If you think this is the case or
==27432==    you are not sure, please let us know and we'll try to fix it.
==27432== Either way, Valgrind will now raise a SIGILL signal which will
==27432== probably kill your program.
==27432== 
==27432== Process terminating with default action of signal 4 (SIGILL): dumping core
==27432==  Illegal opcode at address 0x2E48F009
==27432==    at 0x2E48F009: std::(anonymous namespace)::__x86_rdrand() (random.cc:69)
==27432==    by 0x2E48F0F2: std::random_device::_M_getval() (random.cc:130)
==27432==    by 0x23D7F5B9: THCRandom_init (in /opt/conda/lib/python2.7/site-packages/torch/lib/libATen.so.1)
==27432==    by 0x23D580AB: THCudaInit (in /opt/conda/lib/python2.7/site-packages/torch/lib/libATen.so.1)
==27432==    by 0x23594EAF: at::Context::doInitCUDA() (in /opt/conda/lib/python2.7/site-packages/torch/lib/libATen.so.1)
==27432==    by 0x5225A98: __pthread_once_slow (pthread_once.c:116)
==27432==    by 0x103777C4: __gthread_once (gthr-default.h:699)
==27432==    by 0x103777C4: call_once<at::Context::lazyInitCUDA()::<lambda()> > (mutex:738)
==27432==    by 0x103777C4: lazyInitCUDA (Context.h:44)
==27432==    by 0x103777C4: THCPModule_initCuda(_object*) (Module.cpp:334)
==27432==    by 0x10377DCC: THCPModule_initExtension(_object*) (Module.cpp:368)
==27432==    by 0x4F1C191: PyEval_EvalFrameEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432==    by 0x4F1CDAB: PyEval_EvalFrameEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432==    by 0x4F1CDAB: PyEval_EvalFrameEx (in /opt/conda/lib/libpython2.7.so.1.0)
==27432==    by 0x4F1E4E8: PyEval_EvalCodeEx (in /opt/conda/lib/libpython2.7.so.1.0)

Let me know if you need more information.

Comment 1 Thomas A. F. Thorne 2018-04-18 10:56:11 UTC

I believe I have reproduced at least part of the issue covered by this bug report.  That could move the status to confirmed, since it now affects more than one user on more than one machine.

The background is recorded at https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1765001

In short, I was running valgrind --leak-check=yes against a binary that I generated with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 and the resulting output from Valgrind included:
vex amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x6 0xF 0x42 0xC1
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==9424== valgrind: Unrecognised instruction at address 0x4ef1b15.
==9424==    at 0x4EF1B15: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==9424==    by 0x4EF1CB1: std::random_device::_M_getval() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==9424==    by 0x43CFBC: std::random_device::operator()() (random.h:1612)


The Ubuntu bug was generated after I got a crash report prompt.  It seems that running:
valgrind --leak-check=yes -v
without the valgrind-dbg package installed caused something to happen that Ubuntu did not like.  Once I installed the debug package to see if it would help the Ubuntu debug tracing, the crash stopped happening.

Comment 2 Thomas A. F. Thorne 2018-04-18 11:02:24 UTC

Launchpad's magic has decided that my bug report might be https://bugs.launchpad.net/bugs/1301850 but I cannot view that bug to check whether it is.  Someone here in the KDE group may be able to view it, so I include it for reference.

Please let me know if there is any further action that I should take.

Comment 3 Tom Hughes 2018-04-18 11:30:48 UTC

This is a duplicate - the short version is that we don't support the RDRAND instruction and (since 3.12.0) we have deliberately excluded it from the CPU capabilities we announce.

If you are seeing this in current svn then your program is apparently ignoring the CPU capabilities we announce and trying to use the instruction regardless.

*** This bug has been marked as a duplicate of bug 353370 ***
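
For reference, the leading bytes in both reports (0xF 0xC7 0xF0) can be decoded by hand to confirm that the offending instruction is RDRAND. The sketch below is illustrative only and is not Valgrind code: the names decode_0f_c7 and REG32 are made up for this example, and it assumes no REX or 0x66 prefix, as the "REX=0" and "PFX.66=0" lines in the logs indicate. It handles only the register forms of RDRAND (0F C7 /6) and RDSEED (0F C7 /7).

```python
# Minimal sketch: classify a 0F C7 /r instruction by its ModRM byte.
# Hypothetical helper for illustration; not part of Valgrind or VEX.
# Prefixes (REX, 0x66) are assumed absent, so operands are 32-bit registers.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_0f_c7(code: bytes) -> str:
    """Return a mnemonic for a 0F C7 instruction, or 'unknown'."""
    if len(code) < 3 or code[0] != 0x0F or code[1] != 0xC7:
        return "unknown"
    modrm = code[2]
    mod = modrm >> 6          # top two bits: 3 means a register operand
    reg = (modrm >> 3) & 0x7  # middle three bits: opcode extension (/digit)
    rm = modrm & 0x7          # low three bits: which register
    if mod == 3 and reg == 6:
        return f"rdrand {REG32[rm]}"
    if mod == 3 and reg == 7:
        return f"rdseed {REG32[rm]}"
    return "unknown"


if __name__ == "__main__":
    # First five bytes from the original report.
    print(decode_0f_c7(bytes([0x0F, 0xC7, 0xF0, 0x89, 0x44])))
```

The ModRM byte 0xF0 splits into mod=3, reg=6, rm=0, which is the register form of RDRAND with eax as the destination. That matches the __x86_rdrand frame at the top of the first backtrace.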

Comment 4 Tom Hughes 2018-04-18 11:32:50 UTC

As this is your own code, the most likely cause is that you have compiled with -march=native or similar. That is never a good idea when using valgrind, as we are not always up to date with the latest instructions and there may be some (like RDRAND) which are hard or impossible for us to support.