With the latest git version of fio, the latest git version of Valgrind and gcc version 7.3.1 I encountered the following:

$ ~bart/software/valgrind/vg-in-place ~bart/software/fio/fio --name=sata --filename=/dev/sdc --ioengine=libaio --ioscheduler=none --rw=randread --offset=200G --size=200G --direct=1 --thread=1 --iodepth=64 --norandommap=1
==25027== Memcheck, a memory error detector
==25027== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==25027== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
[ ... ]
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x48 0x6F 0xD 0xE1 0xEC 0x8 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==25027== valgrind: Unrecognised instruction at address 0x40f695.
==25027==    at 0x40F695: fio_client_hash_init (client.c:94)
==25027==    by 0x48D48C: __libc_csu_init (elf-init.c:88)
==25027==    by 0x5E5DA17: (below main) (in /lib64/libc-2.27.so)
==25027== Your program just tried to execute an instruction that Valgrind
==25027== did not recognise. There are two possible reasons for this.
==25027== 1. Your program has a bug and erroneously jumped to a non-code
==25027==    location. If you are running Memcheck and you just saw a
==25027==    warning about a bad jump, it's probably your program's fault.
==25027== 2. The instruction is legitimate but Valgrind doesn't handle it,
==25027==    i.e. it's Valgrind's fault. If you think this is the case or
==25027==    you are not sure, please let us know and we'll try to fix it.
==25027== Either way, Valgrind will now raise a SIGILL signal which will
==25027== probably kill your program.
==25027==
==25027== Process terminating with default action of signal 4 (SIGILL): dumping core
==25027==  Illegal opcode at address 0x40F695
==25027==    at 0x40F695: fio_client_hash_init (client.c:94)
==25027==    by 0x48D48C: __libc_csu_init (elf-init.c:88)
==25027==    by 0x5E5DA17: (below main) (in /lib64/libc-2.27.so)
This is reproducible with the latest Valgrind version (git commit ID 0375954c18d918045d4d7bfb30061f445511c3d3).
I'm sure you're correct. But I'm still a bit surprised to see this. Can you do some objdump -d -ery to find out what the instruction is?
*** Bug 394582 has been marked as a duplicate of this bug. ***
*** Bug 411303 has been marked as a duplicate of this bug. ***
*** Bug 409999 has been marked as a duplicate of this bug. ***
As I said on 393351, according to the Intel manual 0x62 is BOUND, but that is not valid in 64-bit mode. As far as I can see in the latest manual it hasn't been reused for anything else, so this is very odd indeed... A disassembly of something that is triggering this would probably be helpful.
Sorry I mean on https://bugs.kde.org/show_bug.cgi?id=409999...
*** Bug 414053 has been marked as a duplicate of this bug. ***
*** Bug 414944 has been marked as a duplicate of this bug. ***
only first two bytes match
`unhandled instruction bytes: 0x62 0xF1 0x7D 0x48 0xEF 0xC0 0xC5 0xFB 0x11 0x40`

reproduce on Debian 9 with rocksdb db_bench

git clone https://github.com/facebook/rocksdb.git
cd rocksdb
mkdir build
cd build
cmake ..
make -j
valgrind ./db_bench

root@n18-065-078:~/open_source/rocksdb/build# valgrind ./db_bench
==1607038== Memcheck, a memory error detector
==1607038== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1607038== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==1607038== Command: ./db_bench
==1607038==
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7D 0x48 0xEF 0xC0 0xC5 0xFB 0x11 0x40
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==1607038== valgrind: Unrecognised instruction at address 0x5a33409.
==1607038==    at 0x5A33409: rocksdb::AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions() (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x5A33C43: rocksdb::ColumnFamilyOptions::ColumnFamilyOptions() (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x5A45CA7: __static_initialization_and_destruction_0(int, int) (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x5A484D9: _GLOBAL__sub_I_options_helper.cc (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x400F799: call_init.part.0 (dl-init.c:72)
==1607038==    by 0x400F8AA: call_init (dl-init.c:30)
==1607038==    by 0x400F8AA: _dl_init (dl-init.c:120)
==1607038==    by 0x4000C59: ??? (in /lib/x86_64-linux-gnu/ld-2.24.so)
==1607038== Your program just tried to execute an instruction that Valgrind
==1607038== did not recognise. There are two possible reasons for this.
==1607038== 1. Your program has a bug and erroneously jumped to a non-code
==1607038==    location. If you are running Memcheck and you just saw a
==1607038==    warning about a bad jump, it's probably your program's fault.
==1607038== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1607038==    i.e. it's Valgrind's fault. If you think this is the case or
==1607038==    you are not sure, please let us know and we'll try to fix it.
==1607038== Either way, Valgrind will now raise a SIGILL signal which will
==1607038== probably kill your program.
==1607038==
==1607038== Process terminating with default action of signal 4 (SIGILL): dumping core
==1607038==  Illegal opcode at address 0x5A33409
==1607038==    at 0x5A33409: rocksdb::AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions() (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x5A33C43: rocksdb::ColumnFamilyOptions::ColumnFamilyOptions() (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x5A45CA7: __static_initialization_and_destruction_0(int, int) (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x5A484D9: _GLOBAL__sub_I_options_helper.cc (in /root/open_source/rocksdb/build/librocksdb.so.6.6.0)
==1607038==    by 0x400F799: call_init.part.0 (dl-init.c:72)
==1607038==    by 0x400F8AA: call_init (dl-init.c:30)
==1607038==    by 0x400F8AA: _dl_init (dl-init.c:120)
==1607038==    by 0x4000C59: ??? (in /lib/x86_64-linux-gnu/ld-2.24.so)
==1607038==
==1607038== HEAP SUMMARY:
==1607038==     in use at exit: 92,064 bytes in 1,424 blocks
==1607038==   total heap usage: 1,967 allocs, 543 frees, 180,484 bytes allocated
==1607038==
==1607038== LEAK SUMMARY:
==1607038==    definitely lost: 0 bytes in 0 blocks
==1607038==    indirectly lost: 0 bytes in 0 blocks
==1607038==      possibly lost: 0 bytes in 0 blocks
==1607038==    still reachable: 92,064 bytes in 1,424 blocks
==1607038==         suppressed: 0 bytes in 0 blocks
==1607038== Rerun with --leak-check=full to see details of leaked memory
==1607038==
==1607038== For counts of detected and suppressed errors, rerun with: -v
==1607038== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)
Linux n18-065-078 4.19.28.bsk.4-amd64 #4.19.28.bsk.4 SMP Debian 4.19.28.bsk.4 Wed Apr 24 12:15:15 UTC x86_64 GNU/Linux
Can you please try and disassemble the problem function - this should do it I think:

objdump --disassemble=_ZN7rocksdb27AdvancedColumnFamilyOptionsC2Ev /root/open_source/rocksdb/build/librocksdb.so.6.6.0

Then post the output here?
This bug has been reported 3 times now (also as 414944 and 409999). All very strange.
This bug has been reported 5 times in the past year, as bug numbers 393351, 409999, 414944, 411303 and 414053. I would like to fix it. I tried the steps-to-reproduce shown in bugs 393351 and 414053, but without success: I can't reproduce it either with the trunk or with 3.15.0.

Without being able to reproduce it, I can't fix it. The first unhandled byte, 0x62, isn't the start of any known instruction (in 64-bit mode), so I suspect there has been some failure earlier on. Maybe Valgrind's instruction decoder lost track of where it was on the previous instruction. That's just a guess, though.

What would be really helpful is if someone could reproduce the failure, and then use objdump -d to show the instructions around the failure point. I can give guidance on how to use objdump if that helps. If you want to try this, I suggest you first reproduce the failure while giving --demangle=no --sym-offsets=yes to Valgrind. That will make it much easier to relate the stack trace that Valgrind produces at the failure point, to the output of objdump -d.
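For anyone trying the suggestion above, here is a rough Python sketch of how a stack frame printed with --demangle=no --sym-offsets=yes might be split into its parts before looking things up in the objdump -d output. The names `FRAME_RE` and `parse_frame` are made up for this illustration, and the sketch assumes the `+N` offset is printed in decimal and that the symbol contains no spaces (which should hold for mangled names).

```python
import re

# Matches one stack frame as printed with --sym-offsets=yes, for example:
#   ==6297==    at 0x4A54820: H5P_dup_prop+64 (in /usr/lib64/libhdf5.so.103.1.0)
# Assumption: the "+N" offset is decimal and the symbol has no spaces.
FRAME_RE = re.compile(
    r"(?:at|by)\s+0x(?P<addr>[0-9A-Fa-f]+):\s+"
    r"(?P<symbol>\S+?)\+(?P<offset>\d+)\s+"
    r"\((?:in )?(?P<location>[^)]+)\)"
)


def parse_frame(line):
    """Return the pieces of one frame, or None if the line is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    address = int(m.group("addr"), 16)
    offset = int(m.group("offset"))
    return {
        "address": address,           # run-time address of the instruction
        "symbol": m.group("symbol"),  # function name to search for in objdump
        "offset": offset,             # distance from the function's entry point
        "entry": address - offset,    # run-time address of the entry point
        "location": m.group("location"),  # shared object or file:line
    }
```

The `location` field is the file to hand to objdump -d, and `symbol` plus `offset` say where in that function to look.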
Created attachment 124846 [details] objdump.txt

Hi Julian

I can still reproduce the failure. When I run valgrind with '--demangle=no --sym-offsets=yes' I get a stack dump as follows:
-----
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFE 0x8 0x6F 0x45 0x0 0xC5 0xF8 0x11
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==6297== valgrind: Unrecognised instruction at address 0x4a54820.
==6297==    at 0x4A54820: H5P_dup_prop+64 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x4A56465: H5P__do_prop_cb1.part.13+85 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x4A55C01: H5P_create_id+385 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x4A56155: H5P__init_package+373 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x4A562C7: H5P_init+39 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x48CA0ED: H5_init_library+285 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x48CA90F: H5open+63 (in /usr/lib64/libhdf5.so.103.1.0)
==6297==    by 0x1091C8: main+67 (h5simple.cpp:6)
-----

I have no experience at all with objdump. I tried 'objdump -d ./h5s_8.3.0' (h5s_8.3.0 is my test application compiled with g++ 8.3.0). I see some references to hdf5 calls (e.g. H5open, but no H5P_dup_prop), but I can't really interpret the output.

I attached both the output of Valgrind and of objdump to this mail.

Let me know how I can be of further help.

Regards
Jody
Created attachment 124847 [details] vg_log.txt
It's /usr/lib64/libhdf5.so.103.1.0 you need to run objdump on, not your application, as that library is where the problem instruction is found.
As Tom says ..

> ==6297== valgrind: Unrecognised instruction at address 0x4a54820.
> ==6297==    at 0x4A54820: H5P_dup_prop+64 (in /usr/lib64/libhdf5.so.103.1.0)

> I have no experience at all with objdump. I tried 'objdump -d ./h5s_8.3.0'

Nearly, not quite right. The problem is in /usr/lib64/libhdf5.so.103.1.0.

Can you try objdump -d /usr/lib64/libhdf5.so.103.1.0 instead?

Then look for the code near the entry point (to be precise, at offset 64 from) for H5P_dup_prop. You can find that entry point by searching the objdump output for the string

H5P_dup_prop>:
In fact you should be able to just disassemble that function with: objdump --disassemble=H5P_dup_prop /usr/lib64/libhdf5.so.103.1.0
Created attachment 124849 [details] attachment-781-0.html

I did

objdump -d /usr/lib64/libhdf5.so.103.1.0 > objdump_libhdf5.so.103.1.0.txt

but the string "H5P_dup_prop" is not in the output. The command 'grep "H5P_d" objdump_libhdf5.so.103.1.0.txt' returns nothing. The next function in the stack (H5P__do_prop_cb1) also does not occur in the output of objdump, but all the others do.

I sent a dropbox link for objdump_libhdf5.so.103.1.0.txt to bug-control@kde.org
Created attachment 124852 [details] attachment-1766-0.html

Oops - this did not work. Which mail address should I use to share the link with?
I was afraid that might happen as it's a local function that isn't exported... Are you able to just gzip the output and attach it here?
(In reply to Tom Hughes from comment #22)
> I was afraid that might happen as it's a local function that isn't
> exported...

Yeah, that happened to me -- I tried to find the function in my F31 installation of /usr/lib64/libhdf5.so.103.1.0 and got exactly nowhere. Even after installing the debuginfo .rpm. (But then it may be that I gave objdump, nm or readelf the wrong flags.)
Ok - here it is
(In reply to jody from comment #24)
> Ok - here it is

Where is it? I don't see it.
Here's a link to the dropbox https://tinyurl.com/tmtxhzr
So there really are instructions starting with 0x62 it seems... Some examples:

   5ca6f: 62 f1 fe 08 6f 45 00    vmovdqu64 0x0(%rbp),%xmm0
   90bf6: 62 e1 7c 08 11 56 0a    vmovups %xmm18,0xa0(%rsi)
   92ab0: 62 f1 fd 08 6f 0d 46    vmovdqa64 0x2e8e46(%rip),%xmm1

I think this is the H5P_dup_prop one from the most recent report:

  1d2820: 62 f1 fe 08 6f 45 00    vmovdqu64 0x0(%rbp),%xmm0
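When the failing function is a local symbol that objdump does not label, one way to locate the instruction is to match the bytes Valgrind printed against the disassembly. The Python sketch below does that for lines in the usual objdump -d layout (address, colon, hex bytes, mnemonic). The helper names are invented for this example, and the page-offset filter assumes the library is mapped on a 4 KiB page boundary, so the low 12 bits of the run-time address and the objdump address agree.

```python
import re

# One instruction line from `objdump -d`: address, raw bytes, mnemonic.
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s*(.*)$")


def parse_valgrind_bytes(text):
    """Turn '0x62 0xF1 0xFE 0x8 ...' (as printed by Valgrind) into bytes."""
    return bytes(int(tok, 16) for tok in text.split())


def find_candidates(objdump_lines, reported, runtime_addr=None):
    """List (address, mnemonic) for 0x62-prefixed instructions whose bytes
    are a prefix of the bytes Valgrind reported.

    If runtime_addr is given, keep only instructions at the same offset
    within a 4 KiB page (assumption: page-aligned load address).
    """
    hits = []
    for line in objdump_lines:
        m = INSN_RE.match(line)
        if m is None:
            continue  # headers, symbol labels, blank lines
        addr = int(m.group(1), 16)
        raw = bytes(int(b, 16) for b in m.group(2).split())
        if raw[0] != 0x62 or not reported.startswith(raw):
            continue
        if runtime_addr is not None and (addr ^ runtime_addr) & 0xFFF:
            continue
        hits.append((addr, m.group(3).strip()))
    return hits
```

Fed the lines quoted above together with the bytes and the address 0x4a54820 from the H5P_dup_prop report, the page-offset filter should leave only the instruction at 1d2820.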
So 0x62 is an EVEX prefix, but the opcode map fails to mention that even in the October 2019 edition of the Intel manual...
So the four byte EVEX prefix of 62 f1 fe 08 decodes as:

EVEX.mm = 0b01 / 1
EVEX.pp = 0b10 / 2
EVEX.RXB = 0b111 / 7
EVEX.R’ = 0b1 / 1
EVEX.X = 0b1 / 1
EVEX.vvvv = 0b1111 / 15
EVEX.V’ = 0b1 / 1
EVEX.aaa = 0b000 / 0
EVEX.W = 0b1 / 1
EVEX.z = 0b0 / 0
EVEX.b = 0b0 / 0
EVEX.L’L = 0b00 / 0

or EVEX.128.F3.0F.W1 for short, which makes 62 f1 fe 08 6f become:

EVEX.128.F3.0F.W1 6F /r VMOVDQU64 xmm1 {k1}{z}, xmm2/m128
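For anyone who wants to repeat that decoding on the bytes from their own report, here is a small Python sketch of the bit slicing. It follows my reading of the EVEX layout in the Intel manual (0x62 followed by three payload bytes) and is only an illustration: `decode_evex_prefix` and `evex_summary` are names made up here, the fields are returned exactly as stored (several are kept inverted in the encoding), and L'L is treated purely as a vector length, which ignores its other use for rounding control.

```python
def decode_evex_prefix(prefix):
    """Split a 4-byte EVEX prefix (0x62 P0 P1 P2) into its raw fields."""
    if len(prefix) != 4 or prefix[0] != 0x62:
        raise ValueError("expected 4 bytes starting with 0x62")
    p0, p1, p2 = prefix[1], prefix[2], prefix[3]
    return {
        "RXB": (p0 >> 5) & 0b111,     # P0 bits 7..5
        "R'": (p0 >> 4) & 1,          # P0 bit 4
        "mm": p0 & 0b11,              # P0 bits 1..0: opcode map
        "W": (p1 >> 7) & 1,           # P1 bit 7
        "vvvv": (p1 >> 3) & 0b1111,   # P1 bits 6..3
        "pp": p1 & 0b11,              # P1 bits 1..0: implied legacy prefix
        "z": (p2 >> 7) & 1,           # P2 bit 7
        "L'L": (p2 >> 5) & 0b11,      # P2 bits 6..5
        "b": (p2 >> 4) & 1,           # P2 bit 4
        "V'": (p2 >> 3) & 1,          # P2 bit 3
        "aaa": p2 & 0b111,            # P2 bits 2..0: opmask register
    }


def evex_summary(prefix):
    """Render the prefix in the manual's EVEX.<len>.<pp>.<map>.W<n> style."""
    f = decode_evex_prefix(prefix)
    length = {0: "128", 1: "256", 2: "512"}.get(f["L'L"], "reserved")
    pp = {0: None, 1: "66", 2: "F3", 3: "F2"}[f["pp"]]
    opmap = {1: "0F", 2: "0F38", 3: "0F3A"}.get(f["mm"], "map%d" % f["mm"])
    parts = ["EVEX", length] + ([pp] if pp else []) + [opmap, "W%d" % f["W"]]
    return ".".join(parts)
```

Calling `evex_summary(bytes([0x62, 0xF1, 0xFE, 0x08]))` should give the EVEX.128.F3.0F.W1 form quoted above.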
The short version however is that valgrind doesn't support AVX512 yet so don't compile your code for AVX512 if you want to be able to use valgrind on it, and that includes any libraries you use. Note that valgrind won't advertise AVX512 support in CPUID even if your CPU supports it, but this code has been compiled to use it unconditionally. Bug https://bugs.kde.org/show_bug.cgi?id=383010 is tracking AVX512 support.
Are there any recommendations on compiler switches for gcc? Do some newer versions of it use AVX512 by default?
No version of gcc is going to enable AVX512 by default, though I think you may be able to specify a different default target when compiling gcc. Usually this sort of problem is caused by using -march=native, which tells gcc to compile for the current machine - typically this happens with distributions like gentoo where everything is compiled from source.
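A quick way to see whether a given build machine is one where -march=native could pull in AVX512 is to look at the CPU flags the Linux kernel reports. The Python sketch below is an assumption-laden illustration: it relies on the Linux /proc/cpuinfo format with a "flags" line, the function names are invented here, and it says nothing about what flags a particular package build actually used.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted avx512* feature flags found in /proc/cpuinfo text.

    Only the first "flags" line is inspected; on typical systems all
    cores report the same set.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            return sorted(f for f in flags if f.startswith("avx512"))
    return []


def host_avx512_flags(path="/proc/cpuinfo"):
    """Read the flags for the current machine (Linux only)."""
    with open(path) as fh:
        return avx512_flags(fh.read())
```

An empty list from `host_avx512_flags()` suggests the build host has no AVX512 for -march=native to target; a non-empty one means code built that way on this host may not run under valgrind.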
I encountered the same kind of error

vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFE 0x8 0x6F 0x46 0x1 0xC5 0xF8 0x11
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==15878== valgrind: Unrecognised instruction at address 0x4a0ed10.

in a C++ application using std::map and std::string. Should I give more details?
No, we understand the problem now. You are trying to run valgrind on code compiled to target the AVX512 instruction set extensions and that is not currently supported by valgrind.
This is an instruction with an EVEX prefix which is part of the AVX512 instruction support that is currently being worked on. *** This bug has been marked as a duplicate of bug 383010 ***