Valgrind version is 3.8.1 but that option doesn't exist in the dropdown. I'm running an application compiled with the Intel compiler suite (and the shared libraries are also compiled with the same suite) and the application runs (but has memory errors) without Valgrind, but crashes with the Intel runtime library saying it generated an Illegal Instruction. The error report is: forrtl: severe (168): Program Exception - illegal instruction Image PC Routine Line Source libirc.so 000000000405ED3C Unknown Unknown Unknown libhdf5.so.7 00000000061338E2 Unknown Unknown Unknown libhdf5.so.7 00000000061A73CE Unknown Unknown Unknown libhdf5.so.7 00000000061A9D6F Unknown Unknown Unknown libhdf5_fortran.s 0000000006B23233 Unknown Unknown Unknown libhdf5_fortran.s 0000000006B19FF9 Unknown Unknown Unknown les3d.x 00000000006D815C Unknown Unknown Unknown les3d.x 00000000006A983D Unknown Unknown Unknown les3d.x 00000000006DB0F4 Unknown Unknown Unknown les3d.x 00000000006F353D Unknown Unknown Unknown les3d.x 0000000000404CCC Unknown Unknown Unknown libc.so.6 0000000008F5F23D Unknown Unknown Unknown les3d.x 0000000000404BC9 Unknown Unknown Unknown The compile options for the library are only -O2 and the main executable is -O2 -xhost -g -traceback (which is also odd that it doesn't generate any traceback information for at least the les3d.x stack). Reproducible: Always Steps to Reproduce: 1. Any of my applications linked to the HDF5 libraries do this My CPU information is (it's a "12" core, only first core information is shown): processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Core(TM) i7 CPU 000 @ 3.33GHz stepping : 2 cpu MHz : 1600.000 cache size : 12288 KB physical id : 0 siblings : 12 core id : 0 cpu cores : 6 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid bogomips : 6674.04 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
Can you try running with --trace-signals=yes and post the output so we can see whether this is a real SIGILL from the kernel, or one valgrind has synthesised?
Created attachment 75523 [details] Full output with trace signals on
With the command valgrind --trace-signals=yes ./les3d.x 2> valgrindOutput.txt and then grep for sigaction: --24629-- sys_sigaction: sigNo 32, new 0x7fefff398, old 0x0, new flags 0x4000004 --24629-- sys_sigaction: sigNo 33, new 0x7fefff398, old 0x0, new flags 0x14000004 --24629-- sys_sigaction: sigNo 8, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 4, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 11, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 6, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 15, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 3, new 0x7fefff1d8, old 0x7fefff278, new flags 0x54000004 --24629-- sys_sigaction: sigNo 2, new 0x7fefff1d8, old 0x7fefff278, new flags 0x54000004 --24629-- sys_sigaction: sigNo 13, new 0x7feffec38, old 0x7feffecd8, new flags 0x14000000 --24629-- sys_sigaction: sigNo 10, new 0x7feffed58, old 0x7feffedf8, new flags 0x14000000 --24629-- sys_sigaction: sigNo 10, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 14, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 15, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 12, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 11, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 4, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 7, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 8, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 11, new 0x7feffbec0, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 4, new 0x7feffbec0, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 7, new 0x7feffbec0, old 0x0, new flags 0x4000000 --24629-- sys_sigaction: sigNo 8, new 0x7feffbec0, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 11, new 0x402bdf1e0, old 0x0, new flags 0x0 --24629-- sys_sigaction: sigNo 7, new 0x402bdf1e0, old 0x0, new flags 0x0 --24629-- sys_sigaction: sigNo 4, new 0x402bdf1e0, old 0x0, new flags 0x0 --24629-- sys_sigaction: sigNo 8, new 0x402bdf1e0, old 0x0, new flags 0x0 I attached the full output in case that's needed too, but it's full of the conditional jump uninitialized value and invalid read errors that come from using the Intel compiler suite.
Well it doesn't look like there was a SIGILL from the kernel so I think it must be the instruction decoder in the JIT which is triggering it, which is unusual as the JIT normally generates it's own error if it can't handle the instruction. The problem instruction appears to be at 0x405ED3C in __intel_sse4_strrchr in libirc.so - any chance you could disassemble that function and show us a few instructions around that point, given VEX hasn't reported the problem instruction in the normal way...
I've never disassembled anything before, so I'm sorry if this isn't what you needed (let me know and I'll adjust it): > objdump -t /opt/intel/2011.sp1.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64/libirc.so | grep intel_sse4_strrchr 0000000000018d30 g F .text 0000000000000040 __intel_sse4_strrchr > objdump -d -f --start-address=0x0000000000018d30 --stop-address=0x0000000000018d70 /opt/intel/2011.sp1.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64/libirc.so /opt/intel/2011.sp1.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64/libirc.so: file format elf64-x86-64 architecture: i386:x86-64, flags 0x00000150: HAS_SYMS, DYNAMIC, D_PAGED start address 0x0000000000007040 Disassembly of section .text: 0000000000018d30 <__intel_sse4_strrchr>: 18d30: 56 push %rsi 18d31: 66 0f 38 17 c9 ptest %xmm1,%xmm1 18d36: 74 21 je 18d59 <__intel_sse4_strrchr+0x29> 18d38: 48 83 c2 10 add $0x10,%rdx 18d3c: 66 0f 3a 63 0a 42 pcmpistri $0x42,(%rdx),%xmm1 18d42: 74 09 je 18d4d <__intel_sse4_strrchr+0x1d> 18d44: 73 f2 jae 18d38 <__intel_sse4_strrchr+0x8> 18d46: 89 c8 mov %ecx,%eax 18d48: 48 03 c2 add %rdx,%rax 18d4b: eb eb jmp 18d38 <__intel_sse4_strrchr+0x8> 18d4d: 73 05 jae 18d54 <__intel_sse4_strrchr+0x24> 18d4f: 89 c8 mov %ecx,%eax 18d51: 48 03 c2 add %rdx,%rax 18d54: 48 83 c4 08 add $0x8,%rsp 18d58: c3 retq 18d59: 48 89 d0 mov %rdx,%rax 18d5c: e8 67 db fe ff callq 68c8 <__intel_sse4_strend@plt> 18d61: 48 83 c4 08 add $0x8,%rsp 18d65: c3 retq 18d66: 48 89 f6 mov %rsi,%rsi 18d69: 48 8d bf 00 00 00 00 lea 0x0(%rdi),%rdi
No, that's fine - the problem is the pcmpistri instruction which appears to be a variant we don't handle yet. Julian can probably say more.
It's unlikely, but is there a workaround and/or a development/unstable version that might have this fixed in it already or soon? I'm trying to track down a memory problem and the 6 other tools I've tried to use as a replacement due to this are completely useless but I'm hoping valgrind will find it. It's never let me down before.
Patrick LoPresti has another patch on some bug somewhere, which makes Memcheck work more sanely with optimised icc code. BTW, you should use the trunk and make sure you give the flag --partial-loads-ok=yes. Re the 0x42 pcmpistri case, that might just be a case of allowing that particular immediate value to be handled by the existing code. But to be sure we'd have to test it with none/tests/amd64/pcmpstr64.c/
I've just been discussing it with Julian on IRC and we think that just adding 0x42 to the list of cases in the switch statement at line 17069 of VEX/priv/guest_amd64_toIR.c may be all that is needed if you want to give that a try.
I tried what I think you suggested and ran with /data3/valgrindInstall/bin/valgrind --partial-loads-ok=yes ./les3d.x: > svn diff VEX/priv/guest_amd64_toIR.c Index: VEX/priv/guest_amd64_toIR.c =================================================================== --- VEX/priv/guest_amd64_toIR.c (revision 2574) +++ VEX/priv/guest_amd64_toIR.c (working copy) @@ -17070,7 +17070,7 @@ case 0x00: case 0x02: case 0x08: case 0x0A: case 0x0C: case 0x12: case 0x1A: case 0x38: case 0x3A: case 0x44: case 0x4A: - case 0x46: + case 0x46: case 0x42: break; case 0x01: // the 16-bit character versions of the above case 0x03: case 0x09: case 0x0B: case 0x0D: case 0x13: And it resulted in: vex: priv/guest_amd64_helpers.c:3266 (amd64g_dirtyhelper_PCMPxSTRx): Assertion `ok' failed. vex storage: T total 829967584 bytes allocated vex storage: P total 640 bytes allocated valgrind: the 'impossible' happened: LibVEX called failure_exit(). ==20981== at 0x38056EB6: report_and_quit (m_libcassert.c:235) ==20981== by 0x38056F13: panic (m_libcassert.c:319) ==20981== by 0x380570C8: vgPlain_core_panic_at (m_libcassert.c:324) ==20981== by 0x380570DA: vgPlain_core_panic (m_libcassert.c:329) ==20981== by 0x3806EC22: failure_exit (m_translate.c:713) ==20981== by 0x380FD1A8: vex_assert_fail (main_util.c:219) ==20981== by 0x3813B6F4: amd64g_dirtyhelper_PCMPxSTRx (guest_amd64_helpers.c:3266) ==20981== by 0x403B1245D: ??? ==20981== by 0x402ABEF4F: ??? sched status: running_tid=1 Thread 1: status = VgTs_Runnable ==20981== at 0x7D6F9EB: __intel_sse4_strrchr (in /opt/intel/2013.initial/composer_xe_2013.0.079/compiler/lib/intel64/libirc.so) ==20981== by 0x612EC11: H5_build_extpath (in /opt/hdf5/intel/2013.initial/lib64/libhdf5.so.7.0.2) ==20981== by 0x61A25BB: H5F_open (in /opt/hdf5/intel/2013.initial/lib64/libhdf5.so.7.0.2) ==20981== by 0x61A4F51: H5Fopen (in /opt/hdf5/intel/2013.initial/lib64/libhdf5.so.7.0.2) ==20981== by 0x6B245C2: h5fopen_c_ (in /opt/hdf5/intel/2013.initial/lib64/libhdf5_fortran.so.7.0.2) ==20981== by 0x6B1BFA4: h5f_mp_h5fopen_f_ (in /opt/hdf5/intel/2013.initial/lib64/libhdf5_fortran.so.7.0.2) ==20981== by 0x6CD9C9: gridoperations_m_mp_readgridhdf5_ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x) ==20981== by 0x69F967: gridoperations_m_mp_readgrid_ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x) ==20981== by 0x6D0953: simulationsetup_m_mp_readinputsandallocatedata_ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x) ==20981== by 0x6E8194: MAIN__ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x) ==20981== by 0x404BDB: main (in /data1/randomPack/bin_randPackMSDM_van/les3d.x) I have joined the IRC channel in the event there are other quick things to test out. I've never looked at the source before, but I'm willing to put in any work I can to help resolve it.
I think you need a similar change to the switch statement in guest_generic_x87.c:compute_PCMPxSTRx() Also you definitely want to add this test case to none/tests/amd64/pcmpstr64.c. (Actually getting the validity bit tracking to work properly on this one will be... Fun. I will take a crack at it eventually.)
(In reply to comment #11) > I think you need a similar change to the switch statement in > guest_generic_x87.c:compute_PCMPxSTRx() > > Also you definitely want to add this test case to > none/tests/amd64/pcmpstr64.c. > > (Actually getting the validity bit tracking to work properly on this one > will be... Fun. I will take a crack at it eventually.) Yes, we discussed that in the IRC channel. Adding it there does in fact make it run, sort of. I'm getting another illegal instruction in another function (I'm sure it's the same issue, different instruction) but I can't figure out how to translate the instruction into the ID's in the code. Ie. how pcmpistri = 0x42. Once I can figure that out, I can identify the problem instruction.
From my searching through guest_x86_toIR.c, I think the problem instruction is pcmpistrm. That's the only one that I don't find any reference to in guest_x86_toIR.c.
Created attachment 75557 [details] Added imm8 codes Two imm8 codes needed to be added, one for pcmpistri (0x42) and one for pcmpistrm (0x62).
Patch added that makes my case run. Testing not included in the patch, but I could take a crack at it once I finish fixing my own code that started all this! But that may be awhile.
I just hit this same bug today in valgrind 3.9. I added the same imm8 codes and that got me past the __intel_sse4_strrchr call only to have it die in __intel_sse4_strspn. Using the same objdump trick I found calls to pcmpistrm with a value of 0x72. [arwild1@hplcslsp1 ~]$ /apps_hpc/GCC/gcc-4.7.2/bin/objdump -t /apps_hpc/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 | egrep __intel_sse4_strspn 00000000000209e0 g F .text 0000000000000310 __intel_sse4_strspn [arwild1@hplcslsp1 ~]$ /apps_hpc/GCC/gcc-4.7.2/bin/objdump -d -f --start-address=0x00000000000209e0 --stop-address=0x0000000000020cf0 /apps_hpc/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 | egrep pcmp 20a7b: 66 0f 74 c8 pcmpeqb %xmm0,%xmm1 20c41: 66 0f 3a 62 d1 72 pcmpistrm $0x72,%xmm1,%xmm2 20c70: 66 0f 3a 62 e1 72 pcmpistrm $0x72,%xmm1,%xmm4 20c99: 66 0f 3a 63 c9 3a pcmpistri $0x3a,%xmm1,%xmm1 (NOTE: 0x3a is already handled in the above source files). After adding 0x72 to the same two files I was able to run without error. I'll attach my diff against 3.9.0 below.
Created attachment 84130 [details] adds imm code 0x72 0x42 and 0x62 for pcmpistri / pcmpistrm
At the risk of repeating myself... Fixing the bug is good, of course. But if you want it to stay fixed, please create a test case and add it to none/tests/amd64/pcmpstr64.c. The test case should have the property that it fails without your patch and succeeds with it. It is impossible to overstate how important this is. Sooner or later, some compiler will start emitting these instructions inline, at which point Valgrind will need to expand them into IR code that memcheck can actually instrument. (A few memcheck suppression rules for a handful of string library routines is feasible. Hundreds or thousands of them for inline operations is not.) Taking a few minutes now to add a test case or two could save somebody else hours of time in the future.