| Summary: | Application being checked generates illegal instruction | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | tim.gallagher |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | REPORTED --- | ||
| Severity: | critical | CC: | alan, cpigat242, lopresti, tim.gallagher, tom |
| Priority: | NOR | ||
| Version First Reported In: | 3.8.0 | ||
| Target Milestone: | --- | ||
| Platform: | openSUSE | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: |
Full output with trace signals on
Added imm8 codes adds imm code 0x72 0x42 and 0x62 for pcmpistri / pcmpistrm |
||
|
Description
tim.gallagher
2012-11-28 21:48:56 UTC
Can you try running with --trace-signals=yes and post the output so we can see whether this is a real SIGILL from the kernel, or one valgrind has synthesised? Created attachment 75523 [details]
Full output with trace signals on
With the command valgrind --trace-signals=yes ./les3d.x 2> valgrindOutput.txt and then grep for sigaction: --24629-- sys_sigaction: sigNo 32, new 0x7fefff398, old 0x0, new flags 0x4000004 --24629-- sys_sigaction: sigNo 33, new 0x7fefff398, old 0x0, new flags 0x14000004 --24629-- sys_sigaction: sigNo 8, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 4, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 11, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 6, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 15, new 0x7fefff1d8, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 3, new 0x7fefff1d8, old 0x7fefff278, new flags 0x54000004 --24629-- sys_sigaction: sigNo 2, new 0x7fefff1d8, old 0x7fefff278, new flags 0x54000004 --24629-- sys_sigaction: sigNo 13, new 0x7feffec38, old 0x7feffecd8, new flags 0x14000000 --24629-- sys_sigaction: sigNo 10, new 0x7feffed58, old 0x7feffedf8, new flags 0x14000000 --24629-- sys_sigaction: sigNo 10, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 14, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 15, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 12, new 0x7feffec78, old 0x7feffed18, new flags 0x14000000 --24629-- sys_sigaction: sigNo 11, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 4, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 7, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 8, new 0x7feffbec0, old 0x7feffbf60, new flags 0x44000004 --24629-- sys_sigaction: sigNo 11, new 0x7feffbec0, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 4, new 0x7feffbec0, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 7, new 0x7feffbec0, old 0x0, new flags 0x4000000 --24629-- sys_sigaction: sigNo 8, new 0x7feffbec0, old 0x0, new flags 0x54000004 --24629-- sys_sigaction: sigNo 11, new 0x402bdf1e0, old 0x0, new flags 0x0 --24629-- sys_sigaction: sigNo 7, new 0x402bdf1e0, old 0x0, new flags 0x0 --24629-- sys_sigaction: sigNo 4, new 0x402bdf1e0, old 0x0, new flags 0x0 --24629-- sys_sigaction: sigNo 8, new 0x402bdf1e0, old 0x0, new flags 0x0 I attached the full output in case that's needed too, but it's full of the conditional jump uninitialized value and invalid read errors that come from using the Intel compiler suite. Well it doesn't look like there was a SIGILL from the kernel so I think it must be the instruction decoder in the JIT which is triggering it, which is unusual as the JIT normally generates it's own error if it can't handle the instruction. The problem instruction appears to be at 0x405ED3C in __intel_sse4_strrchr in libirc.so - any chance you could disassemble that function and show us a few instructions around that point, given VEX hasn't reported the problem instruction in the normal way... I've never disassembled anything before, so I'm sorry if this isn't what you needed (let me know and I'll adjust it): > objdump -t /opt/intel/2011.sp1.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64/libirc.so | grep intel_sse4_strrchr 0000000000018d30 g F .text 0000000000000040 __intel_sse4_strrchr > objdump -d -f --start-address=0x0000000000018d30 --stop-address=0x0000000000018d70 /opt/intel/2011.sp1.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64/libirc.so /opt/intel/2011.sp1.1/composer_xe_2011_sp1.7.256/compiler/lib/intel64/libirc.so: file format elf64-x86-64 architecture: i386:x86-64, flags 0x00000150: HAS_SYMS, DYNAMIC, D_PAGED start address 0x0000000000007040 Disassembly of section .text: 0000000000018d30 <__intel_sse4_strrchr>: 18d30: 56 push %rsi 18d31: 66 0f 38 17 c9 ptest %xmm1,%xmm1 18d36: 74 21 je 18d59 <__intel_sse4_strrchr+0x29> 18d38: 48 83 c2 10 add $0x10,%rdx 18d3c: 66 0f 3a 63 0a 42 pcmpistri $0x42,(%rdx),%xmm1 18d42: 74 09 je 18d4d <__intel_sse4_strrchr+0x1d> 18d44: 73 f2 jae 18d38 <__intel_sse4_strrchr+0x8> 18d46: 89 c8 mov %ecx,%eax 18d48: 48 03 c2 add %rdx,%rax 18d4b: eb eb jmp 18d38 <__intel_sse4_strrchr+0x8> 18d4d: 73 05 jae 18d54 <__intel_sse4_strrchr+0x24> 18d4f: 89 c8 mov %ecx,%eax 18d51: 48 03 c2 add %rdx,%rax 18d54: 48 83 c4 08 add $0x8,%rsp 18d58: c3 retq 18d59: 48 89 d0 mov %rdx,%rax 18d5c: e8 67 db fe ff callq 68c8 <__intel_sse4_strend@plt> 18d61: 48 83 c4 08 add $0x8,%rsp 18d65: c3 retq 18d66: 48 89 f6 mov %rsi,%rsi 18d69: 48 8d bf 00 00 00 00 lea 0x0(%rdi),%rdi No, that's fine - the problem is the pcmpistri instruction which appears to be a variant we don't handle yet. Julian can probably say more. It's unlikely, but is there a workaround and/or a development/unstable version that might have this fixed in it already or soon? I'm trying to track down a memory problem and the 6 other tools I've tried to use as a replacement due to this are completely useless but I'm hoping valgrind will find it. It's never let me down before. Patrick LoPresti has another patch on some bug somewhere, which makes Memcheck work more sanely with optimised icc code. BTW, you should use the trunk and make sure you give the flag --partial-loads-ok=yes. Re the 0x42 pcmpistri case, that might just be a case of allowing that particular immediate value to be handled by the existing code. But to be sure we'd have to test it with none/tests/amd64/pcmpstr64.c/ I've just been discussing it with Julian on IRC and we think that just adding 0x42 to the list of cases in the switch statement at line 17069 of VEX/priv/guest_amd64_toIR.c may be all that is needed if you want to give that a try. I tried what I think you suggested and ran with /data3/valgrindInstall/bin/valgrind --partial-loads-ok=yes ./les3d.x:
> svn diff VEX/priv/guest_amd64_toIR.c
Index: VEX/priv/guest_amd64_toIR.c
===================================================================
--- VEX/priv/guest_amd64_toIR.c (revision 2574)
+++ VEX/priv/guest_amd64_toIR.c (working copy)
@@ -17070,7 +17070,7 @@
case 0x00:
case 0x02: case 0x08: case 0x0A: case 0x0C: case 0x12:
case 0x1A: case 0x38: case 0x3A: case 0x44: case 0x4A:
- case 0x46:
+ case 0x46: case 0x42:
break;
case 0x01: // the 16-bit character versions of the above
case 0x03: case 0x09: case 0x0B: case 0x0D: case 0x13:
And it resulted in:
vex: priv/guest_amd64_helpers.c:3266 (amd64g_dirtyhelper_PCMPxSTRx): Assertion `ok' failed.
vex storage: T total 829967584 bytes allocated
vex storage: P total 640 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
==20981== at 0x38056EB6: report_and_quit (m_libcassert.c:235)
==20981== by 0x38056F13: panic (m_libcassert.c:319)
==20981== by 0x380570C8: vgPlain_core_panic_at (m_libcassert.c:324)
==20981== by 0x380570DA: vgPlain_core_panic (m_libcassert.c:329)
==20981== by 0x3806EC22: failure_exit (m_translate.c:713)
==20981== by 0x380FD1A8: vex_assert_fail (main_util.c:219)
==20981== by 0x3813B6F4: amd64g_dirtyhelper_PCMPxSTRx (guest_amd64_helpers.c:3266)
==20981== by 0x403B1245D: ???
==20981== by 0x402ABEF4F: ???
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==20981== at 0x7D6F9EB: __intel_sse4_strrchr (in /opt/intel/2013.initial/composer_xe_2013.0.079/compiler/lib/intel64/libirc.so)
==20981== by 0x612EC11: H5_build_extpath (in /opt/hdf5/intel/2013.initial/lib64/libhdf5.so.7.0.2)
==20981== by 0x61A25BB: H5F_open (in /opt/hdf5/intel/2013.initial/lib64/libhdf5.so.7.0.2)
==20981== by 0x61A4F51: H5Fopen (in /opt/hdf5/intel/2013.initial/lib64/libhdf5.so.7.0.2)
==20981== by 0x6B245C2: h5fopen_c_ (in /opt/hdf5/intel/2013.initial/lib64/libhdf5_fortran.so.7.0.2)
==20981== by 0x6B1BFA4: h5f_mp_h5fopen_f_ (in /opt/hdf5/intel/2013.initial/lib64/libhdf5_fortran.so.7.0.2)
==20981== by 0x6CD9C9: gridoperations_m_mp_readgridhdf5_ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x)
==20981== by 0x69F967: gridoperations_m_mp_readgrid_ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x)
==20981== by 0x6D0953: simulationsetup_m_mp_readinputsandallocatedata_ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x)
==20981== by 0x6E8194: MAIN__ (in /data1/randomPack/bin_randPackMSDM_van/les3d.x)
==20981== by 0x404BDB: main (in /data1/randomPack/bin_randPackMSDM_van/les3d.x)
I have joined the IRC channel in the event there are other quick things to test out. I've never looked at the source before, but I'm willing to put in any work I can to help resolve it.
I think you need a similar change to the switch statement in guest_generic_x87.c:compute_PCMPxSTRx() Also you definitely want to add this test case to none/tests/amd64/pcmpstr64.c. (Actually getting the validity bit tracking to work properly on this one will be... Fun. I will take a crack at it eventually.) (In reply to comment #11) > I think you need a similar change to the switch statement in > guest_generic_x87.c:compute_PCMPxSTRx() > > Also you definitely want to add this test case to > none/tests/amd64/pcmpstr64.c. > > (Actually getting the validity bit tracking to work properly on this one > will be... Fun. I will take a crack at it eventually.) Yes, we discussed that in the IRC channel. Adding it there does in fact make it run, sort of. I'm getting another illegal instruction in another function (I'm sure it's the same issue, different instruction) but I can't figure out how to translate the instruction into the ID's in the code. Ie. how pcmpistri = 0x42. Once I can figure that out, I can identify the problem instruction. From my searching through guest_x86_toIR.c, I think the problem instruction is pcmpistrm. That's the only one that I don't find any reference to in guest_x86_toIR.c. Created attachment 75557 [details]
Added imm8 codes
Two imm8 codes needed to be added, one for pcmpistri (0x42) and one for pcmpistrm (0x62).
Patch added that makes my case run. Testing not included in the patch, but I could take a crack at it once I finish fixing my own code that started all this! But that may be awhile. I just hit this same bug today in valgrind 3.9. I added the same imm8 codes and that got me past the __intel_sse4_strrchr call only to have it die in __intel_sse4_strspn. Using the same objdump trick I found calls to pcmpistrm with a value of 0x72. [arwild1@hplcslsp1 ~]$ /apps_hpc/GCC/gcc-4.7.2/bin/objdump -t /apps_hpc/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 | egrep __intel_sse4_strspn 00000000000209e0 g F .text 0000000000000310 __intel_sse4_strspn [arwild1@hplcslsp1 ~]$ /apps_hpc/GCC/gcc-4.7.2/bin/objdump -d -f --start-address=0x00000000000209e0 --stop-address=0x0000000000020cf0 /apps_hpc/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 | egrep pcmp 20a7b: 66 0f 74 c8 pcmpeqb %xmm0,%xmm1 20c41: 66 0f 3a 62 d1 72 pcmpistrm $0x72,%xmm1,%xmm2 20c70: 66 0f 3a 62 e1 72 pcmpistrm $0x72,%xmm1,%xmm4 20c99: 66 0f 3a 63 c9 3a pcmpistri $0x3a,%xmm1,%xmm1 (NOTE: 0x3a is already handled in the above source files). After adding 0x72 to the same two files I was able to run without error. I'll attach my diff against 3.9.0 below. Created attachment 84130 [details]
adds imm code 0x72 0x42 and 0x62 for pcmpistri / pcmpistrm
At the risk of repeating myself... Fixing the bug is good, of course. But if you want it to stay fixed, please create a test case and add it to none/tests/amd64/pcmpstr64.c. The test case should have the property that it fails without your patch and succeeds with it. It is impossible to overstate how important this is. Sooner or later, some compiler will start emitting these instructions inline, at which point Valgrind will need to expand them into IR code that memcheck can actually instrument. (A few memcheck suppression rules for a handful of string library routines is feasible. Hundreds or thousands of them for inline operations is not.) Taking a few minutes now to add a test case or two could save somebody else hours of time in the future. |