The tests drd/tests/sem_open, sem_open2, sem_open3 and sem_open_traced all crash with unhandled instruction error ==14999== drd, a thread error detector ==14999== Copyright (C) 2006-2015, and GNU GPL'd, by Bart Van Assche. ==14999== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==14999== Command: ./sem_open -m ==14999== vex amd64->IR: unhandled instruction bytes: 0xFF 0xFF 0x48 0x85 0xC0 0x74 0x8 0x4D 0x89 0xE7 vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0 vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0 ==14999== valgrind: Unrecognised instruction at address 0x382bf0a2f0. ==14999== at 0x382BF0A2F0: check_add_mapping (in /lib64/tls/libpthread-2.3.4.so) ==14999== by 0x4A14EDF: sem_open_intercept (drd_pthread_intercepts.c:1391) ==14999== by 0x4A14EDF: sem_open (drd_pthread_intercepts.c:1398) ==14999== by 0x400AFA: main (sem_open.c:63) ==14999== Your program just tried to execute an instruction that Valgrind ==14999== did not recognise. There are two possible reasons for this. ==14999== 1. Your program has a bug and erroneously jumped to a non-code ==14999== location. If you are running Memcheck and you just saw a ==14999== warning about a bad jump, it's probably your program's fault. ==14999== 2. The instruction is legitimate but Valgrind doesn't handle it, ==14999== i.e. it's Valgrind's fault. If you think this is the case or ==14999== you are not sure, please let us know and we'll try to fix it. ==14999== Either way, Valgrind will now raise a SIGILL signal which will ==14999== probably kill your program. ==14999== ==14999== Process terminating with default action of signal 4 (SIGILL): dumping core ==14999== Illegal opcode at address 0x382BF0A2F0 ==14999== at 0x382BF0A2F0: check_add_mapping (in /lib64/tls/libpthread-2.3.4.so) ==14999== by 0x4A14EDF: sem_open_intercept (drd_pthread_intercepts.c:1391) ==14999== by 0x4A14EDF: sem_open (drd_pthread_intercepts.c:1398) ==14999== by 0x400AFA: main (sem_open.c:63) ==14999== ==14999== For counts of detected and suppressed errors, rerun with: -v ==14999== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Illegal instruction
cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz stepping : 7 cpu MHz : 2660.004 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm pni monitor ds_cpl est tm2 cx16 xtpr lahf_lm bogomips : 5323.68 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: glibc version glibc-2.3.4-2.57.x86_64 Running the tests outside valgrind produces no errors Context form /lib64/tls/libpthread-2.3.4.so 382bf0a2d1: e8 92 ab ff ff callq 382bf04e68 <memcpy@plt> 382bf0a2d6: 49 83 fc ff cmp $0xffffffffffffffff,%r12 382bf0a2da: 74 23 je 382bf0a2ff <check_add_mapping+0x19f> 382bf0a2dc: 48 8d 15 3d fe ff ff lea -0x1c3(%rip),%rdx # 382bf0a120 <__sem_search> 382bf0a2e3: 48 8d 35 66 a0 10 00 lea 0x10a066(%rip),%rsi # 382c014350 <__sem_mappings> 382bf0a2ea: 48 89 df mov %rbx,%rdi 382bf0a2ed: e8 46 ad ff ff callq 382bf05038 <tsearch@plt> 382bf0a2f2: 48 85 c0 test %rax,%rax 382bf0a2f5: 74 08 je 382bf0a2ff <check_add_mapping+0x19f> 382bf0a2f7: 4d 89 e7 mov %r12,%r15 382bf0a2fa: e9 74 ff ff ff jmpq 382bf0a273 <check_add_mapping+0x113> 382bf0a2ff: 48 89 df mov %rbx,%rdi 382bf0a302: e8 71 ad ff ff callq 382bf05078 <free@plt> 382bf0a307: e9 67 ff ff ff jmpq 382bf0a273 <check_add_mapping+0x113> 382bf0a30c: 45 31 c9 xor %r9d,%r9d 382bf0a30f: 45 89 f0 mov %r14d,%r8d
This problem only seems to be happening with glibc 2.3.4 which is part of RHEL4. glibc 2.5 (RHEL5) and glibc 2.12 (RHEL 6) don't exhibit this problem
Hi Tushar, are you aware that the "unhandled instruction bytes" message indicates a bug in VEX and not in DRD nor in Helgrind? See also http://valgrind.org/docs/manual/faq.html#faq.msgdeath.
Thank you for pointing to the right component. Since the error was happening in the context of drd and helgrind, I picked those.
When running memcheck, this error does NOT occur. Does VEX instrument libraries differently depending on the checker used? The same problem persists in version 3.8.0 and 3.11.0
Any help/suggestions on how to proceed?
Everyone RHEL user I know either uses RHEL 6 or RHEL 7. How relevant is RHEL 4 today?
Our computers are part of a $20+ million semiconductor capital equipment whose annual maintenance contract itself runs into hundreds of thousands of dollars a year. So we can't drop support for legacy systems easily. So it is very relevant to us. I can help with investigation and testing but some pointers would definitely help. To summarize my findings are The crash only happens with 64 bit drd and helgrind test case. memcheck is OK. 32 bit test cases have not problem with either of the checkers. so I'm wondering if this issue contained inside VEX or do the checkers also have a role to play. Am I on the right track with my thought process?
Analyzing software with Helgrind and DRD may trigger other code paths compared to analyzing the same software with memcheck. My proposal is to start with looking into why VEX reports an unhandled instruction. Have you tried to disassemble the executable with objdump? If not, can you do that and post here the disassembly for the unhandled instruction bytes?
Based on the crash information, I did disassemble the library in question (/lib64/tls/libpthread-2.3.4.so) and looked at the instructions at the address 0x382BF0A2F0. The instructions are 382bf0a2d1: e8 92 ab ff ff callq 382bf04e68 <memcpy@plt> 382bf0a2d6: 49 83 fc ff cmp $0xffffffffffffffff,%r12 382bf0a2da: 74 23 je 382bf0a2ff <check_add_mapping+0x19f> 382bf0a2dc: 48 8d 15 3d fe ff ff lea -0x1c3(%rip),%rdx # 382bf0a120 <__sem_search> 382bf0a2e3: 48 8d 35 66 a0 10 00 lea 0x10a066(%rip),%rsi # 382c014350 <__sem_mappings> 382bf0a2ea: 48 89 df mov %rbx,%rdi 382bf0a2ed: e8 46 ad ff ff callq 382bf05038 <tsearch@plt> 382bf0a2f2: 48 85 c0 test %rax,%rax 382bf0a2f5: 74 08 je 382bf0a2ff <check_add_mapping+0x19f> 382bf0a2f7: 4d 89 e7 mov %r12,%r15 382bf0a2fa: e9 74 ff ff ff jmpq 382bf0a273 <check_add_mapping+0x113> 382bf0a2ff: 48 89 df mov %rbx,%rdi 382bf0a302: e8 71 ad ff ff callq 382bf05078 <free@plt> 382bf0a307: e9 67 ff ff ff jmpq 382bf0a273 <check_add_mapping+0x113> 382bf0a30c: 45 31 c9 xor %r9d,%r9d 382bf0a30f: 45 89 f0 mov %r14d,%r8d The thing that caught my attention is that data @ 0x382BF0A2F0 seems to be part of callq instruction @ 382bf0a2ed. VEX seems to think it is the start of a new instruction. The disassembly of the test code (sem_open) itself at the are of interest is 400aca: ba f6 0d 40 00 mov $0x400df6,%edx 400acf: be 20 00 00 00 mov $0x20,%esi 400ad4: b8 00 00 00 00 mov $0x0,%eax 400ad9: e8 e2 fd ff ff callq 4008c0 <snprintf@plt> 400ade: 48 8d 7d c0 lea -0x40(%rbp),%rdi 400ae2: b9 01 00 00 00 mov $0x1,%ecx 400ae7: ba 80 01 00 00 mov $0x180,%edx 400aec: be c0 00 00 00 mov $0xc0,%esi 400af1: b8 00 00 00 00 mov $0x0,%eax 400af6: e8 25 fe ff ff callq 400920 <sem_open@plt> 400afb: 48 89 05 b6 07 20 00 mov %rax,0x2007b6(%rip) # 6012b8 <s_sem> 400b02: 48 83 3d ae 07 20 00 cmpq $0x0,0x2007ae(%rip) # 6012b8 <s_sem> 400b09: 00 400b0a: 75 24 jne 400b30 <main+0xfa> 400b0c: 48 8d 55 c0 lea -0x40(%rbp),%rdx 400b10: be 10 0e 40 00 mov $0x400e10,%esi 400b15: 48 8b 3d 7c 07 20 00 mov 0x20077c(%rip),%rdi # 601298 <stderr@@GLIBC_2.2.5> 400b1c: b8 00 00 00 00 mov $0x0,%eax 400b21: e8 ea fd ff ff callq 400910 <fprintf@plt> I hope this information helps
Is the above information useful?
I looked at it but it is not immediately obvious to me what is going on. It's very strange that VEX is losing sync with the instruction stream. VEX is normally very reliable at that.
Julian, Thanks for getting back. I tried to find out what is happening and didn't make much headway. The problem can be reliably reproduced on an RHEL 4 system. So if you want me to instrument something or if you have debug binary for me to try, I will be more than happy to do so