Created attachment 93539 [details]
Text output from failed valgrind run

Attempting to test an i386 application (BBEdit) on OS X 10.9.5.

`uname -a` reports:

Darwin sandbox-109.lan 13.4.0 Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64 x86_64

`valgrind --version` reports:

valgrind-3.11.0.SVN

Operation attempted:

valgrind siegel$ valgrind --tool=none /path/to/BBEdit.app

Expected result: valgrind initialization and normal application startup.

Actual result: valgrind initialized, and the application crashed at startup.

Notes: this may be diagnostic:

--21312-- UNKNOWN __pthread_sigmask is unsupported.
vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB
==21312== valgrind: Unrecognised instruction at address 0x4fcd1d6.

The full text output from the run is attached as a file.
Test results on OS X 10.10.4, with `uname -a` reporting:

Darwin RocketSled.lan 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64

The results are similar to 10.9.5, so it may be easier to test and fix on the newer OS.
Created attachment 93540 [details] Output from valgrind --none on 10.10.4
Hello Rich,

Thanks for trying out the SVN version of Valgrind and for the bug report.

As you know, BBEdit is a 32-bit process and it - or a middleware framework it relies upon - uses the 0x66 0xF 0x3A 0xB (roundsd) SSE4.1 instruction.

Valgrind doesn't support SSE4 in 32-bit mode, only in 64-bit mode. 32-bit mode supports only up to and including SSSE3 instructions.

http://www.valgrind.org/docs/manual/manual-core.html#manual-core.limits

There are no current plans to support SSE4 on 32-bit Valgrind. Please use 64-bit.

I completely understand that, as the developer of BBEdit, it is your prerogative whether to make the changes necessary to allow BBEdit to be built as a 64-bit binary, including changing out middleware frameworks. Unfortunately, Valgrind and 32-bit aren't going to play nice for your app.

Also worth noting: this isn't an OS X specific restriction, as the decision not to support SSE4 on 32-bit applies to all of Valgrind's platforms.

*** This bug has been marked as a duplicate of bug 346023 ***
Thanks for the followup. Unfortunately the current condition of using i386 is not a matter of prerogative or preference. The "middleware framework" that the application relies on is supplied by Apple as part of the OS, and there is no 64-bit version of it. Therefore I cannot simply flip a switch and build for x86_64. So, we're in the process of rewriting the parts of the application that rely on that framework (essentially, its entire UI). I'm sure you can imagine the scope of work involved and that it's not as simple as "please use 64-bit". :-) Thanks again for the information, and I look forward to the day when I can again use valgrind in combat.
The 32-bit front end does actually handle a very limited selection of SSE4 insns (2, to be precise), precisely for the purpose of keeping 32-bit OSX programs running. That had worked ok up until earlier this year, I think.

The curious thing is that one of the supported insns is ROUNDSD, with the following encoding

66 0F 3A 0B /r ib = ROUNDSD imm8, xmm2/m64, xmm1

and what Rich got was

vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB

so it's a bit odd that it failed. Rich, can you find and disassemble the failing insn, so we can see what's with it?
Created attachment 94008 [details] Contains implementation of __CFArmNextTimerInMode().
...so, the first three frames in the backtrace are:

__tanpi (in /usr/lib/system/libsystem_m.dylib)
__CFArmNextTimerInMode (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
__CFRepositionTimerInMode (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)

I have attached CFRunLoop.c, which contains the implementation of __CFArmNextTimerInMode() and __CFRepositionTimerInMode().

However, I believe that I have found a trivial case to reproduce the problem. Here is a little test program:

===
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv)
{
    double x = 1.1;
    double i = floor(x);

    (void)i;

    return 0;
}
===

I compiled this with "clang -arch i386", which gave me an executable a.out. Then I did:

valgrind --tool=none ./a.out

And got the exact same crash:

vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB
==20401== valgrind: Unrecognised instruction at address 0x2e1046.
==20401== at 0x2E1046: __tanpi (in /usr/lib/system/libsystem_m.dylib)
==20401== by 0x1176D8: start (in /usr/lib/system/libdyld.dylib)
==20401== Process terminating with default action of signal 4 (SIGILL)
==20401== Illegal opcode at address 0x2E1046
==20401== at 0x2E1046: __tanpi (in /usr/lib/system/libsystem_m.dylib)
==20401== by 0x1176D8: start (in /usr/lib/system/libdyld.dylib)

According to otool -tv, this all disassembles to:

00001f30 pushl %ebp
00001f31 movl %esp, %ebp
00001f33 pushl %esi
00001f34 subl $0x34, %esp
00001f37 calll 0x1f3c
00001f3c popl %eax
00001f3d movl 0xc(%ebp), %ecx
00001f40 movl 0x8(%ebp), %edx
00001f43 xorl %esi, %esi
00001f45 movsd 0x6c(%eax), %xmm0
00001f4d movl $0x0, -0x8(%ebp)
00001f54 movl %edx, -0xc(%ebp)
00001f57 movl %ecx, -0x10(%ebp)
00001f5a movsd %xmm0, -0x18(%ebp)
00001f5f movsd -0x18(%ebp), %xmm0
00001f64 movl %esp, %eax
00001f66 movsd %xmm0, (%eax)
00001f6a movl %esi, -0x2c(%ebp)
00001f6d calll 0x1f88 ## symbol stub for: _floor
00001f72 fstpl -0x28(%ebp)
00001f75 movsd -0x28(%ebp), %xmm0
00001f7a movsd %xmm0, -0x20(%ebp)
00001f7f movl -0x2c(%ebp), %eax
00001f82 addl $0x34, %esp
00001f85 popl %esi
00001f86 popl %ebp
00001f87 retl

If I set a breakpoint in round() in lldb, when it stops I see:

libsystem_m.dylib`___lldb_unnamed_function125$$libsystem_m.dylib:
-> 0x96a40040 <+0>: movsd 0x4(%esp), %xmm0
   0x96a40046 <+6>: roundsd $0x1, %xmm0, %xmm0
   0x96a4004c <+12>: movsd %xmm0, 0x4(%esp)
   0x96a40052 <+18>: fldl 0x4(%esp)

I'm not sure how useful this is; hopefully it's not totally useless. :-)
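Editorial aside on the `roundsd $0x1` shown by lldb: the immediate byte selects the rounding behaviour, which is why a call to floor() ends up at this instruction. The following is a minimal sketch of how that immediate is laid out according to the Intel instruction reference. The enum and the function name `roundsd_mode` are hypothetical and are not taken from Valgrind or from libsystem_m.

```c
#include <stdint.h>

/* Rounding behaviour selected by the imm8 operand of ROUNDSD/ROUNDSS.
   Names are illustrative only. */
typedef enum {
    ROUND_NEAREST_EVEN = 0, /* imm8[1:0] = 00 */
    ROUND_DOWN         = 1, /* imm8[1:0] = 01, toward -infinity (floor) */
    ROUND_UP           = 2, /* imm8[1:0] = 10, toward +infinity (ceil)  */
    ROUND_TRUNCATE     = 3, /* imm8[1:0] = 11, toward zero              */
    ROUND_FROM_MXCSR   = 4  /* imm8[2] = 1: use the mode in MXCSR.RC    */
} round_mode;

/* Decode the rounding-control part of the immediate.
   Bit 3 (precision-exception suppression) is ignored here. */
round_mode roundsd_mode(uint8_t imm8)
{
    if (imm8 & 0x04)
        return ROUND_FROM_MXCSR;
    return (round_mode)(imm8 & 0x03);
}
```

With this layout, `roundsd_mode(0x1)` yields `ROUND_DOWN`, which matches the floor() call in the test program.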
(In reply to Rich Siegel from comment #7)
> I'm not sure how useful this is, hopefully it's not totally useless. :-)

You got the important bit -- the instruction with page offset 0x046:

> 0x96a40046 <+6>: roundsd $0x1, %xmm0, %xmm0

So, looking again at the implementation of this in guest_x86_toIR.c, I see that the comment is kind of misleading. The code is there to implement both ROUNDSD and ROUNDSS, but the former is disabled :-/

   /* 66 0F 3A 0B /r ib = ROUNDSD imm8, xmm2/m64, xmm1
      (Partial implementation only -- only deal with cases where
      the rounding mode is specified directly by the immediate byte.)
      66 0F 3A 0A /r ib = ROUNDSS imm8, xmm2/m32, xmm1
      (Limitations ditto)
   */
   if (sz == 2
       && insn[0] == 0x0F && insn[1] == 0x3A
       && (/*insn[2] == 0x0B || */insn[2] == 0x0A)) {
           ^ here

Try uncommenting that bit and see if that helps.
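Editorial aside: the effect of the commented-out comparison can be illustrated outside Valgrind. The sketch below is a standalone approximation, not the code from guest_x86_toIR.c. It assumes the raw byte sequence includes the 0x66 prefix (which the snippet above represents as `sz == 2`), and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Matches 66 0F 3A 0A (ROUNDSS) only, mirroring the condition with
   the ROUNDSD comparison commented out. */
bool matches_roundss_only(const uint8_t *bytes, size_t len)
{
    return len >= 4
        && bytes[0] == 0x66 && bytes[1] == 0x0F && bytes[2] == 0x3A
        && bytes[3] == 0x0A;
}

/* Matches both 66 0F 3A 0A (ROUNDSS) and 66 0F 3A 0B (ROUNDSD),
   mirroring the condition with the comparison restored. */
bool matches_roundss_or_roundsd(const uint8_t *bytes, size_t len)
{
    return len >= 4
        && bytes[0] == 0x66 && bytes[1] == 0x0F && bytes[2] == 0x3A
        && (bytes[3] == 0x0B || bytes[3] == 0x0A);
}
```

Fed the reported bytes 0x66 0x0F 0x3A 0x0B, the first predicate is false and the second is true, which is consistent with the "unhandled instruction bytes" message appearing only while the ROUNDSD comparison is disabled.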
Well, it got a little farther this time. :-)

After making that change and rebuilding valgrind, I tried it on my original application again. This time, after some time, I got a new one:

vex x86->IR: unhandled instruction bytes: 0xC5 0xF8 0x28 0x1
==92137== valgrind: Unrecognised instruction at address 0x19adec00.
==92137== at 0x19ADEC00: ??? (in /dev/ttys005)
==92137== by 0x1965EF41: glrCompCloseDevice (in /System/Library/Frameworks/OpenCL.framework/Versions/A/Libraries/libcldcpuengine.dylib)
==92137== by 0x56B3450: _dispatch_apply_invoke (in /usr/lib/system/libdispatch.dylib)
==92137== by 0x56AA17F: _dispatch_root_queue_drain (in /usr/lib/system/libdispatch.dylib)
==92137== by 0x56B963C: _dispatch_worker_thread3 (in /usr/lib/system/libdispatch.dylib)
==92137== by 0x598D1D9: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==92137== by 0x598AE2D: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)

Unfortunately I'm stymied by this one; the source to Apple's OpenCL framework does not seem to be available. That opcode looks like an x86 "LDS" with two opcodes, "load far pointer" or some such.

I am happy to open a new bug for this new instruction, so as to not piggyback on this one and confuse the issue. Let me know how you'd like to proceed.
It's not LDS; it's a VEX-encoded SUB instruction.
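Editorial aside on why 0xC5 can be read either way: in 32-bit mode, 0xC5 is the LDS opcode only when the following byte describes a memory operand. If the top two bits of that byte are both set, the sequence is instead a two-byte VEX prefix. The sketch below illustrates that general x86 encoding rule; it is not Valgrind's decoder, it does not identify which instruction follows the prefix, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields carried by the second byte of a two-byte (C5) VEX prefix. */
typedef struct {
    bool    is_vex; /* false: the byte is a ModRM for LDS instead       */
    uint8_t vvvv;   /* extra register operand (stored inverted)         */
    uint8_t l;      /* 0 = 128-bit vector, 1 = 256-bit vector           */
    uint8_t pp;     /* implied legacy prefix: 0 none, 1 66, 2 F3, 3 F2  */
} vex2_info;

/* Classify the byte that follows 0xC5 in 32-bit mode. */
vex2_info classify_c5_32bit(uint8_t second_byte)
{
    vex2_info info = { false, 0, 0, 0 };

    /* LDS requires a memory operand, so ModRM.mod can never be 11.
       Top bits 11 therefore mark a VEX prefix. */
    if ((second_byte & 0xC0) != 0xC0)
        return info;

    info.is_vex = true;
    info.vvvv   = (uint8_t)(((second_byte >> 3) & 0x0F) ^ 0x0F);
    info.l      = (uint8_t)((second_byte >> 2) & 0x01);
    info.pp     = (uint8_t)(second_byte & 0x03);
    return info;
}
```

For the reported bytes 0xC5 0xF8, the second byte is 11111000 in binary, so `classify_c5_32bit(0xF8)` reports a VEX prefix with a 128-bit vector length and no implied legacy prefix.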
Hmm, indeed, an AVX instruction. So you are pretty much scuppered now, at least in terms of short-term hacks. Alas.
I'll add a regression test for this, based on Rich's small test case (thanks!). Have also confirmed that uncommenting the ROUNDSD support on x86 does not cause any regressions, so will push that too thereby fixing the new test.
Resolved in r15547 and r3173 (VEX).
Thank you very much for the fix! It is indeed unfortunate that the AVX SUB won't be supported, since that puts me pretty well back to square one. :-( But as before, I appreciate the feedback and look forward to the day when I can use valgrind for this project again. :-)