Bug 350062 - vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB (ROUNDSD) on OS X
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex
Version: 3.10 SVN
Platform: Other macOS
Importance: NOR crash
Target Milestone: ---
Assignee: Rhys Kidd
 
Reported: 2015-07-09 18:47 UTC by Rich Siegel
Modified: 2015-08-15 15:16 UTC

Attachments
Text output from failed valgrind run (4.90 KB, text/plain)
2015-07-09 18:47 UTC, Rich Siegel
Output from valgrind --none on 10.10.4 (5.33 KB, text/plain)
2015-07-09 18:52 UTC, Rich Siegel
Contains implementation of __CFArmNextTimerInMode(). (153.77 KB, text/x-csrc)
2015-08-13 03:02 UTC, Rich Siegel

Description Rich Siegel 2015-07-09 18:47:30 UTC
Created attachment 93539
Text output from failed valgrind run

Attempting to test an i386 application (BBEdit) on OS X 10.9.5. `uname -a` reports:

     Darwin sandbox-109.lan 13.4.0 Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64 x86_64

valgrind --version reports:

     valgrind-3.11.0.SVN

Operation attempted:

     valgrind siegel$ valgrind --tool=none /path/to/BBEdit.app 

Expected result: valgrind initialization and normal application startup.

Actual result: valgrind initialized, and application crashed at startup. 

Notes: this may be diagnostic:

--21312-- UNKNOWN __pthread_sigmask is unsupported.
vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB
==21312== valgrind: Unrecognised instruction at address 0x4fcd1d6.

The full text output from the run is attached as a file.
Comment 1 Rich Siegel 2015-07-09 18:51:56 UTC
Test results on OS X 10.10.4, `uname -a` reporting:

Darwin RocketSled.lan 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64

Similar to 10.9.5, so it may be easier to test and fix on the newer OS.
Comment 2 Rich Siegel 2015-07-09 18:52:50 UTC
Created attachment 93540
Output from valgrind --none on 10.10.4
Comment 3 Rhys Kidd 2015-07-10 00:37:42 UTC
Hello Rich,
Thanks for trying out the SVN version of Valgrind and the bug report.

As you know, BBEdit is a 32-bit process and it - or a middleware framework it relies upon - uses the 0x66 0xF 0x3A 0xB (roundsd) SSE4.1 instruction.
Valgrind supports SSE4 only in 64-bit mode.  32-bit mode supports instructions only up to and including SSSE3.
http://www.valgrind.org/docs/manual/manual-core.html#manual-core.limits

There are no current plans to support SSE4 on 32-bit Valgrind.  Please use 64-bit.

I completely understand that, as the developer of BBEdit, it is your prerogative whether to make the changes necessary to allow BBEdit to be built as a 64-bit binary, including changing out middleware frameworks. Unfortunately, Valgrind and 32-bit aren't going to play nice for your app.

Also worth noting: this isn't an OS X-specific restriction, as the decision not to support SSE4 on 32-bit applies to all of Valgrind's platforms.

*** This bug has been marked as a duplicate of bug 346023 ***
Comment 4 Rich Siegel 2015-07-10 13:07:41 UTC
Thanks for the followup. Unfortunately the current condition of using i386 is not a matter of prerogative or preference.

The "middleware framework" that the application relies on is supplied by Apple as part of the OS, and there is no 64-bit version of it. Therefore I cannot simply flip a switch and build for x86_64.

So, we're in the process of rewriting the parts of the application that rely on that framework (essentially, its entire UI). I'm sure you can imagine the scope of work involved and that it's not as simple as "please use 64-bit". :-)

Thanks again for the information, and I look forward to the day when I can again use valgrind in combat.
Comment 5 Julian Seward 2015-08-12 14:03:49 UTC
The 32-bit front end does actually handle a very limited selection of SSE4 insns
(2, to be exact), precisely for the purpose of keeping 32-bit OS X programs running.
That had worked OK up until earlier this year, I think.

The curious thing is that one of the supported insns is ROUNDSD, with the
following encoding

66 0F 3A 0B /r ib = ROUNDSD imm8, xmm2/m64, xmm1

and what Rich got was

vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB

so it's a bit odd that it failed.  Rich, can you find and disassemble the failing
insn, so we can see what's with it?
Comment 6 Rich Siegel 2015-08-13 03:02:08 UTC
Created attachment 94008
Contains implementation of __CFArmNextTimerInMode().
Comment 7 Rich Siegel 2015-08-13 03:30:39 UTC
...so, the first three frames in the backtrace are:

 __tanpi (in /usr/lib/system/libsystem_m.dylib)
 __CFArmNextTimerInMode (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
 __CFRepositionTimerInMode (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)

I have attached CFRunLoop.c, which contains the implementation of __CFArmNextTimerInMode() and __CFRepositionTimerInMode().

However, I believe that I have found a trivial case to reproduce the problem.

Here is a little test program:

===

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int	main(int argc, char **argv)
{
	double x = 1.1;
	double i = floor(x);
	
	(void)i;
	
	return 0;
}

===

I compiled this with "clang -arch i386", which gave me an executable a.out. Then I did:

valgrind --tool=none ./a.out

And got the exact same crash:

vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xB
==20401== valgrind: Unrecognised instruction at address 0x2e1046.
==20401==    at 0x2E1046: __tanpi (in /usr/lib/system/libsystem_m.dylib)
==20401==    by 0x1176D8: start (in /usr/lib/system/libdyld.dylib)

==20401== Process terminating with default action of signal 4 (SIGILL)
==20401==  Illegal opcode at address 0x2E1046
==20401==    at 0x2E1046: __tanpi (in /usr/lib/system/libsystem_m.dylib)
==20401==    by 0x1176D8: start (in /usr/lib/system/libdyld.dylib)

According to otool -tv, this all disassembles to:

00001f30	pushl	%ebp
00001f31	movl	%esp, %ebp
00001f33	pushl	%esi
00001f34	subl	$0x34, %esp
00001f37	calll	0x1f3c
00001f3c	popl	%eax
00001f3d	movl	0xc(%ebp), %ecx
00001f40	movl	0x8(%ebp), %edx
00001f43	xorl	%esi, %esi
00001f45	movsd	0x6c(%eax), %xmm0
00001f4d	movl	$0x0, -0x8(%ebp)
00001f54	movl	%edx, -0xc(%ebp)
00001f57	movl	%ecx, -0x10(%ebp)
00001f5a	movsd	%xmm0, -0x18(%ebp)
00001f5f	movsd	-0x18(%ebp), %xmm0
00001f64	movl	%esp, %eax
00001f66	movsd	%xmm0, (%eax)
00001f6a	movl	%esi, -0x2c(%ebp)
00001f6d	calll	0x1f88                  ## symbol stub for: _floor
00001f72	fstpl	-0x28(%ebp)
00001f75	movsd	-0x28(%ebp), %xmm0
00001f7a	movsd	%xmm0, -0x20(%ebp)
00001f7f	movl	-0x2c(%ebp), %eax
00001f82	addl	$0x34, %esp
00001f85	popl	%esi
00001f86	popl	%ebp
00001f87	retl

If I set a breakpoint in round() in lldb, when it stops I see:

libsystem_m.dylib`___lldb_unnamed_function125$$libsystem_m.dylib:
->  0x96a40040 <+0>:  movsd  0x4(%esp), %xmm0
    0x96a40046 <+6>:  roundsd $0x1, %xmm0, %xmm0
    0x96a4004c <+12>: movsd  %xmm0, 0x4(%esp)
    0x96a40052 <+18>: fldl   0x4(%esp)

I'm not sure how useful this is, hopefully it's not totally useless. :-)
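
For anyone reading the $0x1 operand in the roundsd line above, here is a minimal sketch of how that immediate byte breaks down. It assumes the imm8 layout given in the Intel SDM for ROUNDSD/ROUNDSS: bits 1:0 select the rounding mode, bit 2 defers to MXCSR instead, and bit 3 suppresses the precision exception. The type and function names are invented for illustration and do not come from Valgrind or libsystem_m.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rounding-control values carried in imm8[1:0] (assumed from the Intel SDM). */
typedef enum {
    RC_NEAREST_EVEN = 0,
    RC_DOWN         = 1,  /* toward -infinity, what floor() needs */
    RC_UP           = 2,  /* toward +infinity, what ceil() needs  */
    RC_TRUNCATE     = 3   /* toward zero                          */
} RoundControl;

/* Hypothetical decoded view of the ROUNDSD/ROUNDSS immediate byte. */
typedef struct {
    RoundControl rc;       /* imm8[1:0]                                   */
    bool use_mxcsr;        /* imm8[2]: take the mode from MXCSR.RC instead */
    bool suppress_inexact; /* imm8[3]: do not signal the precision exception */
} RoundImm;

static RoundImm decode_round_imm8(uint8_t imm8)
{
    RoundImm r;
    r.rc               = (RoundControl)(imm8 & 0x3);
    r.use_mxcsr        = (imm8 & 0x4) != 0;
    r.suppress_inexact = (imm8 & 0x8) != 0;
    return r;
}

/* Human-readable name for a rounding-control value. */
static const char *round_control_name(RoundControl rc)
{
    static const char *const names[] = {
        "nearest-even", "down (floor)", "up (ceil)", "truncate"
    };
    assert((unsigned)rc <= (unsigned)RC_TRUNCATE);
    return names[rc];
}
```

Under these assumptions, decode_round_imm8(0x1) gives round-down with the mode taken from the immediate itself rather than from MXCSR, which matches what the floor() call in the test program would need.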
Comment 8 Julian Seward 2015-08-13 08:53:44 UTC
(In reply to Rich Siegel from comment #7)
> I'm not sure how useful this is, hopefully it's not totally useless. :-)

You got the important bit -- the instruction with page offset 0x046:

>     0x96a40046 <+6>:  roundsd $0x1, %xmm0, %xmm0

So, looking again at the implementation of this in guest_x86_toIR.c, I
see that the comment is kind of misleading.  The code is there to
implement both ROUNDSD and ROUNDSS, but the former is disabled :-/


   /* 66 0F 3A 0B /r ib = ROUNDSD imm8, xmm2/m64, xmm1
      (Partial implementation only -- only deal with cases where
      the rounding mode is specified directly by the immediate byte.)
      66 0F 3A 0A /r ib = ROUNDSS imm8, xmm2/m32, xmm1
      (Limitations ditto)
   */
   if (sz == 2 
       && insn[0] == 0x0F && insn[1] == 0x3A
       && (/*insn[2] == 0x0B || */insn[2] == 0x0A)) {
           ^
           here

Try uncommenting that bit and see if that helps.
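
To make the effect of that commented-out clause concrete, here is a simplified stand-in for the check quoted above. It is not the actual guest_x86_toIR.c code: the function name and the allow_roundsd flag are hypothetical, added only so the behaviour before and after the change can be compared. The ModRM and immediate bytes in the usage note are reconstructed from the disassembly in comment 7 and are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the opcode test.
   sz   : operand size implied by the prefixes (2 once 0x66 has been seen)
   insn : points just past the prefixes, at the 0x0F escape byte
   allow_roundsd : false models the original code (0x0B clause commented out),
                   true models the code with that clause restored */
static bool is_round_scalar(int sz, const uint8_t *insn, bool allow_roundsd)
{
    assert(insn != NULL);

    /* Both instructions share the 66 0F 3A prefix/escape sequence. */
    if (sz != 2 || insn[0] != 0x0F || insn[1] != 0x3A)
        return false;

    if (insn[2] == 0x0A)                      /* ROUNDSS */
        return true;

    return allow_roundsd && insn[2] == 0x0B;  /* ROUNDSD */
}
```

With the bytes { 0x0F, 0x3A, 0x0B, 0xC0, 0x01 } (roundsd $0x1, %xmm0, %xmm0 after the 0x66 prefix), the function returns false when allow_roundsd is false and true when it is true, which mirrors the "unhandled instruction bytes" failure and its fix.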
Comment 9 Rich Siegel 2015-08-13 14:02:51 UTC
Well, it got a little farther this time. :-) After making that change and rebuilding valgrind, I tried it on my original application again. This time, after running for a while, I got a new one:

vex x86->IR: unhandled instruction bytes: 0xC5 0xF8 0x28 0x1
==92137== valgrind: Unrecognised instruction at address 0x19adec00.
==92137==    at 0x19ADEC00: ??? (in /dev/ttys005)
==92137==    by 0x1965EF41: glrCompCloseDevice (in /System/Library/Frameworks/OpenCL.framework/Versions/A/Libraries/libcldcpuengine.dylib)
==92137==    by 0x56B3450: _dispatch_apply_invoke (in /usr/lib/system/libdispatch.dylib)
==92137==    by 0x56AA17F: _dispatch_root_queue_drain (in /usr/lib/system/libdispatch.dylib)
==92137==    by 0x56B963C: _dispatch_worker_thread3 (in /usr/lib/system/libdispatch.dylib)
==92137==    by 0x598D1D9: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==92137==    by 0x598AE2D: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)

Unfortunately I'm stymied by this one; the source to Apple's OpenCL framework does not seem to be available. That opcode looks like an x86 "LDS" with two operands, "load far pointer" or some such.

I am happy to open a new bug for this new instruction, so as to not piggyback on this one and confuse the issue. Let me know how you'd like to proceed.
Comment 10 Tom Hughes 2015-08-13 14:21:26 UTC
It's not LDS; it's a VEX-encoded SUB instruction.
Comment 11 Julian Seward 2015-08-13 14:28:04 UTC
Hmm, indeed, an AVX instruction.  So you are pretty much scuppered now, at least
in terms of short-term hacks.  Alas.
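
A rough sketch of how the four bytes 0xC5 0xF8 0x28 0x1 split up, assuming the two-byte VEX prefix layout described in the Intel SDM. The struct and function names are made up for illustration and are not taken from Valgrind's VEX library. Under this reading the opcode byte 0x28 sits in the 0F opcode map, where 0F 28 is MOVAPS (0x28 is SUB only in the legacy one-byte map), so the instruction may be better described as VMOVAPS than as SUB. Either way it is an AVX instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of a two-byte (C5) VEX prefix plus what follows. */
typedef struct {
    bool    r;      /* register-extension bit, un-inverted                   */
    uint8_t vvvv;   /* extra source register, un-inverted (0 = encoded 1111) */
    bool    l;      /* vector length: false = 128-bit, true = 256-bit        */
    uint8_t pp;     /* implied legacy prefix: 0 none, 1 = 66, 2 = F3, 3 = F2 */
    uint8_t opcode; /* opcode byte, implicitly in the 0F map                 */
    uint8_t modrm;  /* ModRM byte following the opcode                       */
} Vex2;

/* Returns true and fills *out if the bytes start with a two-byte VEX prefix
   as it would be recognised in 32-bit mode. */
static bool decode_vex2(const uint8_t *bytes, size_t len, Vex2 *out)
{
    uint8_t p;

    assert(bytes != NULL && out != NULL);
    if (len < 4 || bytes[0] != 0xC5)
        return false;

    /* In 32-bit mode C5 is LDS unless the top two bits of the next byte are 11. */
    p = bytes[1];
    if ((p & 0xC0) != 0xC0)
        return false;

    out->r      = ((p >> 7) & 1) == 0;          /* stored inverted */
    out->vvvv   = (uint8_t)(~(p >> 3) & 0xF);   /* stored inverted */
    out->l      = ((p >> 2) & 1) != 0;
    out->pp     = (uint8_t)(p & 0x3);
    out->opcode = bytes[2];
    out->modrm  = bytes[3];
    return true;
}
```

For the bytes in comment 9 this gives L = 0 (128-bit), pp = 0 (no implied prefix), opcode 0x28 and ModRM 0x01 (a memory operand at (%ecx) with register xmm0 in 32-bit addressing).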
Comment 12 Rhys Kidd 2015-08-15 06:40:48 UTC
I'll add a regression test for this, based on Rich's small test case (thanks!).

I have also confirmed that uncommenting the ROUNDSD support on x86 does not cause any regressions, so I will push that too, thereby fixing the new test.
Comment 13 Rhys Kidd 2015-08-15 07:40:48 UTC
Resolved in r15547 and r3173 (VEX).
Comment 14 Rich Siegel 2015-08-15 15:16:53 UTC
Thank you very much for the fix! It is indeed unfortunate that the AVX SUB won't be supported, since that puts me pretty well back to square one. :-( But as before, I appreciate the feedback and look forward to the day when I can use valgrind for this project again. :-)