Version: 3.7 SVN
OS: MS Windows

32 bit app compiled with MinGW. Running into unimplemented x86 instruction LSL. I'm using current SVN trunk. Any chance you can implement this instruction?

thanks, Chris

Reproducible: Always

Steps to Reproduce:
(instruction in question)
77872680 0f03c1          lsl     eax,ecx

(instruction information)
http://siyobik.info/main/reference/instruction/LSL

Actual Results:
vex x86->IR: unhandled instruction bytes: 0xF 0x3 0xC1 0x66
--3744-- SCHED[1]: TRC: NODECODE
==3744== valgrind: Unrecognised instruction at address 0x77872680.

Expected Results:
no assertion

(assembly context)
ntdll_77840000!ExInterlockedPopEntrySList:
77872658 53              push    ebx
77872659 55              push    ebp
7787265a 8be9            mov     ebp,ecx
7787265c 83ec08          sub     esp,8
7787265f 33c0            xor     eax,eax
77872661 890424          mov     dword ptr [esp],eax
77872664 64a130000000    mov     eax,dword ptr fs:[00000030h]
7787266a f6804c02000004  test    byte ptr [eax+24Ch],4
77872671 7408            je      ntdll_77840000!ExpInterlockedPopEntrySListResume (7787267b)
77872673 55              push    ebp
77872674 e84bfafeff      call    ntdll_77840000!ZwWow64InterlockedPopEntrySList (778620c4)
77872679 eb4e            jmp     ntdll_77840000!ExpInterlockedPopEntrySListEnd+0x16 (778726c9)
ntdll_77840000!ExpInterlockedPopEntrySListResume:
7787267b b953000000      mov     ecx,53h
77872680 0f03c1          lsl     eax,ecx   <------- F A I L -------
77872683 668bd0          mov     dx,ax
77872686 6681e2ff03      and     dx,3FFh
7787268b c1e216          shl     edx,16h
7787268e c1e002          shl     eax,2
77872691 0bc2            or      eax,edx
77872693 8b5504          mov     edx,dword ptr [ebp+4]
77872696 8bc8            mov     ecx,eax
77872698 c1e810          shr     eax,10h
7787269b 668bca          mov     cx,dx
7787269e 3bca            cmp     ecx,edx
...
This must have been made by some mutant assembler. There's surely a shorter encoding of "shl %cl,%eax" ?
You misparsed lsl I'm afraid - it's Load Segment Limit not Logical Shift Left ;-)
Oh, that's a good one :-) /me is amused. Ok, well, load segment limit .. off the top of my head I have no idea whether we can support that.
hmmm... what about a dirty helper like x86g_dirtyhelper_*, amd64g_dirtyhelper_ and the like... in/out guest state and 2 args denoting register that holds the segment descriptor and target register. Still getting familiar with VEX, I might try myself. However, it is a low priority item.
tl;dr: I implemented the x86 LSL instruction according to the reference, but there is only one reported usage, and even there the instruction is not used in the way that would seem appropriate (to me, anyway). Sane testing is hard, and providing support beyond "doesn't crash" is questionable.

---

The attached patch is my take on implementing the LSL instruction; a second attachment contains a possible sample program to test that it works.

Adding support for the instruction was indeed quite simple: on the decoder/IR generation side I followed the usual way of doing things as seen for other opcodes, and on the helper side I reused the code from x86g_use_seg_selector to deal with the LDT/GDT. But this is my first IR transformation, so feel free to correct/optimize the way I implemented it. If it wasn't such a weird, uncommon instruction, I'd be a bit worried about the number of IRops it needs. Anyway, so far so good.

But... the usefulness of the unit test lies mostly in emitting the LSL instruction (both of its variants, to be exact) and verifying that it is correctly decoded and executed. Testing whether it behaves well is a hard task, because Linux simply sets all segment registers to a default linear 0:ffffffff selector. Or, on some distributions, all registers but CS, which is limited for security purposes, so it's out of the way.

Now, on Windows (which is relevant here, since it is the only platform where this opcode has been seen and makes some sense), the FS segment has an interesting limit, 0xFFF, which could be tested nicely. And it is indeed returned as 0xFFF on a WinXP 32-bit machine. But on my Win7/x64 rig, the values returned don't make any sense to me, like 0xBC00. Of course Valgrind4win fills the GDT with correct values using the Win32 API and returns 0xFFF as usual. I don't know if this is just some consequence of mixing x86/x64 modes or whatever, but it sure is weird and further reduces the chances of a good unit test.

Finally, the disassembly of the relevant user-land-side kernel function where the LSL instruction is used shows it is meant to act as some kind of a semaphore in lock-free list implementation or something, which questions the relevance of the instruction even more, at least in its implementation according to the specs...
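For readers unfamiliar with what the helper has to compute, here is a minimal sketch of how a limit is extracted from a raw 8-byte x86 segment descriptor. It is not taken from the attached patch; the function name descriptor_limit is hypothetical, and the sketch deliberately ignores the descriptor type and privilege checks (and the ZF result) that a real LSL also performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the code from the attached patch.
   Computes the limit LSL would report for a raw 8-byte descriptor,
   assuming the selector has already passed all validity checks. */
static uint32_t descriptor_limit(const uint8_t desc[8])
{
    assert(desc != NULL);

    /* Limit bits 0..15 live in bytes 0-1, bits 16..19 in the low
       nibble of byte 6. */
    uint32_t limit = (uint32_t)desc[0]
                   | ((uint32_t)desc[1] << 8)
                   | ((uint32_t)(desc[6] & 0x0F) << 16);

    /* G bit (bit 7 of byte 6): limit is counted in 4 KB pages, so
       scale it and fill the low 12 bits with ones. */
    if (desc[6] & 0x80)
        limit = (limit << 12) | 0xFFF;

    return limit;
}
```

With a byte-granular descriptor whose raw limit is 0xFFF this returns 0xFFF, and with a flat page-granular descriptor (raw limit 0xFFFFF, G set) it returns 0xFFFFFFFF, which matches the two cases discussed above.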
Created attachment 71924 [details] Proposed patch (LSL support in x86 VEX)
Created attachment 71925 [details] Simple test case
I know this thread is very old, but the following bit of information might be useful for both Valgrind4win and whoever lands on this page, as information about lsl and segment descriptor 53h is pretty scarce.

In short: the limit of segment 0x53 actually contains the "current processor number" (the ID of the logical processor executing that instruction).

In detail...

Afaik, on Windows 7 64-bit, segment 0x53 is a 32-bit r/w data segment with 1-byte granularity (as opposed to 4 KB, i.e. lsl won't shift the limit 12 bits left and fill with 1s before returning it). Windows 7 64-bit (maybe also 2008 Server and Win8, I don't know) does not load that descriptor into any of the segment registers (even fs and gs are equal to ds, although they can have a non-zero base address, but it comes from an MSR instead of the segment descriptor). Also, the segment limit is not used by the CPU in 64-bit long mode, so basically this gives 20 free bits.

Windows uses them to store the "current processor number" (logical ID, accounting for cores and hyperthreading), as returned by the GetCurrentProcessorNumberEx Win32 API, albeit with a slight transformation:

bits 0-9: Processor group (always 0 on my machine)
bits 10-13: ???? Always all 1s on my system. My guess is that this guarantees a segment limit of at least 15360 bytes (should the two other fields be 0) when this segment is used by fs while running 32-bit apps under wow64.
bits 14-19: Processor number within group (for example, 0 to 7 on a desktop quad-core + hyper-threading CPU)

The same information can be obtained through the cpuid instruction, with EAX=1 (result in EBX bits 24-31) and probably EAX=0xb (result in EDX) too (I never tested the latter).

I can't tell for sure how this ends up in a segment limit. It is probably set early in the OS loader, when each processor/core is activated and initializes its own copy of the GDT.

And why does Windows do it this way? One might guess the GDT is one of the very few "processor-local" memory areas that can be read (at least partly) from user-land, on top of lsl being atomic (a single lsl instead of two cpuid) and a pretty quick operation.

Given the code from Chris above, it seems that some interlocked operation depends on the processor it is currently executing on. Returning a "real limit" such as 0xfff (namely, always returning the same value) might break the locking logic, unless of course Valgrind4win assumes a single-processor environment.
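To make the bit layout concrete, here is a small sketch that splits a segment 0x53 limit into the three fields described in this comment, plus a C rendering of the arithmetic that follows lsl in the disassembly from the original report. The names cpu_id, decode_limit_53 and ntdll_style_index are made up for illustration, and the layout is only what was observed on one Windows 7 64-bit machine, not documented behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; field layout as observed in the comment above. */
struct cpu_id {
    unsigned group;   /* bits 0-9:   processor group              */
    unsigned filler;  /* bits 10-13: unknown, observed as all 1s  */
    unsigned number;  /* bits 14-19: processor number in group    */
};

static struct cpu_id decode_limit_53(uint32_t limit)
{
    assert(limit <= 0xFFFFFu);  /* a byte-granular limit has 20 bits */
    struct cpu_id id;
    id.group  = limit & 0x3FF;
    id.filler = (limit >> 10) & 0xF;
    id.number = (limit >> 14) & 0x3F;
    return id;
}

/* Mirrors the instructions after "lsl eax,ecx" in the report:
   mov dx,ax / and dx,3FFh / shl edx,16h / shl eax,2 / or eax,edx /
   shr eax,10h. Only the low 10 bits of edx survive the shift by 22. */
static uint32_t ntdll_style_index(uint32_t limit)
{
    uint32_t edx = (limit & 0x3FF) << 22;
    uint32_t eax = (limit << 2) | edx;
    return eax >> 16;
}
```

For the limit 0xBC00 mentioned earlier in the thread, decode_limit_53 gives group 0, filler 0xF and processor number 2, and ntdll_style_index gives 2. For a constant limit of 0xFFF the second function gives 0xFFC0, so under this reading the ntdll code would see a fixed, rather large value on every processor.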