Bug 180217 - vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xBD 0xC0 0x4C
Summary: vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xBD 0xC0 0x4C
Status: RESOLVED DUPLICATE of bug 212335
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex
Version: 3.4.0
Platform: Debian testing Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
Depends on:
Reported: 2009-01-10 08:51 UTC by Peter
Modified: 2010-07-30 17:43 UTC (History)
CC List: 6 users

See Also:
Latest Commit:
Version Fixed In:

Fix this bug by adding support for LZCNT (3.52 KB, patch)
2010-01-22 23:27 UTC, Gert Wollny
Revised patch that sets the carry flag properly (8.73 KB, patch)
2010-01-23 01:47 UTC, Gert Wollny

Description Peter 2009-01-10 08:51:18 UTC
vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xBD 0xC0 0x4C

I am using version 3.4; I'm not sure why only the SVN version is offered in the version list box. The instruction executes fine when the program is not run under VEX.
Comment 1 Filipe Cabecinhas 2009-05-02 16:26:11 UTC
Wouldn't this be an illegal instruction?

The F3 prefix can only appear with a string instruction (or in a SIMD instruction, followed by 0F).

Intel's Manual (Vol. 2) says:
—F3H—REP prefix (used only with string instructions). 
—F3H—REPE/REPZ prefix (used only with string instructions). 
—F3H—Streaming SIMD Extensions prefix. 

None of the string instructions has the opcode 0x48...
Comment 2 Tom Hughes 2009-05-02 18:19:25 UTC
The 0x48 is a REX prefix in the 64 bit instruction set.

So the actual instruction is 0x0F 0xBD, which is BSR, but I don't believe that allows a string prefix either, so your argument may still stand...
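
As a rough illustration of the byte breakdown discussed here, the following C sketch splits the reported bytes into an optional F3 prefix, an optional REX byte, and the 0F BD opcode with its ModRM byte. The type `insn_summary` and the function `summarize` are hypothetical names made up for this example; they are not part of VEX, and the sketch ignores every other prefix and opcode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the leading bytes of one amd64 instruction. */
typedef struct {
    int has_f3;     /* F3 legacy prefix seen */
    int rex_w;      /* REX prefix present with W=1 (64-bit operand size) */
    int is_0f_bd;   /* two-byte opcode 0F BD found */
    uint8_t modrm;  /* ModRM byte after the opcode, valid if is_0f_bd */
} insn_summary;

/* Only understands F3, an optional REX byte (0x40..0x4F) and opcode 0F BD. */
static insn_summary summarize(const uint8_t *p, size_t n)
{
    insn_summary s = {0, 0, 0, 0};
    size_t i = 0;

    if (i < n && p[i] == 0xF3) {          /* legacy prefix */
        s.has_f3 = 1;
        i++;
    }
    if (i < n && (p[i] & 0xF0) == 0x40) { /* REX prefix, W is bit 3 */
        s.rex_w = (p[i] >> 3) & 1;
        i++;
    }
    if (i + 2 < n && p[i] == 0x0F && p[i + 1] == 0xBD) {
        s.is_0f_bd = 1;
        s.modrm = p[i + 2];
    }
    return s;
}
```

For the bytes in the summary (F3 48 0F BD C0), this reports an F3 prefix, REX.W set, opcode 0F BD, and ModRM C0, which encodes a register-to-register form.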
Comment 3 Filipe Cabecinhas 2009-05-03 11:29:05 UTC
Ah, yes, it's changing the instruction to use a 64-bit operand.

From page 2-2 of Vol. 2A of the Intel 64 and IA-32 Instruction Reference:

Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string. Use these prefixes only with string instructions (MOVS, CMPS, SCAS, LODS, STOS, INS, and OUTS). Their use, followed by 0FH, is treated as a mandatory prefix by a number of SSE/SSE2/SSE3 instructions. Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

So, I guess it's really invalid, as BSR is not a string instruction. It may work on some CPUs and not on others. I wonder which compiler was used and what source code triggered it.

Or maybe I'm missing something.
Comment 4 Peter 2009-05-03 12:19:42 UTC
This was a while ago, but I think it was generated by GCC 4.x (4.3?) on an AMD Phenom with -mtune=native and -march=native under 64-bit Debian Linux. I don't have time to go through the opcodes for various instructions, but maybe this is an AMD-specific instruction subset?
Comment 5 Filipe Cabecinhas 2009-05-03 12:29:46 UTC
Ah, nice. From the AMD docs:

LZCNT - Counts the number of leading zero bits in the 16-, 32-, or 64-bit general purpose register or memory source operand. Counting starts downward from the most significant bit and stops when the highest bit having a value of 1 is encountered or when the least significant bit is encountered. The count is written to the destination register.

If the input operand is zero, CF is set to 1 and the size (in bits) of the input operand is written to the destination register. Otherwise, CF is cleared.

If the most significant bit is a one, the ZF flag is set to 1, zero is written to the destination register. 
Otherwise, ZF is cleared.

Support for the LZCNT instruction is indicated by ECX bit 5 (LZCNT) as returned by CPUID function 8000_0001h. If the LZCNT instruction is not available, the encoding is treated as the BSR instruction.

Software MUST check the CPUID bit once per program or library initialization before using the LZCNT instruction, or inconsistent behavior may result. 

Mnemonic                  Opcode       Description
LZCNT reg16, reg/mem16    F3 0F BD /r  Count the number of leading zeros in reg/mem16.
LZCNT reg32, reg/mem32    F3 0F BD /r  Count the number of leading zeros in reg/mem32.
LZCNT reg64, reg/mem64    F3 0F BD /r  Count the number of leading zeros in reg/mem64.

I don't know what the order restrictions are for the REX prefix... In the section about them, AMD's manual says:
If a REX prefix is used, it must immediately precede the first opcode byte in the instruction 

So... If we count the F3 as a prefix, we could say it's a valid instruction... For AMD(64?) processors.
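
The AMD description quoted above can be modelled in a few lines of C. This is only a sketch of the documented semantics (result, CF and ZF) for operand widths of 16, 32 and 64 bits; the names `lzcnt_result` and `lzcnt_model` are invented for this example and do not come from the AMD manual or from valgrind.

```c
#include <stdint.h>

/* Hypothetical container for the destination value and the two flags
   that the quoted AMD text defines. */
typedef struct {
    unsigned count;  /* value written to the destination register */
    int cf;          /* 1 if the source operand was zero */
    int zf;          /* 1 if the most significant bit was set */
} lzcnt_result;

/* Model of LZCNT for an operand of the given width (16, 32 or 64).
   Bits of src above the operand width are ignored. */
static lzcnt_result lzcnt_model(uint64_t src, unsigned width)
{
    lzcnt_result r;
    unsigned n = 0;

    /* Count downward from the most significant bit of the operand
       until a 1 bit is found or the operand is exhausted. */
    while (n < width && ((src >> (width - 1 - n)) & 1u) == 0)
        n++;

    r.count = n;
    r.cf = (n == width);  /* zero input: result is the operand size */
    r.zf = (n == 0);      /* MSB set: result is zero */
    return r;
}
```

For example, `lzcnt_model(0, 64)` yields a count of 64 with CF set, and `lzcnt_model(1, 64)` yields 63 with both flags clear.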
Comment 6 yabo 2009-05-11 11:38:31 UTC

I have the same error on Gentoo amd64 with:

$ g++ --version
g++ (Gentoo 4.3.2-r3 p1.6, pie-10.1.5) 4.3.2

Using -march=amdfam10.

The error was triggered by /usr/lib64/libglib-2.0.so.0.1800.4:

$ objdump -d /usr/lib64/libglib-2.0.so.0.1800.4 | grep -i 'f3 48 0f bd c8'
   55a8a:       f3 48 0f bd c8          lzcnt  %rax,%rcx
   56600:       f3 48 0f bd c8          lzcnt  %rax,%rcx
Comment 7 yabo 2009-05-16 11:07:04 UTC
Any idea when this can be fixed?
Comment 8 Filipe Cabecinhas 2009-05-17 00:22:22 UTC
I can try to do that, but only after I get the AMD64 machine working, and when I have the time.
Comment 9 Gert Wollny 2010-01-22 23:27:07 UTC
Created attachment 40138 [details]
Fix this bug by adding support for LZCNT 

This is a patch against valgrind 3.5.0 but it should also work against SVN head. 
It fixes the crash, and seems to execute the code correctly, i.e. the tests pass (Tested on Gentoo amd64).
Comment 10 Gert Wollny 2010-01-23 00:49:13 UTC
With this patch the carry flag is not set correctly.
Comment 11 Gert Wollny 2010-01-23 01:47:05 UTC
Created attachment 40139 [details]
Revised patch that sets the carry flag properly 

- add tests to see if the flags are set properly 
- set the carry flag correctly 
- reorder the code a bit, following Filipe's hints
Comment 12 Julian Seward 2010-05-12 00:19:49 UTC
Patch looks ok, but I am concerned/unclear about distinguishing
bsr from lzcnt.  Is it guaranteed safe to merely check for the
presence of an F3 prefix?  iow, are we guaranteed that 
(1) no BSR will have an F3 prefix, and 
(2) that F3 (rex) 0F BD is not interpreted as something else by
    Intel CPUs?
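
To make concrete why the distinction matters, here is a small C sketch comparing the two interpretations of the same 0F BD encoding on a 64-bit operand. Both helper names are made up for this example. Returning -1 from `bsr64_model` for a zero source is purely a modelling convention, since the destination of BSR is undefined in that case on real hardware.

```c
#include <stdint.h>

/* BSR model: index of the highest set bit. Returns -1 for a zero
   source as a convention of this sketch only. */
static int bsr64_model(uint64_t src)
{
    int i;
    for (i = 63; i >= 0; i--) {
        if ((src >> i) & 1u)
            return i;
    }
    return -1;
}

/* LZCNT model: number of zero bits above the highest set bit,
   or 64 when the source is zero. */
static unsigned lzcnt64_model(uint64_t src)
{
    unsigned n = 0;
    while (n < 64 && ((src >> (63 - n)) & 1u) == 0)
        n++;
    return n;
}
```

For any nonzero source the two results are related by lzcnt = 63 - bsr, so a decoder that picks the wrong interpretation produces a different value for almost every input.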
Comment 13 Gert Wollny 2010-05-13 12:41:43 UTC
After reading comment #5 again, I figured the code should also check for the CPUID that is emulated. However, I don't know the internals of valgrind well enough to add that check.
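
The check suggested here could look roughly like the following C sketch, which follows the AMD text in comment 5: the encoding is treated as LZCNT only when an F3 prefix is present and ECX bit 5 of CPUID function 8000_0001h is set, and as BSR otherwise. The enum `decode_0fbd` and both functions are hypothetical and are not taken from valgrind or from the attached patches; how the emulated CPUID value is actually obtained inside VEX is left open.

```c
#include <stdint.h>

/* LZCNT feature flag: ECX bit 5 of CPUID function 8000_0001h,
   as stated in the AMD text quoted in comment 5. */
#define CPUID_8000_0001_ECX_LZCNT (1u << 5)

/* Hypothetical decoder outcome; not a VEX type. */
typedef enum { DECODE_BSR, DECODE_LZCNT } decode_0fbd;

/* Returns nonzero if the given ECX value advertises LZCNT. */
static int ecx_has_lzcnt(uint32_t ecx)
{
    return (ecx & CPUID_8000_0001_ECX_LZCNT) != 0;
}

/* Decide how to interpret opcode 0F BD, given whether an F3 prefix
   was seen and the ECX value the guest would see from CPUID. */
static decode_0fbd classify_0f_bd(int has_f3_prefix, uint32_t guest_ecx)
{
    if (has_f3_prefix && ecx_has_lzcnt(guest_ecx))
        return DECODE_LZCNT;
    return DECODE_BSR;
}
```

On an x86 host built with GCC or Clang, the host's own ECX value can be read with `__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`, but for an emulator the relevant value is the one presented to the guest, which may differ.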
Comment 14 Julian Seward 2010-07-30 17:43:52 UTC

*** This bug has been marked as a duplicate of bug 212335 ***