vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xBD 0xC0 0x4C
I'm using version 3.4; I'm not sure why only the SVN version appears in the list box. This instruction executes fine without VEX.
Wouldn't this be an illegal instruction?
The F3 prefix can only appear with a string instruction (or in a SIMD instruction, followed by 0F).
Intel's Manual (Vol. 2) says:
—F3H—REP prefix (used only with string instructions).
—F3H—REPE/REPZ prefix (used only with string instructions).
—F3H—Streaming SIMD Extensions prefix.
None of the string instructions has the opcode 0x48...
The 0x48 is a REX prefix in the 64-bit instruction set.
So the actual instruction is 0x0F 0xBD, which is BSR, but I don't believe that allows a string prefix either, so your argument may still stand...
Ah, yes, it's changing the instruction to use a 64-bit operand.
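To make the byte-by-byte reading above concrete, here is a minimal sketch in C that splits the reported bytes into an optional F3 prefix, an optional REX byte, and the two-byte 0F xx opcode plus ModRM. The names `Decoded` and `split_insn` are hypothetical and exist only for this illustration; this is not Valgrind's decoder and it handles only this one instruction shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration type; not taken from Valgrind. */
typedef struct {
    int     has_f3;  /* 1 if an F3 legacy prefix was seen         */
    uint8_t rex;     /* the REX byte (0x40..0x4F), or 0 if absent */
    int     rex_w;   /* REX.W bit: 1 selects a 64-bit operand     */
    uint8_t opcode;  /* second byte of a two-byte 0F xx opcode    */
    uint8_t modrm;   /* ModRM byte following the opcode           */
    int     ok;      /* 1 if the bytes matched this simple shape  */
} Decoded;

/* Split [F3] [REX] 0F xx ModRM. Anything else leaves ok == 0. */
static Decoded split_insn(const uint8_t *b, size_t n) {
    Decoded d = {0, 0, 0, 0, 0, 0};
    size_t i = 0;
    assert(b != NULL);
    if (i < n && b[i] == 0xF3) {            /* legacy prefix */
        d.has_f3 = 1;
        i++;
    }
    if (i < n && (b[i] & 0xF0) == 0x40) {   /* REX prefix, 64-bit mode */
        d.rex = b[i];
        d.rex_w = (b[i] >> 3) & 1;
        i++;
    }
    if (i + 2 < n && b[i] == 0x0F) {        /* two-byte opcode escape */
        d.opcode = b[i + 1];
        d.modrm = b[i + 2];
        d.ok = 1;
    }
    return d;
}
```

Fed the bytes from the error message (F3 48 0F BD C0 4C), this yields has_f3 = 1, rex = 0x48 with rex_w = 1, opcode = 0xBD and modrm = 0xC0. A ModRM of 0xC0 is register-direct with no SIB or displacement, so the trailing 0x4C belongs to the next instruction.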
From page 2-2 of Vol. 2A of the Intel 64 and IA-32 Instruction Reference:
Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a
string. Use these prefixes only with string instructions (MOVS, CMPS, SCAS, LODS,
STOS, INS, and OUTS). Their use, followed by 0FH, is treated as a mandatory prefix
by a number of SSE/SSE2/SSE3 instructions. Use of repeat prefixes and/or undefined
opcodes with other Intel 64 or IA-32 instructions is reserved; such use may
cause unpredictable behavior.
So, I guess it's really invalid, as BSR is not a string instruction. It can work on some CPUs and not on others. I wonder which compiler was used and what source code triggered it.
Or maybe I'm missing something.
This was a while ago, but I think it was generated by GCC 4.x (4.3?) on an AMD Phenom with -mtune=native and -march=native under 64-bit Debian Linux. I don't have time to go through the opcodes for various instructions, but maybe this is an AMD-specific instruction subset?
Ah, nice. From the AMD docs:
LZCNT - Counts the number of leading zero bits in the 16-, 32-, or 64-bit general purpose register or memory source operand. Counting starts downward from the most significant bit and stops when the highest bit having a value of 1 is encountered or when the least significant bit is encountered. The count is written to the destination register.
If the input operand is zero, CF is set to 1 and the size (in bits) of the input operand is written to the
destination register. Otherwise, CF is cleared.
If the most significant bit is a one, the ZF flag is set to 1, zero is written to the destination register.
Otherwise, ZF is cleared.
Support for the LZCNT instruction is indicated by ECX bit 5 (LZCNT) as returned by CPUID function 8000_0001h. If the LZCNT instruction is not available, the encoding is treated as the BSR instruction.
Software MUST check the CPUID bit once per program or library initialization before using the LZCNT instruction, or inconsistent behavior may result.
Mnemonic                  Opcode        Description
LZCNT reg16, reg/mem16    F3 0F BD /r   Count the number of leading zeros in reg/mem16.
LZCNT reg32, reg/mem32    F3 0F BD /r   Count the number of leading zeros in reg/mem32.
LZCNT reg64, reg/mem64    F3 0F BD /r   Count the number of leading zeros in reg/mem64.
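As a rough illustration of why it matters whether these bytes are decoded as LZCNT or as BSR, here is a small software model in C of the 64-bit forms. The LZCNT flag behaviour follows the AMD text quoted above; the BSR model uses the documented rule that ZF is set and the destination is undefined when the source is zero. The names `LzcntResult`, `model_lzcnt64` and `model_bsr64` are hypothetical, and this is a sketch for reasoning about results, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record for the LZCNT model. */
typedef struct {
    uint64_t dest;  /* value written to the destination register */
    int      cf;    /* carry flag after the instruction          */
    int      zf;    /* zero flag after the instruction           */
} LzcntResult;

/* 64-bit LZCNT: count zero bits from bit 63 downward. */
static LzcntResult model_lzcnt64(uint64_t src) {
    LzcntResult r = {0, 0, 0};
    uint64_t count = 0;
    uint64_t mask;
    for (mask = 1ULL << 63; mask != 0 && (src & mask) == 0; mask >>= 1)
        count++;
    r.dest = count;        /* 64 when src == 0                   */
    r.cf = (src == 0);     /* CF = 1 only for a zero input       */
    r.zf = (count == 0);   /* ZF = 1 when the top bit is set     */
    return r;
}

/* 64-bit BSR: index of the highest set bit.
   Returns the resulting ZF; *dest is left untouched when src == 0,
   standing in for the "undefined" destination. */
static int model_bsr64(uint64_t src, uint64_t *dest) {
    unsigned idx = 63;
    assert(dest != NULL);
    if (src == 0)
        return 1;
    while (((src >> idx) & 1) == 0)
        idx--;
    *dest = idx;
    return 0;
}
```

For any nonzero input the two results are related by lzcnt = 63 - bsr, so they agree only when the highest set bit is at a position where both expressions coincide, which never happens for 64-bit operands (63 - i = i has no integer solution). A CPU or emulator that silently treats the encoding as BSR therefore returns a different value for every nonzero input.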
I don't know what the order restrictions are for the REX prefix... In the section about them, AMD's manual says:
If a REX prefix is used, it must immediately precede the first opcode byte in the instruction
So... If we count the F3 as a prefix, we could say it's a valid instruction... For AMD(64?) processors.
I have the same error on Gentoo amd64 with:
$ g++ --version
g++ (Gentoo 4.3.2-r3 p1.6, pie-10.1.5) 4.3.2
The error was triggered by /usr/lib64/libglib-2.0.so.0.1800.4:
$ objdump -d /usr/lib64/libglib-2.0.so.0.1800.4 | grep -i 'f3 48 0f bd c8'
55a8a: f3 48 0f bd c8 lzcnt %rax,%rcx
56600: f3 48 0f bd c8 lzcnt %rax,%rcx
Any idea when this can be fixed?
I can try to do that but only after putting the AMD64 machine to work... When I get the time
Created attachment 40138
Fix this bug by adding support for LZCNT
This is a patch against valgrind 3.5.0 but it should also work against SVN head.
It fixes the crash, and seems to execute the code correctly, i.e. the tests pass (Tested on Gentoo amd64).
With this patch the carry flag is not set correctly.
Created attachment 40139
Revised patch that sets the carry flag properly
- add tests to see if the flags are set properly
- set the carry flag correctly
- reorders the code a bit, following Filipe's hints
Patch looks ok, but I am concerned/unclear about distinguishing
bsr from lzcnt. Is it guaranteed safe to merely check for the
presence of a F3 prefix? iow, are we guaranteed that
(1) no BSR will have a F3 prefix, and
(2) that F3 (rex) 0F BD is not interpreted as something else by
After reading comment #5 again, I figured the code should also check for the CPUID that is emulated. However, I don't know the internals of valgrind well enough to add that check.
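For reference, the CPUID bit from the AMD text (function 8000_0001h, ECX bit 5) can be queried on the host roughly as below. This is a sketch that assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`; the function names `host_has_lzcnt` and `lzcnt_cached` are hypothetical. It checks the real host CPU only and says nothing about how Valgrind's emulated CPUID would be consulted, which would have to happen inside Valgrind itself.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Returns 1 if the host reports LZCNT, 0 if it does not,
   and -1 if this cannot be determined (non-x86 build or
   extended leaf 0x80000001 not supported). */
static int host_has_lzcnt(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)((ecx >> 5) & 1u);  /* ECX bit 5 = LZCNT */
#else
    return -1;
#endif
}

/* The AMD text asks for one check per program or library
   initialization, so cache the answer after the first call. */
static int lzcnt_cached(void) {
    static int cached = -2;         /* -2 means "not queried yet" */
    if (cached == -2)
        cached = host_has_lzcnt();
    assert(cached >= -1 && cached <= 1);
    return cached;
}
```

A caller would branch on `lzcnt_cached() == 1` before taking an LZCNT code path and fall back to a BSR-based or portable count otherwise.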
*** This bug has been marked as a duplicate of bug 212335 ***