I tried to profile an app of mine, but stumbled over an unhandled instruction:

    0xF3 0xF 0xBC 0xC0

It's the TZCNT instruction (from the BMI1 extension):

    $ objdump -d someobj.o
    ....
    1f9: f3 0f bc c0    tzcnt %eax,%eax
    1fd: 48 01 d0       add %rdx,%rax
    ....

Like the LZCNT instruction, it falls back to the old bit scan instructions (LZCNT -> BSR, TZCNT -> BSF) when the CPU does not support it.

I tried to look at the VEX code, and even tried to modify it, but I wasn't able to add the instruction successfully. I also could not find the fallback handling for LZCNT: if the hardware supports LZCNT, LZCNT is translated to IR, but since there is no fallback to BSR, it should fail on older hardware the same way TZCNT does now.

Since both instructions (LZCNT and TZCNT) have a compatible fallback, tools will start to emit them more often and unconditionally, especially since new CPUs will only implement the new instructions in hardware and the old instructions in microcode. It would be a good start if at least the fallback to BSR/BSF worked.

I know that between 3.7 and SVN there was a big rewrite in VEX, so maybe it is handled now, but I could not find any bug report hinting that the instruction was added. Since the new code started from the original VEX, and some instructions were even disabled in the move, I think this bug is still valid.
http://article.gmane.org/gmane.comp.gcc.patches/261926

As I said, tools may start to emit TZCNT unconditionally.
Same issue here, but for a slightly different instruction:

    vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xBC 0xD3 0x48 0x63 0xD2
    vex amd64->IR: REX=1 REX.W=1 REX.R=0 REX.X=0 REX.B=0
    vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
    vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=1

That's:

    6667d6: f3 48 0f bc d3    tzcnt %rbx,%rdx
See also bug #301011 for this instruction in 32 bit mode.
This is pretty similar to LZCNT, which valgrind has somehow implemented already, but incorrectly. The Intel docs, at least, are very clear that on CPUs where LZCNT or TZCNT aren't supported, those opcodes execute the same as BSR and BSF respectively. So, IMNSHO, for the time being we want something like (untested):

--- valgrind/VEX/priv/guest_amd64_toIR.c.jj 2012-08-16 17:30:55.000000000 +0200
+++ valgrind/VEX/priv/guest_amd64_toIR.c 2012-08-16 17:51:52.234324781 +0200
@@ -20061,13 +20061,16 @@ Long dis_ESC_0F (
       return delta;
 
    case 0xBC: /* BSF Gv,Ev */
-      if (haveF2orF3(pfx)) goto decode_failure;
+      if (!haveF2(pfx)) goto decode_failure;
       delta = dis_bs_E_G ( vbi, pfx, sz, delta, True );
       return delta;
 
    case 0xBD: /* BSR Gv,Ev */
-      if (!haveF2orF3(pfx)) {
-         /* no-F2 no-F3 0F BD = BSR */
+      if (!haveF2orF3(pfx)
+          || (haveF3noF2(pfx)
+              && 0 == (archinfo->hwcaps & VEX_HWCAPS_AMD64_LZCNT))) {
+         /* no-F2 no-F3 0F BD = BSR
+            or F3 0F BD = REP; BSR on older CPUs. */
          delta = dis_bs_E_G ( vbi, pfx, sz, delta, False );
          return delta;
       }

so that if VEX_HWCAPS_AMD64_LZCNT isn't set, REP; BSR acts like BSR, and if it is set, LZCNT acts the new way. Similarly for REP; BSF and TZCNT, except that this patchlet doesn't add BMI1 support yet, so it always handles TZCNT == REP; BSF as BSF.
And yes, GCC 4.8 now emits TZCNT (== REP; BSF) unconditionally, because it expects that on older CPUs it will behave like BSF and on newer CPUs as TZCNT.
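For reference, a minimal C sketch of why emitting TZCNT unconditionally is safe as far as the result value goes. It is only a model of the documented instruction semantics, not Valgrind or VEX code: the names scan_result, model_bsf32, model_tzcnt32 and check_models are hypothetical, only the 32-bit operand size is modelled, and flags are ignored (they do differ: BSF reports a zero source in ZF, TZCNT reports it in CF).

```c
#include <assert.h>
#include <stdint.h>

/* Result of a modelled bit scan. 'defined' is 0 when the real
   instruction leaves the destination register undefined. */
typedef struct {
    uint32_t value;
    int defined;
} scan_result;

/* Model of 32-bit BSF: index of the lowest set bit.
   For a zero source the destination is undefined. */
scan_result model_bsf32(uint32_t src) {
    scan_result r = {0, 0};
    if (src == 0)
        return r;
    while ((src & 1u) == 0) {
        src >>= 1;
        r.value++;
    }
    r.defined = 1;
    return r;
}

/* Model of 32-bit TZCNT: number of trailing zero bits.
   For a zero source the result is the operand size, 32. */
scan_result model_tzcnt32(uint32_t src) {
    scan_result r = {32, 1};
    if (src == 0)
        return r;
    r.value = 0;
    while ((src & 1u) == 0) {
        src >>= 1;
        r.value++;
    }
    return r;
}

/* For every nonzero source tried here, both models agree, which is
   what lets a compiler emit REP; BSF and not care which one runs. */
void check_models(void) {
    for (int bit = 0; bit < 32; bit++) {
        uint32_t src = ((uint32_t)1 << bit) | 0x80000000u;
        scan_result bsf = model_bsf32(src);
        scan_result tzcnt = model_tzcnt32(src);
        assert(bsf.defined && tzcnt.defined);
        assert(bsf.value == tzcnt.value);
        assert(bsf.value == (uint32_t)bit);
    }
}
```

The two models only diverge for a zero source, so code that relies on the unconditional encoding has to either rule out zero or not depend on the result in that case.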
Thanks Jakub. I haven't tested your patchlet, but as I said, if at least the fallback (for TZCNT and LZCNT) worked, that would be fine.
      if (!haveF2(pfx)) goto decode_failure;

should have been

      if (haveF2(pfx)) goto decode_failure;

obviously.
Created attachment 73257
valgrind-lzcnt-tzcnt.patch

Updated patch, this time actually tested, which should fix this bug report as well as the 32-bit one.
(In reply to comment #8)
> Created attachment 73257
> valgrind-lzcnt-tzcnt.patch

Committed, r2478. Thanks for the patch.
*** Bug 318773 has been marked as a duplicate of this bug. ***
Is it known when to expect the next release with this and other patches? This particular PR holds adoption of gcc-4.8.X since gcc-4.8.X now generates offending instructions. Last valgrind release 3.8.1 was in Sep 2012.
This fix was included in 3.8.1. So this bug can be closed.