Bug 295808 - vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0xBC 0xC0 0x48 0x1 0xD0 0x48
Summary: vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0xBC 0xC0 0x48 0x1 0xD0 ...
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex
Version: 3.7.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Duplicates: 318773
Depends on:
Blocks:
 
Reported: 2012-03-11 23:18 UTC by Jan Seiffert
Modified: 2013-09-19 18:39 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
valgrind-lzcnt-tzcnt.patch (1.99 KB, patch)
2012-08-17 15:17 UTC, Jakub Jelinek

Description Jan Seiffert 2012-03-11 23:18:30 UTC
I tried to profile an app of mine, but stumbled over the unhandled instruction: 0xF3 0xF 0xBC 0xC0
It's the TZCNT instruction (from the BMI1 extension):
$ objdump -d someobj.o
....
1f9:   f3 0f bc c0             tzcnt  %eax,%eax
1fd:   48 01 d0                add    %rdx,%rax
....

Like the LZCNT instruction, it falls back to the old bit scan instructions (LZCNT -> BSR, TZCNT -> BSF) when the CPU does not support it.
I tried to look at the VEX code, and even tried to modify it, but I wasn't able to add the instruction successfully. I also could not find the fallback handling for LZCNT (if the HW supports LZCNT, LZCNT gets translated to IR, but since there is no fallback to BSR, it should otherwise crash the same way TZCNT does now).
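
To make the fallback relationship concrete, here is a minimal sketch in portable C that models the four instructions for 32-bit operands. The function names are hypothetical and are not taken from VEX. It shows that TZCNT and BSF agree for every nonzero source, while LZCNT and BSR return different values for the same source (leading-zero count versus index of the highest set bit), so a silent LZCNT -> BSR fallback changes the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of BSF: index of the lowest set bit.
   Real BSF leaves the destination undefined for src == 0,
   so this model refuses a zero source. */
static unsigned model_bsf32(uint32_t src) {
    assert(src != 0);
    unsigned i = 0;
    while (((src >> i) & 1u) == 0) i++;
    return i;
}

/* Hypothetical model of TZCNT: trailing zero count, 32 for src == 0. */
static unsigned model_tzcnt32(uint32_t src) {
    return src == 0 ? 32u : model_bsf32(src);
}

/* Hypothetical model of BSR: index of the highest set bit.
   As with BSF, a zero source is not modelled. */
static unsigned model_bsr32(uint32_t src) {
    assert(src != 0);
    unsigned i = 31;
    while (((src >> i) & 1u) == 0) i--;
    return i;
}

/* Hypothetical model of LZCNT: leading zero count, 32 for src == 0. */
static unsigned model_lzcnt32(uint32_t src) {
    return src == 0 ? 32u : 31u - model_bsr32(src);
}
```

Calling `model_tzcnt32(x)` and `model_bsf32(x)` with the same nonzero `x` yields the same number, which is why emitting `REP; BSF` is safe for code that never passes zero; `model_lzcnt32(x)` and `model_bsr32(x)` only coincide after the `31 - index` conversion.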

Since both instructions (LZCNT and TZCNT) have a compatible fallback, tools will start to emit them more often and unconditionally, especially since new CPUs will implement only the new instructions in hardware and the old instructions in microcode.

It would be a good start if at least the fallback to BSR/BSF would work.

I know that between 3.7 and SVN there was a big rewrite in VEX, so maybe it is handled now, but I could not find any bug report hinting that the instruction was added. Since the new code started from the original VEX and some instructions were even disabled in the move, I think this bug is still valid.
Comment 1 Jan Seiffert 2012-04-28 01:20:57 UTC
http://article.gmane.org/gmane.comp.gcc.patches/261926

As I said, tools may start to emit TZCNT unconditionally.
Comment 2 Tobias Burnus 2012-07-23 10:35:49 UTC
Same issue here, but for a slightly different instruction:

vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xBC 0xD3 0x48 0x63 0xD2
vex amd64->IR:   REX=1 REX.W=1 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=1

That's:
  6667d6:       f3 48 0f bc d3          tzcnt  %rbx,%rdx
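
The VEX diagnostic lines above can be reproduced by hand from the instruction bytes. The following is a rough sketch, assuming a simplified encoding (only the 0x66/0xF2/0xF3 prefixes, an optional REX byte, and the 0x0F escape); the `InsnSummary` type and `summarize` function are hypothetical and are not part of VEX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields VEX prints for a failed decode. */
typedef struct {
    int pfx_66, pfx_f2, pfx_f3;           /* legacy prefixes seen */
    int rex, rex_w, rex_r, rex_x, rex_b;  /* REX byte and its bits */
    int esc_0f;                           /* 0x0F escape byte seen */
    uint8_t opcode, modrm;                /* valid only when ok != 0 */
    int ok;
} InsnSummary;

static InsnSummary summarize(const uint8_t *b, size_t n) {
    InsnSummary s = {0};
    size_t i = 0;

    /* Legacy prefixes relevant here: 0x66, 0xF2 (REPNE), 0xF3 (REP). */
    while (i < n && (b[i] == 0x66 || b[i] == 0xF2 || b[i] == 0xF3)) {
        if (b[i] == 0x66) s.pfx_66 = 1;
        if (b[i] == 0xF2) s.pfx_f2 = 1;
        if (b[i] == 0xF3) s.pfx_f3 = 1;
        i++;
    }

    /* REX prefix: 0x40..0x4F, low four bits are W, R, X, B. */
    if (i < n && (b[i] & 0xF0) == 0x40) {
        s.rex   = 1;
        s.rex_w = (b[i] >> 3) & 1;
        s.rex_r = (b[i] >> 2) & 1;
        s.rex_x = (b[i] >> 1) & 1;
        s.rex_b = b[i] & 1;
        i++;
    }

    /* Two-byte opcode escape. */
    if (i < n && b[i] == 0x0F) { s.esc_0f = 1; i++; }

    /* Opcode byte followed by the ModRM byte. */
    if (i + 1 < n) {
        s.opcode = b[i];
        s.modrm  = b[i + 1];
        s.ok     = 1;
    }
    return s;
}
```

For `f3 48 0f bc d3` this gives PFX.F3=1, REX.W=1, opcode 0xBC and ModRM 0xD3, whose reg field `(modrm >> 3) & 7` is 2 (rdx, the destination) and whose rm field `modrm & 7` is 3 (rbx, the source), matching the `tzcnt %rbx,%rdx` disassembly.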
Comment 3 Tom Hughes 2012-07-23 11:13:31 UTC
See also bug #301011 for this instruction in 32 bit mode.
Comment 4 Jakub Jelinek 2012-08-16 15:59:46 UTC
This is pretty similar to LZCNT, which valgrind has somehow implemented already, but incorrectly.  At least the Intel docs are very clear that on CPUs where LZCNT or TZCNT aren't supported, those opcodes execute the same as BSR and BSF, respectively.
So, IMNSHO, for the time being we want something like (untested):
--- valgrind/VEX/priv/guest_amd64_toIR.c.jj	2012-08-16 17:30:55.000000000 +0200
+++ valgrind/VEX/priv/guest_amd64_toIR.c	2012-08-16 17:51:52.234324781 +0200
@@ -20061,13 +20061,16 @@ Long dis_ESC_0F (
       return delta;
 
    case 0xBC: /* BSF Gv,Ev */
-      if (haveF2orF3(pfx)) goto decode_failure;
+      if (!haveF2(pfx)) goto decode_failure;
       delta = dis_bs_E_G ( vbi, pfx, sz, delta, True );
       return delta;
 
    case 0xBD: /* BSR Gv,Ev */
-      if (!haveF2orF3(pfx)) {
-         /* no-F2 no-F3 0F BD = BSR */
+      if (!haveF2orF3(pfx)
+	  || (haveF3noF2(pfx)
+	      && 0 == (archinfo->hwcaps & VEX_HWCAPS_AMD64_LZCNT))) {
+         /* no-F2 no-F3 0F BD = BSR
+	   or F3 0F BD = REP; BSR on older CPUs.  */
          delta = dis_bs_E_G ( vbi, pfx, sz, delta, False );
          return delta;
       }

so that if VEX_HWCAPS_AMD64_LZCNT isn't set, REP; BSR acts like BSR, and if it is defined, LZCNT acts the new way.  Similarly for REP; BSF and TZCNT, except that this patchlet doesn't add BMI1 support yet, therefore it always handles TZCNT == REP; BSF as BSF.
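
The decision the patchlet is aiming for can be written out as a small standalone sketch. It is not the VEX code: the enum and function names are hypothetical, the BSF case uses the corrected condition from the follow-up comment (an F2 prefix is rejected, not required), and treating an F2 prefix on 0F BD as a decode failure is an assumption, since the patch excerpt does not show what happens after the `if`.

```c
#include <stdbool.h>

/* Hypothetical result of classifying a 0F BC / 0F BD instruction. */
typedef enum {
    DECODE_FAILURE,
    DECODE_BSF,
    DECODE_BSR,
    DECODE_LZCNT
} Decoded;

/* 0F BC: without BMI1 support, F3 0F BC (TZCNT) is treated as plain BSF.
   An F2 prefix is rejected. */
static Decoded classify_0f_bc(bool have_f2, bool have_f3) {
    (void)have_f3;  /* F3 is accepted and ignored in this sketch */
    return have_f2 ? DECODE_FAILURE : DECODE_BSF;
}

/* 0F BD: plain BSR without F2/F3; F3 0F BD is LZCNT only when the host
   advertises LZCNT, otherwise it behaves as REP; BSR, i.e. BSR.
   Assumption: any F2 prefix is treated as a decode failure. */
static Decoded classify_0f_bd(bool have_f2, bool have_f3, bool host_has_lzcnt) {
    if (have_f2)  return DECODE_FAILURE;
    if (!have_f3) return DECODE_BSR;
    return host_has_lzcnt ? DECODE_LZCNT : DECODE_BSR;
}
```

In this sketch `host_has_lzcnt` stands in for the `archinfo->hwcaps & VEX_HWCAPS_AMD64_LZCNT` test in the patch, so the guest sees the same behaviour the host CPU would give for the same bytes.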
Comment 5 Jakub Jelinek 2012-08-16 16:01:28 UTC
And yes, GCC 4.8 now emits TZCNT (== REP; BSF) unconditionally, because it expects that on older CPUs it will behave like BSF and on newer CPUs as TZCNT.
Comment 6 Jan Seiffert 2012-08-16 16:16:14 UTC
Thanks Jakub.
Haven't tested your patchlet, but as I said, if at least the fallback (for TZCNT and LZCNT) worked, this would be fine.
Comment 7 Jakub Jelinek 2012-08-16 16:20:43 UTC
if (!haveF2(pfx)) goto decode_failure;
should have been
if (haveF2(pfx)) goto decode_failure;
obviously.
Comment 8 Jakub Jelinek 2012-08-17 15:17:05 UTC
Created attachment 73257 [details]
valgrind-lzcnt-tzcnt.patch

Updated patch, this time actually tested, which should fix this bugreport as well as the 32-bit one.
Comment 9 Julian Seward 2012-08-23 20:15:29 UTC
(In reply to comment #8)
> Created attachment 73257 [details]
> valgrind-lzcnt-tzcnt.patch

Committed, r2478.  Thanks for the patch.
Comment 10 Yuri 2013-04-25 07:05:53 UTC
*** Bug 318773 has been marked as a duplicate of this bug. ***
Comment 11 Yuri 2013-06-02 14:07:00 UTC
Is it known when to expect the next release with this and other patches?

This particular PR holds back adoption of gcc-4.8.X, since gcc-4.8.X now generates the offending instructions.

Last valgrind release 3.8.1 was in Sep 2012.
Comment 12 Mark Wielaard 2013-06-02 14:31:02 UTC
This fix was included in 3.8.1. So this bug can be closed.