Bug 465213

Summary: x86 tzcnt/lzcnt are incorrectly handled
Product: [Developer tools] valgrind Reporter: JunYoung Park <parkjuny>
Component: vex Assignee: Julian Seward <jseward>
Status: REPORTED ---    
Severity: normal    
Priority: NOR    
Version First Reported In: 3.21 GIT   
Target Milestone: ---   
Platform: unspecified   
OS: All   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description JunYoung Park 2023-02-03 06:45:10 UTC
As of commit `7003f40be9de1e10796578cba9e40ea6a548fc16` (the latest at the time of writing):

```c
// VEX/priv/guest_x86_toIR.c:14270
      switch (abyte) {
      case 0x0F:
         switch (getIByte(delta)) {
         /* On older CPUs, TZCNT behaves the same as BSF.  */
         case 0xBC: /* REP BSF Gv,Ev */
            delta = dis_bs_E_G ( sorb, sz, delta + 1, True );
            break;
         /* On older CPUs, LZCNT behaves the same as BSR.  */
         case 0xBD: /* REP BSR Gv,Ev */
            delta = dis_bs_E_G ( sorb, sz, delta + 1, False );
            break;
         default:
            goto decode_failure;
         }
         break;
```

On x86, `tzcnt` does not behave the same as `bsf`, and `lzcnt` does not behave the same as `bsr`. For a source operand of 0, `tzcnt` and `lzcnt` return the operand size (0x20 for 32-bit operands), while `bsf` and `bsr` give a result of 0 (strictly, their destination is undefined in that case).

You can refer to https://www.felixcloutier.com/x86/tzcnt and https://www.felixcloutier.com/x86/lzcnt .

> The key difference between TZCNT and BSF instruction is that TZCNT provides
> operand size as output when source operand is zero while in the case of BSF 
> instruction, if source operand is zero, the content of destination operand are 
> undefined.
> LZCNT differs from BSR. For example, LZCNT will produce the operand size 
> when the input operand is zero.
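
To make the difference concrete, here is a minimal reference model in plain C of the 32-bit semantics described above. This is only a sketch, not Valgrind code: the `ref_*` names are hypothetical, and leaving the destination unchanged for a zero source in the `bsf`/`bsr` model is an assumption (the architecture only says the result is undefined). Note that even for non-zero inputs `lzcnt` and `bsr` differ: `lzcnt` counts leading zeros, while `bsr` returns the index of the highest set bit, i.e. `31 - lzcnt` for 32-bit operands.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit TZCNT: count of trailing zero bits, or the operand size (32)
   when the source is zero. */
static uint32_t ref_tzcnt32(uint32_t src)
{
    uint32_t n = 0;
    if (src == 0)
        return 32;
    while ((src & 1u) == 0) {
        src >>= 1;
        n++;
    }
    return n;
}

/* 32-bit LZCNT: count of leading zero bits, or the operand size (32)
   when the source is zero. */
static uint32_t ref_lzcnt32(uint32_t src)
{
    uint32_t n = 0;
    if (src == 0)
        return 32;
    while ((src & 0x80000000u) == 0) {
        src <<= 1;
        n++;
    }
    return n;
}

/* 32-bit BSF: index of the lowest set bit. Returns the ZF value.
   ASSUMPTION: for a zero source this model leaves *dst untouched;
   architecturally the destination is undefined. */
static int ref_bsf32(uint32_t src, uint32_t *dst)
{
    if (src == 0)
        return 1;               /* ZF = 1, *dst not written */
    *dst = ref_tzcnt32(src);
    return 0;
}

/* 32-bit BSR: index of the highest set bit. Same zero-source
   assumption as ref_bsf32. */
static int ref_bsr32(uint32_t src, uint32_t *dst)
{
    if (src == 0)
        return 1;               /* ZF = 1, *dst not written */
    *dst = 31u - ref_lzcnt32(src);
    return 0;
}

/* Small self-check of the cases discussed in this report. */
static void ref_self_check(void)
{
    uint32_t dst = 0xdeadbeefu;

    assert(ref_tzcnt32(0) == 0x20);        /* operand size, not 0 */
    assert(ref_lzcnt32(0) == 0x20);
    assert(ref_bsf32(0, &dst) == 1 && dst == 0xdeadbeefu);

    assert(ref_lzcnt32(1) == 31);          /* LZCNT and BSR differ ... */
    assert(ref_bsr32(1, &dst) == 0 && dst == 0);  /* ... for non-zero input too */
}
```

Calling `ref_self_check()` from a test program exercises the zero-operand cases and the non-zero `lzcnt`/`bsr` mismatch; a lifter that maps `REP BSF`/`REP BSR` to plain `bsf`/`bsr` semantics would disagree with this model in both.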

I ran into this problem while lifting `tzcnt` and `lzcnt` with pyvex. I'm not sure whether this code is the right place for the fix.