Bug 465213

Summary:	x86 tzcnt/lzcnt are incorrectly handled
Product:	[Developer tools] valgrind	Reporter:	JunYoung Park <parkjuny>
Component:	vex	Assignee:	Julian Seward <jseward>
Status:	REPORTED ---
Severity:	normal
Priority:	NOR
Version First Reported In:	3.21 GIT
Target Milestone:	---
Platform:	unspecified
OS:	All
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description JunYoung Park 2023-02-03 06:45:10 UTC

Observed at commit `7003f40be9de1e10796578cba9e40ea6a548fc16` (the latest at the time of writing).

```c
// VEX/priv/guest_x86_toIR.c:14270
      switch (abyte) {
      case 0x0F:
         switch (getIByte(delta)) {
         /* On older CPUs, TZCNT behaves the same as BSF.  */
         case 0xBC: /* REP BSF Gv,Ev */
            delta = dis_bs_E_G ( sorb, sz, delta + 1, True );
            break;
         /* On older CPUs, LZCNT behaves the same as BSR.  */
         case 0xBD: /* REP BSR Gv,Ev */
            delta = dis_bs_E_G ( sorb, sz, delta + 1, False );
            break;
         default:
            goto decode_failure;
         }
         break;
```

On x86, `tzcnt` does not behave the same as `bsf`, and `lzcnt` does not behave the same as `bsr`. For a source operand of 0, `tzcnt` and `lzcnt` return the operand size (0x20 for 32-bit operands), while the destination of `bsf` and `bsr` is architecturally undefined.

You can refer to https://www.felixcloutier.com/x86/tzcnt and https://www.felixcloutier.com/x86/lzcnt .

> The key difference between TZCNT and BSF instruction is that TZCNT provides
> operand size as output when source operand is zero while in the case of BSF 
> instruction, if source operand is zero, the content of destination operand are 
> undefined.
> LZCNT differs from BSR. For example, LZCNT will produce the operand size 
> when the input operand is zero.
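To make the difference concrete, here is a minimal reference model in plain C for the 32-bit case. It is only a sketch of the architectural behaviour described in the quoted text, not VEX code; the function names (`ref_tzcnt32`, `ref_lzcnt32`, `ref_bsf32`, `ref_bsr32`) are made up for illustration. Because the `bsf`/`bsr` result is undefined for a zero source, the model chooses to leave the destination untouched and report failure, which is an assumption rather than a guarantee.

```c
#include <stdint.h>

/* TZCNT model: number of trailing zero bits; 32 (the operand size)
   when the source is zero. */
uint32_t ref_tzcnt32(uint32_t src)
{
    uint32_t n = 0;
    if (src == 0)
        return 32;
    while ((src & 1u) == 0) {
        src >>= 1;
        n++;
    }
    return n;
}

/* LZCNT model: number of leading zero bits; 32 (the operand size)
   when the source is zero. */
uint32_t ref_lzcnt32(uint32_t src)
{
    uint32_t n = 0;
    if (src == 0)
        return 32;
    while ((src & 0x80000000u) == 0) {
        src <<= 1;
        n++;
    }
    return n;
}

/* BSF model: index of the lowest set bit. Returns 1 and writes *dst when
   src is nonzero. For src == 0 the result is undefined; this sketch
   ASSUMES "leave *dst unchanged" and returns 0. */
int ref_bsf32(uint32_t src, uint32_t *dst)
{
    if (src == 0)
        return 0;
    *dst = ref_tzcnt32(src);
    return 1;
}

/* BSR model: index of the highest set bit, with the same assumption as
   ref_bsf32 for a zero source. */
int ref_bsr32(uint32_t src, uint32_t *dst)
{
    if (src == 0)
        return 0;
    *dst = 31u - ref_lzcnt32(src);
    return 1;
}
```

In this model `tzcnt` and `bsf` agree for every nonzero source, but `lzcnt` and `bsr` differ even then: `bsr` yields a bit index, so for a source of 8 it gives 3 while `lzcnt` gives 28.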

I ran into this problem while lifting `tzcnt` and `lzcnt` with pyvex. I'm not sure whether the code above is the right place to fix it.