Summary: | Remove Iop_Clz32/64 and Iop_Ctz32/64 | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Florian Krohm <flo2030> |
Component: | vex | Assignee: | Florian Krohm <flo2030> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | NOR | ||
Version First Reported In: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Patch for x86
Patch for amd64 Patch for nanomips Patch for mips Patch for arm Patch for arm64 Patch for ppc Remove Iop_Clz32/64 and Iop_Ctz32/64 |
Description
Florian Krohm
2025-07-14 16:45:58 UTC
The semantics of Iop_Clz32/64 are identical to Iop_ClzNat32/64 unless the input value is zero. And likewise for Iop_Ctz32/64. So.... audit the code, check occurrences of above operators and if their use is guarded to ensure that the input value is != 0 then the operator can be replaced with its "Nat" counterpart. Easy as pie. Spoiler alert: not always so easy as there are "interesting" uses of Iop_Clz .... Created attachment 183451 [details]
Patch for x86
All occurrences of Iop_Clz32 and Iop_Ctz were properly guarded against a zero input.
Remove those IROps from insn selection. Adjust vbit-test accordingly.
Created attachment 183455 [details]
Patch for amd64
As expected all occurrences of Iop_Clz64 and Iop_Ctz64 were properly guarded against
a zero input. Remove those IROps from insn selection. Adjust vbit-test accordingly.
Created attachment 183457 [details]
Patch for nanomips
There are 2 occurrences of Iop_Clz32.
#1 for the CLZ insn:
case nano_POOL32Axf4_CLZ: { /* clz */
DIP("clz r%u, r%u", rt, rs);
putIReg(rt, unop(Iop_Clz32, getIReg(rs)));
break;
}
This what happens:
CLZ insn --ir--> Iop_Clz32 --isel--> NMun_CLZ --emit--> CLZ
The CLZ insn has no special behaviour when the input is 0. Using Iop_Clz32
is not the right choice then as it has an undefined result for a 0 input.
This all works out, though, because in the end a CLZ insn will be emitted.
So we can simply change Iop_Clz64 --> Iop_ClzNat64 and be done here.
#2 for the CLO insn:
case nano_POOL32Axf4_CLO: { /* clo */
DIP("clo r%u, r%u", rt, rs);
t1 = newTemp(Ity_I1);
assign(t1, binop(Iop_CmpEQ32, getIReg(rs), mkU32(0xffffffff)));
putIReg(rt, IRExpr_ITE(mkexpr(t1),
mkU32(0x00000020),
unop(Iop_Clz32,
unop(Iop_Not32, getIReg(rs)))));
break;
}
The special case is not needed because Clz32(0) == 32;
case nano_POOL32Axf4_CLO: { /* clo */
DIP("clo r%u, r%u", rt, rs);
putIReg(rt, unop(Iop_Clz32, unop(Iop_Not32, getIReg(rs)))));
break;
}
Done.
Created attachment 183461 [details]
Patch for mips
There were multiple uses of Iop_Clz32 and Iop_Clz64 to implement CLZ, CLO,
DCLZ and DCLO. Sometimes but not always there was code to guard against a
0 input value. But it does not matter. Because in the end CLZ and DCLZ were
emitted which do the right thing. See also the comment about nanomips.
The patch removes any special handling for 0 and replaces Iop_Clz32 with
Iop_ClzNat32 and Iop_Clz64 with Iop_ClzNat64.
Created attachment 183464 [details]
Patch for arm
Replace Iop_Clz32 with Iop_ClzNat32 and remove special handling for a 0 input value.
CLZ --ir--> Iop_ClzNat32 --isel--> ARMun_CLZ --emit--> CLZ
CLZ handles all input values correctly
Created attachment 183468 [details]
Patch for arm64
Replace Iop_Clz64 with Iop_ClzNat64 for which eventually a CLZ insn will be emitted and that
handles all input values correctly.
I'd like to move this along. Will check in the x86 / amd64 patches soonish. If somebody with access to arm/64 and mips/nanomips could regtest the patches above that would be great. Still waiting to head back from Carl on ppc. Created attachment 184410 [details]
Patch for ppc
Iop_Clz64 and Iop_ClzNat64 are already handled the same way, namely:
Iop_Clz64 / Iop_ClzNat64 --isel--> Pun_CLZ64 --emit-->cntlzd
where the cntlzd insn has "natural" semantics and a 0 operand is not special.
Created attachment 184414 [details] Remove Iop_Clz32/64 and Iop_Ctz32/64 The patch removes those IRops from libvex_ir.h and fixes the ripple. What is left to do is to fix the insn emitters for x86 and amd64 so they handle a 0 operand properly i.e. "naturally". This isn't a problem in valgrind operation because in IR generation a 0 operand is handled as a special case. See also https://bugs.kde.org/show_bug.cgi?id=506211#c2 Fixed in 7389eabaf3 |