From libvex_ir.h: /* Ctz64/Ctz32/Clz64/Clz32 are UNDEFINED when given arguments of zero. You must ensure they are never given a zero argument. As of 2018-Nov-14 they are deprecated. Try to use the Nat variants immediately below, if you can. */ I'd say: time to get rid of them. Announce removal in 3.25.2. Remove in 3.26.. (or whatever the process is)
The semantics of Iop_Clz32/64 are identical to Iop_ClzNat32/64 unless the input value is zero. And likewise for Iop_Ctz32/64. So.... audit the code, check occurrences of above operators and if their use is guarded to ensure that the input value is != 0 then the operator can be replaced with its "Nat" counterpart. Easy as pie. Spoiler alert: not always so easy as there are "interesting" uses of Iop_Clz ....
Created attachment 183451 [details] Patch for x86 All occurrences of Iop_Clz32 and Iop_Ctz were properly guarded against a zero input. Remove those IROps from insn selection. Adjust vbit-test accordingly.
Created attachment 183455 [details] Patch for amd64 As expected all occurrences of Iop_Clz64 and Iop_Ctz64 were properly guarded against a zero input. Remove those IROps from insn selection. Adjust vbit-test accordingly.
Created attachment 183457 [details] Patch for nanomips There are 2 occurrences of Iop_Clz32. #1 for the CLZ insn: case nano_POOL32Axf4_CLZ: { /* clz */ DIP("clz r%u, r%u", rt, rs); putIReg(rt, unop(Iop_Clz32, getIReg(rs))); break; } This what happens: CLZ insn --ir--> Iop_Clz32 --isel--> NMun_CLZ --emit--> CLZ The CLZ insn has no special behaviour when the input is 0. Using Iop_Clz32 is not the right choice then as it has an undefined result for a 0 input. This all works out, though, because in the end a CLZ insn will be emitted. So we can simply change Iop_Clz64 --> Iop_ClzNat64 and be done here. #2 for the CLO insn: case nano_POOL32Axf4_CLO: { /* clo */ DIP("clo r%u, r%u", rt, rs); t1 = newTemp(Ity_I1); assign(t1, binop(Iop_CmpEQ32, getIReg(rs), mkU32(0xffffffff))); putIReg(rt, IRExpr_ITE(mkexpr(t1), mkU32(0x00000020), unop(Iop_Clz32, unop(Iop_Not32, getIReg(rs))))); break; } The special case is not needed because Clz32(0) == 32; case nano_POOL32Axf4_CLO: { /* clo */ DIP("clo r%u, r%u", rt, rs); putIReg(rt, unop(Iop_Clz32, unop(Iop_Not32, getIReg(rs))))); break; } Done.
Created attachment 183461 [details] Patch for mips There were multiple uses of Iop_Clz32 and Iop_Clz64 to implement CLZ, CLO, DCLZ and DCLO. Sometimes but not always there was code to guard against a 0 input value. But it does not matter. Because in the end CLZ and DCLZ were emitted which do the right thing. See also the comment about nanomips. The patch removes any special handling for 0 and replaces Iop_Clz32 with Iop_ClzNat32 and Iop_Clz64 with Iop_ClzNat64.
Created attachment 183464 [details] Patch for arm Replace Iop_Clz32 with Iop_ClzNat32 and remove special handling for a 0 input value. CLZ --ir--> Iop_ClzNat32 --isel--> ARMun_CLZ --emit--> CLZ CLZ handles all input values correctly
Created attachment 183468 [details] Patch for arm64 Replace Iop_Clz64 with Iop_ClzNat64 for which eventually a CLZ insn will be emitted and that handles all input values correctly.
I'd like to move this along. Will check in the x86 / amd64 patches soonish. If somebody with access to arm/64 and mips/nanomips could regtest the patches above that would be great. Still waiting to head back from Carl on ppc.
Created attachment 184410 [details] Patch for ppc Iop_Clz64 and Iop_ClzNat64 are already handled the same way, namely: Iop_Clz64 / Iop_ClzNat64 --isel--> Pun_CLZ64 --emit-->cntlzd where the cntlzd insn has "natural" semantics and a 0 operand is not special.
Created attachment 184414 [details] Remove Iop_Clz32/64 and Iop_Ctz32/64 The patch removes those IRops from libvex_ir.h and fixes the ripple. What is left to do is to fix the insn emitters for x86 and amd64 so they handle a 0 operand properly i.e. "naturally". This isn't a problem in valgrind operation because in IR generation a 0 operand is handled as a special case. See also https://bugs.kde.org/show_bug.cgi?id=506211#c2
Fixed in 7389eabaf3