Created attachment 182684 [details] audit of constant folding and algebraic simplification Are there missed constant folding opportunities in VEX ir_opt.c? Yes. Attached is a spreadsheet that shows the status of constant folding and algebraic simplification for IROps that have integer operands and results. Function fold_Expr_WRK is the work horse. There are also fold_IRExpr_Binop and fold_IRExpr_Unop. I did not look at those. Here's the plan: 1) Write iropt-test similar to vbit-test using the IR injection machinery to verify that constant folding works as expected. 2) Verify existing constant folding :) 3) Fill in the constant folding gaps running iropt-test alongside. 4) Explore whether IR injection can be extended to allow symbolic verification of algebraic simplifications. E.g. verify that Iop_And32(x, 0) --> x without assigning selected constants to x thereby transforming the issue to constant folding (and having only partial coverage). Current status: initial version of iropt-test verified that constant folding for Iop_Not8 works :)
1752297cb1 adds the checker for VEX constant folding. As expected no surprises.
25b9956679 updates the iropt tester to run both with and without constant folding. For now only s390 is enabled because there are failures on the other supported architectures. Here are a few samples: ppc failures: *** Incorrect result for operator MullU32 ppc64be, ppc64le opnd 0: 00000001 opnd 1: fffffffe result fold: 00000000fffffffe result nofold: fffffffffffffffe result expected: 00000000fffffffe *** Incorrect result for operator DivU32E ppc64be, ppc64le, ppc32 opnd 0: 000000ff opnd 1: 00000002 result fold: 80000000 result nofold: 00000000 result expected: 80000000 *** Incorrect result for operator DivS32E ppc64be, ppc64le, ppc32 opnd 0: 00000001 opnd 1: 00000002 result fold: 80000000 result nofold: 00000000 result expected: 80000000 amd64 failures: *** Incorrect result for operator ClzNat64 opnd 0: 0000000000000000 result fold: 0000000000000040 result nofold: fffffffffffff870 result expected: 0000000000000040 *** Incorrect result for operator CtzNat64 opnd 0: 0000000000000000 result fold: 0000000000000040 result nofold: 00000000000173da result expected: 0000000000000040 This is understood. The reason is that ClzNat64 is implemented via the BSF insn which has undefined behaviour when the operand is zero. Clearly, this is wrong and needs to be fixed. See also https://bugs.kde.org/show_bug.cgi?id=507033 This error does not surface in normal valgrind operation because in IR generation there is scaffolding to handle a 0 operand. x86 failures: *** Incorrect result for operator ClzNat32 opnd 0: 00000000 result fold: 00000020 result nofold: 7e0a7c0f result expected: 00000020 *** Incorrect result for operator CtzNat32 opnd 0: 00000000 result fold: 00000020 result nofold: 81f58410 result expected: 00000020 See amd64 explanation.
The amd64 and x86 failures have been fixed in 7389eabaf31. iropt-test is now enabled for x86 and amd64 as well.
Created attachment 184583 [details] constant folding update The fields with green background identify the IROps for which constant folding has been added. No support for IROps using 128 bit values. I will open a separate BZ for that. This concludes part #3 of the plan outlined in https://bugs.kde.org/show_bug.cgi?id=506211#c0
I'm calling it a day. Constant folding for integer IROps not yielding or operating on 128-bit values has been finished. There has also been moderate addition of algebraic simplification rules. One can keep adding rules but it is often not at all clear whether there is a benefit. So adding rules is best done when a particular inefficiency has been observed. As for part #4 in the original plan https://bugs.kde.org/show_bug.cgi?id=506211#c0 this cannot be done once you start thinking about it..