The IBM POWER ISA 3.0 specification has been released. This bugzilla is the third in a series of five bugzillas for adding the needed ISA 3.0 support to Valgrind. The first bugzilla in the series is 359767; the second is 361207.

Reproducible: Always
Created attachment 98632 [details]
Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions

Patch to add support for emulating the PowerPC ISA 3.0 instructions: lxsd, lxssp, lxv, stxsd, stxssp, stxv, xscpsgnqp, xscmpoqp, xscmpuqp, xscmpexpqp, xststdcqp, xsabsqp, xsxexpqp, xsnabsqp, xsnegqp, xsxsigqp, xsiexpqp, xsxexpdp, xsxsigdp, xscmpexpdp, xststdcdp, xsiexpdp, xsxtdcsp, xvxexpdp, xvxexpsp, xvxsigdp, xvxsigsp, xviexpsp, xviexpdp, xvtstdcsp, and xvtstdcdp.
Created attachment 98633 [details]
Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions

Add tests to the test_isa_3_0.c file for the instructions in the VEX patch. I manually cut the patch file in two as it exceeded the maximum attachment size. This is the first part of the patch file.
Created attachment 98634 [details]
Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions, second half

Add tests to the test_isa_3_0.c file for the instructions in the VEX patch. I manually cut the patch file in two as it exceeded the maximum attachment size. This is the second part of the patch file.
Created attachment 98666 [details]
Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions

Fixed declarations to use the "house" types, e.g. int i; -> Int i;
(In reply to Carl Love from comment #4)
> Created attachment 98666 [details]
> Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions

Ok to land, but I would prefer you fix the comments below first, especially the one about the IRExpr* vs IRTemp duplication, since that can potentially cause excessively long and slow code to get generated.

diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index 44304df..e93831c 100644

+static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                                IRExpr *dnorm, IRExpr *pos )

+static IRExpr * create_DCM ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                             IRExpr *dnorm, IRExpr *pos )

This gives a lot of duplication, given that they produce identical trees except for the types. You can parameterise it like this:

  static IRExpr* create_DCM ( IROp opOR, IROp opAND, IROp opSHL,
                              IRExpr *NaN, IRExpr *inf, IRExpr *zero,
                              IRExpr *dnorm, IRExpr *pos )
  ..
     return binop(opOR, binop(opSHL, NaN, mkU8(6)), etc

  static IRExpr* create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
                                 IRExpr *dnorm, IRExpr *pos )
  {
     return create_DCM(Iop_Or32, Iop_And32, Iop_Shl32,
                       NaN, inf, zero, dnorm, pos);
  }

and similarly for create_DCM_64.

Another, more general point is that you are passing in IRExpr*s, which you then bake into bigger trees:

+static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                                IRExpr *dnorm, IRExpr *pos )

You're using inf, zero, pos and neg twice, which means that you will force the back end to generate code for them twice, and so they will actually get computed twice. If you're lucky, the IR optimiser (ir_opt.c) will come along and common up that duplication (the CSE pass), but CSEing is expensive and so only happens for a minority of IRSBs, and these are relatively unlikely to qualify. A way to avoid this is to bind those values to IRTemps and use the temps instead:

  assign(tNaN, NaN);
  assign(tInf, inf);
  etc

  create_DCM_32( tNaN, tInf, ...)
Binding a value tree to an IRTemp forces the back end to generate code that computes the value into a register, so you can then use the IRTemp as many times as you like without any danger of duplicating computation, since each use just re-reads the (back-end) register that holds the value. This is particularly relevant for (eg) Quad_precision_gt, since you are going to duplicate the src_A and src_B computations many times there.

+static IRExpr * is_Zero_V128 ( IRTemp src )
+   return mkAND1( binop( Iop_CmpEQ64, hi64, mkU64( 0 ) ),
+                  binop( Iop_CmpEQ64, low64, mkU64( 0 ) ) );

  binop(Iop_CmpEQ64, binop(Iop_Or64, hi64, low64), mkU64(0))

avoids mkAND1, since mkAND1/mkOR1/mkNOT1 generate poor code -- the back end is not very clever with them. It could do better.
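[Editorial sketch] The is_Zero_V128 rewrite suggested above can be illustrated outside the IR with a plain-C sketch; the function names here are hypothetical, purely to show that the two forms agree and why the second needs no 1-bit And:

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of the two ways to test a 128-bit value, split into
   hi64:low64, for zero. */
static int is_zero_two_compares(uint64_t hi64, uint64_t low64)
{
    /* mkAND1 of two CmpEQ64 results */
    return (hi64 == 0) && (low64 == 0);
}

static int is_zero_or_then_compare(uint64_t hi64, uint64_t low64)
{
    /* CmpEQ64 of an Or64: one comparison, no 1-bit ops */
    return (hi64 | low64) == 0;
}
```

The OR can only produce zero when both halves are zero, so the two predicates are equivalent, and the second maps onto a single Iop_CmpEQ64 of an Iop_Or64 in the IR.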
Created attachment 99056 [details]
updated, 0001-Power-PC-Add-support-for-ISA-3.0-part-3.patch

I like your idea for combining create_DCM_32() and create_DCM() so we don't have lots of similarly named functions. But I am not too keen on passing a list of IROps to the function, as it isn't really obvious from the function name why it would need them. My preference would be to just pass the size of the expr to be created and then set up the IROps in the function. I think that is cleaner for the reader and a bit less error prone: let the function make sure it generates 64-bit or 32-bit iops, rather than the user messing up one of the arguments. It also eliminates the need to verify that the IROps passed are all compatible with each other.

I wasn't aware of the issue with the back end generating code multiple times. So basically the rule of thumb should be: if you use a computed value multiple times, compute it into a temp, not an expression.

With the above two things in mind, I rewrote create_DCM() as follows:

  static IRExpr * create_DCM ( IRType size, IRTemp NaN, IRTemp inf,
                               IRTemp zero, IRTemp dnorm, IRTemp pos )
  {
     /* This is a general function for creating the DCM for a 32-bit or
        64-bit expression based on the passed size. */
     IRTemp neg;
     IROp opAND, opOR, opSHL;

     vassert( ( size == Ity_I32 ) || ( size == Ity_I64 ) );

     if ( size == Ity_I32 ) {
        opSHL = Iop_Shl32;
        opAND = Iop_And32;
        opOR  = Iop_Or32;
        neg = newTemp( Ity_I32 );
     } else {
        opSHL = Iop_Shl64;
        opAND = Iop_And64;
        opOR  = Iop_Or64;
        neg = newTemp( Ity_I64 );
     }

     assign( neg, unop( Iop_1Uto64,
                        mkNOT1( unop( Iop_64to1, mkexpr( pos ) ) ) ) );

     return binop( opOR,
                   binop( opSHL, mkexpr( NaN ), mkU8( 6 ) ),
                   binop( opOR,
                          binop( opOR,
                                 binop( opOR,
                                        binop( opSHL,
                                               binop( opAND, mkexpr( pos ),
                                                      mkexpr( inf ) ),
                                               mkU8( 5 ) ),
                                        binop( opSHL,
                                               binop( opAND, mkexpr( neg ),
                                                      mkexpr( inf ) ),
                                               mkU8( 4 ) ) ),
                                 binop( opOR,
                                        binop( opSHL,
                                               binop( opAND, mkexpr( pos ),
                                                      mkexpr( zero ) ),
                                               mkU8( 3 ) ),
                                        binop( opSHL,
                                               binop( opAND, mkexpr( neg ),
                                                      mkexpr( zero ) ),
                                               mkU8( 2 ) ) ) ),
                          binop( opOR,
                                 binop( opSHL,
                                        binop( opAND, mkexpr( pos ),
                                               mkexpr( dnorm ) ),
                                        mkU8( 1 ) ),
                                 binop( opAND, mkexpr( neg ),
                                        mkexpr( dnorm ) ) ) ) );
  }

I chose to just pass the values in as temps and not mess around with changing them in the function; it seemed cleaner. Does the function look reasonable?

I did something very similar to combine functions is_Denorm_32() and is_Denorm(). I updated Quad_precision_gt() to use temps, not expressions, when the computed value was used more than once. I also updated is_Zero_V128() as suggested.
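[Editorial sketch] As a sanity check on the tree above, here is a hypothetical plain-C model of the seven-bit data-class mask that create_DCM() builds, with each flag a 0/1 value and neg derived from pos exactly as in the IR. This is not VEX code, just the same bit arithmetic written directly:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the DCM bit layout produced by the create_DCM() IR tree:
   bit 6 = NaN, bit 5 = +inf, bit 4 = -inf, bit 3 = +zero,
   bit 2 = -zero, bit 1 = +denormal, bit 0 = -denormal.
   Each input flag is expected to be 0 or 1. */
static uint32_t dcm_model(uint32_t NaN, uint32_t inf, uint32_t zero,
                          uint32_t dnorm, uint32_t pos)
{
    uint32_t neg = !pos;               /* mkNOT1 of pos, as in the IR */
    return (NaN           << 6)
         | ((pos & inf)   << 5)
         | ((neg & inf)   << 4)
         | ((pos & zero)  << 3)
         | ((neg & zero)  << 2)
         | ((pos & dnorm) << 1)
         |  (neg & dnorm);
}
```

Reading the nested binop(opOR, binop(opSHL, ...)) tree against this flat form makes it easy to verify that each class lands in the intended bit position.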
(In reply to Carl Love from comment #6)
> [..] My preference would be to just pass the size of the expr to be
> created and then set up the IROps in the function. I think that is
> cleaner for the reader and a bit less error prone: let the function
> make sure it generates 64-bit or 32-bit iops, rather than the user
> messing up one of the arguments.

Yes, this all seems fine. Thanks for doing the fixups.

> I wasn't aware of the issue with the back end generating code multiple
> times. So basically the rule of thumb should be: if you use a computed
> value multiple times, compute it into a temp, not an expression.

Yes, exactly. Temps are always safe for multiple use since they really just become registers in the back end code generator.

Looks good to land.
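[Editorial sketch] The temp-vs-expression rule of thumb can be demonstrated with a plain-C analogy, where a counter stands in for the cost the back end pays each time it re-emits a duplicated subexpression tree. The names here are illustrative, not from VEX:

```c
#include <assert.h>

static int n_evals = 0;

/* Stands in for an expensive subexpression; the counter records how
   many times the "back end" actually computes it. */
static int expensive(void)
{
    n_evals++;
    return 7;
}

/* Expression style: the tree is re-expanded at every use site. */
static int use_as_expr(void)
{
    return expensive() + expensive();  /* computed twice */
}

/* Temp style: bind once (assign(tmp, ...)), then re-read the value. */
static int use_as_temp(void)
{
    int t = expensive();               /* one computation */
    return t + t;                      /* two cheap register reads */
}
```

Both functions return the same value, but the expression form evaluates the subexpression once per use, which is exactly the duplication the CSE pass would otherwise have to clean up.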
(In reply to Carl Love from comment #2)
> Created attachment 98633 [details]
> Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions

Ok to land. Also (obviously) for the second half, in comment #3.
Updated patch committed.

VEX commit 3220
Valgrind commit 15890