The implementation of the vpermr instruction will generate a lot of Iops which could exhaust temporary memory. The instruction implementation is similar to the xxperm and xxpermr instructions which exhibited this issue.
Created attachment 108095 [details] re-implement vpermr instruction to generate fewer iops Patch reworks the vpermr instruction to use the Iop_Perm8x16. This reduces the Iop generation significantly.
diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c index 3897b9f2f..79d530d71 100644 --- a/VEX/priv/guest_ppc_toIR.c +++ b/VEX/priv/guest_ppc_toIR.c @@ -24297,12 +24297,12 @@ static Bool dis_av_permute ( UInt theInstr ) IRTemp vC_andF = newTemp(Ity_V128); DIP("vperm v%d,v%d,v%d,v%d\n", vD_addr, vA_addr, vB_addr, vC_addr); - /* Limit the Perm8x16 steering values to 0 .. 15 as that is what + /* Limit the Perm8x16 steering values to 0 .. 31 as that is what IR specifies, and also to hide irrelevant bits from memcheck */ assign( vC_andF, binop(Iop_AndV128, mkexpr(vC), - unop(Iop_Dup8x16, mkU8(0xF))) ); + unop(Iop_Dup8x16, mkU8(0x1F))) ); assign( a_perm, binop(Iop_Perm8x16, mkexpr(vA), mkexpr(vC_andF)) ); assign( b_perm, Is this hunk necessary? It seems to be changing the semantics of an existing instruction, rather than merely re-implementing one. Otherwise looks OK to me.
Yes, it is changing the semantics. The code a implemented is wrong. The field for the steering value is 5 bits according to the ISA, not four as implemented. So values greater then 16 are not handled correctly. The ISA says: Let the source vector be the concatenation of the contents of VR[VRA] followed by the contents of VR[VRB]. For each integer value i from 0 to 15, do the following. Let index be the value specified by bits 3:7 of byte element i of VR[VRC]. So, the index value is 5-bits wide. The code for the vperm instruction was reused with some small tweaks to implement the vpermr, xxperm and xxpermr instructions. In testing the code for these other instructions this issue was found. Probably should have separated it out into its own bug fix.
Created attachment 108148 [details] re-implement vpermr instruction to generate fewer iops Update patch, split the vpermr implementation and the vperm index mask into separate patches. The vperm index mask was moved to bugzilla 385334.
committed patch: commit b0aef250a74804423341b3ce804355037211e330