| Summary: | vpermr instruction could exhaust temporary memory | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Carl Love <cel> |
| Component: | vex | Assignee: | Julian Seward <jseward> |
| Status: | CLOSED FIXED | ||
| Severity: | normal | ||
| Priority: | NOR | ||
| Version First Reported In: | 3.14 SVN | ||
| Target Milestone: | --- | ||
| Platform: | Other | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: |
re-implement vpermr instruction to generate fewer iops
re-implement vpermr instruction to generate fewer iops |
||
|
Description
Carl Love
2017-09-29 18:07:59 UTC
Created attachment 108095 [details]
re-implement vpermr instruction to generate fewer iops
Patch reworks the vpermr instruction to use the Iop_Perm8x16. This reduces the Iop generation significantly.
diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index 3897b9f2f..79d530d71 100644
--- a/VEX/priv/guest_ppc_toIR.c
+++ b/VEX/priv/guest_ppc_toIR.c
@@ -24297,12 +24297,12 @@ static Bool dis_av_permute ( UInt theInstr )
IRTemp vC_andF = newTemp(Ity_V128);
DIP("vperm v%d,v%d,v%d,v%d\n",
vD_addr, vA_addr, vB_addr, vC_addr);
- /* Limit the Perm8x16 steering values to 0 .. 15 as that is what
+ /* Limit the Perm8x16 steering values to 0 .. 31 as that is what
IR specifies, and also to hide irrelevant bits from
memcheck */
assign( vC_andF,
binop(Iop_AndV128, mkexpr(vC),
- unop(Iop_Dup8x16, mkU8(0xF))) );
+ unop(Iop_Dup8x16, mkU8(0x1F))) );
assign( a_perm,
binop(Iop_Perm8x16, mkexpr(vA), mkexpr(vC_andF)) );
assign( b_perm,
Is this hunk necessary? It seems to be changing the semantics of an
existing instruction, rather than merely re-implementing one.
Otherwise looks OK to me.
Yes, it is changing the semantics. The code a implemented is wrong. The field for the steering value is 5 bits according to the ISA, not four as implemented. So values greater then 16 are not handled correctly. The ISA says: Let the source vector be the concatenation of the contents of VR[VRA] followed by the contents of VR[VRB]. For each integer value i from 0 to 15, do the following. Let index be the value specified by bits 3:7 of byte element i of VR[VRC]. So, the index value is 5-bits wide. The code for the vperm instruction was reused with some small tweaks to implement the vpermr, xxperm and xxpermr instructions. In testing the code for these other instructions this issue was found. Probably should have separated it out into its own bug fix. Created attachment 108148 [details]
re-implement vpermr instruction to generate fewer iops
Update patch, split the vpermr implementation and the vperm index mask into separate patches. The vperm index mask was moved to bugzilla 385334.
committed patch: commit b0aef250a74804423341b3ce804355037211e330 |