Bug 385334

Summary: PPC64, fix vpermr, xxperm, xxpermr mask value.
Product: [Developer tools] valgrind Reporter: Carl Love <cel>
Component: vexAssignee: Julian Seward <jseward>
Status: CLOSED FIXED    
Severity: normal    
Priority: NOR    
Version First Reported In: 3.14 SVN   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: vperm instuction, fix index mask
fix mask for vpermr, xxperm, xxpermr instruction
fix mask for vpermr, xxperm, xxpermr instruction

Description Carl Love 2017-10-03 15:52:56 UTC
The vperm instruction uses a 4-bit mask to extract the index value for the permute instruction.  It should be a 5-bit mask.  The ISA says the index value is 

   Let index be the value specified by bits 3:7 of byte
   element i of VR[VRC].

This patch fixes the bug.
Comment 1 Carl Love 2017-10-03 16:04:00 UTC
Created attachment 108147 [details]
vperm instuction, fix index mask

Add patch to fix issue
Comment 2 Julian Seward 2017-10-04 06:45:56 UTC
I looked at the IR generation for vperm
(case 0x2B: { // vperm (Permute, AV p218), etc)
and it looks correct to me.  Which test case is failing and
for which target? (32-be, 64-be, 64-le) ?

The code already does use 5 bits to select.  It uses 4 bits
here:

      assign( vC_andF,
              binop(Iop_AndV128, mkexpr(vC),
                                 unop(Iop_Dup8x16, mkU8(0xF))) );

and one from here:

      // mask[i8] = (vC[i8]_4 == 1) ? 0xFF : 0x0
      assign( mask, binop(Iop_SarN8x16,
                          binop(Iop_ShlN8x16, mkexpr(vC), mkU8(3)),
                          mkU8(7)) );

by computing, for each 8-bit lane, (lane << 3) >>signed 7.  This
copies bit[3] in the IBM encoding into all 8 bits of the lane, and
hence makes it into a usable mask.

I am wondering now if the problem is one of endianness.

The IR definition of Iop_Perm8x16 only allows values 0 .. 15 in the
selector fields, which the patch ends up violating.  Given the above
I suspect there's some other way to fix this.
Comment 3 Carl Love 2017-10-04 14:53:54 UTC
I will go back and look at this again.  The issue was actually seen in the re-working of the vpermr, xxperm, xxpermr.  The code from the vperm was used to implement the other instructions with the needed tweaks to do the reverse order and fetch from the other register file.  This patch then applied the same change to the vperm instruction.
Comment 4 Carl Love 2017-10-04 23:22:46 UTC
The issue is with the vpermr and xxpermr instructions that were reimplemented based on the vperm instruction.  In those implementations, the mask needed to be 0x1F instead of 0xF and the mask assignment uses perm_val or vC_andF depending on the instruction rather then the vector containing the indexes. 

The vpermr and xxpermr instructions do the reverse permutation, specifically the permute value is 31 - index value for each vector entry.  The subtraction causes the bits to get flipped.  

The real issue is that the implementation was doing the masking each of the indexes in the vector then subtracting them from 31.  The indexes should be subtracted from 31 first, then the result is masked with 0xF.  Once the order of operations is corrected, then the mask assignment is done against the vector-31.    

So, the comments and the implementation of the vpermr, xxperm and xxpermr instructions need some cleaning up to make the mask 0xF not 0x1F, etc.  

The patch to fix the vperm instruction need to be replaced with a clean up patch.
Comment 5 Carl Love 2017-10-04 23:24:45 UTC
Created attachment 108173 [details]
fix mask for vpermr, xxperm, xxpermr instruction

Replace the vperm instruction patch with a clean up patch for the vpermr, xxperm, xxpermr instructions so their implementation is correct and "mirror" the implementation of the vperm instruction.
Comment 6 Julian Seward 2017-10-05 12:17:51 UTC
Random observation, w.r.t new patch:

Instead of this

                            binop( Iop_64HLtoV128,
                                   mkU64( 0x1F1F1F1F1F1F1F1F ),
                                   mkU64( 0x1F1F1F1F1F1F1F1F ) )

you can just do this

-                           unop( Iop_Dup8x16, mkU8( 0x1F ) )

as is actually done only a few lines further up in the patch.
Comment 7 Carl Love 2017-10-05 16:21:48 UTC
Created attachment 108190 [details]
fix mask for vpermr, xxperm, xxpermr instruction

patch cleanup suggested by Julian
Comment 8 Carl Love 2017-10-05 17:21:09 UTC
patch committed.  

commit 856d45eb7e3661a61ace32be2cfa10bf198620c8