385334 – PPC64, fix vpermr, xxperm, xxpermr mask value.

Bug 385334 - PPC64, fix vpermr, xxperm, xxpermr mask value.

Summary: PPC64, fix vpermr, xxperm, xxpermr mask value.

Status:	CLOSED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex (other bugs)
Version First Reported In:	3.14 SVN
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2017-10-03 15:52 UTC by Carl Love
Modified:	2017-10-05 17:28 UTC (History)
CC List:	0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Attachments
vperm instuction, fix index mask (1.50 KB, patch) 2017-10-03 16:04 UTC, Carl Love	Details
fix mask for vpermr, xxperm, xxpermr instruction (4.45 KB, patch) 2017-10-04 23:24 UTC, Carl Love	Details
fix mask for vpermr, xxperm, xxpermr instruction (4.58 KB, patch) 2017-10-05 16:21 UTC, Carl Love	Details
Show Obsolete (2) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Carl Love 2017-10-03 15:52:56 UTC

The vperm instruction uses a 4-bit mask to extract the index value for the permute instruction.  It should be a 5-bit mask.  The ISA says the index value is 

   Let index be the value specified by bits 3:7 of byte
   element i of VR[VRC].

This patch fixes the bug.

Comment 1 Carl Love 2017-10-03 16:04:00 UTC

Created attachment 108147 [details]
vperm instuction, fix index mask

Add patch to fix issue

Comment 2 Julian Seward 2017-10-04 06:45:56 UTC

I looked at the IR generation for vperm
(case 0x2B: { // vperm (Permute, AV p218), etc)
and it looks correct to me.  Which test case is failing and
for which target? (32-be, 64-be, 64-le) ?

The code already does use 5 bits to select.  It uses 4 bits
here:

      assign( vC_andF,
              binop(Iop_AndV128, mkexpr(vC),
                                 unop(Iop_Dup8x16, mkU8(0xF))) );

and one from here:

      // mask[i8] = (vC[i8]_4 == 1) ? 0xFF : 0x0
      assign( mask, binop(Iop_SarN8x16,
                          binop(Iop_ShlN8x16, mkexpr(vC), mkU8(3)),
                          mkU8(7)) );

by computing, for each 8-bit lane, (lane << 3) >>signed 7.  This
copies bit[3] in the IBM encoding into all 8 bits of the lane, and
hence makes it into a usable mask.

I am wondering now if the problem is one of endianness.

The IR definition of Iop_Perm8x16 only allows values 0 .. 15 in the
selector fields, which the patch ends up violating.  Given the above
I suspect there's some other way to fix this.

Comment 3 Carl Love 2017-10-04 14:53:54 UTC

I will go back and look at this again.  The issue was actually seen in the re-working of the vpermr, xxperm, xxpermr.  The code from the vperm was used to implement the other instructions with the needed tweaks to do the reverse order and fetch from the other register file.  This patch then applied the same change to the vperm instruction.

Comment 4 Carl Love 2017-10-04 23:22:46 UTC

The issue is with the vpermr and xxpermr instructions that were reimplemented based on the vperm instruction.  In those implementations, the mask needed to be 0x1F instead of 0xF and the mask assignment uses perm_val or vC_andF depending on the instruction rather then the vector containing the indexes. 

The vpermr and xxpermr instructions do the reverse permutation, specifically the permute value is 31 - index value for each vector entry.  The subtraction causes the bits to get flipped.  

The real issue is that the implementation was doing the masking each of the indexes in the vector then subtracting them from 31.  The indexes should be subtracted from 31 first, then the result is masked with 0xF.  Once the order of operations is corrected, then the mask assignment is done against the vector-31.    

So, the comments and the implementation of the vpermr, xxperm and xxpermr instructions need some cleaning up to make the mask 0xF not 0x1F, etc.  

The patch to fix the vperm instruction need to be replaced with a clean up patch.

Comment 5 Carl Love 2017-10-04 23:24:45 UTC

Created attachment 108173 [details]
fix mask for vpermr, xxperm, xxpermr instruction

Replace the vperm instruction patch with a clean up patch for the vpermr, xxperm, xxpermr instructions so their implementation is correct and "mirror" the implementation of the vperm instruction.

Comment 6 Julian Seward 2017-10-05 12:17:51 UTC

Random observation, w.r.t new patch:

Instead of this

                            binop( Iop_64HLtoV128,
                                   mkU64( 0x1F1F1F1F1F1F1F1F ),
                                   mkU64( 0x1F1F1F1F1F1F1F1F ) )

you can just do this

-                           unop( Iop_Dup8x16, mkU8( 0x1F ) )

as is actually done only a few lines further up in the patch.

Comment 7 Carl Love 2017-10-05 16:21:48 UTC

Created attachment 108190 [details]
fix mask for vpermr, xxperm, xxpermr instruction

patch cleanup suggested by Julian

Comment 8 Carl Love 2017-10-05 17:21:09 UTC

patch committed.  

commit 856d45eb7e3661a61ace32be2cfa10bf198620c8