362329 – Valgrind does not support the IBM POWER ISA 3.0 instructions, part 3

Bug 362329 - Valgrind does not support the IBM POWER ISA 3.0 instructions, part 3

Summary: Valgrind does not support the IBM POWER ISA 3.0 instructions, part 3

Status:	CLOSED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	vex (other bugs)
Version First Reported In:	3.12 SVN
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-04-26 23:24 UTC by Carl Love
Modified:	2016-07-05 15:11 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Attachments
Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions (76.25 KB, patch) 2016-04-26 23:27 UTC, Carl Love	Details
Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions (2.91 MB, patch) 2016-04-26 23:37 UTC, Carl Love	Details
Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions, second half (2.61 MB, patch) 2016-04-26 23:38 UTC, Carl Love	Details
Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions (76.41 KB, patch) 2016-04-28 19:41 UTC, Carl Love	Details
updated, 0001-Power-PC-Add-support-for-ISA-3.0-part-3.patch (78.57 KB, patch) 2016-05-18 19:30 UTC, Carl Love	Details
Show Obsolete (2) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Carl Love 2016-04-26 23:24:55 UTC

The IBM ISA 3.0 has been released. 

This bugzilla is the third in a series of five bugzillas for adding the needed ISA 3.0 support to Valgrind. 

The first bugzilla in the series is 359767.  The second bugzilla in the series is 361207

Reproducible: Always

Comment 1 Carl Love 2016-04-26 23:27:40 UTC

Created attachment 98632 [details]
Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions

Patch to add support for emulating the Power PC ISA 3.0 instructions:
lxsd, lxssp, lxv, stxsd, stxssp, and stxv
xscpsgnqp, xscmpoqp, xscmpuqp, xscmpexpqp, xststdcqp, xsabsqp,
xsxexpqp, xsnabsqp, xsnegqp, xsxsigqp, xsiexpqp, xsxexpdp, xsxsigdp,
xscmpexpdp, xststdcdp, xsiexpdp, xsxtdcsp, xvxexpdp, xvxexpsp,
xvxsigdp, xvxsigsp, xviexpsp, xviexpdp, xvtstdcsp, xvtstdcdp
instructions.

Comment 2 Carl Love 2016-04-26 23:37:24 UTC

Created attachment 98633 [details]
Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions

Add tests to the test_isa_3_0.c file for the instructions in the VEX patch.

I manually cut the patch file in two as it exceeded the max attachment size.
This is the first part of the patch file.

Comment 3 Carl Love 2016-04-26 23:38:40 UTC

Created attachment 98634 [details]
Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions, second half

Add tests to the test_isa_3_0.c file for the instructions in the VEX patch.

I manually cut the patch file in two as it exceeded the max attachment size.
This is the second part of the patch file.

Comment 4 Carl Love 2016-04-28 19:41:25 UTC

Created attachment 98666 [details]
Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions

Fixed declarations to use "house" type,   int i;  -> Int i;

Comment 5 Julian Seward 2016-05-18 08:57:15 UTC

(In reply to Carl Love from comment #4)
> Created attachment 98666 [details]
> Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions


Ok to land, but I would prefer you fix the comments below first,
especially about the IRExpr* vs IRTemp duplication, since that
potentially can cause excessively long and slow code to get generated.



diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index 44304df..e93831c 100644

+static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                                IRExpr *dnorm, IRExpr *pos)

+static IRExpr * create_DCM ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                             IRExpr *dnorm, IRExpr *pos)

This gives a lot of duplication given that they produce identical trees
except for the types.  You can parameterise it like this:

  static IRExpr* create_DCM ( IROp opOR, IROp opAND, IROp opSHL,
                              IRExpr *NaN, IRExpr *inf, IRExpr *zero,
                              IRExpr *dnorm, IRExpr *pos )
  ..
  return binop(opOR,
               binop(opSHL, NaN, mkU8(6)),
  etc

  static IRExpr* create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
                                 IRExpr *dnorm, IRExpr *pos ) {
     return create_DCM(Iop_Or32, Iop_And32, Iop_Shl32,
                       NaN, inf, zero, dnorm, pos);
  }

  and similar for create_DCM_64



Another more general point is that you are passing in IRExpr*s, which
you then bake into bigger trees:

+static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                                IRExpr *dnorm, IRExpr *pos)

You're using inf, zero, pos and neg twice, which means that you will
force the back end to generate code for them twice, and so they will
actually get computed twice.  If you're lucky, the IR optimiser
(ir_opt.c) will come along and common up that duplication (CSE pass),
but CSEing is expensive and so only happens for a minority of IRSBs
and these are relatively unlikely to qualify.

A way to avoid this is to bind those values to IRTemps and use the
temps instead:

   assign(tNaN, NaN);
   assign(tInf, inf);
   etc
   create_DCM_32( tNaN, tInf, ...)

Binding a value tree to an IRTemp forces the back end to generate code
that computes the value into a register, so you can then use the IRTemp
as many times as you like without any danger of duplicating computation
since it will just repeatedly read the (back-end) register that holds
the value.

This is particularly relevant for (eg) Quad_precision_gt since you are
going to duplicate the src_A and src_B computations many times there.




+static IRExpr * is_Zero_V128 ( IRTemp src )
+   return mkAND1( binop( Iop_CmpEQ64, hi64, mkU64( 0 ) ),
+                  binop( Iop_CmpEQ64, low64, mkU64( 0 ) ) );

binop(Iop_CmpEQ64, binop(Iop_Or64, hi64, low64), mkU64(0))
avoids mkAND1 since mkAND1/mkOR1/mkNOT1 and generate poor code -- the
back end is not very clever with them.  It could do better.

Comment 6 Carl Love 2016-05-18 19:30:56 UTC

Created attachment 99056 [details]
updated, 0001-Power-PC-Add-support-for-ISA-3.0-part-3.patch

I like your idea for combining create_DCM_32() and create_DCM() so we don't have lots of similarly named functions.  But, I am not too keen on passing a list of IROp's to the function as it isn't really obvious from the function name why it would need them.  My preference would be to just pass the size of the expr to be created and then just setup the IROp's in the function.  I think it is cleaner to the reader and a bit less error prone.  Let the function make sure it generates 64 bit or 32 bit iops rather then the user messing up on one of the arguments.  Also eliminates the need to verify the IROp's passed are all compatible with each other.

I wasn't aware of the the issue with the back end generating code multiple times.  So basically the rule of thumb should be:  If you use a computed value multiple times, compute it into a temp not an expression.

With the above two things in mind, I rewrote create_DCM() as follows:

static IRExpr * create_DCM ( IRType size, IRTemp NaN, IRTemp inf, IRTemp zero,  
                             IRTemp dnorm, IRTemp pos)                          
{                                                                               
   /* This is a general function for creating the DCM for a 32-bit or           
      64-bit expression based on the passes size.                               
   */                                                                           
   IRTemp neg;                                                                  
   IROp opAND, opOR, opSHL;                                                     
                                                                                
   vassert( ( size == Ity_I32 ) || ( size == Ity_I64 ) );                       
                                                                                
   if ( size == Ity_I32 ) {                                                     
      opSHL = Iop_Shl32;                                                        
      opAND = Iop_And32;                                                        
      opOR  = Iop_Or32;                                                         
      neg = newTemp( Ity_I32 );                                                 
                                                                                
   } else {                                                                     
      opSHL = Iop_Shl64;                                                        
      opAND = Iop_And64;                                                        
      opOR  = Iop_Or64;                                                         
      neg = newTemp( Ity_I64 );                                                 
   }                                                                            
                                                                                
   assign( neg, unop( Iop_1Uto64, mkNOT1( unop( Iop_64to1,                      
                                                mkexpr ( pos ) ) ) ) );  
    return binop( opOR,                                                          
                 binop( opSHL, mkexpr( NaN ), mkU8( 6 ) ),                      
                 binop( opOR,                                                   
                        binop( opOR,                                            
                               binop( opOR,                                     
                                      binop( opSHL,                             
                                             binop( opAND,                      
                                                    mkexpr( pos ),              
                                                    mkexpr( inf ) ),            
                                             mkU8( 5 ) ),                       
                                      binop( opSHL,                             
                                             binop( opAND,                      
                                                    mkexpr( neg ),              
                                                    mkexpr( inf ) ),            
                                             mkU8( 4 ) ) ),                     
                               binop( opOR,                                     
                                      binop( opSHL,                             
                                             binop( opAND,                      
                                                    mkexpr( pos ),              
                                                    mkexpr( zero ) ),           
                                             mkU8( 3 ) ),                       
                                      binop( opSHL,                             
                                             binop( opAND,                      
                                                    mkexpr( neg ),              
                                                    mkexpr( zero ) ),           
                                             mkU8( 2 ) ) ) ),                   
                        binop( opOR,                                            
                               binop( opSHL,                                    
                                      binop( opAND,                             
                                             mkexpr( pos ),                     
                                             mkexpr( dnorm ) ),                 
                                      mkU8( 1 ) ),                              
                               binop( opAND,                                    
                                      mkexpr( neg ),                            
                                      mkexpr( dnorm ) ) ) ) );                  
}                                                          

I chose to just pass the values in as temps and not mess around with changing them in the function.  It seemed cleaner.

Does the function look reasonable?

I did something very similar to combine functions is_Denorm_32() and is_Denorm().   I updated Quad_precision_gt() to use temps not expressions when the computed value was used more then once.  I also updated is_Zero_V128 () as suggested.

Comment 7 Julian Seward 2016-05-30 10:26:54 UTC

(In reply to Carl Love from comment #6)

> [..]  My preference would be to just pass the size
> of the expr to be created and then just setup the IROp's in the function.  I
> think it is cleaner to the reader and a bit less error prone.  Let the
> function make sure it generates 64 bit or 32 bit iops rather then the user
> messing up on one of the arguments.

Yes, this all seems fine.  Thanks for doing the fixups.

> I wasn't aware of the the issue with the back end generating code multiple
> times.  So basically the rule of thumb should be:  If you use a computed
> value multiple times, compute it into a temp not an expression.

Yes, exactly.  Temps are always safe for multiple-use since they really just
become registers in the back end code generator.

Looks good to land.

Comment 8 Julian Seward 2016-05-30 10:30:39 UTC

(In reply to Carl Love from comment #2)
> Created attachment 98633 [details]
> Patch 3 of 5 to add testsuite support for the POWER ISA 3.0 instructions

Ok to land.  Also (obviously) for the second half, in comment #3.

Comment 9 Carl Love 2016-06-01 19:57:44 UTC

Updated patch committed.   

Vex commit 3220
Valgrind commit 15890