Bug 363858 - Add IBM ISA 3.0 support, patch set 4
Summary: Add IBM ISA 3.0 support, patch set 4
Status: CLOSED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: vex (show other bugs)
Version: 3.12 SVN
Platform: unspecified Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-02 20:10 UTC by Carl Love
Modified: 2016-07-05 15:10 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Patch 4 of 5 to add VEX support for the POWER ISA 3.0 instructions (118.59 KB, patch)
2016-06-02 20:12 UTC, Carl Love
Details
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions (3.50 MB, patch)
2016-06-02 20:20 UTC, Carl Love
Details
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 2 (2.82 MB, patch)
2016-06-02 20:25 UTC, Carl Love
Details
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 3 (2.00 MB, patch)
2016-06-02 20:27 UTC, Carl Love
Details
Patch 4 of 5 to add VEX support for the POWER ISA 3.0 instructions (129.22 KB, patch)
2016-06-27 20:16 UTC, Carl Love
Details
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 1 (3.50 MB, patch)
2016-06-27 20:17 UTC, Carl Love
Details
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 2 (2.82 MB, patch)
2016-06-27 20:19 UTC, Carl Love
Details
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 3 (2.24 MB, patch)
2016-06-27 20:21 UTC, Carl Love
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carl Love 2016-06-02 20:10:11 UTC
The IBM ISA 3.0 has been released. This bugzilla is the fourth in a series of five bugzillas for adding the needed ISA 3.0 support to Valgrind. The series of bugzillas are: 359767, 361207, 362329.

Reproducible: Always
Comment 1 Carl Love 2016-06-02 20:12:25 UTC
Created attachment 99327 [details]
Patch 4 of 5 to add VEX support for the POWER ISA 3.0 instructions

Patch 4 of 5 to add VEX support for the POWER ISA 3.0 instructions This patch add support to emulate more of the ISA 3.0 instruction in Valgrind.
Comment 2 Carl Love 2016-06-02 20:20:45 UTC
Created attachment 99328 [details]
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions

Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions This patch adds test suite support for the instructions added in the first patch.

Note, I had to manually chop this patch into multiple pieces due to the limit on the size of an attachment.  This part has the test_isa_3.0.c changes and the first part of the expected results.
Comment 3 Carl Love 2016-06-02 20:25:18 UTC
Created attachment 99329 [details]
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 2

Second part of the testsuite patch.  More of the expected output.
Comment 4 Carl Love 2016-06-02 20:27:48 UTC
Created attachment 99330 [details]
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 3

final part of the testsuite patch expected results. 

The vgtest file change is at the very end of this part.
Comment 5 Julian Seward 2016-06-22 15:38:00 UTC
(In reply to Carl Love from comment #1)
> Created attachment 99327 [details]
> Patch 4 of 5 to add VEX support for the POWER ISA 3.0 instructions

Main comments are:

* does this have any performance effect on plain boring integer
  code?  I ask because it seems like it might affect the IR generated
  for CR321 and CR0, which are used a lot for comparisons, but I am
  not sure.

  "make perf", then
    perf/vg_perf --vg=/path/to/old/tree --vg=/path/to/new/tree --reps=5
  gives you some idea

* I am somewhat concerned that some of these translations are
  going to generate a huge amount of IR and hence a huge amount of
  code at the back end, and I wonder if the more complex ones
  would be better done with C helper functions.

  Can you trigger any JIT assertion failures with worst-case code-size
  settings, by running all the test cases with
  --tool=memcheck --track-origins=yes --vex-iropt-level=0
  ?


* And some minor things below.


VEX/priv/guest_ppc_toIR.c

You swapped putCR321 and putCR0 ?  Any actual change?  Is confusing.
Please un-swap.


+static IRExpr * CmpGT128U (IRExpr *src1, IRExpr *src2)
+   /* Do and unsigend compare of two 128-bit values */

Fix typos


+static IRTemp increment_BCDstring (IRExpr *src, IRExpr *carry_in)
This seems like it will generate a huge amount of IR.  Is it
something you can do in a helper function, in the style of
amd64g_dirtyhelper_AESKEYGENASSIST ?


+static Bool dis_int_misc ( UInt theInstr )
+       * to be suspende.  Instruction fetching and execution are resumed
suspende += d

Un-indent the default: case one level.  It is misleadingly indented.


+   case 0x12D: // lxvll (Load VSX Vector Left-Justified with Length XX1 form)
Without looking at the context I can't tell, but .. does this need "gating"
(ie is only allowed for some hardware variants) ?

+      /* nb_zero is 0xFF..FF if the nb_field = 0 */
+      assign( nb_zero, binop( Iop_64HLtoV128,
+                              unop( Iop_1Sto64,
+                                    binop( Iop_CmpEQ64,
+                                           mkexpr( nb ),
+                                           mkU64( 0 ) ) ),
+                              unop( Iop_1Sto64,
+                                    binop( Iop_CmpEQ64,
+                                           mkexpr( nb ),
+                                           mkU64( 0 ) ) ) ) );

Lift the duplicated 1Sto64(CmpEQ64, nb, 0) term onto its own IRTemp
and use that.  Or maybe we should enable CSE by default in iropt for
POWER?  That has a JIT-time cost, though.


+   case 0x1AD: // stxvll (Store VSX Vector Left-justified with length XX1-form)
Comments as for lxvll


+static Bool dis_av_bcd_misc ( UInt theInstr )
+   switch (opc2) {
+   case 0x341: // bcdcpsgn. Decimal Copy Sign VX-form
+            {
Please fix indentation to match house style
Comment 6 Julian Seward 2016-06-22 15:44:24 UTC
The test cases look OK to me.

One thing: did you check this tree still builds/runs fine on older hardware?
eg, at least on Power5 64 and 32-bit and preferably some older 32-bit too?
Comment 7 Carl Love 2016-06-27 20:00:07 UTC
I ran the perfomance tests comparing the source tree with patch 3 and the source tree with patch 4.  The results were run for 10 repetitions and 5 repetitions.  The results for the two are very consistent.  The 10 repetition results are as follows:

-- Running  tests in perf ----------------------------------------------                      
-- bigcode1 --                                                                                
bigcode1 valgrind-patch3:0.19s  no: 2.1s (11.0x, -----)  me: 4.4s (23.0x, -----)              
bigcode1 valgrind-patch4:0.19s  no: 2.1s (10.9x,  1.0%)  me: 4.3s (22.9x,  0.5%)              
-- bigcode2 --                                                                                
bigcode2 valgrind-patch3:0.19s  no: 4.7s (24.6x, -----)  me:10.7s (56.4x, -----)              
bigcode2 valgrind-patch4:0.19s  no: 4.7s (24.6x,  0.0%)  me:10.7s (56.4x,  0.1%)              
-- bz2 --                                                                                     
bz2      valgrind-patch3:0.57s  no: 3.4s ( 5.9x, -----)  me: 8.6s (15.1x, -----)              
bz2      valgrind-patch4:0.57s  no: 3.4s ( 6.0x, -0.9%)  me: 8.6s (15.1x, -0.2%)              
-- fbench --                                                                                  
fbench   valgrind-patch3:0.25s  no: 1.7s ( 6.7x, -----)  me: 4.3s (17.3x, -----)              
fbench   valgrind-patch4:0.25s  no: 1.7s ( 6.8x, -1.8%)  me: 4.3s (17.3x, -0.2%)              
-- ffbench --                                                                                 
ffbench  valgrind-patch3:0.19s  no: 0.7s ( 3.6x, -----)  me: 1.8s ( 9.6x, -----)              
ffbench  valgrind-patch4:0.19s  no: 0.7s ( 3.6x,  0.0%)  me: 1.8s ( 9.6x,  0.5%)              
-- heap --                                                                                    
heap     valgrind-patch3:0.25s  no: 1.7s ( 6.9x, -----)  me: 7.5s (29.9x, -----)              
heap     valgrind-patch4:0.25s  no: 1.7s ( 6.8x,  1.2%)  me: 7.5s (29.8x,  0.4%)              
-- heap_pdb4 --                                                                               
heap_pdb4 valgrind-patch3:0.29s  no: 1.8s ( 6.3x, -----)  me:10.8s (37.2x, -----)             
heap_pdb4 valgrind-patch4:0.29s  no: 1.8s ( 6.3x,  0.0%)  me:10.8s (37.3x, -0.1%)             
-- many-loss-records --                                                                       
many-loss-records valgrind-patch3:0.03s  no: 0.4s (13.3x, -----)  me: 1.6s (55.0x, -----)     
many-loss-records valgrind-patch4:0.03s  no: 0.4s (13.7x, -2.5%)  me: 1.7s (55.3x, -0.6%)     
-- many-xpts --                                                                               
many-xpts valgrind-patch3:0.05s  no: 0.6s (11.4x, -----)  me: 2.2s (44.0x, -----)             
many-xpts valgrind-patch4:0.05s  no: 0.6s (11.4x,  0.0%)  me: 2.2s (43.6x,  0.9%)             
-- memrw --                                                                                   
memrw    valgrind-patch3:0.05s  no: 0.8s (17.0x, -----)  me: 1.9s (39.0x, -----)              
memrw    valgrind-patch4:0.05s  no: 0.8s (16.8x,  1.2%)  me: 2.0s (39.2x, -0.5%)              
-- sarp --                                                                                    
sarp     valgrind-patch3:0.02s  no: 0.3s (16.0x, -----)  me: 2.5s (125.5x, -----)             
sarp     valgrind-patch4:0.02s  no: 0.3s (16.0x,  0.0%)  me: 2.5s (125.5x,  0.0%)             
-- tinycc --                                                                                  
tinycc   valgrind-patch3:0.21s  no: 2.3s (11.0x, -----)  me:10.1s (48.2x, -----)              
tinycc   valgrind-patch4:0.21s  no: 2.3s (11.0x,  0.0%)  me:10.3s (49.1x, -1.9%)              
-- Finished tests in perf ----------------------------------------------                      
                                                                                              
== 12 programs, 48 timings =================                                   

I investigated the second comment by running the requested test.  After fixing a bug that was found found with some missing register support, it ran fine.  After some private discussions with Julian, the  is_BCDstring(), increment_BCDstring(), and the support for bcdctz., bcdctn., bcdcfz., bcdcfn., bcdsetsgn.instructions were reimplemented using clean helpers.

The minor comments:.
There is no change in the functions, putCR321 and putCR0.  They were unswapped so there is no change in the patch. 

The duplicated code was replaced with a temp.

The typos were fixed.

Note, it was found that one of the instructions in the test suite was missing.  The test suite was updated.

The VEX support patch and test suite patch have been updated.
Comment 8 Carl Love 2016-06-27 20:16:29 UTC
Created attachment 99731 [details]
Patch 4 of 5 to add VEX support for the POWER ISA 3.0 instructions
Comment 9 Carl Love 2016-06-27 20:17:52 UTC
Created attachment 99732 [details]
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 1
Comment 10 Carl Love 2016-06-27 20:19:56 UTC
Created attachment 99733 [details]
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 2
Comment 11 Carl Love 2016-06-27 20:21:03 UTC
Created attachment 99734 [details]
Patch 4 of 5 to add testsuite support for the POWER ISA 3.0 instructions, part 3
Comment 12 Julian Seward 2016-06-29 08:16:32 UTC
All looks ok to me.  OK to land.  Thanks for the fixings.
Comment 13 Carl Love 2016-06-29 18:26:44 UTC
Patches committed

VEX commit 3222
valgrind commit 15896

The patches were tested on 64-bit Power 6, Power 7, Power 8 big endian and Power 8 little endian machines.  The regression tests showed no new test failures on these systems.