Bug 334834

Summary: PPC64 Little Endian support, patch 2
Product: [Developer tools] valgrind Reporter: Carl Love <cel>
Component: generalAssignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal CC: dejanjevtic87, mark, maynardj, mips32r2, philippe.waroquiers
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: PPC64 LE support patch, second patch in series
PPC64 LE support patch, second patch in series, updated
PPC64 LE support patch, second patch in series, update 2
Updated PPC64 LE support patch
PPC64 LE support patch, second patch in series, updated to fix a couple of typos
PPC64 LE support patch, second patch in series, updated with dynamic VEX Endian support
Updated PPC64 LE support patch to address feedback from Mark Wielaard
Updated PPC64 LE support patch to work with Julian's Endian patches
Updated PPC64 LE support patch to work with Julian's Endian patches

Description Carl Love 2014-05-15 19:28:10 UTC
This is the second patch in the series for adding Little Endian support for Valgrind.  It is dependent on the first patch in bugzilla 334384.  This patch has the meat of the changes.  Note, IBM changed the ABI when they added Little Endian support to Power.  This patch includes the LE specific code changes and the changes needed for the ABI change.  The LE ABI changes allow for a local and a global entry point to functions.

The third patch contains the test case changes.

Reproducible: Always
Comment 1 Carl Love 2014-05-15 19:33:11 UTC
Created attachment 86656 [details]
PPC64 LE support patch, second patch in series

Attached the patch.
Comment 2 Maynard Johnson 2014-05-15 21:23:15 UTC
acked.  I reviewed and tested this patch.
Comment 3 Philippe Waroquiers 2014-05-16 23:15:12 UTC
Quickly scanned the patch, skipped most of it (lack of ppc64 knowledge).

Two feedbacks below

diff --git a/coregrind/m_debuginfo/priv_storage.h b/coregrind/m_debuginfo/priv_storage.h
index 200949d..bcdcd0b 100644
--- a/coregrind/m_debuginfo/priv_storage.h
+++ b/coregrind/m_debuginfo/priv_storage.h
@@ -71,6 +71,7 @@ typedef
    struct { 
       Addr    addr;    /* lowest address of entity */
       Addr    tocptr;  /* ppc64-linux only: value that R2 should have */
+      Addr    second_ep; /* address for secondary entry point, ppc64le */
       HChar*  pri_name;  /* primary name, never NULL */
       HChar** sec_names; /* NULL, or a NULL term'd array of other names */
       // XXX: this could be shrunk (on 32-bit platforms) by using 30
@@ -265,7 +266,7 @@ typedef
       Int   x29_off;
    }
    DiCfSI;
This is (I guess) a memory critical data structure, e.g. seeing the comment
telling we could shrink it on 32 bits. We now have 2 Addr (tocptr and second_ep)
only used for ppc (BE and/or LE).
I am wondering if we could not avoid this ? Maybe by using a variable size
array at the end such as 
    Addr more_addr[]; // on ppc64BE, [0] is the tocptr. On ppc64LE; [1] is the second_ep.
and have setter/getters that do nothing on all platforms, except ppc64.
This is somewhat of a hack, not very nice, not sure it is worth or not.
A similar thing was done in memcheck mc_include.h struct _MC_Chunk
which is a memory critical structure.

diff --git a/coregrind/vgdb-invoker-ptrace.c b/coregrind/vgdb-invoker-ptrace.c
index b0e49a1..c8a775d 100644
--- a/coregrind/vgdb-invoker-ptrace.c
+++ b/coregrind/vgdb-invoker-ptrace.c
@@ -675,7 +675,7 @@ void restore_and_detach (pid_t pid)
    }
    if (signal_queue)
       ERROR (0, "One or more signals queued were not delivered. "
-             "First signal: %d\n", signal_queue);
+             "First signal ptr: %p\n", signal_queue);

Oops, that %d to print signal_queue is wrong. Nice catch.
However, the idea is really the print the first not delivered
signal. So, should be signal_queue[0].si_signo
Comment 4 Carl Love 2014-05-22 23:54:48 UTC
Created attachment 86785 [details]
PPC64 LE support patch, second patch in series, updated

This patch has the vgdb printing issue removed as that has now been fixed in a separate patch.  This patch does include a fix for the gdb testsuite that was causing some of the gdb tests to hang.  No changes have yet been made regarding the first comment from Philippe about the address of the second entry point.
Comment 5 Carl Love 2014-06-03 16:34:44 UTC
Created attachment 86995 [details]
PPC64 LE support patch, second patch in series, update 2

updated the gdb server code in coregrind/vgdb-invoker-ptrace.c, function  invoker_invoke_gdbserver() based on internal review by Ulrich.  

coregrind/m_debuginfo/priv_storage.h, put 
Addr    second_ep; /* address for secondary entry point, ppc64le */
into a #define for PPC64LE only to address comment by Phillippe

Patch cleanly applies to the 6/2/14 source tree.
Comment 6 Julian Seward 2014-06-09 09:49:19 UTC
This is the first time we've looked in detail at the problem of
cleanly integrating a big-and-little endian port of the same arch into
the tree.  The patches are definitely along the right lines, but the
existing infrastructure makes such big/little endian stuff a bit
awkward, so some adjustment of both the existing framework and the
patches are in order.

I identified two main areas of concern, and some smaller comments.


(1) Inside the VEX/ tree only: the use of 

    +#if defined(VGP_ppc64le_linux)
    +#define IENDIANESS   Iend_LE

    You may have noticed that VEX is relatively separate from the rest
    of the tree.  There are people who use VEX to unpick bits of
    machine code into IR and then do static analysis on it.  The
    effect of putting "#if defined(VGP_ppc64le_linux)" is to hardwire
    the assumption that the host and guest endiannesses are the same.
    This makes it impossible, for example, to write a program that
    uses VEX to unpick a ppc64le binary, that will work correctly
    on a ppc64be target.  Generally, VEX does not assume that the
    guest (the architecture it is analysing code for) and the host
    (on which it is running) are the same.  I would like to preserve
    that.

    Doing this cleanly will require modifying the framework a bit.
    disInstr_PPC (and all the other disInstr_ fns) take an argument
    which tells them whether the host is bigendian.  What we need to
    do is to pass in another Bool which says whether the guest is
    bigendian.  Then, every place that you currently use the
    ifdef-defined endianness in guest_ppc_toIR.c, use this new 
    guest_is_bigendian parameter instead.

    In the backend (host_ppc_toIR.c), use the host_is_bigendian
    that I think is plumbed in.

    In guest_ppc_helpers.c you introduce a big/little distinction
    for ppc64g_dirtyhelper_LVS.  We'll need to figure out whether
    that distinction is for the guest or the host, and adjust it
    accordingly.


(2) Ambiguity about the new meaning of unadorned "ppc64" / "PPC64"
    bits of text and file names.

    Now that "ppc64le" / "PPC64LE" definitely mean "the LE variant of
    64 bit PowerPC", we need to have a clear definition of what the
    old "ppc64" / "PPC64" mean.  In places the patches read as if it
    means "the BE variant of 64 bit PowerPC" but in other places they
    read as if it means "both LE and BE variants of 64 bit PowerPC".

    For example, this

     AM_CONDITIONAL(VGCONF_ARCHS_INCLUDE_PPC64, 
    -               test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX )
    +               test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX \
    +                -o x$VGCONF_PLATFORM_PRI_CAPS = xPPC64LE_LINUX )
    
    implies that VGCONF_ARCHS_INCLUDE_PPC64 means "either BE or LE"
    but the test itself
       test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX
         -o x$VGCONF_PLATFORM_PRI_CAPS = xPPC64LE_LINUX 
    implies that the plain "PPC64" means "BE only".

    Similarly

    -#if defined(VGP_ppc64_linux)
    +#if defined(VGP_ppc64_linux) || defined(VGP_ppc64le_linux)

    implies that "ppc64" means "BE only", but in other places (eg)
    dispatch-ppc64-linux.S, the "ppc64" clearly is intended to apply
    to both BE and LE variants.

    What I would prefer, to fix this cleanly, is:
    * if a file, macro, variable is BE only, call it *ppc64be*
    * if a file, macro, variable is LE only, call it *ppc64le*
    * if a file, macro, variable is both LE and BE, call it *ppc64*

    This will require quite a bit of file renaming and editing, so
    we'd need to coordinate that beforehand.

=================================================================

Smaller comments:

Some of these are better explained by the comments above.

The implementation patch appears to incorporate r2853, which
is confusing.  Did you create the patch using "svn diff -rHEAD" ?

---------

--- a/VEX/priv/guest_ppc_helpers.c
+++ b/VEX/priv/guest_ppc_helpers.c
@@ -45,6 +45,12 @@
 #include "guest_generic_bb_to_IR.h"
 #include "guest_ppc_defs.h"
 
+#if defined(VGP_ppc64le_linux)
+#define IENDIANESS   Iend_LE
+#else
+#define IENDIANESS   Iend_BE
+#endif
+

Please see comments above.

---------

@@ -530,7 +594,7 @@ static void storeBE ( IRExpr* addr, IRExpr* data )
 {
    IRType tyA = typeOfIRExpr(irsb->tyenv, addr);
    vassert(tyA == Ity_I32 || tyA == Ity_I64);
-   stmt( IRStmt_Store(Iend_BE, addr, data) );
+   stmt( IRStmt_Store(IENDIANESS, addr, data) );
 }
 
 static IRExpr* unop ( IROp op, IRExpr* a )
@@ -588,7 +652,7 @@ static IRExpr* mkV128 ( UShort i )
 /* This generates a normal (non load-linked) load. */
 static IRExpr* loadBE ( IRType ty, IRExpr* addr )
 {
-   return IRExpr_Load(Iend_BE, ty, addr);
+   return IRExpr_Load(IENDIANESS, ty, addr);
 }
 
This change makes the names loadBE and storeBE misleading.  

---------

@@ -17806,10 +17952,9 @@ static Bool dis_av_fp_arith ( UInt theInstr )
       DIP("vnmsubfp v%d,v%d,v%d,v%d\n",
           vD_addr, vA_addr, vC_addr, vB_addr);
       putVReg( vD_addr,
-               triop(Iop_Sub32Fx4, mkU32(Irrm_NEAREST),
+               triop(Iop_Sub32Fx4, rm,
                      mkexpr(vB),
-                     triop(Iop_Mul32Fx4, mkU32(Irrm_NEAREST),
-                           mkexpr(vA), mkexpr(vC))) );
+                     triop(Iop_Mul32Fx4, rm, mkexpr(vA), mkexpr(vC))) );
       return True;
    }
 
I'm confused.  This patch seems to contain elements of r2853, in which
I audited the uses of Iop_{Add,Sub,Mul}32Fx4 in the ppc front end.

---------

 AM_CONDITIONAL(VGCONF_ARCHS_INCLUDE_PPC64, 
-               test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX )
+               test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX \
+                -o x$VGCONF_PLATFORM_PRI_CAPS = xPPC64LE_LINUX )

This doesn't seem right to me.  In effect, the descriptions
"ppc64_*" and "ppc32_*" now refer to the BE variant only.  But
this conditional causes VGCONF_ARCHS_INCLUDE_PPC64 (implicitly, BE)
to become true even on LE platforms.

I think you need a whole new VGCONF_ARCHS_INCLUDE_PPC64LE to
parallel the implicit-BE VGCONF_ARCHS_INCLUDE_PPC64 version.

---------

--- a/coregrind/m_debuginfo/debuginfo.c
+++ b/coregrind/m_debuginfo/debuginfo.c
@@ -1,4 +1,5 @@
 
+

nit: Pls no whitespace changes.

---------

-#           elif defined(VGA_ppc32) || defined(VGA_ppc64)
+#           elif defined(VGA_ppc32) \
+                || defined(VGA_ppc64) || defined(VGA_ppc64le)

nit: push || one space to the right, so it's underneath "defined" above

---------

@@ -3862,6 +3864,7 @@ void VG_(DebugInfo_syms_getidx) ( const DebugInfo *si,
                                         Int idx,
                                   /*OUT*/Addr*    avma,
                                   /*OUT*/Addr*    tocptr,
+                                  /*OUT*/Addr*    second_ep,
                                   /*OUT*/UInt*    size,
                                   /*OUT*/HChar**  pri_name,
                                   /*OUT*/HChar*** sec_names,
@@ -3871,6 +3874,9 @@ void VG_(DebugInfo_syms_getidx) ( const DebugInfo *si,
    vg_assert(idx >= 0 && idx < si->symtab_used);
    if (avma)      *avma      = si->symtab[idx].addr;
    if (tocptr)    *tocptr    = si->symtab[idx].tocptr;
+#if defined(VGP_ppc64le_linux)
+   if (second_ep) *second_ep = si->symtab[idx].second_ep;
+#endif 
    if (size)      *size      = si->symtab[idx].size;
    if (pri_name)  *pri_name  = si->symtab[idx].pri_name;
    if (sec_names) *sec_names = (HChar **)si->symtab[idx].sec_names; // FIXME

If second_ep is ifdef'd in the body of the function, also ifdef it in
the formal parameters.  Or (better) remove the ifdef in the code.  The
situation I want to avoid is where someone mistakenly looks at the
returned value on a non-ppc64le platform and gets undefined garbage
as a result.

Overall I would like that any (mistaken) use of second_ep values
on non-ppc64le targets gets a defined zero rather than undefined junk.

---------

-   "normal" case ({x86,amd64,ppc32,arm,mips32,mips64}-linux). */
+   "normal" case ({x86,amd64,ppc32,arm,mips32,mips64}-linux, ppc64le). */

If ppc64le is "normal" here, then put it inside the { } list.

---------

Again, using the ppc64-* facilities for ppc64le means you are
effectively creating ambiguity in the meaning of the unadorned
ppc64 name: is it "be" or "both be and le" ?

What is _CALL_ELF ?  Please add some comment explaining it.

--- a/coregrind/m_dispatch/dispatch-ppc64-linux.S
+++ b/coregrind/m_dispatch/dispatch-ppc64-linux.S
@@ -28,7 +28,7 @@
   The GNU General Public License is contained in the file COPYING.
 */
 
-#if defined(VGP_ppc64_linux)
+#if defined(VGP_ppc64_linux) || defined(VGP_ppc64le_linux)
 
 #include "pub_core_basics_asm.h"
 #include "pub_core_dispatch_asm.h"
@@ -74,14 +74,26 @@ void VG_(disp_run_translations)( UWord* two_words,
 .section ".text"
 .align   2
 .globl   VG_(disp_run_translations)
+#if !defined VGP_ppc64_linux || _CALL_ELF == 2
+.type VG_(disp_run_translations),@function
+VG_(disp_run_translations):
+.type    .VG_(disp_run_translations),@function
+#else

---------

--- a/coregrind/m_libcsetjmp.c
+++ b/coregrind/m_libcsetjmp.c
@@ -29,7 +29,6 @@
 
 /* Contributed by Julian Seward <jseward@acm.org> */
 
-

nit: whitespace change
Comment 7 Carl Love 2014-06-20 17:11:38 UTC
Created attachment 87313 [details]
Updated PPC64 LE support patch

This is an updated function patch.  The patch is still a work in progress.  So far, just the naming of the #defines and functions have been addressed in Julian's comments.  The goal is to agree that the naming is all consistent and correct before tackling the other issues Julian noted in the previous patch.  The claim is that ppc64/PPC64 refers to cod that are common for BE and LE.  ppc64le/PPC64LE is for LE specific code and ppc64be and PPC64BE is for BE specific code.
Comment 8 Carl Love 2014-06-25 18:28:31 UTC
Created attachment 87401 [details]
PPC64 LE support patch, second patch in series, updated to fix a couple of typos
Comment 9 Carl Love 2014-06-27 20:01:29 UTC
Created attachment 87435 [details]
PPC64 LE support patch, second patch in series, updated with dynamic VEX Endian support

The patch has been updated to include the naming changes, the LE functional changes and the changes to needed to have the VEX code dynamically select LE or BE per Julian's previous comments.
Comment 10 Julian Seward 2014-07-07 21:11:37 UTC
(In reply to comment #9)
> Created attachment 87435 [details]


Overall it's much improved from the previous version.  There are
various small points listed below.  The primary remaining concern is
that the removal of ifdefs from VEX is only partially done.  The front
end (guest_ppc_toIR.c) is now done dynamically using the passed-in
is-the-guest-bigendian flag, but the back end (host_ppc_toIR.c and
host_ppc_defs.c) still uses ifdefs instead of passing in an
is-the-host-bigendian flag.  This can be done by, in
LibVEX_Translate(), passing host_is_bigendian to iselSB() and emit().


+++ b/VEX/priv/guest_ppc_helpers.c
void ppc64g_dirtyhelper_LVS ( VexGuestPPC64State* gst,
+                              Bool guest_is_BE )
and
+  if (!guest_is_BE) {

I'm not confident that the ppc64 back end won't pass junk in the
higher parts of the argument register for |guest_is_BE|, that might
confuse the comparison.  I think it would be safer to declare
guest_is_BE as a UInt, make sure that the front end (guest_ppc_toIR.c)
passes only 1 or 0 to it, and "& 1" the argument in
ppc64g_dirtyhelper_LVS before using it.  (Probably totally paranoid
overkill, I know ..)


-static UInt getUIntPPCendianly ( UChar* p )
+static UInt getUIntPPCendianly ( UChar* p, Bool guest_is_BE )
It's a silly name (always was).  Let's rename it to getPPCInstruction.


/* Floating point egisters are mapped to VSX registers[0..31]. */
Can we add back the missing 'r'?


VEX/priv/host_ppc_defs.c
+/* Emit an instruction ppc-endianly */
 static UChar* emit32 ( UChar* p, UInt w32 )
+#if defined(VGP_ppc32_linux) || defined(VGP_ppc64be_linux)

This should be done by plumbing host_is_BE to this point and doing
a run time test.  My aim in this is to get rid of all of the 
target #ifdefs that we can from VEX/.


VEX/priv/host_ppc_isel.c
Same all throughout here.


+++ b/coregrind/launcher-darwin.c
+   { CPU_TYPE_POWERPC64LE, "ppc64le", "ppc64le" },
I seriously doubt there will be a ppc64le version of MacOSX in the
future, so I think this isn't necessary.


coregrind/m_debuginfo changes:
You have a number of places where the second entry point is talked
about, eg
+      Addr    second_ep; /* address for secondary entry point, ppc64le */
It's not clear (to me, at least) from "second" whether this is the
slow- or fast- entry point.  Can you choose a different word that
makes it clearer to the reader which it is?  perhaps one of
(slow/fast) or (global/local).  From reading further down, perhaps
global/local is the better choice.


coregrind/m_dispatch/dispatch-ppc64-linux.S
The ifdeffery added to this file makes it quite hard to read.  I would
prefer that you make a new file, dispatch-ppc64le-linux.S, and rename
the existing one to dispatch-ppc64be-linux.S.


+#elif defined(VGP_ppc64le_linux)
+__asm__(
+".section \".toc\",\"aw\""          "\n"
+
Do you actually need the .toc section line there?


coregrind/m_redir.c
Same comments as above about second_ep -- I'd prefer if you used
local or global.


coregrind/m_syswrap/syscall-ppc64-linux.S
Like with dispatch-ppc64-linux.S, please make a new file with
the le stuff in.
Comment 11 Carl Love 2014-07-21 20:34:21 UTC
Created attachment 87865 [details]
Updated PPC64 LE support patch to address feedback from Mark Wielaard

Patch updated to address comments from Mark.  Note, the comments are being addressed in this update but the patch will be updated again once the dynamic support for the Endianess is done per Julian's comments.  Just wanted to get these changes incorporated.

> The memcheck/tests/atomic_incs.c tests/Makefile.am tests/check_isa-2_06_cap
tests/check_isa-2_07_cap tests/is_ppc64_BE.c tests/power_insn_available.c
changes already came in the earlier 2 patches.
  
Moved to third patch

> The ifunc_wrapper in coregrind/vg_preloaded.c which is new in svn. Should
that use #if defined(VGP_ppc64be_linux) or is there a better way to determine
when function descriptors are used?

For the moment, yes the #define is the best.  There is work going on to update this patch to make the Endianess dynamic.

> none/tests/ppc32/jm-insns.c got adds a #if defined(VGP_ppc64_linux) #elif
defined(VGP_ppc64le_linux) pair. The first should be #if
defined(VGP_ppc64be_linux)

Fixed
Comment 12 Carl Love 2014-07-29 19:24:40 UTC
Created attachment 88018 [details]
Updated PPC64 LE support patch to work with Julian's Endian patches

The patch was reworked to work with the latest trunk changes to support BE and LE.  Additionally, the comments from Julian on the previous version of the patch were addressed.
Comment 13 Carl Love 2014-08-07 22:48:21 UTC
Created attachment 88153 [details]
Updated PPC64 LE support patch to work with Julian's Endian patches

Fixed a couple of issues found by Julian during his review.  One of the issues would have caused issues on non PPC64 systems.  Fixes were in file coregrind/m_ume/elf.c

Patch was also updated to the latest Valgrind code base as of Aug 7, 2014 in preparation for comitting.
Comment 14 Carl Love 2014-08-07 23:37:17 UTC
Committed, Valgrind revision 14239, VEX revision 2914
Comment 15 Philippe Waroquiers 2014-08-08 19:58:06 UTC
It looks like there is a very big regression in performance on ppc64
(factor 10 for some tools).
See nightly perf results
http://sourceforge.net/p/valgrind/mailman/message/32695000/

Probably due to the ppc64 patches, as only ppc64 perf seems impacted.
Comment 16 Philippe Waroquiers 2014-08-08 19:59:31 UTC
Also a new ppc32 failure appeared:
+ none/tests/ppc32/round                   (stdout)
Comment 17 Carl Love 2014-08-08 22:35:17 UTC
VEX commit 2915 fixes a compiler warning about "dres.continueAt" being uninitialized in function disInstr_PPC ().  The new if test for non 64 bit systems in LE mode did not set dres.continueAt to zero.  

Valgrind commit 14246, fixes an issue with the Makefile.all.am where the -02 and -m64 flags were not passed to compile the PPC64 VEX code.  This resulted in a significant performance regression.  The commit fixes passing the needed compile line arguments.