This is the second patch in the series for adding Little Endian support for Valgrind. It is dependent on the first patch in bugzilla 334384. This patch has the meat of the changes. Note, IBM changed the ABI when they added Little Endian support to Power. This patch includes the LE specific code changes and the changes needed for the ABI change. The LE ABI changes allow for a local and a global entry point to functions. The third patch contains the test case changes. Reproducible: Always
Created attachment 86656 [details] PPC64 LE support patch, second patch in series Attached the patch.
acked. I reviewed and tested this patch.
Quickly scanned the patch, skipped most of it (lack of ppc64 knowledge). Two feedbacks below diff --git a/coregrind/m_debuginfo/priv_storage.h b/coregrind/m_debuginfo/priv_storage.h index 200949d..bcdcd0b 100644 --- a/coregrind/m_debuginfo/priv_storage.h +++ b/coregrind/m_debuginfo/priv_storage.h @@ -71,6 +71,7 @@ typedef struct { Addr addr; /* lowest address of entity */ Addr tocptr; /* ppc64-linux only: value that R2 should have */ + Addr second_ep; /* address for secondary entry point, ppc64le */ HChar* pri_name; /* primary name, never NULL */ HChar** sec_names; /* NULL, or a NULL term'd array of other names */ // XXX: this could be shrunk (on 32-bit platforms) by using 30 @@ -265,7 +266,7 @@ typedef Int x29_off; } DiCfSI; This is (I guess) a memory critical data structure, e.g. seeing the comment telling we could shrink it on 32 bits. We now have 2 Addr (tocptr and second_ep) only used for ppc (BE and/or LE). I am wondering if we could not avoid this ? Maybe by using a variable size array at the end such as Addr more_addr[]; // on ppc64BE, [0] is the tocptr. On ppc64LE; [1] is the second_ep. and have setter/getters that do nothing on all platforms, except ppc64. This is somewhat of a hack, not very nice, not sure it is worth or not. A similar thing was done in memcheck mc_include.h struct _MC_Chunk which is a memory critical structure. diff --git a/coregrind/vgdb-invoker-ptrace.c b/coregrind/vgdb-invoker-ptrace.c index b0e49a1..c8a775d 100644 --- a/coregrind/vgdb-invoker-ptrace.c +++ b/coregrind/vgdb-invoker-ptrace.c @@ -675,7 +675,7 @@ void restore_and_detach (pid_t pid) } if (signal_queue) ERROR (0, "One or more signals queued were not delivered. " - "First signal: %d\n", signal_queue); + "First signal ptr: %p\n", signal_queue); Oops, that %d to print signal_queue is wrong. Nice catch. However, the idea is really the print the first not delivered signal. So, should be signal_queue[0].si_signo
Created attachment 86785 [details] PPC64 LE support patch, second patch in series, updated This patch has the vgdb printing issue removed as that has now been fixed in a separate patch. This patch does include a fix for the gdb testsuite that was causing some of the gdb tests to hang. No changes have yet been made regarding the first comment from Philippe about the address of the second entry point.
Created attachment 86995 [details] PPC64 LE support patch, second patch in series, update 2 updated the gdb server code in coregrind/vgdb-invoker-ptrace.c, function invoker_invoke_gdbserver() based on internal review by Ulrich. coregrind/m_debuginfo/priv_storage.h, put Addr second_ep; /* address for secondary entry point, ppc64le */ into a #define for PPC64LE only to address comment by Phillippe Patch cleanly applies to the 6/2/14 source tree.
This is the first time we've looked in detail at the problem of cleanly integrating a big-and-little endian port of the same arch into the tree. The patches are definitely along the right lines, but the existing infrastructure makes such big/little endian stuff a bit awkward, so some adjustment of both the existing framework and the patches are in order. I identified two main areas of concern, and some smaller comments. (1) Inside the VEX/ tree only: the use of +#if defined(VGP_ppc64le_linux) +#define IENDIANESS Iend_LE You may have noticed that VEX is relatively separate from the rest of the tree. There are people who use VEX to unpick bits of machine code into IR and then do static analysis on it. The effect of putting "#if defined(VGP_ppc64le_linux)" is to hardwire the assumption that the host and guest endiannesses are the same. This makes it impossible, for example, to write a program that uses VEX to unpick a ppc64le binary, that will work correctly on a ppc64be target. Generally, VEX does not assume that the guest (the architecture it is analysing code for) and the host (on which it is running) are the same. I would like to preserve that. Doing this cleanly will require modifying the framework a bit. disInstr_PPC (and all the other disInstr_ fns) take an argument which tells them whether the host is bigendian. What we need to do is to pass in another Bool which says whether the guest is bigendian. Then, every place that you currently use the ifdef-defined endianness in guest_ppc_toIR.c, use this new guest_is_bigendian parameter instead. In the backend (host_ppc_toIR.c), use the host_is_bigendian that I think is plumbed in. In guest_ppc_helpers.c you introduce a big/little distinction for ppc64g_dirtyhelper_LVS. We'll need to figure out whether that distinction is for the guest or the host, and adjust it accordingly. (2) Ambiguity about the new meaning of unadorned "ppc64" / "PPC64" bits of text and file names. Now that "ppc64le" / "PPC64LE" definitely mean "the LE variant of 64 bit PowerPC", we need to have a clear definition of what the old "ppc64" / "PPC64" mean. In places the patches read as if it means "the BE variant of 64 bit PowerPC" but in other places they read as if it means "both LE and BE variants of 64 bit PowerPC". For example, this AM_CONDITIONAL(VGCONF_ARCHS_INCLUDE_PPC64, - test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX ) + test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX \ + -o x$VGCONF_PLATFORM_PRI_CAPS = xPPC64LE_LINUX ) implies that VGCONF_ARCHS_INCLUDE_PPC64 means "either BE or LE" but the test itself test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX -o x$VGCONF_PLATFORM_PRI_CAPS = xPPC64LE_LINUX implies that the plain "PPC64" means "BE only". Similarly -#if defined(VGP_ppc64_linux) +#if defined(VGP_ppc64_linux) || defined(VGP_ppc64le_linux) implies that "ppc64" means "BE only", but in other places (eg) dispatch-ppc64-linux.S, the "ppc64" clearly is intended to apply to both BE and LE variants. What I would prefer, to fix this cleanly, is: * if a file, macro, variable is BE only, call it *ppc64be* * if a file, macro, variable is LE only, call it *ppc64le* * if a file, macro, variable is both LE and BE, call it *ppc64* This will require quite a bit of file renaming and editing, so we'd need to coordinate that beforehand. ================================================================= Smaller comments: Some of these are better explained by the comments above. The implementation patch appears to incorporate r2853, which is confusing. Did you create the patch using "svn diff -rHEAD" ? --------- --- a/VEX/priv/guest_ppc_helpers.c +++ b/VEX/priv/guest_ppc_helpers.c @@ -45,6 +45,12 @@ #include "guest_generic_bb_to_IR.h" #include "guest_ppc_defs.h" +#if defined(VGP_ppc64le_linux) +#define IENDIANESS Iend_LE +#else +#define IENDIANESS Iend_BE +#endif + Please see comments above. --------- @@ -530,7 +594,7 @@ static void storeBE ( IRExpr* addr, IRExpr* data ) { IRType tyA = typeOfIRExpr(irsb->tyenv, addr); vassert(tyA == Ity_I32 || tyA == Ity_I64); - stmt( IRStmt_Store(Iend_BE, addr, data) ); + stmt( IRStmt_Store(IENDIANESS, addr, data) ); } static IRExpr* unop ( IROp op, IRExpr* a ) @@ -588,7 +652,7 @@ static IRExpr* mkV128 ( UShort i ) /* This generates a normal (non load-linked) load. */ static IRExpr* loadBE ( IRType ty, IRExpr* addr ) { - return IRExpr_Load(Iend_BE, ty, addr); + return IRExpr_Load(IENDIANESS, ty, addr); } This change makes the names loadBE and storeBE misleading. --------- @@ -17806,10 +17952,9 @@ static Bool dis_av_fp_arith ( UInt theInstr ) DIP("vnmsubfp v%d,v%d,v%d,v%d\n", vD_addr, vA_addr, vC_addr, vB_addr); putVReg( vD_addr, - triop(Iop_Sub32Fx4, mkU32(Irrm_NEAREST), + triop(Iop_Sub32Fx4, rm, mkexpr(vB), - triop(Iop_Mul32Fx4, mkU32(Irrm_NEAREST), - mkexpr(vA), mkexpr(vC))) ); + triop(Iop_Mul32Fx4, rm, mkexpr(vA), mkexpr(vC))) ); return True; } I'm confused. This patch seems to contain elements of r2853, in which I audited the uses of Iop_{Add,Sub,Mul}32Fx4 in the ppc front end. --------- AM_CONDITIONAL(VGCONF_ARCHS_INCLUDE_PPC64, - test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX ) + test x$VGCONF_PLATFORM_PRI_CAPS = xPPC64_LINUX \ + -o x$VGCONF_PLATFORM_PRI_CAPS = xPPC64LE_LINUX ) This doesn't seem right to me. In effect, the descriptions "ppc64_*" and "ppc32_*" now refer to the BE variant only. But this conditional causes VGCONF_ARCHS_INCLUDE_PPC64 (implicitly, BE) to become true even on LE platforms. I think you need a whole new VGCONF_ARCHS_INCLUDE_PPC64LE to parallel the implicit-BE VGCONF_ARCHS_INCLUDE_PPC64 version. --------- --- a/coregrind/m_debuginfo/debuginfo.c +++ b/coregrind/m_debuginfo/debuginfo.c @@ -1,4 +1,5 @@ + nit: Pls no whitespace changes. --------- -# elif defined(VGA_ppc32) || defined(VGA_ppc64) +# elif defined(VGA_ppc32) \ + || defined(VGA_ppc64) || defined(VGA_ppc64le) nit: push || one space to the right, so it's underneath "defined" above --------- @@ -3862,6 +3864,7 @@ void VG_(DebugInfo_syms_getidx) ( const DebugInfo *si, Int idx, /*OUT*/Addr* avma, /*OUT*/Addr* tocptr, + /*OUT*/Addr* second_ep, /*OUT*/UInt* size, /*OUT*/HChar** pri_name, /*OUT*/HChar*** sec_names, @@ -3871,6 +3874,9 @@ void VG_(DebugInfo_syms_getidx) ( const DebugInfo *si, vg_assert(idx >= 0 && idx < si->symtab_used); if (avma) *avma = si->symtab[idx].addr; if (tocptr) *tocptr = si->symtab[idx].tocptr; +#if defined(VGP_ppc64le_linux) + if (second_ep) *second_ep = si->symtab[idx].second_ep; +#endif if (size) *size = si->symtab[idx].size; if (pri_name) *pri_name = si->symtab[idx].pri_name; if (sec_names) *sec_names = (HChar **)si->symtab[idx].sec_names; // FIXME If second_ep is ifdef'd in the body of the function, also ifdef it in the formal parameters. Or (better) remove the ifdef in the code. The situation I want to avoid is where someone mistakenly looks at the returned value on a non-ppc64le platform and gets undefined garbage as a result. Overall I would like that any (mistaken) use of second_ep values on non-ppc64le targets gets a defined zero rather than undefined junk. --------- - "normal" case ({x86,amd64,ppc32,arm,mips32,mips64}-linux). */ + "normal" case ({x86,amd64,ppc32,arm,mips32,mips64}-linux, ppc64le). */ If ppc64le is "normal" here, then put it inside the { } list. --------- Again, using the ppc64-* facilities for ppc64le means you are effectively creating ambiguity in the meaning of the unadorned ppc64 name: is it "be" or "both be and le" ? What is _CALL_ELF ? Please add some comment explaining it. --- a/coregrind/m_dispatch/dispatch-ppc64-linux.S +++ b/coregrind/m_dispatch/dispatch-ppc64-linux.S @@ -28,7 +28,7 @@ The GNU General Public License is contained in the file COPYING. */ -#if defined(VGP_ppc64_linux) +#if defined(VGP_ppc64_linux) || defined(VGP_ppc64le_linux) #include "pub_core_basics_asm.h" #include "pub_core_dispatch_asm.h" @@ -74,14 +74,26 @@ void VG_(disp_run_translations)( UWord* two_words, .section ".text" .align 2 .globl VG_(disp_run_translations) +#if !defined VGP_ppc64_linux || _CALL_ELF == 2 +.type VG_(disp_run_translations),@function +VG_(disp_run_translations): +.type .VG_(disp_run_translations),@function +#else --------- --- a/coregrind/m_libcsetjmp.c +++ b/coregrind/m_libcsetjmp.c @@ -29,7 +29,6 @@ /* Contributed by Julian Seward <jseward@acm.org> */ - nit: whitespace change
Created attachment 87313 [details] Updated PPC64 LE support patch This is an updated function patch. The patch is still a work in progress. So far, just the naming of the #defines and functions have been addressed in Julian's comments. The goal is to agree that the naming is all consistent and correct before tackling the other issues Julian noted in the previous patch. The claim is that ppc64/PPC64 refers to cod that are common for BE and LE. ppc64le/PPC64LE is for LE specific code and ppc64be and PPC64BE is for BE specific code.
Created attachment 87401 [details] PPC64 LE support patch, second patch in series, updated to fix a couple of typos
Created attachment 87435 [details] PPC64 LE support patch, second patch in series, updated with dynamic VEX Endian support The patch has been updated to include the naming changes, the LE functional changes and the changes to needed to have the VEX code dynamically select LE or BE per Julian's previous comments.
(In reply to comment #9) > Created attachment 87435 [details] Overall it's much improved from the previous version. There are various small points listed below. The primary remaining concern is that the removal of ifdefs from VEX is only partially done. The front end (guest_ppc_toIR.c) is now done dynamically using the passed-in is-the-guest-bigendian flag, but the back end (host_ppc_toIR.c and host_ppc_defs.c) still uses ifdefs instead of passing in an is-the-host-bigendian flag. This can be done by, in LibVEX_Translate(), passing host_is_bigendian to iselSB() and emit(). +++ b/VEX/priv/guest_ppc_helpers.c void ppc64g_dirtyhelper_LVS ( VexGuestPPC64State* gst, + Bool guest_is_BE ) and + if (!guest_is_BE) { I'm not confident that the ppc64 back end won't pass junk in the higher parts of the argument register for |guest_is_BE|, that might confuse the comparison. I think it would be safer to declare guest_is_BE as a UInt, make sure that the front end (guest_ppc_toIR.c) passes only 1 or 0 to it, and "& 1" the argument in ppc64g_dirtyhelper_LVS before using it. (Probably totally paranoid overkill, I know ..) -static UInt getUIntPPCendianly ( UChar* p ) +static UInt getUIntPPCendianly ( UChar* p, Bool guest_is_BE ) It's a silly name (always was). Let's rename it to getPPCInstruction. /* Floating point egisters are mapped to VSX registers[0..31]. */ Can we add back the missing 'r'? VEX/priv/host_ppc_defs.c +/* Emit an instruction ppc-endianly */ static UChar* emit32 ( UChar* p, UInt w32 ) +#if defined(VGP_ppc32_linux) || defined(VGP_ppc64be_linux) This should be done by plumbing host_is_BE to this point and doing a run time test. My aim in this is to get rid of all of the target #ifdefs that we can from VEX/. VEX/priv/host_ppc_isel.c Same all throughout here. +++ b/coregrind/launcher-darwin.c + { CPU_TYPE_POWERPC64LE, "ppc64le", "ppc64le" }, I seriously doubt there will be a ppc64le version of MacOSX in the future, so I think this isn't necessary. coregrind/m_debuginfo changes: You have a number of places where the second entry point is talked about, eg + Addr second_ep; /* address for secondary entry point, ppc64le */ It's not clear (to me, at least) from "second" whether this is the slow- or fast- entry point. Can you choose a different word that makes it clearer to the reader which it is? perhaps one of (slow/fast) or (global/local). From reading further down, perhaps global/local is the better choice. coregrind/m_dispatch/dispatch-ppc64-linux.S The ifdeffery added to this file makes it quite hard to read. I would prefer that you make a new file, dispatch-ppc64le-linux.S, and rename the existing one to dispatch-ppc64be-linux.S. +#elif defined(VGP_ppc64le_linux) +__asm__( +".section \".toc\",\"aw\"" "\n" + Do you actually need the .toc section line there? coregrind/m_redir.c Same comments as above about second_ep -- I'd prefer if you used local or global. coregrind/m_syswrap/syscall-ppc64-linux.S Like with dispatch-ppc64-linux.S, please make a new file with the le stuff in.
Created attachment 87865 [details] Updated PPC64 LE support patch to address feedback from Mark Wielaard Patch updated to address comments from Mark. Note, the comments are being addressed in this update but the patch will be updated again once the dynamic support for the Endianess is done per Julian's comments. Just wanted to get these changes incorporated. > The memcheck/tests/atomic_incs.c tests/Makefile.am tests/check_isa-2_06_cap tests/check_isa-2_07_cap tests/is_ppc64_BE.c tests/power_insn_available.c changes already came in the earlier 2 patches. Moved to third patch > The ifunc_wrapper in coregrind/vg_preloaded.c which is new in svn. Should that use #if defined(VGP_ppc64be_linux) or is there a better way to determine when function descriptors are used? For the moment, yes the #define is the best. There is work going on to update this patch to make the Endianess dynamic. > none/tests/ppc32/jm-insns.c got adds a #if defined(VGP_ppc64_linux) #elif defined(VGP_ppc64le_linux) pair. The first should be #if defined(VGP_ppc64be_linux) Fixed
Created attachment 88018 [details] Updated PPC64 LE support patch to work with Julian's Endian patches The patch was reworked to work with the latest trunk changes to support BE and LE. Additionally, the comments from Julian on the previous version of the patch were addressed.
Created attachment 88153 [details] Updated PPC64 LE support patch to work with Julian's Endian patches Fixed a couple of issues found by Julian during his review. One of the issues would have caused issues on non PPC64 systems. Fixes were in file coregrind/m_ume/elf.c Patch was also updated to the latest Valgrind code base as of Aug 7, 2014 in preparation for comitting.
Committed, Valgrind revision 14239, VEX revision 2914
It looks like there is a very big regression in performance on ppc64 (factor 10 for some tools). See nightly perf results http://sourceforge.net/p/valgrind/mailman/message/32695000/ Probably due to the ppc64 patches, as only ppc64 perf seems impacted.
Also a new ppc32 failure appeared: + none/tests/ppc32/round (stdout)
VEX commit 2915 fixes a compiler warning about "dres.continueAt" being uninitialized in function disInstr_PPC (). The new if test for non 64 bit systems in LE mode did not set dres.continueAt to zero. Valgrind commit 14246, fixes an issue with the Makefile.all.am where the -02 and -m64 flags were not passed to compile the PPC64 VEX code. This resulted in a significant performance regression. The commit fixes passing the needed compile line arguments.