Bug 398883 - valgrind incorrectly assumes ABI on PowerPC based on endianness
Summary: valgrind incorrectly assumes ABI on PowerPC based on endianness
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.13.0
Platform: Other Linux
: NOR crash
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-20 18:10 UTC by A. Wilcox (awilfox)
Modified: 2023-04-25 18:52 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Ensure ELFv2 is supported on PPC64 (30.90 KB, patch)
2018-09-27 22:27 UTC, A. Wilcox (awilfox)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description A. Wilcox (awilfox) 2018-09-20 18:10:28 UTC
valgrind (mainly coregrind, though there are a few misc bits in memcheck it looks like) assumes ELFv1 ABI on big endian and ELFv2 ABI on little endian.

Neither of these assumptions are correct; the musl libc (used by Void, Adélie, Alpine, Sabotage, and other Linux distros) uses ELFv2 ABI on big endian, and older releases of OpenSuSE use ELFv1 ABI on little endian.

I plan on submitting a patch, either here or on Differential, in the coming weeks.  My change plan is:


* add definition, VGP_ppc64_linux_abi1 and VGP_ppc64_linux_abi2

* abi1 is defined if (defined(VGP_ppc64be_linux) && (!defined(_CALL_ELF) || _CALL_ELF == 1) || (defined(VGP_ppc64le_linux) && defined(_CALL_ELF) && _CALL_ELF == 1)

* abi2 is defined if (defined(VGP_ppc64le_linux) && (!defined(_CALL_ELF) || _CALL_ELF == 2) || (defined(VGP_ppc64be_linux) && defined(_CALL_ELF) && _CALL_ELF == 2)

* then, in the C code, defined(VGP_ppc64le_linux) becomes defined(VGP_ppc64_linux_abi2)

* and defined(VGP_ppc64be_linux) becomes defined(VGP_ppc64_linux_abi1)
Comment 1 A. Wilcox (awilfox) 2018-09-25 21:27:04 UTC
Okay, it doesn't have to be so hard.  There is already VG_PLAT_USES_PPCTOC; so this change just needs to actually use that define instead of just endianness to determine what ABI to use.
Comment 2 A. Wilcox (awilfox) 2018-09-27 22:27:12 UTC
Created attachment 115278 [details]
Ensure ELFv2 is supported on PPC64

This patch has been tested on PPC64 BE, both ELFv1 (Debian) and ELFv2 (Adélie).  I don't have any LE hardware to test on, but there should be no functional change.

I've tested memcheck and callgrind on both ELF ABIs on a POWER9 host running big-endian and found no issues.
Comment 3 Julian Seward 2019-03-10 07:30:47 UTC
Carl, do you have any opinion on this?
Comment 4 Carl Love 2019-03-11 16:00:41 UTC
IBM only supports ELFv1 ABI on Big endian machines and ELFv2 ABI on Little endian machines.  I checked a little on the statements about the ELF usages.  From what I was told OpenSuSE uses ELFv2 ABI on little endian.  So, I would be curious where to get the info that OpenSuSE is using ELFv1 ABI on Little Endian.  Similarly, with the musl libc.  From my discussion with the GCC team, they said that musl has been known to play around with ELFv2 ABI on Big Endian but they are off on their own in this regard as it isn't supported, recommended or tested by IBM.  The key thing is the OS, library, Valgrind etc.  all need to either use ELFv1 ABI or ELFv2 ABI.  You can't mix these.

Given that IBM does not recommend using ELFv1 on Little endian and ELFv2 on Big endian, that is not to say it can't be done.  The proposed patch is not necessarily invalid but IBM will not support these but will also not stand in the way of someone else trying it but they are on their own for support.
Comment 5 A. Wilcox (awilfox) 2019-03-11 17:17:18 UTC
I noted that it was specifically older versions of OpenSuSE.  Specifically, the very first release.  I believe it is long deprecated; it still, however, does exist.

> Similarly, with the musl libc.  From my discussion with the GCC team, they said that musl has been known to play around with ELFv2 ABI on Big Endian but they are off on their own in this regard as it isn't supported, recommended or tested by IBM.

I fear "play around" may be a wee dismissive.  It's a serious port and it has serious consumers, including Adélie Linux and Void Linux.  Additionally, FreeBSD is planning to move to ELFv2 on Big Endian, which means eventually FreeBSD will need to undefine VG_PLAT_USES_PPCTOC as well. :)
Comment 6 Brandon Bergren 2019-03-11 17:32:41 UTC
While I agree that using ELFv1 on LE is a completely unsupported configuration and need not be supported, the OpenPOWER ELF V2 ABI is fully defined and intended for use in both little- and big-endian environments.

Due to the improved toolchain support (specifically, lld's dropping of ELFv1 support entirely, and ELFv2 being a lot less "weird" of an ABI meaning it tends to have less esoteric issues and can more easily leverage mainstream optimizations) I see the future of BE on POWER to be ELFv2 as well.

And yes, FreeBSD is moving towards ELFv2 at a rapid pace. (I personally am running ELFv2 userlands exclusively in FreeBSD on my POWER9 machines.)
Comment 7 Brandon Bergren 2019-05-14 23:21:29 UTC
Ahh, I think we were using different meanings for the word "supported". I was speaking in terms of "what can the hardware do, is specified in the ABI, and is currently in use in the wild", not "what you can actually get official support for".
Comment 8 Sam James 2023-04-25 01:11:20 UTC
(In reply to Carl Love from comment #4)
> 
> Given that IBM does not recommend using ELFv1 on Little endian and ELFv2 on
> Big endian, that is not to say it can't be done.  The proposed patch is not
> necessarily invalid but IBM will not support these but will also not stand
> in the way of someone else trying it but they are on their own for support.

In that case, is anyone aware of any technical issue with the patch?
Comment 9 Carl Love 2023-04-25 15:37:36 UTC
I looked thru the patch.  Nothing jumps out at me as being an issue.  But, that doesn't guarantee anything given the assembly and all.  I did try to apply the patch to test it but it doesn't apply.  

So, I would say, update the patch and lets test it on the supported BE and LE systems to make sure the patch doesn't break anything.  Test it on the non-standard systems where you want to use it.  As long as everything works fine, I wouldn't object to it.
Comment 10 A. Wilcox (awilfox) 2023-04-25 18:52:41 UTC
Indeed, the only technical issue with this patch is that it is outdated.  We have an updated version.  I will upload the new version to this bug some time this week after testing it against the latest release and main branch.