Bug 290061

Summary:	pie elf always loaded at 0x108000
Product:	[Developer tools] valgrind	Reporter:	Amir Szekely <kichik>
Component:	general	Assignee:	Paul Floyd <pjfloyd>
Status:	REPORTED ---
Severity:	normal	CC:	pjfloyd, pwmarcz, stephen.j.parker
Priority:	NOR
Version:	3.7 SVN
Target Milestone:	---
Platform:	Unlisted Binaries
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	suggested fix; error log from 3.4.1 run; suggested fix (with proper svn diff)

Description Amir Szekely 2011-12-29 00:11:39 UTC

Created attachment 67208 [details]
suggested fix

It seems load_ELF() always loads a pie elf (e->e.e_type == ET_DYN) at 0x108000. The code uses info->exe_base and info->exe_end to calculate a random load address, trying to emulate kernel behavior, but those fields are only set later in the same function. When the calculation runs, both are still 0, so ebase is always 0. A few lines later, ebase is set to 0x108000 so that the elf is not loaded at 0x0.
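
The effect can be reproduced with a small model. This is only a sketch, not the Valgrind source: struct exe_info, pick_pie_base() and the 2/3 formula are hypothetical stand-ins, and a 4096-byte page is assumed. The only things taken from the report are the two field names and the 0x108000 floor.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE    4096u      /* assumed page size */
#define MIN_PIE_BASE 0x108000u  /* floor mentioned in the report */

/* Hypothetical stand-in for the two loader fields named in the report. */
struct exe_info {
    uintptr_t exe_base;
    uintptr_t exe_end;
};

/* Round an address down to a page boundary. */
static uintptr_t page_round_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Simplified model of the base calculation: derive a base from
   [exe_base, exe_end) and then keep it away from address zero.
   The 2/3 split is an assumption made for illustration only. */
static uintptr_t pick_pie_base(const struct exe_info *info)
{
    assert(info->exe_end >= info->exe_base);
    uintptr_t span  = info->exe_end - info->exe_base;
    uintptr_t ebase = page_round_down(info->exe_base + span / 3 * 2);
    if (ebase < MIN_PIE_BASE)
        ebase = MIN_PIE_BASE;
    return ebase;
}
```

With both fields still zero, pick_pie_base() returns 0x108000 every time, whatever the executable is. Once the fields hold real values, the result depends on them.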

This usually shouldn't be a problem, but for me it randomly generated mmap failures after a recent kernel upgrade. It seems my new kernel loads ld.so a bit lower, so at random it overlaps my moderately sized executables (~3MB), which are always loaded at 0x108000.

In the attached log (valgrind -d -d) ld.so is loaded at 0x311000 and my 2580480-byte executable tries to load at 0x108000. So it's trying to map the executable at 0x108000-0x37e000 and fails as it overlaps ld.so at 0x311000. The result is the good old:

valgrind: mmap(0x108000, 2580480) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
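
The overlap arithmetic above can be checked with a short sketch. The struct and function names are hypothetical, and the length of the ld.so mapping is not given in the report, so any length used for it is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, start + len). */
struct range {
    uint64_t start;
    uint64_t len;
};

/* First address past the end of the range. */
static uint64_t range_end(struct range r)
{
    assert(r.start + r.len >= r.start);  /* no wrap-around */
    return r.start + r.len;
}

/* Two half-open ranges overlap when each starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.start < range_end(b) && b.start < range_end(a);
}
```

For the executable, range_end() of {0x108000, 2580480} is 0x37e000, which matches the range quoted from the log. Any non-empty mapping that starts at 0x311000 falls inside it, so ranges_overlap() reports a conflict.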

Originally this happened in Valgrind 3.4.1, but I've been able to reproduce with 3.7.0.

I believe this should be fixed by loading the elf to a random segment large enough to contain it. I've attached a patch that replaces the ebase calculation code with a call to am_get_advisory_client_simple(). This way the elf will never overlap existing allocated memory segments. It doesn't exactly generate random loading addresses, but it's good enough in my opinion.
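
The idea behind the fix can be sketched as a first-fit search over the existing mappings. This is not the attached patch and not Valgrind's address space manager: find_free_range() and struct segment are hypothetical, and the sketch assumes the segments are sorted by address and do not overlap each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing mapping [start, end). Assumed sorted and non-overlapping. */
struct segment {
    uint64_t start;
    uint64_t end;
};

/* Return the lowest address >= min_addr where len bytes fit below limit
   without touching any existing segment, or 0 if there is no such gap. */
static uint64_t find_free_range(const struct segment *segs, size_t nsegs,
                                uint64_t min_addr, uint64_t limit,
                                uint64_t len)
{
    assert(len > 0);
    uint64_t candidate = min_addr;
    for (size_t i = 0; i < nsegs; i++) {
        assert(segs[i].start < segs[i].end);
        if (segs[i].end <= candidate)
            continue;                 /* segment lies below the candidate */
        if (segs[i].start > candidate && segs[i].start - candidate >= len)
            break;                    /* gap before this segment is big enough */
        candidate = segs[i].end;      /* otherwise skip past the segment */
    }
    return (candidate <= limit && limit - candidate >= len) ? candidate : 0;
}
```

If ld.so occupied 0x311000-0x330000 (the end address is a placeholder), a 2580480-byte request starting from 0x108000 would be placed at 0x330000 instead of colliding with it, while a one-page request would still fit at 0x108000.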

I've run the regression tests and the results haven't changed with the patch. I'd supply unit tests or regression tests too, but I am not sure where coregrind tests would go. If there is a place, please let me know and I'll write some, mostly so I can reassure myself that my patch doesn't destroy anything.

Comment 1 Amir Szekely 2011-12-29 00:12:49 UTC

Created attachment 67209 [details]
error log from 3.4.1 run

Comment 2 Amir Szekely 2011-12-29 01:51:13 UTC

Created attachment 67212 [details]
suggested fix (with proper svn diff)

Comment 3 Amir Szekely 2011-12-29 21:16:20 UTC

I was able to reproduce this on Ubuntu 11.10 pretty easily. I created a pie elf with a 3 MB static array and ran Valgrind in a loop.

altor@valgrind:~$ uname -a
Linux valgrind 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux

altor@valgrind:~$ valgrind --version
valgrind-3.6.1-Debian

altor@valgrind:~$ cat test.c
static char meh[3000000]; // ~3mb
int main() {
  return 0;
}

altor@valgrind:~$ gcc -pie test.c
altor@valgrind:~$ readelf -h a.out
...
  Type:                              DYN (Shared object file)
  Machine:                           Intel 80386
...

altor@valgrind:~$ while valgrind ./a.out ; do echo wait for it... ; done
==1422== Memcheck, a memory error detector
==1422== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==1422== Using Valgrind-3.6.1-Debian and LibVEX; rerun with -h for copyright info
==1422== Command: ./a.out
==1422==
==1422==
==1422== HEAP SUMMARY:
==1422==     in use at exit: 0 bytes in 0 blocks
==1422==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1422==
==1422== All heap blocks were freed -- no leaks are possible
==1422==
==1422== For counts of detected and suppressed errors, rerun with: -v
==1422== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 6)
wait for it...
valgrind: mmap(0x10b000, 2998272) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.

Comment 4 Stephen Parker 2016-12-22 17:58:59 UTC

This is a bit of a necrobump (almost 5 years later to the day!), but I ran into a similar problem trying to run memcheck on android/arm64 with an executable >600 MB (Unreal game) and Amir's patch solved the problem.

Comment 5 Paweł Marczewski 2020-10-29 13:02:13 UTC

I've run into this problem trying to use Valgrind on Graphene [1], a project that functions as a library OS and loads other binaries into its own address space to execute them. The main executable is a PIE, and after loading, it loads the target binary.

This works well when running under Linux directly, as Linux will load the PIE at a high enough address. However, when running under Valgrind *and* loading a non-PIE binary inside, this often fails because Graphene will get mapped at 0x108000 and the inner binary typically will need to be mapped at 0x400000, which overlaps with the already loaded PIE binary.

Would it be possible for Valgrind to match Linux's behaviour, or use a high address by default, or perhaps just include an option to override the default address? I'd be happy to work on a patch to that effect.

[1] https://graphene.readthedocs.io/en/latest/

Comment 6 Paul Floyd 2024-05-20 20:20:12 UTC

The patch no longer applies cleanly. Currently there is some special case code for mips64. I've tried to get access to a suitable machine to check if this is still relevant, but with no success so far.