82301 – FV memory layout too rigid

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general
Version:	2.1.1
Platform:	unspecified Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Duplicates (1):	101271
Depends on:
Blocks:

Reported:	2004-05-27 10:22 UTC by Nicholas Nethercote
Modified:	2005-10-05 22:57 UTC
CC List:	2 users

See Also:
Latest Commit:
Version Fixed In:

Description Nicholas Nethercote 2004-05-27 10:22:58 UTC

With full virtualization (FV) the memory layout changed.  This has advantages,
in particular the separation between client and Valgrind+tool is much clearer,
and allows the use of segments to prevent the client clobbering Valgrind.

However, the downside is that the memory layout is more rigid, and this seems to
be causing various problems where one component (client, Valgrind, or tool)
cannot allocate enough memory:

* Bug #78048 is an example where the client cannot get enough memory.
* Josef W has had problems with Calltree where the tool could not allocate
enough heap(?) memory for its data structures.
* Someone else (I think it was reported on the valgrind-users mailing list) had
problems loading a very big (272MB) shared object, where Valgrind couldn't
mmap it in order to read the debug info.

There are separate sections in the address space for Valgrind, for the tool, for
shadow memory, and for the client.  The first three should be kept separate
from the client, but there's nothing to say they can't be intermingled if
necessary.  I think the boundary between client and non-client should be
moveable too.

In short, no part should run out of memory when another part has not.

Comment 1 Nicholas Nethercote 2004-06-03 10:33:10 UTC

More data: A user with a 546 MB executable was having problems with no line info:

> ==31711== Reading syms from <executable>
> ==31711== mmap failed on <executable>

They made the following change in coregrind/vg_main.c:

< #define VALGRIND_MAPSIZE      (128*1024*1024)
---
> #define VALGRIND_MAPSIZE      (1024*1024*1024)

And in stage2.c they changed the hardcoded value of 0xb0000000:

<     info.map_base = 0xb0000000;
---
>     info.map_base = 0x78000000;

This seemed to fix things:  mmap now worked and the debug info
appeared as expected.
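
The size arithmetic behind this workaround can be sketched in C.  This is only an illustration, not Valgrind code:  the helper name fits_in_map is hypothetical, and it assumes the file has to be mapped whole into a single, otherwise empty region of VALGRIND_MAPSIZE bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical helper (not part of Valgrind): can a file of file_size
   bytes be mapped whole into a region of map_size bytes?  Assumes that
   nothing else is occupying the region. */
static bool fits_in_map(uint64_t file_size, uint64_t map_size)
{
    return file_size <= map_size;
}
```

Under that assumption, fits_in_map(546 * MB, 128 * MB) is false and fits_in_map(546 * MB, 1024 * MB) is true, which matches what the user saw before and after the change.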

Comment 2 Nicholas Nethercote 2004-07-15 15:08:32 UTC

A change committed on 17/5/2004 improved things significantly.  Valgrind + tool now have a single 256MB section for all their (non-shadow) memory needs, instead of two 128MB sections.  Thus memory will not be exhausted as soon in some circumstances, e.g. for Calltree.  For example, previously the biggest executable that could have its debug info read would have been < 128MB;  now it's < 256MB.

There's still room for improvement though;  I'm working on it.

Comment 3 Nicholas Nethercote 2004-08-05 14:31:18 UTC

More data:  a user with a 2GB/2GB user/kernel split was having problems because
Valgrind was unable to load stage2 at 0xb0000000--0xbfffffff.  Changing Valgrind
to load at 0x70000000--0x7fffffff works to an extent;  Addrcheck, Massif and
Cachegrind work, but Memcheck and Helgrind don't because they use so much memory
that 0x40000000 isn't available to the client, so standard mmap() calls fail.

Also, the hard-wiring to 0xb0000000 means that systems with 4GB user-space are
not utilising the top 1GB.
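
The address arithmetic here can be checked with a small C sketch.  The helper names and the USER_TOP_* constants are my own, not Valgrind's, and they assume the simplest possible picture:  user space runs from 0 up to a single boundary address, with the boundary at 0x80000000 for a 2GB/2GB split, 0xc0000000 for the standard 3G:1G split, and 0x100000000 for a 4GB user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed first address above user space for each split (illustrative). */
#define USER_TOP_2G_2G 0x80000000ULL
#define USER_TOP_3G_1G 0xC0000000ULL
#define USER_TOP_4G    0x100000000ULL

/* True if the inclusive range [start, end] lies entirely in user space. */
static bool range_in_user_space(uint64_t start, uint64_t end,
                                uint64_t user_top)
{
    return start <= end && end < user_top;
}

/* Number of user-space bytes lying above the inclusive address `end`. */
static uint64_t unused_above(uint64_t end, uint64_t user_top)
{
    return end < user_top ? user_top - (end + 1) : 0;
}
```

With these definitions, the range 0xb0000000--0xbfffffff is inside user space for the 3G:1G split but not for the 2GB/2GB split, while 0x70000000--0x7fffffff fits in both.  For a 4GB user space, unused_above(0xbfffffff, USER_TOP_4G) gives 0x40000000, i.e. the top 1GB that goes unused.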

Comment 4 Nicholas Nethercote 2004-08-31 15:12:12 UTC

More data: Julian had problems with Memcheck and Helgrind on a stock RH8.0 (kernel 2.4.18-14).  The big shadow memory mmap() was failing at startup.  Problem seemed to be that the kernel didn't support allocate-on-write mmap segments, and so it refused to mmap() a segment larger than the available swap space.  Increasing the swap space to 1.9GB (Memcheck requires about 1.5GB for shadow memory) made it work.  We've had a few similar complaints that could have had the same cause.

So, that's a black mark against the use of really big mapped segments;  an incremental approach may well be better.
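
One possible shape for an incremental approach is sketched below in C.  This is not how Memcheck or Helgrind lay out shadow memory;  it is a minimal illustration that assumes a 32-bit client address space, one shadow byte per client byte, and 64KB chunks that are mmap()ed only when first touched.  All names (shadow_byte, shadow_chunks, shadow_chunks_mapped) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define CHUNK_BITS 16u                        /* 64KB per chunk (assumed) */
#define CHUNK_SIZE (1u << CHUNK_BITS)
#define N_CHUNKS   (1u << (32u - CHUNK_BITS)) /* covers a 32-bit space */

/* One pointer per chunk;  NULL means the chunk is not mapped yet. */
static unsigned char *shadow_chunks[N_CHUNKS];
static unsigned shadow_chunks_mapped = 0;

/* Return a pointer to the shadow byte for client address addr, mapping
   the containing chunk on first use.  Returns NULL if mmap() fails. */
static unsigned char *shadow_byte(uint32_t addr)
{
    uint32_t idx = addr >> CHUNK_BITS;
    if (shadow_chunks[idx] == NULL) {
        void *p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        shadow_chunks[idx] = p;   /* anonymous mappings start zero-filled */
        shadow_chunks_mapped++;
    }
    return &shadow_chunks[idx][addr & (CHUNK_SIZE - 1u)];
}
```

The point of the sketch is that each mmap() request is small, so no single request asks the kernel for anything close to the 1.5GB mentioned above;  the cost is an extra table lookup on every shadow access.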

Comment 5 Nicholas Nethercote 2004-10-28 18:53:08 UTC

The problem from comment #3 has been partly fixed:  we are now building Valgrind as a position-independent executable (PIE) on systems that support it (i.e. ones with more recent versions of gcc and ld).  This means that systems that don't have the standard 3G:1G user/kernel split work better.

Comment 6 Nicholas Nethercote 2005-01-31 22:13:06 UTC

More data:  a number of users have had problems with symbols not being read from big files.  It seems 300MB+ executables are not that uncommon;  one user had a 900MB executable.

Incremental loading of symbols would seem to be the right way to handle this.
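
As a rough illustration of what incremental loading could look like at the mmap() level, the C sketch below walks a file through a bounded window instead of mapping it whole.  It does no symbol or debug info parsing (it only sums the bytes as a stand-in for real work), the function name sum_file_windowed is hypothetical, and it assumes the window size is a non-zero multiple of the page size so that every mapping offset is page-aligned.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum the bytes of the file at path, mapping at most `window` bytes at
   a time.  `window` must be a non-zero multiple of the page size.
   Returns 0 on success (result in *sum_out), -1 on error. */
static int sum_file_windowed(const char *path, size_t window,
                             uint64_t *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    off_t off = 0;
    while (off < st.st_size) {
        /* The last window may be shorter than the others. */
        size_t len = window;
        if ((off_t)len > st.st_size - off)
            len = (size_t)(st.st_size - off);

        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap(p, len);   /* release this window before mapping the next */
        off += (off_t)len;
    }

    close(fd);
    *sum_out = sum;
    return 0;
}
```

With this pattern the address space needed at any moment is bounded by the window size rather than by the file size, which is the property that matters for the 300MB+ executables described above.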

Comment 7 Nicholas Nethercote 2005-03-11 00:08:24 UTC

*** Bug 101271 has been marked as a duplicate of this bug. ***

Comment 8 Nicholas Nethercote 2005-10-05 22:57:17 UTC

Julian's aspacem rewrite has fixed this.  E.g. see the resolution for bug #92071, which was about the large debug info problem.  Good work Julian!