164485 – VG_N_SEGNAMES and VG_N_SEGMENTS are (still) too small

Bug 164485 - VG_N_SEGNAMES and VG_N_SEGMENTS are (still) too small

Summary: VG_N_SEGNAMES and VG_N_SEGMENTS are (still) too small

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (other bugs)
Version First Reported In:	unspecified
Platform:	Compiled Sources Linux

Importance:	NOR crash
Target Milestone:	wanted3.6.0
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-06-19 23:33 UTC by Bill Clarke
Modified:	2013-09-30 18:34 UTC (History)
CC List:	6 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Attachments
patch for coregrind/m_aspacemgr/aspacemgr-linux.c (4.36 KB, patch) 2013-09-13 02:14 UTC, Scott McCaskill	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Bill Clarke 2008-06-19 23:33:44 UTC

Version:            (using Devel)
Installed from:    Compiled sources
Compiler:          gcc 4.2.1 
OS:                Linux

Many of the programs we debug have thousands of dynamic libraries.
I have used the following patch to get valgrind working for those programs:
"""
Index: coregrind/m_aspacemgr/aspacemgr-linux.c
===================================================================
--- coregrind/m_aspacemgr/aspacemgr-linux.c     (revision 8261)
+++ coregrind/m_aspacemgr/aspacemgr-linux.c     (working copy)
@@ -262,10 +262,10 @@
 /* ------ start of STATE for the address-space manager ------ */

 /* Max number of segments we can track. */
-#define VG_N_SEGMENTS 5000
+#define VG_N_SEGMENTS 10000

 /* Max number of segment file names we can track. */
-#define VG_N_SEGNAMES 1000
+#define VG_N_SEGNAMES 4000

 /* Max length of a segment file name. */
 #define VG_MAX_SEGNAMELEN 1000
"""

From a search on the web, it seems many (well, a few) others are hitting this too.  So perhaps it is time to bump up the defaults (again, it seems).

Comment 1 Nicholas Nethercote 2009-06-26 04:04:23 UTC

Julian, thoughts?

Comment 2 Mark Wielaard 2012-10-05 14:29:13 UTC

The google valgrind variants from https://code.google.com/p/valgrind-variant/wiki/ValgrindPatches has these:

@@ -265,10 +265,24 @@
 /* ------ start of STATE for the address-space manager ------ */
 
 /* Max number of segments we can track. */
-#define VG_N_SEGMENTS 5000
+/* glider: We keep VG_N_SEGMENTS low on Android, because they occupy
+   too much memory. We used to have VG_N_SEGMENTS=10000 on Darwin,
+   but it turned out to be too low for Chromium.
+*/ 
+#if defined(VGO_darwin)
+#define VG_N_SEGMENTS 50000
+#elif defined(ANDROID)
+#define VG_N_SEGMENTS 10000
+#else
+#define VG_N_SEGMENTS 100000
+#endif
 
 /* Max number of segment file names we can track. */
+#if defined(VGO_darwin) || defined(ANDROID)
 #define VG_N_SEGNAMES 1000
+#else
+#define VG_N_SEGNAMES 100000
+#endif
 
 /* Max length of a segment file name. */
 #define VG_MAX_SEGNAMELEN 1000

Comment 3 Mark Wielaard 2012-10-14 20:19:29 UTC

For Fedora I am now using:

 /* Max number of segments we can track. */
-#define VG_N_SEGMENTS 5000
+#define VG_N_SEGMENTS 50000
 
 /* Max number of segment file names we can track. */
-#define VG_N_SEGNAMES 1000
+#define VG_N_SEGNAMES 25000

That increases the size of the the static segnames[] and nsegments[] from ~1.2MB to use ~26MB.
Which should be fine for workstation/server machines. A more generic patch should probably do something like the patch in comment #2 to differentiate the constants based on device/arch/platform.

Comment 4 Scott McCaskill 2013-09-13 02:14:06 UTC

Created attachment 82305 [details]
patch for coregrind/m_aspacemgr/aspacemgr-linux.c

Dynamically allocate segnames and nsegments arrays in coregrind/m_aspacemgr/aspacemgr-linux.c to avoid hitting hard-coded limits of VG_N_SEGMENTS and VG_N_SEGNAMES.

Comment 5 Scott McCaskill 2013-09-13 02:21:40 UTC

Well, I had written a nice description of that patch, but the bug tracker forgot it as soon as I went to attach the patch, so here goes again..

I have a collection of apps that can potentially allocate a lot of memory and which also use a large number of memory mappings.  I got tired of hitting this limit, so I changed coregrind/m_aspacemgr/aspacemgr-linux.c so that it dynamically allocates the relevant arrays instead of having the hard-coded limits of VG_N_SEGNAMES and VG_N_SEGMENTS.  The initial sizes of the arrays are 1/10th of the old limits, so this patch should reduce memory usage in cases where the old limits are much greater than needed.

I've tested it with an app that uses in excess of 10GB of memory and it worked fine.  I realize that one app is not much in the way of testing, but I thought it would be good to find out asap if there's something in this patch that's fundamentally broken.

Comment 6 Florian Krohm 2013-09-13 08:25:03 UTC

(In reply to comment #5)

> I have a collection of apps that can potentially allocate a lot of memory
> and which also use a large number of memory mappings.  I got tired of
> hitting this limit, so I changed coregrind/m_aspacemgr/aspacemgr-linux.c so
> that it dynamically allocates the relevant arrays instead of having the
> hard-coded limits of VG_N_SEGNAMES and VG_N_SEGMENTS.  The initial sizes of
> the arrays are 1/10th of the old limits, so this patch should reduce memory
> usage in cases where the old limits are much greater than needed.
> 
> I've tested it with an app that uses in excess of 10GB of memory and it
> worked fine.  I realize that one app is not much in the way of testing, but
> I thought it would be good to find out asap if there's something in this
> patch that's fundamentally broken.

I do not think there is anything fundamentally broken with that patch. I would simplify allocate_array
and drop the last parameter, because allocate_array will not return unless allocation succeeds. 
That means at the callee side desired_num_elements == num_elements is always true.

I noticed there are 2 other arbitrary limits in this file, namely, VG_N_SEGMENTS and
VG_MAX_SEGNAMELEN. We should get rid of those at the same time.

I'm curious as to why we were using a hardcoded limit in the first place. I wonder whether there is a reason for that.  This "technique" :) has caused trouble in other places as well, e.g. buffer to small to demangle some C++ function name from boost etc.. 

What you could do is to run the regression tests with and without your patch:
make 
make check
make regtest

will do the trick. That would increase the confidence level for your patch.

Comment 7 Julian Seward 2013-09-13 11:35:10 UTC

(In reply to comment #4)
> Created attachment 82305 [details]
> Dynamically allocate segnames and nsegments arrays in
> coregrind/m_aspacemgr/aspacemgr-linux.c to avoid hitting hard-coded limits
> of VG_N_SEGMENTS and VG_N_SEGNAMES.

+#include "pub_core_mallocfree.h"

Alas .. this can't work reliably.  It causes a circular dependency
between m_aspacemgr and m_mallocfree.  Initialising m_aspacemgr then
requires m_mallocfree to be running, but m_mallocfree calls back into
the not-yet-initialised m_aspacemgr so as to mmap its first block of
memory.

This returns us to how it was in the early days of the project, in
which we had random-ish segfaults at startup.  Eventually the cycle
was removed by making m_aspacemgr self-contained, and having it start
first.  Unfortunately this was never really documented so you had no
way to know that.

I definitely want to fix this for 3.9.0.  I had been contemplating a
small-medium-large scheme similar to what is shown in comment 2.  For
64-bit anything, we assume we have a lot of swap available, so we can
increase the hardcoded limits a lot.  For 32-bit non-Android, we can
increase them a bit.  For {arm,x86}-android, probably no need to
change them at all.

Also, the sizing of the main translation table (m_transtab, N_SECTORS)
needs to be adjusted similarly.  Currently we can store about 220MB of
VEX generated code on x86_64 w/ Memcheck, which seemed a huge amount
way back then, but is now too small for even a Firefox startup.
Whereas on Android it is a bit too much.

My preference is to write a patch which decides "small/medium/large"
at configure time, generates C-preprocessor defines in the normal way,
and use (eg) #if defined(VG_config_tables_large) in the code.

Comment 8 Julian Seward 2013-09-19 10:07:10 UTC

I just committed the following as r13567.  It increases the limits by
a factor of 6 on all non-phone/tablet targets, which I think should
keep most people happy.  Pls LMK if you need them increased further.

Index: coregrind/m_aspacemgr/aspacemgr-linux.c
===================================================================
--- coregrind/m_aspacemgr/aspacemgr-linux.c	(revision 13566)
+++ coregrind/m_aspacemgr/aspacemgr-linux.c	(working copy)
@@ -264,11 +264,22 @@
 
 /* ------ start of STATE for the address-space manager ------ */
 
-/* Max number of segments we can track. */
-#define VG_N_SEGMENTS 5000
+/* Max number of segments we can track.  On Android, virtual address
+   space is limited, so keep a low limit -- 5000 x sizef(NSegment) is
+   360KB. */
+#if defined(VGPV_arm_linux_android) || defined(VGPV_x86_linux_android)
+# define VG_N_SEGMENTS 5000
+#else
+# define VG_N_SEGMENTS 30000
+#endif
 
-/* Max number of segment file names we can track. */
-#define VG_N_SEGNAMES 1000
+/* Max number of segment file names we can track.  These are big (1002
+   bytes) so on Android limit the space usage to ~1MB. */
+#if defined(VGPV_arm_linux_android) || defined(VGPV_x86_linux_android)
+# define VG_N_SEGNAMES 1000
+#else
+# define VG_N_SEGNAMES 6000
+#endif
 
 /* Max length of a segment file name. */
 #define VG_MAX_SEGNAMELEN 1000

Comment 9 Scott McCaskill 2013-09-19 14:56:27 UTC

With these changes I'm no longer hitting the limit in the app I use as a stress test.  Thanks!

Comment 10 Josh Stone 2013-09-23 18:46:27 UTC

Can you explain what it is about Android that limits *virtual* address space?  I understand the general constraints of 32-bit addressing, but I don't know why Android would be worse.

It seems to me you could make these limits pretty high on 32-bit, and make it absurdly high on 64-bit.  So long as you only touch memory up to the _used parts, this should only occupy mapped space, but not any more dirty pages in resident memory than needed.

Comment 11 Julian Seward 2013-09-29 10:35:23 UTC

(In reply to comment #10)
> Can you explain what it is about Android that limits *virtual* address
> space?  I understand the general constraints of 32-bit addressing, but I
> don't know why Android would be worse.

Android devices on the whole don't have kernels with swap enabled,
so using a lot of memory tends to get the processes nuked by the kernel
without notice.

Comment 12 Josh Stone 2013-09-30 18:34:37 UTC

(In reply to comment #11)
> Android devices on the whole don't have kernels with swap enabled,
> so using a lot of memory tends to get the processes nuked by the kernel
> without notice.

Even if it's not resident memory?  That's unfortunate...

FWIW, I just confirmed for myself that bumping those limits even another 10x only affects the virtual size -- the resident size stayed the same for a given workload.