Bug 250101 - huge "free" memory usage due to m_mallocfree.c "superblocks fragmentation"
Summary: huge "free" memory usage due to m_mallocfree.c "superblocks fragmentation"
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.6 SVN
Platform: Unlisted Binaries Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
: 269884 275852 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-09-03 22:56 UTC by Philippe Waroquiers
Modified: 2011-09-10 10:23 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
vgfragment.c program reproducing superblocks "fragmentation" (1.57 KB, text/plain)
2010-09-03 22:56 UTC, Philippe Waroquiers
Details
patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT (33.68 KB, text/plain)
2010-09-13 22:23 UTC, Philippe Waroquiers
Details
new version, upgraded to revision 11714. patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT (25.75 KB, text/plain)
2011-04-27 22:49 UTC, Philippe Waroquiers
Details
revision 11823 patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT + test (29.70 KB, text/plain)
2011-06-19 22:31 UTC, Philippe Waroquiers
Details
rev 11906 patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT + test (29.81 KB, text/plain)
2011-07-23 23:07 UTC, Philippe Waroquiers
Details
new version : reset reclaimable for superblock splitted by VG_(arena_memalign) (22.40 KB, text/plain)
2011-09-05 21:59 UTC, Philippe Waroquiers
Details
Same as previous patch, but avoids to set and then reset reclaimable (21.48 KB, text/plain)
2011-09-07 19:11 UTC, Philippe Waroquiers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Philippe Waroquiers 2010-09-03 22:56:26 UTC
Created attachment 51290 [details]
vgfragment.c program reproducing superblocks "fragmentation"

When running a big application under valgrind, we got a "not enough memory" error.
while the total memory allocated by the big application was far to be a lot.
We found one application regression test that caused a big increase in VG memory
while the test does in fact never uses a lot of memory.

Investigation shows that the malloc memcheck replacement can "fragment"
a lot. This was mostly visible in the "client" arena but was
also visible in "tool" arena (56 Mb mmap'd while the peak memory used is 25 Mb).

A small C program (vgfragment.c, attached to the report) reproduces
the "fragmentation" of the client arena:
the peak memory used is about 30 Mb, but valgrind needs about 1200 Mb.
(in fact, the fragmentation is because the superblocks of m_mallocfree.c
are not released and/or not glued together).

Here is the output without valgrind (vgfragment.c is at the end of this
bug report):
./vgfragment
before freeing last block
    135168 int arena;    /* non-mmapped space allocated from system */
         1 int ordblks;  /* number of free chunks */
         0 int smblks;   /* number of fastbin blocks */
         1 int hblks;    /* number of mmapped regions */
   9633792 int hblkhd;   /* space in mmapped regions */
         0 int usmblks;  /* maximum total allocated space */
         0 int fsmblks;  /* space available in freed fastbin blocks */
         0 int uordblks; /* total allocated space */
    135168 int fordblks; /* total free space */
    135168 int keepcost; /* top-most, releasable (via malloc_trim) space */

after freeing last block
    135168 int arena;    /* non-mmapped space allocated from system */
         1 int ordblks;  /* number of free chunks */
         0 int smblks;   /* number of fastbin blocks */
         0 int hblks;    /* number of mmapped regions */
         0 int hblkhd;   /* space in mmapped regions */
         0 int usmblks;  /* maximum total allocated space */
         0 int fsmblks;  /* space available in freed fastbin blocks */
         0 int uordblks; /* total allocated space */
    135168 int fordblks; /* total free space */
    135168 int keepcost; /* top-most, releasable (via malloc_trim) space */

Note that we see that clib malloc is handling specially big blocks:
they are separately allocated as mmap-ed memory, which is un-mmap-ed when 
the big block is freed. I think this is (one of) the trick used by gnu linux
malloc to avoid fragmentation (for more info, see gnu linux libc mallopt
documentation).


output with valgrind (tested with 3.5.0 and with 3.6.0.SVN):
valgrind  ./vgfragment
...
==18811== 
before freeing last block
1203294208 int arena;    /* non-mmapped space allocated from system */
       177 int ordblks;  /* number of free chunks */
         0 int smblks;   /* number of fastbin blocks */
         0 int hblks;    /* number of mmapped regions */
         0 int hblkhd;   /* space in mmapped regions */
         0 int usmblks;  /* maximum total allocated space */
         0 int fsmblks;  /* space available in freed fastbin blocks */
   9632008 int uordblks; /* total allocated space */
1193650840 int fordblks; /* total free space */
         0 int keepcost; /* top-most, releasable (via malloc_trim) space */

after freeing last block
1203294208 int arena;    /* non-mmapped space allocated from system */
       177 int ordblks;  /* number of free chunks */
         0 int smblks;   /* number of fastbin blocks */
         0 int hblks;    /* number of mmapped regions */
         0 int hblkhd;   /* space in mmapped regions */
         0 int usmblks;  /* maximum total allocated space */
         0 int fsmblks;  /* space available in freed fastbin blocks */
         0 int uordblks; /* total allocated space */
1203282896 int fordblks; /* total free space */
         0 int keepcost; /* top-most, releasable (via malloc_trim) space */

==18811== 
==18811== HEAP SUMMARY:
==18811==     in use at exit: 0 bytes in 0 blocks
==18811==   total heap usage: 301 allocs, 301 frees, 1,454,434,408 bytes allocated
==18811== 
==18811== All heap blocks were freed -- no leaks are possible


As you can see, the total free space reported by mallinfo (which matches
what we observe e.g. with "top" command) is huge when it is running
under valgrind. When running the big application, this huge free space
makes impossible to run all the regression tests without crashing
with ENOMEM.

Maybe a logic similar to the gnu malloc could be implemented in m_mallocfree.c:
if a big block is VG_(arena_malloc-ed) (e.g. above 1Mb), then this always
causes a superblock to be mmap-ed. When this block is VG_(arena_free-d),
then its superblock is released (un-mmap-ed).

I know that an application that (linearly) increases a big/huge array is not
"heap friendly" but that is probably a common usage pattern which then
can often give problems when running under valgrind.
=> So a special case for such big blocks is probably worth.
(if you think this is a attractive approach, I could work on a patch based
on this approach).


A side note: if in vgfragment.c, the loop is executed 3000 times
instead of 300 times, then various mallinfo output and some valgrind
debug output starts to go bezerk, e.g. we obtain things such as:
    -------- Arena "client": -912891904 mmap'd, 30432016/30432016 max/curr --------
       30,432,016 in         2: replacemalloc.cm.1

    before freeing last block
    -177524736 int arena;    /* non-mmapped space allocated from system */

(I guess we do not have a negative size for the "client" arena and/or the mallinfo
arena:). The mallinfo API is hopelessly broken (it is an "int") but it is 
strange that the internal valgrind statistics gives a negative number 
for the --profile-heap=yes option).
Comment 1 Philippe Waroquiers 2010-09-04 01:05:44 UTC
Note that when the loop is changed to be 3000 rather than 300, even
the normal memcheck output contains bezerk data:

==20527== HEAP SUMMARY:
==20527==     in use at exit: 0 bytes in 0 blocks
==20527==   total heap usage: 3,001 allocs, 522 frees, 4,368,100,176
bytes allocated
==20527== 
==20527== All heap blocks were freed -- no leaks are possible

As you can see, 3001 blocks were allocated, 522 were freed but
all heaps blocks were freed ????
Comment 2 Philippe Waroquiers 2010-09-04 18:10:40 UTC
I think that the explanation for the 3001 allocs, 522 frees 
but still all blocks have been freed is that a failed allocation 
(no more memory) is still counted as an alloc (and returns 0x0).
vgfragment.c does not check the returned malloc result, and then frees 0x0,
which is not counted as a free.
It looks probably more logical to not count a failed malloc in the memcheck
heap usage statistics.
Comment 3 Philippe Waroquiers 2010-09-13 22:23:50 UTC
Created attachment 51616 [details]
patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT

The patch fixes the quadratic memory usage for some client malloc patterns
by implementing superblock "reclaim": big allocations are done in their specific
superblock, which is returned to the OS when the (big) block is freed.
Patch includes a test program.

NB: the statistics in memcheck on the nr of mallocs when malloc fails have 
not been fixed : it is not clear to me what is desired for that: malloc
and realloc have a different behaviour on that respect.
Comment 4 John Reiser 2011-04-01 18:17:26 UTC
*** Bug 269884 has been marked as a duplicate of this bug. ***
Comment 5 Philippe Waroquiers 2011-04-27 22:49:45 UTC
Created attachment 59379 [details]
new version, upgraded to revision 11714. patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT

upgraded to a recent SVN revision.
re-run regression tests on f12 x86 and debian 5.0 amd64

Note that there is one new regression test (memcheck/tests/sbfragment.vgtest).

massif/tests/big-alloc changes of behaviour (overhead for big alloc is changed).
Comment 6 drforbin6 2011-06-19 11:16:32 UTC
*** Bug 275852 has been marked as a duplicate of this bug. ***
Comment 7 Philippe Waroquiers 2011-06-19 22:31:15 UTC
Created attachment 61159 [details]
revision 11823 patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT + test

(Upgraded to rev 11823)
* modified m_mallocfree.c to fix the quadratic memory usage for some client
  malloc patterns by implementing superblock "reclaim": big allocations are
  done in their specific superblock, which is returned to the OS when the
 (big) block is freed.
* modified test massif/tests/big-alloc:
   added the file big-alloc.post.exp-64bit
   modified big-alloc.post.exp
  (overhead of big alloc is rounded to one page now)
* new test files memcheck/tests/sbfragment.vgtest
Comment 8 drforbin6 2011-06-20 03:04:50 UTC
philippe, I'm getting this with you new patch.....
patching file glibc-2.X.supp.in
patching file memcheck/tests/Makefile.am
Hunk #1 succeeded at 152 (offset -4 lines).
Hunk #2 succeeded at 238 (offset -3 lines).
patching file memcheck/tests/sbfragment.stderr.exp
patching file memcheck/tests/sbfragment.c
patching file memcheck/tests/sbfragment.stdout.exp
patching file memcheck/tests/sbfragment.vgtest
patching file massif/tests/big-alloc.post.exp-64bit
patching file massif/tests/big-alloc.post.exp
patching file massif/tests/Makefile.am
patching file coregrind/m_mallocfree.c
Hunk #1 succeeded at 150 (offset -1 lines).
Hunk #2 succeeded at 170 (offset -1 lines).
Hunk #3 succeeded at 186 (offset -1 lines).
Hunk #4 succeeded at 208 with fuzz 2 (offset -6 lines).
Hunk #5 succeeded at 471 (offset -6 lines).
Hunk #6 succeeded at 495 with fuzz 1 (offset -6 lines).
Hunk #7 FAILED at 510.
Hunk #8 FAILED at 525.
Hunk #9 succeeded at 596 (offset -15 lines).
Hunk #10 succeeded at 606 (offset -15 lines).
Hunk #11 succeeded at 689 (offset -15 lines).
Hunk #12 succeeded at 697 (offset -15 lines).
Hunk #13 succeeded at 722 (offset -15 lines).
Hunk #14 FAILED at 753.
Hunk #15 succeeded at 959 (offset -15 lines).
Hunk #16 succeeded at 1039 with fuzz 2 (offset -15 lines).
Hunk #17 FAILED at 1150.
Hunk #18 succeeded at 1309 (offset -16 lines).
Hunk #19 succeeded at 1417 (offset -33 lines).
Hunk #20 succeeded at 1519 (offset -37 lines).
4 out of 20 hunks FAILED -- saving rejects to file coregrind/m_mallocfree.c.rej


3.6.1
Comment 9 Philippe Waroquiers 2011-06-20 09:35:34 UTC
(In reply to comment #8)
> philippe, I'm getting this with you new patch.....
...
> Hunk #7 FAILED at 510.
...
> 3.6.1

It should apply cleanly on the last svn version (3.7.0.SVN, revision 11823).
Comment 10 Philippe Waroquiers 2011-07-23 23:07:08 UTC
Created attachment 62127 [details]
rev 11906 patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT + test

Due to popular demand, upgraded patch to recent svn version 11906.
Apart of refresh, no changes compared to prev. version.

* modified m_mallocfree.c to fix the quadratic memory usage for some client
  malloc patterns by implementing superblock "reclaim": big allocations are
  done in their specific superblock, which is returned to the OS when the
 (big) block is freed.
* modified test massif/tests/big-alloc:
   added the file big-alloc.post.exp-64bit
   modified big-alloc.post.exp
  (overhead of big alloc is rounded to one page now)
* new test files memcheck/tests/sbfragment.vgtest
Comment 11 Julian Seward 2011-07-24 17:44:08 UTC
Committed, r11911.  Thanks.
Comment 12 Julian Seward 2011-08-17 22:14:21 UTC
It appears that the following assertions in the r11911 patch are
sometimes triggered.  When running 32-bit builds of Firefox, at
cnn.com:

valgrind: m_mallocfree.c:1677 (vgPlain_arena_free): Assertion 'other_b-1 == 
(Block*)sb_end' failed.
==13424==    at 0x38029FDD: report_and_quit (m_libcassert.c:210)
==13424==    by 0x3802A135: vgPlain_assert_fail (m_libcassert.c:284)
==13424==    by 0x38038736: vgPlain_arena_free (m_mallocfree.c:1677)
==13424==    by 0x38072368: vgPlain_cli_free (replacemalloc_core.c:92)
==13424==    by 0x38001E14: die_and_free_mem (mc_malloc_wrappers.c:125)
==13424==    by 0x38002847: vgMemCheck_free (mc_malloc_wrappers.c:322)
==13424==    by 0x38074D27: vgPlain_scheduler (scheduler.c:1460)
==13424==    by 0x38086646: run_a_thread_NORETURN (syswrap-linux.c:98)

and

valgrind: m_mallocfree.c:1673 (vgPlain_arena_free): Assertion 
'(Block*)sb_start == b' failed.
==11377==    at 0x3803DA9E: report_and_quit (m_libcassert.c:210)
==11377==    by 0x3803DC18: vgPlain_assert_fail (m_libcassert.c:284)
==11377==    by 0x3804BB35: vgPlain_arena_free (m_mallocfree.c:1673)
==11377==    by 0x38084B56: vgPlain_cli_free (replacemalloc_core.c:92)
==11377==    by 0x3801629A: die_and_free_mem (mc_malloc_wrappers.c:125)
==11377==    by 0x38016A36: vgMemCheck_free (mc_malloc_wrappers.c:322)
==11377==    by 0x3808774D: vgPlain_scheduler (scheduler.c:1460)
==11377==    by 0x38099287: run_a_thread_NORETURN (syswrap-linux.c:98)

Backing out r11911 seems to stop this happening.
Comment 13 Julian Seward 2011-08-17 22:19:13 UTC
I temporarily backed out r11911 (only the m_mallocfree.c changes;
not the tests) until such time as we can investigate these failures.
(r11989)
Comment 14 Philippe Waroquiers 2011-09-05 21:59:27 UTC
Created attachment 63421 [details]
new version : reset reclaimable for superblock splitted by VG_(arena_memalign)

I could not reproduce the problem with firefox but the same assert failures could
be reproduced using memalign calls.
The test memcheck/tests/memalign2.c has been modified to reproduce these asserts.

The asserts are caused by VG_(arena_memalign) that is splitting the reclaimable
superblock indirectly allocated by a call to VG_(arena_malloc).
As a reclaimable cannot be splitted, this was causing asserts during VG_(arena_free).

Two solutions were possible:
1. Ensure that a superblock used for arena_memalign is not splitted.
2. Avoid that such a superblock is marked as reclaimable : if ever
   arena_memalign receives a reclaimable superblock, it is reset as
   not reclaimable.

The new patch attached implements the 2nd solution as big alignments would cause
a significant overhead with solution 1 (unused part of superblock would be big).

Bug reproduced by enhancing the test memcheck/tests/memalign2.c

Tested on f12/x86 and debian5/amd64.
The tests sbfragment and big-alloc are now again working properly.
Comment 15 Philippe Waroquiers 2011-09-07 19:11:34 UTC
Created attachment 63489 [details]
Same as previous patch, but avoids to set and then reset reclaimable

Instead of first setting the reclaimable aspect of the superblock,
and then resetting it in VG_(arena_memalign), this version temporarily
bumps up the minimum reclaimable size around the VG_(arena_malloc) call
in VG_(arena_memalign).
This is simpler than setting and then resetting.
Comment 16 Florian Krohm 2011-09-10 00:41:14 UTC
Philippe, I tested your latest patch on s390x. It works there as well.
Comment 17 Julian Seward 2011-09-10 10:23:40 UTC
With the comment 15 patch in place I also cannot reproduce the 
original Firefox failure.  So, committed as r12022.  Let's see
how it goes for a few days.