Created attachment 51290 [details] vgfragment.c program reproducing superblocks "fragmentation" When running a big application under valgrind, we got a "not enough memory" error. while the total memory allocated by the big application was far to be a lot. We found one application regression test that caused a big increase in VG memory while the test does in fact never uses a lot of memory. Investigation shows that the malloc memcheck replacement can "fragment" a lot. This was mostly visible in the "client" arena but was also visible in "tool" arena (56 Mb mmap'd while the peak memory used is 25 Mb). A small C program (vgfragment.c, attached to the report) reproduces the "fragmentation" of the client arena: the peak memory used is about 30 Mb, but valgrind needs about 1200 Mb. (in fact, the fragmentation is because the superblocks of m_mallocfree.c are not released and/or not glued together). Here is the output without valgrind (vgfragment.c is at the end of this bug report): ./vgfragment before freeing last block 135168 int arena; /* non-mmapped space allocated from system */ 1 int ordblks; /* number of free chunks */ 0 int smblks; /* number of fastbin blocks */ 1 int hblks; /* number of mmapped regions */ 9633792 int hblkhd; /* space in mmapped regions */ 0 int usmblks; /* maximum total allocated space */ 0 int fsmblks; /* space available in freed fastbin blocks */ 0 int uordblks; /* total allocated space */ 135168 int fordblks; /* total free space */ 135168 int keepcost; /* top-most, releasable (via malloc_trim) space */ after freeing last block 135168 int arena; /* non-mmapped space allocated from system */ 1 int ordblks; /* number of free chunks */ 0 int smblks; /* number of fastbin blocks */ 0 int hblks; /* number of mmapped regions */ 0 int hblkhd; /* space in mmapped regions */ 0 int usmblks; /* maximum total allocated space */ 0 int fsmblks; /* space available in freed fastbin blocks */ 0 int uordblks; /* total allocated space */ 135168 int fordblks; /* total free space */ 135168 int keepcost; /* top-most, releasable (via malloc_trim) space */ Note that we see that clib malloc is handling specially big blocks: they are separately allocated as mmap-ed memory, which is un-mmap-ed when the big block is freed. I think this is (one of) the trick used by gnu linux malloc to avoid fragmentation (for more info, see gnu linux libc mallopt documentation). output with valgrind (tested with 3.5.0 and with 3.6.0.SVN): valgrind ./vgfragment ... ==18811== before freeing last block 1203294208 int arena; /* non-mmapped space allocated from system */ 177 int ordblks; /* number of free chunks */ 0 int smblks; /* number of fastbin blocks */ 0 int hblks; /* number of mmapped regions */ 0 int hblkhd; /* space in mmapped regions */ 0 int usmblks; /* maximum total allocated space */ 0 int fsmblks; /* space available in freed fastbin blocks */ 9632008 int uordblks; /* total allocated space */ 1193650840 int fordblks; /* total free space */ 0 int keepcost; /* top-most, releasable (via malloc_trim) space */ after freeing last block 1203294208 int arena; /* non-mmapped space allocated from system */ 177 int ordblks; /* number of free chunks */ 0 int smblks; /* number of fastbin blocks */ 0 int hblks; /* number of mmapped regions */ 0 int hblkhd; /* space in mmapped regions */ 0 int usmblks; /* maximum total allocated space */ 0 int fsmblks; /* space available in freed fastbin blocks */ 0 int uordblks; /* total allocated space */ 1203282896 int fordblks; /* total free space */ 0 int keepcost; /* top-most, releasable (via malloc_trim) space */ ==18811== ==18811== HEAP SUMMARY: ==18811== in use at exit: 0 bytes in 0 blocks ==18811== total heap usage: 301 allocs, 301 frees, 1,454,434,408 bytes allocated ==18811== ==18811== All heap blocks were freed -- no leaks are possible As you can see, the total free space reported by mallinfo (which matches what we observe e.g. with "top" command) is huge when it is running under valgrind. When running the big application, this huge free space makes impossible to run all the regression tests without crashing with ENOMEM. Maybe a logic similar to the gnu malloc could be implemented in m_mallocfree.c: if a big block is VG_(arena_malloc-ed) (e.g. above 1Mb), then this always causes a superblock to be mmap-ed. When this block is VG_(arena_free-d), then its superblock is released (un-mmap-ed). I know that an application that (linearly) increases a big/huge array is not "heap friendly" but that is probably a common usage pattern which then can often give problems when running under valgrind. => So a special case for such big blocks is probably worth. (if you think this is a attractive approach, I could work on a patch based on this approach). A side note: if in vgfragment.c, the loop is executed 3000 times instead of 300 times, then various mallinfo output and some valgrind debug output starts to go bezerk, e.g. we obtain things such as: -------- Arena "client": -912891904 mmap'd, 30432016/30432016 max/curr -------- 30,432,016 in 2: replacemalloc.cm.1 before freeing last block -177524736 int arena; /* non-mmapped space allocated from system */ (I guess we do not have a negative size for the "client" arena and/or the mallinfo arena:). The mallinfo API is hopelessly broken (it is an "int") but it is strange that the internal valgrind statistics gives a negative number for the --profile-heap=yes option).
Note that when the loop is changed to be 3000 rather than 300, even the normal memcheck output contains bezerk data: ==20527== HEAP SUMMARY: ==20527== in use at exit: 0 bytes in 0 blocks ==20527== total heap usage: 3,001 allocs, 522 frees, 4,368,100,176 bytes allocated ==20527== ==20527== All heap blocks were freed -- no leaks are possible As you can see, 3001 blocks were allocated, 522 were freed but all heaps blocks were freed ????
I think that the explanation for the 3001 allocs, 522 frees but still all blocks have been freed is that a failed allocation (no more memory) is still counted as an alloc (and returns 0x0). vgfragment.c does not check the returned malloc result, and then frees 0x0, which is not counted as a free. It looks probably more logical to not count a failed malloc in the memcheck heap usage statistics.
Created attachment 51616 [details] patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT The patch fixes the quadratic memory usage for some client malloc patterns by implementing superblock "reclaim": big allocations are done in their specific superblock, which is returned to the OS when the (big) block is freed. Patch includes a test program. NB: the statistics in memcheck on the nr of mallocs when malloc fails have not been fixed : it is not clear to me what is desired for that: malloc and realloc have a different behaviour on that respect.
*** Bug 269884 has been marked as a duplicate of this bug. ***
Created attachment 59379 [details] new version, upgraded to revision 11714. patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT upgraded to a recent SVN revision. re-run regression tests on f12 x86 and debian 5.0 amd64 Note that there is one new regression test (memcheck/tests/sbfragment.vgtest). massif/tests/big-alloc changes of behaviour (overhead for big alloc is changed).
*** Bug 275852 has been marked as a duplicate of this bug. ***
Created attachment 61159 [details] revision 11823 patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT + test (Upgraded to rev 11823) * modified m_mallocfree.c to fix the quadratic memory usage for some client malloc patterns by implementing superblock "reclaim": big allocations are done in their specific superblock, which is returned to the OS when the (big) block is freed. * modified test massif/tests/big-alloc: added the file big-alloc.post.exp-64bit modified big-alloc.post.exp (overhead of big alloc is rounded to one page now) * new test files memcheck/tests/sbfragment.vgtest
philippe, I'm getting this with you new patch..... patching file glibc-2.X.supp.in patching file memcheck/tests/Makefile.am Hunk #1 succeeded at 152 (offset -4 lines). Hunk #2 succeeded at 238 (offset -3 lines). patching file memcheck/tests/sbfragment.stderr.exp patching file memcheck/tests/sbfragment.c patching file memcheck/tests/sbfragment.stdout.exp patching file memcheck/tests/sbfragment.vgtest patching file massif/tests/big-alloc.post.exp-64bit patching file massif/tests/big-alloc.post.exp patching file massif/tests/Makefile.am patching file coregrind/m_mallocfree.c Hunk #1 succeeded at 150 (offset -1 lines). Hunk #2 succeeded at 170 (offset -1 lines). Hunk #3 succeeded at 186 (offset -1 lines). Hunk #4 succeeded at 208 with fuzz 2 (offset -6 lines). Hunk #5 succeeded at 471 (offset -6 lines). Hunk #6 succeeded at 495 with fuzz 1 (offset -6 lines). Hunk #7 FAILED at 510. Hunk #8 FAILED at 525. Hunk #9 succeeded at 596 (offset -15 lines). Hunk #10 succeeded at 606 (offset -15 lines). Hunk #11 succeeded at 689 (offset -15 lines). Hunk #12 succeeded at 697 (offset -15 lines). Hunk #13 succeeded at 722 (offset -15 lines). Hunk #14 FAILED at 753. Hunk #15 succeeded at 959 (offset -15 lines). Hunk #16 succeeded at 1039 with fuzz 2 (offset -15 lines). Hunk #17 FAILED at 1150. Hunk #18 succeeded at 1309 (offset -16 lines). Hunk #19 succeeded at 1417 (offset -33 lines). Hunk #20 succeeded at 1519 (offset -37 lines). 4 out of 20 hunks FAILED -- saving rejects to file coregrind/m_mallocfree.c.rej 3.6.1
(In reply to comment #8) > philippe, I'm getting this with you new patch..... ... > Hunk #7 FAILED at 510. ... > 3.6.1 It should apply cleanly on the last svn version (3.7.0.SVN, revision 11823).
Created attachment 62127 [details] rev 11906 patch fixing m_mallocfree.c quadratic memory usage + use %lu for SizeT + test Due to popular demand, upgraded patch to recent svn version 11906. Apart of refresh, no changes compared to prev. version. * modified m_mallocfree.c to fix the quadratic memory usage for some client malloc patterns by implementing superblock "reclaim": big allocations are done in their specific superblock, which is returned to the OS when the (big) block is freed. * modified test massif/tests/big-alloc: added the file big-alloc.post.exp-64bit modified big-alloc.post.exp (overhead of big alloc is rounded to one page now) * new test files memcheck/tests/sbfragment.vgtest
Committed, r11911. Thanks.
It appears that the following assertions in the r11911 patch are sometimes triggered. When running 32-bit builds of Firefox, at cnn.com: valgrind: m_mallocfree.c:1677 (vgPlain_arena_free): Assertion 'other_b-1 == (Block*)sb_end' failed. ==13424== at 0x38029FDD: report_and_quit (m_libcassert.c:210) ==13424== by 0x3802A135: vgPlain_assert_fail (m_libcassert.c:284) ==13424== by 0x38038736: vgPlain_arena_free (m_mallocfree.c:1677) ==13424== by 0x38072368: vgPlain_cli_free (replacemalloc_core.c:92) ==13424== by 0x38001E14: die_and_free_mem (mc_malloc_wrappers.c:125) ==13424== by 0x38002847: vgMemCheck_free (mc_malloc_wrappers.c:322) ==13424== by 0x38074D27: vgPlain_scheduler (scheduler.c:1460) ==13424== by 0x38086646: run_a_thread_NORETURN (syswrap-linux.c:98) and valgrind: m_mallocfree.c:1673 (vgPlain_arena_free): Assertion '(Block*)sb_start == b' failed. ==11377== at 0x3803DA9E: report_and_quit (m_libcassert.c:210) ==11377== by 0x3803DC18: vgPlain_assert_fail (m_libcassert.c:284) ==11377== by 0x3804BB35: vgPlain_arena_free (m_mallocfree.c:1673) ==11377== by 0x38084B56: vgPlain_cli_free (replacemalloc_core.c:92) ==11377== by 0x3801629A: die_and_free_mem (mc_malloc_wrappers.c:125) ==11377== by 0x38016A36: vgMemCheck_free (mc_malloc_wrappers.c:322) ==11377== by 0x3808774D: vgPlain_scheduler (scheduler.c:1460) ==11377== by 0x38099287: run_a_thread_NORETURN (syswrap-linux.c:98) Backing out r11911 seems to stop this happening.
I temporarily backed out r11911 (only the m_mallocfree.c changes; not the tests) until such time as we can investigate these failures. (r11989)
Created attachment 63421 [details] new version : reset reclaimable for superblock splitted by VG_(arena_memalign) I could not reproduce the problem with firefox but the same assert failures could be reproduced using memalign calls. The test memcheck/tests/memalign2.c has been modified to reproduce these asserts. The asserts are caused by VG_(arena_memalign) that is splitting the reclaimable superblock indirectly allocated by a call to VG_(arena_malloc). As a reclaimable cannot be splitted, this was causing asserts during VG_(arena_free). Two solutions were possible: 1. Ensure that a superblock used for arena_memalign is not splitted. 2. Avoid that such a superblock is marked as reclaimable : if ever arena_memalign receives a reclaimable superblock, it is reset as not reclaimable. The new patch attached implements the 2nd solution as big alignments would cause a significant overhead with solution 1 (unused part of superblock would be big). Bug reproduced by enhancing the test memcheck/tests/memalign2.c Tested on f12/x86 and debian5/amd64. The tests sbfragment and big-alloc are now again working properly.
Created attachment 63489 [details] Same as previous patch, but avoids to set and then reset reclaimable Instead of first setting the reclaimable aspect of the superblock, and then resetting it in VG_(arena_memalign), this version temporarily bumps up the minimum reclaimable size around the VG_(arena_malloc) call in VG_(arena_memalign). This is simpler than setting and then resetting.
Philippe, I tested your latest patch on s390x. It works there as well.
With the comment 15 patch in place I also cannot reproduce the original Firefox failure. So, committed as r12022. Let's see how it goes for a few days.