Bug 282105 - to reduce memory: generalise 'reclaimSuperBlock' to also reclaim splittable superblock
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.7 SVN
Platform: Unlisted Binaries Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-15 21:20 UTC by Philippe Waroquiers
Modified: 2011-09-26 11:33 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:


Attachments
generalised reclaimSuperBlock + activate unsplittable reclaim for all Arena (13.14 KB, text/plain)
2011-09-15 21:20 UTC, Philippe Waroquiers
generalised reclaimSuperBlock + activate unsplittable reclaim for all Arena (16.94 KB, text/plain)
2011-09-17 16:54 UTC, Philippe Waroquiers
generalised+deferred reclaimSuperBlock + activate unsplittable reclaim for all Arena (20.89 KB, text/plain)
2011-09-19 21:55 UTC, Philippe Waroquiers

Description Philippe Waroquiers 2011-09-15 21:20:31 UTC
Created attachment 63676
generalised reclaimSuperBlock + activate unsplittable reclaim for all Arena

A previous patch (bug 250101) introduced the concept of a reclaimable
superblock: a superblock that cannot be split into smaller blocks
and that can be munmapped.

This patch generalises the reclaimable concept: all superblocks are
now reclaimable. To reduce fragmentation, big superblocks are still
kept unsplittable.

The patch has 4 aspects:
1 The previous concept of 'reclaimable superblock' is renamed
  'unsplittable superblock' (this is a mechanical change).
2 Ensure that splittable superblocks can be reclaimed:
  After each free, if the free results in a merged block which
  completely covers the superblock, then the superblock can be reclaimed.
3 If a superblock is reclaimed and there exist some translations
  for this superblock, then discard the translations.
  Note: I did not understand the comment speaking about
  circular dependency. Just calling VG_(discard_translations) seems
  to cause no problem. As m_transtab.c does not allocate client memory,
  I believe no circular (dynamic) dependency can arise.
4 Activate 'unsplittable superblock' for all arenas.
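
As an illustration of aspect 2, here is a minimal sketch of the check done
after each free. It is a simplified model and not the m_mallocfree.c code:
the types ToySuperblock and ToyFreeBlock and the function maybe_reclaim are
hypothetical names invented for this example, and the real munmap (via
aspacemgr) and the translation discard are only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a superblock (not Valgrind's struct). */
typedef struct {
    size_t payload_bytes;  /* usable bytes in the superblock */
    bool   mapped;         /* false once the superblock has been given back */
} ToySuperblock;

/* A free block after merging with its free neighbours, expressed as an
   offset range [start, start + len) inside the superblock payload. */
typedef struct {
    size_t start;
    size_t len;
} ToyFreeBlock;

/* True if the merged free block spans the entire payload. */
static bool covers_whole_superblock(const ToySuperblock *sb, ToyFreeBlock merged)
{
    return merged.start == 0 && merged.len == sb->payload_bytes;
}

/* To be called after every free, once merging is done.
   Returns true if the superblock was reclaimed. */
static bool maybe_reclaim(ToySuperblock *sb, ToyFreeBlock merged)
{
    assert(sb->mapped);
    assert(merged.start + merged.len <= sb->payload_bytes);
    if (!covers_whole_superblock(sb, merged))
        return false;  /* some block is still in use: keep the superblock */
    /* The real code would discard translations for this range and
       munmap the superblock here; the model only records the fact. */
    sb->mapped = false;
    return true;
}
```

The check itself is two comparisons, which matches the claim that testing
for reclaimability is very cheap compared to the munmap that may follow.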


Results in memory decrease:
---------------------------
When starting a big application, the maximum size of the dinfo arena
decreased from 124 MB to 112 MB. The other arenas were unchanged.

Performance impact:
-------------------
* Checking that a superblock can be reclaimed is very cheap (in cpu).
  If the block can be reclaimed, aspacemgr is used to munmap
  the superblock.

* In case big alloc/free are done (i.e. unsplittable superblocks),
  it is deemed that the cost of mmap/munmap is acceptable
  compared to the cost of using this memory.
  Note: mmap/munmap is also used by the glibc malloc for big blocks.
  So, for client arena, behaviour is similar to glibc.
  For Valgrind arenas having small superblocks, the mmap/munmap
  might be proportionally more costly.

* If an application (or Valgrind) does a very small
  number of malloc/free just 'at the limit' of a superblock, then
  each of these malloc/free might pay
  the price of an aspacemgr mmap/munmap.
  If that turns out to be a problem, then the superblock reclaim
  should be deferred till either another superblock can be
  reclaimed or till a new superblock must be allocated
  (meaning that the reclaimable superblock is not big enough,
   so we had better reclaim it first to decrease fragmentation).
  (We might add a statistical counter for each arena, counting
   the number of superblock mmap calls, so as to detect such a problem
   with --profile-heap=yes.)
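
The 'at the limit' case from the last bullet can be sketched with a toy
model that counts superblock mmap calls under immediate reclaim. Everything
here is hypothetical (ToyImmediateArena, its fields and the three functions
are made up for the illustration); sizes are ignored and the arena is
assumed to need one extra superblock for the block being allocated.

```c
#include <assert.h>

/* Hypothetical toy arena: tracks only the newest superblock. */
typedef struct {
    unsigned      live_blocks;  /* blocks in use in the newest superblock */
    int           sb_mapped;    /* 1 if that superblock is currently mapped */
    unsigned long n_mmap;       /* statistic: superblock mmap calls */
    unsigned long n_munmap;     /* statistic: superblock munmap calls */
} ToyImmediateArena;

/* Allocation that does not fit in the older superblocks. */
static void toy_malloc_at_limit(ToyImmediateArena *a)
{
    if (!a->sb_mapped) {        /* nothing to reuse: map a new superblock */
        a->sb_mapped = 1;
        a->n_mmap++;
    }
    a->live_blocks++;
}

/* Free with immediate reclaim of a fully free superblock. */
static void toy_free_immediate(ToyImmediateArena *a)
{
    assert(a->live_blocks > 0);
    a->live_blocks--;
    if (a->live_blocks == 0) {  /* superblock entirely free: unmap it now */
        a->sb_mapped = 0;
        a->n_munmap++;
    }
}

/* Runs 'pairs' malloc/free pairs and returns the number of mmap calls. */
static unsigned long count_mmaps_for_pairs(unsigned pairs)
{
    ToyImmediateArena a = { 0, 0, 0, 0 };
    for (unsigned i = 0; i < pairs; i++) {
        toy_malloc_at_limit(&a);
        toy_free_immediate(&a);
    }
    assert(a.n_mmap == a.n_munmap);
    return a.n_mmap;
}
```

In this model every malloc/free pair costs one mmap and one munmap, which
is the pattern a per-arena mmap counter would make visible.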

The patch has some performance impact on the regression tests:
with this patch, about 30 seconds more CPU on roughly 9 minutes of CPU
on a fast amd64.
These 30 seconds are mostly additional system CPU, so very probably
due to the additional mmap/munmap.
First, there is systematically a mmap then an munmap at
the initialisation of m_mallocfree.c : to initialise and
check the allocator, a malloc followed by a free is done.
The malloc causes a superblock to be created. The free causes
it to be munmapped. This can be avoided by either initializing
the allocator another way (e.g. just calling VG_(free) (NULL))
or by never munmapping the first superblock
or by implementing the deferred reclaim (see above).
Second, it might be that many regression tests are doing
a very small nr of malloc/free, leading to mmap/munmap costs.

No significant performance impact was detected on 'make perf'.

On bigger (real) applications/tests, no performance impact detected.
Comment 1 Philippe Waroquiers 2011-09-17 16:54:35 UTC
Created attachment 63733
generalised reclaimSuperBlock + activate unsplittable reclaim for all Arena

Slightly improved version:
* small improvement of m_mallocfree.c performance
   (about 10% on perf/heap)
* added statistics about reclaim in profile-heap
* a few comment changes
* fixed exp-sgcheck hsg.c test.
Comment 2 Philippe Waroquiers 2011-09-19 09:01:18 UTC
I did a few more trials (with the new stats) regarding the need for a deferred
reclaim.
There is at least one regression test which behaves very badly without
the deferred reclaim (drd/tests/memory_allocation.c).

The deferred reclaim with the following logic will solve this kind of
bad behaviour, and will reclaim all blocks when needed:

* reclaim immediately "big unsplittable blocks".
* defer the reclaim of splittable block till either:
   another block is 'deferred reclaimed' in the same arena
   or a new superblock is needed in any other arena.

With this, it will avoid pathological behaviour like in memory_allocation.c, avoid fragmentation, and ensure the same 'max peak mmap memory'.

I will work on that approach this evening, and submit a new patch version.
Comment 3 Philippe Waroquiers 2011-09-19 21:55:28 UTC
Created attachment 63783
generalised+deferred reclaimSuperBlock + activate unsplittable reclaim for all Arena 

Implements deferred reclaim: splittable blocks are not reclaimed directly.
Instead, they are reclaimed when either another block can be (deferred) reclaimed
in the same arena or when a new superblock is needed in this or any other arena.
This limits memory usage and fragmentation, and keeps good performance
even for malloc/free sequences 'around' superblock limits.
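
The deferred policy described above can be sketched as follows. This is a
toy model under stated assumptions, not the code of the attached patch:
ToyDeferArena and the toy_* functions are hypothetical names, each arena
remembers at most one deferred superblock, and a later request is assumed
to fit in the deferred superblock so that it can simply be reused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy arena for the deferred reclaim policy. */
typedef struct {
    int           deferred_sb;  /* id of a fully free splittable superblock
                                   kept mapped for now, or -1 if none */
    unsigned long n_mmap;       /* statistic: superblock mmap calls */
    unsigned long n_munmap;     /* statistic: superblock munmap calls */
} ToyDeferArena;

/* Really give back the deferred superblock of one arena, if any. */
static void toy_flush_deferred(ToyDeferArena *a)
{
    if (a->deferred_sb != -1) {
        a->deferred_sb = -1;
        a->n_munmap++;
    }
}

/* Called when a free leaves superblock sb_id entirely free. */
static void toy_superblock_became_free(ToyDeferArena *a, int sb_id,
                                       bool unsplittable)
{
    assert(sb_id >= 0);
    if (unsplittable) {         /* big blocks: reclaim immediately */
        a->n_munmap++;
        return;
    }
    toy_flush_deferred(a);      /* an earlier deferred one goes first */
    a->deferred_sb = sb_id;     /* keep this one mapped for now */
}

/* Called when arena 'which' needs room for a new block. Reuses the
   deferred superblock if there is one; otherwise flushes the deferred
   superblocks of all arenas and maps a fresh superblock (fresh_id). */
static int toy_get_superblock(ToyDeferArena *all, int n_arenas, int which,
                              int fresh_id)
{
    ToyDeferArena *a = &all[which];
    if (a->deferred_sb != -1) { /* assumed to fit: no mmap needed */
        int id = a->deferred_sb;
        a->deferred_sb = -1;
        return id;
    }
    for (int i = 0; i < n_arenas; i++)
        toy_flush_deferred(&all[i]);
    a->n_mmap++;
    return fresh_id;
}
```

With this model, a long run of malloc/free pairs around a superblock limit
costs a single mmap and no munmap, while a new superblock needed in any
arena still forces the pending reclaim.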

Tested on amd64 debian5 + x86 f12 + launched a firefox with it.
Comment 4 Julian Seward 2011-09-26 11:33:32 UTC
Committed, r12047.  Thanks!