Created attachment 63676 [details]
generalised reclaimSuperBlock + activate unsplittable reclaim for all Arena

A previous patch (bug 250101) introduced the concept of a reclaimable superblock: a superblock that cannot be split into smaller blocks and that can be munmapped.

This patch generalises the reclaimable concept: all superblocks are now reclaimable. To reduce fragmentation, big superblocks are still kept unsplittable.

The patch has 4 aspects:

1 The previous concept of 'reclaimable superblock' is renamed 'unsplittable superblock' (this is a mechanical change).

2 Ensure that splittable blocks can be reclaimed: after each free, if the free results in a merged block which completely covers the superblock, then the superblock can be reclaimed.

3 If a superblock is reclaimed and there exist some translations for this superblock, then discard the translations.
  Note: I did not understand the comment about a circular dependency. Just calling VG_(discard_translations) seems to cause no problem. As m_transtab.c does not allocate client memory, I believe no circular (dynamic) dependency can arise.

4 Activate 'unsplittable superblock' for all arenas.

Results in memory decrease:
---------------------------
When starting a big application, the dinfo arena max size decreased from 124 MB to 112 MB. Other arenas are unchanged.

Performance impact:
-------------------
* Checking that a superblock can be reclaimed is very cheap (in cpu). If the block can be reclaimed, aspacemgr is used to munmap the superblock.

* When big alloc/free calls are done (i.e. unsplittable superblocks), the cost of mmap/munmap is deemed acceptable compared to the cost of using this memory.
  Note: mmap/munmap is also used by the glibc malloc for big blocks. So, for the client arena, the behaviour is similar to glibc. For Valgrind arenas having small superblocks, the mmap/munmap might be proportionally more costly.
* If an application (or Valgrind) does a very small number of malloc/free calls just 'at the limit' of a superblock, then each of these calls might pay the price of an aspacemgr mmap/munmap. If that turns out to be a problem, the superblock reclaim should be deferred until either another superblock can be reclaimed or a new superblock must be allocated (meaning that the reclaimable superblock is not big enough, so we had better reclaim it first to decrease fragmentation).
  (We might add a statistical counter for each arena, counting the number of superblock mmap calls, so as to detect such a problem with --profile-heap=yes.)

The patch has some performance impact on the regression tests: with this patch, about 30 seconds more cpu on +- 9 minutes of cpu on a fast amd64. These 30 seconds are mostly additional system cpu, so very probably due to the additional mmap/munmap.

First, there is systematically an mmap then an munmap at the initialisation of m_mallocfree.c: to initialise and check the allocator, a malloc followed by a free is done. The malloc causes a superblock to be created. The free causes it to be munmapped. This can be avoided either by initialising the allocator another way (e.g. just calling VG_(free) (NULL)), or by never munmapping the first superblock, or by implementing the deferred reclaim (see above).

Second, it might be that many regression tests do a very small number of malloc/free calls, leading to mmap/munmap costs.

No significant performance impact was detected on 'make perf'. On bigger (real) applications/tests, no performance impact was detected.
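The check described in aspect 2 can be sketched as follows. This is a minimal toy model, not the code from the patch: the types ToySuperblock and ToyFreeBlock and the function toy_can_reclaim are hypothetical names invented for illustration, and the real m_mallocfree.c structures are assumed to differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a superblock: only the payload
   range matters for the reclaim check. */
typedef struct {
    size_t payload_start;  /* offset of the first payload byte */
    size_t payload_size;   /* number of payload bytes */
} ToySuperblock;

/* Hypothetical model of the free block obtained after the freed block
   has been merged with its free neighbours. */
typedef struct {
    size_t start;
    size_t size;
} ToyFreeBlock;

/* Returns true when the merged free block covers the whole payload of
   the superblock, i.e. nothing in the superblock is still in use. */
static bool toy_can_reclaim(const ToySuperblock *sb,
                            const ToyFreeBlock *merged)
{
    assert(sb != NULL && merged != NULL);
    return merged->start == sb->payload_start
        && merged->size  == sb->payload_size;
}
```

The check is two comparisons per free, which matches the observation above that testing for reclaimability is very cheap in cpu; the expensive part is the munmap that follows when the check succeeds.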
Created attachment 63733 [details]
generalised reclaimSuperBlock + activate unsplittable reclaim for all Arena

Slightly improved version:
* small improvement of m_mallocfree.c performance (about 10% on perf/heap)
* added statistics about reclaim in profile-heap
* a few comment changes
* fixed exp-sgcheck hsg.c test.
I did a few more trials (with the new stats) regarding the need for a deferred reclaim.

There is at least one regression test which behaves very badly without the deferred reclaim (drd/tests/memory_allocation.c).

A deferred reclaim with the following logic will solve this kind of bad behaviour, and will still reclaim all blocks when needed:
* reclaim "big unsplittable blocks" immediately.
* defer the reclaim of a splittable block until either:
    another block is 'deferred reclaimed' in the same arena
    or a new superblock is needed in any other arena.

This avoids pathological behaviour like in memory_allocation.c, avoids fragmentation, and ensures the same 'max peak mmap memory'.

I will work on that approach this evening, and submit a new patch version.
Created attachment 63783 [details]
generalised+deferred reclaimSuperBlock + activate unsplittable reclaim for all Arena

Implements deferred reclaim: splittable blocks are not reclaimed directly. Instead, they are reclaimed when either another block can be (deferred) reclaimed in the same arena or when a new superblock is needed in this or any other arena.

This limits memory usage and fragmentation, and keeps good performance even for malloc/free sequences 'around' superblock limits.

Tested on amd64 debian5 + x86 f12 + launched a firefox with it.
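The deferred reclaim policy described in this comment can be sketched as below. This is a toy model under stated assumptions, not the committed code: ToySb, ToyArena, toy_arenas and all toy_* functions are hypothetical names, the real munmap through aspacemgr is replaced by a flag and a counter, and the toy_on_reuse step (cancelling a pending reclaim when the superblock is used again) is an assumption about how the malloc/free sequences around a superblock limit stay cheap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_N_ARENAS 4

/* Hypothetical superblock model. */
typedef struct {
    bool unsplittable;  /* big superblock holding a single block */
    bool mapped;        /* false once "munmapped" */
} ToySb;

/* Hypothetical arena model: at most one fully free splittable
   superblock waits for a deferred reclaim. */
typedef struct {
    ToySb   *deferred;
    unsigned n_munmaps;  /* stand-in for a reclaim statistic */
} ToyArena;

static ToyArena toy_arenas[TOY_N_ARENAS];

/* Stand-in for returning the superblock memory via munmap. */
static void toy_munmap(ToyArena *a, ToySb *sb)
{
    assert(sb->mapped);
    sb->mapped = false;
    a->n_munmaps++;
}

/* Called when a free leaves sb entirely free. */
static void toy_on_fully_free(ToyArena *a, ToySb *sb)
{
    if (sb->unsplittable) {      /* big blocks: reclaim immediately */
        toy_munmap(a, sb);
        return;
    }
    if (a->deferred == sb)       /* already pending, nothing to do */
        return;
    if (a->deferred != NULL)     /* another block is now deferred:  */
        toy_munmap(a, a->deferred); /* reclaim the older one first  */
    a->deferred = sb;
}

/* Assumed step: a malloc served from the pending superblock cancels
   its deferred reclaim. */
static void toy_on_reuse(ToyArena *a, ToySb *sb)
{
    if (a->deferred == sb)
        a->deferred = NULL;
}

/* Called before a new superblock is mapped in any arena: every
   pending superblock is reclaimed first. */
static void toy_before_new_superblock(void)
{
    for (size_t i = 0; i < TOY_N_ARENAS; i++) {
        ToyArena *a = &toy_arenas[i];
        if (a->deferred != NULL) {
            toy_munmap(a, a->deferred);
            a->deferred = NULL;
        }
    }
}
```

In this model, a program that repeatedly frees and reallocates the last block of a splittable superblock never triggers toy_munmap, while the pending superblock is still given back before any arena maps new memory, so the peak mapped memory does not grow.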
Committed, r12047. Thanks!