This allocation

    s* pa1 = static_cast<s*>(operator new(sizeof(*pa1), static_cast<std::align_val_t>(256U)));

causes this output with clang++/libc++:

     n   time     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
     5   16,216   16,216    16,000          216            0

and this with g++/libstdc++:

     n   time     total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
     5   16,248   16,248    16,000          248            0

(The standard library used shouldn't be a factor, as replacement allocators are used in the main source file.)

Previously there were 32 bytes of extra-heap. I'd expect another 8 bytes for admin, making 40, plus whatever the alignment adds. So the alignment adds 176 bytes with clang++ (216 - 40) and 208 bytes with g++ (248 - 40). If the size were simply being rounded up to the next multiple of 256, I'd expect 96 to be added.
So I think what happens is this. arena_malloc allocates the requested size + the alignment + overhead. Then the small block in front of the alignment boundary gets freed. Since the alignment was added to the arena_malloc size, even in the worst case the original user-requested size is still available. That means that "extra-heap" is made up of the overhead plus the slop from rounding up to the alignment boundary. From what I see, the slop depends on what was allocated previously, and since libc++ and libstdc++ make different allocations, the slop is different.

Conclusion: I need to filter the extra-heap for these aligned allocations.
commit d248a4830770160cc7062f32ec91933804fe401a
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sun Nov 5 13:35:01 2023 +0100

    Bug 476535 - Difference in allocation size for massif/tests/overloaded-new between clang++/libc++ and g++/libstdc++

    In the end all I could do was filter the results.
    libc++ and libstdc++ allocate different sizes of stuff for their own use.
    That means that when we get to allocating aligned blocks there is some
    slop (up to the alignment size) that gets counted. And the amount of that
    slop depends on the prior (internal) allocations.