428689 – Feature request: include allocation size information at top of flame graph

Bug 428689 - Feature request: include allocation size information at top of flame graph

Summary: Feature request: include allocation size information at top of flame graph

Status:	REPORTED

Alias:	None

Product:	Heaptrack
Classification:	Applications
Component:	general (show other bugs)
Version:	unspecified
Platform:	Other Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Milian Wolff

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-11-04 15:45 UTC by Andrew Somerville
Modified:	2020-11-09 16:26 UTC (History)
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:

Attachments
Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Somerville 2020-11-04 15:45:23 UTC

SUMMARY

Using allocation size information is already captured by heaptrack, it would be helpful to be able to see that breakdown from the context of the final function frame in the flamegraph (or any of the other tree views)

Relatedly it would be nice to see the sizeof information for the types being allocated thought it's unclear to me if the necessary information for such a thing is a) already captured b) is even feasible to capture.

If such a (pair of) features I'd be willing to take a stab at implementing them.

Comment 1 Milian Wolff 2020-11-04 20:06:56 UTC

Can you make a rough mockup of how you envision this to look like? I have a hard time envisioning it, especially in flame graphs for code like this:


void a() { malloc(100); }
void b() { a(); malloc(200); }
void c() { a(); b(); malloc(200); malloc(100); }

This would create a "step" currently in the flamegraph and it's pretty clear where the self cost is. But when we would add something on top to indicate the allocation sizes... I'm  bit lost :) Would you think it should look like this:

[1x 100b]
[a      ] [1x 200b      ]
[b                      ] [1x 200b      ] [1x 100b]
[c                                                ]

Maybe if we remove the background color or something like that from these entries it would become easy to realize what is what, hmmm.

Regarding your other idea: I've been thinking about that for years but never really got around to try my shot at it. Generally, this should be tracked in a separate bug report. The problem is that the type information is lost in `malloc`, so we would have to try and parse the code lines for the allocation locations and guesstimate the type from there... But for C++ templates I guess it will quickly fail :( `new T[100]` is not very expressive

Comment 2 Andrew Somerville 2020-11-05 07:33:05 UTC

> Would you think it should look like this:
>
> [1x 100b]
> [a      ] [1x 200b      ]
> [b                      ] [1x 200b      ] [1x 100b]
> [c                                                ]

Yeah, that's almost exactly what I was thinking. In this example only the break-out/histogram of `c`'s allocations (next to `b`) is particularly meaningful since the others only have a single allocation, but you could imagine this same function path being traversed many times with different allocation sizes happening each time. Currently all those are aggregated into a single flame graph with a total outstanding memory at the moment of peak memory use, but with the allocation pseudo-histogram broken out as you have above, it would be clear whether there were many small allocations or few large ones. (For good measure the call counts of each function might also be nice)

With regard to the second idea, I'm afraid my understanding of some of the foundational stuff is a little thin. My naive thought would be that it might need to be heuristic. Perhaps seeing two successive frames matching a `operator new` regex and capturing their size argument, one could divide one by the other and deduce a size. But that's a shot in the dark.

And I can certainly make a separate bug report if it's helpful.

Comment 3 Andrew Somerville 2020-11-05 19:53:39 UTC

It would seem that it might be possible to search the call stack some small number of frames for function names matching a regex and find allocators for type T and then lookup the sizeof info for T in the DWARF table, then divide the malloc byte number by that sizeof for a number of objects allocated.

Alternatively to the `sizeof` lookup a similar trick might be done by searching the call stack and find an earlier allocation function which was passed the number of object to allocate and use the same division.

Unfortunately it seems this is a bit more complicated:
Perhaps unsurprisingly getting the function call arguments more-or-less requires dwarf info. 

https://nikhilism.com/post/2019/retrieving-function-arguments-while-unwinding-the-stack/

I tried the idea in gdb and here's an example where `bytes` (F0) can be divided by `__n` (F2) but the pattern of `new_allocator<>::allocate` 2 frames above malloc would have to be detected: 


    (gdb) bt
    #0 SP in __GI___libc_malloc (bytes=1600) at malloc.c:3028
    #1 SP in operator new(unsigned long) + 0x18 () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #2 SP in __gnu_cxx::new_allocator<Foo>::allocate(unsigned long, void const*) (this=0x7fffffffd910, __n=100)

Comment 4 Milian Wolff 2020-11-07 09:54:07 UTC

I think it's more reliable to query for the size of the type size using GDB directly, as accessing the function parameters of inlined frames would not be possible in a release build - and the allocator will most probably get inlined.

I mean you can do that in GDB:

```
(gdb) p sizeof(Foo)
$1 = 16
```

Before we'd try to get that information though I'd like to port heaptrack away from libbacktrace to elfutils instead, as I don't think getting that information would be easily possible with libbacktrace or libdwarf directly.

Furthermore, note how this common case cannot be handled without parsing the actual *source code*:

```
struct Foo { int a; char b; double c; };

int main() {
    auto b = new Foo[100];
    return 0;
};
```

GDB shows this backtrace in a debug build:

```
#0  0x00007ffff7b0fc90 in malloc () from /usr/lib/libc.so.6
#1  0x00007ffff7e5053a in operator new (sz=1600) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/new_op.cc:50
#2  0x000055555555579b in main () at test.cpp:4
```

We don't know the type nor the number of elements :(

Comment 5 Andrew Somerville 2020-11-09 16:26:28 UTC

I'm going to experiment a little with `libdw` from `elfutils` just to inform myself a bit. I'm a newbie compared to you, but the idea has become interesting enough that I'm curious to learn more.

From what you've said and what I've learned so far it does seem like any solution is likely to be heuristic rather than "closed-form" but I wonder if that might be useful none-the-less.