Created attachment 173455
callgrind.out before and after attempting to "augment" it, plus matching perf map file

SUMMARY
I'm working on MoarVM, a language runtime that includes a JIT compiler. When a program generates its own code, there are already multiple ways to tell a variety of developer tools what those memory regions are, along with whatever extra information you like:

* In order to unwind stack frames coming from jitted code, GDB lets you notify it when new functions are created: you store whatever extra custom info you need in the client program's memory, and then load a .so into GDB itself as a "JIT reader". I believe you can also add a boatload of other stuff if you like, since much of the GDB API for blocks and symbols is usable.
  * MoarVM JIT reader (not merged or part of any release yet): https://github.com/MoarVM/MoarVM/commit/3c63afed7d524852aab9a9335b9d1089d2f5410b
  * GDB documentation about the JIT reader: https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-JIT-Debug-Info-Readers.html
* In order to get samples merged into the frames they belong to when recording with perf, you can write a perf-$PID.map file to /tmp that assigns a name to each (start address, length) tuple.
  * https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt
  * Lots of VMs have some kind of flag or agent/plugin that creates the perf maps, for example perf-map-agent for Java.
  * When using just a perf map, the "annotate" function in `perf report` will not work. This is a view that shows machine code and source code lines along with counts of the corresponding samples.
  * MoarVM checks the env var MVM_JIT_PERF_MAP; if it is set to anything non-empty, it generates the file.
* For more complex needs, your program can write out a "jitdump" file that is then used with `perf inject` to augment a `perf.data` recording with information about the jitted code. `perf inject` can also be used to add build-id annotations in the right places of a `perf.data` file.
  * https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-inject.txt
  * https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jitdump-specification.txt
  * I believe jitdump is what makes it possible to "annotate" jitted frames in the `perf report` UI.
* libunwind has an interface where you can describe how the registers relevant to unwinding change at different instruction pointer positions:
  * https://www.nongnu.org/libunwind/man/libunwind-dynamic(3).html
* Surely there are other interfaces like this that I haven't encountered yet.

When recording the execution of MoarVM with callgrind or cachegrind (and probably other tools as well) and loading the results into KCachegrind, jitted frames have only their memory address as their name and "(unknown)" as their location, and the "source code" tab shows "The function is located in this ELF object:" "(unknown)", which makes sense. callgrind_annotate simply shows these frames as "???:0x00000000090cb000 [???]".

I tried an approach similar to `perf inject`: I located the "definitions" of strings in the fn category that match the address of each entry in my perf map file, and then, whenever I saw the same fn or cfn number later, I also emitted an fl/cfi and ob/cob line. Unfortunately I had to do a lot of guesswork about how fl/fi, cfl/cfi, and so on really work, so the resulting file seems to have the "augmented" frames split in half, if that makes sense: if I understand correctly, each one appears once with correct call chains leading up to it but no callees, and once with no call chains leading up to it but a more sensible-looking list of callees.

I have attached a zstd-compressed tarball containing the callgrind.out files from before and after the "augmentation", as well as the perf map file I used for it.
It may help to know that jitted frames in MoarVM are called directly from MVM_jit_code_enter (I don't think there are exceptions, but I'm not certain).

STEPS TO REPRODUCE
1. Install a rakudo package. For experimenting with the JIT and the perf map, anything released in roughly the last 6 years should work.
2. Here's an invocation of callgrind + rakudo that also gives you a perf map file. If you don't have /usr/share/dict/words, any file with a few tens of thousands of lines should be more than enough to generate jitted frames and decent recording data.

    env MVM_JIT_PERF_MAP=1 valgrind --tool=callgrind --dump-instr=yes -- rakudo-m -e 'my %idx; for "/usr/share/dict/words".IO.lines { for .comb { %idx{$_}++ } }; say %idx.sort.tail(10);'

3. Additionally, you can set the `MVM_JIT_DUMP_BYTECODE` env var to a path where moar will write the compiled native code (no headers or anything, just the bytes that we jump into). The files end up in a subfolder with the PID in its name.
4. Check the resulting callgrind.out.$PID file with callgrind_annotate and/or KCachegrind.

OBSERVED RESULT
The output contains many raw memory addresses, with no indication of where they come from, instead of function names, source file paths, and line numbers. Additionally, the "source code" and "machine code" tabs in KCachegrind have no way to find details on these functions.

EXPECTED RESULT
At minimum, I would hope that the function name, file name, and line number could become available for jitted frames. The MoarVM JIT can also associate line numbers with addresses within the jitted code, so it would be great if that could also be made visible in KCachegrind and callgrind_annotate. Ideally, valgrind would not need "yet another" way to receive jitted frame information, on top of what GDB, perf, libunwind, and however many other projects have already built.
If the interface is simple enough, for example just a few valgrind client requests that make the information available to valgrind (like the ones you can already use to teach memcheck about custom memory allocators), it would be okay if it doesn't fully match any existing interface. Perhaps it could be made partially compatible with GDB's JIT reader API. I would very much prefer not having to create a real, full, and proper ELF structure, whether in memory or on disk. It would also be much preferable, if possible, for jitted frames to work immediately, instead of first having to run some kind of script or tool that combines callgrind.out with whatever holds the needed extra information.
I should mention that it is of course possible to create full ELF files for the JIT-compiled frames and dlopen them into the program's memory, and that would probably work with callgrind/cachegrind just as well as ahead-of-time compiled code does. This issue is about a way to do less work than that while still getting some information about jitted frames. An AOT compiler has to create object files and whatnot anyway in order to be usable at all, but a JIT has no other reason to do that extra work.
To prevent any confusion: this issue is only about getting information about jitted frames out of the JIT compiler and into valgrind, for use in callgrind/cachegrind and whatever else shows function names and stack traces. Actually running code in jitted frames already works fine. The VALGRIND_DISCARD_TRANSLATIONS client request exists, but it is only necessary when valgrind has already translated the code in a memory area and the program then changes that code ("self-modifying code"). Making sure that valgrind's translation of the code in a given memory range is up to date is orthogonal to what I created this issue for. In MoarVM, once jitted code has been "committed", which in our case means calling mprotect with "read and execute" on its memory, the JIT never writes to that memory range again.

There is also an old patch here in the bug tracker that lets valgrind read ELF structures from memory instead of only from disk: #319237 (https://bugs.kde.org/show_bug.cgi?id=319237). It could help with this feature request in a pinch, but as I mentioned in the original description, having to create a full ELF structure is tedious, even if you don't have to write it to disk and load it from a file each time.