| Summary: | Overhaul DHAT | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Nick Nethercote <n.nethercote> |
| Component: | dhat | Assignee: | Julian Seward <jseward> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | jseward, philippe.waroquiers, roger |
| Priority: | NOR | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Platform: | Other | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed In: | |
Attachments:

- patch
- Sample output file
- patch
Created attachment 117022 [details]
Sample output file
Here is output from a run of the Rust compiler under DHAT.
If you want to try it out easily: - save the attached file - visit http://njn.valgrind.org/dh_view.html and load the saved file. No comment about the tool itself, just a suggestion for the implementation. As I understand, dhat has 2 data structures: * an interval tree, to find for an address a what block is covering it * a wordFM of an Execontext, together with some data associated with this execontext. It looks to me that the second data structure could be done by using pub_tool_xtree.h, which is a generalised way to associate data to an execontext, and print it in (currently) 2 output formats : massif format, kcachegrind format. It might be interesting to replace the wordFM by an xtree, and add the json output format to pub_tool_xtree.h (I suspect that an xtree might be somewhat faster than an wordFM). Massif speed was improved a lot when switching it to an xtree. Created attachment 117393 [details]
patch
Here is an updated patch with documentation written. I think this is ready to land.
> It might be interesting to replace the wordFM by an xtree,
It may. Nonetheless, I'd rather land the code as-is, because it's a major improvement over the existing DHAT. We can consider optimizations to the implementation later :)
(In reply to Nick Nethercote from comment #5)
> > It might be interesting to replace the wordFM by an xtree,
>
> It may. Nonetheless, I'd rather land the code as-is, because it's a major
> improvement over the existing DHAT. We can consider optimizations to the
> implementation later :)

Optimization (in terms of speed) is one (non-major) aspect; I would guess it is unlikely that DHAT spends much CPU in this data structure. The main aspect is to avoid adding another DHAT-specific data structure, instead of reusing (and extending) the common coregrind data structure used by Massif/Memcheck/Helgrind. Put another way: avoid growing the code base unnecessarily. But I have not looked in much detail at how easy it would be to extend XTree to support DHAT and output JSON.

This looks good to me. The viewer in particular is very helpful. Just a few very minor suggestions:

- I think the order of the Sort Metrics in the documentation should match that in dh_view.html.
- Should "Sort Metrics" actually be labelled something like "Sort and Filter Metrics"?
- How much work would it be to have a custom sort/filter option in the viewer?

Note that if no effort is available to look at what is suggested in comment 3 and comment 6, then maybe it is better to push the patch as-is. (For sure, I do not want to block a clear functional improvement for that reason.) Julian can maybe comment.

This looks excellent to me; I especially appreciate the documentation. I suggest landing it as-is, and if there's any residual breakage we can easily enough fix it up after the event.

> I think the order of the Sort Metrics in the documentation should match that
> in dh_view.html.

I just double-checked; as far as I can tell it does match. Which ones do you think are out of order?

> How much work would it be to have a custom sort/filter option in the viewer?

That sounds difficult. I'm having trouble even imagining how it would work.
Created attachment 117020 [details]
patch

I have totally overhauled DHAT. The current version is useful, but has some annoyances and deficiencies. The new version is much better. I have used it extensively on the Rust compiler, on more than 20 PRs.

The only incomplete part is that I haven't updated the documentation. I plan to do that just before landing, so that if I get feedback about the UI that causes me to change the output, I don't need to update the docs again.

What follows is the commit message, which describes the changes.

Overhaul DHAT.

This commit thoroughly overhauls DHAT, moving it out of the "experimental" ghetto. It makes moderate changes to DHAT itself, including dumping profiling data to a JSON format output file. It also implements a new data viewer (as a web app, in dhat/dh_view.html).

The main benefits over the old DHAT are as follows.

- The separation of data collection and presentation means you can run a program once under DHAT and then sort the data in various ways. Also, the full data is in the output file, and the viewer chooses what to omit.

- The data can be sorted in more ways than previously. Some of these sorts involve useful filters such as "short-lived" and "zero reads or zero writes".

- The tree structure view avoids the need to choose a stack trace depth. This avoids both the problem of not enough depth (when records that should be distinct are combined, and may not contain enough information to be actionable) and the problem of too much depth (when records that should be combined are separated, making them seem less important than they really are).

- Byte and block measures are shown with a percentage relative to the global count, which helps gauge the relative significance of different parts of the profile.

- Byte and block measures are also shown with an allocation rate (bytes and blocks per million instructions), which enables comparisons across multiple profiles, even if those profiles represent different workloads.
- Both global and per-node measurements are taken at the global heap peak ("At t-gmax"), which gives Massif-like insight into the point of peak memory use.

- The final/lifetimes stats are a bit more useful than the old deaths stats. (E.g. the old deaths stats didn't take into account the lifetimes of unfreed blocks.)

- The handling of realloc() has changed. The sequence `p = malloc(100); realloc(p, 200);` now increases the total block count by 2 and the total byte count by 300. Previously it increased them by 1 and 200. The new handling is a more operational view that better reflects the effect of allocations on performance. It makes a significant difference in the results, giving paths involving reallocation (e.g. repeated pushing to a growing vector) more prominence.

Other things of note:

- There is now testing, both regression tests that run within the standard test suite, and viewer-specific tests that cannot run within the standard test suite. The latter are run by loading dh_view.html?test=1 in a web browser.

- The commit puts all tool lists in Makefiles (and similar files) in a consistent order: memcheck, cachegrind, callgrind, helgrind, drd, massif, dhat, lackey, none; exp-sgcheck, exp-bbv.

- A lot of fields in dh_main.c have been given more descriptive names. Those names now match those used in dh_view.js.