Bug 402369 - Overhaul DHAT
Summary: Overhaul DHAT
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: dhat (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-20 02:35 UTC by Nick Nethercote
Modified: 2019-02-01 04:18 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
patch (223.45 KB, patch)
2018-12-20 02:35 UTC, Nick Nethercote
Details
Sample output file (2.46 MB, text/plain)
2018-12-20 09:20 UTC, Nick Nethercote
Details
patch (266.39 KB, patch)
2019-01-11 03:37 UTC, Nick Nethercote
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Nethercote 2018-12-20 02:35:01 UTC
Created attachment 117020 [details]
patch

I have totally overhauled DHAT. The current version is useful, but has some annoyances and deficiencies. The new version is much better. I have used extensively on the Rust compiler, using it on more than 20 PRs.

The only incomplete part is that I haven't updated the documentation. I plan to do that just before landing, so that if I get feedback about the UI that causes me to change the output I don't need to update the docs again.

What follows is the commit message, which describes the changes.

    Overhaul DHAT.
    
    This commit thoroughly overhauls DHAT, moving it out of the
    "experimental" ghetto. It makes moderate changes to DHAT itself,
    including dumping profiling data to a JSON format output file. It also
    implements a new data viewer (as a web app, in dhat/dh_view.html).
    
    The main benefits over the old DHAT are as follows.
    
    - The separation of data collection and presentation means you can run a
      program once under DHAT and then sort the data in various ways. Also,
      full data is in the output file, and the viewer chooses what to omit.
    
    - The data can be sorted in more ways than previously. Some of these
      sorts involve useful filters such as "short-lived" and "zero reads or
      zero writes".
    
    - The tree structure view avoids the need to choose stack trace depth.
      This avoids both the problem of not enough depth (when records that
      should be distinct are combined, and may not contain enough
      information to be actionable) and the problem of too much depth (when
      records that should be combined are separated, making them seem less
      important than they really are).
    
    - Byte and block measures are shown with a percentage relative to the
      global count, which helps gauge relative significance of different
      parts of the profile.
    
    - Byte and blocks measures are also shown with an allocation rate
      (bytes and blocks per million instructions), which enables comparisons
      across multiple profiles, even if those profiles represent different
      workloads.
    
    - Both global and per-node measurements are taken at the global heap
      peak ("At t-gmax"), which gives Massif-like insight into the point of
      peak memory use.
    
    - The final/liftimes stats are a bit more useful than the old deaths
      stats. (E.g. the old deaths stats didn't take into account lifetimes
      of unfreed blocks.)
    
    - The handling of realloc() has changed. The sequence `p = malloc(100);
      realloc(p, 200);` now increases the total block count by 2 and the
      total byte count by 300. Previously it increased them by 1 and 200.
      The new handling is a more operational view that better reflects the
      effect of allocations on performance. It makes a significant
      difference in the results, giving paths involving reallocation (e.g.
      repeated pushing to a growing vector) more prominence.
    
    Other things of note:
    
    - There is now testing, both regression tests that run within the
      standard test suite, and viewer-specific tests that cannot run within
      the standard test suite. The latter are run by loading
      dh_view.html?test=1 in a web browser.
    
    - The commit puts all tool lists in Makefiles (and similar files) in a
      consistent order: memcheck, cachegrind, callgrind, helgrind, drd,
      massif, dhat, lackey, none; exp-sgcheck, exp-bbv.
    
    - A lot of fields in dh_main.c have been given more descriptive names.
      Those names now match those used in dh_view.js.
Comment 1 Nick Nethercote 2018-12-20 09:20:51 UTC
Created attachment 117022 [details]
Sample output file

Here is output from a run of the Rust compiler under DHAT.
Comment 2 Nick Nethercote 2018-12-20 09:24:33 UTC
If you want to try it out easily:
- save the attached file
- visit http://njn.valgrind.org/dh_view.html and load the saved file.
Comment 3 Philippe Waroquiers 2018-12-20 21:11:15 UTC
No comment about the tool itself, just a suggestion for the implementation.

As I understand, dhat has 2 data structures:
  * an interval tree, to find for an address a what block is covering it
  * a wordFM of an Execontext, together with some data associated with
    this execontext.

It looks to me that the second data structure could be done by using
pub_tool_xtree.h, which is a generalised way to associate data to an
execontext, and print it in (currently) 2 output formats : massif format,
kcachegrind format.

It might be interesting to replace the wordFM by an xtree,
and add the json output format to pub_tool_xtree.h
(I suspect that an xtree might be somewhat faster than an wordFM).
Massif speed was improved a lot when switching it to an xtree.
Comment 4 Nick Nethercote 2019-01-11 03:37:44 UTC
Created attachment 117393 [details]
patch

Here is an updated patch with documentation written. I think this is ready to land.
Comment 5 Nick Nethercote 2019-01-11 03:39:09 UTC
> It might be interesting to replace the wordFM by an xtree,

It may. Nonetheless, I'd rather land the code as-is, because it's a major improvement over the existing DHAT. We can consider optimizations to the implementation later :)
Comment 6 Philippe Waroquiers 2019-01-11 07:53:58 UTC
(In reply to Nick Nethercote from comment #5)
> > It might be interesting to replace the wordFM by an xtree,
> 
> It may. Nonetheless, I'd rather land the code as-is, because it's a major
> improvement over the existing DHAT. We can consider optimizations to the
> implementation later :)

Optimization (in terms of speed) is one (non major) aspect
(I would guess that it is unlikely that dhat spends a lot of cpu
in this data structure). 
The main aspect is to avoid adding another dhat specific data structure,
instead of reusing (extending) the 'common coregrind data structure',
used by massif/memcheck/helgrind.
Or told otherwise: avoid growing the code basis unnecessarily.

But I have not looked much in details how easy it would be to extend
xtre to make it support the dhat and output json.
Comment 7 Roger Light 2019-01-15 10:34:29 UTC
This looks good to me. The viewer in particular is very helpful.

Just a few very minor suggestions:

I think the order of the Sort Metrics in the documentation should match that in dh_view.html.

Should "Sort Metrics" actually be labelled something like "Sort and Filter Metrics"?

How much work would it be to have a custom sort/filter option in the viewer?
Comment 8 Philippe Waroquiers 2019-01-23 22:33:34 UTC
Note that if no effort is available to look at what is suggested
in comment 3 and 6, then maybe better to push the patch as is.
(for sure, I do not want to block a clear functional improvement
for that reason).

Julian can maybe comment.
Comment 9 Julian Seward 2019-01-31 15:08:57 UTC
This looks excellent to me; I especially appreciate the documentation.
I suggest landing it as-is, and if there's any residual breakage we can
easily enough fix it up after the event.
Comment 10 Nick Nethercote 2019-02-01 04:06:34 UTC
> I think the order of the Sort Metrics in the documentation should match that
> in dh_view.html.

I just double-checked; as far as I can tell it does match. Which ones do you think are out of order?

> How much work would it be to have a custom sort/filter option in the viewer?

That sounds difficult. I'm having trouble even imagining how it would work.