Bug 399355 - Add callgrind_diff
Summary: Add callgrind_diff
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: callgrind (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Josef Weidendorfer
URL:
Keywords:
Depends on: 399301
Blocks:
  Show dependency treegraph
 
Reported: 2018-10-04 00:25 UTC by Nick Nethercote
Modified: 2019-04-12 04:28 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Add callgrind_diff (25.23 KB, text/plain)
2018-10-04 00:25 UTC, Nick Nethercote
Details
Updated version (25.62 KB, text/plain)
2018-10-19 05:09 UTC, Nick Nethercote
Details
First big callgrind file (630.78 KB, application/x-xz)
2018-10-20 09:25 UTC, Philippe Waroquiers
Details
Second big callgrind file (630.92 KB, application/x-xz)
2018-10-20 09:25 UTC, Philippe Waroquiers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Nethercote 2018-10-04 00:25:01 UTC
Created attachment 115396 [details]
Add callgrind_diff

Cachegrind has cg_diff, and it's useful. Callgrind should have callgrind_diff too.
Comment 1 Josef Weidendorfer 2018-10-04 18:41:06 UTC
On a first look, this patch looks fine.
I would split the unrelated parts into separate commits (I can do that).
Comment 2 Nick Nethercote 2018-10-10 05:59:44 UTC
Is it really worth splitting the patch? I guess you could put the threshold changes in one patch and everything else in another, but it doesn't seem worth the effort.

I did the original threshold changes in Cachegrind while writing cg_diff... I don't remember the exact reason (it was a few years ago) but I did the threshold changes in response to the results I was getting from cg_diff.
Comment 3 Philippe Waroquiers 2018-10-10 20:47:38 UTC
I tried callgrind_diff at work on big callgrind out files, but I got
several error messages.

At least one of them can be reproduced by doing:
 ./vg-in-place --tool=callgrind  --dump-instr=yes --collect-jumps=yes --collect-systime=yes  ./perf/memrw -l 5
 ./vg-in-place --tool=callgrind  --dump-instr=yes --collect-jumps=yes --collect-systime=yes  ./perf/memrw -l 3

and then running callgrind_diff on the resulting 2 files:

./Inst/bin/callgrind_diff callgrind.out.8448 callgrind.out.8452
Line 179: too many event counts

On the big file, before this 'too many event counts', I have a bunch of errors
such as:
Argument "*" isn't numeric in addition (+) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 139, <INPUTFILE> line 27.
Argument "*" isn't numeric in addition (+) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 139, <INPUTFILE> line 28.
Argument "*" isn't numeric in addition (+) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 139, <INPUTFILE> line 29.
Argument "*" isn't numeric in addition (+) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 139, <INPUTFILE> line 32.
Argument "*" isn't numeric in addition (+) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 139, <INPUTFILE> line 34.
Argument "*" isn't numeric in addition (+) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 139, <INPUTFILE> line 35.

The first few lines of the big callgrind files are:
# callgrind format
version: 1
creator: callgrind-3.14.0.GIT
pid: 24494
cmd:  counterp_process
part: 5


desc: I1 cache: 
desc: D1 cache: 
desc: LL cache: 

desc: Timerange: Basic block 21282324349 - 244375501750
desc: Trigger: dump End_of_script_./replay/scripts/minutely_targets.replay.el

positions: instr line
events: Ir sysCount sysTime
summary: 1006404365481 50339204 252715138


ob=(3) /usr/lib64/libc-2.17.so
fl=(613) /usr/src/debug/glibc-2.17-c758a686/misc/tsearch.c
fn=(4236) tfind
0xf43a0 307 2
+2 +3 2
+3 -3 2
+3 * 2
+1 * 2
+3 * 2
+1 +3 2
jcnd=2/2 +37 +5 
* * 
+37 +5 2
+3 * 2
+3 * 2
+2 +12 2
+1 * 2
+1 -16 2
+2 +16 2
+2 * 2
Comment 4 Nick Nethercote 2018-10-12 03:08:28 UTC
Thank you for the feedback. I will look at the failures next week.
Comment 5 Nick Nethercote 2018-10-19 05:09:13 UTC
Created attachment 115743 [details]
Updated version

This fixes the problem that Philippe identified.
Comment 6 Philippe Waroquiers 2018-10-20 09:21:43 UTC
(In reply to Nick Nethercote from comment #5)
> Created attachment 115743 [details]
> Updated version
> 
> This fixes the problem that Philippe identified.
Thanks for the fix.
Here is some more feedback with bigger files (that I will attach).

I still obtain some error messages:
  callgrind_diff callgrind.out.counterp1.24494.5 callgrind.out.counterp1.17960.5 > d.out
  Use of uninitialized value in subtraction (-) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 159.
  Use of uninitialized value in subtraction (-) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 159.
  Use of uninitialized value in subtraction (-) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_diff line 159.
  ...

The numbers obtained in the resulting file seems also not realistic.
E.g. when I look in kcachegrind the intruction fetch costs for
curtain__airspace__list__curtain_impl__airspaces_iter__asp_iter__slave__next
(a leaf function), I see:
9_933_891_910 in the 24494.5 file
and 9_901_341_110 in the 17960.5 file,
but d.out contains
fl=/cm/build12/cm/ot/ELD/GEO_FLIGHT!22.5.0.87/build_G!19.OP.L7/sources/group_iterator_g.adb
fn=curtain__airspace__list__curtain_impl__airspaces_iter__asp_iter__slave__next
0 3731582914 74 963

callgrind_annotate d.out produces some output but mixed with errors after some lines:
  Use of uninitialized value in multiplication (*) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_annotate line 910.
  Use of uninitialized value in multiplication (*) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_annotate line 910.
  Use of uninitialized value in multiplication (*) at /home/philippe/valgrind/git/fixes/Inst/bin/callgrind_annotate line 910.

Also, when reading the resulting file with kcachegrind, that gives a list
of errors such as:
  kcachegrind(31363): Loading "./d.out" : 9 :  "Garbage at end of cost line ('-151919000 -2 -3')" 
  kcachegrind(31363): Loading "./d.out" : 15 :  "Garbage at end of cost line ('-1 -4963')" 
  kcachegrind(31363): Loading "./d.out" : 21 :  "Garbage at end of cost line ('-1')" 
  kcachegrind(31363): Loading "./d.out" : 30 :  "Garbage at end of cost line ('-546')" 
  kcachegrind(31363): Loading "./d.out" : 36 :  "Garbage at end of cost line ('-470934')" 
  kcachegrind(31363): Loading "./d.out" : 42 :  "Garbage at end of cost line ('-161016 -1 -1')" 
  kcachegrind(31363): Loading "./d.out" : 48 :  "Garbage at end of cost line ('-30')" 


kcachegrind looks to be more strict about the callgrind format, that according
to the callgrind format grammar documentation, only allows costs >= 0:
  Costs := (Number Space+)+
  Number := HexNumber | (Digit)+

Not clear if it would be easy to enhance kcachegrind to support negative costs
(maybe Josef can give an idea of the complexity of this).

The alternative is to have callgrind_diff producing a file with twice more events
than the compared files: for each event (e.g. instruction fetch/syscount),
have 2 new costs in the diff file:
  Instruction Fetch Increase     Instruction Fetch Decrease
  Syscount Increase
and then the cost will always be positive.

Brainstorm idea: if ever the callgrind format allows it, it would be nice
to have a 'delta call' output also, i.e. have for each 
'call A to B'  in each file produce some costs like
 A has called B     Calls Increase     Calls Decrease
(I guess this would imply to output a fake call from A to B in the diff file
and associate some costs to this call, showing the increase/decrease).
Comment 7 Philippe Waroquiers 2018-10-20 09:22:56 UTC
(In reply to Philippe Waroquiers from comment #6)
> The alternative is to have callgrind_diff producing a file with twice more
> events
> than the compared files: for each event (e.g. instruction fetch/syscount),
> have 2 new costs in the diff file:
>   Instruction Fetch Increase     Instruction Fetch Decrease
>   Syscount Increase
Comment sent a little bit too early, missing Syscount Decrease.
Comment 8 Philippe Waroquiers 2018-10-20 09:25:17 UTC
Created attachment 115774 [details]
First big callgrind file
Comment 9 Philippe Waroquiers 2018-10-20 09:25:44 UTC
Created attachment 115775 [details]
Second big callgrind file
Comment 10 Philippe Waroquiers 2018-10-20 09:30:11 UTC
(In reply to Philippe Waroquiers from comment #6)

> Brainstorm idea: if ever the callgrind format allows it, it would be nice
> to have a 'delta call' output also, i.e. have for each 
> 'call A to B'  in each file produce some costs like
>  A has called B     Calls Increase     Calls Decrease
> (I guess this would imply to output a fake call from A to B in the diff file
> and associate some costs to this call, showing the increase/decrease).
If having the delta for each 'arrow' in the callgraph as suggested above is too tricky,
a maybe simpler alternative is to just produce 2 new costs 'call increase/call decrease'
for each function : this will not indicate where the increase/decrease of calls
is coming from, but it might be helpful in any case.
Comment 11 Philippe Waroquiers 2018-10-20 10:17:02 UTC
When building, I also see the below error:

./callgrind/Makefile.am:1: error: docs/callgrind_diff-manpage.xml is in EXTRA_DIST but doesn't exist
Comment 12 Julian Seward 2019-03-10 07:29:51 UTC
Nick, what's the situation here?  Is it possible to make this work
well enough to be worth landing?
Comment 13 Nick Nethercote 2019-03-12 04:29:44 UTC
I haven't looked into the problems reported in comment 6. I don't see why this shouldn't be able to work, but the Callgrind format does have a lot of bells and whistles that the Cachegrind format does not, so they have to all be handled appropriately.
Comment 14 Philippe Waroquiers 2019-04-12 04:28:07 UTC
Note that at work, I am busy discussing to have someone working on this bug.
So, some progress might happen in the coming weeks (but not for 3.15).