Bug 384633 - Add a simple progress-reporting facility
Summary: Add a simple progress-reporting facility
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.14 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-12 20:11 UTC by Julian Seward
Modified: 2018-01-21 08:01 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Implementation. (8.23 KB, patch)
2017-09-13 06:33 UTC, Julian Seward
Details
Addresses all review comments in comments 2 and 3. (12.08 KB, patch)
2018-01-15 12:28 UTC, Julian Seward
Details
Patch with Solaris implementation added (13.93 KB, patch)
2018-01-19 16:39 UTC, Ivo Raisr
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Julian Seward 2017-09-12 20:11:49 UTC
This is a proposal for a small enhancement to Valgrind's debugging output.

For long runs of Valgrind, I find it useful to have some rough indication as
to how the run is progressing.  This makes it possible, for example, to
guess when approximately the program will reach some desired state.

The attached patch adds a new command line flag, --progress-interval=number.
This causes V to print a 1-line summary of progress every |number| seconds.
For example, when starting Firefox with --progress-interval=10, I get lines
like this

--32411-- PROGRESS: U 110s, W 113s, 97.3% CPU, EvC 414.79M, TIn 616.7k, TOut 0.5k, #thr 67
--32411-- PROGRESS: U 120s, W 124s, 96.8% CPU, EvC 505.27M, TIn 636.6k, TOut 3.0k, #thr 64
--32411-- PROGRESS: U 130s, W 134s, 97.0% CPU, EvC 574.90M, TIn 657.5k, TOut 3.0k, #thr 63
--32411-- PROGRESS: U 140s, W 144s, 97.2% CPU, EvC 636.34M, TIn 659.9k, TOut 3.0k, #thr 62
--32411-- PROGRESS: U 150s, W 155s, 96.8% CPU, EvC 710.21M, TIn 664.0k, TOut 17.7k, #thr 61
--32411-- PROGRESS: U 160s, W 201s, 79.6% CPU, EvC 822.38M, TIn 669.9k, TOut 75.8k, #thr 60

Each line shows:
   U:    total user time
   W:    total wallclock time
   CPU:  overall average cpu use
   EvC:  number of event checks.  An event check is a backwards branch
         in the simulated program, so this is a measure of forward progress
         of the program
   TIn:  number of code blocks instrumented by the JIT
   TOut: number of instrumented code blocks that have been thrown away
   #thr: number of threads in the program

From the progress of these, it is possible to observe:

* when the program is compute bound (TIn rises slowly, EvC rises rapidly)
* when the program is in a spinloop (TIn/TOut fixed, EvC rises rapidly)
* when the program is JIT-bound (TIn rises rapidly)
* when the program is rapidly discarding code (TOut rises rapidly)
* when the program is about to achieve some expected state (EvC arrives
  at some value you expect)
* when the program is idling (U rises more slowly than W)

I have found this very useful this past few months and would like to land it.
Comment 1 Julian Seward 2017-09-13 06:33:18 UTC
Created attachment 107830 [details]
Implementation.

This works on Linux.  It should work on Solaris and OSX too, with the
limitation that it always shows zero user time.  This is because UInt VG_(get_user_milliseconds) isn't implemented on those targets and so
just returns zero.
Comment 2 Ivo Raisr 2017-09-13 16:19:34 UTC
Thank you for the patch, Julian.

I will implement VG_(get_user_milliseconds)() for Solaris, that's not a problem.

However I wonder why allowed values for --progress-interval are only 1..3600? Why not 0, the default?

Please also do something with comments around next_report_due_at, they look really scary. This is not the first static variable we have in Valgrind...

What happens if a Valgrind'ed process with --progress-interval > 0 is suspended for several minutes? Is the logic in maybe_progress_report() able to cope
with this?

And the last nit - please use %u for "#thr" formatting instead of "%d". nThreads is UInt.
Comment 3 Ivo Raisr 2017-09-13 16:20:32 UTC
And also please add the line explanation somewhere. User manual, verbose output...
Comment 4 Julian Seward 2018-01-15 12:28:52 UTC
Created attachment 109872 [details]
Addresses all review comments in comments 2 and 3.

Addresses all review comments in comments 2 and 3.  Ivo, if you are
happy with it, could you please do the Solaris version of 
VG_(get_user_milliseconds) ?

Note, this patch stores millisecond values in a UInt (unsigned 32 bit int)
and so will overflow after 49.71 days.  I don't consider this to be a big
problem.
Comment 5 Ivo Raisr 2018-01-17 16:27:21 UTC
(In reply to Julian Seward from comment #4)

> Addresses all review comments in comments 2 and 3.  Ivo, if you are
> happy with it, could you please do the Solaris version of 
> VG_(get_user_milliseconds)?

Will do that, Julian.
Comment 6 Ivo Raisr 2018-01-19 16:39:55 UTC
Created attachment 109992 [details]
Patch with Solaris implementation added
Comment 7 Ivo Raisr 2018-01-19 16:41:49 UTC
Hi Julian, patch with Solaris implementation is attached.
The patch has been refreshed to apply cleanly to the latest git HEAD and a NEWS entry added. It is ready to be pushed - let me know if I should do it.
Comment 8 Julian Seward 2018-01-20 17:33:06 UTC
Ivo, thanks for the fix.  Yes, if you are happy with it, please
do push it.
Comment 9 Ivo Raisr 2018-01-20 19:57:42 UTC
Pushed as commit bd077baa71a40b60dcf0286b9fb89d803323fd93.

https://sourceware.org/git/?p=valgrind.git;a=commit;h=bd077baa71a40b60dcf0286b9fb89d803323fd93
Comment 10 Julian Seward 2018-01-21 08:01:47 UTC
(In reply to Ivo Raisr from comment #9)
> Pushed as commit bd077baa71a40b60dcf0286b9fb89d803323fd93.

Thanks!