Memcheck tries to accurately track definedness at the bit level, at least for scalar integer operations. For many operations it is good enough to use approximations which may overstate the undefinedness of the result, provided that fully defined inputs still produce a fully defined output. For example, the standard analysis for an integer add is

  Add#(x#, y#) = Left(UifU(x#, y#))

which (as explained in the USENIX 05 paper, http://valgrind.org/docs/memcheck2005.pdf) means: for an add, worst-case carry propagation is assumed, so all bits to the left of, and including, the rightmost undefined bit in either operand are assumed to be undefined.

As compilers have become increasingly aggressive, some of these approximations are no longer good enough. For example, LLVM has for some years generated Add operations with partially undefined inputs when it knows that the carry propagation will not pollute important parts of the result. Similarly, both GCC and LLVM will generate integer equality comparisons with partially undefined inputs in situations where they know the result of the comparison will be defined. In both cases, Memcheck's default strategies give rise to false uninitialised-value errors, and the problem is getting worse as time goes by.

Memcheck already has expensive (non-default) instrumentation for integer adds, subtracts, and equality comparisons. Currently these are used only if you specify --expensive-definedness-checks=yes, in some rare cases to do with inlined string operations (as determined by analysing the block to be instrumented), and by default on MacOS. The performance hit from them can be quite high, up to 30% lossage.

This patch makes the following changes:

* During instrumentation, there is much finer control over which IROps get expensive instrumentation. The following groups can now be selected independently for expensive or cheap instrumentation:

    Iop_Add32
    Iop_Add64
    Iop_Sub32
    Iop_Sub64
    Iop_CmpEQ32 and Iop_CmpNE32
    Iop_CmpEQ64 and Iop_CmpNE64

  This makes it possible to enable, on a given platform, only the minimal necessary set of expensive cases.

* The default set of expensive cases can be set on a per-platform basis. This is set up in the first part of MC_(instrument).

* There is a new pre-instrumentation analysis pass. It identifies Iop_Add32 and Iop_Add64 uses for which the expensive handling will give the same results as the cheap handling. This includes all adds that are used only to create memory addresses. Given that the expensive handling of adds is, well, expensive, and that most adds merely create memory addresses, this more than halves the extra cost of expensive Add handling.

* The pre-existing "bogus literal" detection (0x80808080, etc) pass has been rolled into the new pre-instrumentation analysis.

* The --expensive-definedness-checks= flag has been changed. Before, it had two settings, "no" and "yes", with "no" being the default. Now it has three settings:

    no   -- always use the cheapest handling
    auto -- use the minimum set of expensive handling needed to get
            reasonable results on this platform, and perform
            pre-instrumentation analysis so as to minimise the costs thereof
    yes  -- always use the most expensive handling

  The default setting is now "auto". The user-visible effect of the new default is that there should (hopefully) be a drop in false positive rates, but (unfortunately) also some drop in performance.
Created attachment 109234 [details] WIP patch. Tested on x86, amd64, arm32. Needs testing on other platforms, user-level documentation, and some comment updates. Functionally though I think it's pretty complete.
Patch tested on Power 8 BE, Power 8 LE and Power 9. No regression errors were found on any of the systems. The time command was used to record how long it took to run the regression tests. Note, the two Power 8 machines were lightly loaded whereas the Power 9 system was fairly busy, so its run times may be a little suspect. That said, the overhead of the patch looks to be very small. The times to run the regression tests are listed below.

           base tree      patched tree   % change
  P8 BE
    real   22m53.848s     22m42.088s     -0.13
    user   18m31.216s     18m31.964s      0.067
    sys     0m32.700s      0m33.796s      0.29
  P8 LE
    real   31m55.201s     32m39.012s      0.46
    user   27m48.776s     28m27.808s      0.45
    sys     0m38.604s      0m43.320s      2.86
  P9 LE
    real    0m58.134s     11m1.152s       2.29
    user    9m54.270s      9m59.954s      2.34
    sys     1m29.456s      1m31.806s     12.20
I'm not seeing any significant slowdown on x86_64 with the kind of typical use I would make of memcheck. Here's the current release version:

  2466.53user 11.03system 42:21.37elapsed 97%CPU (0avgtext+0avgdata 9265420maxresident)k

and git master with the patch:

  2574.74user 5.21system 43:49.79elapsed 98%CPU (0avgtext+0avgdata 9250704maxresident)k
Tom, Carl, thanks for the perf measurements. The main patch landed as commit e847cb5429927317023d8410c3c56952aa47fb08.

memcheck/tests/vbit-test/vbit-test is now failing, because of the increased accuracy with which some IROps are now instrumented. Fixing that is in progress.
(In reply to Julian Seward from comment #4) > memcheck/tests/vbit-test/vbit-test is now failing, because of the increased > accuracy with which some IROps are now instrumented. Fixing that is in > progress. Fixed in 0f18cfc986f800b107c7eee063b8b7c04617e0b8. I think this can be closed now.