Memcheck tries to accurately track definedness at the bit level, at least for scalar integer operations. For many operations it is good enough to use approximations which may overstate the undefinedness of the result, provided that fully defined inputs still produce a fully defined output. For example, the standard analysis for an integer add is

  Add#(x#, y#) = Left(UifU(x#, y#))

which (as explained in the USENIX 05 paper, http://valgrind.org/docs/memcheck2005.pdf) means: for an add, worst-case carry propagation is assumed, so all bits to the left of, and including, the rightmost undefined bit in either operand are assumed to be undefined.

As compilers have become increasingly aggressive, some of these approximations are no longer good enough. For example, LLVM has for some years generated Add operations with partially undefined inputs when it knows that the carry propagation will not pollute important parts of the result. Similarly, both GCC and LLVM will generate integer equality comparisons with partially undefined inputs in situations where they know the result of the comparison will be defined. In both cases, Memcheck's default strategies give rise to false uninitialised-value errors, and the problem is getting worse as time goes by.

Memcheck already has expensive (non-default) instrumentation for integer adds, subtracts, and equality comparisons. Currently these are used only if you specify --expensive-definedness-checks=yes, in some rare cases to do with inlined string operations (as determined by analysing the block to be instrumented), and by default on MacOS. The performance hit from them can be quite high, up to 30% lossage.

This patch makes the following changes:

* During instrumentation, there is much finer control over which IROps get expensive instrumentation. The following groups can now be selected independently for expensive or cheap instrumentation:

    Iop_Add32
    Iop_Add64
    Iop_Sub32
    Iop_Sub64
    Iop_CmpEQ32 and Iop_CmpNE32
    Iop_CmpEQ64 and Iop_CmpNE64

  This makes it possible to enable, on a given platform, only the minimal necessary set of expensive cases.

* The default set of expensive cases can be set on a per-platform basis. This is set up in the first part of MC_(instrument).

* There is a new pre-instrumentation analysis pass. It identifies Iop_Add32 and Iop_Add64 uses for which the expensive handling will give the same results as the cheap handling. This includes all adds that are used only to create memory addresses. Given that the expensive handling of adds is, well, expensive, and that most adds merely create memory addresses, this more than halves the extra cost of expensive Add handling.

* The pre-existing "bogus literal" detection (0x80808080, etc) pass has been rolled into the new pre-instrumentation analysis.

* The --expensive-definedness-checks= flag has been changed. Before, it had two settings, "no" and "yes", with "no" being the default. Now it has three settings:

    no   -- always use the cheapest handling
    auto -- use the minimum set of expensive handling needed to get
            reasonable results on this platform, and perform
            pre-instrumentation analysis so as to minimise the costs thereof
    yes  -- always use the most expensive handling

  The default setting is now "auto". The user-visible effect of the new default is that there should (hopefully) be a drop in false positive rates, but (unfortunately) also some drop in performance.
Created attachment 109234 [details] WIP patch. Tested on x86, amd64, arm32. Needs testing on other platforms, user-level documentation, and some comment updates. Functionally though I think it's pretty complete.
Patch tested on Power 8 BE, Power 8 LE and Power 9. No regression errors were found on any of the systems. The time command was used to record how long it took to run the regression tests. Note, the two Power 8 machines were lightly loaded whereas the Power 9 system was fairly busy, so its run times may be a little suspect. That said, the overhead of the patch looks to be very small. The times to run the regression tests are listed below.

           base tree      patched tree   % change
  P8 BE
    real   22m53.848s     22m42.088s     -0.13
    user   18m31.216s     18m31.964s      0.067
    sys     0m32.700s      0m33.796s      0.29
  P8 LE
    real   31m55.201s     32m39.012s      0.46
    user   27m48.776s     28m27.808s      0.45
    sys     0m38.604s      0m43.320s      2.86
  P9 LE
    real    0m58.134s     11m1.152s       2.29
    user    9m54.270s      9m59.954s      2.34
    sys     1m29.456s      1m31.806s     12.20
I'm not seeing any significant slowdown on x86_64 with the kind of typical use I would make of memcheck. Here's the current release version:

  2466.53user 11.03system 42:21.37elapsed 97%CPU (0avgtext+0avgdata 9265420maxresident)k

and git master with the patch:

  2574.74user 5.21system 43:49.79elapsed 98%CPU (0avgtext+0avgdata 9250704maxresident)k
Tom, Carl, thanks for the perf measurements. The main patch landed as commit e847cb5429927317023d8410c3c56952aa47fb08.

memcheck/tests/vbit-test/vbit-test is now failing, because of the increased accuracy with which some IROps are now instrumented. Fixing that is in progress.
(In reply to Julian Seward from comment #4) > memcheck/tests/vbit-test/vbit-test is now failing, because of the increased > accuracy with which some IROps are now instrumented. Fixing that is in > progress. Fixed in 0f18cfc986f800b107c7eee063b8b7c04617e0b8. I think this can be closed now.