> http://valgrind.org/docs/manual/mc-manual.html > Because there are different kinds of leaks with different severities, an interesting question is this: which leaks should be counted as true "errors" and which should not? The answer to this question affects the numbers printed in the ERROR SUMMARY line, and also the effect of the --error-exitcode option. Memcheck uses the following criteria: > * First, a leak is only counted as a true "error" if --leak-check=full is specified. In other words, an unprinted leak is not considered a true "error". If this were not the case, it would be possible to get a high error count but not have any errors printed, which would be confusing. Based on this reasoning, I think --show-possibly-lost should bring down the error count (on which the exit code is based). It's preventing Mozilla's Valgrind tests from being green: https://bugzilla.mozilla.org/show_bug.cgi?id=631811#c12 https://bugzilla.mozilla.org/show_bug.cgi?id=793509#c3 > * After that, definitely lost and possibly lost blocks are counted as true "errors". The current behavior is documented, sorta. It's also sorta self-contradictory. > Indirectly lost and still reachable blocks are not counted as true "errors", even if --show-reachable=yes is specified and they are printed; this is because such blocks don't need direct fixing by the programmer. This is kind of weird. If a programmer says --show-reachable=yes, maybe they want to see and fix the (non-suppressed) still-reachables! == derp.sup == { derp Memcheck:Leak ... fun:_ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE } == pointinside.c == #include <stdlib.h> char *x = NULL; int main() { x = (char*) malloc(4) + 1; return 0; } == Bug with --show-possibly-lost=no == gcc -g ~/Desktop/pointinside.c && valgrind --dsymutil=yes --error-exitcode=77 --leak-check=full --suppressions=derp.sup ./a.out ; echo $? shows the leak details; exits with code 77 gcc -g ~/Desktop/pointinside.c && valgrind --dsymutil=yes --error-exitcode=77 --leak-check=full --show-possibly-lost=no --suppressions=derp.sup ./a.out ; echo $? does not show the leak details, but still exits with code 77 (!?) == pointdirect.c == #include <stdlib.h> char *x = NULL; int main() { x = (char*) malloc(4); return 0; } == Intentional, but weird, behavior of --show-reachable=yes == gcc -g ~/Desktop/pointdirect.c && valgrind --dsymutil=yes --error-exitcode=77 --leak-check=full --suppressions=derp.sup ./a.out ; echo $? does not show the leak details; exits with code 0 gcc -g ~/Desktop/pointdirect.c && valgrind --dsymutil=yes --error-exitcode=77 --leak-check=full --show-reachable=yes --suppressions=derp.sup ./a.out ; echo $? shows the leak details, but still exits with code 0 (!?)
Yes, this does seem kind of silly now you point it out. Perhaps the simplest thing would be for the number of errors (contributed by leak checking, I mean) to be equal to the number of stacks printed by during the leak checking, regardless of the flags specified. How does that sound? It is (1) simple and (2) consistent with how the rest of Memcheck behaves (IIRC!)
Created attachment 74324 [details] Test patch Jesse, here's a test patch which I think fixes the problem. I tested it with your cases, and AFAICS it simply causes the error count to be equal to the number of leaks printed (since there are no other errors in these programs), and the error code is then as you want. Can you give it a try? All credit to Nick for the idea, since I merely did what he said at https://bugzilla.mozilla.org/show_bug.cgi?id=631811#c15
Created attachment 74327 [details] stdout after patch was applied I confirm that the patch fixes the issue on Mac OS X 10.7.5!
Erm, after the patch, the exit codes are working properly, but what about: (suppressed: 0 from 0) for the first few examples? We have suppressed one leak in derp.sup, so shouldn't it be (suppressed: 1 from 1)?
Created attachment 75525 [details] implements --show-leak-kinds, --errors-for-leak-kinds, match-leak-kinds: suppression line Currently, Valgrind does not provide a fully flexible way to indicate which leak kinds to show, which leak kinds to consider as an error, and which leak kinds to suppress. This is a.o. described in bugs 284540 and 307465. For example, the current options (--show-reachable=yes|no --show-possibly-lost=yes|no) do not allow to indicate that reachable blocks should be considered as an error. There is also no way to indicate that possibly lost blocks are not an error (whatever the value of --show-possibly-lost). Leak suppression entries are also currently catching all leak kinds. For example, if you have possibly lost blocks which you want to suppress, the suppression entry will also suppress definitely lost blocks allocated at the same stack trace, thereby hiding/suppressing real leaks. The patch attached to bug 307465 implements a flexible way to specify on the command line which leak kinds to show and which leak kinds to consider as an error. It also provides a way to have a leak suppression entry matching only a specific set of leak kinds. Here are the new command lines args: --show-leak-kinds=kind1,kind2,.. which leak kinds to show? [definite,possible] --errors-for-leak-kinds=kind1,kind2,.. which leak kinds are errors? [definite,possible] where kind is one of definite indirect possible reachable all none (note: old arguments are kept for backward compatibility). With the patch, a suppression entry now also has an optional line indicating which leak kind(s) are matched by this suppression. For example: { <insert_a_suppression_name_here> Memcheck:Leak match-leak-kinds: possible fun:malloc fun:mk fun:f fun:main } (where the optional match-leak-kinds: line can specify leak kinds similarly to the command line options). When using --gen-suppressions=yes, the match-leak-kinds: line will be produced to match the reported leak kind. This is not committed (yet), any feedback about the approach is welcome.
Even though I totally support this effort (Chromium bots had the same problems as described in the topmost comment above), personally I believe c#5 suggestion makes things too complex. Also, some combinations just don't make sense, e.g. "count reachable as errors but not show them" Maybe a simpler idea would be to introduce new "Memcheck:Reachable" and "Memcheck:PossiblyLost" suppression types. "Memcheck:Leak" should suppress any leak-like reports, "Memcheck:PossiblyLost" should suppress possibly-lost reports as well as reachable "Memcheck:Reachable" should suppress only reachable allocations. [*] (This is basically how leak suppressions work in Dr.Memory) Then, --show-possibly-lost and --show-reachable should be replaced/aliased-with-deprecation to --report-possibly-lost=yes|no and --report-reachable=yes|no which should include/exclude the respective report types in/from the ERROR SUMMARY error count. This way a) the reporting logic is clear b) it's easy to set up Valgrind to meet your expectations ("do I want possibly-lost reports? yes/no" "do I want reachables to be reported? yes/no") c) the suppression logic is clearer as well! WDYT ? [*] - those who have "Memcheck:Leak" suppressions for possibly-lost reports are already doing it wrong. OTOH, the new suppressions will be more precise -> so we don't break anything which is correct *now*. We don't make things worse either.
*** Bug 284540 has been marked as a duplicate of this bug. ***
(In reply to comment #6) A main point for chosing the approach is to ensure that the user can decide precisely what to show, what is an error and what to suppress. Up to now, V was providing a limited nr of predefined combinations, but this nr/flexibility was not good enough. => the patch provides a consistent orthogonal way to show/count/suppress precisely each leak kind. (even if as you say, some combinations are "unlikely to be used", but even then, the behaviour will be exactly what is specified by the argument). Before doing the "leak kind set" approach, solutions similar to what you propose were discussed but not preferred as a.o. they do not cover some needed cases (e.g. be able to suppress definitely lost without suppressing reachable on a suppression case by case basis) or would be dependent on "an ordered list of leak severity" and have arguments such as --report-leaks-from-level=... (the "severity ordering" was found not user friendly) or implies to remember things such as "reachable are shown as part of possibly lost". We also have to cover the case of indirectly lost (I can never remember where/when these are shown). So, preference was given to a single concept of: "use everywhere the leak kind set". Finally, Julian put a (very reasonable IMO) constraint that the default behaviour should be backward compatible, and old arguments should then be expressable in terms of the new implementation/args. > [*] - those who have "Memcheck:Leak" suppressions for possibly-lost reports > are already doing it wrong. OTOH, the new suppressions will be more precise > -> so we don't break anything which is correct *now*. We don't make things > worse either. Effectively, the current mechanism for leak suppressions is a 'catch all' at this stack trace, which is for sure wrong in many cases (catching definitely lost, while possibly lost were the only one expected). This problem is fixed with the patch. I understand that even if you do not need the full flexibility of the proposed approach, that it covers the cases you have for Chromium, i.e. * avoid showing reachable or counting reachable as errors * avoid a suppression matching a reachable * be more precise in suppression matching I hope the patch covers all needed cases. If not, more thinking/work will be needed :(.
> I hope the patch covers all needed cases. Probably, but personally I'm afraid the suggested logic is too complex. > the patch provides a consistent orthogonal way to show/count/suppress > precisely each leak kind. > ... > e.g. be able to suppress definitely lost without suppressing > reachable on a suppression case by case basis That's incorrect to allow a user to make such a preference. Any "real" leak can be treated as still reachable if there is at least one random value somewhere that looks like a pointer to this memory block, right? > Finally, Julian put a (very reasonable IMO) constraint that the default > behaviour should be backward compatible, and old arguments should then > be expressable in terms of the new implementation/args. I think my approach does this too, to some extent (see "aliasing"). Also, I think "backward compatibility" is not the most important thing to chase for if the original design was bad. Somehow you just need to make radical changes to achieve better result. Issuing a descriptive warning/error like "this option is deprecated, please use --x-y or --a-b instead" will fire only once - when a user rolls new Valgrind and she can adjust the scripts etc as needed.
(In reply to comment #9) > > I hope the patch covers all needed cases. > Probably, but personally I'm afraid the suggested logic is too complex. The logic is (supposed to be :) very simple. It is at least explainable in two small sentences: 1. there are 4 kinds of leaks : definite, indirect, possible, reachable 2. each leak kind can be individually suppressed (or not) counted as error (or not) reported with details (or not reported) All other solutions envisaged were more tricky to explain and/or were not using the same "consistency" in the suppressions or in the reporting or in the error counting (like "possibly lost" meaning possibly lost only on the command line, but meaning 'possibly lost and reachable" in the suppression) and/or are giving interactions difficult to explain (like today the interaction between --show-reachable=yes and the indirectly lost blocks suddenly being shown). and/or were backward incompatible (see below). > > the patch provides a consistent orthogonal way to show/count/suppress > > precisely each leak kind. > > ... > > e.g. be able to suppress definitely lost without suppressing > > reachable on a suppression case by case basis > That's incorrect to allow a user to make such a preference. This is needed as suppression logic runs before (and independently) of the reporting. This allows to only have the definitely lost in a suppression statistics, or only the possibly lost (without having the reachable). Otherwise, we are back to the original problem (e.g. that more reachable is confused with more possibly lost or more definitely lost in the suppression statistics). > Also, I think "backward compatibility" is not the most important thing to > chase for if the original design was bad. Somehow you just need to make > radical changes to achieve better result. > Issuing a descriptive warning/error like "this option is deprecated, please > use --x-y or --a-b instead" will fire only once - when a user rolls new > Valgrind and she can adjust the scripts etc as needed. This was discussed but is giving a difficulty : the current arguments are optional, and so no error msg can/should be given in case the args are not given. Silently changing the behaviour is not ok: the default behaviour should be backward compatible and should be expressable in terms of the new arguments. Backward compatibility for this is deemed critical, as there was a very bad experience in a previous release of changing the behaviour of often used args (change had to be rolled back). I believe what you propose makes a lot of sense (something very similar was discussed in details) but could not fullfill all the requirements/constraints.
Committed revision 13170.
I agree with Timur. For Firefox I use --show-possibly-lost=no, and I just spent about an hour trying to work out why the ERROR SUMMARY line was saying I had 111 errors when no error stacks were being printed. I eventually worked out that I needed--errors-for-leak-kinds=definite by adding diagnostic printfs for Valgrind's code. Yuk. Being able to count an error without printing it is silly.
(In reply to comment #12) > I agree with Timur. For Firefox I use --show-possibly-lost=no, and I just > spent about an hour trying to work out why the ERROR SUMMARY line was saying > I had 111 errors when no error stacks were being printed. I eventually > worked out that I needed--errors-for-leak-kinds=definite by adding > diagnostic printfs for Valgrind's code. Yuk. Being able to count an error > without printing it is silly. Yes the old arguments (--show-possibly-lost, --show-reachable) had a non orthogonal and sometimes very surprising behaviour. Counting errors without necessarily showing them might however be used in some circumstances (e.g. there are many reachable blocks, a lot to show, but you want to ensure this nr does not increase). The new args --show-leak-kinds and --errors-for-leak-kinds have an orthogonal behaviour. You can still specific to have leaks counted but not printed or printed but not counted, but that is a.o. a consequence of the deliberate choice of backward compatibilty and allow to implement the old behaviour based on the new arg technique (see comment 10). The new args are IMO at least easy to explain: 1. there are 4 kinds of leaks : definite, indirect, possible, reachable 2. each leak kind can be individually suppressed (or not) counted as error (or not) reported with details (or not reported) In summary, old args should preferrably not be used anymore. Maybe we should mark the old arguments as obsolete in the --help output and in the doc ?