Bug 307465 - --show-possibly-lost=no should bring down the error count / exit code
Summary: --show-possibly-lost=no should bring down the error count / exit code
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.9.0.SVN
Platform: unspecified macOS
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
: 284540 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-09-26 22:29 UTC by Jesse Ruderman
Modified: 2014-01-11 13:14 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Test patch (826 bytes, patch)
2012-10-03 18:35 UTC, Julian Seward
Details
stdout after patch was applied (4.53 KB, text/plain)
2012-10-03 20:56 UTC, Gary Kwong [:gkw]
Details
implements --show-leak-kinds, --errors-for-leak-kinds, match-leak-kinds: suppression line (41.26 KB, text/plain)
2012-11-28 22:54 UTC, Philippe Waroquiers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jesse Ruderman 2012-09-26 22:29:52 UTC
> http://valgrind.org/docs/manual/mc-manual.html

> Because there are different kinds of leaks with different severities, an interesting question is this: which leaks should be counted as true "errors" and which should not? The answer to this question affects the numbers printed in the ERROR SUMMARY line, and also the effect of the --error-exitcode option. Memcheck uses the following criteria:

> *   First, a leak is only counted as a true "error" if --leak-check=full is specified. In other words, an unprinted leak is not considered a true "error". If this were not the case, it would be possible to get a high error count but not have any errors printed, which would be confusing.

Based on this reasoning, I think --show-possibly-lost should bring down the error count (on which the exit code is based).

It's preventing Mozilla's Valgrind tests from being green:
https://bugzilla.mozilla.org/show_bug.cgi?id=631811#c12
https://bugzilla.mozilla.org/show_bug.cgi?id=793509#c3

> *   After that, definitely lost and possibly lost blocks are counted as true "errors".

The current behavior is documented, sorta.  It's also sorta self-contradictory.

> Indirectly lost and still reachable blocks are not counted as true "errors", even if --show-reachable=yes is specified and they are printed; this is because such blocks don't need direct fixing by the programmer.

This is kind of weird.  If a programmer says --show-reachable=yes, maybe they want to see and fix the (non-suppressed) still-reachables!


== derp.sup ==

{
    derp
    Memcheck:Leak
    ...
    fun:_ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE
}

== pointinside.c ==

#include <stdlib.h>
char *x = NULL;
int main()
{
    x = (char*) malloc(4) + 1;
    return 0;
}

== Bug with --show-possibly-lost=no ==

gcc -g ~/Desktop/pointinside.c  && valgrind --dsymutil=yes --error-exitcode=77  --leak-check=full --suppressions=derp.sup  ./a.out ; echo $?

shows the leak details; exits with code 77

gcc -g ~/Desktop/pointinside.c  && valgrind --dsymutil=yes --error-exitcode=77  --leak-check=full --show-possibly-lost=no --suppressions=derp.sup  ./a.out ; echo $?

does not show the leak details, but still exits with code 77 (!?)

== pointdirect.c ==

#include <stdlib.h>

char *x = NULL;

int main()
{
    x = (char*) malloc(4);
    return 0;
}

== Intentional, but weird, behavior of --show-reachable=yes ==

gcc -g ~/Desktop/pointdirect.c  && valgrind --dsymutil=yes --error-exitcode=77  --leak-check=full  --suppressions=derp.sup  ./a.out ; echo $?

does not show the leak details; exits with code 0

gcc -g ~/Desktop/pointdirect.c  && valgrind --dsymutil=yes --error-exitcode=77  --leak-check=full --show-reachable=yes  --suppressions=derp.sup  ./a.out ; echo $?

shows the leak details, but still exits with code 0 (!?)
Comment 1 Julian Seward 2012-10-03 14:15:26 UTC
Yes, this does seem kind of silly now you point it out.  Perhaps the simplest
thing would be for the number of errors (contributed by leak checking, I mean)
to be equal to the number of stacks printed by during the leak checking, 
regardless of the flags specified.  How does that sound?  It is (1) simple and 
(2) consistent with how the rest of Memcheck behaves (IIRC!)
Comment 2 Julian Seward 2012-10-03 18:35:16 UTC
Created attachment 74324 [details]
Test patch

Jesse, here's a test patch which I think fixes the problem.  I tested
it with your cases, and AFAICS it simply causes the error count to be
equal to the number of leaks printed (since there are no other errors
in these programs), and the error code is then as you want.

Can you give it a try?

All credit to Nick for the idea, since I merely did what he said at
https://bugzilla.mozilla.org/show_bug.cgi?id=631811#c15
Comment 3 Gary Kwong [:gkw] 2012-10-03 20:56:17 UTC
Created attachment 74327 [details]
stdout after patch was applied

I confirm that the patch fixes the issue on Mac OS X 10.7.5!
Comment 4 Gary Kwong [:gkw] 2012-10-03 21:02:15 UTC
Erm, after the patch, the exit codes are working properly, but what about:

(suppressed: 0 from 0)

for the first few examples? We have suppressed one leak in derp.sup, so shouldn't it be (suppressed: 1 from 1)?
Comment 5 Philippe Waroquiers 2012-11-28 22:54:08 UTC
Created attachment 75525 [details]
implements --show-leak-kinds, --errors-for-leak-kinds, match-leak-kinds: suppression line

Currently, Valgrind does not provide a fully flexible
way to indicate which leak kinds to show,
which leak kinds to consider as an error,
and which leak kinds to suppress.
This is a.o. described in bugs 284540 and 307465.

For example, the current options
(--show-reachable=yes|no --show-possibly-lost=yes|no)
do not allow to indicate that reachable blocks should
be considered as an error.

There is also no way to indicate that possibly lost
blocks are not an error (whatever the value of --show-possibly-lost).

Leak suppression entries are also currently catching all leak kinds.
For example, if you have possibly lost blocks which you want
to suppress, the suppression entry will also suppress
definitely lost blocks allocated at the same stack trace,
thereby hiding/suppressing real leaks.


The patch attached to bug 307465 implements a flexible way to specify
on the command line which leak kinds to show and which
leak kinds to consider as an error.
It also provides a way to have a leak suppression entry
matching only a specific set of leak kinds.

Here are the new command lines args:
    --show-leak-kinds=kind1,kind2,.. which leak kinds to show?
                                            [definite,possible]
    --errors-for-leak-kinds=kind1,kind2,..  which leak kinds are errors?
                                            [definite,possible]
        where kind is one of definite indirect possible reachable all none

(note: old arguments are kept for backward compatibility).

With the patch, a suppression entry now also has an optional line
indicating which leak kind(s) are matched by this suppression.
For example:
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   fun:mk
   fun:f
   fun:main
}


(where the optional match-leak-kinds: line can specify
leak kinds similarly to the command line options).

When using --gen-suppressions=yes, the match-leak-kinds: line
will be produced to match the reported leak kind.

This is not committed (yet), any feedback about the approach is welcome.
Comment 6 Timur Iskhodzhanov 2012-12-05 12:39:18 UTC
Even though I totally support this effort (Chromium bots had the same problems as described in the topmost comment above), personally I believe c#5 suggestion makes things too complex.
Also, some combinations just don't make sense, e.g. "count reachable as errors but not show them"

Maybe a simpler idea would be to introduce new "Memcheck:Reachable" and "Memcheck:PossiblyLost" suppression types.
"Memcheck:Leak" should suppress any leak-like reports,
"Memcheck:PossiblyLost" should suppress possibly-lost reports as well as reachable
"Memcheck:Reachable" should suppress only reachable allocations. [*]
(This is basically how leak suppressions work in Dr.Memory)

Then,
  --show-possibly-lost and --show-reachable
should be replaced/aliased-with-deprecation to
  --report-possibly-lost=yes|no and --report-reachable=yes|no
which should include/exclude the respective report types in/from the ERROR SUMMARY error count.
This way
a) the reporting logic is clear
b) it's easy to set up Valgrind to meet your expectations
    ("do I want possibly-lost reports? yes/no" "do I want reachables to be reported? yes/no")
c) the suppression logic is clearer as well!

WDYT ?

[*] - those who have "Memcheck:Leak" suppressions for possibly-lost reports are already doing it wrong. OTOH, the new suppressions will be more precise -> so we don't break anything which is correct *now*. We don't make things worse either.
Comment 7 Timur Iskhodzhanov 2012-12-05 12:41:19 UTC
*** Bug 284540 has been marked as a duplicate of this bug. ***
Comment 8 Philippe Waroquiers 2012-12-05 21:54:18 UTC
(In reply to comment #6)

A main point for chosing the approach is to ensure that the user can decide precisely
what to show, what is an error and what to suppress.
Up to now, V was providing a limited nr of predefined combinations, but this nr/flexibility
was not good enough.
=> the patch provides a consistent orthogonal way to show/count/suppress precisely
each leak kind.
(even if as you say, some combinations are "unlikely to be used", but even then,
the behaviour will be exactly what is specified by the argument).

Before doing the "leak kind set" approach, solutions similar to what you propose
were discussed but not preferred as a.o. they do not cover some
needed cases (e.g. be able to suppress definitely lost without suppressing reachable
on a suppression case by case basis)
or would be dependent on "an ordered list of leak severity" and have
arguments such as --report-leaks-from-level=...
(the "severity ordering" was found not user friendly)
or implies to remember things such as "reachable are shown as part of possibly lost".
We also have to cover the case of indirectly lost (I can never remember where/when
these are shown). 
So, preference was given to a single concept of: "use everywhere the leak kind set".

Finally, Julian put a (very reasonable IMO) constraint that the default
behaviour should be backward compatible, and old arguments should then
be expressable in terms of the new implementation/args.


> [*] - those who have "Memcheck:Leak" suppressions for possibly-lost reports
> are already doing it wrong. OTOH, the new suppressions will be more precise
> -> so we don't break anything which is correct *now*. We don't make things
> worse either.
Effectively, the current mechanism for leak suppressions is a 'catch all' at this stack trace,
which is for sure wrong in many cases (catching definitely lost, while possibly lost
were the only one expected).
This problem is fixed with the patch.

I understand that even if you do not need the full flexibility of the proposed approach,
that it covers the cases you have for Chromium, i.e.
   * avoid showing reachable or counting reachable as errors
   * avoid a suppression matching a reachable
   * be more precise in suppression matching

I hope the patch covers all needed cases. If not, more thinking/work will be needed :(.
Comment 9 Timur Iskhodzhanov 2012-12-06 12:18:03 UTC
> I hope the patch covers all needed cases.
Probably, but personally I'm afraid the suggested logic is too complex.

> the patch provides a consistent orthogonal way to show/count/suppress
> precisely each leak kind.
> ...
> e.g. be able to suppress definitely lost without suppressing
> reachable on a suppression case by case basis
That's incorrect to allow a user to make such a preference.
Any "real" leak can be treated as still reachable if there is at least one random value somewhere that looks like a pointer to this memory block, right?

> Finally, Julian put a (very reasonable IMO) constraint that the default
> behaviour should be backward compatible, and old arguments should then
> be expressable in terms of the new implementation/args.
I think my approach does this too, to some extent (see "aliasing").

Also, I think "backward compatibility" is not the most important thing to chase for if the original design was bad. Somehow you just need to make radical changes to achieve better result.
Issuing a descriptive warning/error like "this option is deprecated, please use --x-y or --a-b instead" will fire only once - when a user rolls new Valgrind and she can adjust the scripts etc as needed.
Comment 10 Philippe Waroquiers 2012-12-06 22:23:12 UTC
(In reply to comment #9)
> > I hope the patch covers all needed cases.
> Probably, but personally I'm afraid the suggested logic is too complex.
The logic is (supposed to be :) very simple.
It is at least explainable in two small sentences:
1. there are 4 kinds of leaks : definite, indirect, possible, reachable
2. each leak kind can be individually
           suppressed (or not)
           counted as error (or not)
           reported with  details (or not reported)

All other solutions envisaged were
more tricky to explain

and/or were not using the same "consistency" in the suppressions or in the reporting
or in the error counting (like "possibly lost" meaning possibly lost only
on the command line, but meaning 'possibly lost and reachable" in the suppression)

and/or are giving interactions difficult to explain
(like today the interaction between --show-reachable=yes and the indirectly
lost blocks suddenly being shown).

and/or were backward incompatible (see below).

> > the patch provides a consistent orthogonal way to show/count/suppress
> > precisely each leak kind.
> > ...
> > e.g. be able to suppress definitely lost without suppressing
> > reachable on a suppression case by case basis
> That's incorrect to allow a user to make such a preference.
This is needed as suppression logic runs before (and independently) of the reporting.
This allows to only have the definitely lost in a suppression statistics,
or only the possibly lost (without having the reachable). Otherwise, we
are back to the original  problem (e.g. that more reachable is confused with more
possibly lost or more definitely lost in the suppression statistics).


> Also, I think "backward compatibility" is not the most important thing to
> chase for if the original design was bad. Somehow you just need to make
> radical changes to achieve better result.
> Issuing a descriptive warning/error like "this option is deprecated, please
> use --x-y or --a-b instead" will fire only once - when a user rolls new
> Valgrind and she can adjust the scripts etc as needed.
This was discussed but is giving a difficulty : the current arguments are optional,
and so no error msg can/should be given in case the args are not given.
Silently changing the behaviour is  not ok:
 the default behaviour should be backward compatible and should be
expressable in terms of the new arguments.
Backward compatibility for this is deemed critical, as there was a very bad
experience in a previous release of changing the behaviour of often used args
(change had to be rolled back).

I believe what you propose makes a lot of sense (something very similar was
discussed in details) but could not fullfill all the requirements/constraints.
Comment 11 Philippe Waroquiers 2012-12-08 17:55:02 UTC
Committed revision 13170.
Comment 12 Nicholas Nethercote 2014-01-10 00:33:44 UTC
I agree with Timur. For Firefox I use --show-possibly-lost=no, and I just spent about an hour trying to work out why the ERROR SUMMARY line was saying I had 111 errors when no error stacks were being printed. I eventually worked out that I needed--errors-for-leak-kinds=definite by adding diagnostic printfs for Valgrind's code. Yuk. Being able to count an error without printing it is silly.
Comment 13 Philippe Waroquiers 2014-01-11 13:14:23 UTC
(In reply to comment #12)
> I agree with Timur. For Firefox I use --show-possibly-lost=no, and I just
> spent about an hour trying to work out why the ERROR SUMMARY line was saying
> I had 111 errors when no error stacks were being printed. I eventually
> worked out that I needed--errors-for-leak-kinds=definite by adding
> diagnostic printfs for Valgrind's code. Yuk. Being able to count an error
> without printing it is silly.

Yes the old arguments (--show-possibly-lost, --show-reachable) had a non orthogonal
and sometimes very surprising behaviour.
Counting errors without necessarily showing them might however be used in some
circumstances (e.g. there are many reachable blocks, a lot to show, but you want to
ensure this nr does not increase).

The new args  --show-leak-kinds and --errors-for-leak-kinds have an orthogonal behaviour.
You can still specific to have leaks counted but not printed or printed but not counted,
but that is a.o. a consequence of the deliberate choice of backward compatibilty
and allow to implement the old behaviour based on the new arg technique  (see comment 10).


The new args are IMO at least easy to explain:
1. there are 4 kinds of leaks : definite, indirect, possible, reachable
2. each leak kind can be individually
     suppressed (or not) 
     counted as error (or not)
     reported with details (or not reported) 

In summary, old args should preferrably not be used anymore.
Maybe we should mark the old arguments as obsolete in the --help output and in the doc ?