Demo:

$ cat demo.c
#include <malloc.h>
char *ptr;
void allocate_reachable() { ptr = malloc(42); }
int main(void) { allocate_reachable(); }

$ cat demo.supp
{
   Shouldn't be here!
   Memcheck:Leak
   fun:malloc
   fun:allocate_reachable
   fun:main
}

$ gcc demo.c
$ valgrind --leak-check=full --suppressions=demo.supp -v ./a.out 2>&1 | tail
==6744==    still reachable: 0 bytes in 0 blocks
==6744==         suppressed: 42 bytes in 1 blocks
==6744==
==6744== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
--6744--
--6744-- used_suppression:      1 Shouldn't be here!
--6744-- used_suppression:      2 U1004-ARM-_dl_relocate_object
--6744-- used_suppression:      2 glibc-2.5.x-on-SUSE-10.2-(PPC)-2a
==6744==
==6744== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
Created attachment 64731 [details]
Proposed fix
Hm, I wonder what perf improvement this gives us - assuming we have a large amount of still-reachable allocations at the end of the run on something like Chromium...
We ARE observing a speed-up with this patch applied: http://code.google.com/p/chromium/issues/detail?id=101098
Created attachment 64749 [details]
Proposed fix #2

Looking from the user PoV, IMO we shouldn't bother matching suppressions against the leak types we don't print out (e.g. still-reachable, --show-possibly-lost=no).

Attached is the new patch which accounts for that.

Test, run with "--leak-check=yes --show-possibly-lost=no":

----demo.c-----
#include <malloc.h>
char *ptr_reachable, *ptr_possible;
void allocate_reachable() { ptr_reachable = malloc(42); }
void allocate_possible() { ptr_possible = 4 + (char*)malloc(42); }
int main(void) {
  allocate_reachable();
  allocate_possible();
}
---------------

---demo.supp---
{
   Shouldn't be here! - reachable
   Memcheck:Leak
   fun:malloc
   fun:allocate_reachable
   fun:main
}
{
   Shouldn't be here! - possible
   Memcheck:Leak
   fun:malloc
   fun:allocate_possible
   fun:main
}
---------------
Perf data here. Looks like the comment #1 patch improved the total performance of Chromium ui_tests by 10%.

ui_tests consist of ~100 test cases, each doing the following:
1) start a new Chromium process
2) trigger a couple of hooks emulating user input
3) check the results of (2)
4) stop Chromium (and perform the leak check when running under Valgrind)
Each test case takes about 1-2 minutes under Valgrind.

Right before the update:
http://build.chromium.org/p/chromium.memory.fyi/builders/Linux%20Tests%20%28valgrind%29%281%29/builds/4476/
-> Chromium ui_tests take 33m42s + 30m28s

Right after the update:
http://build.chromium.org/p/chromium.memory.fyi/builders/Linux%20Tests%20%28valgrind%29%281%29/builds/4476/
-> Chromium ui_tests take 30m23s + 27m09s

Same holds for the other shards:
http://build.chromium.org/p/chromium.memory.fyi/builders/Linux%20Tests%20%28valgrind%29%282%29/builds/5050
-> 27m01s + 29m13s
vs
http://build.chromium.org/p/chromium.memory.fyi/builders/Linux%20Tests%20%28valgrind%29%282%29/builds/5051
-> 23m42s + 26m05s
The idea is interesting (at work, we start a big system once, and then run a bunch of tests with a leak search between each test). So, optimising the leak search is a good idea.

The change however creates a slightly confusing situation. Let's imagine a (typical?) user that does --leak-check=full --show-reachable=no (for the below, the test program definitely leaks 1000 blocks of 1 byte and has one block of 3 bytes still reachable).

Without the patch, the "reachable suppressed" are always counted and shown in the suppressed part:
==29035==    still reachable: 0 bytes in 0 blocks
==29035==         suppressed: 1,003 bytes in 1,001 blocks

With the change, the "reachable suppressed" will be counted and shown in the still reachable when --show-reachable=no:
==29255==    still reachable: 3 bytes in 1 blocks
==29255==         suppressed: 1,000 bytes in 1,000 blocks

If the user then worries about this "still reachable" and re-runs with --show-reachable=yes, then the below is obtained:
==30195==    still reachable: 0 bytes in 0 blocks
==30195==         suppressed: 1,003 bytes in 1,001 blocks

On a simple example, the behaviour is understandable. In a complex big application with many suppressions and a not necessarily 100% reproducible run, this non-intuitive behaviour is somewhat annoying.

So, despite the fact that the speed improvement for leak search would be welcome (it would probably provide a speedup at my work similar to the speedup on Chromium, or maybe more), I am not that enthusiastic to commit the change. I do not see how to present the above behaviour in a non-confusing way to the user and/or without introducing some more complexity in the combinations reachable/possibly/definitely X suppressed/not suppressed.

Any idea / suggestion?
Created attachment 72351 [details]
patch of Timur updated to a recent 3.8.0
Created attachment 72352 [details]
patch of Timur updated to a recent 3.8.0 (only the relevant changes)
> I do not see how to present the above behaviour in a non-confusing way
> to the user and/or without introducing some more complexity in the
> combinations reachable/possibly/definitely X suppressed/not suppressed.

Personally I think splitting "suppressed" into reachable/possibly/definitely is the way to go. It's not adding complexity but actually making the already-complex thing understandable in more than just trivial cases.
(In reply to comment #9)
> > I do not see how to present the above behaviour in a non-confusing way
> > to the user and/or without introducing some more complexity in the
> > combinations reachable/possibly/definitely X suppressed/not suppressed.
> Personally I think splitting "suppressed" into reachable/possibly/definitely
> is the way to go. It's not adding complexity but actually making the
> already-complex thing understandable in more than just trivial cases.

Even if we would split the suppressed per reachable/possibly/definitely, the behaviour would not be straightforward: when the matching between a loss record and a suppression is not done (due to show-reachable=no), where would we put this info? Now, it is in "not suppressed" (without suppr matching, we cannot decide). If we split the suppressed in 3, then we would put it in "reachable not suppressed". To make it (somewhat) understandable, we would need a third state, so as to have: suppressed, not suppressed, undetermined, and have up to 12 different combinations reachable/possible/indirectly/definitely X the new third state.

A possible solution would be to have:
--show-reachable=no|yes|suppress
where suppress would mean to automatically consider a reachable loss record as part of the suppression.

However, this is still some additional complexity, for a gain of about 10% on the Chromium tests. Even if --show-reachable=suppress is not that ugly, it looks to me preferable to just work on optimising the leak search and/or the matching between a loss record and the suppressions.

Do you have some statistics about chromium? In particular: how many reachable loss records, how many suppression entries in your suppression file, and how many Mb of memory is scanned?
(In reply to comment #10)
> Do you have some statistics about chromium?
> In particular: how many reachable loss records, how many suppression entries
> in your suppression file, and how many Mb of memory is scanned?

With about 50_000 loss records and 1024 suppression entries, on my slow pc, I get about 24 seconds for suppr matching. With 512 suppressions, about 12 seconds. With 128 suppressions, about 3 seconds. So, performance seems to be linear in the nr of suppressions (and I guess is also linear in the nr of loss records). Of course, a lot of loss records multiplied by a lot of suppressions will take some time.

From reading the suppr matching code, I suppose most of the time is spent in the function supp_pattEQinp. This function will be called at least once for each suppression kind matching the loss record. It will be called as many times as there is something matching. supp_pattEQinp has to translate an IP address into a function name, and that might be costly (e.g. accessing the symtab).

So, suppr matching might be improved in two ways:
* avoid scanning suppression kinds that have no chance to match (this is probably not a big win, as that is basically just an integer comparison).
* avoid repetitively translating an IP address into a soname and/or fnname: still do a lazy evaluation, but keep the result once computed for a loss record, to re-use it for the next suppression matching.
> A possible solution would be to have:
> --show-reachable=no|yes|suppress
> where suppress would mean to automatically consider
> a reachable loss record as part of the suppression.

I like this one. Probably better names would be:
--show-reachable=no|yes|stats

> However, this is still some additional complexity, for a gain of about 10%
> on the Chromium tests.

I'd rather say it speeds up leak checking by Xx where X is large :)
On Chromium, the leak checking was so slow that the multi-process stuff was often thinking child processes were hung. Increasing the timeouts didn't help enough, or it was just slowing down everything a lot, especially on tests with a lot of processes.

> Do you have some statistics about chromium?
> In particular: how many reachable loss records, how many suppression entries
> in your suppression file,

I don't remember exactly anymore. Roughly, A LOT.

> and how many Mb of memory is scanned?

About 1-2GB of address space, I suppose.

You can look at hacking the code at http://code.google.com/p/valgrind-variant/wiki/LeakCheckingOn32bits as one more benchmark.

> * avoid repetitively translating an IP address into a soname and/or fnname:
> still do a lazy evaluation, but keep the result once computed for a loss record,
> to re-use it for the next suppression matching.

This sounds like a good improvement to be done in the Valgrind framework (in the lookup_pc() API) if it's not done yet.
(In reply to comment #12)
> > However, this is still some additional complexity, for a gain of about 10%
> > on the Chromium tests.
> I'd rather say it speeds up leak checking by Xx where X is large :)

??? I am not too sure I understand how much we really gain in total. Checking on the chromium src website, the memcheck suppression file has about 350 suppressions. If chromium has

> On Chromium, the leak checking was so slow that the multi-process stuff was
> often thinking child processes were hung.
> Increasing the timeouts didn't help enough, or it was just slowing down
> everything a lot, especially on tests with a lot of processes.
>
> > Do you have some statistics about chromium?
> > In particular: how many reachable loss records, how many suppression
> > entries in your suppression file,
> I don't remember exactly anymore. Roughly, A LOT.
>
> > and how many Mb of memory is scanned?
> About 1-2GB of address space, I suppose.

On an Intel(R) Xeon(R) CPU X5670 @ 2.93GHz, Valgrind scans 1 Gb of memory, searching for 2 million malloc-ed blocks, in about 16 seconds (based on the test leak program you have referenced). Adding a file with 1024 suppressions almost does not change this time (this is explained by the fact that your test program creates very few different stack traces, and so there are very few matches to do between a loss record and the 1024 suppressions).

If I use the same 1024 suppressions with the test program perf/many-loss-records, it has to do about 50_000 loss records X 1024 suppressions. This takes about 8 seconds. I suspect these 8 seconds might become a lot more if a big executable containing many shared libs and/or many debug symbols is used (chromium?).

Also, I believe that suppressions starting with ... or having one or more ... might often cause all of the IPs of the stack trace to be translated to a function name or object name. If the above is true, then the below optimisation should help significantly.
> > * avoid repetitively translating an IP address into a soname and/or fnname:
> > still do a lazy evaluation, but keep the result once computed for a loss record,
> > to re-use it for the next suppression matching.
> This sounds like a good improvement to be done in the Valgrind framework (in
> the lookup_pc() API) if it's not done yet.

I am more thinking about doing a specific optimised structure for the error/suppression matching code, as this looks simpler than introducing a result cache in the debug info/symtab handling (and I guess that the most frequent usage of ip-to-object/fn translation is for error -> suppr matching). The other usages should be a lot less frequent (e.g. when really printing an error).
> On an Intel(R) Xeon(R) CPU X5670 @ 2.93GHz,
> Valgrind scans 1 Gb of memory, searching for 2 million malloc-ed blocks
> in about 16 seconds (based on the test leak program you have referenced).

That's a lot compared to native execution of the same test in just 1 second; theoretically it should take 20-40 seconds to run under Valgrind (assuming the 20x-40x slowdown). Adding an extra 10-20 seconds *for each process* makes things up to 1.5x-2x slower overall...

> Also, I believe that suppressions starting with ... or having one or more ...
> might often cause all of the IPs of the stack trace to be translated
> to a function name or object name.

This sounds very much possible.

> > This sounds like a good improvement to be done in the Valgrind framework
> > (in the lookup_pc() API) if it's not done yet.
> I am more thinking about doing a specific optimised structure for the
> error/suppression matching code, as this looks simpler than introducing a
> result cache in the debug info/symtab handling (and I guess that the most
> frequent usage of ip-to-object/fn translation is for error -> suppr matching).
> The other usages should be a lot less frequent (e.g. when really printing an error).

In ThreadSanitizer we look up a PC for every basic block being executed. We ended up having our own caching for that. Not sure about the other tools, though; I agree they may only need to look up when printing out a report.
> Adding an extra 10-20 seconds *for each process* makes things up to 1.5x-2x
> slower overall...

And again, the native shutdown timeout is hundreds of milliseconds; it's OK to multiply it (assuming the 20x Valgrind slowdown), but it might not be OK to increase it by adding a 16-second constant :)
(In reply to comment #14)
> > On an Intel(R) Xeon(R) CPU X5670 @ 2.93GHz,
> > Valgrind scans 1 Gb of memory, searching for 2 million malloc-ed blocks
> > in about 16 seconds (based on the test leak program you have referenced).
> That's a lot compared to native execution of the same test in just 1 second;
> theoretically it should take 20-40 seconds to run under Valgrind
> (assuming the 20x-40x slowdown).
> Adding an extra 10-20 seconds *for each process* makes things up to 1.5x-2x
> slower overall...

The 16 seconds above is the time needed to scan the 1 Gb of memory. This bug report (and its patch) is not improving the scan time.

Basically, the leak search has the following steps:
1. build a list of all allocated blocks (this is very fast)
2. scan the memory (this is the 16 seconds above) and mark all allocated blocks that are still reachable
3. regroup all blocks into loss records, i.e. same status (reachable, definitely lost, ...) and same stack trace
4. for each loss record, see if it matches a suppression; output the non-matching loss records

Steps 1 to 3 are taking about 16 seconds for 1 Gb of memory and 200000 blocks. These 16 seconds look to be linear in terms of memory size. It should be n log n in terms of the nr of blocks (but I could not detect the n log n impact, so it looks like a very "flat" n log n).

Step 4 is very small for your test program (as there are only a few loss records). When step 4 is working on about 50_000 loss records and 1024 suppressions, it takes 8 seconds. This bug speaks about optimising step 4.

What I do not understand yet is how the Chromium tests can imply a huge time for step 4. Having 50_000 loss records is really a lot, as you need 50_000 different stack traces for malloc calls (having recursive functions helps to obtain that). 50_000 * 1024 suppressions is not that horrible (only 8 seconds compared to the 16 seconds for the leak scan).
At work, a big application with many recursive algorithms, a big pile of code, and several hundred thousand allocated blocks takes between 8 and 20 seconds for the total leak search. So, it is still not clear to me what exactly has to be optimised.

I suspect the speed of the suppression matching code is badly influenced by the kind of suppressions (... or not ...?) and by the nr of shared libs and/or debug info, which might explain the 8 seconds for the small test case and the (huge) timeouts for the chromium tests (but how much is "huge"? do you have an idea, or can you re-measure?).

Before (prematurely?) optimising a part of the leak search, it would be nice to understand more where the problem is coming from: is it better to work on reducing the 16 seconds (maybe not trivial)? or to optimise the suppression matching code? or to "enrich/complexify" --show-reachable by adding a third value (suppress or stats)?
Created attachment 72738 [details]
small optimisation for suppr matching

Performance of leak scanning has been improved in revision 12756. The attached patch (somewhat) improves the suppression matching (following a suggestion from Julian). On a test case with 1024 suppressions and 50_000 loss records, the patch improves the total leak search time by 10 to 15%.
Created attachment 72739 [details]
micro_optimisation of the cmp functions in m_libcbase.c

Looking at the asm code, the code is slightly smaller and contains 6 jumps instead of 9.
(In reply to comment #17)
> Created attachment 72738 [details]
> small optimisation for suppr matching

Seems fine, except for one logic problem, easy to fix: a pattern string can also have '?' as a wildcard that matches exactly one character. In this case, a string containing a '?' but no '*' will be classified as a simple_string and so will only match against a string with literally '?' in that position ==> a change in semantics.
(In reply to comment #18)
> Created attachment 72739 [details]
> micro_optimisation of the cmp functions in m_libcbase.c

Ok iff you are 110% sure these don't change the semantics :-)
Created attachment 72814 [details]
implements lazy conversion of stack trace to fun/obj name during suppression matching

The attached patch optimises suppr matching by avoiding redoing the debug info search for each suppression match with a stack trace. Running many-loss-records with a 1024-suppression file (all suppressing leaks, i.e. pattern matching is needed) gives the following times in seconds (also compared with 3.7.0):

        amd64   x86
3.7.0   9.253   34.0
trunk   9.635   28.5
patch   3.252   10.6

So, basically, a factor 3 improvement on this (rather artificial) test. Would be nice to test with a real setup (e.g. the chromium tests).
3x sounds cool! I'll give it a try when I'm back from vacation (late August)
suppr matching improvement committed in rev 12824.
Wait, why have you changed the subject and closed the bug? The original idea was that the output logic is confusing (please re-read comments #0, #1, #4). Your fix doesn't address that, right? So I'm reopening.

The original optimization (comment #5) was a nice side-effect, and it's good to see that your optimizations made things much faster even without my patch, but it deserves a separate bug (266390?).

Also, just recalled:
> What I do not understand yet is how the Chromium tests
> can imply a huge time for step 4.

Do you know that we use --num-callers=30 and many suppressions like this:
{
   blah
   Memcheck:Leak
   ...
   fun:someFunction1
   fun:someFunction2
}
?

I'm pretty sure it makes things much slower than with --num-callers=10.
(In reply to comment #24)
> Wait, why have you changed the subject and closed the bug?
> The original idea was that the output logic is confusing (please re-read
> comments #0, #1, #4). Your fix doesn't address that, right?
> So I'm reopening.

Ok, I misunderstood the bug (I thought the main problem was the efficiency linked with the useless matching of suppressions against reachable loss records). I will re-read more carefully :)

Is it ok to close bug 266390 based on these patches?
https://bugs.kde.org/attachment.cgi?id=72738
https://bugs.kde.org/attachment.cgi?id=72739
https://bugs.kde.org/attachment.cgi?id=72814

> The original optimization (comment #5) was a nice side-effect and it's good
> to see that your optimizations made things much faster even without my
> patch, but it deserves a separate bug (266390?).
>
> Also, just recalled:
> > What I do not understand yet is how the Chromium tests
> > can imply a huge time for step 4.
> Do you know that we use --num-callers=30 and many suppressions like this:
> {
>    blah
>    Memcheck:Leak
>    ...
>    fun:someFunction1
>    fun:someFunction2
> }
> ?
>
> I'm pretty sure it makes things much slower than with --num-callers=10.

I suspect that effectively such ... will imply always translating the full stack traces. The just-committed patch should improve this significantly if you have a lot of such suppression patterns: an IP address will be translated at most once.
> Is it ok to close bug 266390

Let me check it first. I've added a comment there.

> The just-committed patch should improve this significantly if you have
> a lot of such suppression patterns: an IP address will be translated
> at most once.

Yes, sounds right.
(In reply to comment #24)
> The original idea was that the output logic is confusing (please re-read
> comments #0, #1, #4).

I have re-read, and I do not see what is confusing.

A leak check produces loss records. These loss records can be of different kinds:
  reachable
  possibly-lost
  definitely-lost
Some of these loss records can be suppressed.

The --show-possibly-lost=yes (or no) only indicates whether to show the non-suppressed possibly lost (similar for --show-reachable). It does not influence the fact that a suppression is or is not suppressing the loss record. If it did influence that, then we are back to the (real, IMO) confusion described in comments 6 and 10.

I think the current behaviour is simple: a suppression entry always suppresses what it has to suppress. Then the user can decide to show (or not) the non-suppressed parts. It is not clear to me what we can gain by introducing new user-visible categories.

Maybe I am missing the point? Can you tell exactly what is confusing in the current behaviour (i.e. what information the user is missing, or what wrong info is being given to the user)?
> The --show-possibly-lost=yes (or no) only indicates whether to show the
> non-suppressed possibly lost (similar for --show-reachable).

That's the confusion by itself too.

In Chromium we don't care at all about still-reachable stuff, as there are too many of them by design. (*)
We have a lot of suppressions due to the "commit; oh, Valgrind found something; suppress; fix someday later" development process. We often rely on the "used suppressions" section to judge whether a particular leak/bug is still present. (**)
Due to (*) and (**), we don't want the still-reachable allocations matching the suppressions to show up in the stats.

--

Re: comment 6 + comment 10:
1) Splitting the definitely/possibly/reachable stats is a good thing as it helps to prioritize fixing stuff.
2) Then, matching the suppressions and showing the number of suppressed allocations the way you want is OK if
   a) it doesn't re-introduce the Chromium problem described above, and
   b) it's OK performance-wise after r12824.
   We don't want the # of suppressed still-reachable/possibly-lost allocations to show up in the used_suppressions section, though.
3) If either (2a) or (2b) is false, we can probably just skip the uninteresting stats sections - under a flag if you want.
(In reply to comment #28)
> In Chromium we don't care at all about still-reachable stuff, as there are
> too many of them by design. (*)
> We have a lot of suppressions due to the "commit; oh, Valgrind found
> something; suppress; fix someday later" development process.
> We often rely on the "used suppressions" section to judge whether a
> particular leak/bug is still present. (**)
> Due to (*) and (**), we don't want the still-reachable allocations matching
> the suppressions to show up in the stats.

To be sure I understand: the problem is that a suppression entry can match definitely-lost loss records, possibly-lost loss records and still-reachable loss records. Due to this, once you have added a suppression to handle a known definitely-lost, it will also "collect" the reachable loss records, which means you cannot track precisely anymore the amount of definitely lost.

Is that a correct understanding?

(If correct, the above looks to me to be a real problem: effectively, when you suppress a leak, it is based on the assumption that this leak is leaking a small amount of memory, and so it must be possible to differentiate the amount of "suppressed definitely-lost (real) leaks" from the amount of "suppressed reachable (not real) leaks".)

> b) it's OK performance-wise after r12824

I committed the changes in r12824 as the perf. improvement for the artificial test case was confirmed on a big application at work. I am curious to see what it gives on Chromium. It will not be easy to isolate specifically the perf. improvement brought by the suppression matching, as the leak search itself has also been improved in 3.8.0.
> To be sure I understand: the problem is that a suppression entry can match
> definitely-lost loss records, possibly-lost loss records and still-reachable
> loss records. Due to this, once you have added a suppression to handle
> a known definitely-lost, it will also "collect" the reachable loss records,
> which means you cannot track precisely anymore the amount of
> definitely lost.
>
> Is that a correct understanding?

Yes, exactly!

> > b) it's OK performance-wise after r12824
> I committed the changes in r12824 as the perf. improvement
> for the artificial test case was confirmed on a big application at work.
> I am curious to see what it gives on Chromium.

It gives a lot; here are some numbers: https://bugs.kde.org/show_bug.cgi?id=266390#c6
https://bugs.kde.org/show_bug.cgi?id=307465#c5 should provide the flexibility needed to specifically show and/or count as error and/or suppress precisely what is desired.
Too many words here and some are off-topic; there's a more focused report #307465 with pretty much the same topic as c#29. *** This bug has been marked as a duplicate of bug 307465 ***