266390 – Leak checking takes a few minutes when running Chromium ui_tests

Bug 266390 - Leak checking takes a few minutes when running Chromium ui_tests

Summary: Leak checking takes a few minutes when running Chromium ui_tests

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (show other bugs)
Version:	3.7 SVN
Platform:	Unlisted Binaries Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-02-15 18:15 UTC by Timur Iskhodzhanov
Modified:	2012-11-01 17:56 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed In:

Attachments
Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Timur Iskhodzhanov 2011-02-15 18:15:29 UTC

More details on the observations can be found here:
http://code.google.com/p/chromium/issues/detail?id=17453#c15

To reproduce, you need to build Chromium.
1) Checkout Chromium
  http://dev.chromium.org/developers/how-tos/get-the-code
2) Set up Valgrind/Release build flags
  http://dev.chromium.org/developers/how-tos/using-valgrind#TOC-Building-Chromium-tests-to-use-Valg
3) Build Chromium, "make -jX BUILDTYPE=Release ui_tests"
  http://code.google.com/p/chromium/wiki/LinuxBuildInstructions
4) Run ui_tests
   CHROME_VALGRIND=<path_to_your_valgrind_dir> ./tools/valgrind/chrome_tests.sh \
     -t ui --build_dir out/Release 2>&1 | tee mylog

Grep "mylog" for "didn't" and run these tests individually, e.g.
CHROME_VALGRIND=... ./tools/valgrind/chrome_tests.sh -t ui --build_dir out/Release --gtest_filter=MyTest.MyTestCase
to benchmark the leak-checking stuff.

Comment 1 Philippe Waroquiers 2011-02-27 15:05:11 UTC

In 2009, I looked at very slow leak search.
There were 2 quadratic algorithms that were used in the leak search
at that time.
This has been solved in 3.6.0.

A test program reproducing the slow leak search is available in 
memcheck/perf/many-loss-records.c

You could perhaps modify it to match the kind of allocations that
Chromium does.

Alternatively, use a profiler tool on valgrind to see where the cpu is spent
when searching leaks : you can either try to use valgrind to profile
valgrind (might be very slow if your test is big) or use oprofile.

Philippe

Comment 2 Timur Iskhodzhanov 2011-10-21 10:47:02 UTC

FYI,
We've cut down the leak scan time very much
by skipping the suppression matching stage for allocations we don't care about (e.g. still-reachable, possibly lost). This looks like a low-hanging fruit.

This has uncovered quite a few leaks cause before the leak scan was killed by a timeout:
http://code.google.com/p/chromium/issues/detail?id=101098
http://code.google.com/p/chromium/issues/detail?id=17453#c21

Comment 3 Philippe Waroquiers 2012-07-29 16:24:32 UTC

See patch in bug 284540.

Comment 4 Timur Iskhodzhanov 2012-08-04 09:08:51 UTC

FTR, the suppression matching is likely to improve a lot after r12824 (see bug 284540)

I'll re-evaluate the performance when I get back from vacation (late August)

Comment 5 Philippe Waroquiers 2012-10-23 20:48:48 UTC

(In reply to comment #4)
> FTR, the suppression matching is likely to improve a lot after r12824 (see
> bug 284540)
> 
> I'll re-evaluate the performance when I get back from vacation (late August)

Some feedback on performance measurement difference with Valgrind 3.8.1 ?
Can we close this bug based on the patches committed under bug 284540 ?

Comment 6 Timur Iskhodzhanov 2012-10-31 14:35:39 UTC

Really sorry for a very long wait - I had a lot of stuff to do after my vacation :(
and cherry-picking patches turned out to be complex, so I've failed to run a "clean" head-to-head experiment so far.

I've performed some benchmarking on one of the Chromium's browser_tests, results are below.
Short summary: the r12824 optimization is awesome, period.
:)

FTR, what I ran:
* I built Chromium browser_tests the Valgrind-recommended way at ~r165141,
* I used --gtest_filter=NotificationsPermissionTest.TestNoUserGestureInfobar
* I've removed all non-Leak suppressions (as we use non-standard :Uninitialized and :Unaddressable everywhere, see [1] )
* I've turned off XML logs in tools/valgrind/valgrind_test.py to simplify "does leak checking work at all?" verification.
Also please note we use --num-callers=30 in Chromium which usually stresses the suppression matcher more.

Then I've built a few trunk Valgrind revisions, copying valgrind-variant's [2]
  coregrind/m_replacemalloc/vg_replace_malloc.c file at r137
  in order to universally hook TCMalloc and commented out the NaCL part.
(Otherwise Memcheck won't see the allocations and so won't do any leak-checking)

Baseline: ~1:37 - on vanilla Valgrind without TCMalloc interception and hence leak-checking.

With leak-checking:
r12700/2419 -> ~2:02
--  r12756 "patch that improves the speed of the leak search by up to 40% (on amd64)" in between
r12760/2440 -> ~3:05  - ?!?!?
-- r12789 "Improve suppression matching performance when fun or obj is a simple string" in between
r12800/2451 -> ~2:56
-- r12824 "fix 284540 (optimise suppression matching)"
r12830/2458 -> ~1:37
I've manually that each of the runs mentioned above had around ~12.6k reachable and ~11.2k suppressed blocks.

For the reference, the current valgrind-variant (based on r12307 I believe) takes 1:53 to run the same test.
We'll try to cherry-pick r12824

That said, the current performance of leak checking is fast enough to not affect the overall test runtime, so I'm happy to mark this as fixed.
I'll open a new bug if I find a case when Valgrind's leak-checking is slow again.

[1] https://bugs.kde.org/show_bug.cgi?id=256525#c6
[2] http://code.google.com/p/valgrind-variant/source/list

Comment 7 Timur Iskhodzhanov 2012-10-31 14:37:53 UTC

And thanks a lot! :)

Comment 8 Philippe Waroquiers 2012-10-31 16:16:09 UTC

(In reply to comment #6)

>   coregrind/m_replacemalloc/vg_replace_malloc.c file at r137
>   in order to universally hook TCMalloc and commented out the NaCL part.
> (Otherwise Memcheck won't see the allocations and so won't do any
> leak-checking)
Note that 3.8.1 has the option --soname-synonyms, allowing to replace
tcmalloc from the command line
  e.g.  --soname-synonyms=somalloc=*tcmalloc*
                 (for all dynamilcally linked variants of tcmalloc)
          -soname-synonyms=somalloc=NONE
                (for statically linked variant).

> 
> Baseline: ~1:37 - on vanilla Valgrind without TCMalloc interception and
> hence leak-checking.
> 
> With leak-checking:
> r12700/2419 -> ~2:02
> --  r12756 "patch that improves the speed of the leak search by up to 40%
> (on amd64)" in between
> r12760/2440 -> ~3:05  - ?!?!?
IIUC, a slowdown of about 1 minute appears between r12700 and r12760.
That is really strange. Looking at r12756, I do not see how this could slowdown
the leak search. So, a mystery (maybe another commit?)

> That said, the current performance of leak checking is fast enough to not
> affect the overall test runtime, so I'm happy to mark this as fixed.
Ok => closing, thanks.

Comment 9 Timur Iskhodzhanov 2012-11-01 10:27:38 UTC

On Wed, Oct 31, 2012 at 8:16 PM, Philippe Waroquiers
<philippe.waroquiers@skynet.be> wrote:
> https://bugs.kde.org/show_bug.cgi?id=266390
>
> Philippe Waroquiers <philippe.waroquiers@skynet.be> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |RESOLVED
>          Resolution|---                         |FIXED
>
> --- Comment #8 from Philippe Waroquiers <philippe.waroquiers@skynet.be> ---
> (In reply to comment #6)
>
>>   coregrind/m_replacemalloc/vg_replace_malloc.c file at r137
>>   in order to universally hook TCMalloc and commented out the NaCL part.
>> (Otherwise Memcheck won't see the allocations and so won't do any
>> leak-checking)
> Note that 3.8.1 has the option --soname-synonyms, allowing to replace
> tcmalloc from the command line
>   e.g.  --soname-synonyms=somalloc=*tcmalloc*
>                  (for all dynamilcally linked variants of tcmalloc)
>           -soname-synonyms=somalloc=NONE
>                 (for statically linked variant).
Yeah, wasn't sure where was it introduced at whether the handling was
the same throughout r11xxx-rLatest

>> Baseline: ~1:37 - on vanilla Valgrind without TCMalloc interception and
>> hence leak-checking.
>>
>> With leak-checking:
>> r12700/2419 -> ~2:02
>> --  r12756 "patch that improves the speed of the leak search by up to 40%
>> (on amd64)" in between
>> r12760/2440 -> ~3:05  - ?!?!?
> IIUC, a slowdown of about 1 minute appears between r12700 and r12760.
> That is really strange. Looking at r12756, I do not see how this could slowdown
> the leak search. So, a mystery (maybe another commit?)
Definitely another commit:
r12734/2425 -> ~2:04
r12734/2426 -> ~2:02
r12735/2426 -> ~3:07

-------------
r12735 | sewardj | 2012-07-13 15:24:05 +0400 (Fri, 13 Jul 2012) | 3 lines
Allow multiple rw and rx mappings in the ELF debuginfo reader.
Fixes #296318 (patch on comment 8).  (Jiří Hruška, jirka@fud.cz)
-------------
If this has perf issues I'm not surprised it affects a huge Chromium build.

Does anyone remember fixing this perf regression?
Please note that things got faster than r12734 around r128[00..30]

>> That said, the current performance of leak checking is fast enough to not
>> affect the overall test runtime, so I'm happy to mark this as fixed.
> Ok => closing, thanks.

Comment 10 Jiří Hruška 2012-11-01 17:56:49 UTC

If you could make the slowdown noticeable even on a smaller testcase,
it might be feasible to run outer cachegrind on it to see where the
bottleneck is. I've done that on the bundled perf tests and already
optimized the most obvious case, although they are small and limited.