More details on the observations can be found here: http://code.google.com/p/chromium/issues/detail?id=17453#c15 To reproduce, you need to build Chromium. 1) Checkout Chromium http://dev.chromium.org/developers/how-tos/get-the-code 2) Set up Valgrind/Release build flags http://dev.chromium.org/developers/how-tos/using-valgrind#TOC-Building-Chromium-tests-to-use-Valg 3) Build Chromium, "make -jX BUILDTYPE=Release ui_tests" http://code.google.com/p/chromium/wiki/LinuxBuildInstructions 4) Run ui_tests CHROME_VALGRIND=<path_to_your_valgrind_dir> ./tools/valgrind/chrome_tests.sh \ -t ui --build_dir out/Release 2>&1 | tee mylog Grep "mylog" for "didn't" and run these tests individually, e.g. CHROME_VALGRIND=... ./tools/valgrind/chrome_tests.sh -t ui --build_dir out/Release --gtest_filter=MyTest.MyTestCase to benchmark the leak-checking stuff.
In 2009, I looked at very slow leak search. There were 2 quadratic algorithms that were used in the leak search at that time. This has been solved in 3.6.0. A test program reproducing the slow leak search is available in memcheck/perf/many-loss-records.c You could perhaps modify it to match the kind of allocations that Chromium does. Alternatively, use a profiler tool on valgrind to see where the cpu is spent when searching leaks : you can either try to use valgrind to profile valgrind (might be very slow if your test is big) or use oprofile. Philippe
FYI, We've cut down the leak scan time very much by skipping the suppression matching stage for allocations we don't care about (e.g. still-reachable, possibly lost). This looks like a low-hanging fruit. This has uncovered quite a few leaks cause before the leak scan was killed by a timeout: http://code.google.com/p/chromium/issues/detail?id=101098 http://code.google.com/p/chromium/issues/detail?id=17453#c21
See patch in bug 284540.
FTR, the suppression matching is likely to improve a lot after r12824 (see bug 284540) I'll re-evaluate the performance when I get back from vacation (late August)
(In reply to comment #4) > FTR, the suppression matching is likely to improve a lot after r12824 (see > bug 284540) > > I'll re-evaluate the performance when I get back from vacation (late August) Some feedback on performance measurement difference with Valgrind 3.8.1 ? Can we close this bug based on the patches committed under bug 284540 ?
Really sorry for a very long wait - I had a lot of stuff to do after my vacation :( and cherry-picking patches turned out to be complex, so I've failed to run a "clean" head-to-head experiment so far. I've performed some benchmarking on one of the Chromium's browser_tests, results are below. Short summary: the r12824 optimization is awesome, period. :) FTR, what I ran: * I built Chromium browser_tests the Valgrind-recommended way at ~r165141, * I used --gtest_filter=NotificationsPermissionTest.TestNoUserGestureInfobar * I've removed all non-Leak suppressions (as we use non-standard :Uninitialized and :Unaddressable everywhere, see [1] ) * I've turned off XML logs in tools/valgrind/valgrind_test.py to simplify "does leak checking work at all?" verification. Also please note we use --num-callers=30 in Chromium which usually stresses the suppression matcher more. Then I've built a few trunk Valgrind revisions, copying valgrind-variant's [2] coregrind/m_replacemalloc/vg_replace_malloc.c file at r137 in order to universally hook TCMalloc and commented out the NaCL part. (Otherwise Memcheck won't see the allocations and so won't do any leak-checking) Baseline: ~1:37 - on vanilla Valgrind without TCMalloc interception and hence leak-checking. With leak-checking: r12700/2419 -> ~2:02 -- r12756 "patch that improves the speed of the leak search by up to 40% (on amd64)" in between r12760/2440 -> ~3:05 - ?!?!? -- r12789 "Improve suppression matching performance when fun or obj is a simple string" in between r12800/2451 -> ~2:56 -- r12824 "fix 284540 (optimise suppression matching)" r12830/2458 -> ~1:37 I've manually that each of the runs mentioned above had around ~12.6k reachable and ~11.2k suppressed blocks. For the reference, the current valgrind-variant (based on r12307 I believe) takes 1:53 to run the same test. We'll try to cherry-pick r12824 That said, the current performance of leak checking is fast enough to not affect the overall test runtime, so I'm happy to mark this as fixed. I'll open a new bug if I find a case when Valgrind's leak-checking is slow again. [1] https://bugs.kde.org/show_bug.cgi?id=256525#c6 [2] http://code.google.com/p/valgrind-variant/source/list
And thanks a lot! :)
(In reply to comment #6) > coregrind/m_replacemalloc/vg_replace_malloc.c file at r137 > in order to universally hook TCMalloc and commented out the NaCL part. > (Otherwise Memcheck won't see the allocations and so won't do any > leak-checking) Note that 3.8.1 has the option --soname-synonyms, allowing to replace tcmalloc from the command line e.g. --soname-synonyms=somalloc=*tcmalloc* (for all dynamilcally linked variants of tcmalloc) -soname-synonyms=somalloc=NONE (for statically linked variant). > > Baseline: ~1:37 - on vanilla Valgrind without TCMalloc interception and > hence leak-checking. > > With leak-checking: > r12700/2419 -> ~2:02 > -- r12756 "patch that improves the speed of the leak search by up to 40% > (on amd64)" in between > r12760/2440 -> ~3:05 - ?!?!? IIUC, a slowdown of about 1 minute appears between r12700 and r12760. That is really strange. Looking at r12756, I do not see how this could slowdown the leak search. So, a mystery (maybe another commit?) > That said, the current performance of leak checking is fast enough to not > affect the overall test runtime, so I'm happy to mark this as fixed. Ok => closing, thanks.
On Wed, Oct 31, 2012 at 8:16 PM, Philippe Waroquiers <philippe.waroquiers@skynet.be> wrote: > https://bugs.kde.org/show_bug.cgi?id=266390 > > Philippe Waroquiers <philippe.waroquiers@skynet.be> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Status|UNCONFIRMED |RESOLVED > Resolution|--- |FIXED > > --- Comment #8 from Philippe Waroquiers <philippe.waroquiers@skynet.be> --- > (In reply to comment #6) > >> coregrind/m_replacemalloc/vg_replace_malloc.c file at r137 >> in order to universally hook TCMalloc and commented out the NaCL part. >> (Otherwise Memcheck won't see the allocations and so won't do any >> leak-checking) > Note that 3.8.1 has the option --soname-synonyms, allowing to replace > tcmalloc from the command line > e.g. --soname-synonyms=somalloc=*tcmalloc* > (for all dynamilcally linked variants of tcmalloc) > -soname-synonyms=somalloc=NONE > (for statically linked variant). Yeah, wasn't sure where was it introduced at whether the handling was the same throughout r11xxx-rLatest >> Baseline: ~1:37 - on vanilla Valgrind without TCMalloc interception and >> hence leak-checking. >> >> With leak-checking: >> r12700/2419 -> ~2:02 >> -- r12756 "patch that improves the speed of the leak search by up to 40% >> (on amd64)" in between >> r12760/2440 -> ~3:05 - ?!?!? > IIUC, a slowdown of about 1 minute appears between r12700 and r12760. > That is really strange. Looking at r12756, I do not see how this could slowdown > the leak search. So, a mystery (maybe another commit?) Definitely another commit: r12734/2425 -> ~2:04 r12734/2426 -> ~2:02 r12735/2426 -> ~3:07 ------------- r12735 | sewardj | 2012-07-13 15:24:05 +0400 (Fri, 13 Jul 2012) | 3 lines Allow multiple rw and rx mappings in the ELF debuginfo reader. Fixes #296318 (patch on comment 8). (Jiří Hruška, jirka@fud.cz) ------------- If this has perf issues I'm not surprised it affects a huge Chromium build. Does anyone remember fixing this perf regression? Please note that things got faster than r12734 around r128[00..30] >> That said, the current performance of leak checking is fast enough to not >> affect the overall test runtime, so I'm happy to mark this as fixed. > Ok => closing, thanks.
If you could make the slowdown noticeable even on a smaller testcase, it might be feasible to run outer cachegrind on it to see where the bottleneck is. I've done that on the bundled perf tests and already optimized the most obvious case, although they are small and limited.