Bug 201923

Summary: memcheck terminates while computing leaks?
Product: [Developer tools] valgrind
Component: memcheck
Reporter: Dan Kegel <dank>
Assignee: Julian Seward <jseward>
CC: njn
Status: REPORTED ---
Severity: normal
Priority: NOR
Version First Reported In: unspecified
Target Milestone: wanted3.6.0
Platform: Compiled Sources
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Dan Kegel 2009-07-29 21:15:16 UTC
Version:            (using Devel)
OS:                Linux
Installed from:    Compiled sources

On both Mac and Linux, sometimes memcheck seems to exit early,
after printing suppcounts but before printing leaks.

You can see this happening at e.g.
http://build.chromium.org/buildbot/waterfall/builders/Linux%20UI%202%20of%203%20(valgrind)/builds/313/steps/valgrind%20test:%20ui/logs/stdio

which says

08:00:23 memcheck_analyze.py [WARNING] valgrind didn't finish writing 3 files?!
08:00:23 memcheck_analyze.py [WARNING] Last 100 lines of valgrind.tmp/memcheck.11500 :
<?xml version="1.0"?>
...
</suppcounts>
08:00:23 memcheck_analyze.py [WARNING] Last 100 lines of valgrind.tmp/memcheck.10858 :
<?xml version="1.0"?>
...
</suppcounts>
...
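
The script memcheck_analyze.py is not shown in this report. As a rough sketch of the kind of check its warning implies, the C fragment below classifies a log by how it ends. It assumes that a complete Valgrind XML log ends with the closing root tag </valgrindoutput>; that tag, and the names classify_log and ends_with, are assumptions made for illustration and are not taken from the script.

```c
#include <stddef.h>
#include <string.h>

/* Possible outcomes when inspecting the tail of a memcheck XML log. */
enum log_state {
    LOG_COMPLETE,                 /* ends with the closing root tag        */
    LOG_STOPPED_AFTER_SUPPCOUNTS, /* the symptom described in this report  */
    LOG_INCOMPLETE                /* cut off somewhere else                */
};

/* Return 1 if text, ignoring trailing whitespace, ends with suffix. */
static int ends_with(const char *text, const char *suffix)
{
    size_t n = strlen(text);
    size_t m = strlen(suffix);

    while (n > 0 && (text[n - 1] == ' ' || text[n - 1] == '\n' ||
                     text[n - 1] == '\r' || text[n - 1] == '\t')) {
        n--;
    }
    return n >= m && memcmp(text + n - m, suffix, m) == 0;
}

/* Classify a log held in memory by looking only at its last tag.
 * Assumption: "</valgrindoutput>" closes a complete XML log. */
static enum log_state classify_log(const char *text)
{
    if (ends_with(text, "</valgrindoutput>")) {
        return LOG_COMPLETE;
    }
    if (ends_with(text, "</suppcounts>")) {
        return LOG_STOPPED_AFTER_SUPPCOUNTS;
    }
    return LOG_INCOMPLETE;
}
```

A caller would read each memcheck.* file into a buffer and pass it to classify_log; the two logs quoted above would fall into the LOG_STOPPED_AFTER_SUPPCOUNTS case.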

This happens fairly often.  We still get useful results from other processes and runs,
so it's not a showstopper, but there's no telling which leak reports are being masked by this.

This has been happening for some time.  I'm not sure which revision it started with;
it may have been happening ever since we started.

Comment 1 Dan Kegel 2009-08-02 16:43:57 UTC
On the Mac, this turns out to be almost a showstopper.
If I run Chrome's UI test cases in batches
of 30 or so as usual, valgrind *always* fails to
output its list of memory leaks.
If I run them one at a time (batches of size 1),
valgrind usually does output a list of leaks.

I may do that anyway, since that will let me associate
tests with valgrind errors better, but it's kind of sad
that I can't just do a single long valgrind run and
expect useful output.

Comment 2 Julian Seward 2009-08-03 02:51:41 UTC
This might be another manifestation of #192634.  <theory>The leak 
checker asks the address space manager which pages are safe to visit.
If the latter has an incorrect view of which pages are accessible,
then the leak checker will segfault.</theory>

However, the fault should be caught by scan_all_valid_memory_catcher,
so that theory doesn't really make sense.

Dan, are there any more details in Valgrind's stderr output in this case,
such as crash messages or assertion failures?

In mc_leakcheck.c, scan_all_valid_memory_catcher, if you change 
if (0) to if (1), do you see anything?

Comment 3 Dan Kegel 2009-08-04 19:53:39 UTC
I changed the if (0) to if (1), but haven't seen OUCH printed anywhere.

I also tried the sync patch from the other bug, and it does get
rid of the sync warnings but does not prevent the incomplete
logs.  The jury is still out on whether it prevents the assertion
failures and other crashes.  (I'm testing with the patch applied to
the July 15th sources, since I ran into an unrelated problem with current
sources.)