Bug 206600 - Leak checker fails to upgrade indirect blocks when their parent becomes reachable
Summary: Leak checker fails to upgrade indirect blocks when their parent becomes reach...
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.5 SVN
Platform: Unlisted Binaries Linux
: NOR normal
Target Milestone: blocking3.5.1
Assignee: Nick Nethercote
URL:
Keywords:
: 252100 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-09-07 10:23 UTC by Tom Hughes
Modified: 2010-10-04 22:55 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Test case (632 bytes, text/plain)
2009-09-07 10:23 UTC, Tom Hughes
Details
Patch to rescan blocks on upgrade (3.08 KB, patch)
2009-09-07 10:24 UTC, Tom Hughes
Details
Patch to rescan blocks on upgrade (3.37 KB, patch)
2009-09-07 11:13 UTC, Tom Hughes
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Hughes 2009-09-07 10:23:02 UTC
Created attachment 36760 [details]
Test case

If the leak checker marks a block as "possibly reachable" and then upgrades that to "definitely reachable" when it finds an additional pointer then any indirect blocks which were found in between those two events are not upgraded.

It's not entirely clear whether this is a regression in 3.5.0 or not - certainly I've never noticed this before so I'm inclined to think the old algorithm avoided this somehow, or at least made it much less likely to be triggered.

A test case is attached.
Comment 1 Tom Hughes 2009-09-07 10:24:05 UTC
Created attachment 36762 [details]
Patch to rescan blocks on upgrade

This patch makes the leak checker rescan blocks when it changes their status from Possible to Reachable.
Comment 2 Nicholas Nethercote 2009-09-07 10:33:01 UTC
I'll look at the patch in detail later, but on a cursory scan I see that sizeof(LC_Extra) has increased, which is unfortunate as there can be a lot of them.  We could reduce the number of bits for indirect_szB by another 1, hopefully 512MB would be enough on 32-bit platforms?  Hmm.
Comment 3 Tom Hughes 2009-09-07 10:56:44 UTC
Ah sorry - I hadn't realised indirect_szB was part of the bitfield so I didn't think I was increasing the size...

The problem was that when you upgrade a block it might already be on the stack so without that flag you wind up pushing it again - in my original attempt I actually wound up overflowing the stack because I was pushing the same block multiple times and the stack is sized to the total number of blocks.
Comment 4 Tom Hughes 2009-09-07 11:13:56 UTC
Created attachment 36763 [details]
Patch to rescan blocks on upgrade

Updated patch that reduces width of indirect_szB to avoid making LC_Extra bigger and also avoids an infinite loop by not repushing an already reachable block if we see it a second time.
Comment 5 Julian Seward 2009-09-07 13:00:19 UTC
(In reply to comment #2)
> I'll look at the patch in detail later, but on a cursory scan I see that
> sizeof(LC_Extra) has increased, which is unfortunate as there can be a lot of
> them.  We could reduce the number of bits for indirect_szB by another 1,
> hopefully 512MB would be enough on 32-bit platforms?  Hmm.

Is it really so bad for LC_Extra to be 2 words instead of 1 ?  IIUC
the number of them is equal to the number of unfreed blocks at exit,
which is going to be fairly small for mostly-well-behaved programs, in
the range of a few hundred or a few thousand.  I don't think I've ever
seen a program which has (eg) a million blocks unfreed at exit, and
even in that case on a 32-bit platform, that'd only use 8MB in total
for 2 words.  Any program exiting with millions of blocks unfreed is
probably hosed w.r.t. space management anyway.

I'd prefer to not impose a 512MB block size limit on the leak checker
on 32-bit processes.  It could come back and bite us as some tedious
boundary case in the future, in which case we'd have to revisit this
yet again.  Instead let's take a bit of a space hit in the worst case.
How about this:

   ULong size:60
   UInt  state:2
   UInt  pending:1
   UInt  spare:1

That gives no size limit on 32-bit or 64-bit and is only 8 bytes
on both platforms.
Comment 6 Tom Hughes 2009-09-07 13:08:32 UTC
I believe the 512Mb limit is on the amount of indirect memory reachable from a given block, not on the total amount of allocated memory.

Even if that value overflows I think all that happens is that the statistics displayed for the amount of memory in various classes will be inaccurate.
Comment 7 Tom Hughes 2010-09-23 09:33:45 UTC
*** Bug 252100 has been marked as a duplicate of this bug. ***
Comment 8 Alexander Drozdov 2010-09-23 09:37:27 UTC
Please see my patch from Bug 252100.
Comment 9 Tom Hughes 2010-09-23 10:14:27 UTC
Your patch is a strict subset of my one which is attached to this bug already.
Comment 10 Alexander Drozdov 2010-09-23 11:26:02 UTC
My patch is just incorrect because it is possible to add the same block to the stack twice and so lc_markstack array can be overflowed. Thank you!
Comment 11 Philippe Waroquiers 2010-09-30 20:38:51 UTC
Patch was used at work on a big application, doing a lot of malloc/free.
It works properly and has reduced significantly the nr of false possibly
leaked.

Note that the forest of false possibly leaked 
was hiding some trees of real leaks :).
Comment 12 Julian Seward 2010-10-04 22:30:06 UTC
Ok, let's get this landed.  Tom, can you do it?
Comment 13 Tom Hughes 2010-10-04 22:55:21 UTC
Committed as r11395.