254420 – memory pool tracking broken

Bug 254420 - memory pool tracking broken

Summary: memory pool tracking broken

Status:	REPORTED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (show other bugs)
Version:	3.5.0
Platform:	Fedora RPMs Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-10-17 00:54 UTC by David Woodhouse
Modified:	2011-01-23 21:50 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
patch to add mempool annotation to glib/gslice (4.45 KB, patch) 2010-10-17 00:54 UTC, David Woodhouse	Details
proposed patch (5.40 KB, patch) 2011-01-22 14:43 UTC, Julian Seward	Details
View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description David Woodhouse 2010-10-17 00:54:52 UTC

Created attachment 52581 [details]
patch to add mempool annotation to glib/gslice

See the first and last lines of this snippet of valgrind output:

==21187== mempool_free(0x4f10900, 0x4f11000)
==21187==    at 0x4C6A04D: g_slice_free1 (gslice.c:903)
==21187==    by 0x4005E0: main (slice.c:10)
==21187== mempool_free(0x4f10900, 0x4f11000) freed chunk of 20 bytes
client request: code 4d430001,  addr 0x4F11000,  len 16
client request: code 1306,  addr 0x4F10900,  len 82907136
==21187== mempool_free(0x4f10900, 0x4f11000)
==21187==    at 0x4C6A04D: g_slice_free1 (gslice.c:903)
==21187==    by 0x4005F1: main (slice.c:11)
==21187== Invalid free() / delete / delete[]
==21187==    at 0x4C6A04D: g_slice_free1 (gslice.c:903)
==21187==    by 0x4005F1: main (slice.c:11)
==21187==  Address 0x4f11000 is not stack'd, malloc'd or (recently) free'd


First it reports that a chunk of 20 bytes at 0x4f11000 is freed, and then
*immediately* afterwards it tells us that the address 0x4f11000 is not recently
freed. It lies.

(It's a valid complaint about a double-free, but the block tracking is wrong).

I have similar issues when I deliberately write off the end of an allocated
block:
==21187== mempool_alloc(0x4f10900, 0x4f11000, 20)
==21187==    at 0x4C69652: g_slice_alloc (gslice.c:841)
==21187==    by 0x4005B5: main (slice.c:7)
==21187== Invalid write of size 1
==21187==    at 0x4005CD: main (slice.c:9)
==21187==  Address 0x4f11014 is not stack'd, malloc'd or (recently) free'd

Surely if I allocate a 20-byte block at 0x4f11000 and then write to the
immediately subsequent byte, I should be told that the invalid write was one
byte after the allocation I just made -- I shouldn't be told that it is "not
stack'd, malloc'd or (recently) free'd".

I have patched glib to add Valgrind annotations for its 'gslice' slab
allocator. Perhaps I've done something wrong, but I'm not sure what would cause
this kind of misbehaviour from Valgrind.

http://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg02045.html
http://www.mail-archive.com/valgrind-users@lists.sourceforge.net/msg02051.html

Comment 1 Julian Seward 2011-01-22 00:22:28 UTC

IIUC, the problem is that, in both cases, the description of the
invalid address ..

  Address 0x4f11000 is not stack'd, malloc'd or (recently) free'd

which is a polite way to say "I haven't got a clue what this is", is,
well, at best misleading, and in short completely wrong.

Similarly in the mail-archive.com links, the messages you're getting
identify the failing address as part of the superblock, which is
unhelpful: you want to relate it to the individual blocks.

I think these are both easily fixed.  Basically the
error-message-maker-upper looks through the allocated and recently
freed blocks it knows about, to see if the failing address is
in or anywhere near them.  Unfortunately it doesn't bother to look
through set of mempool blocks when doing this.  That explains why

1. if the superblocks are allocated by malloc, then it incorrectly
   identifies a superblock as containing the address, and

2. if the superblocks are allocated by mmap, it fails to identify
   anything as containing the address

I'll try and hack up a fix.  First thing to do is concoct a small
self contained test case, so it can be regtested without having to
drag around glib goop.  You don't happen to have one lying around,
do you?

Comment 2 Julian Seward 2011-01-22 14:43:52 UTC

Created attachment 56319 [details]
proposed patch

Comment 3 Julian Seward 2011-01-22 15:07:29 UTC

Dave, can you try the attached patch?  It's not perfect but it is
better than it was.  I think it's as good as I can get it without
introducing extra bookkeeping data structures.  What it can do is

* describe block overruns in terms of the blocks themselves
  both for malloc and mmap-backed pools

* describe accesses after free and double frees in terms of the
  backing block for malloc-backed pools

Unfortunately for mmap backed pools, accesses after free and double
frees are not related to anything, because Memcheck doesn't keep track
of who (which stack trace) mmap'd which pages.

Some example error messages, from an as-yet uncommitted test program,
follow.

------ out of range reads in malloc-backed pool ------

==4698== Invalid read of size 1
==4698==    at 0x400FAE: test (mempool2.c:135)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x51b10a7 is 1 bytes before a block of size 10 client-defined
==4698==    at 0x400EF7: allocate (mempool2.c:108)
==4698==    by 0x400F68: test (mempool2.c:130)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698== 
==4698== Invalid read of size 1
==4698==    at 0x400FC3: test (mempool2.c:136)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x51b10b2 is 0 bytes after a block of size 10 client-defined
==4698==    at 0x400EF7: allocate (mempool2.c:108)
==4698==    by 0x400F68: test (mempool2.c:130)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698== 

------ out of range reads in mmap-backed pool ------

==4698== Invalid read of size 1
==4698==    at 0x400FFC: test (mempool2.c:140)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x4023007 is 1 bytes before a block of size 20 client-defined
==4698==    at 0x400EF7: allocate (mempool2.c:108)
==4698==    by 0x400F7D: test (mempool2.c:131)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698== 
==4698== Invalid read of size 1
==4698==    at 0x401013: test (mempool2.c:141)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x402301c is 0 bytes after a block of size 20 client-defined
==4698==    at 0x400EF7: allocate (mempool2.c:108)
==4698==    by 0x400F7D: test (mempool2.c:131)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698== 

------ read free in malloc-backed pool ------

==4698== Illegal memory pool address
==4698==    at 0x4010A6: test (mempool2.c:145)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x51b1040 is 0 bytes inside a block of size 32 alloc'd
==4698==    at 0x4C27878: malloc (vg_replace_malloc.c:236)
==4698==    by 0x400AFA: make_pool (mempool2.c:46)
==4698==    by 0x400F23: test (mempool2.c:122)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698== 

------ read free in mmap-backed pool ------

==4698== Illegal memory pool address
==4698==    at 0x40114F: test (mempool2.c:150)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x4022000 is not stack'd, malloc'd or (recently) free'd
==4698== 

------ double free in malloc-backed pool ------

==4698== Illegal memory pool address
==4698==    at 0x4011F8: test (mempool2.c:155)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x51b1040 is 0 bytes inside a block of size 32 alloc'd
==4698==    at 0x4C27878: malloc (vg_replace_malloc.c:236)
==4698==    by 0x400AFA: make_pool (mempool2.c:46)
==4698==    by 0x400F23: test (mempool2.c:122)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698== 

------ double free in mmap-backed pool ------

==4698== Illegal memory pool address
==4698==    at 0x40128A: test (mempool2.c:159)
==4698==    by 0x40130E: main (mempool2.c:174)
==4698==  Address 0x4022000 is not stack'd, malloc'd or (recently) free'd
==4698== 

------ done ------

Comment 4 Julian Seward 2011-01-23 21:50:06 UTC

Committed: fix r11509, regtest r11510.