Bug 257027

Summary:	memcheck address false negatives (amd64, custom malloc)
Product:	[Developer tools] valgrind	Reporter:	Tye McQueen <tye.mcqueen>
Component:	memcheck	Assignee:	Julian Seward <jseward>
Status:	RESOLVED WAITINGFORINFO
Severity:	normal	CC:	marinus.savoritias, tom
Priority:	NOR
Version First Reported In:	3.6.0
Target Milestone:	---
Platform:	Compiled Sources
OS:	Linux
Attachments:	Wrappers for OpenSIPS's custom malloc functions; Full log; First part of full log

Description Tye McQueen 2010-11-15 22:59:23 UTC

Created attachment 53450
Wrappers for OpenSIPS's custom malloc functions

Version:           3.6.0 (using KDE 1.2) 
OS:                Linux

While trying to track down the source of a core dump in OpenSIPS on amd64 Linux, I wrote wrappers for OpenSIPS's custom malloc and was getting false positives for "Invalid read" operations. To debug my wrappers, I added a lot of diagnostic output, and it shows that the MALLOCLIKE_BLOCK and MEMPOOL macros are highly unreliable in this configuration, giving many false negatives. (The false positives may all have been bugs in my wrappers; I'm still investigating that.)

The log shows a custom malloc pool correctly being marked as NOACCESS. (Lines starting with "#" are comments I have added to explain what the nearby log lines mean; a "..." in a comment line marks where I deleted lines when summarizing.)

# ...
==24599== prenewpool of 0x795da0+1048576
# No "Unaddressable bytes" reported before VALGRIND_MAKE_MEM_NOACCESS called
==24599== New pool: 0x795da0+0x100000
==24599== postnewpool of 0x795da0+1048576
==24599== Unaddressable byte(s) found during client check request
==24599==    at 0x4E27BD3: report_bits (in sipGrind.so)
==24599==    by 0x4E27E86: fm_malloc_init (in sipGrind.so)
==24599==    by 0x4B4E1F: init_pkg_mallocs (mem.c:67)
==24599==    by 0x42B08B: main (main.c:955)
==24599==  Address 0x795da0 is 0 bytes inside data symbol "mem_pool"

And the first 5 calls to the custom malloc correctly show the memory going from NOACCESS to accessible (except in the red zones):

==24599== premalloc of 0x79e128+120
==24599== Unaddressable byte(s) found during client check request
# ... Shows not-yet-MALLOCLIKE_BLOCK'd region starts out NO ACCESS
==24599==  Address 0x79e128 is 33672 bytes inside data symbol "mem_pool"
# ... After calling MALLOCLIKE_BLOCK, only red zones are NO ACCESS:
==24599== postmalloc of 0x79e128+120, -12
==24599== Unaddressable byte(s) found during client check request
# ... Red zone before block is NO ACCESS
==24599==  Address 0x79e11c is 12 bytes before a block of size 120 alloc'd
# ...
==24599== postmalloc of 0x79e128+120
# No "Unaddressable bytes" reported for the malloc()d buffer
==24599== postmalloc of 0x79e128+120, +12
==24599== Unaddressable byte(s) found during client check request
# ... Red zone after block is NO ACCESS
==24599==  Address 0x79e1a0 is 0 bytes after a block of size 120 alloc'd
# ...
==24599== alloced: [0x79e128,0x79e1a0) 120b
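The addresses in these messages follow from the wrapper's layout: a block at addr with RZ_BYTES of red zone on each side, where RZ_BYTES appears to be 12 judging from the "-12"/"+12" tags. The sketch below is a hypothetical reconstruction of the three ranges that report_bits() probes (the real function is only in the attached wrappers); probe_range() and range_t are illustrative names. It reproduces the addresses above, e.g. 0x79e128 - 12 = 0x79e11c and 0x79e128 + 120 = 0x79e1a0.

```c
#include <stdint.h>

/* Assumed value: matches the "-12"/"+12" red-zone offsets in the log. */
#define RZ_BYTES 12

/* Half-open address range [lo, hi). */
typedef struct { uintptr_t lo, hi; } range_t;

/* Hypothetical helper: the range probed for a block at addr of the given
   size. which < 0: pre-block red zone, which == 0: the block itself,
   which > 0: post-block red zone. */
static range_t probe_range(uintptr_t addr, uintptr_t size, int which)
{
    range_t r;

    if (which < 0) {
        r.lo = addr - RZ_BYTES;
        r.hi = addr;
    } else if (which == 0) {
        r.lo = addr;
        r.hi = addr + size;
    } else {
        r.lo = addr + size;
        r.hi = addr + size + RZ_BYTES;
    }
    return r;
}
```

The same arithmetic gives [0x79ea90,0x79eb90) for the 256-byte 6th block below, and shows that the previous block's post-block red zone ends at 0x79ea84, exactly where the 6th block's pre-block red zone begins.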

But the 6th malloc-like call shows the buffer and red zones being accessible before MALLOCLIKE_BLOCK is called (when all 3 should be NO ACCESS) and after (when the red zones should be NO ACCESS even if they weren't before):

==24599== alloced: [0x79e9f8,0x79ea78) 128b
==24599== premalloc of 0x79ea90+256, -12
# FAIL: No error reading pre-block red zone before MALLOCLIKE_BLOCK
==24599== premalloc of 0x79ea90+256
# FAIL: No error reading block before MALLOCLIKE_BLOCK
==24599== premalloc of 0x79ea90+256, +12
# FAIL: No error reading post-block red zone before MALLOCLIKE_BLOCK
==24599== postmalloc of 0x79ea90+256, -12
# FAIL: No error reading pre-block red zone after MALLOCLIKE_BLOCK
==24599== postmalloc of 0x79ea90+256
# Good: No error reading block after MALLOCLIKE_BLOCK
==24599== postmalloc of 0x79ea90+256, +12
# FAIL: No error reading post-block red zone after MALLOCLIKE_BLOCK
==24599== alloced: [0x79ea90,0x79eb90) 256b
==24599== prefree of 0x79e9f8+0, -12

The rest of the run shows a mix of correct behavior and "no errors accessing" behavior around malloc-like calls.

Around free-like calls, we have a similar mix of correct behavior:

==24599== prefree of 0x79e9f8+0, -12
==24599== Unaddressable byte(s) found during client check request
# ... Good: NO ACCESS to pre-block red zone before FREELIKE_BLOCK
==24599==  Address 0x79e9ec is 12 bytes after a block of size 128 alloc'd
# ...
==24599== prefree of 0x79e9f8+0
# ... Good: Have ACCESS to block before FREELIKE_BLOCK
==24599== postfree of 0x79e9f8+0, -12
==24599== Unaddressable byte(s) found during client check request
# ... Good: NO ACCESS to pre-block red zone after FREELIKE_BLOCK
==24599==  Address 0x79e9ec is 12 bytes before a block of size 128 free'd
# ...
==24599== postfree of 0x79e9f8+0
==24599== Unaddressable byte(s) found during client check request
# ... Good: NO ACCESS to block after FREELIKE_BLOCK
==24599==  Address 0x79e9f8 is 0 bytes inside a block of size 128 free'd
# ...
==24599== freed:    0x79e9f8

and broken "no errors" behavior:

==24599== alloced: [0x79f618,0x79f61f) 7b
==24599== prefree of 0x79f618+0, -12
==24599== prefree of 0x79f618+0
==24599== postfree of 0x79f618+0, -12
==24599== postfree of 0x79f618+0
==24599== freed:    0x79f618
==24599== premalloc of 0x79f930+128, -12

I get similar results when using MEMPOOL macros instead of MALLOCLIKE_BLOCK macros (except the diagnostics are less clear).

I've attached the wrappers' source code. Note that the first round of false negatives fires before free or realloc is called, so at that point only the wrappers for fm_malloc_init() and fm_malloc() have been called.

I couldn't figure out any pattern to which calls behave correctly and which do not, though the behavior seems to be at least mostly consistent between runs.

I hope to be able to produce a stand-alone demonstration of the problem but would appreciate any pointers to additional debugging tools I could use to provide more insights into what is happening (I'm not confident a stand-alone simulation will exhibit the problem).
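A stand-alone test along these lines might mimic the layout seen in the log: a static 1 MiB pool (like the "mem_pool" data symbol), the whole pool marked NOACCESS up front, blocks handed out with VALGRIND_MALLOCLIKE_BLOCK and 12-byte red zones, and each red zone then checked with VALGRIND_CHECK_MEM_IS_ADDRESSABLE. This is only a sketch under those assumptions: repro_alloc() and count_open_redzones() are hypothetical, the bump allocator does not imitate fm_malloc's fragment headers or free lists, and it is not known whether it exhibits the false negatives. The fallback stubs let it compile where valgrind/memcheck.h is not installed.

```c
#include <stddef.h>

/* Use the real client-request header when it is installed; otherwise fall
   back to stubs that behave like the real macros outside Valgrind. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define RUNNING_ON_VALGRIND 0
#  define VALGRIND_MAKE_MEM_NOACCESS(a, n) ((void)(a), (void)(n), 0)
#  define VALGRIND_MALLOCLIKE_BLOCK(a, n, rz, z) \
       ((void)(a), (void)(n), (void)(rz), (void)(z))
#  define VALGRIND_CHECK_MEM_IS_ADDRESSABLE(a, n) ((void)(a), (void)(n), 0)
#endif

#define POOL_BYTES 0x100000   /* same size as the pool in the log */
#define RZ_BYTES   12         /* red-zone size seen in the log */

/* Static pool, so it lives in the data/BSS segment like "mem_pool". */
static char mem_pool[POOL_BYTES];
static size_t pool_used;

/* Hypothetical bump allocator: an 8-byte-aligned block with at least
   RZ_BYTES of untouched pool memory before and after it. */
static void *repro_alloc(size_t size)
{
    size_t start = (pool_used + RZ_BYTES + 7) & ~(size_t)7;
    char *p;

    if (size > POOL_BYTES || start + size + RZ_BYTES > POOL_BYTES)
        return NULL;
    pool_used = start + size + RZ_BYTES;
    p = mem_pool + start;
    VALGRIND_MALLOCLIKE_BLOCK(p, size, RZ_BYTES, 0);
    return p;
}

/* Marks the whole pool NOACCESS, allocates nblocks blocks and counts red
   zones that memcheck considers addressable (each one a false negative).
   Every correctly protected red zone also yields an "Unaddressable
   byte(s)" report. Returns -1 when not running under Valgrind. */
static int count_open_redzones(int nblocks, size_t size)
{
    int bad = 0;
    int i;

    if (!RUNNING_ON_VALGRIND)
        return -1;
    (void)VALGRIND_MAKE_MEM_NOACCESS(mem_pool, POOL_BYTES);
    for (i = 0; i < nblocks; i++) {
        char *p = repro_alloc(size);

        if (p == NULL)
            break;
        /* The check returns 0 when every byte in the range is addressable. */
        if (VALGRIND_CHECK_MEM_IS_ADDRESSABLE(p - RZ_BYTES, RZ_BYTES) == 0)
            bad++;
        if (VALGRIND_CHECK_MEM_IS_ADDRESSABLE(p + size, RZ_BYTES) == 0)
            bad++;
    }
    return bad;
}
```

Run under memcheck, a nonzero return from count_open_redzones() would be the stand-alone symptom; a zero return would suggest the problem depends on something fm_malloc does that this sketch does not.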

I'd love a way to query the address status bits directly so the logs can be much less noisy and I can, for example, show that every single byte in a range is marked as "NOACCESS" without potentially producing a verbose error for each byte.
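One possible approach to that request, sketched under assumptions: memcheck.h documents that VALGRIND_GET_VBITS returns 3, without copying anything, when any byte of the source or destination range is unaddressable, and 0 when not running under Valgrind. If that path is also silent in the error log (worth confirming), probing one byte at a time gives the address state without a verbose error per byte. addr_state() and report_runs() are hypothetical helpers, not part of Valgrind; they collapse the per-byte results into one line per run of equal states.

```c
#include <stddef.h>
#include <stdio.h>

/* Use the real client-request header when it is installed; otherwise fall
   back to a stub that returns 0, as the real macro does outside Valgrind. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_GET_VBITS(a, v, n) ((void)(a), (void)(v), (void)(n), 0)
#endif

/* Hypothetical helper: 1 = addressable, 0 = NOACCESS, -1 = unknown (not
   running under Valgrind). Assumes GET_VBITS returns 3 when the byte is
   unaddressable and treats any other nonzero result as addressable. */
static int addr_state(const unsigned char *p)
{
    unsigned char vbits;   /* the destination must itself be addressable */
    unsigned rc = (unsigned)VALGRIND_GET_VBITS(p, &vbits, 1);

    if (rc == 0)
        return -1;
    return rc == 3 ? 0 : 1;
}

/* Hypothetical helper: prints one line per run of bytes with the same state
   in [start, start + len) and returns the number of runs printed. */
static int report_runs(FILE *out, const char *tag, const void *start, size_t len)
{
    const unsigned char *p = start;
    size_t i = 0;
    int runs = 0;

    while (i < len) {
        int s = addr_state(p + i);
        size_t j = i + 1;

        while (j < len && addr_state(p + j) == s)
            j++;
        fprintf(out, "%s: [%p,%p) %s\n", tag, (void *)(p + i), (void *)(p + j),
                s < 0 ? "unknown (not under Valgrind)"
                      : s ? "addressable" : "NOACCESS");
        runs++;
        i = j;
    }
    return runs;
}
```

In the fm_malloc wrapper, report_runs(stderr, "postmalloc", (char *)addr - RZ_BYTES, size + 2 * RZ_BYTES) would then be expected to print three lines (NOACCESS, addressable, NOACCESS) for a correctly described block. Probing costs one client request per byte, which seems acceptable for ranges of a few hundred bytes.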


Reproducible: Always

Comment 1 Tye McQueen 2010-11-15 23:00:38 UTC

Created attachment 53451
Full log

Comment 2 Tye McQueen 2010-11-15 23:08:02 UTC

Created attachment 53452
First part of full log

I tried to attach the full log but my browser seemed to just "spin".

Comment 3 Tye McQueen 2010-11-16 22:10:58 UTC

FYI, I have now tried adding explicit calls to mark the red zones NOACCESS and this had no impact on the problem.

void* I_WRAP_SONAME_FNNAME_ZU(NONE,fm_malloc)( void* s, unsigned long size )
{
    void* addr;
    OrigFn fn;
    VALGRIND_GET_ORIG_FN(fn);
    CALL_FN_W_WW( addr, fn, s, size );
    report_bits( "premalloc", addr, size, RZ_BYTES );
    /*
    VALGRIND_MEMPOOL_ALLOC( pool, addr, size );
    report_bits( "postpool", addr, size, RZ_BYTES );
    */
    VALGRIND_MALLOCLIKE_BLOCK( addr, size, RZ_BYTES, 0 );
    VALGRIND_MAKE_MEM_NOACCESS( (char*)addr-RZ_BYTES, RZ_BYTES ); /* ADDED */
    VALGRIND_MAKE_MEM_NOACCESS( (char*)addr+size, RZ_BYTES );     /* ADDED */
    report_bits( "postmalloc", addr, size, RZ_BYTES );
    if(  ! isRealloc  ) {
        fprintf( stderr,
            "==%d== alloced: [0x%lx,0x%lx) %lub\n",
            (int)getpid(), (unsigned long)addr, size+(unsigned long)addr, size );
    }
    return addr;
}

So despite the two lines marked "ADDED", the report_bits() call on the very next line often still shows that accesses to the red zones are not flagged as invalid.

Comment 4 Julian Seward 2011-01-28 19:16:50 UTC

If it's broken, I'd like to fix it.  But I can't do that without
some way to reproduce the problem.  So I need a testcase of some sort.

Comment 5 Julian Seward 2011-02-09 13:58:10 UTC

Needs test case, as per comment 4.

Comment 6 fbampaloukas 2019-06-16 13:58:54 UTC

Closing as Worksforme due to inactivity for more than 15 days as per:

https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging#Policies

Fanis

Comment 7 Tom Hughes 2019-06-16 14:30:17 UTC

Please don't apply KDE policies to non-KDE packages.