Bug 162229 - VALGRIND_DO_LEAK_CHECK emits false positive
Summary: VALGRIND_DO_LEAK_CHECK emits false positive
Status: RESOLVED WORKSFORME
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: unspecified
Platform: Gentoo Packages Linux
: NOR normal
Target Milestone: wanted3.6.0
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-17 19:50 UTC by Sergei Trofimovich
Modified: 2024-02-09 06:25 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
FP sample1 (definitely lost) (835 bytes, text/x-c++src)
2008-05-17 19:51 UTC, Sergei Trofimovich
Details
FP sample2 (possibly lost, std::string) (158 bytes, text/x-c++src)
2008-05-17 19:53 UTC, Sergei Trofimovich
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sergei Trofimovich 2008-05-17 19:50:27 UTC
Version:           current SVN: valgrind -r8099 (using KDE 3.5.8)
Installed from:    Gentoo Packages
Compiler:          gcc-4.3.0 
OS:                Linux

This problem discussed a little in valgrind-developers@ :
Date: Mon, 24 Mar 2008 21:44:39 +0200
From: Sergei Trofimovich <slyfox@inbox.ru>
To: "Valgrind Developers" <valgrind-developers@lists.sourceforge.net>
Subject: [Valgrind-developers] VALGRIND_DO_LEAK_CHECK false positives

in short:
I'd like to write my program such way:
int main()
{
  glo * g;
  init_glo_data(g);
  for(;;)
  {
    do_stuff(g);
    VALGRIND_DO_LEAK_CHECK;
    VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed);
    if (leaked)
    {
        report_detected_leak();
        break;
    }
  }
  free_glo_data(g);
}

I have large C++ program, which tracks it's global data as GNU STL's std::map<std::string, std::string>

This scheme fails.
Here is an attached example (emulates std::string _Rep creation and handling)

$valgrind ./vg_ml_test 
==13416== searching for pointers to 2 not-freed blocks.
==13416== checked 130,988 bytes.
==13416== 
==13416== 1,024 bytes in 1 blocks are possibly lost in loss record 1 of 2
==13416==    at 0x402235E: operator new[](unsigned) (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==13416==    by 0x8048503: f() (main.cc:7)
==13416==    by 0x8048533: test() (main.cc:31)
==13416==    by 0x80485F3: main (main.cc:40)
==13416== 
==13416== 
==13416== 49,380 bytes in 1 blocks are definitely lost in loss record 2 of 2
                                       ^^^^^^^^^^
---------------------------------------^^^^^^^^^^------------------------

==13416==    at 0x402235E: operator new[](unsigned) (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==13416==    by 0x8048512: f() (main.cc:12)
==13416==    by 0x8048533: test() (main.cc:31)
==13416==    by 0x80485F3: main (main.cc:40)
==13416== 
==13416== LEAK SUMMARY:
==13416==    definitely lost: 49,380 bytes in 1 blocks.
==13416==      possibly lost: 1,024 bytes in 1 blocks.
==13416==    still reachable: 0 bytes in 0 blocks.
==13416==         suppressed: 0 bytes in 0 blocks.
==13416== All heap blocks were freed -- no leaks are possible.
==13416== All heap blocks were freed -- no leaks are possible.
                                        ^^^^^^^^^^^^^^^^^^^^^^
----------------------------------------^^^^^^^^^^^^^^^^^^^^^^
==13416== 
==13416== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 1)
==13416== malloc/free: in use at exit: 0 bytes in 0 blocks.
==13416== malloc/free: 2 allocs, 2 frees, 50,404 bytes allocated.
==13416== For counts of detected errors, rerun with: -v
==13416== All heap blocks were freed -- no leaks are possible.

I don't know how to track it efficiently in memcheck.
As I understand memcheck tracks pointers to exact memory chunk starts and validates referenced memory(which?) whether it have pointer to memory start.

so if we have 2 mem chunks:
   c1 = a1; c2 = a2; c3 = a3;
and refs:
   b1 = a1; b2 = a2 + off1; b3 = &b2[off2] = a3 + off3; (as in example)
c1 - ok
c2 - possibly leaked
c3 - definitely leaked

This logic could be changed to:
                memcheck tracks pointers to memory chunk starts and validates referenced memory(which?) whether it have pointer to any byte of allocated memory. Will MEMPOOL_ALLOC ops break?

c1 - ok
c2 - ok (because b2 points to _internals_, not start of c2, IOW because b2 >= c2 && b2 < c2 + c2_size)
c3 - ok (same)

From user's point memcheck could have --thorough-check option.
Comment 1 Sergei Trofimovich 2008-05-17 19:51:34 UTC
Created attachment 24813 [details]
FP sample1 (definitely lost)
Comment 2 Sergei Trofimovich 2008-05-17 19:53:58 UTC
Created attachment 24814 [details]
FP sample2 (possibly lost, std::string)
Comment 3 Nicholas Nethercote 2009-06-26 04:22:05 UTC
Can you try this again with a recent SVN trunk?  The leak checker was rewritten recently, it's possible that this will work now... it should work, eg. see memcheck/tests/leak.h which has some macros that work just like this, they are used in several files in memcheck/tests.
Comment 4 Sergei Trofimovich 2009-06-26 15:52:02 UTC
(In reply to comment #3)
> Can you try this again with a recent SVN trunk?  The leak checker was rewritten
> recently, it's possible that this will work now... it should work, eg. see
> memcheck/tests/leak.h which has some macros that work just like this, they are
> used in several files in memcheck/tests.

Those problems raise after each std::string allocation. Sorry,
I can't figure out how to use those macros in this case.

Please, have a look at attached samples.
I hope they are easy to understand. One of them is 4 LOC.
Anoters - it's just an emulation how it works in stdc++ templates.

valgrind-3.5.0.SVN -r10380

main1:
==17328== 1,024 bytes in 1 blocks are possibly lost in loss record 1 of 2
==17328==    at 0x4C2266C: operator new[](unsigned long) (vg_replace_malloc.c:262)
==17328==    by 0x4006E6: f() (in /tmp/z/main)
==17328==    by 0x40071E: test() (in /tmp/z/main)
==17328==    by 0x4007EA: main (in /tmp/z/main)
==17328== 
==17328== 
==17328== 49,380 bytes in 1 blocks are definitely lost in loss record 2 of 2
==17328==    at 0x4C2266C: operator new[](unsigned long) (vg_replace_malloc.c:262)
==17328==    by 0x4006F4: f() (in /tmp/z/main)
==17328==    by 0x40071E: test() (in /tmp/z/main)
==17328==    by 0x4007EA: main (in /tmp/z/main)
==17328== 
==17328== LEAK SUMMARY:
==17328==    definitely lost: 49,380 bytes in 1 blocks.
==17328==    indirectly lost: 0 bytes in 0 blocks.
==17328==      possibly lost: 1,024 bytes in 1 blocks.
==17328==    still reachable: 0 bytes in 0 blocks.
==17328==         suppressed: 0 bytes in 0 blocks.
==17328== All heap blocks were freed -- no leaks are possible.
==17328== All heap blocks were freed -- no leaks are possible.
==17328== 
==17328== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==17328== malloc/free: in use at exit: 0 bytes in 0 blocks.
==17328== malloc/free: 2 allocs, 2 frees, 50,404 bytes allocated.
==17328== For counts of detected errors, rerun with: -v
==17328== All heap blocks were freed -- no leaks are possible.

main2:
==17417== 30 bytes in 1 blocks are possibly lost in loss record 1 of 1
==17417==    at 0x4C22B6E: operator new(unsigned long) (vg_replace_malloc.c:218)
==17417==    by 0x4ECC420: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.3.2/libstdc++.so.6.0.10)
==17417==    by 0x4ECCDE4: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.3.2/libstdc++.so.6.0.10)
==17417==    by 0x4ECCF22: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.3.2/libstdc++.so.6.0.10)
==17417==    by 0x4007F1: main (in /tmp/z/main2)
==17417== 
==17417== LEAK SUMMARY:
==17417==    definitely lost: 0 bytes in 0 blocks.
==17417==    indirectly lost: 0 bytes in 0 blocks.
==17417==      possibly lost: 30 bytes in 1 blocks.
==17417==    still reachable: 0 bytes in 0 blocks.
==17417==         suppressed: 0 bytes in 0 blocks.
==17417== 
==17417== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==17417== malloc/free: in use at exit: 0 bytes in 0 blocks.
==17417== malloc/free: 1 allocs, 1 frees, 30 bytes allocated.
==17417== For counts of detected errors, rerun with: -v
==17417== All heap blocks were freed -- no leaks are possible.
Comment 5 Paul Floyd 2024-02-08 12:13:18 UTC
I haven't checked the code, but I believe that the cause of this is that the search for pointers isn't exhaustive and byte by byte. Instead it is done on word boundaries.

With the latest code

git describe
VALGRIND_3_22_0-128-ga836d21

I get

==11932== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==11932== Using Valgrind-3.23.0.GIT and LibVEX; rerun with -h for copyright info
==11932== Command: ./bug162229_1
==11932== 
==11932== 1,024 bytes in 1 blocks are possibly lost in loss record 1 of 3
==11932==    at 0x4030637: operator new[](unsigned long) (vg_replace_malloc.c:727)
==11932==    by 0x401377: f() (bug162229_1.cpp:10)
==11932==    by 0x40140B: test() (bug162229_1.cpp:34)
==11932==    by 0x4014DD: main (bug162229_1.cpp:43)
==11932== 
==11932== 49,380 bytes in 1 blocks are definitely lost in loss record 2 of 3
==11932==    at 0x4030637: operator new[](unsigned long) (vg_replace_malloc.c:727)
==11932==    by 0x401385: f() (bug162229_1.cpp:15)
==11932==    by 0x40140B: test() (bug162229_1.cpp:34)
==11932==    by 0x4014DD: main (bug162229_1.cpp:43)

So, still a definite leak.

If I change the code to be

static size_t offset = 16;

char * f(void)
{
    //std::string allocates representation and returns
    //fancy pointer to its' guts
    char * p = new char[1024];

    // then stores some data in its internals
    // same approach as below
    // here 1 is definitely lost
    int * q = new int[12345];
    *(int**)(p + offset) = q;

    // here p is "possibly lost"
    return p + 1; // return offset
}


then I get

==13219== 1,024 bytes in 1 blocks are possibly lost in loss record 1 of 3
==13219==    at 0x4030637: operator new[](unsigned long) (vg_replace_malloc.c:727)
==13219==    by 0x401377: f() (bug162229_1.cpp:10)
==13219==    by 0x40140B: test() (bug162229_1.cpp:34)
==13219==    by 0x4014DD: main (bug162229_1.cpp:43)
==13219== 
==13219== 49,380 bytes in 1 blocks are possibly lost in loss record 2 of 3
==13219==    at 0x4030637: operator new[](unsigned long) (vg_replace_malloc.c:727)
==13219==    by 0x401385: f() (bug162229_1.cpp:15)
==13219==    by 0x40140B: test() (bug162229_1.cpp:34)
==13219==    by 0x4014DD: main (bug162229_1.cpp:43)

As a quick hack, I modified lc_scan_memory to do an exhaustive scan - I had to change the increment from adding a word size to just using ++, and also turn off a couple oof asserts.

However I don't think that we want to do that, even under an option. Firstly it's going to be 4 or 8 times slower, depending on the word size. Secondly I'm not sure that this won't cause misaligned access errors on some platforms.
Comment 6 Tom Hughes 2024-02-08 12:39:46 UTC
No I don't think it's anything to do with word boundaries, rather I suspect it's actually everything working as expected.

What "possibly lost" means is that valgrind didn't find a pointer to the start of the allocated block but did find a pointer to an address inside the block.

That is of course exactly what is happening here - inside std::string it is storing a pointer to the actual string data but that isn't the start of the block it allocated because it has stashed some metadata before the string data. I assume it's doing something like storing the length before the start of the string data?
Comment 7 Paul Floyd 2024-02-08 13:05:02 UTC
(In reply to Tom Hughes from comment #6)
> No I don't think it's anything to do with word boundaries, rather I suspect
> it's actually everything working as expected.

For the attachment "FP sample1" it definitely is a question of the word boundaries.

I haven't tried reproducing the issue with "FP sample2". Presumably it only happens with old pre-C++11 libstdc++ CoW std::string. Does that have a non-word aligned pointer squirelled away?
Comment 8 Paul Floyd 2024-02-08 21:20:21 UTC
On FreeBSD 14 if I compile the C++ example with

paulf> g++ -g -o main main.cc -m32 -std=c++03 -L /usr/local/lib32/gcc12 -Wl,-rpath,/usr/local/lib32/gcc12 -D_GLIBCXX_USE_CXX11_ABI=0

then I get

==4005== LEAK SUMMARY:
==4005==    definitely lost: 0 bytes in 0 blocks
==4005==    indirectly lost: 0 bytes in 0 blocks
==4005==      possibly lost: 0 bytes in 0 blocks
==4005==    still reachable: 18,962 bytes in 2 blocks
==4005==                       of which reachable via heuristic:
==4005==                         stdstring          : 18 bytes in 1 blocks
==4005==         suppressed: 0 bytes in 0 blocks
==4005== Reachable blocks (those to which a pointer was found) are not shown.

which looks spot on to me.
Comment 9 Paul Floyd 2024-02-09 06:25:28 UTC
Same output for FP sample2 on Debian. Closing this as I don't see any problem - I might check that the manual says that leaks are only searched for on word boundaries.