Bug 345307

Summary:	Warning about "still reachable" memory when using libstdc++ from gcc 5
Product:	[Developer tools] valgrind	Reporter:	Dmitry Shachnev <mitya57>
Component:	memcheck	Assignee:	Ivo Raisr <ivosh>
Status:	RESOLVED FIXED
Severity:	normal	CC:	aleksandar.rikalo, anatol.pomozov, invalid, ivosh, mark, mips32r2, octoploid, philippe.waroquiers
Priority:	NOR
Version:	3.12 SVN
Target Milestone:	---
Platform:	Debian unstable
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	patch for Solaris patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ libstdc++.supp patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v2) patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v3) leak_cpp_interior additional .exps leak_cpp_interior additional .exps

Description Dmitry Shachnev 2015-03-18 12:24:05 UTC

When checking a C++ program which was compiled using gcc 5 (and thus using libstdc++ from gcc 5), I get this warning:

18,944 bytes in 1 blocks are still reachable in loss record 1 of 1
   at 0x40291CC: malloc (vg_replace_malloc.c:296)
   by 0x40D630A: pool (eh_alloc.cc:117)
   by 0x40D630A: __static_initialization_and_destruction_0 (eh_alloc.cc:244)
   by 0x40D630A: _GLOBAL__sub_I_eh_alloc.cc (eh_alloc.cc:307)
   by 0x400E86D: call_init.part.0 (dl-init.c:78)
   by 0x400E963: call_init (dl-init.c:36)
   by 0x400E963: _dl_init (dl-init.c:126)
   by 0x4000D3E: ??? (in /lib/i386-linux-gnu/ld-2.19.so)

According to GCC developers [1], this is intentional and they are not going to fix it, and Valgrind should not complain about it.
Can you please whitelist/suppress this?

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65434

Reproducible: Always

Steps to Reproduce:
$ cat test.cc
int main() {
  return 0;
}
$ g++-5 -g test.cc
$ valgrind ./a.out

Actual Results:  
HEAP SUMMARY:
    in use at exit: 18,944 bytes in 1 blocks
  total heap usage: 1 allocs, 0 frees, 18,944 bytes allocated

Expected Results:  
HEAP SUMMARY:
    in use at exit: 0 bytes in 0 blocks
  total heap usage: 0 allocs, 0 frees, 0 bytes allocated

Comment 1 Nicholas Fraser 2015-06-15 02:10:34 UTC

I can confirm this bug on latest Arch packages. My OSS project relies on Valgrind with --leak-check=full for its unit tests, which is currently failing due to this bug.

nick@nick-desktop ~/test
$ cat test.cpp
int main() {
    return 0;
}

nick@nick-desktop ~/test
$ g++ test.cpp

nick@nick-desktop ~/test
$ valgrind --leak-check=full --errors-for-leak-kinds=all --show-leak-kinds=all --error-exitcode=1 ./a.out 
==2477== Memcheck, a memory error detector
==2477== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==2477== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==2477== Command: ./a.out
==2477== 
==2477== 
==2477== HEAP SUMMARY:
==2477==     in use at exit: 72,704 bytes in 1 blocks
==2477==   total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==2477== 
==2477== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==2477==    at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2477==    by 0x4EC01EF: pool (eh_alloc.cc:117)
==2477==    by 0x4EC01EF: __static_initialization_and_destruction_0 (eh_alloc.cc:244)
==2477==    by 0x4EC01EF: _GLOBAL__sub_I_eh_alloc.cc (eh_alloc.cc:307)
==2477==    by 0x400F0E9: call_init.part.0 (in /usr/lib/ld-2.21.so)
==2477==    by 0x400F1FA: _dl_init (in /usr/lib/ld-2.21.so)
==2477==    by 0x4000DB9: ??? (in /usr/lib/ld-2.21.so)
==2477== 
==2477== LEAK SUMMARY:
==2477==    definitely lost: 0 bytes in 0 blocks
==2477==    indirectly lost: 0 bytes in 0 blocks
==2477==      possibly lost: 0 bytes in 0 blocks
==2477==    still reachable: 72,704 bytes in 1 blocks
==2477==         suppressed: 0 bytes in 0 blocks
==2477== 
==2477== For counts of detected and suppressed errors, rerun with: -v
==2477== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Comment 2 Florian Krohm 2015-09-21 21:05:27 UTC

*** Bug 348978 has been marked as a duplicate of this bug. ***

Comment 3 Markus Trippelsdorf 2015-10-05 08:05:04 UTC

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64535 
for a discussion about why a simple destructor isn't feasible in this case.

Please add the suppression, because this issue causes confusion to a lot of users.

Comment 4 Ivo Raisr 2016-02-20 21:53:53 UTC

I hit this also on Solaris 12 which now ships gcc 5.3.

Comment 5 Ivo Raisr 2016-02-21 14:48:20 UTC

Attached is a patch introducing suppression for Solaris 12.
Also expected output of the following tests was adjusted:
- massif/tests/new-cpp
- massif/tests/overloaded-new
- memcheck/tests/leak_cpp_interior

Comment 6 Ivo Raisr 2016-02-21 14:49:00 UTC

Created attachment 97339 [details]
patch for Solaris

Comment 7 Ivo Raisr 2016-02-24 22:33:33 UTC

Will be fixed as a joint work on both libstdc++ and Valgrind side.
Tracked under https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69945 for libstdc++.

Comment 8 Ivo Raisr 2016-02-25 20:49:12 UTC

Created attachment 97556 [details]
patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++

This is fix for the Valgrind side.

Fix for libstdc++ from gcc is here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69945
which provides new function __gnu_cxx::__freeres in libstdc++.

Comment 9 Ivo Raisr 2016-02-25 20:53:14 UTC

Regression testing on Linux and Solaris went fine.
Together with patch for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69945
we see now:

==1862== HEAP SUMMARY:
==1862==     in use at exit: 0 bytes in 0 blocks
==1862==   total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
==1862== 
==1862== All heap blocks were freed -- no leaks are possible

instead of previously:

==1868== HEAP SUMMARY:
==1868==     in use at exit: 72,704 bytes in 1 blocks
==1868==   total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==1868== 
==1868== LEAK SUMMARY:
==1868==    definitely lost: 0 bytes in 0 blocks
==1868==    indirectly lost: 0 bytes in 0 blocks
==1868==      possibly lost: 0 bytes in 0 blocks
==1868==    still reachable: 72,704 bytes in 1 blocks
==1868==         suppressed: 0 bytes in 0 blocks

Comment 10 Mark Wielaard 2016-03-03 15:29:40 UTC

The __gnu_cxx::__freeres patch looks really good. I have been testing it with the gcc patch on x86_64-pc-linux-gnu. And things look fine. No more still reachable leaks.

Only question is whether we really need both VG_USERREQ__LIBC_FREERES_DONE and VG_USERREQ__CXX_FREERES_DONE. They seem to do exactly the same (setting VgSrc_ExitThread). Can't we just use one VG_USERREQ__FREERES_DONE for all kinds of freeres runners?

Comment 11 Ivo Raisr 2016-03-03 20:37:47 UTC

Yes, it can be done with just one FREERES_DONE hook.
But it means we need to pass argument(s) to it (which __freeres functions to call) and this will involve some extra code for all supported architectures. See invoker_invoke_gdbserver() in vgdb-invoker-ptrace.c and  setup_stack_frame() in vgdb-invoker-solaris.c for some inspiration.

Comment 12 Ivo Raisr 2016-03-03 20:38:33 UTC

At the moment we are waiting on libstdc++ changes which need to materialize first.

Comment 13 Mark Wielaard 2016-03-03 21:20:30 UTC

(In reply to Ivo Raisr from comment #11)
> Yes, it can be done with just one FREERES_DONE hook.
> But it means we need to pass argument(s) to it (which __freeres functions to
> call) and this will involve some extra code for all supported architectures.
> See invoker_invoke_gdbserver() in vgdb-invoker-ptrace.c and 
> setup_stack_frame() in vgdb-invoker-solaris.c for some inspiration.

The LIBC/CXX_FREERES_DONE user requests are only handled in scheduler.c os_client_request (). They are called from the hooks and only set VgSrc_ExitThread. I must be missing something, but I cannot see how that is related to invoker_invoke_gdbserver or setup_stack_frame. Or how those currently make a distinction between VG_USERREQ__LIBC_FREERES_DONE and VG_USERREQ__CXX_FREERES_DONE. Could you explain where/how which arguments need to be passed to which function if there was just one VG_USERREQ__FREERES_DONE called from both freeres and cxx_freeres in vg_preloaded.c?

I don't want there to be more (complicated) code, I just don't understand why it isn't simpler to have only one kind of FREERES user request.

Comment 14 Ivo Raisr 2016-03-04 17:33:16 UTC

My apologies, we were talking about different things which just now I realized.
Now I understand you are talking about VG_USERREQ__LIBC_FREERES_DONE client request
whereas I was talking about _vgnU_freeres wrapper(s). Yes, that is certainly possible.

Now thinking about it more, I think also _vgnU_freeres should be just one. It will simplify things
in vg_preloaded.c, m_main.c and scheduler.c. Passing parameter(s) to it is not so difficult after all.

Comment 15 Mark Wielaard 2016-03-10 21:25:12 UTC

Created attachment 97829 [details]
libstdc++.supp

This is the patch the valgrind fedora package is currently using as a workaround till we get proper __gnu_cxx::__freeres support in libstdc++.

Comment 16 Ivo Raisr 2016-03-27 23:02:38 UTC

Created attachment 98117 [details]
patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v2)

Patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v2).
Addresses review comment by Mark - freeres wrappers have been consolidate into one.

I tested Linux x86/amd64 architectures. Please test other architectures on Linux as necessary.

Comment 17 Ivo Raisr 2016-03-28 06:30:12 UTC

Created attachment 98120 [details]
patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v3)

This time with proper tracking of writing a value to a register in the guest state.

Previous patch was sometimes causing spurious warnings about using uninitialized variable
in the freeres wrapper.

Comment 18 Mark Wielaard 2016-03-30 14:23:09 UTC

The patch looks good to me and work as expected on GNU/Linux x86_64 with libstdc++ from gcc trunk.

I don't have much experience with the different architecture function call arguments. But it looks OK to me.

Comment 19 Mark Wielaard 2016-03-30 15:26:38 UTC

I did some quick tests with this patch on x86, x86_64, arm, s390x, ppc64, ppc64le and arm64. None of those setups had the new libstdc++ hook yet, but all had glibc. So it seems the new code calling the freeres function with argument, does work.

Comment 20 Ivo Raisr 2016-03-30 16:40:55 UTC

Mark, thank you for review.
However I found two problems with the patch (v3):
- memcheck/tests/reach_thread_register fails on amd64/Solaris
- freeres wrapper is called on Solaris, even if there is no __libc_freeres and __gnu_cxx::__freeres

I need to investigate these properly.

Comment 21 Ivo Raisr 2016-03-30 16:41:43 UTC

There is some backgroud for memcheck/tests/reach_thread_register: https://bugs.kde.org/show_bug.cgi?id=324227

Comment 22 Ivo Raisr 2016-03-30 17:36:49 UTC

The first issue is fixed by SVN r15839.

Comment 23 Ivo Raisr 2016-03-30 17:53:23 UTC

Fixed in SVN r15840, for gcc 6.

Comment 24 Anatolik 2016-04-03 21:35:04 UTC

Is this bug fixed for gcc5?

Comment 25 Mark Wielaard 2016-04-03 21:57:17 UTC

(In reply to Anatolik from comment #24)
> Is this bug fixed for gcc5?

No.
You might try using the  libstdc++.supp as a workaround for now.

Comment 26 Ivo Raisr 2016-04-08 21:32:20 UTC

Followup fix for x86 architecture: SVN r15854.

Failed test case was memcheck/tests/addressable built as 32-bit executable which showed:
Conditional jump or move depends on uninitialised value(s)
    at 0x4FFA0E4A: _vgnU_freeres (vg_preloaded.c:68)
    by 0x2: ???
    by 0x8051987: main (addressable.c:125)
Uninitialised value was created by a stack allocation
    at 0x8051473: test2 (addressable.c:42)

because parameter 'to_run' was written to a wrong address.

Comment 27 Ivo Raisr 2016-07-05 17:03:51 UTC

Status:
- Fixed for libstdc++ from gcc 6.
- Not fixed for libstdc++ from gcc 5 because they won't accept a new public symbol there. You can use a workaround from comment #15.

Comment 28 Aleksandar Rikalo 2016-12-01 13:28:51 UTC

Created attachment 102559 [details]
leak_cpp_interior additional .exps

memcheck/tests/leak_cpp_interior fails consistently on systems with gcc 5.0+ and it would be good if we could find some solution for it.

This patch adds additional .exp files for gcc6 case.
Together with https://bugsfiles.kde.org/attachment.cgi?id=102518 it makes test pass with gcc 6.0+.

Please, see: https://bugs.kde.org/show_bug.cgi?id=373069

Comment 29 Aleksandar Rikalo 2017-01-19 08:53:39 UTC

Can anyone take a look on this?

Thank you in advance!
Aleksandar

Comment 30 Aleksandar Rikalo 2017-03-15 14:27:18 UTC

Patch from #373069 has been committed.
Now we need just https://bugs.kde.org/attachment.cgi?id=102559 to make memcheck/leak_cpp_interior pass with gcc-6.

Comment 31 Aleksandar Rikalo 2017-09-18 14:17:58 UTC

Created attachment 107894 [details]
leak_cpp_interior additional .exps

Re-based patch.

Can someone take a look ?

Comment 32 Philippe Waroquiers 2017-09-20 17:20:22 UTC

(In reply to Aleksandar Rikalo from comment #31)
> Created attachment 107894 [details]
> leak_cpp_interior additional .exps
> 
> Re-based patch.
> 
> Can someone take a look ?

As I understand, the problem is 'cleanly' fixed in a newer version of glibc, by having valgrind calling  __gnu_cxx::__freeres, but which is only present in newer library version.

Having various different .exp files for the same test is difficult/painful to maintain : a functional change implies to update all the templates (and so implies to build and run valgrind in all the different setups for these templates). And a real test failure is more difficult to analyse, because you get a diff between the test output and all possible .exp candidates.

So, IMO, we better do not add new .exp files, in particular for a problem that will be cleanly fixed by an upgrade to the library having the __freeres symbol.

The workaround is either to just ignore the test failure, or add a suppression entry (see comment 5).

If we want a fix in the valgrind release, IMO, we better add the supp entry in valgrind, as this should make both the test work, and suppress this for all the users launched executables using the same lib version

Comment 33 Philippe Waroquiers 2017-09-20 17:22:01 UTC

(In reply to Philippe Waroquiers from comment #32)
> The workaround is either to just ignore the test failure, or add a
> suppression entry (see comment 5).
Typo : see rather comment 15,

Comment 34 Philippe Waroquiers 2017-09-20 17:40:16 UTC

(In reply to Philippe Waroquiers from comment #32)
> As I understand, the problem is 'cleanly' fixed in a newer version of glibc,
> by having valgrind calling  __gnu_cxx::__freeres, but which is only present
> in newer library version.
Humph, in fact, no :).
The problem of having a reachable at the end of the execution is solved,
but even when the __gnu_cxx::__freeres is there, during the run, the test
produces different output.

So, yes, we need to do something to have the test succeeding
even with a new gcc and a new lib.

I am however wondering if we can do something cleaner that new .exp files, e.g. by rather having some filter script on the output, to make the filtered output identical on various platforms and/or combine a filter with a suppression ntry.

I think the test objective can be reasonably verified just by counting
the nr of blocks. The nr of bytes leaked is not critical to verify the 
heuristics are working.

If the filter transforms all 'byte values' in xxx bytes
(similarly for the increase), and we filter the suppressed: line, I think we can end up with a single .exp file for all our current cases ciombinations (32 bits, 64 bits, solaris; old or new gcc/glib)

Comment 35 Philippe Waroquiers 2017-09-22 22:05:13 UTC

I am closing this bug as:
* the still reachable produced at the end of execution is fixed when the
   libstdc++ freeres function is called. Of course; this implies to have
   a new enough libc

* the test has been fixed in commit f053756e2880dbb7291aac086aa48e1fd3b6f812
  Now, the test output should not depend anymore on the libstdc++ version
  and/or on the platform. So, we only have 2 .exp files remaining
  (a 32 bits and a 64 bits one).