Summary: | Warning about "still reachable" memory when using libstdc++ from gcc 5 | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Dmitry Shachnev <mitya57> |
Component: | memcheck | Assignee: | Ivo Raisr <ivosh> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | aleksandar.rikalo, anatol.pomozov, invalid, ivosh, mark, mips32r2, octoploid, philippe.waroquiers |
Priority: | NOR | ||
Version: | 3.12 SVN | ||
Target Milestone: | --- | ||
Platform: | Debian unstable | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
patch for Solaris
patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ libstdc++.supp patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v2) patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v3) leak_cpp_interior additional .exps leak_cpp_interior additional .exps |
Description
Dmitry Shachnev
2015-03-18 12:24:05 UTC
I can confirm this bug on latest Arch packages. My OSS project relies on Valgrind with --leak-check=full for its unit tests, which is currently failing due to this bug. nick@nick-desktop ~/test $ cat test.cpp int main() { return 0; } nick@nick-desktop ~/test $ g++ test.cpp nick@nick-desktop ~/test $ valgrind --leak-check=full --errors-for-leak-kinds=all --show-leak-kinds=all --error-exitcode=1 ./a.out ==2477== Memcheck, a memory error detector ==2477== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==2477== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info ==2477== Command: ./a.out ==2477== ==2477== ==2477== HEAP SUMMARY: ==2477== in use at exit: 72,704 bytes in 1 blocks ==2477== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated ==2477== ==2477== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1 ==2477== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==2477== by 0x4EC01EF: pool (eh_alloc.cc:117) ==2477== by 0x4EC01EF: __static_initialization_and_destruction_0 (eh_alloc.cc:244) ==2477== by 0x4EC01EF: _GLOBAL__sub_I_eh_alloc.cc (eh_alloc.cc:307) ==2477== by 0x400F0E9: call_init.part.0 (in /usr/lib/ld-2.21.so) ==2477== by 0x400F1FA: _dl_init (in /usr/lib/ld-2.21.so) ==2477== by 0x4000DB9: ??? (in /usr/lib/ld-2.21.so) ==2477== ==2477== LEAK SUMMARY: ==2477== definitely lost: 0 bytes in 0 blocks ==2477== indirectly lost: 0 bytes in 0 blocks ==2477== possibly lost: 0 bytes in 0 blocks ==2477== still reachable: 72,704 bytes in 1 blocks ==2477== suppressed: 0 bytes in 0 blocks ==2477== ==2477== For counts of detected and suppressed errors, rerun with: -v ==2477== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) *** Bug 348978 has been marked as a duplicate of this bug. *** See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64535 for a discussion about why a simple destructor isn't feasible in this case. Please add the suppression, because this issue causes confusion to a lot of users. I hit this also on Solaris 12 which now ships gcc 5.3. Attached is a patch introducing suppression for Solaris 12. Also expected output of the following tests was adjusted: - massif/tests/new-cpp - massif/tests/overloaded-new - memcheck/tests/leak_cpp_interior Created attachment 97339 [details]
patch for Solaris
Will be fixed as a joint work on both libstdc++ and Valgrind side. Tracked under https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69945 for libstdc++. Created attachment 97556 [details] patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ This is fix for the Valgrind side. Fix for libstdc++ from gcc is here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69945 which provides new function __gnu_cxx::__freeres in libstdc++. Regression testing on Linux and Solaris went fine. Together with patch for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69945 we see now: ==1862== HEAP SUMMARY: ==1862== in use at exit: 0 bytes in 0 blocks ==1862== total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated ==1862== ==1862== All heap blocks were freed -- no leaks are possible instead of previously: ==1868== HEAP SUMMARY: ==1868== in use at exit: 72,704 bytes in 1 blocks ==1868== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated ==1868== ==1868== LEAK SUMMARY: ==1868== definitely lost: 0 bytes in 0 blocks ==1868== indirectly lost: 0 bytes in 0 blocks ==1868== possibly lost: 0 bytes in 0 blocks ==1868== still reachable: 72,704 bytes in 1 blocks ==1868== suppressed: 0 bytes in 0 blocks The __gnu_cxx::__freeres patch looks really good. I have been testing it with the gcc patch on x86_64-pc-linux-gnu. And things look fine. No more still reachable leaks. Only question is whether we really need both VG_USERREQ__LIBC_FREERES_DONE and VG_USERREQ__CXX_FREERES_DONE. They seem to do exactly the same (setting VgSrc_ExitThread). Can't we just use one VG_USERREQ__FREERES_DONE for all kinds of freeres runners? Yes, it can be done with just one FREERES_DONE hook. But it means we need to pass argument(s) to it (which __freeres functions to call) and this will involve some extra code for all supported architectures. See invoker_invoke_gdbserver() in vgdb-invoker-ptrace.c and setup_stack_frame() in vgdb-invoker-solaris.c for some inspiration. At the moment we are waiting on libstdc++ changes which need to materialize first. (In reply to Ivo Raisr from comment #11) > Yes, it can be done with just one FREERES_DONE hook. > But it means we need to pass argument(s) to it (which __freeres functions to > call) and this will involve some extra code for all supported architectures. > See invoker_invoke_gdbserver() in vgdb-invoker-ptrace.c and > setup_stack_frame() in vgdb-invoker-solaris.c for some inspiration. The LIBC/CXX_FREERES_DONE user requests are only handled in scheduler.c os_client_request (). They are called from the hooks and only set VgSrc_ExitThread. I must be missing something, but I cannot see how that is related to invoker_invoke_gdbserver or setup_stack_frame. Or how those currently make a distinction between VG_USERREQ__LIBC_FREERES_DONE and VG_USERREQ__CXX_FREERES_DONE. Could you explain where/how which arguments need to be passed to which function if there was just one VG_USERREQ__FREERES_DONE called from both freeres and cxx_freeres in vg_preloaded.c? I don't want there to be more (complicated) code, I just don't understand why it isn't simpler to have only one kind of FREERES user request. My apologies, we were talking about different things which just now I realized. Now I understand you are talking about VG_USERREQ__LIBC_FREERES_DONE client request whereas I was talking about _vgnU_freeres wrapper(s). Yes, that is certainly possible. Now thinking about it more, I think also _vgnU_freeres should be just one. It will simplify things in vg_preloaded.c, m_main.c and scheduler.c. Passing parameter(s) to it is not so difficult after all. Created attachment 97829 [details]
libstdc++.supp
This is the patch the valgrind fedora package is currently using as a workaround till we get proper __gnu_cxx::__freeres support in libstdc++.
Created attachment 98117 [details]
patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v2)
Patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v2).
Addresses review comment by Mark - freeres wrappers have been consolidate into one.
I tested Linux x86/amd64 architectures. Please test other architectures on Linux as necessary.
Created attachment 98120 [details]
patch for Linux and Solaris calling __gnu_cxx::__freeres from libstdc++ (v3)
This time with proper tracking of writing a value to a register in the guest state.
Previous patch was sometimes causing spurious warnings about using uninitialized variable
in the freeres wrapper.
The patch looks good to me and work as expected on GNU/Linux x86_64 with libstdc++ from gcc trunk. I don't have much experience with the different architecture function call arguments. But it looks OK to me. I did some quick tests with this patch on x86, x86_64, arm, s390x, ppc64, ppc64le and arm64. None of those setups had the new libstdc++ hook yet, but all had glibc. So it seems the new code calling the freeres function with argument, does work. Mark, thank you for review. However I found two problems with the patch (v3): - memcheck/tests/reach_thread_register fails on amd64/Solaris - freeres wrapper is called on Solaris, even if there is no __libc_freeres and __gnu_cxx::__freeres I need to investigate these properly. There is some backgroud for memcheck/tests/reach_thread_register: https://bugs.kde.org/show_bug.cgi?id=324227 The first issue is fixed by SVN r15839. Fixed in SVN r15840, for gcc 6. Is this bug fixed for gcc5? (In reply to Anatolik from comment #24) > Is this bug fixed for gcc5? No. You might try using the libstdc++.supp as a workaround for now. Followup fix for x86 architecture: SVN r15854. Failed test case was memcheck/tests/addressable built as 32-bit executable which showed: Conditional jump or move depends on uninitialised value(s) at 0x4FFA0E4A: _vgnU_freeres (vg_preloaded.c:68) by 0x2: ??? by 0x8051987: main (addressable.c:125) Uninitialised value was created by a stack allocation at 0x8051473: test2 (addressable.c:42) because parameter 'to_run' was written to a wrong address. Status: - Fixed for libstdc++ from gcc 6. - Not fixed for libstdc++ from gcc 5 because they won't accept a new public symbol there. You can use a workaround from comment #15. Created attachment 102559 [details] leak_cpp_interior additional .exps memcheck/tests/leak_cpp_interior fails consistently on systems with gcc 5.0+ and it would be good if we could find some solution for it. This patch adds additional .exp files for gcc6 case. Together with https://bugsfiles.kde.org/attachment.cgi?id=102518 it makes test pass with gcc 6.0+. Please, see: https://bugs.kde.org/show_bug.cgi?id=373069 Can anyone take a look on this? Thank you in advance! Aleksandar Patch from #373069 has been committed. Now we need just https://bugs.kde.org/attachment.cgi?id=102559 to make memcheck/leak_cpp_interior pass with gcc-6. Created attachment 107894 [details]
leak_cpp_interior additional .exps
Re-based patch.
Can someone take a look ?
(In reply to Aleksandar Rikalo from comment #31) > Created attachment 107894 [details] > leak_cpp_interior additional .exps > > Re-based patch. > > Can someone take a look ? As I understand, the problem is 'cleanly' fixed in a newer version of glibc, by having valgrind calling __gnu_cxx::__freeres, but which is only present in newer library version. Having various different .exp files for the same test is difficult/painful to maintain : a functional change implies to update all the templates (and so implies to build and run valgrind in all the different setups for these templates). And a real test failure is more difficult to analyse, because you get a diff between the test output and all possible .exp candidates. So, IMO, we better do not add new .exp files, in particular for a problem that will be cleanly fixed by an upgrade to the library having the __freeres symbol. The workaround is either to just ignore the test failure, or add a suppression entry (see comment 5). If we want a fix in the valgrind release, IMO, we better add the supp entry in valgrind, as this should make both the test work, and suppress this for all the users launched executables using the same lib version (In reply to Philippe Waroquiers from comment #32) > The workaround is either to just ignore the test failure, or add a > suppression entry (see comment 5). Typo : see rather comment 15, (In reply to Philippe Waroquiers from comment #32) > As I understand, the problem is 'cleanly' fixed in a newer version of glibc, > by having valgrind calling __gnu_cxx::__freeres, but which is only present > in newer library version. Humph, in fact, no :). The problem of having a reachable at the end of the execution is solved, but even when the __gnu_cxx::__freeres is there, during the run, the test produces different output. So, yes, we need to do something to have the test succeeding even with a new gcc and a new lib. I am however wondering if we can do something cleaner that new .exp files, e.g. by rather having some filter script on the output, to make the filtered output identical on various platforms and/or combine a filter with a suppression ntry. I think the test objective can be reasonably verified just by counting the nr of blocks. The nr of bytes leaked is not critical to verify the heuristics are working. If the filter transforms all 'byte values' in xxx bytes (similarly for the increase), and we filter the suppressed: line, I think we can end up with a single .exp file for all our current cases ciombinations (32 bits, 64 bits, solaris; old or new gcc/glib) I am closing this bug as: * the still reachable produced at the end of execution is fixed when the libstdc++ freeres function is called. Of course; this implies to have a new enough libc * the test has been fixed in commit f053756e2880dbb7291aac086aa48e1fd3b6f812 Now, the test output should not depend anymore on the libstdc++ version and/or on the platform. So, we only have 2 .exp files remaining (a 32 bits and a 64 bits one). |