SUMMARY
valgrind 3.24 is stuck in TCMalloc, no progress

STEPS TO REPRODUCE
1. Link an executable against tcmalloc_minimal from gperftools_2.16
2. valgrind --soname-synonyms=somalloc=*tcmalloc* ./starter
3. Observe that nothing happens (except 100% CPU usage).

Ctrl-C then prints the following:

==3101210== Memcheck, a memory error detector
==3101210== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==3101210== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==3101210== Command: ./starter
==3101210==
^C==3101210==
==3101210== Process terminating with default action of signal 2 (SIGINT)
==3101210==    at 0x52731A6: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==3101210==    by 0x52739C8: TCMallocGuard::TCMallocGuard() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==3101210==    by 0x5265EF0: _sub_I_65535_0.0 (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==3101210==    by 0x400551D: call_init (dl-init.c:70)
==3101210==    by 0x400551D: call_init (dl-init.c:26)
==3101210==    by 0x400560B: _dl_init (dl-init.c:117)
==3101210==    by 0x401D7F9: ??? (in /usr/lib64/ld-linux-x86-64.so.2)
Do you know if your tcmalloc build was done with the Valgrind headers available? (see https://github.com/gperftools/gperftools/blob/80cdbea9a9be8d0541a162cb6bc9d2119c34d913/src/base/dynamic_annotations.cc#L60). If it wasn't, do you still get the hang when you set RUNNING_ON_VALGRIND to something in your environment?
Exporting RUNNING_ON_VALGRIND doesn't seem to have any effect; it still hangs. I also recompiled our gperftools so that GetRunningOnValgrind() returns a hard-coded 1. No change, it still hangs.
Can you get a stacktrace from Valgrind when it hangs? Can you also get file/line numbers for the tcmalloc stacktrace?

This is happening somewhere in the tcmalloc startup code. https://github.com/gperftools/gperftools/blob/80cdbea9a9be8d0541a162cb6bc9d2119c34d913/src/malloc_extension.cc#L331

I assume here that the std::atomic::load() will succeed (this region is bound to be single-threaded). I also guess that inst is nullptr on the first call. The tc_free(tc_malloc(32)) should be using Valgrind's replacement functions (the tc_xxx functions are weak aliases). Then the function recursively calls itself. On the second call inst ought not to be nullptr, and it should be returned. Nothing there stands out to me as the cause of a hang.

Have you opened an issue with gperftools? This is likely to be a regression on their side. At work we've been using an old version of tcmalloc (2.5.93) for years.
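To make the call pattern described above easier to follow, here is a minimal model of it. This is only a sketch and not the gperftools source: Extension, StartupModel, demo_malloc and the limit parameter are hypothetical names, and the idea that the allocator's own malloc registers the instance as a side effect is an assumption taken from the description in this thread. The limit exists only so that the model terminates.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Extension {};  // hypothetical stand-in, not the gperftools class

class StartupModel {
 public:
  // allocator_replaced == true models a tool having swapped out malloc/free.
  explicit StartupModel(bool allocator_replaced)
      : allocator_replaced_(allocator_replaced) {}

  // Follows the pattern from the comment above: load, allocate and free,
  // then recurse. Returns nullptr once `limit` calls have been made; the
  // code being modeled has no such limit.
  Extension* instance(int limit) {
    assert(limit > 0);
    ++calls_;
    Extension* inst = current_.load(std::memory_order_acquire);
    if (inst != nullptr) return inst;
    if (calls_ >= limit) return nullptr;  // safety net for the model only
    demo_free(demo_malloc(32));
    return instance(limit);
  }

  int calls() const { return calls_; }

 private:
  void* demo_malloc(std::size_t n) {
    if (!allocator_replaced_) {
      // Assumption: the allocator's own malloc registers the instance.
      current_.store(&real_, std::memory_order_release);
    }
    // A replaced malloc knows nothing about the instance pointer.
    return std::malloc(n);
  }
  void demo_free(void* p) { std::free(p); }

  bool allocator_replaced_;
  std::atomic<Extension*> current_{nullptr};
  Extension real_;
  int calls_ = 0;
};
```

With StartupModel(false), instance() returns after two calls. With StartupModel(true), the pointer is never set, so every call recurses until the artificial limit is reached.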
Interesting: I enabled -g for the gperftools compile (to get your line numbers), and the valgrind output has now changed:

==256708== Memcheck, a memory error detector
==256708== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==256708== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==256708== Command: ./starter
==256708==
==256708== Stack overflow in thread #1: can't grow stack to 0x1ffe601000
==256708==
==256708== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==256708==  Access not within mapped region at address 0x1FFE601FF8
==256708== Stack overflow in thread #1: can't grow stack to 0x1ffe601000
==256708==    at 0x484482F: malloc (vg_replace_malloc.c:446)
==256708==  If you believe this happened as a result of a stack
==256708==  overflow in your program's main thread (unlikely but
==256708==  possible), you can try to increase the size of the
==256708==  main thread stack using the --main-stacksize= flag.
==256708==  The main thread stack size used in this run was 10485760.

After increasing the stack size, it now hangs again (or rather, it is in an infinite recursion).
Ctrl-C now prints:

==260595== Process terminating with default action of signal 2 (SIGINT)
==260595==    at 0x527A280: free (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC3B: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)

But still no line numbers... The valgrind provided by Red Hat doesn't have debug symbols, so pstack is unusable.
That recursion fits your assumption, but the tc_free(tc_malloc(32)) call doesn't seem to initialize the instance when running under valgrind, resulting in endless recursion.
Yes, that makes sense. Sounds like a bug in their code - they should be checking RunningOnValgrind before potentially getting into infinite recursion.
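As an illustration of the kind of check meant here, the sketch below guards the lazy lookup so that it never depends on the allocator round trip when running under Valgrind. It is not the attached patch and not the upstream fix. All names (Extension, guarded_instance, flag_is_set, and so on) are hypothetical, and treating any value other than "0" in RUNNING_ON_VALGRIND as "true" is an assumption about how such an environment check might be written.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <cstring>

struct Extension {};  // hypothetical stand-in, not the gperftools class

static std::atomic<Extension*> g_current{nullptr};
static Extension g_default;

// Assumption: a set value other than "0" counts as "running on Valgrind".
bool flag_is_set(const char* value) {
  return value != nullptr && std::strcmp(value, "0") != 0;
}

// Environment-only detection; a real build may also use Valgrind's headers.
bool running_on_valgrind_demo() {
  return flag_is_set(std::getenv("RUNNING_ON_VALGRIND"));
}

// Registers the default instance directly if none is registered yet.
static Extension* register_default() {
  Extension* expected = nullptr;
  g_current.compare_exchange_strong(expected, &g_default,
                                    std::memory_order_acq_rel);
  Extension* result = g_current.load(std::memory_order_acquire);
  assert(result != nullptr);
  return result;
}

// Models the allocator's own malloc/free registering the instance.
static void own_allocator_round_trip() {
  g_current.store(&g_default, std::memory_order_release);
  std::free(std::malloc(32));
}

Extension* guarded_instance(bool under_valgrind) {
  Extension* inst = g_current.load(std::memory_order_acquire);
  if (inst != nullptr) return inst;
  if (!under_valgrind) {
    // Normal path in this model: rely on the allocator's side effect.
    own_allocator_round_trip();
    inst = g_current.load(std::memory_order_acquire);
    if (inst != nullptr) return inst;
  }
  // Under Valgrind the round trip would reach the replacement functions,
  // so register directly instead of recursing.
  return register_default();
}
```

A caller would pass running_on_valgrind_demo() as the argument. Either way, the function returns without recursing.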
Thank you for your quick support. With this last suggestion, I was able to restore the "old" behavior of tcmalloc when RunningOnValgrind(). I'll create a bug report for gperftools. If anyone else runs into this problem, I've attached a patch to malloc_extension.cc that fixes the problem.
Created attachment 185096: Patch for gperftools' malloc_extension.cc so it works with valgrind
Hi. I am the gperftools maintainer. We should perhaps coordinate on what the intended behavior is here. The code that gets into the infinite loop explicitly calls tc_free and tc_malloc because it assumes those calls will reach _our_ code. The expectation is that whoever overrides the memory allocation APIs will provide malloc/free etc., but not the tc_XYZ functions. I am wondering why valgrind etc. isn't "simply" doing that, i.e. having the linker "find" malloc and the other functions where you need them to be, as opposed to using code-rewriting magic. Depending on the reasoning here, we can further establish what fixes (if any) need to be made to gperftools.
(In reply to Aliaksei Kandratsenka from comment #9)
> Hi. I am gperftools maintainer.
>
> We should perhaps coordinate what is the intended behavior here.
>
> The code that gets into infinite loop explicitly does tc_free and tc_malloc
> because it assumes it will call _our_ code.
>
> The expectation is if whoever is overriding memory allocation APIs they'll
> have malloc/free etc, but not tc_XYZ functions.
>
> I am wondering why valgrind etc isn't "simply" doing that. I.e. have linker
> "find" malloc and other functions where you need them to be as opposed to
> using code rewriting magic.
>
> Depending on the logic here we can further establish what fixes (if any)
> need to be done to gperftools.

Why don't we do "simple" things like use the linker? Because that would cause a lot of problems.

Here is a quick overview of how Valgrind works. It is a static binary that does not link to any libraries. That way we can be sure that there are no conflicts between libc or libc++ internal state, should they be used by both Valgrind and the guest. Valgrind does the same job as the OS regarding loading the guest exe.

This has two big advantages. Firstly, we get to see the whole picture for all memory. Tools that start after the guest exe has loaded have to handle memory whose origins they have not recorded. This also means that we check any startup code in guest exes. The other big advantage is that we don't rely on dynamic linking tricks. We can load static exes as well. We just look at all of the ELF symbols and record the ones that we need to redirect.

In your libraries you use aliases, so malloc and tc_malloc are the same from our perspective. The same goes for free, with 8 aliases. We could probably detect these aliases (and that could be useful for the free/new/new[] aliases). As long as you are using these aliases, it is impossible for us to tell them apart. If we replace malloc then we also replace tc_malloc, since they are the same.
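The aliasing point can be illustrated with a toy model. This is only a sketch of the idea that redirection is keyed on where a symbol lives rather than on its name. It is not Valgrind's implementation, and the Symbol type, the function names, the addresses and the example table are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// One entry of a simplified symbol table: a name and where it points.
struct Symbol {
  std::string name;
  std::uintptr_t address;
};

// Collects the addresses of every symbol whose name is in `wanted`.
std::set<std::uintptr_t> addresses_to_redirect(
    const std::vector<Symbol>& table, const std::set<std::string>& wanted) {
  std::set<std::uintptr_t> out;
  for (const Symbol& s : table) {
    if (wanted.count(s.name) != 0) out.insert(s.address);
  }
  return out;
}

// A call through `name` is redirected if its address was collected,
// regardless of which name caused that address to be collected.
bool is_redirected(const std::vector<Symbol>& table,
                   const std::set<std::uintptr_t>& redirected,
                   const std::string& name) {
  for (const Symbol& s : table) {
    if (s.name == name) return redirected.count(s.address) != 0;
  }
  return false;
}

// Made-up table in which the tc_ names are aliases of malloc and free.
std::vector<Symbol> example_table() {
  return {{"malloc", 0x1000},
          {"tc_malloc", 0x1000},
          {"free", 0x2000},
          {"tc_free", 0x2000},
          {"unrelated_function", 0x3000}};
}
```

In this model, asking for only malloc and free to be redirected also redirects tc_malloc and tc_free, because each alias shares an address with the name that was requested.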
Thanks for the clarification. Wanting the same behavior for dynamic and static linking is fair game. So I just submitted a fix to gperftools that avoids the assumption that tc_{malloc,free} are our "own" functions. BTW, given that gperftools (and a few other allocators that attempt to make things go fast) alias various functions, valgrind now fails to detect whether free or operator delete was called, so it gives me a "bogus" warning about an operator new/free kind of mismatch. So please consider updating valgrind to detect this kind of aliasing and silence the warning.
I did add this to the FAQ https://valgrind.org/docs/manual/faq.html#faq.mismatches Please could you open a bugzilla item for the mismatched free issue?
Paul, thank you for updating the FAQ!