Bug 509666 - valgrind hangs on an executable linked against tcmalloc from gperftools 2.16
Summary: valgrind hangs on an executable linked against tcmalloc from gperftools 2.16
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.24.0
Platform: RedHat Enterprise Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL: https://github.com/gperftools/gperfto...
Keywords:
Depends on:
Blocks:
 
Reported: 2025-09-19 06:26 UTC by Marc-Oliver Straub
Modified: 2025-10-07 15:30 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Patch for gperftool's malloc_extension.cc so it works with valgrind (927 bytes, text/plain)
2025-09-19 13:27 UTC, Marc-Oliver Straub

Description Marc-Oliver Straub 2025-09-19 06:26:22 UTC
SUMMARY
Valgrind 3.24 gets stuck in TCMalloc and makes no progress.

STEPS TO REPRODUCE
1. Link an executable against tcmalloc_minimal from gperftools 2.16.
2. Run: valgrind --soname-synonyms=somalloc=*tcmalloc* ./starter
3. Observe that nothing happens (except 100% CPU usage).
Ctrl-C then prints the following:
==3101210== Memcheck, a memory error detector
==3101210== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==3101210== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==3101210== Command: ./starter
==3101210== 
^C==3101210== 
==3101210== Process terminating with default action of signal 2 (SIGINT)
==3101210==    at 0x52731A6: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==3101210==    by 0x52739C8: TCMallocGuard::TCMallocGuard() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==3101210==    by 0x5265EF0: _sub_I_65535_0.0 (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==3101210==    by 0x400551D: call_init (dl-init.c:70)
==3101210==    by 0x400551D: call_init (dl-init.c:26)
==3101210==    by 0x400560B: _dl_init (dl-init.c:117)
==3101210==    by 0x401D7F9: ??? (in /usr/lib64/ld-linux-x86-64.so.2)


Comment 1 Paul Floyd 2025-09-19 07:49:25 UTC
Do you know whether your tcmalloc build was done with the Valgrind headers available? (see https://github.com/gperftools/gperftools/blob/80cdbea9a9be8d0541a162cb6bc9d2119c34d913/src/base/dynamic_annotations.cc#L60).

If it wasn't, do you still get the hang when you set RUNNING_ON_VALGRIND to some value in your environment?
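For readers unfamiliar with the check referenced above, here is a minimal C++ sketch of how such a detection function can work. It is modelled on the idea behind gperftools' `GetRunningOnValgrind()` but is not its source: the name `running_on_valgrind_sketch` is hypothetical, and the rule that any value other than `"0"` in the `RUNNING_ON_VALGRIND` environment variable counts as true is an assumption about gperftools' fallback. When `<valgrind/valgrind.h>` is available at build time, the `RUNNING_ON_VALGRIND` client-request macro asks Valgrind directly; otherwise only the environment variable is consulted.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Use Valgrind's client-request header only if it is installed at build time.
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif

// Hypothetical helper, modelled on the idea behind gperftools'
// GetRunningOnValgrind(); not its actual source.
bool running_on_valgrind_sketch() {
#ifdef RUNNING_ON_VALGRIND
  // Client request: evaluates to non-zero when running under Valgrind.
  if (RUNNING_ON_VALGRIND) return true;
#endif
  // Fallback (assumed rule): the environment variable is set to anything but "0".
  const char* env = std::getenv("RUNNING_ON_VALGRIND");
  return env != nullptr && std::strcmp(env, "0") != 0;
}
```

As comment 2 goes on to show, forcing this kind of check to return true does not by itself avoid the hang, because the recursive path in this bug does not consult it.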
Comment 2 Marc-Oliver Straub 2025-09-19 08:53:33 UTC
Exporting RUNNING_ON_VALGRIND doesn't seem to have any effect; it still hangs.
I recompiled our gperftools with GetRunningOnValgrind() hard-coded to return 1: no change, it still hangs.
Comment 3 Paul Floyd 2025-09-19 11:11:21 UTC
Can you get a stacktrace from Valgrind when it hangs?

Can you also get file/line numbers for the tcmalloc stacktrace?

This is happening somewhere in the tcmalloc startup code (https://github.com/gperftools/gperftools/blob/80cdbea9a9be8d0541a162cb6bc9d2119c34d913/src/malloc_extension.cc#L331). I assume that the std::atomic::load() will succeed (this region is bound to be single-threaded), and I guess that inst is nullptr on the first call. The tc_free(tc_malloc(32)) call should then go to Valgrind's replacement functions (the tc_xxx functions are weak aliases), after which instance() calls itself recursively. On the second call inst ought not to be nullptr, so it should be returned. Nothing there stands out to me as the cause of a hang.
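To make the failure mode concrete, here is a self-contained C++ model of the lazy-initialization pattern described above. It is a sketch, not gperftools' code: `Extension`, `instance()` and the two hook functions are hypothetical stand-ins, and the `max_depth` cut-off exists only so the demo terminates. When the allocation call reaches tcmalloc's own code, initialization publishes the instance and the retry returns it; when the call is redirected to a replacement that never initializes tcmalloc (the situation reported later in this thread), the pointer stays null and the retry never succeeds.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; names are illustrative, not gperftools' API.
struct Extension {};
static Extension g_default_extension;
static std::atomic<Extension*> g_instance{nullptr};

// Models whatever tc_malloc actually resolves to at run time.
using AllocHook = void (*)();

// tcmalloc's own allocator: initializing the library publishes the instance.
void tcmalloc_own_malloc() { g_instance.store(&g_default_extension); }

// A tool's replacement allocator: never runs tcmalloc's initialization.
void replacement_malloc() {}

// Lazy-init pattern: if the instance is not set yet, allocate once to
// trigger initialization, then retry recursively. max_depth only exists
// so this demo stops instead of recursing forever.
Extension* instance(AllocHook alloc, int depth = 0, int max_depth = 1000) {
  Extension* inst = g_instance.load(std::memory_order_acquire);
  if (inst != nullptr) return inst;
  if (depth >= max_depth) return nullptr;  // real code: hang / stack overflow
  alloc();                                 // stands in for tc_free(tc_malloc(32))
  return instance(alloc, depth + 1, max_depth);
}
```

With `replacement_malloc` the model gives up after `max_depth` retries and returns `nullptr`; without that cut-off it is the endless `MallocExtension::instance()` recursion visible in the stack trace of comment 4.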

Have you opened an issue with gperftools? This is likely to be a regression on their side. At work we've been using an old version of tcmalloc (2.5.93) for years.
Comment 4 Marc-Oliver Straub 2025-09-19 11:37:52 UTC
Interesting: after I enabled -g for the gperftools build (to get the line numbers you asked for), the Valgrind output changed:
==256708== Memcheck, a memory error detector
==256708== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==256708== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==256708== Command: ./starter
==256708== 
==256708== Stack overflow in thread #1: can't grow stack to 0x1ffe601000
==256708== 
==256708== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==256708==  Access not within mapped region at address 0x1FFE601FF8
==256708== Stack overflow in thread #1: can't grow stack to 0x1ffe601000
==256708==    at 0x484482F: malloc (vg_replace_malloc.c:446)
==256708==  If you believe this happened as a result of a stack
==256708==  overflow in your program's main thread (unlikely but
==256708==  possible), you can try to increase the size of the
==256708==  main thread stack using the --main-stacksize= flag.
==256708==  The main thread stack size used in this run was 10485760.

After increasing the stack size, it now hangs again (or is in an infinite recursion). Ctrl-C now prints:
==260595== Process terminating with default action of signal 2 (SIGINT)
==260595==    at 0x527A280: free (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC3B: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)
==260595==    by 0x528FC40: MallocExtension::instance() (in /opt/hp93000rt/el9/x86_64/gperftools_2.16/lib/libtcmalloc_minimal.so.4.6.0)

But still no line numbers.
The Valgrind provided by Red Hat doesn't have debug symbols, so pstack is unusable.
Comment 5 Marc-Oliver Straub 2025-09-19 11:45:00 UTC
That recursion fits your assumption, but the tc_free(tc_malloc(32)) call doesn't seem to initialize the instance when running under Valgrind, resulting in endless recursion.
Comment 6 Paul Floyd 2025-09-19 12:51:37 UTC
Yes, that makes sense.

Sounds like a bug in their code - they should be checking RunningOnValgrind before potentially getting into infinite recursion.
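A minimal sketch of the kind of guard suggested here, using the same hypothetical model: check for Valgrind before taking the allocate-to-initialize path, and publish a default instance directly instead. This is neither the patch attached in comment 8 nor the upstream fix mentioned in comment 11; `running_on_valgrind_stub()`, `alloc_to_trigger_init()` and the globals are hypothetical stand-ins, with `g_under_valgrind` simulating the environment.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; not gperftools' actual types or functions.
struct Extension {};
static Extension g_default_extension;
static std::atomic<Extension*> g_instance{nullptr};
static bool g_under_valgrind = true;  // simulate running under Valgrind

bool running_on_valgrind_stub() { return g_under_valgrind; }

// Models tc_free(tc_malloc(32)): under Valgrind the call goes to the
// replacement allocator, so no instance is ever published.
void alloc_to_trigger_init() {
  if (!running_on_valgrind_stub()) g_instance.store(&g_default_extension);
}

Extension* instance() {
  Extension* inst = g_instance.load(std::memory_order_acquire);
  if (inst != nullptr) return inst;
  if (running_on_valgrind_stub()) {
    // Guard: don't rely on an allocation side effect that Valgrind
    // redirects away; install the default instance directly.
    Extension* expected = nullptr;
    g_instance.compare_exchange_strong(expected, &g_default_extension);
    return g_instance.load(std::memory_order_acquire);
  }
  alloc_to_trigger_init();
  return instance();  // terminates: initialization published the instance
}
```

The design choice is to detect Valgrind up front rather than depend on an allocation side effect that the redirection removes.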
Comment 7 Marc-Oliver Straub 2025-09-19 13:26:03 UTC
Thank you for your quick support. With this last suggestion, I was able to restore the "old" behavior of tcmalloc when RunningOnValgrind(). I'll create a bug report for gperftools.
If anyone else runs into this problem, I've attached a patch to malloc_extension.cc that fixes the problem.
Comment 8 Marc-Oliver Straub 2025-09-19 13:27:26 UTC
Created attachment 185096
Patch for gperftool's malloc_extension.cc so it works with valgrind
Comment 9 Aliaksei Kandratsenka 2025-09-21 08:02:25 UTC
Hi. I am the gperftools maintainer.

We should perhaps coordinate on what the intended behavior is here.

The code that gets into infinite loop explicitly does tc_free and tc_malloc because it assumes it will call _our_ code.

The expectation is that whoever overrides the memory allocation APIs will provide malloc/free etc., but not the tc_XYZ functions.

I am wondering why Valgrind etc. isn't "simply" doing that, i.e. having the linker "find" malloc and other functions where you need them, as opposed to using code-rewriting magic.

Depending on the logic here we can further establish what fixes (if any) need to be done to gperftools.
Comment 10 Paul Floyd 2025-09-21 12:55:34 UTC
(In reply to Aliaksei Kandratsenka from comment #9)
> Hi. I am gperftools maintainer.
> 
> We should perhaps coordinate what is the intended behavior here.
> 
> The code that gets into infinite loop explicitly does tc_free and tc_malloc
> because it assumes it will call _our_ code.
> 
> The expectation is if whoever is overriding memory allocation APIs they'll
> have malloc/free etc, but not tc_XYZ functions.
> 
> I am wondering why valgrind etc isn't "simply" doing that. I.e. have linker
> "find" malloc and other functions where you need them to be as opposed to
> using code rewriting magic.
> 
> Depending on the logic here we can further establish what fixes (if any)
> need to be done to gperftools.

Why don't we do "simple" things like use the linker? Because that would cause a lot of problems.

Here is a quick overview of how Valgrind works. It is a static binary that does not link to any libraries. That way we can be sure that there are no conflicts over libc or libc++ internal state, should those libraries be used by both Valgrind and the guest. Valgrind does the same job as the OS when it comes to loading the guest exe. This has two big advantages. Firstly, we get to see the whole picture for all memory, whereas tools that start after the guest exe has loaded have to handle memory whose origins they have not recorded. This also means that we check any startup code in guest exes. The other big advantage is that we don't rely on dynamic linking tricks, so we can load static exes as well. We just look at all of the ELF symbols and record the ones that we need to redirect.

In your libraries you use aliases, so malloc and tc_malloc are the same from our perspective. The same goes for free, which has 8 aliases. We could probably detect these aliases (and that could be useful for the free/new/new[] aliases). As long as you are using these aliases, it is impossible for us to tell them apart: if we replace malloc, then we also replace tc_malloc, since they are the same.
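The aliasing point can be illustrated with a small C++ sketch for GCC or Clang on an ELF platform. It defines a hypothetical `demo_malloc` and makes `demo_tc_malloc` an alias of it with the `alias` function attribute, the same kind of aliasing described for malloc/tc_malloc (the names here are made up). Both symbols resolve to one address, so a redirect table keyed on code address cannot replace one without the other. `Redirect`, `resolve()` and `tool_malloc` are hypothetical and only a simplification of what Valgrind records when it scans ELF symbols, not its internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical allocator entry points: one body, two symbol names.
extern "C" void* demo_malloc(std::size_t n) { return std::malloc(n); }
extern "C" void* demo_tc_malloc(std::size_t n)
    __attribute__((alias("demo_malloc")));

// Hypothetical replacement supplied by a checking tool.
extern "C" void* tool_malloc(std::size_t n) { return std::malloc(n); }

using AllocFn = void* (*)(std::size_t);

// Simplified redirect entry: "calls to code at `from` go to `to`".
struct Redirect {
  AllocFn from;
  AllocFn to;
};

// Resolve a call target through the table; unmatched targets run as-is.
AllocFn resolve(const std::vector<Redirect>& table, AllocFn target) {
  for (const Redirect& r : table)
    if (r.from == target) return r.to;
  return target;
}
```

If the table holds a single entry redirecting `demo_malloc` to `tool_malloc`, then `resolve()` returns `tool_malloc` for `demo_tc_malloc` as well, because the alias shares the address.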
Comment 11 Aliaksei Kandratsenka 2025-10-07 14:42:58 UTC
Thanks for the clarification. Wanting the same behavior for dynamic and static linking is fair game. So I just submitted a fix to gperftools that avoids assuming that tc_{malloc,free} are our "own" functions.

BTW, given that gperftools (and a few other allocators that attempt to make things go fast) alias various functions, Valgrind currently can't tell whether free or operator delete was called, so it gives me "bogus" warnings about operator new/free mismatches. Please consider updating Valgrind to detect this kind of aliasing and silence the warning.
Comment 12 Paul Floyd 2025-10-07 14:47:28 UTC
I did add this to the FAQ

https://valgrind.org/docs/manual/faq.html#faq.mismatches

Please could you open a bugzilla item for the mismatched free issue?
Comment 13 Marc-Oliver Straub 2025-10-07 15:30:18 UTC
Paul, thank you for updating the FAQ!