Bug 435441 - valgrind fails to interpose malloc on musl 1.2.2 due to weak symbol name and no libc soname
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Paul Floyd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-06 23:03 UTC by Michael Forney
Modified: 2023-01-25 09:59 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Patch to handle weak symbols as global (1.04 KB, patch)
2021-11-04 21:37 UTC, Michael Forney

Description Michael Forney 2021-04-06 23:03:25 UTC
SUMMARY
Starting with musl 1.2.2, malloc is a weak symbol in libc.so. Also, musl's libc.so is built without any soname (though Alpine Linux, a popular musl-based distribution, builds it with -Wl,-soname,libc.musl-x86_64.so.1). The combination of these two seems to prevent valgrind from replacing malloc, though free is still replaced correctly.

STEPS TO REPRODUCE
1. Download and extract a musl toolchain from http://musl.cc/x86_64-linux-musl-cross.tgz
2. Install a /lib/ld-musl-x86_64.so.1 symlink with ln -s /path/to/x86_64-linux-musl-cross/x86_64-linux-musl/libc.so /lib/ld-musl-x86_64.so.1
3. Build valgrind with ./configure --host=x86_64-linux-musl CC=/path/to/x86_64-linux-musl-cross/bin/x86_64-linux-musl-gcc.
4. Compile the following C program with x86_64-linux-musl-gcc:

	#include <stdlib.h>
	int main(void) { free(malloc(32)); }

5. Run the program with ./vg-in-place ./a.out

From a musl-based system with valgrind installed, you can skip steps 2 and 3.

OBSERVED RESULT
valgrind prints the following error:

==21034== Invalid free() / delete / delete[] / realloc()
==21034==    at 0x4A0CA4B: free (vg_replace_malloc.c:538)
==21034==    by 0x48007DA: main (in /tmp/a.out)
==21034==  Address 0x4a01d10 is in a rw- mapped file /tmp/a.out segment
==21034==
==21034==
==21034== HEAP SUMMARY:
==21034==     in use at exit: 0 bytes in 0 blocks
==21034==   total heap usage: 0 allocs, 1 frees, 0 bytes allocated

This seems to be because malloc did not get replaced, so valgrind was unaware of the allocation.

EXPECTED RESULT
valgrind should not print any errors, and should detect 1 alloc and 1 free.

SOFTWARE/OS VERSIONS
Linux, musl 1.2.2

ADDITIONAL INFORMATION
Passing --soname-synonyms=somalloc=NONE or rebuilding musl libc.so with a strong malloc symbol both seem to solve the problem. However, the weak malloc symbol is needed in order to prevent __libc_malloc from clashing with a malloc linked by the application.
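
For readers unfamiliar with weak aliases, here is a minimal sketch of the general technique, assuming GCC or Clang on an ELF target. It is not musl's source: demo_internal_alloc and demo_alloc are hypothetical names, and the fixed pool stands in for a real allocator.

```c
#include <stddef.h>

/* Hypothetical internal entry point with ordinary (strong, global) binding. */
void *demo_internal_alloc(size_t n)
{
    static char pool[64];                 /* stand-in for a real heap */
    return n <= sizeof pool ? pool : NULL;
}

/* Public name exported as a weak alias of the internal symbol.
   Both names resolve to the same address; only the ELF binding differs. */
void *demo_alloc(size_t n) __attribute__((weak, alias("demo_internal_alloc")));
```

Compiling this to an object file and running nm on it should list demo_alloc as W (weak) and demo_internal_alloc as T (global), which is the kind of binding difference this report is about.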

musl developers suspect that the problem is due to something valgrind is doing with weak symbols.
Comment 1 Tom Hughes 2021-04-06 23:47:23 UTC
Surely it's just down to the fact that we don't intercept symbols with that soname, as you've proved by removing the soname match with that switch?

Does --soname-synonyms=somalloc=libc.musl-x86_64.so.1 also make it work?
Comment 2 Michael Forney 2021-04-07 00:43:41 UTC
With --soname-synonyms=somalloc=libc.musl-x86_64.so.1 the warnings go away, but this is because it makes valgrind unable to track free as well as malloc:

==31116== HEAP SUMMARY:
==31116==     in use at exit: 0 bytes in 0 blocks
==31116==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated

To be clear, by default musl libc.so does not have a soname. It is only on Alpine Linux that it is built with a soname of libc.musl-x86_64.so.1.
Comment 3 Szabolcs Nagy 2021-04-07 19:52:02 UTC
if the malloc symbol is changed to be strong in libc.so then the test passes.

in a shared library, defined symbols with weak binding should be treated exactly
the same way as symbols with global binding, so valgrind is definitely doing
something wrong.

historically glibc had bugs where the dynamic linker treated weak symbols specially,
but that was 20 years ago (see LD_DYNAMIC_WEAK in the ld.so manual).

it is not clear why valgrind cares about the soname, i think the malloc symbol
should be interposed independently of the soname of the defining library.
Comment 4 Michael Forney 2021-11-04 21:37:04 UTC
Created attachment 143217
Patch to handle weak symbols as global

Here's a patch that fixes the issue for me by setting *is_global_out=True for STB_WEAK symbols in addition to STB_GLOBAL.
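
The idea can be sketched as follows. This is not the attached patch or valgrind's actual code: binding_is_interposable is a hypothetical helper, and it assumes the standard <elf.h> macros on a Linux system.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether a symbol should be treated as
   "global" for interposition, based on the binding in its st_info byte. */
static bool binding_is_interposable(unsigned char st_info)
{
    unsigned char bind = ELF64_ST_BIND(st_info);
    /* Treat a defined weak symbol the same way as a global one. */
    return bind == STB_GLOBAL || bind == STB_WEAK;
}
```

With only the STB_GLOBAL test, a weak malloc would be classified as non-global, which matches the behaviour reported above.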
Comment 5 Paul Floyd 2021-11-22 09:13:55 UTC
Have you tested this on other platforms?

My concern with this is that it is an essential piece of code. I've seen lots of problems with pthread functions across Linux/FreeBSD/Solaris/Musl with various combinations of stubs and weak symbols.
Comment 6 Paul Floyd 2022-07-09 12:54:17 UTC
No new failures on FreeBSD.
Comment 7 Sam James 2023-01-22 23:43:49 UTC
Thanks, I was wondering why I got spurious failures on fclose() w/ Gentoo+musl. This patch sorts that.
Comment 8 Paul Floyd 2023-01-23 08:08:09 UTC
Pushed the change. I want to check that there is no fallout before closing this.

commit 5a6f1c1322d742308aaa4b3c1d937942b3de6c5a (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Mon Jan 23 09:05:50 2023 +0100

    Bug 435441 - valgrind fails to interpose malloc on musl 1.2.2 due to weak symbol name and no libc soname
    
    Patch by Michael Forney <mforney@mforney.org>
Comment 9 Paul Floyd 2023-01-25 09:59:09 UTC
I didn't see any fallout.