Summary: | helgrind: need to intercept duplicate libc definitions for Fedora 33 | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Stacy <stacy.gaikovaia> |
Component: | helgrind | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bart.vanassche+kde, mark, pjfloyd |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Add libc aliases for helgrind pthread function wrappers
updated patch with musl Update NEWS and capitalize macros Try to make clearer conditional macros Add configure time checking |
Description
Stacy
2020-11-09 21:18:10 UTC
(In reply to Stacy from comment #0) > ADDITIONAL INFORMATION > If I have a fix myself, can I sumbit it? How? I notice this question also > wasn't answered on https://bugs.kde.org/show_bug.cgi?id=428035. Thanks. Sorry for missing this question earlier. Best/simplest is to just attach the patch to this bug (or if it is not for this bug, to find an existing bug or open a new one and attach it to that). I'll take a look at this - I upgraded from 32 to 33 over the weekend. Currently Helgrind and DRD have slightly different approaches to the PTH_FUNC macro. The DRD version also generates both functions declaration and definition with body, whilst Helgrind generates declaration and the definition without the body. That means that there are two possible approaches: change Helgrind to be like DRD so that the macro can generate wrappers for both libpthread and libc or add another macro for libc. The first approach will produces the shorter source file. However, over on FreeBSD I do need to have two separate macros because pthjread functions are in libpthread but the semaphore functions are in libc (and not duplicated). Any thoughts or preferences? > that there are two possible approaches: change Helgrind to be like DRD so that the macro can generate wrappers for both libpthread and libc or add another macro for libc.
If the second approach needs to be done anyway in order to accommodate FreeBSD, then it makes sense to just do it right off the bat. Out of curiosity, why do the approaches between Helgrind and DRD differ anyway?
Can you post a patch? I have been fiddling but can't quite get it - I am not the best at understanding preprocessor directives.
I'm not sure about the history. DRD is more factored (but a bit less efficient as a result, at least in terms of redundant code). Helgrind hs quite a few special cases for non-Linux OSes. I'll try to post a first patch in the next couple of days. Created attachment 133454 [details]
Add libc aliases for helgrind pthread function wrappers
In the end I found a better approach. I just added weak aliases for all of the pthread functions in libc.
Thanks Paul - this is exactly what I was looking for! Stacy Created attachment 133507 [details]
updated patch with musl
This patch contains additional logic to handle build environments where libc is musl (where weak aliasing is unnecessary).
Created attachment 133548 [details]
Update NEWS and capitalize macros
I must admit that I don't understand how the weak alias trick works. Why does the same trick not work for musl? And how is solaris different? BTW. I think it would be nice to make this 2 commits (maybe even bugs), one for newer glibc and musl (if it needs to be treated different from glibc). Just to help confused hackers like me to have to deal with one thing at a time :) I'll attempt to add as many explanations as possible. First, the true nature of the problem on Fedora 33. For some very bizarre reason, pthread_cond_destroy is no longer in libpthread: paulf> nm /lib64/libpthread.so.0 | grep cond_destroy U __pthread_cond_destroy@@GLIBC_PRIVATE instead it is in libc paulf> nm /lib64/libc.so.6 | grep cond_destroy [snip loads more] 0000000000086440 T pthread_cond_destroy@GLIBC_2.2.5 00000000000869f0 T pthread_cond_destroy@@GLIBC_2.3.2 DRD and Helgrind are doing essentially the sme thing for the intercepts, but differ a bit in the details. For DRD, see drd_pthread_intercepts.c where the PTH_FUNCS macro generates 3 wrapper functions covering "name", "name@*" and "name$*". Bart extended this for both libc and libpthread (but only libthread with MUSL) Hence I now see paulf> nm vgpreload_drd-amd64-linux.so | grep pthread | grep cond | grep destroy U pthread_cond_destroy 0000000000017c90 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroy 0000000000017f70 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa 0000000000018250 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZDZa 0000000000017e00 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroy 00000000000180e0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa 00000000000183c0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZDZa Helgrind is more on a case by case basis (which means the code is a bit longer but only the required wrappers are generated). So now I see paulf> nm vgpreload_helgrind-amd64-linux.so | grep pthread | grep cond | grep destroy 000000000000a4f9 t pthread_cond_destroy_WRK 000000000000d87d W _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa 000000000000d87d T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa The difference here is that DRD has 3 separate functions whilst Helgrind has just one plus an alias. They both work the same way with the redirection mechanism - they're just symbols and calling them does the same thing. The differences between Linux and other OSes (Solaris, FreeBSD and macOS) concern the names of the libc / libppthread libraries and the mangling used). Finally, MUSL. I'm not sure if this is strictly necessary (I don't have a system to test on, perhaps I will spin up yet another Virtual Box ...). It's certainly more consistent with DRD. Hi Paul, (In reply to Paul Floyd from comment #10) > I'll attempt to add as many explanations as possible. Thanks, sorry for being so slow. It doesn't really help (IMHO) that we are trying to fix an issue with newer glibc and musl at the same time. > First, the true nature of the problem on Fedora 33. For some very bizarre > reason, pthread_cond_destroy is no longer in libpthread: > > paulf> nm /lib64/libpthread.so.0 | grep cond_destroy > U __pthread_cond_destroy@@GLIBC_PRIVATE > > instead it is in libc > > paulf> nm /lib64/libc.so.6 | grep cond_destroy > > [snip loads more] > 0000000000086440 T pthread_cond_destroy@GLIBC_2.2.5 > 00000000000869f0 T pthread_cond_destroy@@GLIBC_2.3.2 OK. I talked to a glibc hacker and their intention is to (eventually) move everything from libpthread.so.0 into libc.so.6: https://sourceware.org/legacy-ml/libc-alpha/2019-10/msg00080.html So it seems we should try to wrap all pthread functions we need to intercept in both pthread.so and libc.so to be future proof (and work with older glibc versions where all, or most, pthread_ functions are still in libpthread.so). > DRD and Helgrind are doing essentially the sme thing for the intercepts, but > differ a bit in the details. For DRD, see drd_pthread_intercepts.c where the > PTH_FUNCS macro generates 3 wrapper functions covering "name", "name@*" and > "name$*". Bart extended this for both libc and libpthread (but only > libthread with MUSL) Is libthread (without p) the name of the pthread_ functions library in MUSL? > Hence I now see > > paulf> nm vgpreload_drd-amd64-linux.so | grep pthread | grep cond | grep > destroy > U pthread_cond_destroy > 0000000000017c90 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroy > 0000000000017f70 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa > 0000000000018250 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZDZa > 0000000000017e00 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroy > 00000000000180e0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa > 00000000000183c0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZDZa OK, for this I always need to reread the Z-encoding description in pub_tool_redir.h So libcZdsoZa is the normal (glibc) libc.so* And libpthreadZdsoZd0 is the normal (glibc) libpthread.so.0 The ZAZa variant is @* And ZDZa is $* I assume we need the @* variant to intercept any versioned symbols. I am not sure why we need the $* variants, what does that match? > Helgrind is more on a case by case basis (which means the code is a bit > longer but only the required wrappers are generated). So now I see > > paulf> nm vgpreload_helgrind-amd64-linux.so | grep pthread | grep cond | > grep destroy > 000000000000a4f9 t pthread_cond_destroy_WRK > 000000000000d87d W _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa > 000000000000d87d T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa > > The difference here is that DRD has 3 separate functions whilst Helgrind has > just one plus an alias. They both work the same way with the redirection > mechanism - they're just symbols and calling them does the same thing. OK, and drd uses VG_WRAP_FUNCTION_ZZ from pub_tool_redir.h and helgrind uses I_WRAP_SONAME_FNNAME_ZZ from valgrind.h. Which seem identical, but use slightly different macro expansions. So here we only care about the versioned variants (ZAZa). And the one in libc.so.* is weak (why?) > The differences between Linux and other OSes (Solaris, FreeBSD and macOS) > concern the names of the libc / libppthread libraries and the mangling used). Right, I see them defined in pub_tool_redir.h as VG_Z_LIBC_SONAME and VG_Z_LIBPTHREAD_SONAME So one issue I had missed is that VG_Z_LIBPTHREAD_SONAME is defined as libcZdZa (libc.*) is the source of the problem with MUSL because when we duplicate the wrappers for VG_Z_LIBPTHREAD_SONAME and VG_Z_LIBC_SONAME in that case. If we want to also fix this for musl then maybe redir.h should define something like VG_LIBPTHREAD_IS_LIBC_SONAME which is 1 for MUSL, but 0 for anything else and/or define something like VG_LIBPTHREAD_FUNCS_MIGHT_BE_IN_LIBC_SONAME which would be 1 for glibc and indicates that all pthread_ function wrappers need to be duplicated? OK, I think I finally understand your patch, the alias is needed as declaration, but then you still need to define the implementation. In drd the implementation is done by calling one of the arguments of the macro, but in helgrind the implementation follows the macro as a function body, so you cannot refer to it in the PTH_FUNC macro itself. I am still somewhat unclear on why it needs to be a weak alias, but I understand at least why you want an function alias and cannot simply declare two functions in the helgrind case. Created attachment 133729 [details]
Try to make clearer conditional macros
I've split the possibilities into
VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY
VG_WRAP_THREAD_FUNCTION_LIBC_ONLY
and
VG_WRAP_THREAD_FUNCTION_LIBC_AND_LIBPTHREAD
This now affects redir.c and DRD drd_pthread_intercepts.c, which I changet to keep consistent.
MUSL no longer uses VG_Z_LIBPTHREAD_SONAME to refer to libc.*
Forgot to say. the testing that I did for this. No change on any platform. Fedora 33 amd64 Alpine Linux (for MUSL) - all Helgrind tests are failing anyway, but at least it builds OK Solaris 11.3 OS X 10.7.5 - again most Helgrind tests fail. ifdef-code like the following is hard to maintain: +#if defined(VGO_darwin) +#define VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY +#elif defined(VGO_solaris) || (defined(VGO_linux) && defined(MUSL_LIBC)) +#define VG_WRAP_THREAD_FUNCTION_LIBC_ONLY +#elif defined(VGO_linux) +#define VG_WRAP_THREAD_FUNCTION_LIBC_AND_LIBPTHREAD +#else +# error "Unknown platform" +#endif Please change this into configure tests that verify whether pthreads functions exist in libpthread, libc or in both. > Please change this into configure tests that verify whether pthreads
> functions exist in libpthread, libc or in both.
Hi Bart
That's a good idea, but the devil is in the details. I have two fairly big questions.
1) which functions to test and
2) how to test for them
For point 1, for the platforms that I know of at the moment
Fedora 33
There are 34 pthread functions in libc and 72 in libpthread (28 only in libc, 66 only in libpthread and 6 in both).
(To count these I used
nm /lib64/libc.so.6 | grep " T " | grep " pthread" | sed 's/@.*// | awk '{print $3}' | sort -u > c
and
nm /lib64/libpthread.so.0 | grep " T " | grep " pthread" | sed 's/@.*// | awk '{print $3}' | sort -u > p
to generate two text file lists c and p for libc and libpthread respectively, then I used combinations of
comm c p -2 -3 | wc
comm c p -1 -3 | wc
comm c p -1 -2 | wc
to count libc, libpthread and both.
This is a bit simplistic as I'm not counting W symbols).
For comparison, openSuSE Leap 15.2 which uses glibc 2.26 has 7 only in libc 70 only in libpthread and 24 in both
Linux MUSL, everything is in /lib/ld-musl-x86_64.so.1. libc is a symlink to this and there is no libpthread.
macOS I don't quite understand what is where, on my old macbook there is some stuff in /usr/lib/libSystem.B.dylib and /usr/lib/libpthread.dylib is just a symlink to this.
Solaris, libc contains everything and libpthread exists but just contains absolute symbols.
FreeBSD, pthread functions are in libpthread but sem functions are in libc.
Next, question b) how to test this? I'm no expert in autoconf. It seems to me that the AC_CHECK_LIB and AC_SEARCH_LIBS are not suitable since they don't exclude libc, so this would require full AC_LINK_IFELSE style tests using -nostlib and then either -lc or -lpthread, for each required platform variation.
Created attachment 133799 [details]
Add configure time checking
This patch now uses configure. The tests are fairly complicated, but seem to work OK for Fedora 33, Alpine Linux (musl), Solaris 11.3 and FreeBSD. However, I don't get what I expect for macOS. I only have access to 1 mac running 10.7 to test this on, and Apple's libc/libpthread are rather different and change fairly often.
Bottom line is that I'm not entirely sure that this is better than the prvious patch.
(In reply to Paul Floyd from comment #14) > Created attachment 133729 [details] > Try to make clearer conditional macros > > I've split the possibilities into > > VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY > VG_WRAP_THREAD_FUNCTION_LIBC_ONLY > and > VG_WRAP_THREAD_FUNCTION_LIBC_AND_LIBPTHREAD > > This now affects redir.c and DRD drd_pthread_intercepts.c, which I changet > to keep consistent. > > MUSL no longer uses VG_Z_LIBPTHREAD_SONAME to refer to libc.* Shouldn't that first #if defined(VGO_darwin) in drd_pthread_intercepts.c be #if defined(VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY) ? Otherwise looks good. This looks good, thanks. I am testing this with Fedora 33+ now. (In reply to Paul Floyd from comment #18) > Created attachment 133799 [details] > Add configure time checking > > This patch now uses configure. The tests are fairly complicated, but seem to > work OK for Fedora 33, Alpine Linux (musl), Solaris 11.3 and FreeBSD. > However, I don't get what I expect for macOS. I only have access to 1 mac > running 10.7 to test this on, and Apple's libc/libpthread are rather > different and change fairly often. > > Bottom line is that I'm not entirely sure that this is better than the > prvious patch. I find the logic in the previous patch much easier to follow. The configure tests look complex/fragile. Also I am not sure the first ac_thread_fns_only_in_libc=yes is correct. Why couldn't those (or other) functions then not also be in libpthread? Normally I would agree with Bart, but I think this is one example of when configure tests are actually harder to keep correct than the simple ifdefs based on platform. (In reply to Mark Wielaard from comment #19) > Shouldn't that first #if defined(VGO_darwin) in drd_pthread_intercepts.c be > #if defined(VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY) ? > > Otherwise looks good. > This looks good, thanks. I am testing this with Fedora 33+ now. I assumes that the never-used call to fflush() is specific to Darwin, and Darwin is the only platform that wraps only VG_Z_LIBPTHREAD_SONAME currently. If we go with this patch I'll add a comment. (In reply to Mark Wielaard from comment #20) > I find the logic in the previous patch much easier to follow. The configure > tests look complex/fragile. Also I am not sure the first > ac_thread_fns_only_in_libc=yes is correct. Why couldn't those (or other) > functions then not also be in libpthread? Indeed I'm sure it is the case that functions can be in both libraries. I'm not sure which platforms use this, and which functions are impacted, but I believe that one way of doing things is to have stub functions in libc that are weak symbols and then to have the full implementations in libpthread. So when the application is not linked against libpthread you get single threaded behaviour. When libpthread is linked, the full versions get chosesn by the link loaded and the application runs multithreaded. ac_thread_fns_only_in_libc=yes is an empirical approximation to the truth based on my experience and a bit of trial and error. > Normally I would agree with Bart, but I think this is one example of when > configure tests are actually harder to keep correct than the simple ifdefs > based on platform. I'll wait to see what Bart says before proceeding. On 12/3/20 6:54 AM, Paul Floyd wrote:
> https://bugs.kde.org/show_bug.cgi?id=428909
>
> --- Comment #22 from Paul Floyd <pjfloyd@wanadoo.fr> ---
> (In reply to Mark Wielaard from comment #20)
>>
>> Normally I would agree with Bart, but I think this is one example of when
>> configure tests are actually harder to keep correct than the simple ifdefs
>> based on platform.
>
> I'll wait to see what Bart says before proceeding.
If Mark agrees to maintain the #ifdef's, I'm fine with the ifdefs.
I'll also keep an eye on this for other platforms. |