Bug 428909 - helgrind: need to intercept duplicate libc definitions for Fedora 33
Summary: helgrind: need to intercept duplicate libc definitions for Fedora 33
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: helgrind (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-09 21:18 UTC by Stacy
Modified: 2020-12-04 09:16 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Add libc aliases for helgrind pthread function wrappers (1.03 KB, patch)
2020-11-19 13:49 UTC, Paul Floyd
Details
updated patch with musl (1.89 KB, patch)
2020-11-20 19:25 UTC, Stacy
Details
Update NEWS and capitalize macros (1.75 KB, patch)
2020-11-22 07:43 UTC, Paul Floyd
Details
Try to make clearer conditional macros (6.48 KB, patch)
2020-11-29 15:34 UTC, Paul Floyd
Details
Add configure time checking (8.06 KB, patch)
2020-12-02 08:57 UTC, Paul Floyd
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Stacy 2020-11-09 21:18:10 UTC
SUMMARY
In this commit (https://sourceware.org/git/?p=valgrind.git;a=commit;h=15330adf7c2471fbaa6a0818db07078d81dbff97), PTH_FUNC was modified in order to intercept posix thread functions in both libc and libpthread. The same fix needs to be done in helgrind/hg_intercepts.c, I believe. 

See also https://bugs.kde.org/show_bug.cgi?id=428035 for comments on musl.

STEPS TO REPRODUCE
1. On a machine running glibc v 2.32 (eg, Fedora 33), make valgrind from source
2. run helgrind vgtests
3. pth_destroy_cond fails 

OBSERVED RESULT
helgrind/tests/pth_destroy_cond.stderr.out is empty.

EXPECTED RESULT
from helgrind/tests/pth_destroy_cond.stderr.exp: 
---Thread-Announcement------------------------------------------

Thread #x was created
   ...
   by 0x........: pthread_create@* (hg_intercepts.c:...)
   by 0x........: main (pth_destroy_cond.c:29)

----------------------------------------------------------------

Thread #x: pthread_cond_destroy: destruction of condition variable being waited upon
   at 0x........: pthread_cond_destroy_WRK (hg_intercepts.c:...)
   by 0x........: pthread_cond_destroy@* (hg_intercepts.c:...)
   by 0x........: ThreadFunction (pth_destroy_cond.c:18)
   by 0x........: mythread_wrapper (hg_intercepts.c:...)
   ...



SOFTWARE/OS VERSIONS
Windows: n/a
macOS: n/a
Linux/KDE Plasma: Poky (openembedded reference distro) 
(available in About System)
KDE Plasma Version: n/a
KDE Frameworks Version: n/a  
Qt Version: n/a

ADDITIONAL INFORMATION
If I have a fix myself, can I sumbit it? How? I notice this question also wasn't answered on https://bugs.kde.org/show_bug.cgi?id=428035. Thanks.
Comment 1 Mark Wielaard 2020-11-09 21:57:09 UTC
(In reply to Stacy from comment #0)
> ADDITIONAL INFORMATION
> If I have a fix myself, can I sumbit it? How? I notice this question also
> wasn't answered on https://bugs.kde.org/show_bug.cgi?id=428035. Thanks.

Sorry for missing this question earlier.
Best/simplest is to just attach the patch to this bug (or if it is not for this bug, to find an existing bug or open a new one and attach it to that).
Comment 2 Paul Floyd 2020-11-16 06:29:01 UTC
I'll take a look at this - I upgraded from 32 to 33 over the weekend.

Currently Helgrind and DRD have slightly different approaches to the PTH_FUNC macro. The DRD version also generates both functions declaration and definition with body, whilst Helgrind generates declaration and the definition without the body. That means that there are two possible approaches: change Helgrind to be like DRD so that the macro can generate wrappers for both libpthread and libc or add another macro for libc.

The first approach will produces the shorter source file. However, over on FreeBSD I do need to have two separate macros because pthjread functions are in libpthread but the semaphore functions are in libc (and not duplicated).

Any thoughts or preferences?
Comment 3 Stacy 2020-11-18 15:39:34 UTC
> that there are two possible approaches: change Helgrind to be like DRD so that the macro can generate wrappers for both libpthread and libc or add another macro for libc.

If the second approach needs to be done anyway in order to accommodate FreeBSD, then it makes sense to just do it right off the bat. Out of curiosity, why do the approaches between Helgrind and DRD differ anyway? 

Can you post a patch? I have been fiddling but can't quite get it - I am not the best at understanding preprocessor directives.
Comment 4 Paul Floyd 2020-11-18 20:59:50 UTC
I'm not sure about the history. DRD is more factored (but a bit less efficient as a result, at least in terms of redundant code). Helgrind hs quite a few special cases for non-Linux OSes.

I'll try to post a first patch in the next couple of days.
Comment 5 Paul Floyd 2020-11-19 13:49:20 UTC
Created attachment 133454 [details]
Add libc aliases for helgrind pthread function wrappers

In the end I found a better approach. I just added weak aliases for all of the pthread functions in libc.
Comment 6 Stacy 2020-11-19 21:14:20 UTC
Thanks Paul - this is exactly what I was looking for! 

Stacy
Comment 7 Stacy 2020-11-20 19:25:30 UTC
Created attachment 133507 [details]
updated patch with musl

This patch contains additional logic to handle build environments where libc is musl (where weak aliasing is unnecessary).
Comment 8 Paul Floyd 2020-11-22 07:43:43 UTC
Created attachment 133548 [details]
Update NEWS and capitalize macros
Comment 9 Mark Wielaard 2020-11-24 00:48:22 UTC
I must admit that I don't understand how the weak alias trick works.
Why does the same trick not work for musl?
And how is solaris different?

BTW. I think it would be nice to make this 2 commits (maybe even bugs), one for newer glibc and musl (if it needs to be treated different from glibc). Just to help confused hackers like me to have to deal with one thing at a time :)
Comment 10 Paul Floyd 2020-11-24 07:56:41 UTC
I'll attempt to add as many explanations as possible.

First, the true nature of the problem on Fedora 33. For some very bizarre reason, pthread_cond_destroy is no longer in libpthread:

paulf> nm /lib64/libpthread.so.0 | grep cond_destroy
                 U __pthread_cond_destroy@@GLIBC_PRIVATE

instead it is in libc

paulf> nm /lib64/libc.so.6  | grep cond_destroy                               
[snip loads more]
0000000000086440 T pthread_cond_destroy@GLIBC_2.2.5
00000000000869f0 T pthread_cond_destroy@@GLIBC_2.3.2

DRD and Helgrind are doing essentially the sme thing for the intercepts, but differ a bit in the details. For DRD, see drd_pthread_intercepts.c where the PTH_FUNCS macro generates 3 wrapper functions covering "name", "name@*" and "name$*". Bart extended this for both libc and libpthread (but only libthread with MUSL)

Hence I now see

paulf> nm vgpreload_drd-amd64-linux.so | grep pthread | grep cond | grep destroy
                 U pthread_cond_destroy
0000000000017c90 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroy
0000000000017f70 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa
0000000000018250 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZDZa
0000000000017e00 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroy
00000000000180e0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa
00000000000183c0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZDZa


Helgrind is more on a case by case basis (which means the code is a bit longer but only the required wrappers are generated). So now I see

paulf> nm vgpreload_helgrind-amd64-linux.so | grep pthread | grep cond | grep destroy 
000000000000a4f9 t pthread_cond_destroy_WRK
000000000000d87d W _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa
000000000000d87d T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa

The difference here is that DRD has 3 separate functions whilst Helgrind has just one plus an alias. They both work the same way with the redirection mechanism - they're just symbols and calling them does the same thing.

The differences between Linux and other OSes (Solaris, FreeBSD and macOS) concern the names of the libc / libppthread libraries and the mangling used).

Finally, MUSL. I'm not sure if this is strictly necessary (I don't have a system to test on, perhaps I will spin up yet another Virtual Box ...). It's certainly more consistent with DRD.
Comment 11 Mark Wielaard 2020-11-27 15:04:17 UTC
Hi Paul,

(In reply to Paul Floyd from comment #10)
> I'll attempt to add as many explanations as possible.

Thanks, sorry for being so slow.
It doesn't really help (IMHO) that we are trying to fix an issue with newer glibc and musl at the same time.

> First, the true nature of the problem on Fedora 33. For some very bizarre
> reason, pthread_cond_destroy is no longer in libpthread:
> 
> paulf> nm /lib64/libpthread.so.0 | grep cond_destroy
>                  U __pthread_cond_destroy@@GLIBC_PRIVATE
> 
> instead it is in libc
> 
> paulf> nm /lib64/libc.so.6  | grep cond_destroy                             
> 
> [snip loads more]
> 0000000000086440 T pthread_cond_destroy@GLIBC_2.2.5
> 00000000000869f0 T pthread_cond_destroy@@GLIBC_2.3.2

OK. I talked to a glibc hacker and their intention is to (eventually) move everything from libpthread.so.0 into libc.so.6:
https://sourceware.org/legacy-ml/libc-alpha/2019-10/msg00080.html

So it seems we should try to wrap all pthread functions we need to intercept in both pthread.so and libc.so to be future proof (and work with older glibc versions where all, or most, pthread_ functions are still in libpthread.so).

> DRD and Helgrind are doing essentially the sme thing for the intercepts, but
> differ a bit in the details. For DRD, see drd_pthread_intercepts.c where the
> PTH_FUNCS macro generates 3 wrapper functions covering "name", "name@*" and
> "name$*". Bart extended this for both libc and libpthread (but only
> libthread with MUSL)

Is libthread (without p) the name of the pthread_ functions library in MUSL?
 
> Hence I now see
> 
> paulf> nm vgpreload_drd-amd64-linux.so | grep pthread | grep cond | grep
> destroy
>                  U pthread_cond_destroy
> 0000000000017c90 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroy
> 0000000000017f70 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa
> 0000000000018250 T _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZDZa
> 0000000000017e00 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroy
> 00000000000180e0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa
> 00000000000183c0 T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZDZa

OK, for this I always need to reread the Z-encoding description in pub_tool_redir.h

So libcZdsoZa is the normal (glibc) libc.so*
And libpthreadZdsoZd0 is the normal (glibc) libpthread.so.0

The ZAZa variant is @*
And ZDZa is $*

I assume we need the @* variant to intercept any versioned symbols.
I am not sure why we need the $* variants, what does that match?

> Helgrind is more on a case by case basis (which means the code is a bit
> longer but only the required wrappers are generated). So now I see
> 
> paulf> nm vgpreload_helgrind-amd64-linux.so | grep pthread | grep cond |
> grep destroy 
> 000000000000a4f9 t pthread_cond_destroy_WRK
> 000000000000d87d W _vgw00000ZZ_libcZdsoZa_pthreadZucondZudestroyZAZa
> 000000000000d87d T _vgw00000ZZ_libpthreadZdsoZd0_pthreadZucondZudestroyZAZa
> 
> The difference here is that DRD has 3 separate functions whilst Helgrind has
> just one plus an alias. They both work the same way with the redirection
> mechanism - they're just symbols and calling them does the same thing.

OK, and drd uses VG_WRAP_FUNCTION_ZZ from pub_tool_redir.h and helgrind uses I_WRAP_SONAME_FNNAME_ZZ from valgrind.h. Which seem identical, but use slightly different macro expansions.

So here we only care about the versioned variants (ZAZa).
And the one in libc.so.* is weak (why?)

> The differences between Linux and other OSes (Solaris, FreeBSD and macOS)
> concern the names of the libc / libppthread libraries and the mangling used).

Right, I see them defined in pub_tool_redir.h as VG_Z_LIBC_SONAME and VG_Z_LIBPTHREAD_SONAME
Comment 12 Mark Wielaard 2020-11-27 15:20:38 UTC
So one issue I had missed is that VG_Z_LIBPTHREAD_SONAME is defined as libcZdZa (libc.*) is the source of the problem with MUSL because when we duplicate the wrappers for VG_Z_LIBPTHREAD_SONAME and VG_Z_LIBC_SONAME in that case.

If we want to also fix this for musl then maybe redir.h should define something like VG_LIBPTHREAD_IS_LIBC_SONAME which is 1 for MUSL, but 0 for anything else and/or define something like VG_LIBPTHREAD_FUNCS_MIGHT_BE_IN_LIBC_SONAME which would be 1 for glibc and indicates that all pthread_ function wrappers need to be duplicated?
Comment 13 Mark Wielaard 2020-11-27 15:52:47 UTC
OK, I think I finally understand your patch, the alias is needed as declaration, but then you still need to define the implementation. In drd the implementation is done by calling one of the arguments of the macro, but in helgrind the implementation follows the macro as a function body, so you cannot refer to it in the PTH_FUNC macro itself.

I am still somewhat unclear on why it needs to be a weak alias, but I understand at least why you want an function alias and cannot simply declare two functions in the helgrind case.
Comment 14 Paul Floyd 2020-11-29 15:34:39 UTC
Created attachment 133729 [details]
Try to make clearer conditional macros

I've split the possibilities into

VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY
VG_WRAP_THREAD_FUNCTION_LIBC_ONLY
and
VG_WRAP_THREAD_FUNCTION_LIBC_AND_LIBPTHREAD

This now affects redir.c and DRD drd_pthread_intercepts.c, which I changet to keep consistent.

MUSL no longer uses VG_Z_LIBPTHREAD_SONAME to refer to libc.*
Comment 15 Paul Floyd 2020-11-29 15:37:58 UTC
Forgot to say. the testing that I did for this. No change on any platform.

Fedora 33 amd64
Alpine Linux (for MUSL) - all Helgrind tests are failing anyway, but at least it builds OK
Solaris 11.3
OS X 10.7.5 - again most Helgrind tests fail.
Comment 16 Bart Van Assche 2020-11-29 22:57:39 UTC
ifdef-code like the following is hard to maintain:

+#if defined(VGO_darwin)
+#define VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY
+#elif defined(VGO_solaris) || (defined(VGO_linux) && defined(MUSL_LIBC))
+#define VG_WRAP_THREAD_FUNCTION_LIBC_ONLY
+#elif defined(VGO_linux)
+#define VG_WRAP_THREAD_FUNCTION_LIBC_AND_LIBPTHREAD
+#else
+#  error "Unknown platform"
+#endif

Please change this into configure tests that verify whether pthreads functions exist in libpthread, libc or in both.
Comment 17 Paul Floyd 2020-11-30 09:31:57 UTC
> Please change this into configure tests that verify whether pthreads
> functions exist in libpthread, libc or in both.

Hi Bart

That's a good idea, but the devil is in the details. I have two fairly big questions.

1) which functions to test and
2) how to test for them

For point 1, for the platforms that I know of at the moment

Fedora 33
There are 34 pthread functions in libc and 72 in libpthread (28 only in libc, 66 only in libpthread and 6 in both).

(To count these I used

nm /lib64/libc.so.6 | grep " T " | grep " pthread" | sed 's/@.*// | awk '{print $3}' | sort -u > c

and 

nm /lib64/libpthread.so.0  | grep " T " | grep " pthread" | sed 's/@.*// | awk '{print $3}' | sort -u > p

to generate two text file lists c and p for libc and libpthread respectively, then I used combinations of

comm c p -2 -3 | wc
comm c p -1 -3 | wc
comm c p -1 -2 | wc

to count libc, libpthread and both.

This is a bit simplistic as I'm not counting W symbols).

For comparison, openSuSE Leap 15.2 which uses glibc 2.26 has 7 only in libc 70 only in libpthread and 24 in both

Linux MUSL, everything is in /lib/ld-musl-x86_64.so.1. libc is a symlink to this and there is no libpthread.

macOS I don't quite understand what is where, on my old macbook there is some stuff in /usr/lib/libSystem.B.dylib and /usr/lib/libpthread.dylib is just a symlink to this.

Solaris, libc contains everything and libpthread exists but just contains absolute symbols.

FreeBSD, pthread functions are in libpthread but sem functions are in libc.

Next, question b) how to test this? I'm no expert in autoconf. It seems to me that the AC_CHECK_LIB and AC_SEARCH_LIBS are not suitable since they don't exclude libc, so this would require full AC_LINK_IFELSE style tests using -nostlib and then either -lc or -lpthread, for each required platform variation.
Comment 18 Paul Floyd 2020-12-02 08:57:24 UTC
Created attachment 133799 [details]
Add configure time checking

This patch now uses configure. The tests are fairly complicated, but seem to work OK for Fedora 33, Alpine Linux (musl), Solaris 11.3 and FreeBSD. However, I don't get what I expect for macOS. I only have access to 1 mac running 10.7 to test this on, and Apple's libc/libpthread are rather different and change fairly often.

Bottom line is that I'm not entirely sure that this is better than the prvious patch.
Comment 19 Mark Wielaard 2020-12-03 11:15:24 UTC
(In reply to Paul Floyd from comment #14)
> Created attachment 133729 [details]
> Try to make clearer conditional macros
> 
> I've split the possibilities into
> 
> VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY
> VG_WRAP_THREAD_FUNCTION_LIBC_ONLY
> and
> VG_WRAP_THREAD_FUNCTION_LIBC_AND_LIBPTHREAD
> 
> This now affects redir.c and DRD drd_pthread_intercepts.c, which I changet
> to keep consistent.
> 
> MUSL no longer uses VG_Z_LIBPTHREAD_SONAME to refer to libc.*

Shouldn't that first #if defined(VGO_darwin) in drd_pthread_intercepts.c be  #if defined(VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY) ?

Otherwise looks good. 
This looks good, thanks. I am testing this with Fedora 33+ now.
Comment 20 Mark Wielaard 2020-12-03 11:24:50 UTC
(In reply to Paul Floyd from comment #18)
> Created attachment 133799 [details]
> Add configure time checking
> 
> This patch now uses configure. The tests are fairly complicated, but seem to
> work OK for Fedora 33, Alpine Linux (musl), Solaris 11.3 and FreeBSD.
> However, I don't get what I expect for macOS. I only have access to 1 mac
> running 10.7 to test this on, and Apple's libc/libpthread are rather
> different and change fairly often.
> 
> Bottom line is that I'm not entirely sure that this is better than the
> prvious patch.

I find the logic in the previous patch much easier to follow. The configure tests look complex/fragile. Also I am not sure the first ac_thread_fns_only_in_libc=yes is correct. Why couldn't those (or other) functions then not also be in libpthread?

Normally I would agree with Bart, but I think this is one example of when configure tests are actually harder to keep correct than the simple ifdefs based on platform.
Comment 21 Paul Floyd 2020-12-03 14:40:33 UTC
(In reply to Mark Wielaard from comment #19)

> Shouldn't that first #if defined(VGO_darwin) in drd_pthread_intercepts.c be 
> #if defined(VG_WRAP_THREAD_FUNCTION_LIBPTHREAD_ONLY) ?
> 
> Otherwise looks good. 
> This looks good, thanks. I am testing this with Fedora 33+ now.


I assumes that the never-used call to fflush() is specific to Darwin, and Darwin is the only platform that wraps only VG_Z_LIBPTHREAD_SONAME currently.

If we go with this patch I'll add a comment.
Comment 22 Paul Floyd 2020-12-03 14:54:26 UTC
(In reply to Mark Wielaard from comment #20)

> I find the logic in the previous patch much easier to follow. The configure
> tests look complex/fragile. Also I am not sure the first
> ac_thread_fns_only_in_libc=yes is correct. Why couldn't those (or other)
> functions then not also be in libpthread?

Indeed I'm sure it is the case that functions can be in both libraries.
I'm not sure which platforms use this, and which functions are impacted,
but I believe that one way of doing things is to have stub functions in
libc that are weak symbols and then to have the full implementations in
libpthread. So when the application is not linked against libpthread you
get single threaded behaviour. When libpthread is linked, the full versions
get chosesn by the link loaded and the application runs multithreaded.

ac_thread_fns_only_in_libc=yes is an empirical approximation to the truth
based on my experience and a bit of trial and error.
 
> Normally I would agree with Bart, but I think this is one example of when
> configure tests are actually harder to keep correct than the simple ifdefs
> based on platform.

I'll wait to see what Bart says before proceeding.
Comment 23 Bart Van Assche 2020-12-04 04:48:09 UTC
On 12/3/20 6:54 AM, Paul Floyd wrote:
> https://bugs.kde.org/show_bug.cgi?id=428909
> 
> --- Comment #22 from Paul Floyd <pjfloyd@wanadoo.fr> ---
> (In reply to Mark Wielaard from comment #20)
>>
>> Normally I would agree with Bart, but I think this is one example of when
>> configure tests are actually harder to keep correct than the simple ifdefs
>> based on platform.
> 
> I'll wait to see what Bart says before proceeding.

If Mark agrees to maintain the #ifdef's, I'm fine with the ifdefs.
Comment 24 Paul Floyd 2020-12-04 09:16:28 UTC
I'll also keep an eye on this for other platforms.