Bug 206013 - valgrind redirection doesn't work with STT_GNU_IFUNC symbols
Summary: valgrind redirection doesn't work with STT_GNU_IFUNC symbols
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.5 SVN
Platform: Compiled Sources Linux
: NOR major
Target Milestone: blocking3.5.1
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-02 17:37 UTC by Jakub Jelinek
Modified: 2011-08-12 15:46 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Candidate patch against valgrind trunk (18.68 KB, patch)
2009-09-30 00:20 UTC, Dodji Seketeli
Details
output of "valgrind -v /bin/true" without the patch (28.22 KB, patch)
2009-09-30 00:22 UTC, Dodji Seketeli
Details
Output of "valgrind -v /bin/true" with the patch (2.34 KB, patch)
2009-09-30 00:23 UTC, Dodji Seketeli
Details
Updated patch (20.69 KB, patch)
2009-10-02 15:32 UTC, Dodji Seketeli
Details
Another patch trying another approach (6.06 KB, patch)
2009-10-04 20:44 UTC, Dodji Seketeli
Details
Another candidate patch (29.20 KB, patch)
2009-10-07 16:39 UTC, Dodji Seketeli
Details
Updated patch (29.38 KB, patch)
2009-10-07 21:27 UTC, Dodji Seketeli
Details
Updated patch that passes regtests ;) (31.35 KB, patch)
2009-10-13 16:24 UTC, Dodji Seketeli
Details
New IFUNC patch (18.27 KB, patch)
2009-10-28 13:48 UTC, Tom Hughes
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2009-09-02 17:37:18 UTC
glibc 2.10.90*+ on x86_64 uses STT_GNU_IFUNC relocations for many string functions.  Unfortunately this seems to break symbol redirection, because what you have in the symbol table for STT_GNU_IFUNC symbols is not say strstr address, but address of a function which returns the address of a strstr implementation.

To fix this, for STT_GNU_IFUNC symbols I think either we should redirect the address of the indirect function to a function which returns address of the real redirected strstr etc. (this assumes valgrind redirection happens before the shared library is actually relocated), or we'll need to call this function and redirect whatever address it returned.

Fedora development (rawhide) shows this problem on basically all binaries on x86_64.
Comment 1 Dodji Seketeli 2009-09-30 00:20:36 UTC
Created attachment 37249 [details]
Candidate patch against valgrind trunk

The attached is an early implementation of indirect functions (of type STT_GNU_IFUNC) redirection.

Here is a little bit of context about what it does.

For instance, in the case of strlen, the idea is to redirect the indirect strlen function in libc to a little function in valgrind that takes no parameter and returns a pointer to the valgrind replacement of strlen.

This approach seemed quite simple to me. So I provided redirections for a couple of indirect functions, namely strlen, strcmp and strncmp.

There is another issue that appeared. In glibc, the indirect function (say strlen) returns a pointer to an actual function that is going to be used by objects that reference strlen. That actual function name starts with the string "__GI_". In the case of strlen, the actual function is __GI_strlen, for instance.
The issue is that glibc itself call those __GI_* directly without calling the indirect strlen.

So we need to provide redirection for those __GI_* functions as well.

The attached patch provides redirection for a couple of __GI_* functions as well as redirection of a couple of indirect functions. If the approach of the patch seems correct, I can go on and add much more redirections.

To be able to redirect indirect functions, the patch added support for STT_GNU_IFUNC (indirect functions) into the elf reading machinery. It also amended the redirection machinery to redirect indirect functions to known indirect functions defined in valgrind.

For the sake of clarity, the contains in preamble a GNU style detailed changelog. I believe I will have to add more comments in the code itself instead, but I will wait for comments on the global approach before diving into that.

As far as results are concerned, I could make valgrind -v /bin/true display 0 errors with the patch, whereas it  would yield many errors without the patch. I will shortly attach a file with the errors reported by valgrind -v /bin/true without the patch.

Comments ?
Comment 2 Dodji Seketeli 2009-09-30 00:22:11 UTC
Created attachment 37250 [details]
output of "valgrind -v /bin/true" without the patch
Comment 3 Dodji Seketeli 2009-09-30 00:23:09 UTC
Created attachment 37251 [details]
Output of "valgrind -v /bin/true" with the patch
Comment 4 Dodji Seketeli 2009-10-02 15:32:56 UTC
Created attachment 37327 [details]
Updated patch

This is an updated and hopefuly more solid patch than the previous one.
Comment 5 Dodji Seketeli 2009-10-04 20:36:56 UTC
(In reply to comment #4)
> Created an attachment (id=37327) [details]
> Updated patch

There is at least one serious problem with the patch above.

There are sometimes references to STT_GNU_IFUNC functions that yields assembly code that references the GOT entry of the said indirect function.

Consider for instance this code snippet taken from the gconf_client_flush_notifies function in gconf-client.c:

~=~
to_notify = g_slist_sort (client->notify_list, (GCompareFunc) strcmp);
~=

Taking the address of the strcmp function (that is a STT_GNU_IFUNC function) yields this assembly snippet, right before calling the g_slist_sort function:

0x00000031c76263ca mov  0x215de7(%rip),%rsi

The rsi register is the last parameter of the g_slist_sort function. So it is loaded with 0x215de7(%rip) which is the address of the GOT table entry of the strcmp function.

That GOT table entry should be filled (by the dynamic linker) with the absolute address of the target of the indirect strcmp function, _only when_ the indirect strcmp is *called* for the first time.

The problem is that the patch redirects the indirect strcmp function to another implementation provided by valgrind. So the initial indirect strcmp is *never* called. So the GOT entry 0x215de7(%rip) is *never* updated by the dynamic linker.

The net effect is that g_slist_sort is passed garbage (NULL in our case) instead of the address of strcmp.
Comment 6 Dodji Seketeli 2009-10-04 20:44:39 UTC
Created attachment 37363 [details]
Another patch trying another approach

This patch just redirects the targets of the STT_GNU_IFUNC functions of the libc that we are interested in.

It's much simpler than the previous one and it seems to work better.
Comment 7 Jakub Jelinek 2009-10-05 18:27:16 UTC
IMHO we want something like:

Add
Addr ifunc_to_addr;
field to Active type in m_redir.c.

In generate_and_add_actives when seeing a STT_GNU_IFUNC                                
(seginfo_syms_getidx would return boolean another arg by reference),
set act.to_addr to wrap_ifunc below and set act.ifunc_to_addr to
sp->to_addr.

void *wrap_ifunc (void)
{
    OrigFn fn;
    void *result;
    Active *old, act;   
    VALGRIND_GET_ORIG_FN(fn);
    CALL_FN_W_v(result, fn);
    act.from_addr = (Addr) fn;
    old = VG_(OSetGen_Lookup)( activeSet, &act.from_addr ); 
    vg_assert (old != NULL);
    vg_assert (old->ifunc_to_addr != NULL);
    act = *old;
    act.from_addr = (Addr) result;
    act.to_addr = old->ifunc_to_addr;
    act.ifunc_to_addr = NULL;
    maybe_add_active (&act);
    return result;
}

That way, we'll run the original indirect function to tell us which of the possible many implementations of the string routine will be used and record redirection for it when it finishes.  It is possible this can't work this exact way, if wrap_ifunc runs as emulated and the old = VG_(OSetGen_Lookup)( activeSet, &act.from_addr ); through maybe_add_active must run natively, then some sort of special insn to get this behavior from wrap_ifunc would be needed.
Comment 8 Dodji Seketeli 2009-10-07 16:39:44 UTC
Created attachment 37425 [details]
Another candidate patch

Jakub, thanks for your explanation.

This patch should implement the idea you expressed, augmented by the discussions we had on IRC. I have tested it on x86_64.
Comment 9 Jakub Jelinek 2009-10-07 17:05:51 UTC
Minor comments:
1) you should wrap all uses of STT_GNU_IFUNC with #ifdef STT_GNU_IFUNC, not just some
2) why do you define VG_REPLACE_I_FUNCTION_ZZ at all?  Also, isn't VG_WRAP_I_FUNCTION_ZZ meant to be used just once, thus shouldn't be documented that it doesn't need to be added for each symbols?
3) the __GI_ redirections should be wrapped with #if defined(VGO_linux), those are glibc specific.
Comment 10 Dodji Seketeli 2009-10-07 21:27:54 UTC
Created attachment 37434 [details]
Updated patch

Jakub, thanks for the review.
This attached patch should address your comments.
Comment 11 Dodji Seketeli 2009-10-13 16:24:26 UTC
Created attachment 37556 [details]
Updated patch that passes regtests ;)

For a given ifunc symbol, this patch only activates a given ifunc --> normal func redirection only if there is a nomal func --> normal func redirection that matches the ifunc symbol.

For instance, suppose we have an strlen ifunc in libc that points to the normal function __GI_strlen.

If a redirection strlen --> vg_strlen is registered, then, and only then, the ifunc strlen will be redirected to a wrapper that will execute the ifunc strlen, get the pointer to the __GI_strlen it returns and activate the __GI_strlen --> vg_strlen redirection.

Otherwise, if no strlen --> vg_strlen redirection is registered, nothing happens. No redirection is activated with respect to strlen.

This patch does seem to pass valgrind regression tests on x86_64 Fedora Rawhide (which is somewhere between Fedora 12 Alpha and Fedora 12 final at the moment).
Comment 12 Tom Hughes 2009-10-28 11:30:35 UTC
Can you explain why you going to the trouble of doing the conditional redirection?

If there is a redirection for the strlen IFUNC and we see a strlen IFUNC surely we should redirect it, regardless of whether or not there is any redirection for ordinary strlen? Why would anybody add one redirection to valgrind and not the other anyway?
Comment 13 Jakub Jelinek 2009-10-28 12:16:43 UTC
Older versions of the patch were redirecting all STT_GNU_IFUNC symbols, no matter whether they need to be redirected or not.  STT_GNU_IFUNC can be used for all kinds of symbols, and for 99.99% of symbols valgrind doesn't want to redirect them.

STT_GNU_IFUNC is just a type of the symbol some relocations resolve to, so you don't have ordinary and IFUNC strlen at the same time (well, in theory you could have it in different DSOs, but strlen relocations will resolve to just one of them anyway).
Comment 14 Tom Hughes 2009-10-28 12:33:33 UTC
Sure, but the current implementation goes beyond just catching the ones which need to be redirected and does this complicated dance to match up the indirect redirection with a normal redirection.

It looks like the main purpose is to avoid adding the "real" target address of the redirection to the active table and instead keep it in a separate table, but I'm still working through the patch at the moment.
Comment 15 Dodji Seketeli 2009-10-28 12:40:18 UTC
You can think of it this way.

What you want to redirect is, say, strlen. You don't want to know if strlen actually resolves to an STT_GNU_IFUNC or not. You just redirect the strlen symbol.

It under the hood it resolves to a strlen symbol, then the core will take of redirecting the referenced IFUNC symbol.
Comment 16 Julian Seward 2009-10-28 12:42:41 UTC
(In reply to comment #14)
> [...] complicated dance to match up the indirect
> redirection with a normal redirection.

+1 for the simplest possible and least Linux-specific solution to
this.  The redirection stuff is already more complex than I'd prefer.
Comment 17 Tom Hughes 2009-10-28 12:42:56 UTC
I understand what it's doing now - there is only actually one IFUNC wrapper which is actually a wrapper from *:* that matches any function in any SO and so the code then only triggers when there is also any ordinary redirection that matches.

I can't help feeling that might be just a bit too clever though, and it might be simpler just to explicitly declare ifunc wrappers for the functions which need them.

My concern is about how this works in the general case, as the current setup requires that valgrind sees any ifunc wrapper before the corresponding ordinary redirect. A wrapper for *:* also makes it difficult for anybody to ever create a more specific wrapper if they wanted.
Comment 18 Jakub Jelinek 2009-10-28 12:50:50 UTC
I also think having two hash tables is unnecessary.  We just need some magic way how to find the magic single ifunc wrapper in vgpreload*core*.so.  We'd add
Addr to_ifunc_addr; to Active type, then whenever trying to redirect some symbol which is defined with STT_GNU_IFUNC type we'd just put the magic ifunc wrapper address into to_addr and the real target into to_ifunc_addr.  The magic ifunc wrapper then could use the standard hash table for redirections, and would just create a new redirection with the discovered address as from_addr and to_ifunc_addr as to_addr.
Comment 19 Tom Hughes 2009-10-28 12:52:11 UTC
It's OK I've worked out how to do it I think. Instead of exposing the concept of ifunc wrapping to tools it will just be something the core magically does.

Patch coming shortly...
Comment 20 Tom Hughes 2009-10-28 13:48:18 UTC
Created attachment 37931 [details]
New IFUNC patch

Here's my version of the IFUNC patch.

It adds the same wrapper function as before but uses VG_NOTIFY_ON_LOAD to tell the core about it rather than exposing a general system for IFUNC wrapping to all the tools.

The core then returns the address of that wrapper as the target for any IFunc redirection instead of the to_addr from the redirection and the client request that wrapper makes installs a normal redirection using the to_addr from the original IFunc redirection.

Tested on rawhide and appears to work.
Comment 21 Tom Hughes 2009-10-29 10:28:19 UTC
Committed as r10920.
Comment 22 Julian Seward 2011-08-12 15:46:41 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Created an attachment (id=37327) [details] [details]
> > Updated patch
> 
> There is at least one serious problem with the patch above. [...]

Using a marginally modified version of this patch, I cannot reproduce
the problem, at least on Ubuntu 10.04.2 x86_64 (glibc-2.11) when
running Firefox.