Bug 275284 - Valgrind memcpy/memmove redirection stopped working in glibc 2.14/x86_64
Summary: Valgrind memcpy/memmove redirection stopped working in glibc 2.14/x86_64
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.6 SVN
Platform: Unlisted Binaries Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL: https://bugzilla.redhat.com/show_bug....
Keywords:
: 275212 278502 281828 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-06-09 14:36 UTC by Jakub Jelinek
Modified: 2011-09-11 21:41 UTC (History)
6 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
valgrind-amd64-linux-memcpy.patch (3.41 KB, patch)
2011-06-09 14:54 UTC, Jakub Jelinek
Details
Last piece of the puzzle (first attempt) (35.64 KB, patch)
2011-08-16 15:43 UTC, Julian Seward
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2011-06-09 14:36:09 UTC
glibc 2.14+ contains a workaround for lame programs whose programmers don't understand the difference between memcpy and memmove.
In particular, we have:
  1103: 00000035e7a83160    71 IFUNC   GLOBAL DEFAULT   12 memcpy@GLIBC_2.2.5
  1104: 00000035e7a88d40    60 IFUNC   GLOBAL DEFAULT   12 memcpy@@GLIBC_2.14
  1944: 00000035e7a83160    71 IFUNC   GLOBAL DEFAULT   12 memmove@@GLIBC_2.2.5
in libc.so.6 .dynsym and:
  5575: 00000035e7a83160    71 IFUNC   GLOBAL DEFAULT   12 memcpy@GLIBC_2.2.5
  6284: 00000035e7a83160    71 IFUNC   GLOBAL DEFAULT   12 memmove
  7443: 00000035e7a88d40    60 IFUNC   GLOBAL DEFAULT   12 memcpy@@GLIBC_2.14
in libc.so.6 .symtab.  The reason for this is to keep old broken apps like flash which Adobe wasn't able to fix after many months since it has been reported to them, old libraries or binaries when using memcpy will just have it implemented as memmove, and only newly linked apps will perform real memcpy.
Unfortunately it seems that then neither memcpy nor memmove are then redirected
to memcheck's own version.
Given a short testcase compiled with -O0:
extern void *memcpy (void *, const void *, __SIZE_TYPE__);
extern void *memmove (void *, const void *, __SIZE_TYPE__);
extern void *malloc (__SIZE_TYPE__);
extern void *memset (void *, int, __SIZE_TYPE__);
extern int posix_memalign(void **, __SIZE_TYPE__, __SIZE_TYPE__);

int
main (void)
{
  int n = 128 * 1024 * 1024;
  void *p, *q;
  if (posix_memalign (&p, 4096, n - 3)
      || posix_memalign (&q, 4096, n - 3))
    return 1;
  memset (p, 1, n - 3);
  memmove (q, p, n - 3);
  return 0;
}

I see:
valgrind -v -v --trace-symtab=yes /tmp/zzz 2>&1 | grep '\(prefer\|REDIR\).*mem\(cpy\|move\|set\)'
sym at 0x35e7a82fd0: prefer '__GI_memmove' to '__memmove_sse2'
sym at 0x35e7a83160: prefer 'memcpy' to 'memmove'
sym at 0x35e7a83160: prefer 'memcpy@GLIBC_2.2.5' to 'memmove'
sym at 0x35e7a831f0: prefer '__GI_memset' to '__memset_sse2'
sym at 0x35e7a88d40: prefer 'memcpy@@GLIBC_2.14' to 'memcpy'
sym at 0x35e7a88d90: prefer '__GI_memcpy' to '__memcpy_sse2'
sym at 0x35e7a90430: prefer 'wmemcpy' to '__wmemcpy'
sym at 0x35e7a90450: prefer 'wmemset' to '__GI_wmemset'
sym at 0x35e7af4870: prefer '__memset_chk' to '__memset_zero_constant_len_parameter'
sym at 0x35e7a83160: prefer 'memcpy@GLIBC_2.2.5' to 'memcpy'
sym at 0x35e7a88d40: prefer 'memcpy@@GLIBC_2.14' to '__new_memcpy'
sym at 0x35e7a90440: prefer 'wmemmove' to '__wmemmove'
--20575-- REDIR: 0x35e7a831b0 (memset) redirected to 0x48015d0 (_vgnU_ifunc_wrapper)
--20575-- REDIR: 0x35e7a831f0 (__GI_memset) redirected to 0x4a08990 (memset)

As you can see, memmove wasn't redirected, supposedly because storage.c preferred memcpy over memmove there, and then preferred memcpy@GLIBC_2.2.5 over memcpy for the same address.
Comment 1 Tom Hughes 2011-06-09 14:44:08 UTC
Is this related to #275212 perhaps - if we're not catching it then glibc's selection routine kicks in and sends us off to the ssse3 versions?
Comment 2 Jakub Jelinek 2011-06-09 14:54:48 UTC
Created attachment 60827 [details]
valgrind-amd64-linux-memcpy.patch

Untested patch (except for this testcase).
It would be much better if we could differentiate based on whether the caller was calling memcpy@GLIBC_2.2.5 or memmove or memmove@@GLIBC_2.2.5, if the former, do the overlapping checks, otherwise avoid those, but not sure how to do that when
they are calling the same address in libc.so.6.
Comment 3 Jakub Jelinek 2011-06-09 14:57:10 UTC
Yeah, #275212 should be closed as a dup of this.  Special casing just one of the many memcpy/memmove implementations is going to work only on some hw and not on other hw.
Comment 4 Tom Hughes 2011-06-09 15:03:01 UTC
I was thinking that a solution might be to prefer the linkable symbols (with @@ in them) over the legacy symbols (with the single @ in them).

The problem with that though is that we don't currently see the version decorations on the name when processing the dynamic symbol table (hence the appearance of the plain memcpy symbol).

It looks like .symtab actually adds the version decoration to the name, but .dynsym must store the version as separate metadata. Actually it looks like it does it by adding symbols with a name of GLIBC_N.NN and size zero, with each symbol being versioned according to the previous such symbol?
Comment 5 Tom Hughes 2011-06-09 15:04:40 UTC
*** Bug 275212 has been marked as a duplicate of this bug. ***
Comment 6 Jakub Jelinek 2011-06-09 15:28:50 UTC
.symtab only contains explicit @ and @@s when the source contained them through an .symver directive.  Normally this isn't done, then only one symbol of that name is defined and symbol version script determines which symbol version it gets during linking (if doing symbol versioning at all).  For the case where one symbol needs to be available in more than once version, or when it is only a compatibility symbol that shouldn't be available for linking against for new programs/libraries, .symver directives are used and then it shows up in .symtab.

For dynamic symbol resolution, the @ and @@ chars and symbol version names indeed don't appear in .dynsym, you need to figure it out from .gnu.version section (SHT_GNU_versym).  See http://www.akkadia.org/drepper/symbol-versioning for details.  .gnu.version section contains 2 bytes per symbol, which includes symbol version id and in msb bit flag whether it is a hidden symbol or not (if set, it is @ symbol, if unset it is @@ symbol).

Though, the symbol versions used for glibc stringop/malloc etc. symbols differ a lot between different targets, so if those were to be used always, it would be quite hard to maintain.

FOr the sole purpose of this memcpy alias to memmove and new memcpy, I think my patch should work if .symtab is available, and if it is not available perhaps just a hack to prefer memmove over memcpy symbol (as an exception in storage.c (prefersym) could do the job.
Comment 7 Julian Seward 2011-07-25 08:47:07 UTC
I read through this a couple of times.  I'm trying to get the trunk
to work sanely on a Sandy Bridge box running F15, which it doesn't
right now.

I'm not sure I understand all the details.  But one thing that strikes
me is that the problem exists (or is made worse?) because our symbol
table mechanism has the assumption that only one label is associated
with each address.  Hence the preferSym hack (yes, it is a hack) and
the problems above.

Would it help to redo the symbol table mechanism so that each address
(or, really, address range) can be associated with arbitrarily many
symbols?  That seems to me like a necessary infrastructural change to
fix this properly.  I volunteer to do it if it would help.
Comment 8 Jakub Jelinek 2011-07-25 09:04:33 UTC
That alone wouldn't help I'm afraid.  If the redirection is based on addresses of the called functions and one of memcpy@GLIBC_2.2.5 and memmove@@GLIBC_2.2.5 is called, there is no easy way to find out which one of those it was.  And with IFUNC it isn't enough to even adjust based on what the IFUNC indirect function is returning, because here the IFUNC symbols are aliased.
Comment 9 Tom Hughes 2011-07-25 09:09:04 UTC
The redirection isn't based on addresses, it's based on names, which is the problem.

Basically at the moment when an so is loaded we process the symbol table and choose a name for each function (a single name) then we match that name against our list of functions to trap and if it matches we trap it.

Julian is proposing to allow multiple names to be assigned to a function and then, presumably, to trap it if any of the names matches.

Of course the problem here would be whether to use the memcpy or memmove trap for it, as both would match...
Comment 10 Julian Seward 2011-07-25 10:34:03 UTC
(In reply to comment #9)
> Julian is proposing to allow multiple names to be assigned to a function and
> then, presumably, to trap it if any of the names matches.

Yes.

> Of course the problem here would be whether to use the memcpy or memmove trap
> for it, as both would match...

In this case we could have both of the memcpy and memmove traps behave
like memmove, and then it wouldn't matter (correct?)  IIRC the memcpy
trap already handles overlapping areas, for the sake of robustness.
Comment 11 Tom Hughes 2011-07-25 21:25:54 UTC
*** Bug 278502 has been marked as a duplicate of this bug. ***
Comment 12 Julian Seward 2011-07-26 17:00:53 UTC
I'm mulling over the following fix.

Firstly, change how symbols are stored (DiSym) so that we can record
the fact that multiple symbols may have the same address (range).  At
the moment we can't even accurately represent what's in the ELF symbol
tables since our representation is based on the broken assumption that
only one symbol is associated with any given address.

That said, there still needs to be a way to give some preferredness
ranking to multiple symbols with the same address.  IOW, prefersym()
needs to be preserved in some form.  This is because, when we come to
map an address to a symbol (when making backtraces, basically) we
don't want to print all the candidates, just the "most preferred"
candidate.

When m_redir processes a newly mapped in library, it examines all the
symbols, not just the most-preferred ones (which is effectively is
what is happening at present).  By doing this it may discover that
some of the new symbols must be redirected.

Now the question arises, as per comment #9 para 4, of what happens if
(eg) the same piece of code is labelled "memcpy" and "memmove".  I
think this is simple.  We look through our collection of redir
specifications, and find a redirect address for each symbol.  If they
are all the same then there is no conflict and we use that.  If
they're different then there's an inconsistency and we abort the run.

Darn.  That last bit isn't going to work, since the replacements for
memcpy and memmove may be cut-n-paste (hence, functionally) identical
at the source level, but they will have different addresses.  Hmm.
Back to the drawing board.
Comment 13 Julian Seward 2011-07-27 12:41:58 UTC
(In reply to comment #12)
> Darn.  That last bit isn't going to work, since the replacements for
> memcpy and memmove may be cut-n-paste (hence, functionally) identical
> at the source level, but they will have different addresses.

One semi-kludge that would work is as follows.  Currently the
replacement function names encode two pieces of text, a soname pattern
and a function name pattern, which are used to determine which symbols
it is a replacement for, in the general format

   _vgrZZ_sonamepatt_fnnamepatt

We could add to that, a third piece of text, giving a "behavioral
equivalence class":

   _vgrZZ_memcopylike_sonamepatt_fnnamepatt

with the understanding that two replacement functions with the same
equivalance class text are functionally identical.  Then, when
multiple different replacements are found for the multiple symbols
associated with a given address, if they all have the same equivalance
class text, we're OK.

-----

Urr.  That's still a kludge.  Really, need a way to separate the
specification of replacement functions from the statement of which
functions they replace.  Currently those two concepts are conflated.
Comment 14 Julian Seward 2011-08-15 10:18:26 UTC
(In reply to comment #12)
> I'm mulling over the following fix.
> 
> Firstly, change how symbols are stored (DiSym) so that we can record
> the fact that multiple symbols may have the same address (range).

Done, r11981 and r11982.  Symbol table now contains, for each address,
a primary symbol and a bunch of secondary symbols.  The primary symbol
is selected using the same preferSym hack as before, and redirection
is only done against the primary symbol, so the behaviour of the
system as a whole is unchanged (this bug is not yet fixed).  But at
least now we're in a position where it can be fixed properly.

Next thing to do is to fix the redir machinery so it considers all the
symbols for a given address when processing redirections, not just the
primary symbol.
Comment 15 Julian Seward 2011-08-16 15:38:31 UTC
(In reply to comment #14)
> Next thing to do is to fix the redir machinery so it considers all the
> symbols for a given address when processing redirections, not just the
> primary symbol.

Committed as r11984, ftr.  Doesn't do the final step of resolving 
conflicting redirections, though.
Comment 16 Julian Seward 2011-08-16 15:43:02 UTC
Created attachment 62872 [details]
Last piece of the puzzle (first attempt)

Last piece of the puzzle (first attempt).  Add equivalence class
tags for most functions intercepted by memcheck.  On Linux, map
memmove and memcpy to the same function (which I think formally
fixes this bug); add an intercept for strspn(); remove a conflicting
intercepts for strlen in ld.so.  This appears to make V work sanely
on Fedora 15, x86_64.

Still WIP, but good enough for initial testing now.

Needs to be applied on top of r11984 or later.
Comment 17 Julian Seward 2011-08-16 15:47:06 UTC
(In reply to comment #2)
> It would be much better if we could differentiate based on whether the caller
> was calling memcpy@GLIBC_2.2.5 or memmove or memmove@@GLIBC_2.2.5, if the
> former, do the overlapping checks, otherwise avoid those, but not sure how to
> do that

Well, yes.  Given that this decision by the glibc maintainers
effectively makes memcpy and memmove be the same function on Linux,
I don't see how we can avoid losing overlap checking for memcpy.
Comment 18 Tom Hughes 2011-08-16 15:59:54 UTC
(In reply to comment #17)

> Well, yes.  Given that this decision by the glibc maintainers
> effectively makes memcpy and memmove be the same function on Linux,
> I don't see how we can avoid losing overlap checking for memcpy.

They're only the same for programs linked against an old glibc and run against a new one. If you link against a new one you get memcpy@GLIBC_2.14 which is a different routine which doesn't handle overlaps.
Comment 19 Julian Seward 2011-08-16 16:27:33 UTC
(In reply to comment #18)
> They're only the same for programs linked against an old glibc and run against
> a new one. If you link against a new one you get memcpy@GLIBC_2.14 which is a
> different routine which doesn't handle overlaps.

Hmm, well, I _think_ I can accommodate that, thusly:

  /* For older memcpy we have to use memmove-like semantics and skip the
     overlap check; sigh; see #275284. */
  MEMMOVE(VG_Z_LIBC_SONAME, memcpy)
  MEMCPY(VG_Z_LIBC_SONAME,  memcpyZAZAGLIBCZu2Zd14) /* memcpy@@GLIBC_2.14 */
  MEMCPY(VG_Z_LD_SO_1,      memcpy) /* ld.so.1 */
  MEMCPY(VG_Z_LD64_SO_1,    memcpy) /* ld64.so.1 */

and memcheck/tests/overlap still works -- the deliberate memcpy
overlaps are reported, as expected.
Comment 20 Tom Hughes 2011-08-16 17:01:38 UTC
Actually it doesn't even seem to need that. Using your patch, on Fedora 15, and with the memcpy mappings changed to set the map overlap flag I find my active redirections wind up including:

--366--     0x320c0831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09770 memmove
--366--     0x320c088d90 (memcpy@@GLIBC_2.14  ) R-> (2018) 0x04a08830 memcpy

and when I call memcpy (which maps to the 2.14 version) with overlapping arguments this is what I see:

--366-- REDIR: 0x320c088d90 (memcpy@@GLIBC_2.14) redirected to 0x48025ca (_vgnU_ifunc_wrapper)
--366-- Adding redirect for indirect function 0x320c088d90 from 0x320c1352f0 -> 0x4a08830
--366-- REDIR: 0x320c1352f0 (__memcpy_ssse3_back) redirected to 0x4a08830 (memcpy)
==366== Source and destination overlap in memcpy(0x7feffecb0, 0x7feffecb0, 4092)
==366==    at 0x4A08896: memcpy (mc_replace_strmem.c:670)
==366==    by 0x40061D: main (x.c:10)
==366== 
==366== Source and destination overlap in memcpy(0x7feffecb0, 0x7feffecb0, 4092)
==366==    at 0x4A08896: memcpy (mc_replace_strmem.c:670)
==366==    by 0x400642: main (x.c:12)

then calling memmove I get:

--366-- REDIR: 0x320c0831b0 (memcpy@GLIBC_2.2.5) redirected to 0x48025ca (_vgnU_ifunc_wrapper)
--366-- Adding redirect for indirect function 0x320c0831b0 from 0x320c13a890 -> 0x4a09770
--366-- REDIR: 0x320c13a890 (__memmove_ssse3_back) redirected to 0x4a09770 (memmove)

which all looks good to me?
Comment 21 Julian Seward 2011-08-16 18:48:05 UTC
(In reply to comment #20)
> Actually it doesn't even seem to need that. Using your patch, on Fedora 15, and
> with the memcpy mappings changed to set the map overlap flag I find my active
> redirections wind up including:
> 
> --366--     0x320c0831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09770 memmove
> --366--     0x320c088d90 (memcpy@@GLIBC_2.14  ) R-> (2018) 0x04a08830 memcpy

Well, that's good, especially considering my attempts to intercept 
the versioned symbols directly didn't work out, since they lead to
conflicts:

==11908== WARNING: new redirection conflicts with existing -- ignoring it
--11908--     old: 0x39fbc831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a08b10 memcpy@GLIBC_2.2.5
--11908--     new: 0x39fbc831b0 (memcpy@GLIBC_2.2.5  ) R-> (2022) 0x04a08870 memcpy

Do you see anything like that?


So .. I didn't understand what you meant by

> with the memcpy mappings changed to set the map overlap flag

Can you clarify?
Comment 22 Julian Seward 2011-08-16 21:50:56 UTC
(In reply to comment #16)
> Created an attachment (id=62872) [details]
> Last piece of the puzzle (first attempt)

Committed, with minor tidying, as r11985.
Comment 23 Julian Seward 2011-08-16 23:00:09 UTC
(In reply to comment #22)
> (In reply to comment #16)
> > Created an attachment (id=62872) [details] [details]
> > Last piece of the puzzle (first attempt)
> Committed, with minor tidying, as r11985.

And with MacOS fixes in r11986.

At least to a first approximation, this completes my attempts to fix
this problem.  I suspect some followup tidying will be necessary.  As
it stands at r11985, we've lost the ability to do overlap checking for
memcpy on Linux, even the new versions.  I don't know if it will be
possible to reinstate it for new glibcs.  I'm not exactly clear what
Tom is saying about this comment 20.
Comment 24 Tom Hughes 2011-08-17 08:17:55 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > Actually it doesn't even seem to need that. Using your patch, on Fedora 15, and
> > with the memcpy mappings changed to set the map overlap flag I find my active
> > redirections wind up including:
> > 
> > --366--     0x320c0831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09770 memmove
> > --366--     0x320c088d90 (memcpy@@GLIBC_2.14  ) R-> (2018) 0x04a08830 memcpy
> 
> Well, that's good, especially considering my attempts to intercept 
> the versioned symbols directly didn't work out, since they lead to
> conflicts:

That's because we are always matching based on the undecorated symbol without the version so you can't add a matcher for a specific version.

> ==11908== WARNING: new redirection conflicts with existing -- ignoring it
> --11908--     old: 0x39fbc831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a08b10
> memcpy@GLIBC_2.2.5
> --11908--     new: 0x39fbc831b0 (memcpy@GLIBC_2.2.5  ) R-> (2022) 0x04a08870
> memcpy
> 
> Do you see anything like that?

When I add -v then I do unfortunately see that, yes. I also added annotations to maybe_add_active to report both accepted redirections, and ones ignored as equivalent, and I see this for memmove:

==27728== Adding active redirection:
--27728--     new: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09770 memmove
==27728== WARNING: new redirection conflicts with existing -- ignoring it
--27728--     old: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09770 memmove
--27728--     new: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2022) 0x04a08830 memcpy
==27728== WARNING: new redirection conflicts with existing -- ignoring it
--27728--     old: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09770 memmove
--27728--     new: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2022) 0x04a08830 memcpy

and then for memcpy:

==27728== Adding active redirection:
--27728--     new: 0x3461e88d90 (memcpy@@GLIBC_2.14  ) R-> (2022) 0x04a08830 memcpy

So basically everything is working only because of the order in which the specs are processed and the fact that the memove one is matched first.

> 
> So .. I didn't understand what you meant by
> 
> > with the memcpy mappings changed to set the map overlap flag
> 
> Can you clarify?

Well in terms of the modified patch you committed I changed MEMMOVE to MEMCPY in the three memcpy match lines for linux in me_replace_strmem.c. Without that we don't get the clashes, but also don't get overlap checks:

==25498== Adding active redirection:
--25498--     new: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09620 memmove
==25498== Ignoring duplicate redirection:
--25498--     old: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09620 memmove
--25498--     new: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a08830 memcpy
==25498== Ignoring duplicate redirection:
--25498--     old: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a09620 memmove
--25498--     new: 0x3461e831b0 (memcpy@GLIBC_2.2.5  ) R-> (2018) 0x04a08830 memcpy

and for memcpy:

==25498== Adding active redirection:
--25498--     new: 0x3461e88d90 (memcpy@@GLIBC_2.14  ) R-> (2018) 0x04a08830 memcpy
Comment 25 Julian Seward 2011-08-18 13:52:24 UTC
I just committed a followup fix in r11991.  It tries to
re-enable overlap checking for calls to memcpy@@GLIBC_2.14
whilst skipping them for memcpy@GLIBC_2.2.5.
Comment 26 Tom Hughes 2011-08-18 14:28:42 UTC
I think we can call this resolved now....
Comment 27 Tom Hughes 2011-09-11 21:41:59 UTC
*** Bug 281828 has been marked as a duplicate of this bug. ***