glibc 2.14+ contains a workaround for lame programs whose programmers don't understand the difference between memcpy and memmove. In particular, we have: 1103: 00000035e7a83160 71 IFUNC GLOBAL DEFAULT 12 memcpy@GLIBC_2.2.5 1104: 00000035e7a88d40 60 IFUNC GLOBAL DEFAULT 12 memcpy@@GLIBC_2.14 1944: 00000035e7a83160 71 IFUNC GLOBAL DEFAULT 12 memmove@@GLIBC_2.2.5 in libc.so.6 .dynsym and: 5575: 00000035e7a83160 71 IFUNC GLOBAL DEFAULT 12 memcpy@GLIBC_2.2.5 6284: 00000035e7a83160 71 IFUNC GLOBAL DEFAULT 12 memmove 7443: 00000035e7a88d40 60 IFUNC GLOBAL DEFAULT 12 memcpy@@GLIBC_2.14 in libc.so.6 .symtab. The reason for this is to keep old broken apps like flash which Adobe wasn't able to fix after many months since it has been reported to them, old libraries or binaries when using memcpy will just have it implemented as memmove, and only newly linked apps will perform real memcpy. Unfortunately it seems that then neither memcpy nor memmove are then redirected to memcheck's own version. Given a short testcase compiled with -O0: extern void *memcpy (void *, const void *, __SIZE_TYPE__); extern void *memmove (void *, const void *, __SIZE_TYPE__); extern void *malloc (__SIZE_TYPE__); extern void *memset (void *, int, __SIZE_TYPE__); extern int posix_memalign(void **, __SIZE_TYPE__, __SIZE_TYPE__); int main (void) { int n = 128 * 1024 * 1024; void *p, *q; if (posix_memalign (&p, 4096, n - 3) || posix_memalign (&q, 4096, n - 3)) return 1; memset (p, 1, n - 3); memmove (q, p, n - 3); return 0; } I see: valgrind -v -v --trace-symtab=yes /tmp/zzz 2>&1 | grep '\(prefer\|REDIR\).*mem\(cpy\|move\|set\)' sym at 0x35e7a82fd0: prefer '__GI_memmove' to '__memmove_sse2' sym at 0x35e7a83160: prefer 'memcpy' to 'memmove' sym at 0x35e7a83160: prefer 'memcpy@GLIBC_2.2.5' to 'memmove' sym at 0x35e7a831f0: prefer '__GI_memset' to '__memset_sse2' sym at 0x35e7a88d40: prefer 'memcpy@@GLIBC_2.14' to 'memcpy' sym at 0x35e7a88d90: prefer '__GI_memcpy' to '__memcpy_sse2' sym at 0x35e7a90430: prefer 'wmemcpy' to '__wmemcpy' sym at 0x35e7a90450: prefer 'wmemset' to '__GI_wmemset' sym at 0x35e7af4870: prefer '__memset_chk' to '__memset_zero_constant_len_parameter' sym at 0x35e7a83160: prefer 'memcpy@GLIBC_2.2.5' to 'memcpy' sym at 0x35e7a88d40: prefer 'memcpy@@GLIBC_2.14' to '__new_memcpy' sym at 0x35e7a90440: prefer 'wmemmove' to '__wmemmove' --20575-- REDIR: 0x35e7a831b0 (memset) redirected to 0x48015d0 (_vgnU_ifunc_wrapper) --20575-- REDIR: 0x35e7a831f0 (__GI_memset) redirected to 0x4a08990 (memset) As you can see, memmove wasn't redirected, supposedly because storage.c preferred memcpy over memmove there, and then preferred memcpy@GLIBC_2.2.5 over memcpy for the same address.
Is this related to #275212 perhaps - if we're not catching it then glibc's selection routine kicks in and sends us off to the ssse3 versions?
Created attachment 60827 [details] valgrind-amd64-linux-memcpy.patch Untested patch (except for this testcase). It would be much better if we could differentiate based on whether the caller was calling memcpy@GLIBC_2.2.5 or memmove or memmove@@GLIBC_2.2.5, if the former, do the overlapping checks, otherwise avoid those, but not sure how to do that when they are calling the same address in libc.so.6.
Yeah, #275212 should be closed as a dup of this. Special casing just one of the many memcpy/memmove implementations is going to work only on some hw and not on other hw.
I was thinking that a solution might be to prefer the linkable symbols (with @@ in them) over the legacy symbols (with the single @ in them). The problem with that though is that we don't currently see the version decorations on the name when processing the dynamic symbol table (hence the appearance of the plain memcpy symbol). It looks like .symtab actually adds the version decoration to the name, but .dynsym must store the version as separate metadata. Actually it looks like it does it by adding symbols with a name of GLIBC_N.NN and size zero, with each symbol being versioned according to the previous such symbol?
*** Bug 275212 has been marked as a duplicate of this bug. ***
.symtab only contains explicit @ and @@s when the source contained them through an .symver directive. Normally this isn't done, then only one symbol of that name is defined and symbol version script determines which symbol version it gets during linking (if doing symbol versioning at all). For the case where one symbol needs to be available in more than once version, or when it is only a compatibility symbol that shouldn't be available for linking against for new programs/libraries, .symver directives are used and then it shows up in .symtab. For dynamic symbol resolution, the @ and @@ chars and symbol version names indeed don't appear in .dynsym, you need to figure it out from .gnu.version section (SHT_GNU_versym). See http://www.akkadia.org/drepper/symbol-versioning for details. .gnu.version section contains 2 bytes per symbol, which includes symbol version id and in msb bit flag whether it is a hidden symbol or not (if set, it is @ symbol, if unset it is @@ symbol). Though, the symbol versions used for glibc stringop/malloc etc. symbols differ a lot between different targets, so if those were to be used always, it would be quite hard to maintain. FOr the sole purpose of this memcpy alias to memmove and new memcpy, I think my patch should work if .symtab is available, and if it is not available perhaps just a hack to prefer memmove over memcpy symbol (as an exception in storage.c (prefersym) could do the job.
I read through this a couple of times. I'm trying to get the trunk to work sanely on a Sandy Bridge box running F15, which it doesn't right now. I'm not sure I understand all the details. But one thing that strikes me is that the problem exists (or is made worse?) because our symbol table mechanism has the assumption that only one label is associated with each address. Hence the preferSym hack (yes, it is a hack) and the problems above. Would it help to redo the symbol table mechanism so that each address (or, really, address range) can be associated with arbitrarily many symbols? That seems to me like a necessary infrastructural change to fix this properly. I volunteer to do it if it would help.
That alone wouldn't help I'm afraid. If the redirection is based on addresses of the called functions and one of memcpy@GLIBC_2.2.5 and memmove@@GLIBC_2.2.5 is called, there is no easy way to find out which one of those it was. And with IFUNC it isn't enough to even adjust based on what the IFUNC indirect function is returning, because here the IFUNC symbols are aliased.
The redirection isn't based on addresses, it's based on names, which is the problem. Basically at the moment when an so is loaded we process the symbol table and choose a name for each function (a single name) then we match that name against our list of functions to trap and if it matches we trap it. Julian is proposing to allow multiple names to be assigned to a function and then, presumably, to trap it if any of the names matches. Of course the problem here would be whether to use the memcpy or memmove trap for it, as both would match...
(In reply to comment #9) > Julian is proposing to allow multiple names to be assigned to a function and > then, presumably, to trap it if any of the names matches. Yes. > Of course the problem here would be whether to use the memcpy or memmove trap > for it, as both would match... In this case we could have both of the memcpy and memmove traps behave like memmove, and then it wouldn't matter (correct?) IIRC the memcpy trap already handles overlapping areas, for the sake of robustness.
*** Bug 278502 has been marked as a duplicate of this bug. ***
I'm mulling over the following fix. Firstly, change how symbols are stored (DiSym) so that we can record the fact that multiple symbols may have the same address (range). At the moment we can't even accurately represent what's in the ELF symbol tables since our representation is based on the broken assumption that only one symbol is associated with any given address. That said, there still needs to be a way to give some preferredness ranking to multiple symbols with the same address. IOW, prefersym() needs to be preserved in some form. This is because, when we come to map an address to a symbol (when making backtraces, basically) we don't want to print all the candidates, just the "most preferred" candidate. When m_redir processes a newly mapped in library, it examines all the symbols, not just the most-preferred ones (which is effectively is what is happening at present). By doing this it may discover that some of the new symbols must be redirected. Now the question arises, as per comment #9 para 4, of what happens if (eg) the same piece of code is labelled "memcpy" and "memmove". I think this is simple. We look through our collection of redir specifications, and find a redirect address for each symbol. If they are all the same then there is no conflict and we use that. If they're different then there's an inconsistency and we abort the run. Darn. That last bit isn't going to work, since the replacements for memcpy and memmove may be cut-n-paste (hence, functionally) identical at the source level, but they will have different addresses. Hmm. Back to the drawing board.
(In reply to comment #12) > Darn. That last bit isn't going to work, since the replacements for > memcpy and memmove may be cut-n-paste (hence, functionally) identical > at the source level, but they will have different addresses. One semi-kludge that would work is as follows. Currently the replacement function names encode two pieces of text, a soname pattern and a function name pattern, which are used to determine which symbols it is a replacement for, in the general format _vgrZZ_sonamepatt_fnnamepatt We could add to that, a third piece of text, giving a "behavioral equivalence class": _vgrZZ_memcopylike_sonamepatt_fnnamepatt with the understanding that two replacement functions with the same equivalance class text are functionally identical. Then, when multiple different replacements are found for the multiple symbols associated with a given address, if they all have the same equivalance class text, we're OK. ----- Urr. That's still a kludge. Really, need a way to separate the specification of replacement functions from the statement of which functions they replace. Currently those two concepts are conflated.
(In reply to comment #12) > I'm mulling over the following fix. > > Firstly, change how symbols are stored (DiSym) so that we can record > the fact that multiple symbols may have the same address (range). Done, r11981 and r11982. Symbol table now contains, for each address, a primary symbol and a bunch of secondary symbols. The primary symbol is selected using the same preferSym hack as before, and redirection is only done against the primary symbol, so the behaviour of the system as a whole is unchanged (this bug is not yet fixed). But at least now we're in a position where it can be fixed properly. Next thing to do is to fix the redir machinery so it considers all the symbols for a given address when processing redirections, not just the primary symbol.
(In reply to comment #14) > Next thing to do is to fix the redir machinery so it considers all the > symbols for a given address when processing redirections, not just the > primary symbol. Committed as r11984, ftr. Doesn't do the final step of resolving conflicting redirections, though.
Created attachment 62872 [details] Last piece of the puzzle (first attempt) Last piece of the puzzle (first attempt). Add equivalence class tags for most functions intercepted by memcheck. On Linux, map memmove and memcpy to the same function (which I think formally fixes this bug); add an intercept for strspn(); remove a conflicting intercepts for strlen in ld.so. This appears to make V work sanely on Fedora 15, x86_64. Still WIP, but good enough for initial testing now. Needs to be applied on top of r11984 or later.
(In reply to comment #2) > It would be much better if we could differentiate based on whether the caller > was calling memcpy@GLIBC_2.2.5 or memmove or memmove@@GLIBC_2.2.5, if the > former, do the overlapping checks, otherwise avoid those, but not sure how to > do that Well, yes. Given that this decision by the glibc maintainers effectively makes memcpy and memmove be the same function on Linux, I don't see how we can avoid losing overlap checking for memcpy.
(In reply to comment #17) > Well, yes. Given that this decision by the glibc maintainers > effectively makes memcpy and memmove be the same function on Linux, > I don't see how we can avoid losing overlap checking for memcpy. They're only the same for programs linked against an old glibc and run against a new one. If you link against a new one you get memcpy@GLIBC_2.14 which is a different routine which doesn't handle overlaps.
(In reply to comment #18) > They're only the same for programs linked against an old glibc and run against > a new one. If you link against a new one you get memcpy@GLIBC_2.14 which is a > different routine which doesn't handle overlaps. Hmm, well, I _think_ I can accommodate that, thusly: /* For older memcpy we have to use memmove-like semantics and skip the overlap check; sigh; see #275284. */ MEMMOVE(VG_Z_LIBC_SONAME, memcpy) MEMCPY(VG_Z_LIBC_SONAME, memcpyZAZAGLIBCZu2Zd14) /* memcpy@@GLIBC_2.14 */ MEMCPY(VG_Z_LD_SO_1, memcpy) /* ld.so.1 */ MEMCPY(VG_Z_LD64_SO_1, memcpy) /* ld64.so.1 */ and memcheck/tests/overlap still works -- the deliberate memcpy overlaps are reported, as expected.
Actually it doesn't even seem to need that. Using your patch, on Fedora 15, and with the memcpy mappings changed to set the map overlap flag I find my active redirections wind up including: --366-- 0x320c0831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09770 memmove --366-- 0x320c088d90 (memcpy@@GLIBC_2.14 ) R-> (2018) 0x04a08830 memcpy and when I call memcpy (which maps to the 2.14 version) with overlapping arguments this is what I see: --366-- REDIR: 0x320c088d90 (memcpy@@GLIBC_2.14) redirected to 0x48025ca (_vgnU_ifunc_wrapper) --366-- Adding redirect for indirect function 0x320c088d90 from 0x320c1352f0 -> 0x4a08830 --366-- REDIR: 0x320c1352f0 (__memcpy_ssse3_back) redirected to 0x4a08830 (memcpy) ==366== Source and destination overlap in memcpy(0x7feffecb0, 0x7feffecb0, 4092) ==366== at 0x4A08896: memcpy (mc_replace_strmem.c:670) ==366== by 0x40061D: main (x.c:10) ==366== ==366== Source and destination overlap in memcpy(0x7feffecb0, 0x7feffecb0, 4092) ==366== at 0x4A08896: memcpy (mc_replace_strmem.c:670) ==366== by 0x400642: main (x.c:12) then calling memmove I get: --366-- REDIR: 0x320c0831b0 (memcpy@GLIBC_2.2.5) redirected to 0x48025ca (_vgnU_ifunc_wrapper) --366-- Adding redirect for indirect function 0x320c0831b0 from 0x320c13a890 -> 0x4a09770 --366-- REDIR: 0x320c13a890 (__memmove_ssse3_back) redirected to 0x4a09770 (memmove) which all looks good to me?
(In reply to comment #20) > Actually it doesn't even seem to need that. Using your patch, on Fedora 15, and > with the memcpy mappings changed to set the map overlap flag I find my active > redirections wind up including: > > --366-- 0x320c0831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09770 memmove > --366-- 0x320c088d90 (memcpy@@GLIBC_2.14 ) R-> (2018) 0x04a08830 memcpy Well, that's good, especially considering my attempts to intercept the versioned symbols directly didn't work out, since they lead to conflicts: ==11908== WARNING: new redirection conflicts with existing -- ignoring it --11908-- old: 0x39fbc831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a08b10 memcpy@GLIBC_2.2.5 --11908-- new: 0x39fbc831b0 (memcpy@GLIBC_2.2.5 ) R-> (2022) 0x04a08870 memcpy Do you see anything like that? So .. I didn't understand what you meant by > with the memcpy mappings changed to set the map overlap flag Can you clarify?
(In reply to comment #16) > Created an attachment (id=62872) [details] > Last piece of the puzzle (first attempt) Committed, with minor tidying, as r11985.
(In reply to comment #22) > (In reply to comment #16) > > Created an attachment (id=62872) [details] [details] > > Last piece of the puzzle (first attempt) > Committed, with minor tidying, as r11985. And with MacOS fixes in r11986. At least to a first approximation, this completes my attempts to fix this problem. I suspect some followup tidying will be necessary. As it stands at r11985, we've lost the ability to do overlap checking for memcpy on Linux, even the new versions. I don't know if it will be possible to reinstate it for new glibcs. I'm not exactly clear what Tom is saying about this comment 20.
(In reply to comment #21) > (In reply to comment #20) > > Actually it doesn't even seem to need that. Using your patch, on Fedora 15, and > > with the memcpy mappings changed to set the map overlap flag I find my active > > redirections wind up including: > > > > --366-- 0x320c0831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09770 memmove > > --366-- 0x320c088d90 (memcpy@@GLIBC_2.14 ) R-> (2018) 0x04a08830 memcpy > > Well, that's good, especially considering my attempts to intercept > the versioned symbols directly didn't work out, since they lead to > conflicts: That's because we are always matching based on the undecorated symbol without the version so you can't add a matcher for a specific version. > ==11908== WARNING: new redirection conflicts with existing -- ignoring it > --11908-- old: 0x39fbc831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a08b10 > memcpy@GLIBC_2.2.5 > --11908-- new: 0x39fbc831b0 (memcpy@GLIBC_2.2.5 ) R-> (2022) 0x04a08870 > memcpy > > Do you see anything like that? When I add -v then I do unfortunately see that, yes. I also added annotations to maybe_add_active to report both accepted redirections, and ones ignored as equivalent, and I see this for memmove: ==27728== Adding active redirection: --27728-- new: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09770 memmove ==27728== WARNING: new redirection conflicts with existing -- ignoring it --27728-- old: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09770 memmove --27728-- new: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2022) 0x04a08830 memcpy ==27728== WARNING: new redirection conflicts with existing -- ignoring it --27728-- old: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09770 memmove --27728-- new: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2022) 0x04a08830 memcpy and then for memcpy: ==27728== Adding active redirection: --27728-- new: 0x3461e88d90 (memcpy@@GLIBC_2.14 ) R-> (2022) 0x04a08830 memcpy So basically everything is working only because of the order in which the specs are processed and the fact that the memove one is matched first. > > So .. I didn't understand what you meant by > > > with the memcpy mappings changed to set the map overlap flag > > Can you clarify? Well in terms of the modified patch you committed I changed MEMMOVE to MEMCPY in the three memcpy match lines for linux in me_replace_strmem.c. Without that we don't get the clashes, but also don't get overlap checks: ==25498== Adding active redirection: --25498-- new: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09620 memmove ==25498== Ignoring duplicate redirection: --25498-- old: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09620 memmove --25498-- new: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a08830 memcpy ==25498== Ignoring duplicate redirection: --25498-- old: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a09620 memmove --25498-- new: 0x3461e831b0 (memcpy@GLIBC_2.2.5 ) R-> (2018) 0x04a08830 memcpy and for memcpy: ==25498== Adding active redirection: --25498-- new: 0x3461e88d90 (memcpy@@GLIBC_2.14 ) R-> (2018) 0x04a08830 memcpy
I just committed a followup fix in r11991. It tries to re-enable overlap checking for calls to memcpy@@GLIBC_2.14 whilst skipping them for memcpy@GLIBC_2.2.5.
I think we can call this resolved now....
*** Bug 281828 has been marked as a duplicate of this bug. ***