With glibc 2.28 (and maybe earlier versions) on amd64 memcheck/tests/overlap fails. --- overlap.stderr.exp 2018-01-23 21:34:02.318995262 +0100 +++ overlap.stderr.out 2018-12-22 21:43:11.181679045 +0100 @@ -1,11 +1,3 @@ -Source and destination overlap in memcpy(0x........, 0x........, 21) - at 0x........: memcpy (vg_replace_strmem.c:...) - by 0x........: main (overlap.c:40) - -Source and destination overlap in memcpy(0x........, 0x........, 21) - at 0x........: memcpy (vg_replace_strmem.c:...) - by 0x........: main (overlap.c:42) - Source and destination overlap in strncpy(0x........, 0x........, 21) at 0x........: strncpy (vg_replace_strmem.c:...) by 0x........: main (overlap.c:45) Looking at the REDIRS it might be caused by redirecting a memcpy variant to memmove? --38108-- REDIR: 0x48f0210 (libc.so.6:memmove) redirected to 0x482b1be (_vgnU_ifunc_wrapper) --38108-- REDIR: 0x48f0680 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x482b1be (_vgnU_ifunc_wrapper) --38108-- REDIR: 0x49c3b50 (libc.so.6:__memcpy_avx_unaligned_erms) redirected to 0x483ca60 (memmove)
It also started to fail for me some months ago. Debian GLIBC 2.24-11+deb9u3 gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) See https://sourceforge.net/p/valgrind/mailman/message/36034448/ At that time, IIRC, I vaguely contemplated this could be fixed by changing the REDIR mechanism by making it 'better ifunc aware, and do the redirection at an earlier stage.
Is there any progress here? How important will it be to fix this for 3.15.0?
(In reply to Julian Seward from comment #2) > Is there any progress here? How important will it be to fix this for 3.15.0? I believe this will be a non neglectible change in the REDIR mechanism, as the REDIR will have to be done at at ifunc level. As far as I can see; not fixing this means some false negative for memcpy, not detecting overlap args. So, not that critical IMO
I get a false positive: a call to memmove (which should cope with overlapping memory areas) is erroneously checked as if it were memcpy: ==3546== Source and destination overlap in memcpy_chk(0x1ffeffc460, 0x1ffeffc462, 9) ==3546== at 0x4C383D0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==3546== by 0x6DC9481: memmove (string_fortified.h:40) Comparing this with previous comments in this ticket, it seems like the redirects for memmove and memcpy are simply swapped.
I don't understand, I've looked at the code and at #275284 We have this redir MEMMOVE(VG_Z_LIBC_SONAME, memcpyZAGLIBCZu2Zd2Zd5) /* memcpy@GLIBC_2.2.5 */ It long predates glibc 2.28. The overlap testcase hasn't changed for donkeys years. Here's the 1st missing call 401283: ba 15 00 00 00 mov $0x15,%edx 401288: 48 89 ce mov %rcx,%rsi 40128b: 48 89 c7 mov %rax,%rdi 40128e: e8 ed fd ff ff call 401080 <memcpy@plt> (Fedora 37, an old Xeon CPU)
For now I added the following "workaround" to Fedora because we keep getting bug reports of memmove calls reporting src/dst overlaps: diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c index b32f13f76..aa7f88ca2 100644 --- a/shared/vg_replace_strmem.c +++ b/shared/vg_replace_strmem.c @@ -1128,7 +1128,7 @@ static inline void my_exit ( int x ) MEMMOVE_OR_MEMCPY(20181, soname, fnname, 0) #define MEMCPY(soname, fnname) \ - MEMMOVE_OR_MEMCPY(20180, soname, fnname, 1) + MEMMOVE_OR_MEMCPY(20180, soname, fnname, 0) /* See KDE bug #402833 */ #if defined(VGO_linux) /* For older memcpy we have to use memmove-like semantics and skip We really should not do ifunc resolving for these two functions, just intercept them directly
*** Bug 453084 has been marked as a duplicate of this bug. ***
The "workaround from comment #6 isn't complete since memmove_chk and memcpy_chk can also alias possibly causing memmove_chk doing an overlap check (because it gets confused with memcpy_chk). So the current patch as used for Fedora is: diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c index b32f13f76..464e8d4ca 100644 --- a/shared/vg_replace_strmem.c +++ b/shared/vg_replace_strmem.c @@ -1128,7 +1128,7 @@ static inline void my_exit ( int x ) MEMMOVE_OR_MEMCPY(20181, soname, fnname, 0) #define MEMCPY(soname, fnname) \ - MEMMOVE_OR_MEMCPY(20180, soname, fnname, 1) + MEMMOVE_OR_MEMCPY(20180, soname, fnname, 0) /* See KDE bug #402833 */ #if defined(VGO_linux) /* For older memcpy we have to use memmove-like semantics and skip @@ -1714,8 +1714,6 @@ static inline void my_exit ( int x ) RECORD_COPY(len); \ if (len == 0) \ return dst; \ - if (is_overlap(dst, src, len, len)) \ - RECORD_OVERLAP_ERROR("memcpy_chk", dst, src, len); \ if ( dst > src ) { \ d = (HChar *)dst + len - 1; \ s = (const HChar *)src + len - 1; \
Created attachment 162737 [details] disable overlap check and test for linux-amd64 More subtle workaround patch that only disables the overlap check (and test) on linux-amd64
(In reply to Mark Wielaard from comment #9) > Created attachment 162737 [details] > disable overlap check and test for linux-amd64 > > More subtle workaround patch that only disables the overlap check (and test) > on linux-amd64 This workaround is now committed: commit 53e101f562fa89bbf92d658fba626e2397862a16 Author: Mark Wielaard <mark@klomp.org> Date: Mon Oct 30 23:30:06 2023 +0100 Disable memcpy overlap check and test on amd64 linux Almost all newer distros have ifunc based memcpy/memmove glibc implementation which cause false positives. Disable the overlap check and test on these systems for now. https://bugs.kde.org/show_bug.cgi?id=402833 But it is just a workaround to get rid of the false positives. Real solution still needed. So keeping bug open.
I recently started seeing the "Source and destination overlap in memcpy_chk" from a call be memmove which was already reported as https://bugs.kde.org/show_bug.cgi?id=453084 which has been marked as a duplicate of this bug. It also seems to be the same problem reported here in comment 4. I found the following suppression works: { memmove overlapping source and destination Memcheck:Overlap fun:__memcpy_chk fun:memmove } This suppression only matches when __memcpy_chk is called by memmove so it should only suppress the incorrect errors. I'm posting it here in the hope it will help others who hit this and find this bug while seeking a workaround (like I did before trying to a suppression). (FWIW I'm using the valgrind package version 1:3.20.0-2.1 on Debian unstable.)
Adding a suppression is probably not the answer. Running under memcheck there shouldn't even be a call to __memcpy_chk from memmove. The question is why Valgrind isn't rediecting the call to memmove? On Linux the memmove may have been replaced by a call to memcpy if the compiler deems it safe to do so. That does cause us problems as our checks are more conservative. What exactly is the error message?
> Adding a suppression is probably not the answer. I didn't try to claim it was - I explicitly presented it as a way to work around the problem. If I followed the discussion in https://bugs.kde.org/show_bug.cgi?id=453084#c9 the underlying problem is that some of the optimised memcpy() implementations actually support overlapping blocks (i.e. they are also valid memmove() implementations) and for these memmove() and memcpy() both end up using the same function to implement the copy: | The problem is that __memcpy_chk_sse2_unaligned and __memmove_chk_sse2_unaligned are aliases to the same function Due to this valgrind can think the call was to memcpy() when it was actually memmove(). The call stack however shows memmove() so my suppression works. > The question is why Valgrind isn't rediecting the call to memmove? On Linux the memmove may have been replaced by a call to memcpy if the compiler deems it safe to do so. That does cause us problems as our checks are more conservative. AIUI, this is NOT the compiler proving to itself that the source and destination can't overlap and replacing memmove with memcpy, but happens at dynamic link time when a memmove() implementation is selected based on the features supported by the CPU the code is being run on. > What exactly is the error message? ==4137856== Source and destination overlap in memcpy_chk(0x1ffeffb000, 0x1ffeffb010, 10224) ==4137856== at 0x484BE22: __memcpy_chk (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==4137856== by 0x495A87B: memmove (string_fortified.h:36) This really is from a call to memmove(), and one where the source and destination often will overlap (it's copying the tail end of a buffer to the start of that buffer, so it'll overlap when there's more left in the buffer than has been used): n -= (p - buf); memmove(buf, p, n);
(In reply to Olly Betts from comment #13) > ==4137856== Source and destination overlap in memcpy_chk(0x1ffeffb000, > 0x1ffeffb010, 10224) > ==4137856== at 0x484BE22: __memcpy_chk (in > /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) > ==4137856== by 0x495A87B: memmove (string_fortified.h:36) Try again. What exactly is the error message?
Also in your libc, is there more than one function using the same address as __memmove_chk? (And specifically, is __memcpy_chk the same as __memmove_chk). First do something like nm -D /lib/libc.so.7 | grep __memmove_chk Then search for the hex value in column 1. That would be something like paulf> nm -D /lib/libc.so.7 | grep 000000000017bbc0 000000000017bbc0 T __mallctl 000000000017bbc0 W mallctl (different libc and different symbol, just for illustration).
(In reply to Paul Floyd from comment #15) > Also in your libc, is there more than one function using the same address as > __memmove_chk? (And specifically, is __memcpy_chk the same as __memmove_chk). nm fails to list symbols for me: $ nm /lib/x86_64-linux-gnu/libc.so.6 nm: /lib/x86_64-linux-gnu/libc.so.6: no symbols However someone else previously reported the problem being due to symbols having the same address in the ticket marked as a duplicate of this one - see https://bugs.kde.org/show_bug.cgi?id=453084#c9 : $ nm /lib64/libc.so.6 | grep "__mem\(cpy\|move\)_chk_sse2_unaligned$" 00000000000b9c50 t __memcpy_chk_sse2_unaligned 00000000000b9c50 t __memmove_chk_sse2_unaligned $ nm /lib64/libc.so.6 | grep "__mem\(cpy\|move\)_chk_avx_unaligned$" 00000000001828a0 t __memcpy_chk_avx_unaligned 00000000001828a0 t __memmove_chk_avx_unaligned
This is getting frustrating. I keep asking questions and you keep (incorrectly) second guessing something else. I'm still waiting to see the exact error. I don't want you to second guess that it means just the top two elements in the callstack. Full Memcheck errors are delimited by lines with just "==PID==" where PID is a number representing the process ID. It's everything between those delimiting lines that I want to see. Is that clear? Secondly I gave an example using "nm -D". You interpreted that as "nm" without "-D". Then you copied some output from Fedora whilst you are using Debian. I have an idea as to what is happening but I'd really prefer not to have to make guesses.
(In reply to Paul Floyd from comment #17) > This is getting frustrating. > > I keep asking questions and you keep (incorrectly) second guessing something > else. > > I'm still waiting to see the exact error. I don't want you to second guess > that it means just the top two elements in the callstack. I just missed you'd posted two replies earlier. > Full Memcheck errors are delimited by lines with just "==PID==" where PID is > a number representing the process ID. It's everything between those > delimiting lines that I want to see. Is that clear? I'm using valgrind via a testsuite harness and valgrind's output unhelpfully gets interleaved with other output so it was much much easier to just cut and paste the three lines that actually seemed relevant. I'll try to get the full clean output for you, but there was just the banner lines before that and after that was into functions in the library code being tested. > Secondly I gave an example using "nm -D". You interpreted that as "nm" > without "-D". Then you copied some output from Fedora whilst you are using > Debian. Sorry, I managed to miss the `-D` - there are no other symbols with the same address as either __memcpy_chk or __memmove_chk it seems: $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 0000000000119f80 0000000000119f80 i __memcpy_chk@@GLIBC_2.3.4 $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 000000000011a090 000000000011a090 i __memmove_chk@@GLIBC_2.3.4
(In reply to Olly Betts from comment #18) > I'm using valgrind via a testsuite harness and valgrind's output unhelpfully > gets interleaved with other output so it was much much easier to just cut > and paste the three lines that actually seemed relevant. I'll try to get > the full clean output for you, but there was just the banner lines before > that and after that was into functions in the library code being tested. I'd like to see if the call to memmove is getting inlined. We do de-inlining for error reports but not at runtime. That means that if memmove is inlined memcheck won't redirect it. Instead it will redirect __memmove_chk > $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 0000000000119f80 > 0000000000119f80 i __memcpy_chk@@GLIBC_2.3.4 > $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 000000000011a090 > 000000000011a090 i __memmove_chk@@GLIBC_2.3.4 That's not what I was expecting. If the memmove was inlined and __memcpy_chk and __memmove_chk are aliases then Memcheck will only use one of them (__memcpy_chk from the errors you got). If that's the case then your problem was probably fixed by Mark in October 2023: 39c447e4a9 (Bart Van Assche 2013-11-24 17:48:13 +0000 1716) /*-------------------- memcpy_chk --------------------*/ 39c447e4a9 (Bart Van Assche 2013-11-24 17:48:13 +0000 1717) 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1718) /* See https://bugs.kde.org/show_bug.cgi?id=402833 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1719) why we disable the overlap check on x86_64. */ 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1720) #if defined(VGP_amd64_linux) 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1721) #define CHECK_OVERLAP 0 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1722) #else 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1723) #define CHECK_OVERLAP 1 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1724) #endif Try using Valgrind 3.23 and see if the problem goes away.
(In reply to Paul Floyd from comment #19) > (In reply to Olly Betts from comment #18) > > $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 0000000000119f80 > > 0000000000119f80 i __memcpy_chk@@GLIBC_2.3.4 > > $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 000000000011a090 > > 000000000011a090 i __memmove_chk@@GLIBC_2.3.4 > > That's not what I was expecting. If the memmove was inlined and > __memcpy_chk and __memmove_chk are aliases then Memcheck > will only use one of them (__memcpy_chk from the errors you got). Note that the "i" means ifunc. So they get resolved at runtime through an ifunc call. It depends on cpu features to which this ifunc resolves. So it could very well be that they resolve to the same address at runtime on one machine, but not on another. > If that's the case then your problem was probably fixed by Mark in October > 2023: > > 39c447e4a9 (Bart Van Assche 2013-11-24 17:48:13 +0000 1716) > /*-------------------- memcpy_chk --------------------*/ > 39c447e4a9 (Bart Van Assche 2013-11-24 17:48:13 +0000 1717) > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1718) /* See > https://bugs.kde.org/show_bug.cgi?id=402833 > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1719) why we > disable the overlap check on x86_64. */ > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1720) #if > defined(VGP_amd64_linux) > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1721) #define > CHECK_OVERLAP 0 > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1722) #else > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1723) #define > CHECK_OVERLAP 1 > 53e101f562 (Mark Wielaard 2023-10-30 23:30:06 +0100 1724) #endif > > Try using Valgrind 3.23 and see if the problem goes away. yeah, that was the workaround for this bug. But that simply means we don't do any overlap checking on x86_64, which is still a bug :{