Bug 402833 - memcheck/tests/overlap testcase fails, memcpy seen as memmove
Summary: memcheck/tests/overlap testcase fails, memcpy seen as memmove
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.14.0
Platform: Ubuntu Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
: 453084 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-01-03 17:07 UTC by Mark Wielaard
Modified: 2024-06-13 10:49 UTC (History)
7 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
disable overlap check and test for linux-amd64 (1.90 KB, text/plain)
2023-10-30 19:49 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2019-01-03 17:07:07 UTC
With glibc 2.28 (and maybe earlier versions) on amd64 memcheck/tests/overlap fails.

--- overlap.stderr.exp  2018-01-23 21:34:02.318995262 +0100
+++ overlap.stderr.out  2018-12-22 21:43:11.181679045 +0100
@@ -1,11 +1,3 @@
-Source and destination overlap in memcpy(0x........, 0x........, 21)
-   at 0x........: memcpy (vg_replace_strmem.c:...)
-   by 0x........: main (overlap.c:40)
-
-Source and destination overlap in memcpy(0x........, 0x........, 21)
-   at 0x........: memcpy (vg_replace_strmem.c:...)
-   by 0x........: main (overlap.c:42)
-
 Source and destination overlap in strncpy(0x........, 0x........, 21)
    at 0x........: strncpy (vg_replace_strmem.c:...)
    by 0x........: main (overlap.c:45)

Looking at the REDIRS it might be caused by redirecting a memcpy variant to memmove?

--38108-- REDIR: 0x48f0210 (libc.so.6:memmove) redirected to 0x482b1be (_vgnU_ifunc_wrapper)
--38108-- REDIR: 0x48f0680 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x482b1be (_vgnU_ifunc_wrapper)
--38108-- REDIR: 0x49c3b50 (libc.so.6:__memcpy_avx_unaligned_erms) redirected to 0x483ca60 (memmove)
Comment 1 Philippe Waroquiers 2019-01-07 15:41:09 UTC
It also started to fail for me some months ago.
Debian GLIBC 2.24-11+deb9u3
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

See https://sourceforge.net/p/valgrind/mailman/message/36034448/

At that time, IIRC, I vaguely contemplated this could be fixed by changing
the REDIR mechanism by making it 'better ifunc aware, and do the
redirection at an earlier stage.
Comment 2 Julian Seward 2019-03-10 09:46:12 UTC
Is there any progress here?  How important will it be to fix this for 3.15.0?
Comment 3 Philippe Waroquiers 2019-03-10 22:55:02 UTC
(In reply to Julian Seward from comment #2)
> Is there any progress here?  How important will it be to fix this for 3.15.0?

I believe this will be a non neglectible change in the REDIR mechanism,
as the REDIR will have to be done at at ifunc level.

As far as I can see; not fixing this means some false negative for memcpy,
not detecting overlap args.
So, not that critical IMO
Comment 4 Ewgeni Gawrilow 2019-06-18 22:13:59 UTC
I get a false positive: a call to memmove (which should cope with overlapping memory areas) is erroneously checked as if it were memcpy:

==3546== Source and destination overlap in memcpy_chk(0x1ffeffc460, 0x1ffeffc462, 9)
==3546==    at 0x4C383D0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3546==    by 0x6DC9481: memmove (string_fortified.h:40)

Comparing this with previous comments in this ticket, it seems like the redirects for memmove and memcpy are simply swapped.
Comment 5 Paul Floyd 2023-02-08 20:41:18 UTC
I don't understand,

I've looked at the code and at #275284

We have this redir
 MEMMOVE(VG_Z_LIBC_SONAME, memcpyZAGLIBCZu2Zd2Zd5) /* memcpy@GLIBC_2.2.5 */

It long predates glibc 2.28. The overlap testcase hasn't changed for donkeys years.

Here's the 1st missing call
  401283:       ba 15 00 00 00          mov    $0x15,%edx
  401288:       48 89 ce                mov    %rcx,%rsi
  40128b:       48 89 c7                mov    %rax,%rdi
  40128e:       e8 ed fd ff ff          call   401080 <memcpy@plt>

(Fedora 37, an old Xeon CPU)
Comment 6 Mark Wielaard 2023-05-05 15:58:34 UTC
For now I added the following "workaround" to Fedora because we keep getting bug reports of memmove calls reporting src/dst overlaps:

diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c
index b32f13f76..aa7f88ca2 100644
--- a/shared/vg_replace_strmem.c
+++ b/shared/vg_replace_strmem.c
@@ -1128,7 +1128,7 @@ static inline void my_exit ( int x )
    MEMMOVE_OR_MEMCPY(20181, soname, fnname, 0)
 
 #define MEMCPY(soname, fnname) \
-   MEMMOVE_OR_MEMCPY(20180, soname, fnname, 1)
+   MEMMOVE_OR_MEMCPY(20180, soname, fnname, 0) /* See KDE bug #402833 */
 
 #if defined(VGO_linux)
  /* For older memcpy we have to use memmove-like semantics and skip

We really should not do ifunc resolving for these two functions, just intercept them directly
Comment 7 Mark Wielaard 2023-05-29 12:14:20 UTC
*** Bug 453084 has been marked as a duplicate of this bug. ***
Comment 8 Mark Wielaard 2023-05-30 16:25:11 UTC
The "workaround from comment #6 isn't complete since memmove_chk and memcpy_chk can also alias possibly causing memmove_chk doing an overlap check (because it gets confused with memcpy_chk). So the current patch as used for Fedora is:

diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c
index b32f13f76..464e8d4ca 100644
--- a/shared/vg_replace_strmem.c
+++ b/shared/vg_replace_strmem.c
@@ -1128,7 +1128,7 @@ static inline void my_exit ( int x )
    MEMMOVE_OR_MEMCPY(20181, soname, fnname, 0)
 
 #define MEMCPY(soname, fnname) \
-   MEMMOVE_OR_MEMCPY(20180, soname, fnname, 1)
+   MEMMOVE_OR_MEMCPY(20180, soname, fnname, 0) /* See KDE bug #402833 */
 
 #if defined(VGO_linux)
  /* For older memcpy we have to use memmove-like semantics and skip
@@ -1714,8 +1714,6 @@ static inline void my_exit ( int x )
       RECORD_COPY(len); \
       if (len == 0) \
          return dst; \
-      if (is_overlap(dst, src, len, len)) \
-         RECORD_OVERLAP_ERROR("memcpy_chk", dst, src, len); \
       if ( dst > src ) { \
          d = (HChar *)dst + len - 1; \
          s = (const HChar *)src + len - 1; \
Comment 9 Mark Wielaard 2023-10-30 19:49:13 UTC
Created attachment 162737 [details]
disable overlap check and test for linux-amd64

More subtle workaround patch that only disables the overlap check (and test) on linux-amd64
Comment 10 Mark Wielaard 2023-10-30 23:21:09 UTC
(In reply to Mark Wielaard from comment #9)
> Created attachment 162737 [details]
> disable overlap check and test for linux-amd64
> 
> More subtle workaround patch that only disables the overlap check (and test)
> on linux-amd64

This workaround is now committed:

commit 53e101f562fa89bbf92d658fba626e2397862a16
Author: Mark Wielaard <mark@klomp.org>
Date:   Mon Oct 30 23:30:06 2023 +0100

    Disable memcpy overlap check and test on amd64 linux
    
    Almost all newer distros have ifunc based memcpy/memmove glibc
    implementation which cause false positives. Disable the overlap check
    and test on these systems for now.
    
    https://bugs.kde.org/show_bug.cgi?id=402833

But it is just a workaround to get rid of the false positives. Real solution still needed. So keeping bug open.
Comment 11 Olly Betts 2024-06-12 02:40:53 UTC
I recently started seeing the "Source and destination overlap in memcpy_chk" from a call be memmove which was already reported as https://bugs.kde.org/show_bug.cgi?id=453084 which has been marked as a duplicate of this bug.  It also seems to be the same problem reported here in comment 4.

I found the following suppression works:

{
   memmove overlapping source and destination
   Memcheck:Overlap
   fun:__memcpy_chk
   fun:memmove
}

This suppression only matches when __memcpy_chk is called by memmove so it should only suppress the incorrect errors. I'm posting it here in the hope it will help others who hit this and find this bug while seeking a workaround (like I did before trying to a suppression).

(FWIW I'm using the valgrind package version 1:3.20.0-2.1 on Debian unstable.)
Comment 12 Paul Floyd 2024-06-12 05:25:35 UTC
Adding a suppression is probably not the answer. Running under memcheck there shouldn't even be a call to __memcpy_chk from memmove.

The question is why Valgrind isn't rediecting the call to memmove? On Linux the memmove may have been replaced by a call to memcpy if the compiler deems it safe to do so. That does cause us problems as our checks are more conservative.

What exactly is the error message?
Comment 13 Olly Betts 2024-06-12 19:25:49 UTC
> Adding a suppression is probably not the answer.

I didn't try to claim it was - I explicitly presented it as a way to work around the problem.

If I followed the discussion in https://bugs.kde.org/show_bug.cgi?id=453084#c9 the underlying problem is that some of the optimised memcpy() implementations actually support overlapping blocks (i.e. they are also valid memmove() implementations) and for these memmove() and memcpy() both end up using the same function to implement the copy:

| The problem is that __memcpy_chk_sse2_unaligned and __memmove_chk_sse2_unaligned are aliases to the same function

Due to this valgrind can think the call was to memcpy() when it was actually memmove().  The call stack however shows memmove() so my suppression works.

> The question is why Valgrind isn't rediecting the call to memmove? On Linux the memmove may have been replaced by a call to memcpy if the compiler deems it safe to do so. That does cause us problems as our checks are more conservative.

AIUI, this is NOT the compiler proving to itself that the source and destination can't overlap and replacing memmove with memcpy, but happens at dynamic link time when a memmove() implementation is selected based on the features supported by the CPU the code is being run on.

> What exactly is the error message?

==4137856== Source and destination overlap in memcpy_chk(0x1ffeffb000, 0x1ffeffb010, 10224)
==4137856==    at 0x484BE22: __memcpy_chk (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4137856==    by 0x495A87B: memmove (string_fortified.h:36)

This really is from a call to memmove(), and one where the source and destination often will overlap (it's copying the tail end of a buffer to the start of that buffer, so it'll overlap when there's more left in the buffer than has been used):

        n -= (p - buf);
        memmove(buf, p, n);
Comment 14 Paul Floyd 2024-06-12 19:54:52 UTC
(In reply to Olly Betts from comment #13)
> ==4137856== Source and destination overlap in memcpy_chk(0x1ffeffb000,
> 0x1ffeffb010, 10224)
> ==4137856==    at 0x484BE22: __memcpy_chk (in
> /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4137856==    by 0x495A87B: memmove (string_fortified.h:36)

Try again. What exactly is the error message?
Comment 15 Paul Floyd 2024-06-12 20:20:04 UTC
Also in your libc, is there more than one function using the same address as __memmove_chk? (And specifically, is __memcpy_chk the same as __memmove_chk).

First do something like

nm -D /lib/libc.so.7 | grep __memmove_chk

Then search for the hex value in column 1.

That would be something like

paulf> nm -D /lib/libc.so.7 | grep 000000000017bbc0
000000000017bbc0 T __mallctl
000000000017bbc0 W mallctl

(different libc and different symbol, just for illustration).
Comment 16 Olly Betts 2024-06-12 20:36:48 UTC
(In reply to Paul Floyd from comment #15)
> Also in your libc, is there more than one function using the same address as
> __memmove_chk? (And specifically, is __memcpy_chk the same as __memmove_chk).

nm fails to list symbols for me:

$ nm /lib/x86_64-linux-gnu/libc.so.6
nm: /lib/x86_64-linux-gnu/libc.so.6: no symbols

However someone else previously reported the problem being due to symbols having the same address in the ticket marked as a duplicate of this one - see https://bugs.kde.org/show_bug.cgi?id=453084#c9 :

$ nm /lib64/libc.so.6 | grep "__mem\(cpy\|move\)_chk_sse2_unaligned$"
00000000000b9c50 t __memcpy_chk_sse2_unaligned
00000000000b9c50 t __memmove_chk_sse2_unaligned

$ nm /lib64/libc.so.6 | grep "__mem\(cpy\|move\)_chk_avx_unaligned$"
00000000001828a0 t __memcpy_chk_avx_unaligned
00000000001828a0 t __memmove_chk_avx_unaligned
Comment 17 Paul Floyd 2024-06-13 05:20:04 UTC
This is getting frustrating.

I keep asking questions and you keep (incorrectly) second guessing something else.

I'm still waiting to see the exact error. I don't want you to second guess that it means just the top two elements in the callstack.

Full Memcheck errors are delimited by lines with just "==PID==" where PID is a number representing the process ID. It's everything between those delimiting lines that I want to see. Is that clear?

Secondly I gave an example using "nm -D". You interpreted that as "nm" without "-D". Then you copied some output from Fedora whilst you are using Debian.

I have an idea as to what is happening but I'd really prefer not to have to make guesses.
Comment 18 Olly Betts 2024-06-13 06:05:13 UTC
(In reply to Paul Floyd from comment #17)
> This is getting frustrating.
> 
> I keep asking questions and you keep (incorrectly) second guessing something
> else.
> 
> I'm still waiting to see the exact error. I don't want you to second guess
> that it means just the top two elements in the callstack.

I just missed you'd posted two replies earlier.

> Full Memcheck errors are delimited by lines with just "==PID==" where PID is
> a number representing the process ID. It's everything between those
> delimiting lines that I want to see. Is that clear?

I'm using valgrind via a testsuite harness and valgrind's output unhelpfully gets interleaved with other output so it was much much easier to just cut and paste the three lines that actually seemed relevant.  I'll try to get the full clean output for you, but there was just the banner lines before that and after that was into functions in the library code being tested.

> Secondly I gave an example using "nm -D". You interpreted that as "nm"
> without "-D". Then you copied some output from Fedora whilst you are using
> Debian.

Sorry, I managed to miss the `-D` - there are no other symbols with the same address as either __memcpy_chk or __memmove_chk it seems:

$ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 0000000000119f80
0000000000119f80 i __memcpy_chk@@GLIBC_2.3.4
$ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 000000000011a090
000000000011a090 i __memmove_chk@@GLIBC_2.3.4
Comment 19 Paul Floyd 2024-06-13 07:58:50 UTC
(In reply to Olly Betts from comment #18)

> I'm using valgrind via a testsuite harness and valgrind's output unhelpfully
> gets interleaved with other output so it was much much easier to just cut
> and paste the three lines that actually seemed relevant.  I'll try to get
> the full clean output for you, but there was just the banner lines before
> that and after that was into functions in the library code being tested.

I'd like to see if the call to memmove is getting inlined. We do de-inlining for
error reports but not at runtime. That means that if memmove is inlined
memcheck won't redirect it. Instead it will redirect __memmove_chk

> $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 0000000000119f80
> 0000000000119f80 i __memcpy_chk@@GLIBC_2.3.4
> $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 000000000011a090
> 000000000011a090 i __memmove_chk@@GLIBC_2.3.4

That's not what I was expecting. If the memmove was inlined and
__memcpy_chk and __memmove_chk are aliases then Memcheck
will only use one of them (__memcpy_chk from the errors you got).

If that's the case then your problem was probably fixed by Mark in October 2023:

39c447e4a9 (Bart Van Assche     2013-11-24 17:48:13 +0000 1716) /*-------------------- memcpy_chk --------------------*/
39c447e4a9 (Bart Van Assche     2013-11-24 17:48:13 +0000 1717) 
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1718) /* See https://bugs.kde.org/show_bug.cgi?id=402833
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1719)    why we disable the overlap check on x86_64.  */
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1720) #if defined(VGP_amd64_linux)
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1721)  #define CHECK_OVERLAP 0
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1722) #else
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1723)  #define CHECK_OVERLAP 1
53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1724) #endif

Try using Valgrind 3.23 and see if the problem goes away.
Comment 20 Mark Wielaard 2024-06-13 10:49:19 UTC
(In reply to Paul Floyd from comment #19)
> (In reply to Olly Betts from comment #18)

> > $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 0000000000119f80
> > 0000000000119f80 i __memcpy_chk@@GLIBC_2.3.4
> > $ nm -D /usr/lib/x86_64-linux-gnu/libc.so.6|grep 000000000011a090
> > 000000000011a090 i __memmove_chk@@GLIBC_2.3.4
> 
> That's not what I was expecting. If the memmove was inlined and
> __memcpy_chk and __memmove_chk are aliases then Memcheck
> will only use one of them (__memcpy_chk from the errors you got).

Note that the "i" means ifunc. So they get resolved at runtime through an ifunc call.
It depends on cpu features to which this ifunc resolves. So it could very well be that
they resolve to the same address at runtime on one machine, but not on another.

> If that's the case then your problem was probably fixed by Mark in October
> 2023:
> 
> 39c447e4a9 (Bart Van Assche     2013-11-24 17:48:13 +0000 1716)
> /*-------------------- memcpy_chk --------------------*/
> 39c447e4a9 (Bart Van Assche     2013-11-24 17:48:13 +0000 1717) 
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1718) /* See
> https://bugs.kde.org/show_bug.cgi?id=402833
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1719)    why we
> disable the overlap check on x86_64.  */
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1720) #if
> defined(VGP_amd64_linux)
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1721)  #define
> CHECK_OVERLAP 0
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1722) #else
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1723)  #define
> CHECK_OVERLAP 1
> 53e101f562 (Mark Wielaard       2023-10-30 23:30:06 +0100 1724) #endif
> 
> Try using Valgrind 3.23 and see if the problem goes away.

yeah, that was the workaround for this bug.
But that simply means we don't do any overlap checking on x86_64, which is still a bug :{