Bug 502126 - glibc 2.41 extra syscall_cancel frames
Summary: glibc 2.41 extra syscall_cancel frames
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Mark Wielaard
Reported: 2025-03-28 12:23 UTC by Mark Wielaard
Modified: 2025-04-30 11:41 UTC

Attachments
Skip syscall_cancel frames (2.22 KB, text/plain)
2025-03-28 17:54 UTC, Mark Wielaard

Description Mark Wielaard 2025-03-28 12:23:57 UTC
Since glibc 2.41 there are extra frames inserted before doing a syscall to support proper thread cancellation.
This breaks various suppressions and regtests involving checking syscall arguments.

As an example, take memcheck/tests/sendmsg.

Before glibc 2.41 the output looks like:

==1929378== Syscall param sendmsg(msg) points to uninitialised byte(s)
==1929378==    at 0x4971514: sendmsg (sendmsg.c:28)
==1929378==    by 0x40128B: main (sendmsg.c:46)
==1929378==  Address 0x1ffefff640 is on thread 1's stack
==1929378==  in frame #1, created by main (sendmsg.c:13)

With glibc 2.41 the same error looks like:

==2670784== Syscall param sendmsg(msg) points to uninitialised byte(s)
==2670784==    at 0x48D9AE6: __internal_syscall_cancel (cancellation.c:64)
==2670784==    by 0x48D9B03: __syscall_cancel (cancellation.c:75)
==2670784==    by 0x49628F0: sendmsg (sendmsg.c:28)
==2670784==    by 0x4005CB: main (sendmsg.c:46)
==2670784==  Address 0x1ffeffff40 is on thread 1's stack
==2670784==  in frame #3, created by main (sendmsg.c:13)
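
To illustrate why the two extra frames break existing suppressions, here is a minimal sketch in Python. It is a simplified model, not Valgrind's actual matcher: it assumes a suppression is just an ordered list of fun: patterns compared against the stack from the top frame down, and it ignores obj: lines and the "..." frame wildcard. The name suppression_matches and the pattern lists are hypothetical.

```python
from fnmatch import fnmatchcase


def suppression_matches(patterns, stack):
    """Simplified model: each pattern must match the frame at the same
    depth, starting from the top of the stack (index 0)."""
    if len(patterns) > len(stack):
        return False
    return all(fnmatchcase(frame, pat) for pat, frame in zip(patterns, stack))


# Hypothetical suppression that expects sendmsg as the top frame.
supp = ["sendmsg", "main"]

# Function names taken from the two backtraces in this report.
old_stack = ["sendmsg", "main"]
new_stack = ["__internal_syscall_cancel", "__syscall_cancel", "sendmsg", "main"]

if __name__ == "__main__":
    print(suppression_matches(supp, old_stack))  # True
    print(suppression_matches(supp, new_stack))  # False
```

In this model the suppression stops matching only because the top frame is now __internal_syscall_cancel instead of sendmsg.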

There is also __syscall_cancel_arch, which shows up in some gdbserver_tests test cases.

Mailing list discussion:
https://inbox.sourceware.org/libc-alpha/4954a5131faf35cbe4d88ac7729a1fa3ba4b2cb8.camel@klomp.org/T/#t

The proposal is to filter out those extra top frames early when the platform is VGO_linux and we are handling PRE/POST syscalls.

There is still an open question whether there is any impact from these functions doing tail calls, which might hide the actual caller frame. This should be solved on the glibc side.
Comment 1 Mark Wielaard 2025-03-28 12:56:25 UTC
The gdbserver_tests part (for x86_64) seems simple to fix:

commit f3f30becff5851b0d0b2caa7e96e661c7889f7d1
Author: Mark Wielaard <mark@klomp.org>
Date:   Fri Mar 28 13:44:35 2025 +0100

    filter_gdb.in: __syscall_cancel_arch is just in a syscall
    
    Since glibc 2.41 some extra syscall_cancel frames are inserted before
    that actual syscall is made. Just filter out __syscall_cancel_arch
    from the gdb output and replace it with "in syscall ..." to make the
    regtest .exp match.
    
    https://bugs.kde.org/show_bug.cgi?id=502126

diff --git a/gdbserver_tests/filter_gdb.in b/gdbserver_tests/filter_gdb.in
index 2bef9f3ee57b..e2b329a60483 100755
--- a/gdbserver_tests/filter_gdb.in
+++ b/gdbserver_tests/filter_gdb.in
@@ -134,6 +134,9 @@ s/^>[> ]*//
 #       anonymise a 'general' system calls stack trace part
 s/in _dl_sysinfo_int80 () from \/lib\/ld-linux.so.*/in syscall .../
 
+#      in __syscall_cancel_arch is just in a syscall
+s/in __syscall_cancel_arch .*/in syscall .../
+
 #       anonymise kill syscall.
 s/in kill ().*$/in syscall .../
 
Also pushed to VALGRIND_3_24_BRANCH.
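
The effect of the substitution added to filter_gdb.in can be tried in isolation. The following is a minimal sketch that mirrors the sed expression with Python's re module. The function name filter_gdb_line and the sample gdb line are made up for illustration and are not taken from a real test run.

```python
import re


def filter_gdb_line(line):
    """Mirror of: s/in __syscall_cancel_arch .*/in syscall .../
    Everything from 'in __syscall_cancel_arch ' to the end of the line
    is replaced by the generic 'in syscall ...' text."""
    return re.sub(r"in __syscall_cancel_arch .*", "in syscall ...", line)


if __name__ == "__main__":
    # Made-up sample line in the style of gdb output.
    sample = "0x00000000048d9ae6 in __syscall_cancel_arch () from /lib64/libc.so.6"
    print(filter_gdb_line(sample))  # 0x00000000048d9ae6 in syscall ...
```

Lines that do not mention __syscall_cancel_arch pass through unchanged, so the existing filters keep working as before.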

This fixes:

gdbserver_tests/mcinfcallWSRU            (stderrB)
gdbserver_tests/nlcontrolc               (stdoutB)
gdbserver_tests/nlvgdbsigqueue           (stdoutB)

Failures that still need some tweaks on the valgrind side:

memcheck/tests/sendmsg                   (stderr)
none/tests/fdbaduse                      (stderr)
none/tests/fdleak_cmsg_supp              (stderr)
none/tests/fdleak_creat_sup              (stderr)
none/tests/fdleak_ipv4                   (stderr)
none/tests/file_dclose                   (stderr)
none/tests/file_dclose_sup               (stderr)
none/tests/socket_close                  (stderr)
none/tests/use_after_close               (stderr)
Comment 2 Mark Wielaard 2025-03-28 17:54:31 UTC
Created attachment 179822
Skip syscall_cancel frames

Proposed patch that, for VGO_linux, skips __syscall_cancel_arch, __internal_syscall_cancel and __syscall_cancel when a backtrace is requested while handling a syscall.
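
The idea can be sketched as follows. This is a standalone Python model and not the attached patch: it assumes a backtrace is a list of function names with the innermost frame first, and the names skip_syscall_cancel_frames and in_syscall are hypothetical. Only the three glibc function names come from this report.

```python
# The three glibc 2.41 helper frames named in this report.
SYSCALL_CANCEL_FRAMES = {
    "__syscall_cancel_arch",
    "__internal_syscall_cancel",
    "__syscall_cancel",
}


def skip_syscall_cancel_frames(stack, in_syscall):
    """Drop leading syscall_cancel frames, but only while handling a syscall.

    stack is a list of function names, innermost frame first. Frames
    further down the stack are never touched."""
    if not in_syscall:
        return list(stack)
    first = 0
    while first < len(stack) and stack[first] in SYSCALL_CANCEL_FRAMES:
        first += 1
    return list(stack[first:])


if __name__ == "__main__":
    new_stack = ["__internal_syscall_cancel", "__syscall_cancel", "sendmsg", "main"]
    print(skip_syscall_cancel_frames(new_stack, in_syscall=True))
    # ['sendmsg', 'main']
```

Restricting the skip to leading frames during syscall handling keeps backtraces taken at other times unchanged.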

Tested on x86_64, ppc64le and s390x where it seems to work as intended.

Also tested on i386, where another frame, __libc_do_syscall, is in the way, and where a tail call seems to prevent getting a backtrace that includes the actual glibc function that made the syscall.

Testing on aarch64 also seems to miss the calling frame, but works otherwise.
Comment 3 Mark Wielaard 2025-03-30 11:16:16 UTC
More filtering for the gdbserver tests:

commit ddcb3aa3ed3188cd28c193225245a76e928b850b
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Mar 30 13:08:55 2025 +0200

    filter_gdb.in: filter out __libc_do_syscall
    
    On i386 and armhf __libc_do_syscall might be used to invoke a syscall.
    Replace __libc_do_syscall with "in syscall ..." and filter out
    possible extra (assembly) source file lines containing
    libc-do-syscall.S from the gdb output.
    
    https://bugs.kde.org/show_bug.cgi?id=502126

Also pushed to VALGRIND_3_24_BRANCH
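
The diff for this commit is not quoted here, so the following is only a sketch of the behaviour the commit message describes. It is written in Python rather than sed, and the exact patterns are assumptions. The function name filter_libc_do_syscall and the sample lines are made up.

```python
import re


def filter_libc_do_syscall(lines):
    """Assumed behaviour, based on the commit message only:
    1. replace 'in __libc_do_syscall ...' with 'in syscall ...'
    2. drop remaining lines that mention libc-do-syscall.S"""
    out = []
    for line in lines:
        # Substitute first, so the frame line itself survives even if it
        # carried a libc-do-syscall.S file reference.
        line = re.sub(r"in __libc_do_syscall .*", "in syscall ...", line)
        if "libc-do-syscall.S" in line:
            continue  # extra (assembly) source file line
        out.append(line)
    return out


if __name__ == "__main__":
    # Made-up sample lines in the style of gdb output.
    sample = [
        "0x0804a000 in __libc_do_syscall () at libc-do-syscall.S:41",
        "41\tlibc-do-syscall.S: No such file or directory.",
        "(gdb) continue",
    ]
    for line in filter_libc_do_syscall(sample):
        print(line)
```

The substitution runs before the drop check so that the frame line is rewritten rather than removed.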
Comment 4 Mark Wielaard 2025-04-30 11:41:56 UTC
The arm64 issue of syscall_cancel tail calls obscuring the call stack remains, but that is a glibc issue:
https://inbox.sourceware.org/libc-alpha/874izmtu4w.fsf@oldenburg.str.redhat.com/

The valgrind side is done.