Running LTP on RHEL-10 kernel: 6.12.0-55.2.1.el10_0.x86_64 shows unhandled amd64-linux syscall: 323. $ valgrind -q --tool=none --log-file=xxx accept03 tst_test.c:1900: TINFO: LTP version: 20250130 tst_test.c:1904: TINFO: Tested kernel: 6.12.0-55.2.1.el10_0.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Mar 3 04:14:37 EST 2025 x86_64 tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.12.0-55.2.1.el10_0.x86_64/config' tst_test.c:1722: TINFO: Overall timeout per run is 0h 00m 30s accept03.c:48: TPASS: accept() on file : ENOTSOCK (88) accept03.c:48: TPASS: accept() on O_PATH file : EBADF (9) accept03.c:48: TPASS: accept() on directory : ENOTSOCK (88) accept03.c:48: TPASS: accept() on /dev/zero : ENOTSOCK (88) accept03.c:48: TPASS: accept() on /proc/self/maps : ENOTSOCK (88) accept03.c:48: TPASS: accept() on pipe read end : ENOTSOCK (88) accept03.c:48: TPASS: accept() on pipe write end : ENOTSOCK (88) accept03.c:48: TPASS: accept() on epoll : ENOTSOCK (88) accept03.c:48: TPASS: accept() on eventfd : ENOTSOCK (88) accept03.c:48: TPASS: accept() on signalfd : ENOTSOCK (88) accept03.c:48: TPASS: accept() on timerfd : ENOTSOCK (88) accept03.c:48: TPASS: accept() on pidfd : ENOTSOCK (88) accept03.c:48: TPASS: accept() on fanotify : ENOTSOCK (88) accept03.c:48: TPASS: accept() on inotify : ENOTSOCK (88) tst_fd.c:170: TCONF: Skipping userfaultfd: ENOSYS (38) accept03.c:48: TPASS: accept() on perf event : ENOTSOCK (88) tst_fd.c:199: TCONF: Skipping io uring: EPERM (1) accept03.c:48: TPASS: accept() on bpf map : ENOTSOCK (88) accept03.c:48: TPASS: accept() on fsopen : ENOTSOCK (88) accept03.c:48: TPASS: accept() on fspick : ENOTSOCK (88) accept03.c:48: TPASS: accept() on open_tree : EBADF (9) accept03.c:48: TPASS: accept() on memfd : ENOTSOCK (88) accept03.c:48: TPASS: accept() on memfd secret : ENOTSOCK (88) Summary: passed 21 failed 0 broken 0 skipped 2 warnings 0 $ cat xxx --19712-- WARNING: unhandled amd64-linux syscall: 323 --19712-- You may be able to write your own handler. --19712-- Read the file README_MISSING_SYSCALL_OR_IOCTL. --19712-- Nevertheless we consider this a bug. Please report --19712-- it at http://valgrind.org/support/bug_reports.html. $ rpm -q valgrind valgrind-3.24.0-3.el10.x86_64 $
Same test results with valgrind-3.25.0.GIT.
Created attachment 179365 [details] possible patch
I don’t think that this should be generic as it’s Linux only. It looks to me as if this creates a file descriptor. If so it needs to use the fd tracking mechanism. Is it OK to leave everything to the kernel? Or are there cases where VEX needs to write to the fd?
(In reply to Paul Floyd from comment #3) > I don’t think that this should be generic as it’s Linux only. Yes, this should be moved from syswrap-generic.c to syswrap-linux.c And it should then use LINX_ instead of GENX_. > It looks to me as if this creates a file descriptor. If so it needs to use > the fd tracking mechanism. Yes, so this needs a POST (called after the syscall succeeds) that does the file descriptor tracking, something like: POST(sys_userfaultfd) { if (!ML_(fd_allowed)(RES, "userfaultfd", tid, True)) { VG_(close)(RES); SET_STATUS_Failure( VKI_EMFILE ); } else { if (VG_(clo_track_fds)) ML_(record_fd_open_nameless)(tid, RES); } } And when a syscall wrapper has a POST it needs to be registered with XY (X has PRE, Y has POST). So LINXY in the syswrap-<arch>-linux.c syscall_main_table array. BTW. Although this isn't a syscall defined in include/vki/vki-scnums-shared-linux.h, __NR_userfaultfd is defined for amd64, arm, arm64, mips32, mips64, nanomips, ppc32, ppc64, riscv64 and x86. And it is called similarly on each linux arch, so it really should/can be hooked up in all coregrind/m_syswrap/syswrap-*-linux.c files. > Is it OK to leave everything to the kernel? Or are there cases where VEX > needs to write to the fd? It is ok for this syscall just creating the fd. But when it get configured we might have to intercept/discard reads from it if they trigger for pages allocated for Valgrind itself. See https://www.man7.org/linux/man-pages/man2/ioctl_userfaultfd.2.html but we can do that when we encounter programs that actually do that. BTW. I don't think the *flags |= SfMayBlock; in the PRE handler is necessary in this case. It is if the syscall may block and might depend on another thread (or some other event) calling back into the kernel. It might in the case of FUSE file systems for example for file descriptor creation, but I don't think userfaultfd file description creation is, it should return immediately. Also you want to use PRE_REG_READ1 not PRE_REG_READ0 (which is for syscalls without arguments). The first two arguments of the PRE_REG_READ? macros are the return time and syscall name, the rest are the argument types and names. You can check if you got those right using ./vg-in-place --trace-syscalls ...
Created attachment 179492 [details] possible patch Hello guys, thank you for all the comments. I've found existing wrapper for sys_epoll_create that seems to be close to this new sys_userfaultfd, and modelled the new wrapper after it. Does the new patch look fine?
(In reply to mcermak from comment #5) > Created attachment 179492 [details] > possible patch > > Hello guys, thank you for all the comments. I've found existing wrapper for > sys_epoll_create that seems to be close to this new sys_userfaultfd, and > modelled the new wrapper after it. Does the new patch look fine? Yes, that looks good. Thanks. You are only hooking this up for amd64-linux, but __NR_userfaultfd is defined for all linux arches in: include/vki/vki-scnums-amd64-linux.h include/vki/vki-scnums-arm-linux.h include/vki/vki-scnums-arm64-linux.h include/vki/vki-scnums-mips32-linux.h include/vki/vki-scnums-mips64-linux.h include/vki/vki-scnums-nanomips-linux.h include/vki/vki-scnums-ppc32-linux.h include/vki/vki-scnums-ppc64-linux.h include/vki/vki-scnums-riscv64-linux.h include/vki/vki-scnums-x86-linux.h So you really can/should hook this up also in: coregrind/m_syswrap/syswrap-arm-linux.c coregrind/m_syswrap/syswrap-arm64-linux.c coregrind/m_syswrap/syswrap-mips32-linux.c coregrind/m_syswrap/syswrap-mips64-linux.c coregrind/m_syswrap/syswrap-nanomips-linux.c coregrind/m_syswrap/syswrap-ppc32-linux.c coregrind/m_syswrap/syswrap-ppc64-linux.c coregrind/m_syswrap/syswrap-riscv64-linux.c coregrind/m_syswrap/syswrap-s390x-linux.c coregrind/m_syswrap/syswrap-x86-linux.c
Created attachment 179531 [details] possible patch I've grabbed aarch64, ppc64le and s390x test systems, and tested that this updated patch works on those arches. Does the patch look OK? Should I add hook up the remaining arches (I don't have access to those) without true testing?
(In reply to mcermak from comment #7) > I've grabbed aarch64, ppc64le and s390x test systems, and tested that this > updated patch works on those arches. Does the patch look OK? Yes, thanks. And you saw that I lied (sorry, not intentionally!) and __NR_userfaultfd wasn't defined for linux-s390x. Thanks for looking up the correct number. > Should I add hook up the remaining arches (I don't have access to those) without true > testing? Yes please, this code is simple enough that it should be fine on any arch that has the userfaultfd define already. Note that most code is build for all arches, so you should at least see any compile errors (like the missing __NR_userfaultfd on s390x). And that we have nightly testers and the sourceware buildbot for various arches: https://builder.sourceware.org/buildbot/#/builders?tags=valgrind
> Yes please, this code is simple enough Mark, is it really only about "hooking" the existing PRE() and POST() up? I've noticed that for instance i386 passes sycall params via stack (as opposed to amd64 where this gets passed via regs). The existing PRE() relies on register parameter passing. Not sure about other arches. Thoughts?
(In reply to mcermak from comment #9) > > Yes please, this code is simple enough > > Mark, is it really only about "hooking" the existing PRE() and POST() up? > I've noticed that for instance i386 passes sycall params via stack (as > opposed to amd64 where this gets passed via regs). The existing PRE() > relies on register parameter passing. Not sure about other arches. > Thoughts? Marshalling syscall args to/from registers/stack from/to the syscall is handled elsewhere. The only issue the sometimes arises is the case where a 32bit OS syscall has a 64bit argument. In that case a separate syscall wrapper is needed because the 64bit arguemnt will get split into two 32 bit ones. As an example take a look at fallocate.
Created attachment 179542 [details] possible patch Gotcha, thank you Paul. I've added the hooks for the remaining arches too, and carefully eyeballed it. I've requested gcc compile farm account so that next time I can hopefully test my updates on all needed arches. Does the attached patch look good now?
Patch does look good, pushed as: commit f697142e15c44c2d3e3910ae436a6597e8782aea Author: Martin Cermak <mcermak@redhat.com> Date: Tue Mar 18 18:17:30 2025 +0100 Wrap linux specific userfaultfd syscall userfaultfd takes a flags argument and returns a file descriptor. It shows up in the Linux Test Project syscalls tests as unhandled. Declare a sys_userfaultfd wrapper in priv_syswrap-linux.h and hook it for {amd64,arm,arm64,mips64,nanomips,ppc32,ppc64,riscv64,s390x,x86}-linux using LINXY with PRE/POST handlers in syswrap-linux.c. Define __NR_userfaultfd in vki-scnums-s390x-linux.h. It was already defined for all other arches. https://bugs.kde.org/show_bug.cgi?id=501365