The close_range syscall was added in Linux 5.9. STEPS TO REPRODUCE $ cat <<END > test-close_range.c #include <sys/syscall.h> #include <unistd.h> int main() { return !!syscall(__NR_close_range, 3, ~0U, 0); } END $ gcc -Wall -W -O2 ~/test-close_range.c olly@gemse:~$ ./a.out ; echo $? 0 olly@gemse:~$ valgrind ./a.out ; echo $? ==539319== Memcheck, a memory error detector ==539319== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==539319== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info ==539319== Command: ./a.out ==539319== --539319-- WARNING: unhandled amd64-linux syscall: 436 --539319-- You may be able to write your own handler. --539319-- Read the file README_MISSING_SYSCALL_OR_IOCTL. --539319-- Nevertheless we consider this a bug. Please report --539319-- it at http://valgrind.org/support/bug_reports.html. ==539319== ==539319== HEAP SUMMARY: ==539319== in use at exit: 0 bytes in 0 blocks ==539319== total heap usage: 0 allocs, 0 frees, 0 bytes allocated ==539319== ==539319== All heap blocks were freed -- no leaks are possible ==539319== ==539319== For lists of detected and suppressed errors, rerun with: -s ==539319== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) 1 OBSERVED RESULT "WARNING: unhandled amd64-linux syscall: 436" and exit code 1 EXPECTED RESULT No warning and exit code 0 (as without valgrind) SOFTWARE/OS VERSIONS Tested on Debian unstable which has valgrind 3.16.1 (not available in the bugzilla version list). ADDITIONAL INFORMATION I've checked NEWS in current valgrind git and it doesn't look like this has been fixed since 3.16.1. I also couldn't find an existing report in bugzilla.
I had a go at implementing support for close_range, but ended up out of my depth. The problem I have no idea how to solve is what to do if log fds fall in the middle of the range of fds which are to be closed - looking at other syscall wrappers, I can see that I can adjust the parameters, but I potentially need to synthesise multiple close_from calls from one and combine the return codes and any errno values and I couldn't see any existing examples of that. I'll attach the patch I have as it's hopefully a useful start for somebody with more of a clue than me - I've checked it compiles but nothing further.
Created attachment 139634 [details] Work in progress patch
glibc 2.34 (released 2021-08-01) added close_range() and closefrom(), both of which make use of the close_range syscall.
Created attachment 142144 [details] Add close_range(2) support Here's my take at implementing close_range(2) support. Hope it's useful.
(In reply to Lubomir Rintel from comment #4) > Created attachment 142144 [details] > Add close_range(2) support > > Here's my take at implementing close_range(2) support. Hope it's useful. Thanks. This looks near perfect. I kept the FUSE_COMPATIBLE_MAY_BLOCK since we have added that for other modern file syscalls too (e.g. statx) At first I thought we needed to use ML_(fd_allowed), but that would croak in some casses we don't really want here. So the manual checks here are good. last should be unsigned too, otherwise the comparisons don't work if ARG2 == -1. In the POST handler we should do the same limiting as in the the PRE handler or we'll keep calling record_closed_fd forever when running with --trace-fds=yes and someone does a close_range (3, -1). Also, we shouldn't record the fds as closed when the CLOSE_RANGE_CLOEXEC flag is set. I have changes for the above. Will post tomorrow.
This is the diff on top of the original: diff --git a/coregrind/m_syswrap/syswrap-linux.c b/coregrind/m_syswrap/syswrap-linux.c index 2840b3af0..63dd1fb66 100644 --- a/coregrind/m_syswrap/syswrap-linux.c +++ b/coregrind/m_syswrap/syswrap-linux.c @@ -13319,7 +13319,7 @@ PRE(sys_close_range) { SysRes res = VG_(mk_SysRes_Success)(0); unsigned int beg, end; - Int last = ARG2; + unsigned int last = ARG2; FUSE_COMPATIBLE_MAY_BLOCK(); PRINT("sys_close_range ( %" FMT_REGWORD "u, %" FMT_REGWORD "u, %" @@ -13343,8 +13343,10 @@ PRE(sys_close_range) beg = end = ARG1; do { - if (end > last || ( end == 2/*stderr*/ && VG_(debugLog_getLevel)() > 0) || - end == VG_(log_output_sink).fd || end == VG_(xml_output_sink).fd) { + if (end > last + || (end == 2/*stderr*/ && VG_(debugLog_getLevel)() > 0) + || end == VG_(log_output_sink).fd + || end == VG_(xml_output_sink).fd) { /* Split the range if it contains a file descriptor we're not * supposed to close. */ if (end - 1 >= beg) @@ -13360,11 +13362,19 @@ PRE(sys_close_range) POST(sys_close_range) { unsigned int fd; + unsigned int last = ARG2; - if (!VG_(clo_track_fds)) + if (!VG_(clo_track_fds) + || (ARG3 & VKI_CLOSE_RANGE_CLOEXEC) != 0) return; - for (fd = ARG1; fd <= ARG2; fd++) + if (last >= VG_(fd_hard_limit)) + last = VG_(fd_hard_limit) - 1; + + for (fd = ARG1; fd <= last; fd++) + if ((fd != 2/*stderr*/ || VG_(debugLog_getLevel)() == 0) + && fd != VG_(log_output_sink).fd + && fd != VG_(xml_output_sink).fd) ML_(record_fd_close)(fd); } diff --git a/include/vki/vki-linux.h b/include/vki/vki-linux.h index 426d9db92..eb4e01b33 100644 --- a/include/vki/vki-linux.h +++ b/include/vki/vki-linux.h @@ -5369,6 +5369,9 @@ struct vki_itimerspec64 { #define VKI_RLIM64_INFINITY (~0ULL) +#define VKI_CLOSE_RANGE_UNSHARE (1U << 1) +#define VKI_CLOSE_RANGE_CLOEXEC (1U << 2) + /*--------------------------------------------------------------------*/ /*--- end ---*/ /*--------------------------------------------------------------------*/
Created attachment 142366 [details] Updated Add close_range(2) support patch Updated patch and commit message, including the changes mentioned above and adding a NEWS entry.
commit a21e890f82258c17ee47895fa28bb62937eb1af9 Author: Lubomir Rintel <lkundrak@v3.sk> Date: Mon Oct 4 15:40:29 2021 +0200 Add close_range(2) support This is a system call introduced in Linux 5.9. It's typically used to bulk-close file descriptors that a process inherited without having desired so and doesn't want to pass them to its offspring for security reasons. For this reason the sensible upper limit value tends to be unknown and the users prefer to stay on the safe side by setting it high. This is a bit peculiar because, if unfiltered, the syscall could end up closing descriptors Valgrind uses for its purposes, ending in no end of mayhem and suffering. This patch adjusts the upper bounds to a safe value and then skips over the descriptor Valgrind uses by potentially calling the real system call with sub-ranges that are safe to close. The call can fail on negative ranges and bad flags -- we're dealing with the first condition ourselves while letting the real call fail on bad flags. https://bugs.kde.org/show_bug.cgi?id=439090