Bug 439090 - Implement close_range(2) ("WARNING: unhandled amd64-linux syscall: 436")
Summary: Implement close_range(2) ("WARNING: unhandled amd64-linux syscall: 436")
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: unspecified
Platform: Debian unstable Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-23 22:50 UTC by Olly Betts
Modified: 2021-10-12 21:05 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Work in progress patch (4.87 KB, patch)
2021-06-24 06:19 UTC, Olly Betts
Details
Add close_range(2) support (10.07 KB, text/plain)
2021-10-04 15:22 UTC, Lubomir Rintel
Details
Updated Add close_range(2) support patch (11.60 KB, text/plain)
2021-10-12 12:17 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Olly Betts 2021-06-23 22:50:02 UTC
The close_range syscall was added in Linux 5.9.

STEPS TO REPRODUCE

$ cat <<END > test-close_range.c
#include <sys/syscall.h>
#include <unistd.h>
int main() { return !!syscall(__NR_close_range, 3, ~0U, 0); }
END
$ gcc -Wall -W -O2 ~/test-close_range.c
olly@gemse:~$ ./a.out ; echo $?
0
olly@gemse:~$ valgrind ./a.out ; echo $?
==539319== Memcheck, a memory error detector
==539319== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==539319== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==539319== Command: ./a.out
==539319== 
--539319-- WARNING: unhandled amd64-linux syscall: 436
--539319-- You may be able to write your own handler.
--539319-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--539319-- Nevertheless we consider this a bug.  Please report
--539319-- it at http://valgrind.org/support/bug_reports.html.
==539319== 
==539319== HEAP SUMMARY:
==539319==     in use at exit: 0 bytes in 0 blocks
==539319==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==539319== 
==539319== All heap blocks were freed -- no leaks are possible
==539319== 
==539319== For lists of detected and suppressed errors, rerun with: -s
==539319== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
1

OBSERVED RESULT

"WARNING: unhandled amd64-linux syscall: 436" and exit code 1

EXPECTED RESULT

No warning and exit code 0 (as without valgrind)

SOFTWARE/OS VERSIONS

Tested on Debian unstable which has valgrind 3.16.1 (not available in the bugzilla version list).

ADDITIONAL INFORMATION

I've checked NEWS in current valgrind git and it doesn't look like this has been fixed since 3.16.1.  I also couldn't find an existing report in bugzilla.
Comment 1 Olly Betts 2021-06-24 06:18:17 UTC
I had a go at implementing support for close_range, but ended up out of my depth.

The problem I have no idea how to solve is what to do if log fds fall in the middle of the range of fds which are to be closed - looking at other syscall wrappers, I can see that I can adjust the parameters, but I potentially need to synthesise multiple close_from calls from one and combine the return codes and any errno values and I couldn't see any existing examples of that.

I'll attach the patch I have as it's hopefully a useful start for somebody with more of a clue than me - I've checked it compiles but nothing further.
Comment 2 Olly Betts 2021-06-24 06:19:15 UTC
Created attachment 139634 [details]
Work in progress patch
Comment 3 Olly Betts 2021-09-17 06:50:56 UTC
glibc 2.34 (released 2021-08-01) added close_range() and closefrom(), both of which make use of the close_range syscall.
Comment 4 Lubomir Rintel 2021-10-04 15:22:56 UTC
Created attachment 142144 [details]
Add close_range(2) support

Here's my take at implementing close_range(2) support. Hope it's useful.
Comment 5 Mark Wielaard 2021-10-11 22:51:38 UTC
(In reply to Lubomir Rintel from comment #4)
> Created attachment 142144 [details]
> Add close_range(2) support
> 
> Here's my take at implementing close_range(2) support. Hope it's useful.

Thanks. This looks near perfect.

I kept the FUSE_COMPATIBLE_MAY_BLOCK since we have added that for other modern file syscalls too (e.g. statx)

At first I thought we needed to use ML_(fd_allowed), but that would croak in some casses we don't really want here. So the manual checks here are good.

last should be unsigned too, otherwise the comparisons don't work if ARG2 == -1.

In the POST handler we should do the same limiting as in the the PRE handler or we'll keep calling record_closed_fd forever when running with --trace-fds=yes and someone does a close_range (3, -1).

Also, we shouldn't record the fds as closed when the CLOSE_RANGE_CLOEXEC flag is set.

I have changes for the above. Will post tomorrow.
Comment 6 Mark Wielaard 2021-10-12 12:12:16 UTC
This is the diff on top of the original:

diff --git a/coregrind/m_syswrap/syswrap-linux.c b/coregrind/m_syswrap/syswrap-linux.c
index 2840b3af0..63dd1fb66 100644
--- a/coregrind/m_syswrap/syswrap-linux.c
+++ b/coregrind/m_syswrap/syswrap-linux.c
@@ -13319,7 +13319,7 @@ PRE(sys_close_range)
 {
    SysRes res = VG_(mk_SysRes_Success)(0);
    unsigned int beg, end;
-   Int last = ARG2;
+   unsigned int last = ARG2;
 
    FUSE_COMPATIBLE_MAY_BLOCK();
    PRINT("sys_close_range ( %" FMT_REGWORD "u, %" FMT_REGWORD "u, %"
@@ -13343,8 +13343,10 @@ PRE(sys_close_range)
 
    beg = end = ARG1;
    do {
-      if (end > last || ( end == 2/*stderr*/ && VG_(debugLog_getLevel)() > 0) ||
-          end == VG_(log_output_sink).fd || end == VG_(xml_output_sink).fd) {
+      if (end > last
+         || (end == 2/*stderr*/ && VG_(debugLog_getLevel)() > 0)
+         || end == VG_(log_output_sink).fd
+         || end == VG_(xml_output_sink).fd) {
          /* Split the range if it contains a file descriptor we're not
           * supposed to close. */
          if (end - 1 >= beg)
@@ -13360,11 +13362,19 @@ PRE(sys_close_range)
 POST(sys_close_range)
 {
    unsigned int fd;
+   unsigned int last = ARG2;
 
-   if (!VG_(clo_track_fds))
+   if (!VG_(clo_track_fds)
+       || (ARG3 & VKI_CLOSE_RANGE_CLOEXEC) != 0)
       return;
 
-   for (fd = ARG1; fd <= ARG2; fd++)
+   if (last >= VG_(fd_hard_limit))
+      last = VG_(fd_hard_limit) - 1;
+
+   for (fd = ARG1; fd <= last; fd++)
+      if ((fd != 2/*stderr*/ || VG_(debugLog_getLevel)() == 0)
+         && fd != VG_(log_output_sink).fd
+         && fd != VG_(xml_output_sink).fd)
       ML_(record_fd_close)(fd);
 }
 
diff --git a/include/vki/vki-linux.h b/include/vki/vki-linux.h
index 426d9db92..eb4e01b33 100644
--- a/include/vki/vki-linux.h
+++ b/include/vki/vki-linux.h
@@ -5369,6 +5369,9 @@ struct vki_itimerspec64 {
 
 #define VKI_RLIM64_INFINITY (~0ULL)
 
+#define VKI_CLOSE_RANGE_UNSHARE (1U << 1)
+#define VKI_CLOSE_RANGE_CLOEXEC (1U << 2)
+
 /*--------------------------------------------------------------------*/
 /*--- end                                                          ---*/
 /*--------------------------------------------------------------------*/
Comment 7 Mark Wielaard 2021-10-12 12:17:37 UTC
Created attachment 142366 [details]
Updated Add close_range(2) support patch

Updated patch and commit message, including the changes mentioned above and adding a NEWS entry.
Comment 8 Mark Wielaard 2021-10-12 21:05:23 UTC
commit a21e890f82258c17ee47895fa28bb62937eb1af9
Author: Lubomir Rintel <lkundrak@v3.sk>
Date:   Mon Oct 4 15:40:29 2021 +0200

    Add close_range(2) support
    
    This is a system call introduced in Linux 5.9.
    
    It's typically used to bulk-close file descriptors that a process inherited
    without having desired so and doesn't want to pass them to its offspring
    for security reasons. For this reason the sensible upper limit value tends
    to be unknown and the users prefer to stay on the safe side by setting it
    high.
    
    This is a bit peculiar because, if unfiltered, the syscall could end up
    closing descriptors Valgrind uses for its purposes, ending in no end of
    mayhem and suffering.
    
    This patch adjusts the upper bounds to a safe value and then skips over
    the descriptor Valgrind uses by potentially calling the real system call
    with sub-ranges that are safe to close.
    
    The call can fail on negative ranges and bad flags -- we're dealing with
    the first condition ourselves while letting the real call fail on bad
    flags.
    
    https://bugs.kde.org/show_bug.cgi?id=439090