For security reasons I close all files in /proc/self/fd except ones I explicitly whitelist (closing all files under sysconf(_SC_OPEN_MAX) is incorrect because one can open a file and then lower resource limits). So, my process crashes upon closing a file in /proc/self/fd that Valgrind won't let it. I understand if this might be too much work for the developers to fix. Reproducible: Always Steps to Reproduce: 1. Read from /proc/self/fd 2. One reads incorrect data Actual Results: My program reads incorrect data. Expected Results: My program should only read the files it opens.
It's not at all clear what files you are referring to, but my guess is that you're trying to close a file that is reserved for valgrind (such as it's output channel) which it will refuse to let you do because as far as your program is concerned those descriptors are not supposed to exist. The descriptor limits reported to the client program by valgrind when getrlimit is called are deliberately adjusted to reserve a set of file descriptors for valgrind's use and you shouldn't try and touch those at all and if you do then valgrind will simply pretend the descriptor isn't open and return EBADF to your program. Yes, because we don't (except for a few things) virtualise /proc it is possible to discover that those descriptors exist but there's isn't much we can easily do about that.
> The descriptor limits reported to the client program by valgrind when getrlimit is called are deliberately adjusted to reserve a set of file descriptors for valgrind's use and you shouldn't try and touch those at all and if you do then valgrind will simply pretend the descriptor isn't open and return EBADF to your program. As I mentioned earlier using getrlimit to test which files to use is a terrible idea because the file limit can be lowered after already allocating file descriptors over the limit. I'm not sure what you are trying to say that is new. It might actually be a good idea not to wrap getrlimit that way so that insecure and broken programs that misuse getrlimits that way break under Valgrind instead of giving users a false sense of security. Anyways, I'll believe you that it'd be difficult and a lot of work to add the feature of virtualizing /proc. This really didn't show up in the documentation though so I think this is at the very least a documentation bug. So, I would suggest that Valgrind documents better the limitation of not virtualising very much information in /proc and that it is considered to NOT wrap getrlimits and report a lower file limit. I'm not sure if there is a legitimate use of getrlimit that would be served by artificially lowering it other than possibly being able to run programs like Valgrind itself (which allocate file descriptors at the end of the limit deliberately to hide them) under Valgrind.
Hi. One possible mitigation for this would be to add a Valgrind API call in valgrind.h for testing if a file descriptor is Valgrind's.
Note that the "test" program for bug https://bugs.kde.org/show_bug.cgi?id=337388 makes use of this bug. It would be nice if we could "virtualize" /proc/self/fd and /proc/self/fdinfo But up to now we have only "virtualized" proc files, not proc directories.
This is also the reason LTP testcase pipe07 fails: pipe07.c:76: TFAIL: exp_num_pipes (1014) != num_pipe_fds (1020) It tries to double check the number of open file descriptors by traversing /proc/self/fd
Created attachment 184210 [details] patch This change prevents client programs from seeing Valgrind's internal file descriptors when scanning /proc/self/fd or /proc/<pid>/fd, addressing a security and compatibility issue where defensive programs that close all non-whitelisted file descriptors would crash when attempting to close Valgrind's reserved descriptors. This patch modifies the getdents and getdents64 syscall wrappers to selectively filter out Valgrind's internal file descriptors only when listing /proc/*/fd directories. Regular directory listings remain unaffected, ensuring the fix is targeted and doesn't interfere with normal filesystem operations. Add none/tests/getdents_filter.vgtest test that tests that the Valgrind's file descriptors are hiddent from the client program and verifies both /proc/self/fd filtering and that regular directory listings remain unfiltered.
(In reply to Alexandra Hajkova from comment #6) > This change prevents client programs from seeing Valgrind's internal file > descriptors when scanning /proc/self/fd or /proc/<pid>/fd, Nice. > addressing a > security > and compatibility issue where defensive programs that close all > non-whitelisted > file descriptors would crash when attempting to close Valgrind's reserved > descriptors. That is overstating a bit imho. Just keep it short and simple with that first sentence. > This patch modifies the getdents and getdents64 syscall wrappers to > selectively > filter out Valgrind's internal file descriptors only when listing > /proc/*/fd > directories. OK, maybe add "for the current process"? > Regular directory listings remain unaffected, ensuring the > fix is > targeted and doesn't interfere with normal filesystem operations. Yes, of course, doesn't really need to be stated imho. > Add none/tests/getdents_filter.vgtest test that tests that the > Valgrind's file descriptors are hiddent from the client program > and verifies both /proc/self/fd filtering and that regular > directory listings remain unfiltered. OK. BTW. These sentences/lines are a bit wide, please keep < 78 chars if possible. > https://bugs.kde.org/show_bug.cgi?id=331311 Please also add this to NEWS. > diff --git a/.gitignore b/.gitignore > index e48a2ab0e..86a028cd6 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -1687,6 +1687,7 @@ > /none/tests/vgprintf > /none/tests/vgprintf_nvalgrind > /none/tests/yield > +/none/tests/getdents_filter OK > +/* Check if the given file descriptor points to a /proc/PID/fd directory. > + Returns True if it's a directory we should filter Valgrind FDs from. */ > +static Bool is_proc_fd_directory(Int fd) > +{ > + const HChar* path; > + if (!VG_(resolve_filename)(fd, &path)) > + return False; OK, that works and doesn't leak because path will point to a static buffer inside resolve_filename. Not thread safe, but OK because we are single threaded here. > + /* Check for /proc/self/fd or /proc/PID/fd patterns */ > + if (VG_(strncmp)(path, "/proc/", 6) != 0) > + return False; > + > + /* Look for "/fd" at the end */ > + Int path_len = VG_(strlen)(path); > + if (path_len < 9) /* Minimum: /proc/1/fd */ > + return False; > + > + if (VG_(strcmp)(path + path_len - 3, "/fd") != 0) > + return False; > + > + /* Check if it's /proc/self/fd */ > + if (VG_(strncmp)(path + 6, "self/fd", 7) == 0) > + return True; > + > + /* Check if it's /proc/<valgrind_pid>/fd */ > + Int valgrind_pid = VG_(getpid)(); > + HChar pid_str[16]; > + VG_(sprintf)(pid_str, "%d", valgrind_pid); > + Int pid_str_len = VG_(strlen)(pid_str); > + > + if (VG_(strncmp)(path + 6, pid_str, pid_str_len) == 0 && > + VG_(strcmp)(path + 6 + pid_str_len, "/fd") == 0) > + return True; That is a lot of checks, and probably correct, but why not do something a bit simpler like: HChar ppfd_name[26] /* large enough 6 + 16 + 4 */ VG_(sprintf)(ppfd, "/proc/%d/fd", VG_(getpid)()); if (VG_(strcmp)(path, "/proc/self/fd") || VG_(strcmp)(path, ppfd)) return True; > +/* Filter out Valgrind's internal file descriptors from getdents results. > + Prevents client programs from seeing or interfering with Valgrind's > + reserved FDs when scanning /proc/self/fd. */ Note that programs should be able to interfer with valgrind's reserved FDs because we (should) check that at every system call that takes or returns file descriptors. Please document orig_size, number of bytes (not number of dirents), and return value, number in bytes of total (remaining) dirents. > +static SizeT filter_valgrind_fds_from_getdents(struct vki_dirent *dirp, SizeT orig_size) > +{ > + struct vki_dirent *dp = dirp; > + struct vki_dirent *write_pos = dirp; > + SizeT bytes_processed = 0; > + SizeT new_size = 0; > + > + while (bytes_processed < orig_size) { > + Bool keep_entry = True; > + const HChar *name = dp->d_name; > + > + if (VG_(strcmp)(name, ".") != 0 && VG_(strcmp)(name, "..") != 0) { > + Bool is_numeric = True; > + > + /* Check if filename is numeric (potential FD number) */ > + for (Int i = 0; name[i] != '\0'; i++) { > + if (name[i] < '0' || name[i] > '9') { > + is_numeric = False; > + break; > + } > + } So, this is an should not happen? Should it have an assert? Or is it OK to silently keep these entries? > + > + /* Filter out Valgrind FDs by checking against reserved FD range */ > + if (is_numeric && name[0] != '\0') { When can name[0] be '\0'? Is that a valid entry? > + Int fd = VG_(strtoll10)(name, NULL); > + /* Hide FDs beyond soft limit or Valgrind's output FDs */ > + if (fd >= VG_(fd_soft_limit) || > + fd == VG_(log_output_sink).fd || > + fd == VG_(xml_output_sink).fd) > + keep_entry = False; > + } > + } > + > + if (keep_entry) { > + if (write_pos != dp) > + VG_(memmove)(write_pos, dp, dp->d_reclen); > + new_size += dp->d_reclen; > + write_pos = (struct vki_dirent *)((HChar *)write_pos + dp->d_reclen); > + } > + > + bytes_processed += dp->d_reclen; > + dp = (struct vki_dirent *)((HChar *)dp + dp->d_reclen); > + } > + > + return new_size; > +} OK, nice.
> > + if (VG_(strcmp)(name, ".") != 0 && VG_(strcmp)(name, "..") != 0) { > > + Bool is_numeric = True; > > + > > + /* Check if filename is numeric (potential FD number) */ > > + for (Int i = 0; name[i] != '\0'; i++) { > > + if (name[i] < '0' || name[i] > '9') { > > + is_numeric = False; > > + break; > > + } > > + } > > So, this is an should not happen? > Should it have an assert? > Or is it OK to silently keep these entries? > > > + The goal of this function is to filter out Valgrind's fd. Non numeric fd value is some weird thing and I'm not sure how to treat such a thing. I think it's kind of ok to silently keep them because they are not Valgrind's for sure and they should not bother us? Or we we could warn about them maybe? Also I don't think we should put an assert there because then we won't filter the Valgrind's fds because of some junk in that directory.
> > + > > + /* Filter out Valgrind FDs by checking against reserved FD range */ > > + if (is_numeric && name[0] != '\0') { > > When can name[0] be '\0'? Is that a valid entry? > Do you think such a check is an overkill? Dunno, I think if it happened it would not be a valid entry, some corrupted data case? I haven't seen such an entry to be honest.
Created attachment 184468 [details] patch
(In reply to Alexandra Hajkova from comment #10) > Created attachment 184468 [details] > patch We discussed this on irc already and I am happy with the patch. Maybe add a comment in the testcase why we are expecting 0, 1, 2 and 3. (Because those are the stdin/out/err and the opendir file descriptor.) But we also discussed an odd corner case. We might filter out all entries and return zero, indicating there are no more entries in the directory. On irc I said this couldn't happen, or would only happen to the entries at the end, and so returning zero would always be correct (end of directory). My assumptions for this not happening were: - All valgrind file descriptors are "high" >= VG_(fd_soft_limit). - Even the log/xml_output_sinks are because they are moved there by coregrind/m_libcprint.c (finalize_sink_fd) (So the fd == VG_(log_output_sink).fd || fd == VG_(xml_output_sink).fd checks are kind of redundant, finalize_sink_fd may fail, but then you probably want the program to see them because they are probably stdout or stderr). - getdents returns the entries in numerical order, so all valgrind file descriptors come at the end. Turns out that last assumption is wrong :{ You can see with: $ valgrind -q /bin/ls /proc/self/fd 0 1 1024 1025 1026 1027 1028 1029 1030 2 3 So there really is a change getdents for a particular size would only return "skipped" entries. In which case we would return zero and the program would assume it saw end of directory. The changes of this seem small, glibc seems to allocate a buffer between 32Kb and 1Mb to do getdent calls. Likely containing all entries at once. But it might still happen when someone does getdents calls by hand. See the example at https://www.man7.org/linux/man-pages/man2/getdents.2.html So we have to figure out what to do when result_size != RES && result_size == 0. (Maybe we just have to do a full new getdents call ourselves?) O, and I just remembered one (actually two) directories to filter in the same way. /proc/self/fdinfo or /proc/<pid>/fdinfo
Created attachment 184860 [details] patch
(In reply to Alexandra Hajkova from comment #12) > Created attachment 184860 [details] > patch Note that NEWS doesn't apply any more (we are fixing too many bugs :) Nice is_proc_fd_directory extension to handle fdinfo. Plus a testcase to check. So we now have a filter_valgrind_fds_from_getdents[64]_with_refill that replaces filter_valgrind_fds_from_getdents[64]. The old functions aren't used anymore, but are still in the patch. Lets remove them. I think this works. But it is a little repetitive. Maybe we can factor out the keep_entry part into its own function which just gets the name. Then all the loops are much simpler. In the retry case, when there is an error, we should probably return that (to the program as a failing syscall). This is kind of tricky to test for.
(In reply to Mark Wielaard from comment #13) > This is kind of tricky to test for. But there is an example in https://www.man7.org/linux/man-pages/man2/getdents.2.html#EXAMPLES You could use that with a really small BUF_SIZE to test some things. I think the filter_valgrind_fds_from_getdents_with_refill functions should return a SysRes instead of a SizeT so you can also return a failing (retry) syscall. And in the retry you don't have to check for remaining_buf > sizeof(struct vki_dirent), if new_size = 0 then none of the buffer has been used to store results (everything filtered out). If the buffer turns out to be too small now (unlikely, but could happen) then the syscall will fail with EINVAL. Just return that to the program/user and they will have to extend the buffer.
(In reply to Mark Wielaard from comment #14) > I think the filter_valgrind_fds_from_getdents_with_refill functions should > return a SysRes instead of a SizeT so you can also return a failing (retry) > syscall. Note that on irc I tried to explain this in a very complicate way. Don't listen to me :) The simplest way is for filter_valgrind_fds_from_getdents_with_refill to just do something like: SysRes retry = VG_(do_syscall3)(...) if (sr_isError(retry)) { SET_STATUS_from_SysRes(retry); return -1; } And then in the POST handler just check for ret < 0.
Created attachment 185004 [details] patch
Comment on attachment 185004 [details] patch I really like this version, it seems to tackle all the corner case. The extra testcase calling "raw" SYS_dentry is nice too. Only thing I would change about that is just printing the d_names instead of keeping track of retry_calls and entries_found (entries_found is probably stable, but retry_calls might differ depending on arch?). Also printing the fd names is what the rest of the tests do.
Created attachment 185212 [details] patch
(In reply to Alexandra Hajkova from comment #18) > Created attachment 185212 [details] > patch Very nice. Double checked the new testcase and the LTP pipe7 testcase pass with your new code. Please check it in.
commit e8e4066c3a0160f03d9dfffaa360b65eb79745d1 Author: Alexandra Hájková <ahajkova@redhat.com> Date: Tue Aug 12 12:17:54 2025 -0400 Filter Valgrind FDs from getdents syscalls This change prevents client programs from seeing Valgrind's internal file descriptors when scanning /proc/self/fd or /proc/<pid>/fd. This patch modifies the getdents and getdents64 syscall wrappers to selectively filter out Valgrind's internal file descriptors only when listing /proc/*/fd directories for the current process. Add none/tests/getdents_filter.vgtest test that tests that the Valgrind's file descriptors are hidden from the client program and verifies both /proc/self/fd filtering and that regular directory listings remain unfiltered. https://bugs.kde.org/show_bug.cgi?id=331311
Followup patch: commit b2c27072b05777fe8dc6d32791de1e4eff044ebb Author: Mark Wielaard <mark@klomp.org> Date: Wed Sep 24 23:16:29 2025 +0200 Deal with linux arches that don't have getdents, only getdents64 Not all linux arches have getdents, some newer arches (arm64 and riscv64) only implement getdents64. So only use the function filter_valgrind_fds_from_getdents_with_refill on linux with __NR_getdents. Also move the getdents_filter testcase under none/tests/linux and only use getdents64. Fixes: e8e4066c3a01 ("Filter Valgrind FDs from getdents syscalls") https://bugs.kde.org/show_bug.cgi?id=331311
Created attachment 185397 [details] patch
Created attachment 185416 [details] patch