Normally a newly recreated file descriptor gets the lowest available number. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). It would be nice to have a mode where valgrind makes sure new file descriptors always get a new, not yet used before number. This might be implemented by making record_fd_open* return a new file descriptor (by dupping the given file descriptor only a higher never used before number and closing the one returned from the kernel) and returning that to the application. We might have to make an exception for file descriptors 0, 1, 2 which programs might explicitly want to reassign.
This new mode (--track-fds=new ?) would help catch issues like the following: #include <fcntl.h> #include <stdio.h> #include <unistd.h> int main () { int oldfd = open ("foobar.txt", O_RDWR|O_CREAT, S_IRUSR | S_IWUSR); printf ("got foobar.txt as: %d\n", oldfd); /*... do something with oldfd ...*/ close (oldfd); /* Lets open another file... */ int newfd = open ("foobad.txt", O_RDWR|O_CREAT, S_IRUSR | S_IWUSR); printf ("got foobad.txt as: %d\n", newfd); /* ... oops we are using the wrong fd (but same number...) */ dprintf (oldfd, "some new text\n"); close (newfd); return 0; }
Created attachment 178266 [details] patch Add --modify-fds=[no|high] option Normally a newly recreated file descriptor gets the lowest available number. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). When enabled, when the program opens a new file descriptor, the highest available file descriptor is returned instead of the lowest one. Add the none/tests/track_new.stderr.exp test to test this new option.
Nitpick: Bool VG_(clo_run_cxx_freeres) = True; UInt VG_(clo_track_fds) = 0; +UInt VG_(clo_modify_fds) = 0; Bool VG_(clo_show_below_main)= False; Bool VG_(clo_keep_debuginfo) = False; The = signs don't line up. + (not yet seem before) file descriptor. */ s/seem/seen/ @@ -5897,12 +5903,13 @@ PRE(sys_openat) POST(sys_openat) { vg_assert(SUCCESS); + POST_newFd_RES; if (!ML_(fd_allowed)(RES, "openat", tid, True)) { VG_(close)(RES); SET_STATUS_Failure( VKI_EMFILE ); } else { if (VG_(clo_track_fds)) - ML_(record_fd_open_with_given_name)(tid, RES, (HChar*)(Addr)ARG2); + ML_(record_fd_open_with_given_name)(tid, RES, (HChar*)(Addr)ARG2); } } Unnecessary (and wrong) whitespace indentation change. +extern UInt VG_(clo_modify_fds); Needs comment what the values are/mean. --- /dev/null +++ b/none/tests/track_new.stderr.exp @@ -0,0 +1,12 @@ +File descriptor 1020 was closed already + Previously closed + at 0x........: close (in /...libc...) + by 0x........: main (track_new.c:11) That (1020) seems an oddly specific fd number. + Originally opened + at 0x........: open (in /...libc...) + by 0x........: main (track_new.c:8) + at 0x........: write (in /...libc...) + by 0x........: __printf_buffer_flush_dprintf (in /...libc...) + by 0x........: __vdprintf_internal (in /...libc...) + by 0x........: dprintf (in /...libc...) + by 0x........: main (track_new.c:17) The __ functions might be different on different glibc versions/arches? --- /dev/null +++ b/none/tests/track_new.stdout.exp @@ -0,0 +1,2 @@ +got foobar.txt as: 1020 +got foobad.txt as: 1019 Again oddly specific fd numbers. Given the new option/argument I would have expected some updates to the ./none/tests/cmdline*vgtests? There should probably some documentation that this is against POSIX which guarantees a new file descriptor is always the lowest possible number. POST_newFd_RES works for those syscalls that return the new fd as result (RES) of the syscall. What about other syscalls/fcntls that return a new fd some other way? grepping for record_fd_open I think I spot some syscalls missing: sys_inotify_init, sys_inotify_init1, sys_mq_open, timerfd_create, pipe[2], ... pipe2 is interesting since it returns 2 fds. Should all these other syscalls also hava fd adjustments?
Created attachment 179068 [details] Emit pp_ExeContext after the file descriptor backtrace
(In reply to Alexandra Hajkova from comment #4) > Created attachment 179068 [details] > Emit pp_ExeContext after the file descriptor backtrace Thanks that looks good. Pushed to git trunk and the VALGRIND_3_24_BRANCH since it might be a small bug, the output is really confusing. commit 838dc01d2c42f7f22785c771bd74bc4e595da444 Author: Alexandra Hájková <ahajkova@redhat.com> Date: Mon Mar 3 06:14:08 2025 -0500 syswrap-generic: Emit pp_ExeContext after the file descriptor backtrace Adjust use_after_close test for the change. VALGRIND_3_24_BRANCH commit ec7335142384ec9da66871036803b96319b590eb Author: Alexandra Hájková <ahajkova@redhat.com> Date: Mon Mar 3 06:14:08 2025 -0500 syswrap-generic: Emit pp_ExeContext after the file descriptor backtrace Adjust use_after_close test for the change. (cherry picked from commit 838dc01d2c42f7f22785c771bd74bc4e595da444)
Created attachment 179341 [details] patch Add --modify-fds=[no|high] option Normally a newly recreated file descriptor gets the lowest number available. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). When enabled, when the program opens a new file descriptor, the highest available file descriptor is returned instead of the lowest one. Add the none/tests/track_new.stderr.exp test to test this new option. use_after_close Adjust none/tests/filter_fdleak to filter the track_new.vgtest, because of the fdleak changes, the output of the use_after_close test also had to be adjusted. ALso adjust the none/tests/cmdline1 and none/tests/cmdline2 output as the new --modify-fds=no|high is displayed. https://bugs.kde.org/show_bug.cgi?id=493433
There was a try build here: https://builder.sourceware.org/buildbot/#/changes/70972 With the following issues: - valgrind-try-fedora-x86_64 --- track_new.stderr.exp 2025-03-13 13:11:28.926443833 +0000 +++ track_new.stderr.out 2025-03-13 13:24:27.514375927 +0000 @@ -1,4 +1,4 @@ -File descriptor 1020 was closed already +File descriptor 20464 was closed already at 0x........: write (in /...libc...) by 0x........: __printf_buffer_flush_dprintf (in /...libc...) by 0x........: __vdprintf_internal (in /...libc...) - valgrind-try-fedora-arm64 track_new PASS - valgrind-try-debian-ppc64 --- track_new.stderr.exp 2025-03-13 13:33:28.997830006 +0000 +++ track_new.stderr.out 2025-03-13 14:05:32.273155198 +0000 @@ -1,12 +1,12 @@ File descriptor 1020 was closed already - at 0x........: write (in /...libc...) - by 0x........: __printf_buffer_flush_dprintf (in /...libc...) - by 0x........: __vdprintf_internal (in /...libc...) - by 0x........: dprintf (in /...libc...) + at 0x........: write (write.c:26) + by 0x........: __printf_buffer_flush_dprintf (iovdprintf.c:49) + by 0x........: __vdprintf_internal (iovdprintf.c:72) + by 0x........: dprintf@@GLIBC_2.4 (dprintf.c:30) by 0x........: main (track_new.c:15) Previously closed - at 0x........: close (in /...libc...) + at 0x........: close (close.c:27) by 0x........: main (track_new.c:10) Originally opened - at 0x........: open (in /...libc...) + at 0x........: open (open64.c:41) by 0x........: main (track_new.c:8) - valgrind-try-fedora-s390x --- track_new.stderr.exp 2025-03-13 13:39:02.530937442 +0000 +++ track_new.stderr.out 2025-03-13 13:50:56.870871483 +0000 @@ -2,7 +2,7 @@ at 0x........: write (in /...libc...) by 0x........: __printf_buffer_flush_dprintf (in /...libc...) by 0x........: __vdprintf_internal (in /...libc...) - by 0x........: dprintf (in /...libc...) + by 0x........: dprintf@@GLIBC_2.4 (in /...libc...) by 0x........: main (track_new.c:15) Previously closed at 0x........: close (in /...libc...)
- valgrind-try-fedora-ppc64le --- track_new.stderr.exp 2025-03-13 14:19:50.376095914 +0000 +++ track_new.stderr.out 2025-03-13 14:50:59.480715273 +0000 @@ -2,7 +2,7 @@ at 0x........: write (in /...libc...) by 0x........: __printf_buffer_flush_dprintf (in /...libc...) by 0x........: __vdprintf_internal (in /...libc...) - by 0x........: dprintf (in /...libc...) + by 0x........: __dprintfieee128 (in /...libc...) by 0x........: main (track_new.c:15) Previously closed at 0x........: close (in /...libc...)
fedora-riscv track_new: valgrind -q --track-fds=yes --modify-fds=high ./track_new PASS
I've regression tested on x86_64 with Fedora 40 and Fedora 41 and the results were absolutely clean: == 839 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
I've regression tested on aarch64 with Fedora 40 and Fedora 41 and there were no regressions but there were some failures: gdbserver_tests/hgtls (stdoutB) memcheck/tests/dw4 (stderr) memcheck/tests/varinfo2 (stderr) memcheck/tests/varinfo4 (stderr) memcheck/tests/varinfo5 (stderr) memcheck/tests/varinfo6 (stderr) memcheck/tests/varinforestrict (stderr) helgrind/tests/hg05_race2 (stderr) helgrind/tests/tc20_verifywrap (stderr)