Normally a newly recreated file descriptor gets the lowest available number. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). It would be nice to have a mode where valgrind makes sure new file descriptors always get a new, not yet used before number. This might be implemented by making record_fd_open* return a new file descriptor (by dupping the given file descriptor only a higher never used before number and closing the one returned from the kernel) and returning that to the application. We might have to make an exception for file descriptors 0, 1, 2 which programs might explicitly want to reassign.
This new mode (--track-fds=new ?) would help catch issues like the following: #include <fcntl.h> #include <stdio.h> #include <unistd.h> int main () { int oldfd = open ("foobar.txt", O_RDWR|O_CREAT, S_IRUSR | S_IWUSR); printf ("got foobar.txt as: %d\n", oldfd); /*... do something with oldfd ...*/ close (oldfd); /* Lets open another file... */ int newfd = open ("foobad.txt", O_RDWR|O_CREAT, S_IRUSR | S_IWUSR); printf ("got foobad.txt as: %d\n", newfd); /* ... oops we are using the wrong fd (but same number...) */ dprintf (oldfd, "some new text\n"); close (newfd); return 0; }
Created attachment 178266 [details] patch Add --modify-fds=[no|high] option Normally a newly recreated file descriptor gets the lowest available number. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). When enabled, when the program opens a new file descriptor, the highest available file descriptor is returned instead of the lowest one. Add the none/tests/track_new.stderr.exp test to test this new option.
Nitpick: Bool VG_(clo_run_cxx_freeres) = True; UInt VG_(clo_track_fds) = 0; +UInt VG_(clo_modify_fds) = 0; Bool VG_(clo_show_below_main)= False; Bool VG_(clo_keep_debuginfo) = False; The = signs don't line up. + (not yet seem before) file descriptor. */ s/seem/seen/ @@ -5897,12 +5903,13 @@ PRE(sys_openat) POST(sys_openat) { vg_assert(SUCCESS); + POST_newFd_RES; if (!ML_(fd_allowed)(RES, "openat", tid, True)) { VG_(close)(RES); SET_STATUS_Failure( VKI_EMFILE ); } else { if (VG_(clo_track_fds)) - ML_(record_fd_open_with_given_name)(tid, RES, (HChar*)(Addr)ARG2); + ML_(record_fd_open_with_given_name)(tid, RES, (HChar*)(Addr)ARG2); } } Unnecessary (and wrong) whitespace indentation change. +extern UInt VG_(clo_modify_fds); Needs comment what the values are/mean. --- /dev/null +++ b/none/tests/track_new.stderr.exp @@ -0,0 +1,12 @@ +File descriptor 1020 was closed already + Previously closed + at 0x........: close (in /...libc...) + by 0x........: main (track_new.c:11) That (1020) seems an oddly specific fd number. + Originally opened + at 0x........: open (in /...libc...) + by 0x........: main (track_new.c:8) + at 0x........: write (in /...libc...) + by 0x........: __printf_buffer_flush_dprintf (in /...libc...) + by 0x........: __vdprintf_internal (in /...libc...) + by 0x........: dprintf (in /...libc...) + by 0x........: main (track_new.c:17) The __ functions might be different on different glibc versions/arches? --- /dev/null +++ b/none/tests/track_new.stdout.exp @@ -0,0 +1,2 @@ +got foobar.txt as: 1020 +got foobad.txt as: 1019 Again oddly specific fd numbers. Given the new option/argument I would have expected some updates to the ./none/tests/cmdline*vgtests? There should probably some documentation that this is against POSIX which guarantees a new file descriptor is always the lowest possible number. POST_newFd_RES works for those syscalls that return the new fd as result (RES) of the syscall. What about other syscalls/fcntls that return a new fd some other way? grepping for record_fd_open I think I spot some syscalls missing: sys_inotify_init, sys_inotify_init1, sys_mq_open, timerfd_create, pipe[2], ... pipe2 is interesting since it returns 2 fds. Should all these other syscalls also hava fd adjustments?
Created attachment 179068 [details] Emit pp_ExeContext after the file descriptor backtrace
(In reply to Alexandra Hajkova from comment #4) > Created attachment 179068 [details] > Emit pp_ExeContext after the file descriptor backtrace Thanks that looks good. Pushed to git trunk and the VALGRIND_3_24_BRANCH since it might be a small bug, the output is really confusing. commit 838dc01d2c42f7f22785c771bd74bc4e595da444 Author: Alexandra Hájková <ahajkova@redhat.com> Date: Mon Mar 3 06:14:08 2025 -0500 syswrap-generic: Emit pp_ExeContext after the file descriptor backtrace Adjust use_after_close test for the change. VALGRIND_3_24_BRANCH commit ec7335142384ec9da66871036803b96319b590eb Author: Alexandra Hájková <ahajkova@redhat.com> Date: Mon Mar 3 06:14:08 2025 -0500 syswrap-generic: Emit pp_ExeContext after the file descriptor backtrace Adjust use_after_close test for the change. (cherry picked from commit 838dc01d2c42f7f22785c771bd74bc4e595da444)
Created attachment 179341 [details] patch Add --modify-fds=[no|high] option Normally a newly recreated file descriptor gets the lowest number available. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). When enabled, when the program opens a new file descriptor, the highest available file descriptor is returned instead of the lowest one. Add the none/tests/track_new.stderr.exp test to test this new option. use_after_close Adjust none/tests/filter_fdleak to filter the track_new.vgtest, because of the fdleak changes, the output of the use_after_close test also had to be adjusted. ALso adjust the none/tests/cmdline1 and none/tests/cmdline2 output as the new --modify-fds=no|high is displayed. https://bugs.kde.org/show_bug.cgi?id=493433
There was a try build here: https://builder.sourceware.org/buildbot/#/changes/70972 With the following issues: - valgrind-try-fedora-x86_64 --- track_new.stderr.exp 2025-03-13 13:11:28.926443833 +0000 +++ track_new.stderr.out 2025-03-13 13:24:27.514375927 +0000 @@ -1,4 +1,4 @@ -File descriptor 1020 was closed already +File descriptor 20464 was closed already at 0x........: write (in /...libc...) by 0x........: __printf_buffer_flush_dprintf (in /...libc...) by 0x........: __vdprintf_internal (in /...libc...) - valgrind-try-fedora-arm64 track_new PASS - valgrind-try-debian-ppc64 --- track_new.stderr.exp 2025-03-13 13:33:28.997830006 +0000 +++ track_new.stderr.out 2025-03-13 14:05:32.273155198 +0000 @@ -1,12 +1,12 @@ File descriptor 1020 was closed already - at 0x........: write (in /...libc...) - by 0x........: __printf_buffer_flush_dprintf (in /...libc...) - by 0x........: __vdprintf_internal (in /...libc...) - by 0x........: dprintf (in /...libc...) + at 0x........: write (write.c:26) + by 0x........: __printf_buffer_flush_dprintf (iovdprintf.c:49) + by 0x........: __vdprintf_internal (iovdprintf.c:72) + by 0x........: dprintf@@GLIBC_2.4 (dprintf.c:30) by 0x........: main (track_new.c:15) Previously closed - at 0x........: close (in /...libc...) + at 0x........: close (close.c:27) by 0x........: main (track_new.c:10) Originally opened - at 0x........: open (in /...libc...) + at 0x........: open (open64.c:41) by 0x........: main (track_new.c:8) - valgrind-try-fedora-s390x --- track_new.stderr.exp 2025-03-13 13:39:02.530937442 +0000 +++ track_new.stderr.out 2025-03-13 13:50:56.870871483 +0000 @@ -2,7 +2,7 @@ at 0x........: write (in /...libc...) by 0x........: __printf_buffer_flush_dprintf (in /...libc...) by 0x........: __vdprintf_internal (in /...libc...) - by 0x........: dprintf (in /...libc...) + by 0x........: dprintf@@GLIBC_2.4 (in /...libc...) by 0x........: main (track_new.c:15) Previously closed at 0x........: close (in /...libc...)
- valgrind-try-fedora-ppc64le --- track_new.stderr.exp 2025-03-13 14:19:50.376095914 +0000 +++ track_new.stderr.out 2025-03-13 14:50:59.480715273 +0000 @@ -2,7 +2,7 @@ at 0x........: write (in /...libc...) by 0x........: __printf_buffer_flush_dprintf (in /...libc...) by 0x........: __vdprintf_internal (in /...libc...) - by 0x........: dprintf (in /...libc...) + by 0x........: __dprintfieee128 (in /...libc...) by 0x........: main (track_new.c:15) Previously closed at 0x........: close (in /...libc...)
fedora-riscv track_new: valgrind -q --track-fds=yes --modify-fds=high ./track_new PASS
I've regression tested on x86_64 with Fedora 40 and Fedora 41 and the results were absolutely clean: == 839 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
I've regression tested on aarch64 with Fedora 40 and Fedora 41 and there were no regressions but there were some failures: gdbserver_tests/hgtls (stdoutB) memcheck/tests/dw4 (stderr) memcheck/tests/varinfo2 (stderr) memcheck/tests/varinfo4 (stderr) memcheck/tests/varinfo5 (stderr) memcheck/tests/varinfo6 (stderr) memcheck/tests/varinforestrict (stderr) helgrind/tests/hg05_race2 (stderr) helgrind/tests/tc20_verifywrap (stderr)
Looking through the results of some of the try builds it looks there are three kind of issues with matching the backtraces. (The testsuite backtrace matching is sadly somewhat flaky in general). First there are internal symbols in the c library because there are multiple calls between the dprintf and the system call. Second there are sometimes versioned symbols. Third the debian (32bit) arches seem to have a totally different call chain (and system call) than the other fedora 64bit ones. I would try to use stderr_filter: filter_fdleak in none/tests/track_new.vgtest And completely filter out any frames with internal symbols starting with _. # Remove "internal" _functions sed '/by 0x........: _/d' | Then remove all symbol versions (those with @ and @@) # Remove @version symbols sed -E 's/ ([a-zA-Z0-9_]+)@@?[A-Z0-9._]+/ \1/' | Finally try replacing the new possible debuginfo (source:line) with in (/...libc...) perl -p -e "s/\(dprintf.c:[0-9]*\)/(in \/...libc...)/" | perl -p -e "s/\(open.c:[0-9]*\)/(in \/...libc...)/" | perl -p -e "s/\(lseek(?:64)?.c:[0-9]*\)/(in \/...libc...)/" | Then have simpler .exp files (one for "regular" and one for "debian 32bit": == none/tests/track_new.stderr.exp == File descriptor was closed already at 0x........: write (in /...libc...) by 0x........: dprintf (in /...libc...) by 0x........: main Previously closed at 0x........: close (in /...libc...) by 0x........: main Originally opened at 0x........: creat (in /...libc...) by 0x........: main == none/tests/track_new.stderr.exp.debian32 == File descriptor was closed already at 0x........: llseek (in /...libc...) by 0x........: dprintf (in /...libc...) by 0x........: main Previously closed at 0x........: close (in /...libc...) by 0x........: main Originally opened at 0x........: creat (in /...libc...) by 0x........: main
Created attachment 180312 [details] Add --modify-fds=[no|high] option Testing patch taken from https://git.sr.ht/~sasshka/valgrind fds2 branch and adjusted slightly to make the tests pass on debian 32bit arches.
One last filter tweak for the fedora powerpc 64 le results: diff --git a/none/tests/filter_fdleak b/none/tests/filter_fdleak index 5fbd74bcf..2cc537df7 100755 --- a/none/tests/filter_fdleak +++ b/none/tests/filter_fdleak @@ -34,7 +34,7 @@ perl -p -e "s/: open \(/: creat (/" | # arm64 write resolved to file:line with debuginfo perl -p -e "s/write\.c:[1-9][0-9]*/in \/...libc.../" | #ppc64le -sed 's/__dprintf.*/dprintf/' | +sed 's/__dprintf.*/dprintf \(in \/...libc...\)/' | # Remove "internal" _functions sed '/by 0x........: _/d' |
commit 4c78562419ce2c9b6a21a3c9dfc9bf90638c9649 Author: Alexandra Hájková <ahajkova@redhat.com> Date: Wed Nov 20 12:00:47 2024 -0500 Add --modify-fds=[no|high] option Normally a newly recreated file descriptor gets the lowest number available. This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again). When enabled, when the program opens a new file descriptor, the highest available file descriptor is returned instead of the lowest one. Add the none/tests/track_new.stderr.exp test to test this new option. Adjust none/tests/filter_fdleak to filter the track_new.vgtest, removing some internal glibc functions from the backtraces and remove symbol versioning. The output of the use_after_close test also had to be adjusted. Also adjust the none/tests/cmdline1 and none/tests/cmdline2 output as the new --modify-fds=no|high is displayed. https://bugs.kde.org/show_bug.cgi?id=493433