Bug 493433 - Add a new fds only mode to --track-fds (--modify-fds)
Summary: Add a new fds only mode to --track-fds (--modify-fds)
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (other bugs)
Version First Reported In: 3.22 GIT
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Alexandra Hajkova
URL:
Keywords:
Depends on:
Blocks: 502359
  Show dependency treegraph
 
Reported: 2024-09-21 12:17 UTC by Mark Wielaard
Modified: 2025-04-16 14:58 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
patch (12.92 KB, patch)
2025-02-13 11:35 UTC, Alexandra Hajkova
Details
Emit pp_ExeContext after the file descriptor backtrace (2.01 KB, patch)
2025-03-03 11:47 UTC, Alexandra Hajkova
Details
patch (16.41 KB, patch)
2025-03-12 13:38 UTC, Alexandra Hajkova
Details
Add --modify-fds=[no|high] option (17.39 KB, text/plain)
2025-04-16 12:29 UTC, Mark Wielaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2024-09-21 12:17:21 UTC
Normally a newly recreated file descriptor gets the lowest available number.  This might cause old file descriptor numbers to be reused and hides bad file descriptor accesses (because the old number is new again).

It would be nice to have a mode where valgrind makes sure new file descriptors always get a new, not yet used before number. This might be implemented by making record_fd_open* return a new file descriptor (by dupping the given file descriptor only a higher never used before number and closing the one returned from the kernel) and returning that to the application.

We might have to make an exception for file descriptors 0, 1, 2 which programs might explicitly want to reassign.
Comment 1 Mark Wielaard 2024-11-06 15:58:27 UTC
This new mode (--track-fds=new ?) would help catch issues like the following:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main ()
{
  int oldfd = open ("foobar.txt", O_RDWR|O_CREAT, S_IRUSR | S_IWUSR);
  printf ("got foobar.txt as: %d\n", oldfd);
  /*... do something with oldfd ...*/
  close (oldfd);

  /* Lets open another file... */
  int newfd = open ("foobad.txt", O_RDWR|O_CREAT, S_IRUSR | S_IWUSR);
  printf ("got foobad.txt as: %d\n", newfd);
  /* ... oops we are using the wrong fd (but same number...) */
  dprintf (oldfd, "some new text\n");

  close (newfd);
  return 0;
}
Comment 2 Alexandra Hajkova 2025-02-13 11:35:58 UTC
Created attachment 178266 [details]
patch

 Add --modify-fds=[no|high] option
    
    Normally a newly recreated file descriptor gets the lowest available
    number.  This might cause old file descriptor numbers to be reused
    and hides bad file descriptor accesses (because the old number is
    new again).
    
    When enabled, when the program opens a new file descriptor,
    the highest available file descriptor is returned instead of the
    lowest one.
    
    Add the none/tests/track_new.stderr.exp test to test this
    new option.
Comment 3 Mark Wielaard 2025-02-13 12:41:20 UTC
Nitpick:

 Bool   VG_(clo_run_cxx_freeres) = True;
 UInt   VG_(clo_track_fds)      = 0;
+UInt   VG_(clo_modify_fds)      = 0;
 Bool   VG_(clo_show_below_main)= False;
 Bool   VG_(clo_keep_debuginfo) = False;

The = signs don't line up.

+   (not yet seem before) file descriptor.  */

s/seem/seen/

@@ -5897,12 +5903,13 @@ PRE(sys_openat)
 POST(sys_openat)
 {
    vg_assert(SUCCESS);
+   POST_newFd_RES;
    if (!ML_(fd_allowed)(RES, "openat", tid, True)) {
       VG_(close)(RES);
       SET_STATUS_Failure( VKI_EMFILE );
    } else {
       if (VG_(clo_track_fds))
-         ML_(record_fd_open_with_given_name)(tid, RES, (HChar*)(Addr)ARG2);
+        ML_(record_fd_open_with_given_name)(tid, RES, (HChar*)(Addr)ARG2);
    }
 }

Unnecessary (and wrong) whitespace indentation change.


+extern UInt  VG_(clo_modify_fds);

Needs comment what the values are/mean.

--- /dev/null
+++ b/none/tests/track_new.stderr.exp
@@ -0,0 +1,12 @@
+File descriptor 1020 was closed already
+ Previously closed
+   at 0x........: close (in /...libc...)
+   by 0x........: main (track_new.c:11)

That (1020) seems an oddly specific fd number.

+ Originally opened
+   at 0x........: open (in /...libc...)
+   by 0x........: main (track_new.c:8)
+   at 0x........: write (in /...libc...)
+   by 0x........: __printf_buffer_flush_dprintf (in /...libc...)
+   by 0x........: __vdprintf_internal (in /...libc...)
+   by 0x........: dprintf (in /...libc...)
+   by 0x........: main (track_new.c:17)

The __ functions might be different on different glibc versions/arches?

--- /dev/null
+++ b/none/tests/track_new.stdout.exp
@@ -0,0 +1,2 @@
+got foobar.txt as: 1020
+got foobad.txt as: 1019

Again oddly specific fd numbers.

Given the new option/argument I would have expected some updates to the ./none/tests/cmdline*vgtests?

There should probably some documentation that this is against POSIX which guarantees a new file descriptor is always the lowest possible number.

POST_newFd_RES works for those syscalls that return the new fd as result (RES) of the syscall.
What about other syscalls/fcntls that return a new fd some other way?

grepping for record_fd_open I think I spot some syscalls missing:
sys_inotify_init, sys_inotify_init1, sys_mq_open, timerfd_create, pipe[2], ...
pipe2 is interesting since it returns 2 fds.

Should all these other syscalls also hava fd adjustments?
Comment 4 Alexandra Hajkova 2025-03-03 11:47:19 UTC
Created attachment 179068 [details]
Emit pp_ExeContext after the file descriptor backtrace
Comment 5 Mark Wielaard 2025-03-05 15:18:08 UTC
(In reply to Alexandra Hajkova from comment #4)
> Created attachment 179068 [details]
> Emit pp_ExeContext after the file descriptor backtrace

Thanks that looks good. Pushed to git trunk and the VALGRIND_3_24_BRANCH since it might be a small bug, the output is really confusing.

commit 838dc01d2c42f7f22785c771bd74bc4e595da444
Author: Alexandra Hájková <ahajkova@redhat.com>
Date:   Mon Mar 3 06:14:08 2025 -0500

    syswrap-generic: Emit pp_ExeContext after the file descriptor backtrace
    
    Adjust use_after_close test for the change.

VALGRIND_3_24_BRANCH
commit ec7335142384ec9da66871036803b96319b590eb
Author: Alexandra Hájková <ahajkova@redhat.com>
Date:   Mon Mar 3 06:14:08 2025 -0500

    syswrap-generic: Emit pp_ExeContext after the file descriptor backtrace
    
    Adjust use_after_close test for the change.
    
    (cherry picked from commit 838dc01d2c42f7f22785c771bd74bc4e595da444)
Comment 6 Alexandra Hajkova 2025-03-12 13:38:09 UTC
Created attachment 179341 [details]
patch

Add --modify-fds=[no|high] option
    
    Normally a newly recreated file descriptor gets the lowest number
    available. This might cause old file descriptor numbers to be reused
    and hides bad file descriptor accesses (because the old number is
    new again).
    
    When enabled, when the program opens a new file descriptor,
    the highest available file descriptor is returned instead of the
    lowest one.
    
    Add the none/tests/track_new.stderr.exp test to test this
    new option.
    use_after_close
    Adjust none/tests/filter_fdleak to filter the track_new.vgtest,
    because of the fdleak changes, the output of the use_after_close
    test also had to be adjusted. ALso adjust the none/tests/cmdline1
    and none/tests/cmdline2 output as the new --modify-fds=no|high is
    displayed.
    
    https://bugs.kde.org/show_bug.cgi?id=493433
Comment 7 Mark Wielaard 2025-03-27 14:02:57 UTC
There was a try build here:
https://builder.sourceware.org/buildbot/#/changes/70972

With the following issues:

- valgrind-try-fedora-x86_64

--- track_new.stderr.exp	2025-03-13 13:11:28.926443833 +0000
+++ track_new.stderr.out	2025-03-13 13:24:27.514375927 +0000
@@ -1,4 +1,4 @@
-File descriptor 1020 was closed already
+File descriptor 20464 was closed already
    at 0x........: write (in /...libc...)
    by 0x........: __printf_buffer_flush_dprintf (in /...libc...)
    by 0x........: __vdprintf_internal (in /...libc...)

- valgrind-try-fedora-arm64

track_new PASS

- valgrind-try-debian-ppc64

--- track_new.stderr.exp	2025-03-13 13:33:28.997830006 +0000
+++ track_new.stderr.out	2025-03-13 14:05:32.273155198 +0000
@@ -1,12 +1,12 @@
 File descriptor 1020 was closed already
-   at 0x........: write (in /...libc...)
-   by 0x........: __printf_buffer_flush_dprintf (in /...libc...)
-   by 0x........: __vdprintf_internal (in /...libc...)
-   by 0x........: dprintf (in /...libc...)
+   at 0x........: write (write.c:26)
+   by 0x........: __printf_buffer_flush_dprintf (iovdprintf.c:49)
+   by 0x........: __vdprintf_internal (iovdprintf.c:72)
+   by 0x........: dprintf@@GLIBC_2.4 (dprintf.c:30)
    by 0x........: main (track_new.c:15)
  Previously closed
-   at 0x........: close (in /...libc...)
+   at 0x........: close (close.c:27)
    by 0x........: main (track_new.c:10)
  Originally opened
-   at 0x........: open (in /...libc...)
+   at 0x........: open (open64.c:41)
    by 0x........: main (track_new.c:8)

- valgrind-try-fedora-s390x

--- track_new.stderr.exp	2025-03-13 13:39:02.530937442 +0000
+++ track_new.stderr.out	2025-03-13 13:50:56.870871483 +0000
@@ -2,7 +2,7 @@
    at 0x........: write (in /...libc...)
    by 0x........: __printf_buffer_flush_dprintf (in /...libc...)
    by 0x........: __vdprintf_internal (in /...libc...)
-   by 0x........: dprintf (in /...libc...)
+   by 0x........: dprintf@@GLIBC_2.4 (in /...libc...)
    by 0x........: main (track_new.c:15)
  Previously closed
    at 0x........: close (in /...libc...)
Comment 8 Mark Wielaard 2025-03-27 14:18:23 UTC
- valgrind-try-fedora-ppc64le

--- track_new.stderr.exp	2025-03-13 14:19:50.376095914 +0000
+++ track_new.stderr.out	2025-03-13 14:50:59.480715273 +0000
@@ -2,7 +2,7 @@
    at 0x........: write (in /...libc...)
    by 0x........: __printf_buffer_flush_dprintf (in /...libc...)
    by 0x........: __vdprintf_internal (in /...libc...)
-   by 0x........: dprintf (in /...libc...)
+   by 0x........: __dprintfieee128 (in /...libc...)
    by 0x........: main (track_new.c:15)
  Previously closed
    at 0x........: close (in /...libc...)
Comment 9 Mark Wielaard 2025-03-27 14:23:34 UTC
fedora-riscv
track_new:       valgrind   -q --track-fds=yes --modify-fds=high ./track_new
PASS
Comment 10 Alexandra Hajkova 2025-03-27 14:42:06 UTC
I've regression tested on x86_64 with Fedora 40 and Fedora 41 and the results were absolutely clean:
== 839 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
Comment 11 Alexandra Hajkova 2025-03-27 14:44:14 UTC
I've regression tested on aarch64 with Fedora 40 and Fedora 41 and there were no regressions but there were some failures:
gdbserver_tests/hgtls                    (stdoutB)		
memcheck/tests/dw4                       (stderr)		
memcheck/tests/varinfo2                  (stderr)		
memcheck/tests/varinfo4                  (stderr)		
memcheck/tests/varinfo5                  (stderr)		
memcheck/tests/varinfo6                  (stderr)		
memcheck/tests/varinforestrict           (stderr)		
helgrind/tests/hg05_race2                (stderr)		
helgrind/tests/tc20_verifywrap           (stderr)
Comment 12 Mark Wielaard 2025-04-13 00:29:57 UTC
Looking through the results of some of the try builds it looks there are three kind of issues with matching the backtraces.
(The testsuite backtrace matching is sadly somewhat flaky in general).

First there are internal symbols in the c library because there are multiple calls between the dprintf and the system call.
Second there are sometimes versioned symbols.
Third the debian (32bit) arches seem to have a totally different call chain (and system call) than the other fedora 64bit ones.

I would try to use stderr_filter: filter_fdleak in none/tests/track_new.vgtest
And completely filter out any frames with internal symbols starting with _.

# Remove "internal" _functions
sed '/by 0x........: _/d' |

Then remove all symbol versions (those with @ and @@)

# Remove @version symbols
sed -E 's/ ([a-zA-Z0-9_]+)@@?[A-Z0-9._]+/ \1/' |

Finally try replacing the new possible debuginfo (source:line) with in (/...libc...)

perl -p -e "s/\(dprintf.c:[0-9]*\)/(in \/...libc...)/" |
perl -p -e "s/\(open.c:[0-9]*\)/(in \/...libc...)/" |
perl -p -e "s/\(lseek(?:64)?.c:[0-9]*\)/(in \/...libc...)/" |

Then have simpler .exp files (one for "regular" and one for "debian 32bit":

== none/tests/track_new.stderr.exp ==
File descriptor was closed already
   at 0x........: write (in /...libc...)
   by 0x........: dprintf (in /...libc...)
   by 0x........: main
 Previously closed
   at 0x........: close (in /...libc...)
   by 0x........: main
 Originally opened
   at 0x........: creat (in /...libc...)
   by 0x........: main

== none/tests/track_new.stderr.exp.debian32 ==
File descriptor was closed already
   at 0x........: llseek (in /...libc...)
   by 0x........: dprintf (in /...libc...)
   by 0x........: main
 Previously closed
   at 0x........: close (in /...libc...)
   by 0x........: main
 Originally opened
   at 0x........: creat (in /...libc...)
   by 0x........: main
Comment 13 Mark Wielaard 2025-04-16 12:29:59 UTC
Created attachment 180312 [details]
Add --modify-fds=[no|high] option

Testing patch taken from https://git.sr.ht/~sasshka/valgrind fds2 branch and adjusted slightly to make the tests pass on debian 32bit arches.
Comment 14 Mark Wielaard 2025-04-16 14:58:10 UTC
One last filter tweak for the fedora powerpc 64 le results:

diff --git a/none/tests/filter_fdleak b/none/tests/filter_fdleak
index 5fbd74bcf..2cc537df7 100755
--- a/none/tests/filter_fdleak
+++ b/none/tests/filter_fdleak
@@ -34,7 +34,7 @@ perl -p -e "s/: open \(/: creat (/" |
 # arm64 write resolved to file:line with debuginfo
 perl -p -e "s/write\.c:[1-9][0-9]*/in \/...libc.../" |
 #ppc64le
-sed 's/__dprintf.*/dprintf/' |
+sed 's/__dprintf.*/dprintf \(in \/...libc...\)/' |
 
 # Remove "internal" _functions
 sed '/by 0x........: _/d' |
Comment 15 Mark Wielaard 2025-04-16 14:58:54 UTC
commit 4c78562419ce2c9b6a21a3c9dfc9bf90638c9649
Author: Alexandra Hájková <ahajkova@redhat.com>
Date:   Wed Nov 20 12:00:47 2024 -0500

    Add --modify-fds=[no|high] option
    
    Normally a newly recreated file descriptor gets the lowest number
    available. This might cause old file descriptor numbers to be reused
    and hides bad file descriptor accesses (because the old number is
    new again).
    
    When enabled, when the program opens a new file descriptor,
    the highest available file descriptor is returned instead of the
    lowest one.
    
    Add the none/tests/track_new.stderr.exp test to test this new option.
    
    Adjust none/tests/filter_fdleak to filter the track_new.vgtest,
    removing some internal glibc functions from the backtraces and remove
    symbol versioning. The output of the use_after_close test also had to
    be adjusted. Also adjust the none/tests/cmdline1 and
    none/tests/cmdline2 output as the new --modify-fds=no|high is
    displayed.
    
    https://bugs.kde.org/show_bug.cgi?id=493433