Bug 311649

Summary: exhausting fds can leak created files
Product: [Developer tools] valgrind Reporter: Brian <bfallik>
Component: memcheckAssignee: Mark Wielaard <mark>
Status: ASSIGNED ---    
Severity: normal CC: ahajkova, mark, tom
Priority: NOR    
Version First Reported In: 3.6.0   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: test case
Add VG_(unlinkat) and use that (or VG_(unlink)) to remove a failing creat or open O_CREAT | O_EXCL call

Description Brian 2012-12-13 18:56:50 UTC
According to [1], valgrind reserves 10 fds for its own use.  If a process executes syscall which returns a fd in this range, valgrind intercepts the call, outputs  
  Warning: invalid file descriptor 1029 in syscall socket()
and returns an error and sets errno appropriately (to EMFILE).  However, if the syscall has a side effect, like open(..., O_CREAT, ...), the created file does not get removed.

This breaks our unit tests which are trying to verify the EMFILE path but are failing since the file is being created. 

This is on Fedora 16.
  $ rpm -q valgrind
  valgrind-3.6.1-6.fc16.x86_64


1 - http://comments.gmane.org/gmane.comp.debugging.valgrind/12380

Reproducible: Always
Comment 1 Julian Seward 2013-03-03 14:22:36 UTC
Hmm, can you supply a standalone testcase that illustrates the problem?
Comment 2 Brian 2013-03-04 17:00:17 UTC
Created attachment 77750 [details]
test case

Below is a sample session demonstrating the difference:
$ ./311649 
Reached limit at 1021
open(27121.tmp, O_CREAT, 0)
$ ls 27121.tmp,
ls: cannot access 27121.tmp,: No such file or directory
$ valgrind ./311649 
==27147== Memcheck, a memory error detector
==27147== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==27147== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==27147== Command: ./311649
==27147== 
==27147== Warning: invalid file descriptor 1031 in syscall socket()
Reached limit at 1021
==27147== Warning: invalid file descriptor 1031 in syscall open()
open(27147.tmp, O_CREAT, 0)
==27147== 
==27147== HEAP SUMMARY:
==27147==     in use at exit: 0 bytes in 0 blocks
==27147==   total heap usage: 11 allocs, 11 frees, 8,188 bytes allocated
==27147== 
==27147== All heap blocks were freed -- no leaks are possible
==27147== 
==27147== For counts of detected and suppressed errors, rerun with: -v
==27147== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
$ ls 27147.tmp
27147.tmp

When run under valgrind the created file exists even though the program believes open() failed.
Comment 3 Tom Hughes 2013-03-04 17:58:37 UTC
I don't think there is any safe way for us to fix that, because we have to actually do the open call in order to find out what file descriptor it will get, but there is then no way for us to know when we are rejecting the descriptor whether the file already existed or whether the open call created it.

Checking for existence before making the call is racy because somebody else could jump in and create it.

Note that for creat, and open with OCREAT|O_EXCL, it would be possible for us to remove the file again when we reject the descriptor.
Comment 4 Julian Seward 2013-09-20 11:08:50 UTC
Shall we close this as WONTFIX, then?
Comment 5 Mark Wielaard 2024-09-20 16:45:46 UTC
Created attachment 173914 [details]
Add VG_(unlinkat) and use that (or VG_(unlink)) to remove a failing creat or open O_CREAT | O_EXCL call

(In reply to Tom Hughes from comment #3)
> Checking for existence before making the call is racy because somebody else
> could jump in and create it.
> 
> Note that for creat, and open with OCREAT|O_EXCL, it would be possible for
> us to remove the file again when we reject the descriptor.

Here is a patch that does that (note the is also openat2 which hasn't been hooked up yet).
Should we do this? It wouldn't actually fix this report because it is an O_CREAT without O_EXCL so we aren't sure we were the one creating the file, so wouldn't even with this patch.
Comment 6 Alexandra Hajkova 2024-10-16 13:28:05 UTC
We cannot fix this bug in the general case of creat () as we we don't know if the file already existed or not.   The patch might work for for open () or openat () with mode O_CREAT | O_EXCL. creat () doesn't take flags and so cannot use O_EXCL.
Comment 7 Mark Wielaard 2024-10-31 23:44:12 UTC
(In reply to Alexandra Hajkova from comment #6)
> We cannot fix this bug in the general case of creat () as we we don't know
> if the file already existed or not.   The patch might work for for open ()
> or openat () with mode O_CREAT | O_EXCL. creat () doesn't take flags and so
> cannot use O_EXCL.

Thanks. So creat () is also unfixable. Sigh.
Lets not "fix" the rest just before 3.24.0 release.
Maybe after? But is it really worth it?