Bug 360752 - raise the number of reserved fds in m_main.c from 10 to 12
Summary: raise the number of reserved fds in m_main.c from 10 to 12
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.12 SVN
Platform: Compiled Sources Solaris
: NOR normal
Target Milestone: ---
Assignee: Ivo Raisr
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-19 21:42 UTC by Ivo Raisr
Modified: 2016-03-24 06:25 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
proposed patch (1.03 KB, patch)
2016-03-19 21:51 UTC, Ivo Raisr
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ivo Raisr 2016-03-19 21:42:56 UTC
Recently test gdbserver_tests/nlpasssigalrm started to fail intermittently on Solaris 11.3 SRU
with the following stack trace:

valgrind: m_libcfile.c:68 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.

host stacktrace:
==21159==    at 0x38108BC8: show_sched_status_wrk (m_libcassert.c:343)
==21159==    by 0x38108CE4: report_and_quit (m_libcassert.c:415)
==21159==    by 0x38108E71: vgPlain_assert_fail (m_libcassert.c:481)
==21159==    by 0x381099E8: vgPlain_safe_fd (m_libcfile.c:68)
==21159==    by 0x380F3F71: vgSysWrap_solaris_sys_forksys_before (syswrap-solaris.c:6405)
==21159==    by 0x380CFDA7: vgPlain_client_syscall (syswrap-main.c:1906)
==21159==    by 0x380CCB12: handle_syscall (scheduler.c:1118)
==21159==    by 0x380CDFB6: vgPlain_scheduler (scheduler.c:1435)
==21159==    by 0x380DD573: run_a_thread_NORETURN (syswrap-solaris.c:135)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 1)
==21159==    at 0x7FFF2303B: vfork (in /lib/amd64/libc.so.1)
==21159==    by 0x7FFEED6E9: system (in /lib/amd64/libc.so.1)
==21159==    by 0x4013FF: main (passsigalrm.c:52)

Investigation showed the problem is failing VG_(fcntl)(VKI_F_DUPFD) which returned
EMFILE (too many open files). And indeed, the file descriptor soft limit was one less.
Simulation of Solaris vfork() puts quite a strain on Valgrind core because a pipe needs to be
created and its file descriptors duplicated and closed via VG_(safe_fd)().

The fix is straigtforward - simple increase N_RESERVED_FDS from 10 to 12. This is sufficient
for all tests.
Comment 1 Ivo Raisr 2016-03-19 21:51:04 UTC
Created attachment 97984 [details]
proposed patch
Comment 2 Ivo Raisr 2016-03-21 20:43:01 UTC
Here is a summary of file descriptors owned specifically by Valgrind
during gdbserver_tests/nlpasssigalrm test execution, when fork_sys syscall wrapper is executing.

256: S_IFREG: .../valgrind/gdbserver_tests/passsigalrm
257: S_IFREG: fake /proc/<pid>/cmdline
258: S_IFREG: fake /proc/<pid>/auxv
259: S_IFREG: fake /proc/<pid>/psinfo
260: S_IFCHR: /dev/pts/1
261+262: S_IFIFO: big lock pipe
263: S_IFIFO: .../valgrind/vgdb-prefix-nlpasssigalrm-from-vgdb-to-3782-by-tester1-on-???
264: S_IFIFO: .../valgrind/vgdb-prefix-nlpasssigalrm-to-vgdb-from-3782-by-tester1-on-???
265+266: S_IFIFO: pipe to a child process during vfork()

Full list:
 256: S_IFREG mode:0755 dev:279,65545 ino:92562 uid:101 gid:10 size:15600
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /export/home/tester1/valgrind/gdbserver_tests/passsigalrm
      offset:0
 257: S_IFREG mode:0600 dev:587,2 ino:51603298 uid:101 gid:10 size:30
      O_RDWR|O_CREAT|O_TRUNC|O_EXCL|O_LARGEFILE FD_CLOEXEC
 258: S_IFREG mode:0600 dev:587,2 ino:51603242 uid:101 gid:10 size:160
      O_RDWR|O_CREAT|O_TRUNC|O_EXCL|O_LARGEFILE FD_CLOEXEC
 259: S_IFREG mode:0600 dev:587,2 ino:51559981 uid:101 gid:10 size:0
      O_RDWR|O_CREAT|O_TRUNC|O_EXCL|O_LARGEFILE FD_CLOEXEC
 260: S_IFCHR mode:0620 dev:582,0 ino:41169506 uid:101 gid:7 rdev:182,1
      O_RDWR|O_NOCTTY|O_LARGEFILE FD_CLOEXEC
      /dev/pts/1
 261: S_IFIFO mode:0000 dev:579,0 ino:3266 uid:101 gid:10 size:0
      O_RDWR FD_CLOEXEC
 262: S_IFIFO mode:0000 dev:579,0 ino:3266 uid:101 gid:10 size:0
      O_RDWR FD_CLOEXEC
 263: S_IFIFO mode:0600 dev:279,65545 ino:90549 uid:101 gid:10 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC
      /export/home/tester1/valgrind/vgdb-prefix-nlpasssigalrm-from-vgdb-to-3782-by-tester1-on-???
 264: S_IFIFO mode:0600 dev:279,65545 ino:90548 uid:101 gid:10 size:0
      O_WRONLY|O_LARGEFILE FD_CLOEXEC
      /export/home/tester1/valgrind/vgdb-prefix-nlpasssigalrm-to-vgdb-from-3782-by-tester1-on-???
 265: S_IFIFO mode:0000 dev:579,0 ino:3267 uid:101 gid:10 size:0
      O_RDWR FD_CLOEXEC
 266: S_IFIFO mode:0000 dev:579,0 ino:3267 uid:101 gid:10 size:0
      O_RDWR FD_CLOEXEC

So you can see 11 of them are currently used. Adding one to make them 12 reserved is a precaution for future.
Comment 3 John Reiser 2016-03-23 18:34:53 UTC
Thank you for the explicit list; now I see why 12 fd is appropriate.

It seems to me that /proc/pid/{cmdline,auxv,psinfo} are fd leaks.  The information in those pseudo-files never changes, so the fd could be closed immediately after reading the whole file.
Comment 4 Ivo Raisr 2016-03-23 20:57:20 UTC
I am glad you found the list useful.

However /proc/pid/{cmdline,auxv,psinfo} are *not* fd leaks. 
See this comment from m_main.c:
   //--------------------------------------------------------------
   // create fake /proc/<pid>/cmdline and /proc/<pid>/auxv files
   // and then unlink them, but hold onto the fds, so we can hand
   // them out to the client when it tries to open
   // /proc/<pid>/cmdline or /proc/<pid>/auxv for itself.

and the following one for example from syswrap-linux.c:
   /* Handle the case where the open is of /proc/self/cmdline or
      /proc/<pid>/cmdline, and just give it a copy of the fd for the
      fake file we cooked up at startup (in m_main).  Also, seek the
      cloned fd back to the start. */

While it is true that the information never changes for cmdline and auxv, it is changed
for psinfo (see syswrap-solaris.c). Every {cmdline,auxv,psinfo} open syscall also needs
to get its own file descriptor. Because kernel associates file position with the fd (imagine
multiple threads opening the same file).
We do not want to keep cooked files stored on the disk because it would be hard to clean
them up under various Valgrind exit/abort/... scenarios. Therefore the files are created, cooked and immediately unlinked. The only reference to them are VG_(cl_*_fd) fds.
Comment 5 John Reiser 2016-03-23 21:50:12 UTC
For psinfo, I see that it changes, so I guess that needs a permanent fd.

For cmdline and auxv, it still looks to me as if the separate fd could be avoided nearly all the time, because the vast majority of processes never refer to those files.  Cook the data into a memory block, and create the file (and fd) from the memory block only upon an actual user open().  Of course for a process that does such an open, then the fd is needed.  But it would not be needed across execve(), for instance, so the maximum number of simultaneous safe fd might be reduced by non-overlapping lifetimes.  This might be a future optimization.  For now, just apply the patch.
Comment 6 Ivo Raisr 2016-03-24 06:25:24 UTC
Fixed in SVN r15835.