Summary: m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.

Product: [Developer tools] valgrind
Component: general
Reporter: Corentin Noël <tintou>
Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED
Severity: normal
Priority: NOR
CC: agilesoftware, mark, pjfloyd
Version: 3.20.0
Target Milestone: ---
Platform: Debian stable
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments:
  Patch to replace assert with more helpful message and exit.
  156134: Patch to replace assert with more helpful message and exit.
  better message if safe_fd fails
Description
Corentin Noël
2023-02-07 17:58:43 UTC
A similar question was recently asked on SO: https://stackoverflow.com/questions/75292406/memchk-valgrind-reporting-inconsistent-results-in-different-docker-hosts

Could you check that you have a sensible value for the limit of file descriptors? For instance, on RHEL 7.6 I have a limit of 65535.

For now I'm assuming it is the same problem. We're not going to change Valgrind to handle insane rlimits that Docker presents. I suggest that you either log a bug with Docker or else try using (r)limit to set the descriptor limit to something more sensible.

Marking as NEEDSINFO.

Thank you for your answer, that was really useful: it was indeed an issue with both Docker and Crosvm. Docker set a limit of 1048576, so I've added "--ulimit nofile=1024:4096" to the Docker arguments to get a decent default configuration. I've also had to modify Crosvm (more precisely minijail, which it uses) by preventing it from changing the nofile limit in https://github.com/google/minijail/blob/c30d299f93f94a4ee4b786d15e1fb194f8709b9f/libminijail.c#L2183

I hope this will be useful to anyone having the same issue.

Well, one thing that I could do is change the assert into a message + exit. Something like:

  Valgrind does not support very high file descriptor limits
  (e.g., as seen when running in Docker). Please consider
  lowering this limit with the shell built-in limit command.

with a bit more information on the fd requested and the hard limit that it is using.

Also, if I upload a patch here will you be able to test it (clone the Valgrind source repo and build Valgrind)?

(In reply to Paul Floyd from comment #5)
> Also, if I upload a patch here will you be able to test it (clone the
> Valgrind source repo and build Valgrind)?

Sure, I can do that.

Created attachment 156134 [details]
Patch to replace assert with more helpful message and exit.
Please could you try this and let me know if it is more helpful?
Created attachment 156143 [details]
156134: Patch to replace assert with more helpful message and exit.
Condition was wrong and I'd left in a True to check the message.
Comment on attachment 156143 [details]
156134: Patch to replace assert with more helpful message and exit.
Just tested the patch and I'm indeed getting a:
--88:0:libcfile Valgrind: FATAL: Duplicated file descriptor -1
is not in the range of reserved values starting at 1048564.
Valgrind does not support very high file descriptor limits
(e.g., as seen when running in Docker).
Please consider lowering this limit with the shell built-in
limit command.
--88:0:libcfile Exiting now.
In this case it seems that I'm getting an error on file descriptor duplication, so having an errno/errmsg would also help in understanding it a bit.
(In reply to Corentin Noël from comment #9)
> In this case it seems that I'm getting an error on file descriptor
> duplication so having an errno/errmsg would also help understanding it a bit.

We can't provide anything from errno as we don't have it. Valgrind does not link with libc and makes system calls like

  SysRes res = VG_(do_syscall3)(__NR_fcntl, fd, cmd, arg);

directly itself.

I really think that the problem is with Docker. It's advertising some ridiculously high value for ulimit -n like 1048576. Valgrind wants to put its own files in the top 12 of those slots, and is trying to do a fcntl(oldfd, F_DUPFD, 1048576-12) - note that 1048576-12 matches the 1048564 that you get from the patch message. Then Docker fails to honour its promised file descriptor limit and the fcntl fails.

man fcntl says there are 3 possible causes: a negative oldfd (no), the maximum number of file descriptors already being open (no), and the requested fd being greater than the rlimit (I think that this is the case).

Here is an example.

  #include <unistd.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/time.h>
  #include <sys/resource.h>
  #include <errno.h>

  int main(void)
  {
      struct rlimit rlim;
      getrlimit(RLIMIT_NOFILE, &rlim);
      printf("nofile soft limit %ld hard limit %ld\n",
             (long)rlim.rlim_cur, (long)rlim.rlim_max);
      for (long i = 4; i <= rlim.rlim_cur; ++i) {
          int newFd = fcntl(1, F_DUPFD, i);
          if (newFd != -1) {
              close(newFd);
          } else {
              fprintf(stderr, "default soft limit failed to DUPFD %ld\n", i);
              perror("");
              break;
          }
      }
      rlim.rlim_cur = rlim.rlim_max;
      setrlimit(RLIMIT_NOFILE, &rlim);
      for (long i = 4; i <= rlim.rlim_cur; ++i) {
          int newFd = fcntl(1, F_DUPFD, i);
          if (newFd != -1) {
              close(newFd);
          } else {
              fprintf(stderr, "maximum soft limit failed to DUPFD %ld\n", i);
              perror("");
              break;
          }
      }
  }

Compiling and running this on RHEL 7.6 I get

  nofile soft limit 65535 hard limit 65535
  default soft limit failed to DUPFD 65535
  Invalid argument
  maximum soft limit failed to DUPFD 65535
  Invalid argument

Could you try it and report back whether you also see the first failing dup being equal to the limits?

(In reply to Paul Floyd from comment #10)
> (In reply to Corentin Noël from comment #9)
> > In this case it seems that I'm getting an error on file descriptor
> > duplication so having an errno/errmsg would also help understanding it a bit.
>
> We can't provide anything from errno as we don't have it. Valgrind does not
> link with libc and makes system calls like
>
>   SysRes res = VG_(do_syscall3)(__NR_fcntl, fd, cmd, arg);
>
> directly itself.

We cannot use errno (which is indeed a glibc concept), but we can extract the kernel error from the SysRes with something like:

  if (sr_isError (res))
      VG_(umsg) ("fcntl error %lu %s\n", sr_Err(res), VG_(strerror) (sr_Err(res)));

Created attachment 156323 [details]
better message if safe_fd fails
Change message as it looks like failure rather than fd not in safe range.
Also add a debug log for fcntl.
*** Bug 196335 has been marked as a duplicate of this bug. ***

commit e61a04babcc3de8c1c86638f8ccdb4ef1b74a4d0 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Fri Feb 24 21:31:35 2023 +0100

    bug465435 - m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.