| Summary: | Login VERY slow due to cleanup_fds | ||
|---|---|---|---|
| Product: | [Frameworks and Libraries] frameworks-kinit | Reporter: | Reimar Döffinger <kde> |
| Component: | general | Assignee: | David Faure <faure> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | kdelibs-bugs-null, nate, ossi |
| Priority: | NOR | ||
| Version First Reported In: | 5.51.0 | ||
| Target Milestone: | --- | ||
| Platform: | Debian unstable | ||
| OS: | Linux | ||
| Latest Commit: | 26620aef0bd6d01b543e7523dd15dddc1bb871df | Version Fixed/Implemented In: | 5.54.0 |
| Sentry Crash Report: | |||
After some thinking I am a bit concerned that this code might be quite buggy anyway. It seems to me it would be possible for a program to open thousands of files and then set the hard ulimit to 10 (as far as I can tell the ulimit is only checked on open), and this code would fail to close all files. While non-portable, the /proc/self/fd based approach seems more reliable on Linux, and even on other OSes there should probably be a check that disallows a maxfd value below traditional limits like 1000, as that might indicate someone trying to trick kdeinit into failing to close all fds (I am not sure there is any way to actually misuse this, though).

I just pushed Ossi's fix for this: https://commits.kde.org/kinit/26620aef0bd6d01b543e7523dd15dddc1bb871df

Thanks, that patch should fix the immediate issue. It still leaves the problem that the code does not document its requirements and assumptions, so I cannot really say whether it is correct at all. In fact, it seems to me that even after this patch the code is not correct. The RLIMIT_NOFILE documentation says: "This specifies a value one greater than the maximum file descriptor number that can be opened by this process." Note "can be OPENED": it does not restrict which fds might already BE open, so this code only works if we are lucky enough that nobody ever reduces the limit after fds were opened. And we would still get the same hangs if someone were to set a very large soft limit.

I tried to research the available options and came up with the following:
1. Where available, and if the OOM fd is not used, use closefrom() - this solves things on OpenBSD etc.
2. Iterate over /dev/fd - this is better than /proc/self/fd as it also seems to be available on e.g. FreeBSD.
3. (Possibly redundant) iterate over /proc/self/fd if none of the previous worked or were an option.
4. Fall back to something that at a minimum closes the first 1000 fds (possibly a modified version of the current code).
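The layered fallback above could be sketched roughly as follows. This is only an illustration, not the patch that landed in kinit; the helper names (`close_fds_from_dir`, `cleanup_fds_sketch`) are hypothetical, and it omits the closefrom() and OOM-pipe handling for brevity:

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close all fds >= lowfd listed in a /dev/fd-style directory.
   Returns 0 on success, -1 if the directory could not be opened. */
static int close_fds_from_dir(const char *path, int lowfd)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    int self = dirfd(dir);  /* don't close the fd we're reading from */
    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(e->d_name, &end, 10);
        if (*end != '\0')   /* skips "." and ".." */
            continue;
        if (fd >= lowfd && fd != self)
            close((int)fd);
    }
    closedir(dir);
    return 0;
}

static void cleanup_fds_sketch(void)
{
    /* 1) /dev/fd: available on Linux and some BSDs */
    if (close_fds_from_dir("/dev/fd", 3) == 0)
        return;
    /* 2) /proc/self/fd as a Linux-specific fallback */
    if (close_fds_from_dir("/proc/self/fd", 3) == 0)
        return;
    /* 3) last resort: close at least the traditional range */
    for (int fd = 3; fd < 1000; ++fd)
        close(fd);
}
```

One caveat with this approach: closing fds while iterating the directory is tolerated on Linux (entries are buffered), but a more defensive implementation would collect the fd numbers first and close them afterwards.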
Maybe we should also always iterate over the first 1000 fds just to be safe (e.g. in case /dev/fd is somehow misconfigured and contains a static set).

This code is only ever relevant if a not entirely well-behaved process (one that does not use FD_CLOEXEC) launches kdeinit directly. I am not even sure when that would happen.

Thank you for reporting this issue in KDE software. As it has been a while since this issue was reported, can we please ask you to see if you can reproduce the issue with a recent software version? If you can reproduce the issue, please change the status to "REPORTED" when replying. Thank you!

I think it is fair to claim that this is fixed. Whether that code should be improved further can be debated in another issue, but I do not think there would be much point to it.
SUMMARY
Logging in takes over 1 minute, with kdeinit5 using 100% CPU. This happens if the HARD limit on the number of file descriptors is set to a very high value (> 1 million on my machine; I am not aware of ever changing that, so it must somehow be the default).

STEPS TO REPRODUCE
1. Ensure ulimit -n -H is set to millions for some user account
2. Log in with that user in SDDM

OBSERVED RESULT
Login takes at least 1 minute or more, depending on the ulimit value.

EXPECTED RESULT
Login should be immediate.

SOFTWARE/OS VERSIONS
KDE Frameworks Version: 5.51.0-1 (Debian unstable)

ADDITIONAL INFORMATION
The problematic code is in kinit.cpp:

```cpp
static void cleanup_fds()
{
    int maxfd = FD_SETSIZE;
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        maxfd = rl.rlim_max;
    }
    for (int fd = 3; fd < maxfd; ++fd) {
#if KDEINIT_OOM_PROTECT
        if (fd != oom_pipe)
#endif
            close(fd);
    }
}
```

I believe a good first step would be to check whether rlim_cur could safely be used instead of rlim_max. If not, it would be a good idea to either:
1. Ensure kdeinit5 does not block the whole login process, or
2. Have SDDM either set this limit to a reasonable value itself, or at least warn the user about it, so that a somewhat "normal" user has a chance to fix it without having to break out GDB to figure out where the hang comes from, or
3. Do this in a more clever way, e.g. check /proc/self/fd (maybe there is even a programmatic interface?) to see which fds are even in use. There probably are programs that have implemented this in a better way.

Note that the Spectre/Meltdown fixes probably made this issue FAR more noticeable.
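The "good first step" suggested above (bounding the loop by the soft limit rather than the hard limit) could look something like the sketch below. This is a hypothetical rewrite for illustration, not the actual fix, and as the later comments point out, the soft limit is not a fully reliable upper bound either, since limits can be lowered after fds were opened:

```c
#include <limits.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical variant of cleanup_fds() that bounds the loop by the
   soft RLIMIT_NOFILE limit instead of the (possibly huge) hard limit.
   The OOM-pipe special case from the original code is omitted here. */
static void cleanup_fds_soft(void)
{
    int maxfd = FD_SETSIZE;
    struct rlimit rl;
    /* Use rlim_cur (soft limit), guarding against RLIM_INFINITY and
       values that would overflow an int. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
        rl.rlim_cur != RLIM_INFINITY &&
        rl.rlim_cur < (rlim_t)INT_MAX) {
        maxfd = (int)rl.rlim_cur;
    }
    for (int fd = 3; fd < maxfd; ++fd)
        close(fd);
}
```

On a typical desktop the soft limit is around 1024 while the hard limit can be over a million, so this alone would remove the minute-long busy loop at login; it just would not catch fds opened before the soft limit was lowered.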