Bug 378268 - ksysguardd md-sensor (raid) eats filehandles, quits when 1023 are used up.
Summary: ksysguardd md-sensor (raid) eats filehandles, quits when 1023 are used up.
Status: RESOLVED FIXED
Alias: None
Product: ksysguard
Classification: Unmaintained
Component: ksysguardd (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: KSysGuard Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-30 09:57 UTC by reisenweber
Modified: 2018-06-22 08:54 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
strace log of last two pipe() and clone() calls before +++ exited with 1 +++ (40.93 KB, text/plain)
2017-03-30 09:57 UTC, reisenweber
Details
strace this time -f to trace child (41.90 KB, text/plain)
2017-03-30 10:10 UTC, reisenweber
Details

Note You need to log in before you can comment on or make changes to this bug.
Description reisenweber 2017-03-30 09:57:13 UTC
Created attachment 104811 [details]
strace log of last two pipe() and clone() calls before +++ exited with 1 +++

pipe(0x7fff1904da90)                    = -1 EMFILE (Too many open files)

[2017-03-30 Thu 11:35:12] <DocScrutinizer05> hmm, obviously ksysguardd cteates a pipe (2 filehandles), then clones a child, closes write side of pipe, tries reading from end of pipe, and eventually does same again, while read side of pipe stays open
[2017-03-30 Thu 11:37:21] <DocScrutinizer05> I guess the ksysguardd parent eventually should close its side of the pipe, no?
[2017-03-30 Thu 11:38:19] <DocScrutinizer05> I can't think of it really intentionally trying to spawn >1023 children with a pipe to each one
[2017-03-30 Thu 11:39:17] <DocScrutinizer05> heck, on some systems the process table isn't large enough for 1024 processes ;-)
[2017-03-30 Thu 11:44:01] <DocScrutinizer05> http://paste.opensuse.org/4123994  there's no `close(1021)` ever, after the `pipe([1021, 1022])`
[2017-03-30 Thu 11:48:19] <DocScrutinizer05> so of course next sequence looks like http://paste.opensuse.org/46086581
Comment 1 reisenweber 2017-03-30 10:10:45 UTC
Created attachment 104812 [details]
strace this time -f to trace child

straced child too. 
htop shows only one thread for ksysguardd, so there are no 1000 children holding pipe handles
Comment 2 Christoph Feck 2017-03-30 10:15:01 UTC
Please set the Version field.
Comment 3 reisenweber 2017-03-30 10:23:36 UTC
root culprit might be
[pid 22653] execve("/sbin/mdadm", ["mdadm", "--detail", "/dev/md0"], [/* 96 vars */]) = 0
...
[pid 22653] write(2, "mdadm: must be super-user to per"..., 49) = -1 EBADF (Bad file descriptor)
Comment 4 reisenweber 2017-03-30 10:26:14 UTC
(In reply to Christoph Feck from comment #2)
> Please set the Version field.

jr@saturn:~> ksysguardd --version
ksysguardd 4

jr@saturn:~> ksysguard --version
Qt: 4.8.5
KDE Development Platform: 4.11.5
System Monitor: 4.11.12
Comment 5 reisenweber 2017-03-30 10:44:04 UTC
I see 3 bugs:
 - parent thread not closing pipe read handle - probably when child thread doesn't responf as expected
- child thread trying to run mdadm without sufficient permissions
- child thread trying to write to incorrect or already closed filehandle
Comment 6 reisenweber 2017-04-01 14:50:41 UTC
confirmed to be related to MD sensor in sheet, without any MD sensors the problem doesn't show
Comment 7 Christoph Feck 2017-04-21 19:54:42 UTC
Thanks for the analysis. My understanding of pipes and their use as the standard input/output channels is limited, but from following the code and your comments, it really looks like there is an FD leak bug.

Do you think you could try creating a patch? We currently have no ksysguard maintainer.
Comment 8 reisenweber 2017-04-21 20:45:56 UTC
(In reply to Christoph Feck from comment #7)
> Do you think you could try creating a patch? We currently have no ksysguard
> maintainer.

Sorry, not experienced as maintainer, not used to the coding style either. Actually I had no code at hand to analyue, just concluded from strace.
I'd *guess* running ksysguardd (or mdadm[wraper]) with the right permissions (suid? yes I know it's deprecated), or setting the right /proc/* and /dev/* file permissions/group that mdadm needs,  might fix the issue. The FD leak is just a sleeping bug fallout of the initial issue of mdadm throwing permissions error AIUI
Comment 9 Fabian Vogt 2018-06-22 08:54:25 UTC
Git commit d0287c1dea39f5d3b8993ddfc38e483a048a4333 by Fabian Vogt.
Committed on 22/06/2018 at 08:54.
Pushed by fvogt into branch 'Plasma/5.12'.

Fix leak of pipe FDs in MD RAID code

Summary:
Use pipe2 with O_CLOEXEC to not leak FDs and close the pipe completely after reading.

Test Plan:
Did not break anything here, but I don't have MD RAID.
According to the reporter this fixed the FD leak issue.

Reviewers: #plasma, mart

Reviewed By: #plasma, mart

Subscribers: plasma-devel

Tags: #plasma

Differential Revision: https://phabricator.kde.org/D13664

M  +4    -1    ksysguardd/Linux/softraid.c

https://commits.kde.org/ksysguard/d0287c1dea39f5d3b8993ddfc38e483a048a4333