Bug 423756

Summary: kio: cgroup creation race condition
Product: [Frameworks and Libraries] frameworks-kio Reporter: i.Dark_Templar <idarktemplar>
Component: generalAssignee: KIO Bugs <kio-bugs-null>
Status: REPORTED ---    
Severity: normal CC: james, kde, kdelibs-bugs, nate, techxgames
Priority: NOR    
Version: git master   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description i.Dark_Templar 2020-07-01 18:07:31 UTC
SUMMARY
Since cgroup is created in kio after process startup, there is a race condition present if started process forks and starts some children before cgroup creation is processed.

Currently, situation sometimes, depending on various circumstances, may be following:
1. kio starts new process via KProcessRunner class.
2. Started process runs and forks, creating one or more children.
3. kio creates new cgroup and puts started process to this new cgroup. All children created by started process stay in current cgroup.

STEPS TO REPRODUCE
1. Try starting via KProcessRunner class applications which fork soon after launch.
Modern firefox, for example.
If application forks and parent exits, it's even better.

OBSERVED RESULT
Sometimes created cgroup would contain only parent process without any it's children. If parent process quits, it may result even into an empty cgroup. Children of launched process would be left in the cgroup of parent of launched process. Should not be reliably reproducible.

EXPECTED RESULT
Cgroup should contain both started process and all it's children reliably.

ADDITIONAL INFORMATION
I did not try reproducing it.

Currently, cgroup created when process is already running, and if this process creates some children fast, it may lead to described issue. Here's the link to code:

https://invent.kde.org/frameworks/kio/-/blob/8d6b306f585920230acecd19903325f6f0387b8e/src/gui/kprocessrunner.cpp#L226

Function 'registerCGroup()' is called in 'KProcessRunner::slotProcessStarted()' slot, which is invoked after process started, within some undetermined timeframe.

In my opinion, proper way to start process in new cgroup in order to fix this issue looks like following:
1. KProcessRunner forks.
2. Fork sets up cgroup, puts itself into it, waits until it's confirmed that OS finished moving it into new cgroup. Optionally between forking and setting up cgroup, it may exec into some helper which would setup cgroup.
3. Fork (or helper used to setup cgroup, if one is used) execs into the new process. Since it's already running in new cgroup before exec, when new process is started it's already contained in new cgroup, and even if first thing it does is fork() call, all children would be contained in that new cgroup as well, without any race conditions.

AFAIK, systemd-run actually setups cgroup and reliably puts requested process and all it's children into newly created cgroup, but I might be wrong. Not sure which interfaces are available from systemd via dbus or some library for starting process similarly to systemd-run.
Comment 1 David Edmundson 2020-07-02 09:56:24 UTC
There is a theoretical race, yes. 

The landed patch was a somewhat "lite" and invasive version of introducing the concept.

What you describe is how systemd-run does it.

But there's a much simpler solution, we can just use startTransientService(execLine) instead of startTransientScope(somePid) then it all becomes someone else's problem.

See:

https://invent.kde.org/frameworks/kio/-/merge_requests/71
Comment 2 i.Dark_Templar 2020-07-04 09:12:11 UTC
(In reply to David Edmundson from comment #1)
> There is a theoretical race, yes. 
> 
> The landed patch was a somewhat "lite" and invasive version of introducing
> the concept.
> 
> What you describe is how systemd-run does it.
> 
> But there's a much simpler solution, we can just use
> startTransientService(execLine) instead of startTransientScope(somePid) then
> it all becomes someone else's problem.
> 
> See:
> 
> https://invent.kde.org/frameworks/kio/-/merge_requests/71

I've looked a bit at this merge request and it looks like it should fix this issue. And it should also allow to implement later something like CgroupV2ProcessRunner for systems with no systemd.
Comment 3 James Henstridge 2023-03-31 04:56:26 UTC
We've been seeing reports of problems launching snap applications on the Ubuntu 23.04 development release that seem to relate to this race condition. At first we thought it was only a gnome-shell problem, but some KDE users also encountered the problem and I discovered the same racy behaviour in kio. The Ubuntu bug is being tracked here:

https://bugs.launchpad.net/bugs/2011806

The race causes problems for snap applications because it uses cgroups as part of its sandboxing to control device access. The "snap run" launcher moves itself to a new cgroup via systemd's StartTransientUnit call, with some later code setting up the sandbox performing a sanity check to make sure the process is in the expected cgroup. If kio issues its own StartTransientUnit call after "snap run"'s one, the sanity check can fail with an error like:

/user.slice/user-1000.slice/user@1000.service/app.slice/app-slack_slack-4ff6abb389164e52b614e40762b46557.scope is not a snap cgroup

To remove the race, the child process would need to be moved to the new cgroup before the application process is started via exec(). It's also not enough to wait for the StartTransientUnit D-Bus call to complete: you need to wait for the corresponding JobRemoved signal.

If it is of interest, here's the upstream GNOME bug report I filed (different code, but implementing the same race condition):

https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/6565
Comment 4 David Edmundson 2023-03-31 13:52:15 UTC
At a ubuntu level if you can patch startplasma.cpp

startkde/startplasma.cpp
342:    qputenv("KDE_APPLICATIONS_AS_SCOPE", "1");

to 
342:    qputenv("KDE_APPLICATIONS_AS_SERVICE", "1");

and land https://invent.kde.org/frameworks/kio/-/merge_requests/1019 so we don't regress something important.

I apparently haven't followed up on the latter, so I'll do so now and get it in the next frameworks. The env var guard existed because we couldn't rely on new system which had the new exit type at the time.
Comment 5 David Edmundson 2023-03-31 14:19:49 UTC
Git commit 5c0ce0f5882d732578be22ac6bb7e1d250e2046b by David Edmundson.
Committed on 31/03/2023 at 14:19.
Pushed by davidedmundson into branch 'master'.

Set ExitType when running applications as transient systemd services

KIO internally has 3 paths to launch applications.
 - fork and exec
- fork and exec then put in a "scope" so that we have our own cgroup
(default)
 - asking systemd to launch the application

The latter was explicitly disabled by default because we had some issues
with applications being cleaned up when the first process exited. We
made those upstream changes a year ago, but haven't adopted the changes
here. This brings it in line with the setup used by xdg-autostart-generator.

Nothing will change when we land this as this operation mode is not
enabled.
Related: bug 461289

M  +13   -11   src/gui/systemd/systemdprocessrunner.cpp

https://invent.kde.org/frameworks/kio/commit/5c0ce0f5882d732578be22ac6bb7e1d250e2046b
Comment 6 Bug Janitor Service 2023-04-03 08:44:54 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kio/-/merge_requests/1244
Comment 7 David Edmundson 2023-04-03 09:18:41 UTC
Git commit 0c540119248c1eb373eaf2ee323094798897967a by David Edmundson.
Committed on 03/04/2023 at 08:44.
Pushed by davidedmundson into branch 'kf5'.

Set ExitType when running applications as transient systemd services

KIO internally has 3 paths to launch applications.
 - fork and exec
- fork and exec then put in a "scope" so that we have our own cgroup
(default)
 - asking systemd to launch the application

The latter was explicitly disabled by default because we had some issues
with applications being cleaned up when the first process exited. We
made those upstream changes a year ago, but haven't adopted the changes
here. This brings it in line with the setup used by xdg-autostart-generator.

Nothing will change when we land this as this operation mode is not
enabled.
Related: bug 461289


(cherry picked from commit 5c0ce0f5882d732578be22ac6bb7e1d250e2046b)

M  +13   -11   src/gui/systemd/systemdprocessrunner.cpp

https://invent.kde.org/frameworks/kio/commit/0c540119248c1eb373eaf2ee323094798897967a
Comment 8 techxgames 2024-01-01 01:52:16 UTC
I'm still experiencing this issue :-(

https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/2011806/comments/43