Bug 485024 - Notification on kwin restart
Summary: Notification on kwin restart
Status: CONFIRMED
Alias: None
Product: kwin
Classification: Plasma
Component: general
Version: git master
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Harald Sitter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-04 11:42 UTC by Harald Sitter
Modified: 2024-05-02 16:30 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Harald Sitter 2024-04-04 11:42:16 UTC
SUMMARY
While the system is trying to recover from an out-of-memory (OOM) situation, there is a good chance (so long as the system is responsive enough) that the systemd unit watchdog triggers and terminates kwin_wayland, thereby nuking the session.

Apr 04 13:28:07 ajax systemd[1007]: plasma-kwin_wayland.service: Watchdog timeout (limit 15s)!
Apr 04 13:28:07 ajax systemd[1007]: plasma-kwin_wayland.service: Killing process 1133 (kwin_wayland_wr) with signal SIGHUP.
...
Apr 04 13:28:11 ajax systemd[1007]: plasma-kwin_wayland.service: Failed with result 'watchdog'.
Apr 04 13:28:11 ajax systemd[1007]: Stopped KDE Window Manager.
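For context, the watchdog in the log above expects the service to send a keep-alive within the configured limit (15s here). The following is a minimal sketch of that keep-alive side using the generic sd_notify datagram protocol. It is not KWin's actual implementation; the function names are made up for illustration, and the optional `environ` parameter exists only to make the sketch testable.

```python
import os
import socket


def watchdog_interval_seconds(environ=os.environ):
    """Return half the watchdog limit in seconds, or None if no watchdog is set.

    systemd exports the limit as WATCHDOG_USEC (microseconds). Pinging at half
    the limit follows the recommendation in the sd_watchdog_enabled docs.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2


def sd_notify(state, environ=os.environ):
    """Send one sd_notify datagram such as "WATCHDOG=1".

    Returns False when NOTIFY_SOCKET is unset (not running under systemd).
    """
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        address = b"\0" + os.fsencode(path[1:])
    else:
        address = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True
```

A service would call `sd_notify("WATCHDOG=1")` from its main loop every `watchdog_interval_seconds()` seconds. If the process is starved (for example while the system is thrashing under memory pressure), the pings stop and systemd acts once the limit expires, which matches the timeout logged above.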

STEPS TO REPRODUCE
1. Run out of memory: have many VMs, a leaky app, or just too much stuff open.
2. The system becomes slightly unresponsive.

OBSERVED RESULT
You get thrown to SDDM because your session has been stopped by the watchdog.

EXPECTED RESULT
OOM handling should be allowed to take place and kill a suitable client. Note the additional info below, though.

SOFTWARE/OS VERSIONS
KDE Plasma Version: 6.0.80
KDE Frameworks Version: 6.1.0
Qt Version: 6.6.3
Kernel Version: 6.8.2-arch2-1 (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 3600X 6-Core Processor
Memory: 31.2 GiB of RAM
Graphics Processor: AMD Radeon RX 5700 XT

ADDITIONAL INFORMATION
In a way this is actually useful, because it likely terminates whatever was causing the OOM situation in the first place; not sure if we can make this a feature somehow. Maybe instead of terminating, wildly shoot at clients on SIGHUP? But then I suppose we'd be implementing yet another user-space OOM handler. Indeed, perhaps we can consider this a feature? If the OOM handler hasn't been able to fix the situation in 15 seconds, it probably won't for a lot longer. That said, we definitely need to tell the user what happened.
Comment 1 David Edmundson 2024-05-02 13:47:24 UTC
If the system is frozen for so long, then systemd killing kwin is the right thing to do. There's not much else we can do with regard to this bug, is there?
Comment 2 Harald Sitter 2024-05-02 16:16:45 UTC
Like I say, if we consider this a feature, we need to tell the user that we just restarted the compositor, in particular since we cannot assume all clients will survive the reset.
Comment 3 David Edmundson 2024-05-02 16:30:59 UTC
Ack, notification is easy. Let's change scope to that.
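The notification agreed on in this thread could be prototyped outside KWin roughly as follows. This is only a sketch under stated assumptions: it detects the watchdog failure by matching the journal line format quoted in the description, and it builds a `notify-send` command line (assumes the libnotify CLI is installed). The function names and the notification wording are hypothetical, and a real fix would presumably live in KWin or its wrapper rather than in a script.

```python
import re

# Matches the systemd line that marks a unit as failed, e.g.
# "plasma-kwin_wayland.service: Failed with result 'watchdog'."
FAILED_RE = re.compile(
    r"(?P<unit>\S+\.service): Failed with result '(?P<result>[^']+)'"
)


def watchdog_failures(journal_lines, unit="plasma-kwin_wayland.service"):
    """Return the journal lines that record a watchdog failure of `unit`."""
    hits = []
    for line in journal_lines:
        match = FAILED_RE.search(line)
        if match and match["unit"] == unit and match["result"] == "watchdog":
            hits.append(line)
    return hits


def notification_command(failure_count):
    """Build a notify-send argv, or None when there is nothing to report.

    The summary and body text are placeholders, not KWin's wording.
    """
    if failure_count == 0:
        return None
    return [
        "notify-send",
        "--urgency=critical",
        "Compositor restarted",
        "KWin stopped responding and was restarted. "
        "Some applications may not have survived.",
    ]
```

The lines could come from something like `journalctl --user -u plasma-kwin_wayland.service -b`, and the returned list can be passed to `subprocess.run` once the session is back up. Matching on the log text is fragile, so this is meant only to illustrate the flow: detect the watchdog kill, then tell the user what happened.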