Bug 487851 - Issue with Systemd Watchdog and s2idle
Summary: Issue with Systemd Watchdog and s2idle
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: git-stable-Plasma/6.1
Platform: Gentoo Packages Linux
Importance: VHI crash
Target Milestone: ---
Assignee: KWin default assignee
Reported: 2024-05-31 20:34 UTC by snow flurry
Modified: 2024-06-06 16:24 UTC

Attachments
journalctl output showing kwin_watchdog being triggered (1.41 KB, text/plain)
2024-05-31 20:34 UTC, snow flurry
Additional journalctl output affecting my PC, similar behavior. (3.02 KB, text/x-log)
2024-06-02 20:20 UTC, briguy992
Description snow flurry 2024-05-31 20:34:34 UTC
Created attachment 170025 [details]
journalctl output showing kwin_watchdog being triggered

SUMMARY
When a system using s2idle for suspend wakes up, the systemd watchdog for kwin_wayland can trigger, causing KWin to quit. This causes the user to lose their running session.

STEPS TO REPRODUCE
1. Ensure `/sys/power/mem_sleep` is set to "s2idle".
2. Put the system into sleep mode, and wait some time (5-10 minutes appears to work).
3. Wake up the system.

OBSERVED RESULT
The lock screen appears for a moment before the screen blanks and the user is returned to SDDM.
According to the logs, KWin receives SIGHUP, appears to attempt recovery, and ultimately terminates.

EXPECTED RESULT
The lock screen appears normally and the user can log back into their running session.

SOFTWARE/OS VERSIONS
Operating System: Gentoo Linux
KDE Plasma Version: 6.0.90
KDE Frameworks Version: 6.2.0
Qt Version: 6.7.0

ADDITIONAL INFORMATION
This appears to be an issue affecting systemd more than KWin, and is already known (see https://access.redhat.com/solutions/5118401). Unfortunately, my laptop doesn't support S3/deep sleep (possibly due to ACPI?):

  # cat /sys/power/mem_sleep
  [s2idle]
  # echo "deep" > /sys/power/mem_sleep
  -bash: echo: write error: Invalid argument
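
A minimal sketch of how to read that file, in case it helps others check their own hardware. The kernel lists every suspend mode it offers in `/sys/power/mem_sleep` and puts the selected one in brackets; writing a mode that is not listed fails with "Invalid argument", as shown above. The function names `active_sleep_mode` and `supports_sleep_mode` are hypothetical helpers, not part of any tool.

```shell
# Print the currently selected suspend mode (the bracketed entry).
# $1: contents of /sys/power/mem_sleep, e.g. "s2idle [deep]"
active_sleep_mode() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# Succeed if the given mode is listed at all (selected or not).
# $1: contents of mem_sleep, $2: mode name such as "deep"
supports_sleep_mode() {
    printf '%s\n' "$1" | tr -d '[]' | tr ' ' '\n' | grep -qx "$2"
}

# Report on the running system if the file is readable.
if [ -r /sys/power/mem_sleep ]; then
    modes=$(cat /sys/power/mem_sleep)
    echo "active: $(active_sleep_mode "$modes")"
    if supports_sleep_mode "$modes" deep; then
        echo "deep (S3) is available"
    else
        echo "deep (S3) is not offered on this system"
    fi
fi
```

On a machine like mine, where the file contains only `[s2idle]`, the second branch is taken and there is no way to switch to deep sleep from userspace.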

I've been able to work around this by changing WatchdogSec in my plasma-kwin_wayland.service override:

  [Service]
  WatchdogSec=3m
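
For anyone who wants to apply the same workaround, here is a rough sketch of writing such an override as a drop-in file. It assumes the usual per-user drop-in location, `~/.config/systemd/user/plasma-kwin_wayland.service.d/`, which may differ from where an existing override lives on your system; the helper name `write_watchdog_override` and the file name `override.conf` are made up for illustration.

```shell
# Write a drop-in that overrides WatchdogSec for a unit.
# $1: drop-in directory (assumed location, see above)
# $2: timeout value in systemd time syntax, e.g. "3m"
write_watchdog_override() {
    mkdir -p "$1" || return 1
    printf '[Service]\nWatchdogSec=%s\n' "$2" > "$1/override.conf"
}

# Example (not run automatically):
#   write_watchdog_override \
#       "$HOME/.config/systemd/user/plasma-kwin_wayland.service.d" 3m
#   systemctl --user daemon-reload
```

After the reload, `systemctl --user show plasma-kwin_wayland.service -p WatchdogUSec` should report the configured limit; the new value should apply the next time the service is started.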

With this setting, the s2idle wakeups never seem to push the watchdog timer past the limit, but there are cases where they could (e.g., https://github.com/systemd/systemd/issues/9538).
Comment 1 briguy992 2024-06-02 20:20:44 UTC
Created attachment 170068 [details]
Additional journalctl output affecting my PC, similar behavior.

Attaching output from my PC that shows the same behavior. This is a pretty nasty bug: at first I thought something was crashing on wake from sleep, since everything gets terminated when the workspace exits.
Comment 2 Bug Janitor Service 2024-06-04 21:01:13 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/5834
Comment 3 Ed Tomlinson 2024-06-06 12:25:49 UTC
This also appears to happen after hibernation with KWin 6.1 beta. Suspend works, but after resuming from hibernation I see:
```
Jun 06 07:47:56 grover kernel: PM: hibernation: hibernation exit
Jun 06 07:47:56 grover nut-server[1237]: Data for UPS [opti] is stale - check driver
Jun 06 07:47:56 grover nut-server[1237]: UPS [opti] data is no longer stale
Jun 06 07:47:56 grover systemd[1272]: plasma-kwin_wayland.service: Watchdog timeout (limit 15s)!
Jun 06 07:47:56 grover nut-driver@opti[1040]: nut_libusb_get_interrupt: Input/Output Error
Jun 06 07:47:56 grover upsd[1237]: Data for UPS [opti] is stale - check driver
Jun 06 07:47:56 grover fan2go[973]: WARNING: PWM of top2 was changed by third party! Last set PWM value was: 170 but is now: 67
Jun 06 07:47:56 grover fan2go[973]: WARNING: PWM of back was changed by third party! Last set PWM value was: 171 but is now: 137
Jun 06 07:47:56 grover fan2go[973]: WARNING: PWM of cpu was changed by third party! Last set PWM value was: 89 but is now: 87
Jun 06 07:47:56 grover systemd[1272]: plasma-kwin_wayland.service: Killing process 1756 (kwin_wayland_wr) with signal SIGHUP.
Jun 06 07:47:56 grover upsd[1237]: UPS [opti] data is no longer stale
Jun 06 07:47:56 grover fan2go[973]: WARNING: PWM of front was changed by third party! Last set PWM value was: 150 but is now: 67
Jun 06 07:47:56 grover fan2go[973]: WARNING: PWM of top1 was changed by third party! Last set PWM value was: 177 but is now: 67
Jun 06 07:47:56 grover systemd[1272]: plasma-kwin_wayland.service: Killing process 1762 (kwin_wayland) with signal SIGHUP.
Jun 06 07:47:56 grover dnscrypt-proxy[964]: Sorted latencies:
Jun 06 07:47:56 grover systemd[1272]: plasma-kwin_wayland.service: Killing process 1910 (Xwayland) with signal SIGHUP.
Jun 06 07:47:56 grover dnscrypt-proxy[964]: -    14ms google
Jun 06 07:47:56 grover systemd[1272]: plasma-kwin_wayland.service: Killing process 90412 (kscreenlocker_g) with signal SIGHUP.
Jun 06 07:47:56 grover dnscrypt-proxy[964]: -    33ms cloudflare
Jun 06 07:47:56 grover systemd[1272]: Stopped target plasma-workspace-wayland.target.
Jun 06 07:47:56 grover dnscrypt-proxy[964]: Server with the lowest initial latency: google (rtt: 14ms)
```
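
The relevant lines are easy to miss among the unrelated UPS, fan, and DNS messages. A small filter like the following (a sketch; `watchdog_events` is a hypothetical helper name) keeps only the watchdog timeout and the resulting kill messages for the KWin unit from whatever log text is piped into it.

```shell
# Keep only the watchdog timeout and kill messages for the KWin unit.
# Reads log text on stdin, writes matching lines to stdout.
watchdog_events() {
    grep -E 'plasma-kwin_wayland\.service: (Watchdog timeout|Killing process)'
}

# Example (not run automatically): filter the current boot's journal.
#   journalctl -b | watchdog_events
```

Applied to the excerpt above, this leaves the "Watchdog timeout (limit 15s)!" line followed by the four "Killing process ... with signal SIGHUP" lines.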
Comment 4 Ed Tomlinson 2024-06-06 14:38:29 UTC
Setting the timeout to 3m also allows resume from hibernation to work here. I suggest raising the timeout to 3m instead of disabling the watchdog.
Comment 5 Vlad Zahorodnii 2024-06-06 16:24:37 UTC
Git commit c778c909f1b128ce517678d4666b0d4f78ef69d0 by Vlad Zahorodnii, on behalf of Xaver Hugl.
Committed on 06/06/2024 at 16:13.
Pushed by vladz into branch 'Plasma/6.1'.

disable the systemd watchdog by default

It wrongly kills KWin after s2idle on some systems

M  +5    -4    plasma-kwin_wayland.service.in

https://invent.kde.org/plasma/kwin/-/commit/c778c909f1b128ce517678d4666b0d4f78ef69d0