Bug 516038

Summary:	kwin_wayland loses DRM master during S3 suspend and never re-acquires it on resume, resulting in a permanent black screen. The GPU and NVIDIA kernel modules are fully functional after resume (proven by SysRq test), but kwin enters an unrecoverable error loop
Product:	[Plasma] kwin	Reporter:	thedarkbird
Component:	general	Assignee:	KWin default assignee <kwin-bugs-null>
Status:	REPORTED ---
Severity:	major
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Fedora RPMs
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description thedarkbird 2026-02-15 14:11:38 UTC

Please note that I have used Claude AI to analyze this problem, but I did so critically, thinking along with it and asking questions over the course of several days. I do not have the extensive Linux knowledge required for this kind of analysis (although I understand the basic functionality of a Linux system). Its final conclusion made enough sense to me to post it here.

What follows below is a Claude-generated summary of the issue:

SYSTEM INFORMATION

Distro: Fedora 43
Plasma: 6.5
KWin: 6.5.5 (Wayland)
Kernel: 6.18.9-200.fc43.x86_64
GPU: NVIDIA RTX 4080 (proprietary driver 580.119.02, open kernel modules)
CPU/iGPU: AMD Ryzen 7000 (Raphael iGPU, no displays connected)
Displays: DP-1 + HDMI-A-1, both on NVIDIA GPU
DRM devices: card1 = NVIDIA (pci-0000:01:00.0), card2 = amdgpu (pci-0000:14:00.0)
Sleep mode: S3 deep sleep ([deep] in /sys/power/mem_sleep)
Initramfs: NVIDIA modules loaded so that the LUKS password prompt displays correctly
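The sleep mode and the card-to-PCI mapping listed above can be double-checked from sysfs and udev. A minimal Python sketch, assuming the standard /sys/power/mem_sleep file and the udev-created /dev/dri/by-path symlinks (named pci-<address>-card); the helper names are our own:

```python
from __future__ import annotations

import os
import re
from pathlib import Path


def active_mem_sleep(text: str) -> str | None:
    """Return the bracketed (active) entry of /sys/power/mem_sleep, e.g. 'deep'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def drm_cards_by_pci(by_path_dir: str = "/dev/dri/by-path") -> dict[str, str]:
    """Map PCI addresses to cardN names using udev's by-path symlinks.

    Entries look like 'pci-0000:01:00.0-card' -> '../card1';
    '-render' nodes are skipped.
    """
    mapping: dict[str, str] = {}
    directory = Path(by_path_dir)
    if not directory.is_dir():
        return mapping
    for entry in directory.iterdir():
        name = entry.name
        if name.startswith("pci-") and name.endswith("-card"):
            pci = name[len("pci-"):-len("-card")]
            mapping[pci] = os.path.basename(os.readlink(entry))
    return mapping


if __name__ == "__main__":
    mem_sleep = Path("/sys/power/mem_sleep")
    if mem_sleep.exists():
        print("active mem_sleep:", active_mem_sleep(mem_sleep.read_text()))
    for pci, card in sorted(drm_cards_by_pci().items()):
        print(f"{pci} -> /dev/dri/{card}")
```

On the system described above, this would be expected to report deep as the active mode and 0000:01:00.0 -> card1.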

STEPS TO REPRODUCE

1. Boot normally, log into Plasma Wayland session
2. Suspend to RAM (S3 deep sleep). This is not a manually triggered suspend: the system enters sleep on its own after the configured idle timeout.
3. Wake the system (press power button or keyboard)

EXPECTED BEHAVIOR

Displays turn on, lock screen appears, session resumes normally.

ACTUAL BEHAVIOR

System wakes (fans, disks spin up), but both displays remain permanently black. Ctrl+Alt+F3 (VT switch) also produces no output. The system is otherwise alive (accessible via SSH). Without intervention, a hard reboot is required.

KEY FINDING: GPU IS FUNCTIONAL AFTER RESUME

Pressing Alt+SysRq+REISUB after a failed resume brings the display back at the S (sync) step. By that point, E (SIGTERM) and I (SIGKILL) have killed all userspace processes including kwin_wayland. The kernel reclaims DRM master and fbcon takes over the display successfully.

This proves the GPU hardware and NVIDIA kernel modules are fully functional after resume. The failure is in kwin_wayland, not in the kernel driver.
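The REISUB test only works if the kernel.sysrq setting permits the E and I keys from the keyboard, and many distributions ship a restricted default. A sketch that decodes the value of /proc/sys/kernel/sysrq against the REISUB keys, using the bit values from the kernel's SysRq documentation (the helper names are illustrative):

```python
from __future__ import annotations

from pathlib import Path

# Bit values from the kernel's SysRq documentation (admin-guide/sysrq).
SYSRQ_KEY_BITS = {
    "R": 0x4,   # keyboard control (unraw)
    "E": 0x40,  # signalling of processes: SIGTERM to all except init
    "I": 0x40,  # signalling of processes: SIGKILL to all except init
    "S": 0x10,  # sync all mounted filesystems
    "U": 0x20,  # remount all filesystems read-only
    "B": 0x80,  # reboot/poweroff
}


def sysrq_key_allowed(key: str, sysrq_value: int) -> bool:
    """Return True if kernel.sysrq permits this key from the keyboard."""
    if sysrq_value == 0:
        return False  # SysRq disabled completely
    if sysrq_value == 1:
        return True   # all functions enabled
    return bool(sysrq_value & SYSRQ_KEY_BITS[key.upper()])


def blocked_keys(sequence: str, sysrq_value: int) -> list[str]:
    """List the keys of `sequence` (e.g. "REISUB") that would be ignored."""
    return [k for k in sequence.upper() if not sysrq_key_allowed(k, sysrq_value)]


if __name__ == "__main__":
    sysrq = Path("/proc/sys/kernel/sysrq")
    if sysrq.exists():
        value = int(sysrq.read_text().strip())
        print(f"kernel.sysrq = {value}, blocked REISUB keys: {blocked_keys('REISUB', value)}")
```

A value of 1 enables every key; with a value of 16, only S (sync) is honoured from the keyboard.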

JOURNAL EVIDENCE

Resume timeline (journalctl -b -1):

14:09:18 nvidia-suspend.service runs successfully
14:09:19 System enters S3 deep sleep
14:31:05 System wakes — kernel resumes, CPUs come back online
14:31:05 amdgpu resumes normally (no displays connected, expected)
14:31:06 session-2.scope thawed — kwin_wayland is unfrozen
14:31:06 kwin_wayland: Failed to open drm node: "/dev/dri/card0" (card0 doesn't exist, harmless)
14:31:06 nvidia-resume.service starts
14:31:06 kwin_wayland: Atomic modeset test failed! Permission denied <-- FIRST FAILURE
14:31:06 kwin_wayland: Applying output configuration failed!
14:31:06 nvidia-resume.service finishes successfully
14:31:06 kwin_wayland: Setting dpms mode failed!
14:31:15 Hundreds of "Atomic modeset test failed! Permission denied" — never recovers
14:32:20 Still spamming errors — kwin is permanently stuck

Relevant kwin_wayland messages:

kwin_wayland[2972]: Failed to open drm node: "/dev/dri/card0"
kwin_wayland[2972]: Failed to open drm node: "/dev/dri/card0"
kwin_wayland[2972]: Atomic modeset test failed! Permission denied
kwin_wayland[2972]: Applying output configuration failed!
kwin_wayland[2972]: Atomic modeset test failed! Permission denied
kwin_wayland[2972]: Setting dpms mode failed!
(repeats hundreds of times, never recovers)
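To quantify the repetition, the kwin_wayland messages can be tallied from a saved journal, for example one exported with `journalctl -b -1 -o short-precise > resume.log`. A minimal sketch (the script and file names are hypothetical):

```python
from __future__ import annotations

import re
import sys
from collections import Counter

# Matches e.g. "kwin_wayland[2972]: Atomic modeset test failed! Permission denied",
# with or without the journal's timestamp/hostname prefix in front.
KWIN_LINE = re.compile(r"kwin_wayland\[\d+\]: (.+)$")


def count_kwin_messages(lines) -> Counter:
    """Tally kwin_wayland messages by their text."""
    counts: Counter = Counter()
    for line in lines:
        match = KWIN_LINE.search(line.rstrip("\n"))
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for message, n in count_kwin_messages(log).most_common():
            print(f"{n:6d}  {message}")
```

Usage: `python3 count_kwin.py resume.log` prints each distinct message with its count, most frequent first.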

logind only logs "Operation 'suspend' finished." — there is no evidence of DRM master being re-granted to the session.

nvidia-resume.service ran and completed successfully. The NVIDIA kernel driver resumed without errors.

ANALYSIS

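The "Permission denied" (EACCES) error from drmModeAtomicCommit() indicates that kwin has lost DRM master status during S3 suspend: the kernel only accepts atomic commits, including test-only commits, from the current DRM master. Two problems prevent recovery: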

1. DRM master is not re-granted after resume. logind does not appear to re-issue DRM master to the active session's kwin instance after S3 resume completes.

2. kwin has no recovery mechanism. Once the first atomic modeset fails, kwin enters an infinite error loop, retrying the same failing operation without ever attempting to re-acquire DRM master. A fresh kwin instance (started after the old one is killed) acquires DRM master from logind without issues.
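Point 2 contrasts a blind retry loop with one that reacts to the lost master. The sketch below illustrates only that distinction; it is not KWin's code, and `commit` and `reacquire_master` are hypothetical callbacks standing in for the atomic commit and for whatever mechanism (such as logind's device handling) hands DRM master back to the session:

```python
from __future__ import annotations

import errno
import time
from typing import Callable


def commit_with_master_recovery(
    commit: Callable[[], None],
    reacquire_master: Callable[[], bool],
    max_attempts: int = 5,
    delay_s: float = 0.1,
) -> bool:
    """Hypothetical retry policy; NOT KWin's implementation.

    `commit` performs the modeset and raises OSError on failure.
    `reacquire_master` is a hypothetical hook that returns True once the
    session holds DRM master again.
    """
    for _ in range(max_attempts):
        try:
            commit()
            return True
        except PermissionError as exc:
            if exc.errno != errno.EACCES:
                raise  # EPERM or similar: not the lost-master case
            # EACCES: the fd is no longer DRM master. Try to get it back
            # before the next attempt instead of repeating the same commit.
            if not reacquire_master():
                time.sleep(delay_s)  # back off while master is unavailable
    return False  # give up and let the caller report the failure
```

Limiting the special handling to EACCES keeps other modeset failures visible rather than masking them, and the attempt cap avoids an unbounded loop.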

There is also a possible race condition: kwin is unfrozen and attempts modesetting at the same moment nvidia-resume.service is still running. However, the errors persist long after nvidia-resume.service completes, so the race is at most a trigger — the lack of DRM master recovery is the root cause.

WHY THIS IS A KWIN BUG (NOT NVIDIA)

- The SysRq test proves the GPU and nvidia-drm kernel module are fully operational after resume — fbcon can drive the displays via the same hardware.
- A freshly started kwin_wayland (after killing the stuck one) acquires DRM master and works perfectly.
- The failure is kwin not recovering from a lost DRM master state, regardless of why the DRM master was lost.

Bug 477738 was closed as RESOLVED DOWNSTREAM, attributing this to NVIDIA. The SysRq evidence contradicts that conclusion — the kernel driver works, but kwin does not attempt to re-acquire DRM master when it loses it during suspend.

RELATED BUGS

Bug 477738 — Same error signature ("Atomic commit failed! Permission denied" after resume). Closed DOWNSTREAM. The SysRq evidence shows the issue is in kwin's lack of DRM master recovery.

Bug 509439 — Fixed in KWin 6.5.0 (EGL context handling on resume). We run 6.5.5; this fix is present but insufficient.

Bug 478090 — Fixed in Plasma 6.3.1 (lock screen black screen). The fix is present in our version; this is a different issue.

WORKAROUND

Pressing Alt+SysRq+E kills all userspace processes. SDDM restarts, a fresh kwin acquires DRM master, and the session can be restored (unsaved work is lost). This is a last-resort recovery rather than a practical workaround.

Comment 1 thedarkbird 2026-02-16 18:28:11 UTC

UPDATE: POSSIBLE WORKAROUND AND ADDITIONAL DATA

  Setting KWIN_DRM_DEVICES=/dev/dri/card1 in /etc/environment (restricting kwin to only the NVIDIA GPU) appears to resolve
  the issue — suspend/resume worked on the next attempt. However, the bug may be intermittent, so this needs more testing.
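Because the failed resume involved kwin probing a node that does not exist, it may be worth confirming that every path listed in KWIN_DRM_DEVICES is present. A small sketch, assuming kwin reads the variable as a colon-separated list of device paths (the helper name is our own):

```python
from __future__ import annotations

import os


def check_kwin_drm_devices(value: str | None) -> list[tuple[str, bool]]:
    """Return (path, exists) for each entry of a KWIN_DRM_DEVICES value.

    Assumes the variable is a colon-separated list of device paths.
    """
    if not value:
        return []
    return [(path, os.path.exists(path)) for path in value.split(":") if path]


if __name__ == "__main__":
    for path, present in check_kwin_drm_devices(os.environ.get("KWIN_DRM_DEVICES")):
        print(f"{path}: {'present' if present else 'MISSING'}")
```

Run it from a terminal inside the Plasma session so that it typically sees the same environment as kwin.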

  With this variable set, kwin still hits the same "Permission denied" race with nvidia-resume.service, but recovers on its
  own within milliseconds:

  19:08:53.100  session-2.scope thawed
  19:08:53.103  nvidia-resume.service starts
  19:08:53.109  kwin: Atomic modeset test failed! Permission denied
  19:08:53.109  kwin: Setting dpms mode failed!
  19:08:53.122  nvidia-resume.service finishes
                (kwin recovers silently, session resumes normally)

  Compare with the failed resume (without KWIN_DRM_DEVICES):

  14:31:06.826  session-2.scope thawed
  14:31:06.828  nvidia-resume.service starts
  14:31:06.831  kwin: Failed to open drm node: "/dev/dri/card0"
  14:31:06.835  kwin: Failed to open drm node: "/dev/dri/card0"
  14:31:06.849  kwin: Atomic modeset test failed! Permission denied
  14:31:06.849  kwin: Applying output configuration failed!
  14:31:06.851  kwin: Atomic modeset test failed! Permission denied
  14:31:06.856  nvidia-resume.service finishes
  14:31:15      Error storm begins — 2766+ errors, never recovers
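Both timelines can be checked mechanically. The sketch below takes (timestamp, message) pairs transcribed from the excerpts above and counts kwin messages that fall before, inside, or after the nvidia-resume.service window. The matching strings follow these abbreviated excerpts, not raw journalctl output, and the helper names are our own:

```python
from __future__ import annotations

from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an HH:MM:SS.mmm timestamp as copied from the excerpts."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def classify_kwin_messages(events: list[tuple[str, str]]) -> dict[str, int]:
    """Count kwin messages before, inside and after the nvidia-resume window."""
    start = end = None
    kwin_times = []
    for ts, msg in events:
        t = parse_ts(ts)
        if msg.startswith("nvidia-resume.service starts"):
            start = t
        elif msg.startswith("nvidia-resume.service finishes"):
            end = t
        elif msg.startswith("kwin:"):
            kwin_times.append(t)
    if start is None or end is None:
        raise ValueError("nvidia-resume.service start/finish not found")
    counts = {"before": 0, "inside": 0, "after": 0}
    for t in kwin_times:
        if t < start:
            counts["before"] += 1
        elif t <= end:
            counts["inside"] += 1
        else:
            counts["after"] += 1
    return counts


if __name__ == "__main__":
    failed = [
        ("14:31:06.828", "nvidia-resume.service starts"),
        ("14:31:06.849", "kwin: Atomic modeset test failed! Permission denied"),
        ("14:31:06.856", "nvidia-resume.service finishes"),
        # The excerpt gives 14:31:15 without milliseconds; .000 is a placeholder.
        ("14:31:15.000", "kwin: Atomic modeset test failed! Permission denied"),
    ]
    print(classify_kwin_messages(failed))
```

Applied to the two excerpts, every kwin message up to 14:31:06.851 lands inside the window in both the successful and the failed resume; only the error storm at 14:31:15 falls after it.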

  One visible difference is that, during the failed resume, kwin tries to open /dev/dri/card0, which doesn't exist (only
  card1=NVIDIA and card2=amdgpu are present). This might be what pushes kwin into the "Applying output configuration failed!"
  code path, which might in turn trigger the unrecoverable retry loop 9 seconds later.

  That said, a previous boot without KWIN_DRM_DEVICES also hit "Applying output configuration failed!" (from a failed card2
  open) and recovered fine — so the card0 probe failure alone doesn't guarantee the loop. The catastrophic failure might
  require a specific combination of conditions.

  What does seem clear is that the "Permission denied" modeset error by itself is recoverable: every boot's resume log shows it
  briefly during the nvidia-resume race, and kwin handles it. Something additional has to go wrong to trigger the permanent loop.