Bug 511618 - kwin_wayland freezes on 2nd GPU hotplug
Summary: kwin_wayland freezes on 2nd GPU hotplug
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: core (other bugs)
Version First Reported In: 6.5.1
Platform: Debian unstable Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-11-04 12:34 UTC by Marc Riedel
Modified: 2025-11-12 22:02 UTC

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
journalctl, gdb backtrace (11.39 KB, text/plain)
2025-11-10 14:56 UTC, Christoph Haag
journalctl (58.37 KB, text/plain)
2025-11-12 22:00 UTC, Marc Riedel

Description Marc Riedel 2025-11-04 12:34:30 UTC
SUMMARY

My system has an IGD (i915) and a discrete GPU (nouveau).
My primary display is on the IGD. 
The display output of the second GPU works well and can be configured via System Settings.
Detaching the discrete GPU works fine.
But when the discrete GPU is attached again, kwin_wayland freezes the whole primary display.

Those are the only two messages logged:
kwin_wayland_wrapper[18462]: kwin_core: Failed to open /dev/dri/renderD129 device (No such device)
kwin_wayland_wrapper[18462]: kwin_wayland_drm: failed to open drm device at "/dev/dri/renderD129"

After the hotplug event, udev sets the correct ACLs for /dev/dri/card1 and /dev/dri/renderD129.
In addition, the user is a member of the video and render groups.

STEPS TO REPRODUCE

1. Send remove to discrete GPU: echo remove > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/uevent
kwin_wayland_wrapper[18462]: kwin_wayland_drm: Removing GPU "/dev/dri/card1"

2. Detach discrete GPU: rmmod nouveau
All is working fine so far.

3. Attach discrete GPU: modprobe nouveau
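
The three steps above can be replayed from a single shell function. This is only a sketch: GPU_UEVENT defaults to the sysfs path from step 1 and will differ on other machines, and RUN is a hypothetical switch introduced here so that `RUN=echo hotplug_cycle` prints the commands instead of running them. When run for real it needs root, and it is expected to trigger the freeze.

```shell
# Sketch only: replays steps 1-3 above. Needs root when actually executed.
# GPU_UEVENT defaults to the path from step 1; adjust it for other systems.
GPU_UEVENT=${GPU_UEVENT:-/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/uevent}

hotplug_cycle() {
    # RUN is a hypothetical switch: RUN=echo prints each command instead of running it.
    run=${RUN:-}
    # Step 1: send "remove" to the discrete GPU's DRM device.
    $run sh -c 'echo remove > "$1"' sh "$GPU_UEVENT" || return 1
    # Step 2: detach the discrete GPU by unloading its driver.
    $run rmmod nouveau || return 1
    # Step 3: attach the discrete GPU again; this is where the freeze was observed.
    $run modprobe nouveau
}
```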

OBSERVED RESULT

Nouveau loads discrete GPU without any issue.
Primary display freezes. The session cannot be terminated via loginctl.
After killing the whole session and starting a new session, everything works flawlessly.
The display output of the second GPU also works well and can be configured via System Settings.

EXPECTED RESULT

No session freeze.
Hotplug works out of the box.

SOFTWARE/OS VERSIONS

Operating System: Debian GNU/Linux 13
KDE Plasma Version: 6.5.1
KDE Frameworks Version: 6.18.0
Qt Version: 6.9.2
Kernel Version: 6.17.0-amd64 (64-bit)
Graphics Platform: Wayland

ADDITIONAL INFORMATION

The reason why I hotplug my discrete GPU is to pass it through to a virtual machine.
Nevertheless, I would like to use the discrete GPU when it is not passed through to a VM.
Comment 1 Marc Riedel 2025-11-04 12:38:02 UTC
I forgot to mention that when exporting KWIN_DRM_DEVICES="/dev/dri/card0", kwin does not freeze.
But it will not be possible to use the discrete GPU for driving a display.
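
Since card numbers are not guaranteed to stay the same across driver reloads, the value for this workaround could be derived from the driver name rather than hard-coded. The helper below is a hypothetical sketch, not part of KWin: it assumes the usual sysfs layout in which /sys/class/drm/cardN/device/driver is a symlink to the bound driver, and SYSFS_DRM is an override added here only so the function can be tried against a fake tree.

```shell
# Hypothetical helper: print the /dev/dri/cardN nodes bound to a given kernel driver.
# SYSFS_DRM is an illustrative override; by default the real sysfs tree is read.
cards_for_driver() {
    driver_name=$1
    sysfs_drm=${SYSFS_DRM:-/sys/class/drm}
    for entry in "$sysfs_drm"/card*; do
        name=${entry##*/}
        # Skip connector entries such as card0-eDP-1.
        case $name in
            *-*) continue ;;
        esac
        # The driver symlink's last path component is the driver name.
        link=$(readlink "$entry/device/driver") || continue
        if [ "${link##*/}" = "$driver_name" ]; then
            printf '/dev/dri/%s\n' "$name"
        fi
    done
}
```

For the setup in this report, `export KWIN_DRM_DEVICES="$(cards_for_driver i915)"` would then restrict KWin to the IGD, assuming the variable is set in the environment KWin is started from.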
Comment 2 Zamundaaa 2025-11-05 13:07:08 UTC
Please attach the output of
> journalctl --user-unit plasma-kwin_wayland --boot 0
after triggering the freeze. Getting a backtrace of KWin when it's frozen may also be useful:
> sudo gdb -p $(pidof kwin_wayland)
> bt
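
Because the display is frozen at that point, it may be easier to gather both items non-interactively from a TTY or an SSH session. The function below is a sketch under that assumption: the function name, the output file names, and the RUN switch (`RUN=echo` prints the commands instead of running them) are made up for illustration. It uses gdb's `-batch` and `-ex` options to run `bt` without an interactive prompt.

```shell
# Sketch: save the journal and a backtrace of a frozen kwin_wayland to a directory.
# Usage: collect_kwin_debug OUT_DIR KWIN_PID
collect_kwin_debug() {
    out_dir=$1
    kwin_pid=$2
    # RUN is a hypothetical switch: RUN=echo prints each command instead of running it.
    run=${RUN:-}
    mkdir -p "$out_dir" || return 1
    # Journal of the KWin user unit for the current boot.
    $run journalctl --user-unit plasma-kwin_wayland --boot 0 > "$out_dir/journal.txt"
    # Attach gdb, print the backtrace, and detach again.
    $run sudo gdb -p "$kwin_pid" -batch -ex bt > "$out_dir/backtrace.txt" 2>&1
}
```

A possible invocation is `collect_kwin_debug ~/kwin-freeze "$(pidof kwin_wayland)"`.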
Comment 3 Christoph Haag 2025-11-10 14:56:44 UTC
Created attachment 186670
journalctl, gdb backtrace

Arch Linux with kwin 6.5.2 and 7940HS (Radeon 790M) + 3080Ti eGPU here.

The only thing relevant in the logs seems to be

Nov 10 15:42:34 fw16 kwin_wayland[1458]: Failed to open /dev/dri/renderD129 device (No such device)
Nov 10 15:42:34 fw16 kwin_wayland[1458]: failed to open drm device at "/dev/dri/renderD129"

Race condition perhaps?
Comment 4 Zamundaaa 2025-11-10 22:25:29 UTC
Ideally we'd do a better job of filtering out render devices in the first place, but KWin wouldn't use them anyways, as they're not KMS capable, so the warning shouldn't be relevant.

That backtrace, however, shows that the freeze is in Mesa, specifically in the device select layer: it's waiting for the Wayland compositor to respond, which will hang for obvious reasons.
Comment 5 Marc Riedel 2025-11-11 14:30:13 UTC
Hi,
sorry for the late response, I was on vacation.

@Christoph Haag
Thank you for reproducing the issue and providing the backtrace.

@Zamundaaa
I don't fully understand the answer. 
My understanding is that nouveau is KMS capable.
So what reasons are obvious?
Do you need more information?

Best regards,

Marc
Comment 6 Zamundaaa 2025-11-11 16:40:49 UTC
When KWin creates a gbm device for the new KMS node, Mesa blocks in a Vulkan layer, waiting for KWin to respond to it through Wayland. So it's making KWin wait on KWin, which just hangs indefinitely.

Zink had some similar issues before, this needs to be reported at https://gitlab.freedesktop.org/mesa/mesa/-/issues and fixed in Mesa
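
The "KWin waits on KWin" situation can be illustrated with a toy analogy that involves neither KWin nor Mesa: a process blocks reading a reply from a FIFO that only the same process could ever write to. The function name is made up, and `timeout` (from coreutils) is used only so that the demonstration ends instead of hanging.

```shell
# Toy analogy only, not KWin or Mesa code: a process waits for a reply
# that nobody but the blocked process itself could send.
demo_self_wait() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/reply"
    # Opening the FIFO for reading blocks until a writer appears. No writer
    # ever will, so without the timeout this would wait forever.
    timeout 1 sh -c 'read -r answer < "$1"' sh "$dir/reply"
    status=$?
    rm -rf "$dir"
    # timeout exits with 124 when it had to kill the command.
    return "$status"
}
```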
Comment 7 Marc Riedel 2025-11-12 16:19:26 UTC
To keep track of the bug in mesa: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14148
Comment 8 Marc Riedel 2025-11-12 22:00:18 UTC
Created attachment 186740
journalctl
Comment 9 Marc Riedel 2025-11-12 22:02:11 UTC
I just want to confirm that with merge request https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38252, KWin no longer freezes when I hotplug my second GPU.
However, KWin still hangs for a short time (see attachment).