Bug 480895 - KWin has a DRM timeout issue with GPU resets when using the hardware cursor
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic (other bugs)
Version First Reported In: 5.93.0
Platform: Arch Linux, Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: qt6
Depends on:
Blocks:
 
Reported: 2024-02-05 13:33 UTC by fililip
Modified: 2024-02-06 09:55 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
journalctl log (2.08 MB, text/x-log)
2024-02-05 13:33 UTC, fililip

Description fililip 2024-02-05 13:33:09 UTC
Created attachment 165565
journalctl log

SUMMARY
5.92.0 is the last KWin version where GPU resets work; on 5.93.0 they cause a DRM timeout:

[  309.469926] amdgpu 0000:0b:00.0: amdgpu: GPU reset(4) succeeded!
[  341.340379] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
[  343.468579] amdgpu 0000:0b:00.0: [drm] *ERROR* flip_done timed out
[  343.468585] amdgpu 0000:0b:00.0: [drm] *ERROR* [CRTC:85:crtc-0] commit wait timed out
[  353.708655] amdgpu 0000:0b:00.0: [drm] *ERROR* flip_done timed out
[  353.708666] amdgpu 0000:0b:00.0: [drm] *ERROR* [PLANE:64:plane-4] commit wait timed out
[  354.242787] amdgpu 0000:0b:00.0: amdgpu: MODE1 reset

When that happens, the PC sometimes requires a hard reset and sometimes doesn't, but KWin is unusable afterwards either way.
I'm not sure whether this is a KWin regression, KWin using a new feature that isn't implemented in the kernel (DRM) uAPI yet, or a kernel regression. I've also tried downgrading the kernel to 6.7.0, as well as other packages, to versions where I remember it working fine, to no avail.
Tested on latest Mesa 24.1-dev.

Screen configuration (if relevant):
- primary display: 1080p165 VRR (adaptive sync set to "Always")
- secondary display: 1080p60 non-VRR

STEPS TO REPRODUCE
1. Upgrade KWin to 5.93.0
2. Reset the GPU

OBSERVED RESULT
The session is frozen (and sometimes the entire computer with it)

EXPECTED RESULT
The session is restored just fine

SOFTWARE/OS VERSIONS
OS: Arch Linux 6.7.3-zen1-2-zen
KDE Plasma Version: 5.93.0
KDE Frameworks Version: 5.249.0
Qt Version: 6.7.0
Comment 1 Zamundaaa 2024-02-05 15:11:37 UTC
This is a kernel bug, please report it to https://gitlab.freedesktop.org/drm/amd/-/issues

Maybe we should add a fallback timer for when pageflips time out so that we're not stuck waiting for the kernel. I don't know if the system would be usable after that though.
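
To illustrate the fallback timer idea, here is a minimal sketch of a wait with a timeout, assuming a commit thread that blocks until the kernel signals pageflip completion. It is not the KWin implementation: the class PageflipWaiter, its methods, and the timeout value are hypothetical names chosen for this example, and only standard C++ facilities are used.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the pageflip bookkeeping of a commit thread.
// None of these names are taken from KWin.
class PageflipWaiter
{
public:
    explicit PageflipWaiter(std::chrono::milliseconds timeout)
        : m_timeout(timeout)
    {
    }

    // Call after a commit has been handed to the kernel.
    void commitSubmitted()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pageflipPending = true;
    }

    // Call from the event handler when the kernel reports the flip as done.
    void pageflipCompleted()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pageflipPending = false;
        }
        m_condition.notify_all();
    }

    // Blocks until the pending flip completes or the timeout elapses.
    // Returns true if the flip completed, false if waiting was given up.
    bool waitForPageflip()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        const bool completed = m_condition.wait_for(lock, m_timeout, [this] {
            return !m_pageflipPending;
        });
        if (!completed) {
            // Fallback: treat the flip as over so the next commit can be tried
            // instead of waiting on the kernel forever.
            m_pageflipPending = false;
            ++m_timeoutCount;
        }
        return completed;
    }

    // Number of waits that ended through the fallback path.
    int timeoutCount() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_timeoutCount;
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_condition;
    std::chrono::milliseconds m_timeout;
    bool m_pageflipPending = false;
    int m_timeoutCount = 0;
};
```

In this sketch, a caller would invoke commitSubmitted() after each commit and then waitForPageflip(); a false return means the timeout expired and the caller may try another commit. Whether the kernel still accepts commits at that point is outside what the sketch can guarantee.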
Comment 2 fililip 2024-02-05 15:21:06 UTC
One more thing I should have mentioned: it also happens on 5.92.0, but there it can be worked around by disabling the hardware cursor.
Comment 3 fililip 2024-02-05 15:21:21 UTC
(In reply to Zamundaaa from comment #1)
> This is a kernel bug, please report it to
> https://gitlab.freedesktop.org/drm/amd/-/issues
> 
> Maybe we should add a fallback timer for when pageflips time out so that
> we're not stuck waiting for the kernel. I don't know if the system would be
> usable after that though.

Ok, will do
Comment 4 fililip 2024-02-05 15:39:25 UTC
(In reply to fililip from comment #3)
> (In reply to Zamundaaa from comment #1)
> > This is a kernel bug, please report it to
> > https://gitlab.freedesktop.org/drm/amd/-/issues
> > 
> > Maybe we should add a fallback timer for when pageflips time out so that
> > we're not stuck waiting for the kernel. I don't know if the system would be
> > usable after that though.
> 
> Ok, will do

https://gitlab.freedesktop.org/drm/amd/-/issues/3155
Done
Comment 5 fililip 2024-02-05 16:56:45 UTC
Another observation: this has nothing to do with KWin's version. Disabling the hardware cursor just makes it work somehow.
Comment 6 Bug Janitor Service 2024-02-05 18:38:03 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/5113
Comment 7 Zamundaaa 2024-02-05 22:59:34 UTC
Git commit 14749e91e962935fe9ad6b3358ec720c3645e8ea by Xaver Hugl.
Committed on 05/02/2024 at 22:51.
Pushed by zamundaaa into branch 'master'.

backends/drm: try to handle page flips timing out

While this should really never happen in the first place, if the kernel still accepts
atomic commits, this is better than the screen(s) freezing and never recovering.

M  +27   -5    src/backends/drm/drm_commit_thread.cpp
M  +1    -1    src/backends/drm/drm_commit_thread.h
M  +1    -1    src/backends/drm/drm_pipeline.cpp

https://invent.kde.org/plasma/kwin/-/commit/14749e91e962935fe9ad6b3358ec720c3645e8ea
Comment 8 Zamundaaa 2024-02-05 23:11:06 UTC
Git commit bb7d2152ba453aba7ae48958e661672e54bb2275 by Xaver Hugl.
Committed on 05/02/2024 at 23:00.
Pushed by zamundaaa into branch 'Plasma/6.0'.

backends/drm: try to handle page flips timing out

While this should really never happen in the first place, if the kernel still accepts
atomic commits, this is better than the screen(s) freezing and never recovering.


(cherry picked from commit 14749e91e962935fe9ad6b3358ec720c3645e8ea)

M  +27   -5    src/backends/drm/drm_commit_thread.cpp
M  +1    -1    src/backends/drm/drm_commit_thread.h
M  +1    -1    src/backends/drm/drm_pipeline.cpp

https://invent.kde.org/plasma/kwin/-/commit/bb7d2152ba453aba7ae48958e661672e54bb2275
Comment 9 fililip 2024-02-06 09:55:31 UTC
Thanks a lot for the prompt fix!