Summary: | Window drag causes "Pageflip timed out! This is a kernel bug" | ||
---|---|---|---|
Product: | [Plasma] kwin | Reporter: | Lassi Väätämöinen <lassi.vaatamoinen> |
Component: | platform-drm | Assignee: | KWin default assignee <kwin-bugs-null> |
Status: | RESOLVED UPSTREAM | ||
Severity: | major | CC: | b.lucab1211, kde, lg096066587039, nate, postix, veehexx, vlad.zahorodnii, xaver.hugl |
Priority: | NOR | ||
Version: | 6.1.5 | ||
Target Milestone: | --- | ||
Platform: | openSUSE | ||
OS: | Linux | ||
See Also: |
https://bugs.kde.org/show_bug.cgi?id=492506 https://bugs.kde.org/show_bug.cgi?id=494044 https://bugs.kde.org/show_bug.cgi?id=489755 https://bugs.kde.org/show_bug.cgi?id=489878 |
Attachments: |
Backtrace
Full backtrace with AMD gpu and Plasma 6.2.1 and Kernel 6.11.3 |
Description
Lassi Väätämöinen
2024-09-17 16:26:41 UTC
If it's a kernel bug, you should report it to the kernel, no?

(In reply to Nate Graham from comment #1)
> If it's a kernel bug, you should report it to the kernel, no?

It would be awesome to have some pointers as to where in the kernel this might belong. DRM? AMD GPU? Other?

CCing some KWin devs who might be able to answer that!

*** Bug 493333 has been marked as a duplicate of this bug. ***

It should be reported at https://gitlab.freedesktop.org/drm/amd/-/issues, though to make sure we should look at the system log. You can get it for the last boot with `journalctl --system --boot -1 | grep kernel`

(In reply to Zamundaaa from comment #5)
> It should be reported at https://gitlab.freedesktop.org/drm/amd/-/issues,
> though to make sure we should look at the system log. You can get it for the
> last boot with `journalctl --system --boot -1 | grep kernel`

Ok, the excerpt I attached in the description is pretty much everything I'm able to see from dmesg or the journal. (Also, I think 'grep kernel' here does the same as the '-k' flag for journalctl would do?)

If amdgpu doesn't print anything, maybe this is a false positive. When the freeze happens, can you ssh in from another device?

(In reply to Zamundaaa from comment #7)
> If amdgpu doesn't print anything, maybe this is a false positive. When the
> freeze happens, can you ssh in from another device?

Perhaps. I was in a telco the other day and I could hear the live audio playing back just fine, but there was nothing happening on the graphics side, nor could I switch to another TTY.

If you can ssh in, then please do that when you experience the hang, and start debugging KWin with
> sudo gdb -p $(pidof kwin_wayland)
and then get the backtrace with
> bt
and attach it here

*** Bug 493754 has been marked as a duplicate of this bug. ***

(In reply to Nate Graham from comment #10)
> *** Bug 493754 has been marked as a duplicate of this bug. ***

So that's my bug, but it does seem different from the OP's (I found this report, but it didn't seem to be 100% identical, hence raising my own). I don't see it during window dragging; every other freeze has resulted in no useful logs to report. It's almost exclusively around sleep/wake time for me. This crash, however, resulted in the journalctl log I posted in 493754.

Framework 13 laptop, i5-1240P (Intel iGPU); the issue has only occurred when using an external display. REISUB does not recover the system for me either. The keyboard is completely locked up, with even the Caps Lock LED frozen in its current state.

Created attachment 174329 [details]
Backtrace
Got a back trace. Not sure how good it is for you, as I've had to do it via juicessh on phone.
The second run had kwin-wayland debuginfo installed. In the hope it shows more.
Crash was shortly after a resume from sleep, laptop only (no charge/dock/external display or peripherals connected)
Super rare for me to have a window move crash, so hopefully this helps.
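For anyone collecting a similar backtrace over ssh from a phone or other awkward client, the interactive `gdb` steps requested earlier in this thread can be run non-interactively. The following is only a sketch: `capture_backtrace` and the `GDB` override variable are hypothetical names introduced here, and `thread apply all bt` is used in place of plain `bt` on the assumption that backtraces of all threads are wanted.

```shell
# Sketch only: capture_backtrace and the GDB override are hypothetical names,
# not part of KWin or gdb.
capture_backtrace() {
    pid="$1"       # process to attach to, e.g. "$(pidof kwin_wayland)"
    outfile="$2"   # file that receives the backtrace text
    # -batch makes gdb exit after running the -ex commands;
    # "thread apply all bt" prints a backtrace for every thread.
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt" > "$outfile" 2>&1
}

# Example, from a root shell in the ssh session:
#   capture_backtrace "$(pidof kwin_wayland)" kwin-backtrace.txt
```

The resulting text file can then be attached to the report. Installing the kwin debuginfo package first, as mentioned above, should give more readable frames.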
(In reply to Tim D from comment #12)
> Created attachment 174329 [details]
> Backtrace
>
> Got a back trace. Not sure how good it is for you, as I've had to do it via
> juicessh on phone.
>
> The second run had kwin-wayland debuginfo installed. In the hope it shows
> more.
>
> Crash was shortly after a resume from sleep, laptop only (no
> charge/dock/external display or peripherals connected)
>
> Super rare for me to have a window move crash , so hopefully this helps.

To add to this, the laptop was connected to the dock with an external screen when I put it to sleep last night. I disconnected the TB3 cable to the dock while it was asleep and woke the laptop up a number of hours later. The first move of a window resulted in the crash.

Regarding comment 11 above, I guess I'm doing the SysRq+REISUB method wrong, and the sshd config had external connections disabled, so when I tried in the past I was unable to log in. Fixed the SSH listener last night, just in time to get the required backtrace!

It's very helpful. The backtrace suggests that KWin hangs in the kernel, trying to allocate a new buffer for some kwin-internal window. I assume that's a kernel bug; could you please report it at https://gitlab.freedesktop.org/drm/i915/kernel/-/issues?

Lassi Väätämöinen, I see you're on AMD, so your issue is likely triggered by something else. Please get a backtrace as well

(In reply to Zamundaaa from comment #14)
> It's very helpful. The backtrace suggests that KWin hangs in the kernel,
> trying to allocate a new buffer for some kwin-internal window. I assume
> that's a kernel bug; could you please report it at
> https://gitlab.freedesktop.org/drm/i915/kernel/-/issues?
>

Report raised (for Intel hardware)! Thanks for taking a look, I'd be lost without any help!
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12341

I guess https://invent.kde.org/plasma/kwin/-/merge_requests/6650 applies for this bug? :)

Just experienced the same observed result as the OT, but the Steps to Reproduce here don't work for me; it just happens randomly. Last time it happened while watching a YT video in a maximized Firefox 131 window, for instance.

Also, switching TTYs no longer worked: no more input was accepted and I had to hard reset the laptop.

Operating System: Fedora Linux 40
KDE Plasma Version: 6.2.0
KDE Frameworks Version: 6.7.0
Qt Version: 6.7.2
Kernel Version: 6.11.3-200.fc40.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 PRO 5850U with Radeon Graphics

(In reply to postix from comment #16)
> I guess https://invent.kde.org/plasma/kwin/-/merge_requests/6650 applies for
> this bug? :)

It'll help for GPU hotunplugs, not for actual pageflip timeouts that happen for other reasons. Those are kernel bugs that compositors can't recover from.

(In reply to postix from comment #17)
> Also switching TTYs hasn't worked anymore: No more input was accepted and I
> had to hard reset the laptop.

That can either mean that KWin's main thread hung up, or a kernel bug. You can find out which it is by looking at the kernel logs and the backtrace for KWin.

Created attachment 174997 [details]
Full backtrace with AMD gpu and Plasma 6.2.1 and Kernel 6.11.3

Ad comment 17

dmesg
```
[12299.273568] amdgpu 0000:06:00.0: [drm] Mode Validation Warning: Unknown Status failed validation.
[12309.482092] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] flip_done timed out
[12312.044146] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
```

journalctl
```
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
20:13:39 kwin_wayland[2663]: Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
(...)
20:17:01 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:07 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:12 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:17 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:22 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
(...)
```

gdb bt
```
#0 0x00007ff295f1cdb0 in __GI_ppoll (fds=fds@entry=0x55b43085d9b0, nfds=nfds@entry=13, timeout=<optimized out>, timeout@entry=0x7ffcd3ecba30, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1 0x00007ff296751044 in ppoll (__fds=<optimized out>, __nfds=<optimized out>, __timeout=<optimized out>, __ss=<optimized out>) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:100
#2 qt_ppoll (fds=0x55b43085d9b0, nfds=13, timeout_ts=0x7ffcd3ecba30) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:103
#3 qt_ppoll (fds=0x55b43085d9b0, nfds=13, timeout_ts=0x7ffcd3ecba30) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:100
#4 qt_safe_poll (fds=0x55b43085d9b0, nfds=nfds@entry=13, deadline=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:135
#5 0x00007ff296756c8e in QEventDispatcherUNIX::processEvents (this=<optimized out>, flags=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/tools/qarraydatapointer.h:119
#6 0x00007ff297363492 in QUnixEventDispatcherQPA::processEvents (this=<optimized out>, flags=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/gui/platform/unix/qunixeventdispatcher.cpp:27
#7 0x00007ff2965a3bc3 in QEventLoop::exec (this=this@entry=0x7ffcd3ecbc00, flags=..., flags@entry=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/global/qflags.h:34
#8 0x00007ff29659fa7c in QCoreApplication::exec () at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/global/qflags.h:74
#9 0x00007ff296dd66ed in QGuiApplication::exec () at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/gui/kernel/qguiapplication.cpp:1926
#10 0x00007ff29798b189 in QApplication::exec () at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/widgets/kernel/qapplication.cpp:2555
#11 0x000055b41387f8de in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/kwin-6.2.0-2.fc40.x86_64/src/main_wayland.cpp:634
```

The dmesg output gained some more lines a short time later:
```
[12309.482092] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] flip_done timed out
[12312.044146] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
[13742.052733] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out
[13742.052747] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] commit wait timed out
[13752.292701] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out
[13752.292715] amdgpu 0000:06:00.0: [drm] *ERROR* [PLANE:58:plane-3] commit wait timed out
[13762.532627] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out
[13762.532641] amdgpu 0000:06:00.0: [drm] *ERROR* [PLANE:70:plane-5] commit wait timed out
```

Regarding freezes on AMD GPUs and pageflip timeouts, see also:
> [amdgpu]: random freezes with flip_done timed out on kernel 6.6.0
* https://gitlab.freedesktop.org/drm/amd/-/issues/2950

> *ERROR* flip_done timed out
Yep, this is a kernel bug. Please add your info to the amd issue, or make a new one in that repository
(In reply to Zamundaaa from comment #22)
> > *ERROR* flip_done timed out
> Yep, this is a kernel bug. Please add your info to the amd issue, or make a
> new one in that repository

Can no longer reproduce it with Kernel 6.11.6 and Plasma 6.2.3. :)
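For readers who land here with the same symptom, a quick way to check whether the kernel logged the timeouts discussed in this thread is to filter the kernel messages of the previous boot. This is a minimal sketch, not an official diagnostic: `filter_flip_timeouts` is a hypothetical helper name, and the two patterns are simply the message texts quoted in the dmesg excerpts above, so other drivers or kernel versions may word them differently.

```shell
# Sketch only: filter_flip_timeouts is a hypothetical helper name.
# Reads kernel log text on stdin and keeps only the DRM timeout messages
# quoted in this thread.
filter_flip_timeouts() {
    # "|| true" keeps the exit status zero when nothing matches.
    grep -E 'flip_done timed out|commit wait timed out' || true
}

# Example on the affected machine (kernel messages of the previous boot):
#   journalctl -k -b -1 | filter_flip_timeouts
```

If this prints matching lines, they are the kind of kernel-side evidence worth including in a report to the relevant driver's issue tracker. Empty output matches the "amdgpu doesn't print anything" case raised earlier in the thread.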