Bug 493277 - Window drag causes "Pageflip timed out! This is a kernel bug"
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: platform-drm
Version: 6.1.5
Platform: openSUSE Linux
Importance: NOR major
Assignee: KWin default assignee
Duplicates: 493333 493754
 
Reported: 2024-09-17 16:26 UTC by Lassi Väätämöinen
Modified: 2024-11-11 16:23 UTC

Attachments
Backtrace (34.55 KB, text/plain)
2024-10-02 17:40 UTC, Tim D
Full backtrace with AMD gpu and Plasma 6.2.1 and Kernel 6.11.3 (19.87 KB, text/x-log)
2024-10-18 18:36 UTC, postix

Description Lassi Väätämöinen 2024-09-17 16:26:41 UTC
SUMMARY
Desktop graphics output freezes when starting to move a window by holding the mouse button and dragging.
Frequency: this happened on 2 of 3 boots today.

STEPS TO REPRODUCE
1. Boot up and sign in to Plasma session
2. Do normal desktop stuff
3. Grab a window at the title bar and start dragging.

OBSERVED RESULT
UI freezes, but for example audio that was playing in the background keeps playing normally.

EXPECTED RESULT
Window moves and nothing crashes.

SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20240916
KDE Plasma Version: 6.1.5
KDE Frameworks Version: 6.6.0
Qt Version: 6.7.2
Kernel Version: 6.10.9-1-default (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 3600 6-Core Processor
Memory: 31,3 GiB of RAM
Graphics Processor: AMD Radeon RX 580 Series
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7B79
System Version: 4.0

ADDITIONAL INFORMATION

syys 17 19:15:47 kaappi kernel: sched: RT throttling activated
syys 17 19:15:51 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
syys 17 19:15:56 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
syys 17 19:16:01 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
syys 17 19:16:06 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
syys 17 19:16:11 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
syys 17 19:16:16 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
syys 17 19:16:21 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug

There is no way to recover from this. The gentlest option is Alt+SysRq+REISUB; otherwise, a hard reset.
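
The excerpt above shows the message repeating at a steady interval. A minimal sketch, assuming journal lines in the short format shown above, that extracts the timestamps and computes the gaps between consecutive timeouts. The helper name `pageflip_intervals` is hypothetical and not part of KWin or systemd:

```python
import re
from datetime import datetime

# Matches an HH:MM:SS timestamp on a line that reports a KWin pageflip timeout.
_TIMEOUT_RE = re.compile(r"(\d{2}:\d{2}:\d{2}).*Pageflip timed out!")


def pageflip_intervals(lines):
    """Return the gaps in seconds between consecutive pageflip timeout messages.

    Only the time of day is parsed, so a log spanning midnight is not handled.
    """
    stamps = []
    for line in lines:
        match = _TIMEOUT_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Lines copied from the log excerpt in the description.
sample = [
    "syys 17 19:15:47 kaappi kernel: sched: RT throttling activated",
    "syys 17 19:15:51 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug",
    "syys 17 19:15:56 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug",
    "syys 17 19:16:01 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug",
]
print(pageflip_intervals(sample))
```

A constant gap suggests the compositor keeps retrying on a fixed timer rather than recovering between attempts, though that reading is an inference from the log, not something the log states.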
Comment 1 Nate Graham 2024-09-17 20:36:42 UTC
If it's a kernel bug, you should report it to the kernel, no?
Comment 2 Lassi Väätämöinen 2024-09-17 20:48:44 UTC
(In reply to Nate Graham from comment #1)
> If it's a kernel bug, you should report it to the kernel, no?

It would be awesome to have some pointers on which part of the kernel this might belong to. DRM? AMD GPU? Other?
Comment 3 Nate Graham 2024-09-17 22:10:32 UTC
CCing some KWin devs who might be able to answer that!
Comment 4 Nate Graham 2024-09-18 18:19:33 UTC
*** Bug 493333 has been marked as a duplicate of this bug. ***
Comment 5 Zamundaaa 2024-09-18 23:29:55 UTC
It should be reported at https://gitlab.freedesktop.org/drm/amd/-/issues, though to make sure we should look at the system log. You can get it for the last boot with `journalctl --system --boot -1 | grep kernel`
Comment 6 Lassi Väätämöinen 2024-09-19 06:40:20 UTC
(In reply to Zamundaaa from comment #5)
> It should be reported at https://gitlab.freedesktop.org/drm/amd/-/issues,
> though to make sure we should look at the system log. You can get it for the
> last boot with `journalctl --system --boot -1 | grep kernel`

OK, the excerpt I included in the description is pretty much everything I can see in dmesg or the journal.
(Also, I think 'grep kernel' here does the same as the '-k' flag for journalctl would do?)
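
On the `grep kernel` versus `-k` question: the two are close but not identical. `journalctl -k` selects messages by their source (the kernel), while `grep kernel` keeps any line containing the word, which includes KWin's own "This is a kernel bug" message. A minimal Python sketch on two lines from the description illustrates the difference. The helper names are hypothetical, and `from_kernel` only approximates `-k` by checking the syslog identifier in the short output format:

```python
def grep_kernel(lines):
    """Emulate `grep kernel`: keep every line containing the substring."""
    return [line for line in lines if "kernel" in line]


def from_kernel(lines):
    """Approximate `journalctl -k`: keep lines whose identifier is `kernel`.

    Assumes the short format `<month> <day> <time> <host> <identifier>: <msg>`.
    """
    kept = []
    for line in lines:
        fields = line.split(maxsplit=4)
        if len(fields) == 5 and fields[4].startswith("kernel:"):
            kept.append(line)
    return kept


# Lines copied from the log excerpt in the description.
sample = [
    "syys 17 19:15:47 kaappi kernel: sched: RT throttling activated",
    "syys 17 19:15:51 kaappi kwin_wayland[2035]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug",
]
print(len(grep_kernel(sample)), len(from_kernel(sample)))
```

The substring filter keeps both lines, while the source-based filter keeps only the line actually emitted by the kernel.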
Comment 7 Zamundaaa 2024-09-19 16:46:57 UTC
If amdgpu doesn't print anything, maybe this is a false positive. When the freeze happens, can you ssh in from another device?
Comment 8 Lassi Väätämöinen 2024-09-19 16:49:04 UTC
(In reply to Zamundaaa from comment #7)
> If amdgpu doesn't print anything, maybe this is a false positive. When the
> freeze happens, can you ssh in from another device?

Perhaps. I was on a conference call the other day and could hear the live audio playing back just fine, but nothing was happening on the graphics side, nor could I switch to another TTY.
Comment 9 Zamundaaa 2024-09-25 15:41:25 UTC
If you can ssh in, then please do that when you experience the hang, and start debugging KWin with
> sudo gdb -p $(pidof kwin_wayland)
and then get the backtrace with
> bt
and attach it here
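
The interactive steps above can also be run non-interactively, which may be easier over a slow SSH session. A minimal sketch, assuming `gdb` and `pidof` are installed and the script runs as root. Both helper names are hypothetical, and `thread apply all bt` is used instead of plain `bt` to capture every thread, which goes beyond what was requested above:

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build a gdb command line that attaches, prints all backtraces, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(process_name="kwin_wayland", out_path="kwin-backtrace.txt"):
    """Attach to the named process and write its backtraces to out_path.

    Needs root (or ptrace permission) and a running process with that name.
    """
    pid = subprocess.check_output(["pidof", process_name], text=True).split()[0]
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_argv(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)
    return out_path


# Show the command that would be run for the PID seen in the description's log.
print(" ".join(gdb_backtrace_argv(2035)))
```

Calling `capture_backtrace()` during the hang would write the output to a file that can be attached to the report.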
Comment 10 Nate Graham 2024-09-30 19:43:51 UTC
*** Bug 493754 has been marked as a duplicate of this bug. ***
Comment 11 Tim D 2024-10-01 15:36:49 UTC
(In reply to Nate Graham from comment #10)
> *** Bug 493754 has been marked as a duplicate of this bug. ***

So that's my bug, but it does seem different from the OP's (I found this report, but it didn't seem 100% identical, hence raising my own).

I don't see it during window dragging, and every other freeze has left no useful logs to report. It's almost exclusively around sleep/wake time for me. This crash, however, resulted in the journalctl log I posted in 493754.

Framework 13 laptop, i5-1240P (Intel iGPU); the issue has only occurred when using an external display.

REISUB does not recover the system for me either. The keyboard is completely locked up, with even the Caps Lock LED frozen in its current state.
Comment 12 Tim D 2024-10-02 17:40:58 UTC
Created attachment 174329
Backtrace

Got a backtrace. Not sure how useful it is for you, as I had to capture it via JuiceSSH on my phone.

The second run had kwin-wayland debuginfo installed, in the hope that it shows more.

The crash was shortly after a resume from sleep, laptop only (no charger, dock, external display or peripherals connected).

It's super rare for me to have a window-move crash, so hopefully this helps.
Comment 13 Tim D 2024-10-02 18:04:34 UTC
(In reply to Tim D from comment #12)
> Created attachment 174329 [details]
> Backtrace
> 
> Got a back trace. Not sure how good it is for you, as I've had to do it via
> juicessh on phone.
> 
> The second run had kwin-wayland debuginfo installed. In the hope it shows
> more.
> 
> Crash was shortly after a resume from sleep, laptop only (no
> charge/dock/external display or peripherals connected)
> 
> Super rare for me to have a window move crash , so hopefully this helps.

To add to this: the laptop was connected to the dock with an external screen when I put it to sleep last night. I disconnected the TB3 cable to the dock while it was asleep and woke the laptop a number of hours later. The first move of a window resulted in the crash.

Regarding comment 11 above, I guess I'm doing the SysRq+REISUB method wrong, and the sshd config did not allow external connections, so when I tried in the past I was unable to log in. I fixed the SSH listener last night, just in time to get the required backtrace!
Comment 14 Zamundaaa 2024-10-02 22:37:22 UTC
It's very helpful. The backtrace suggests that KWin hangs in the kernel, trying to allocate a new buffer for some kwin-internal window.  I assume that's a kernel bug; could you please report it at https://gitlab.freedesktop.org/drm/i915/kernel/-/issues?

Lassi Väätämöinen, I see you're on AMD, so your issue is likely triggered by something else. Please get a backtrace as well.
Comment 15 Tim D 2024-10-06 19:09:45 UTC
(In reply to Zamundaaa from comment #14)
> It's very helpful. The backtrace suggests that KWin hangs in the kernel,
> trying to allocate a new buffer for some kwin-internal window.  I assume
> that's a kernel bug; could you please report it at
> https://gitlab.freedesktop.org/drm/i915/kernel/-/issues?
> 

Report raised (for Intel hardware)! Thanks for taking a look; I'd be lost without the help!
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12341
Comment 16 postix 2024-10-17 12:24:58 UTC
I guess https://invent.kde.org/plasma/kwin/-/merge_requests/6650 applies to this bug? :)
Comment 17 postix 2024-10-17 21:30:17 UTC
I just experienced the same observed result as the original report, but the steps to reproduce here don't apply for me; it just happens randomly.
Last time it happened while watching a YouTube video in a maximized Firefox 131 window, for instance.

Switching TTYs also no longer worked: no input was accepted and I had to hard reset the laptop.


Operating System: Fedora Linux 40
KDE Plasma Version: 6.2.0
KDE Frameworks Version: 6.7.0
Qt Version: 6.7.2
Kernel Version: 6.11.3-200.fc40.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 PRO 5850U with Radeon Graphics
Comment 18 Zamundaaa 2024-10-17 21:43:09 UTC
(In reply to postix from comment #16)
> I guess https://invent.kde.org/plasma/kwin/-/merge_requests/6650 applies for
> this bug? :)
It'll help for GPU hotunplugs, not for actual pageflip timeouts that happen for other reasons. Those are kernel bugs that compositors can't recover from.

(In reply to postix from comment #17)
> Also switching TTYs hasn't worked anymore: No more input was accepted and I
> had to hard reset the laptop.
That can mean either that KWin's main thread hung, or that there is a kernel bug. You can find out which it is by looking at the kernel logs and the backtrace for KWin.
Comment 19 postix 2024-10-18 18:36:24 UTC
Created attachment 174997
Full backtrace with AMD gpu and Plasma 6.2.1 and Kernel 6.11.3

Regarding comment 17:

dmesg
```
[12299.273568] amdgpu 0000:06:00.0: [drm] Mode Validation Warning: Unknown Status failed validation.
[12309.482092] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] flip_done timed out
[12312.044146] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
```

journalctl
```
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
20:13:39 kwin_wayland[2663]: Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)
20:13:39 kwin_wayland_wrapper[2767]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
(...)
20:17:01 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:07 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:12 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:17 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
20:17:22 kwin_wayland[2663]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
(...)
```

gdb bt
```
#0  0x00007ff295f1cdb0 in __GI_ppoll (fds=fds@entry=0x55b43085d9b0, nfds=nfds@entry=13, timeout=<optimized out>, timeout@entry=0x7ffcd3ecba30, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x00007ff296751044 in ppoll (__fds=<optimized out>, __nfds=<optimized out>, __timeout=<optimized out>, __ss=<optimized out>) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:100
#2  qt_ppoll (fds=0x55b43085d9b0, nfds=13, timeout_ts=0x7ffcd3ecba30) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:103
#3  qt_ppoll (fds=0x55b43085d9b0, nfds=13, timeout_ts=0x7ffcd3ecba30) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:100
#4  qt_safe_poll (fds=0x55b43085d9b0, nfds=nfds@entry=13, deadline=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/kernel/qcore_unix.cpp:135
#5  0x00007ff296756c8e in QEventDispatcherUNIX::processEvents (this=<optimized out>, flags=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/tools/qarraydatapointer.h:119
#6  0x00007ff297363492 in QUnixEventDispatcherQPA::processEvents (this=<optimized out>, flags=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/gui/platform/unix/qunixeventdispatcher.cpp:27
#7  0x00007ff2965a3bc3 in QEventLoop::exec (this=this@entry=0x7ffcd3ecbc00, flags=..., flags@entry=...) at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/global/qflags.h:34
#8  0x00007ff29659fa7c in QCoreApplication::exec () at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/corelib/global/qflags.h:74
#9  0x00007ff296dd66ed in QGuiApplication::exec () at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/gui/kernel/qguiapplication.cpp:1926
#10 0x00007ff29798b189 in QApplication::exec () at /usr/src/debug/qt6-qtbase-6.7.2-6.fc40.x86_64/src/widgets/kernel/qapplication.cpp:2555
#11 0x000055b41387f8de in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/kwin-6.2.0-2.fc40.x86_64/src/main_wayland.cpp:634
```
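
A backtrace like the one above can be summarised by pulling out the function name of each frame. A minimal sketch, assuming gdb's default `#N  [address in] function (args)` frame layout; `frame_functions` is a hypothetical helper, and the sample below is an abbreviated copy of four frames from the trace. In this trace the innermost frames are `ppoll` calls reached from `QEventDispatcherUNIX::processEvents`, which suggests the thread shown was waiting in its event loop when gdb attached:

```python
import re

# "#5  0x00007ff2... in Func::name (args...)" or "#4  func (args...)"
_FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([^\s(]+) \(")


def frame_functions(backtrace):
    """Return (frame_number, function_name) pairs parsed from gdb `bt` output."""
    frames = []
    for line in backtrace.splitlines():
        match = _FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


# Abbreviated frames from the backtrace above (arguments and paths shortened).
sample = """\
#0  0x00007ff295f1cdb0 in __GI_ppoll (fds=fds@entry=0x55b43085d9b0, nfds=nfds@entry=13) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#4  qt_safe_poll (fds=0x55b43085d9b0, nfds=nfds@entry=13, deadline=...) at qcore_unix.cpp:135
#5  0x00007ff296756c8e in QEventDispatcherUNIX::processEvents (this=<optimized out>, flags=...) at qarraydatapointer.h:119
#11 0x000055b41387f8de in main (argc=<optimized out>, argv=<optimized out>) at main_wayland.cpp:634
"""
for number, name in frame_functions(sample):
    print(number, name)
```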
Comment 20 postix 2024-10-18 18:47:10 UTC
The dmesg output gained more lines shortly afterwards:

```
[12309.482092] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] flip_done timed out
[12312.044146] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
[13742.052733] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out
[13742.052747] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] commit wait timed out
[13752.292701] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out
[13752.292715] amdgpu 0000:06:00.0: [drm] *ERROR* [PLANE:58:plane-3] commit wait timed out
[13762.532627] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out
[13762.532641] amdgpu 0000:06:00.0: [drm] *ERROR* [PLANE:70:plane-5] commit wait timed out
```
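
The kernel messages above carry a monotonic timestamp and, in most cases, the DRM object that timed out. A minimal sketch, assuming the `[seconds] message` dmesg layout shown above; `parse_drm_timeouts` is a hypothetical helper that extracts the timestamp and any `[CRTC:...]` or `[PLANE:...]` tag from lines mentioning a timeout:

```python
import re

# "[13742.052733] <message containing 'timed out'>"
_LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*timed out.*)$")
# "[CRTC:73:crtc-0]" or "[PLANE:58:plane-3]"
_OBJECT_RE = re.compile(r"\[((?:CRTC|PLANE):\d+:[\w-]+)\]")


def parse_drm_timeouts(lines):
    """Return (timestamp_seconds, drm_object_or_None) for each timeout line."""
    events = []
    for line in lines:
        match = _LINE_RE.match(line)
        if not match:
            continue
        obj = _OBJECT_RE.search(match.group(2))
        events.append((float(match.group(1)), obj.group(1) if obj else None))
    return events


# Lines copied from the dmesg output above.
sample = [
    "[13742.052733] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out",
    "[13742.052747] amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] commit wait timed out",
    "[13752.292701] amdgpu 0000:06:00.0: [drm] *ERROR* flip_done timed out",
    "[13752.292715] amdgpu 0000:06:00.0: [drm] *ERROR* [PLANE:58:plane-3] commit wait timed out",
]
events = parse_drm_timeouts(sample)
for stamp, obj in events:
    print(stamp, obj)
```

Grouping the events by object makes it easier to state in an upstream report which CRTC or plane the timeouts affected.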
Comment 21 postix 2024-10-19 10:34:09 UTC
Regarding freezes on AMD GPUs and pageflip timeouts, see also:
> [amdgpu]: random freezes with flip_done timed out on kernel 6.6.0
* https://gitlab.freedesktop.org/drm/amd/-/issues/2950
Comment 22 Zamundaaa 2024-10-23 12:38:41 UTC
> *ERROR* flip_done timed out
Yep, this is a kernel bug. Please add your info to the amd issue, or make a new one in that repository
Comment 23 postix 2024-11-08 16:10:16 UTC
(In reply to Zamundaaa from comment #22)
> > *ERROR* flip_done timed out
> Yep, this is a kernel bug. Please add your info to the amd issue, or make a
> new one in that repository

Can no longer reproduce it with Kernel 6.11.6 and Plasma 6.2.3. :)