Bug 493157 - Screen stuck flipping between frames
Summary: Screen stuck flipping between frames
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: 6.1.90
Platform: Other Linux
Importance: NOR crash
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-15 12:34 UTC by Jonathan L Hanmann
Modified: 2024-10-23 14:57 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Requested journalctl output (8.98 KB, text/plain)
2024-09-17 06:56 UTC, Jonathan L Hanmann
Additional log from another freeze (19.21 KB, text/plain)
2024-09-17 16:13 UTC, Jonathan L Hanmann
Wayland Crash/Lockup Journal Output (11.96 KB, text/x-log)
2024-10-04 17:09 UTC, Jonathan L Hanmann
X11 Crash/Lockup journalctl output (24.24 KB, text/x-log)
2024-10-04 17:12 UTC, Jonathan L Hanmann

Description Jonathan L Hanmann 2024-09-15 12:34:44 UTC
SUMMARY

The screen appears to be stuck while the system continues to operate in the background. Since screen drawing is dead/locked, the continued operation of the applications is not visible. I can switch to other system consoles (text consoles) using Ctrl-Alt-F3-Fx and they are still functional. I can kill the Plasma session by finding the started session and killing it. That immediately drops me back to the SDDM login screen. I can log in to another session and it will work for a while until the problem recurs. This only happens with Wayland. X11 sessions appear to work without ever locking up.

Applications running when the screen freeze occurs run normally to completion if I wait for them. I simply can't see what is going on.

I thought this might be due to the new triple buffering feature, so I tried disabling it by putting KWIN_DRM_DISABLE_TRIPLE_BUFFERING=1 in /etc/environment. That appeared to have no effect on the problem.
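One way to rule out the variable simply not reaching the compositor is to look at the environment of the running process. The following is a minimal sketch, not part of KWin: the helper names are hypothetical, and it assumes a Linux system where the compositor's process name is `kwin_wayland` and where `/proc/<pid>/environ` is readable by the current user.

```python
import os


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE block found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_pids(comm: str) -> list:
    """Return PIDs whose /proc/<pid>/comm equals `comm` (empty list without /proc)."""
    if not os.path.isdir("/proc"):
        return []
    pids = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/comm") as fh:
                if fh.read().strip() == comm:
                    pids.append(int(name))
        except OSError:
            continue  # process exited or is not readable
    return pids


def triple_buffering_disabled(env: dict) -> bool:
    """True if the variable from this report is set to "1" in `env`."""
    return env.get("KWIN_DRM_DISABLE_TRIPLE_BUFFERING") == "1"


if __name__ == "__main__":
    # Assumed process name; adjust if the compositor runs under another name.
    for pid in find_pids("kwin_wayland"):
        try:
            with open(f"/proc/{pid}/environ", "rb") as fh:
                env = parse_environ(fh.read())
        except OSError:
            continue  # process exited or its environment is not readable
        print(pid, triple_buffering_disabled(env))
```

If the check prints `False` for the running compositor, the setting never took effect in that session, so the test says nothing about triple buffering either way.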

Time to failure is somewhat random. I can sometimes operate for hours, while other times the freeze occurs within minutes. I've waited for the system to recover, but after several minutes it remains in the same non-functional state.

No messages appear in the systemd journal or the dmesg log when the failure occurs.

No triggering cause seems apparent. My activities when the screen freeze occurs are random.

OBSERVED RESULT

Screen freezes. Sometimes it appears to flip between two (or perhaps a few more) frames rather than being just frozen on a single frame.

EXPECTED RESULT


SOFTWARE/OS VERSIONS
Windows: 
macOS: 
(available in the Info Center app, or by running `kinfo` in a terminal window)
Linux/KDE Plasma:  Kernel for Rock-5b (RK3588) 6.1.75 Armbian Vendor Kernel
KDE Plasma Version: 6.2 Beta (Latest from GIT as of 6/13/2024)
KDE Frameworks Version: 6.7.0 (Latest from GIT as of 6/13/2024)
Qt Version: 6.7.2

ADDITIONAL INFORMATION
Comment 1 Zamundaaa 2024-09-16 19:48:23 UTC
Please attach the output of
> journalctl --user-unit plasma-kwin_wayland --boot 0
after the problem happens
Comment 2 Jonathan L Hanmann 2024-09-17 06:56:22 UTC
Created attachment 173753 [details]
Requested journalctl output

Took a fairly long time to reproduce this time of course. Ran for several hours before freezing up.
Comment 3 Jonathan L Hanmann 2024-09-17 16:13:49 UTC
Created attachment 173776 [details]
Additional log from another freeze

The freeze occurred around Sep 17, 08:45-08:46.
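To narrow a saved journal attachment down to the minutes around a freeze, a small filter like the one below may help. It is only a sketch with hypothetical helper names. It assumes the default journal line format seen in the messages quoted in this report (`Mon DD HH:MM:SS host ...`), an English/C locale for month names, and a year supplied by the caller, since that format does not record one. When querying the journal directly, `journalctl` also accepts `--since` and `--until` for the same purpose.

```python
from datetime import datetime


def parse_stamp(line: str, year: int):
    """Parse the leading 'Mon DD HH:MM:SS' of a journal line; None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_between(lines, start: datetime, end: datetime, year: int):
    """Return the lines whose timestamp falls within [start, end].

    Lines without a parsable timestamp (for example continuation lines)
    are dropped.
    """
    kept = []
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is not None and start <= stamp <= end:
            kept.append(line)
    return kept
```

For the freeze mentioned above, the window would be `datetime(2024, 9, 17, 8, 44)` to `datetime(2024, 9, 17, 8, 47)` with `year=2024`, leaving a minute of margin on either side.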
Comment 4 Zamundaaa 2024-09-24 12:48:47 UTC
Okay, there are some suspicious lines in there, but nothing clearly pointing at a specific problem.
When the freeze happens again, could you get the output of
> sudo dmesg
and attach the output of that here?
Comment 5 Jonathan L Hanmann 2024-10-04 17:09:04 UTC
Created attachment 174429 [details]
Wayland Crash/Lockup Journal Output

This journalctl output is from a Plasma 6 Wayland session that locked up. There is a Panthor crash right at the time of the Plasma screen issue. I stripped out all the extraneous log information and narrowed it down to the relevant time window and the events likely related to the problem.

I expect you're going to say this is a kernel/Mesa problem, and I probably couldn't disagree based on this information. Even if Plasma was somehow doing something wrong from a GL perspective, it shouldn't cause this. I'll leave that up to your expertise though, of course.
Comment 6 Jonathan L Hanmann 2024-10-04 17:12:11 UTC
Created attachment 174430 [details]
X11 Crash/Lockup journalctl output

This is likely the same problem occurring on an X11 session. I don't know if this will contribute anything beyond what I provided for the Wayland session, but here it is anyway.
Comment 7 Jonathan L Hanmann 2024-10-07 09:57:27 UTC
As some additional information, I am using Armbian kernel builds. I reverted from kernel version 6.1.75 back to 6.1.43, and it doesn't appear to have the same screen lockup issue on X11 so far. I've run for about three days now and it seems stable.

It is a little harder for me to test Wayland since I am having some other issues that appear unique to it.

I'll continue testing and investigating and let you know if I find anything out.

I still get a lot of these messages, but they don't appear to cause anything fatal. They may not be related to this particular issue, but just FYI.

Oct 07 02:46:51 rock-5b-3 kernel: [drm:vop2_plane_atomic_check] *ERROR* Invalid size: 64x3->64x3, min size is 4x4

or

Oct 07 02:46:51 rock-5b-3 kernel: [drm:vop2_plane_atomic_check] *ERROR* Invalid size: 64x2->64x2, min size is 4x4
Comment 8 Jonathan L Hanmann 2024-10-08 23:47:35 UTC
Another 2 days and still no crashes or other visible issues on X11 with the 6.1.43 series kernel. Something must have changed in later kernel versions. Wayland still doesn't work well, and Mesa overall is slower (glmark) than it should be, even on X11. This seems to be something with my system, but I haven't isolated the cause of the performance degradation yet. Different issue anyway...
Comment 9 Zamundaaa 2024-10-23 14:57:53 UTC
Yeah the plane size warning is weird - KWin never uses planes with a height that small - but the actual source of the problem seems to be
> [drm] *ERROR* Unhandled Page fault in AS1 at VA 0x00007FFFF6410000
which is definitely a kernel bug.
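The two kernel messages discussed in this report can be told apart mechanically when scanning a long log. The sketch below is illustrative only: the regular expressions are copied from the lines quoted above, the category labels (`gpu_page_fault`, `plane_size_rejected`) are hypothetical names, and other kernel versions may word these messages differently.

```python
import re

# Plane size message as quoted in comment 7.
PLANE_SIZE = re.compile(
    r"\[drm:vop2_plane_atomic_check\] \*ERROR\* Invalid size: "
    r"(\d+)x(\d+)->(\d+)x(\d+), min size is (\d+)x(\d+)"
)
# Page fault message as quoted in comment 9.
PAGE_FAULT = re.compile(
    r"\[drm\] \*ERROR\* Unhandled Page fault in AS(\d+) at VA (0x[0-9A-Fa-f]+)"
)


def classify(line: str):
    """Return (label, details) for a recognized kernel line, else None."""
    m = PAGE_FAULT.search(line)
    if m:
        return ("gpu_page_fault", {
            "address_space": int(m.group(1)),
            "va": int(m.group(2), 16),
        })
    m = PLANE_SIZE.search(line)
    if m:
        src_w, src_h, dst_w, dst_h, min_w, min_h = map(int, m.groups())
        return ("plane_size_rejected", {
            "src": (src_w, src_h),
            "dst": (dst_w, dst_h),
            "min": (min_w, min_h),
        })
    return None
```

Counting the labels over a full `dmesg` capture would show whether page faults line up with the freezes while the plane size messages appear independently, as the comments above suggest.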