Summary: | kwin_wayland crashed sometimes in KWin::DrmGpu::presentationClock after Plasma dimmed/went black automatically and was resumed | ||
---|---|---|---|
Product: | [Plasma] kwin | Reporter: | Matt Fagnani <matt.fagnani> |
Component: | generic-crash | Assignee: | KWin default assignee <kwin-bugs-null> |
Status: | RESOLVED DUPLICATE | ||
Severity: | crash | CC: | kde, kdedev, postix |
Priority: | NOR | Keywords: | drkonqi |
Version First Reported In: | 6.1.90 | ||
Target Milestone: | --- | ||
Platform: | Fedora RPMs | ||
OS: | Linux | ||
See Also: | https://bugs.kde.org/show_bug.cgi?id=493277 | ||
Latest Commit: | | Version Fixed In: | |
Sentry Crash Report: | https://crash-reports.kde.org/organizations/kde/issues/53755 |
Description
Matt Fagnani
2024-10-03 15:58:48 UTC
coredumpctl gdb showed that in KWin::DrmGpu::presentationClock the pointers this and m_presentationClock appeared to be invalid or corrupted since dereferencing them resulted in errors like "Cannot access memory at address 0xcb71386c291b0381" and similarly in KWin::DrmGpu::pageFlipHandler for the pointer gpu in frame 5. The pageflip timeouts might've been related to such problems.

Core was generated by `/usr/bin/kwin_wayland --wayland-fd 7 --socket wayland-0 --xwayland-fd 8 --xwayl'.
Program terminated with signal SIGSEGV, Segmentation fault.
--Type <RET> for more, q to quit, c to continue without paging--c
#0 0x00007f9dd7280944 in __pthread_kill_implementation () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f9dd9b8bb80 (LWP 2147))]
(gdb) bt
#0 0x00007f9dd7280944 in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007f9dd722825e in raise () from /lib64/libc.so.6
#2 0x00007f9ddaeaa1a2 in KCrash::defaultCrashHandler(int) () from /lib64/libKF6Crash.so.6
#3 <signal handler called>
#4 KWin::DrmGpu::presentationClock (this=0xcb71386c291b0381) at /usr/src/debug/kwin-6.1.90-4.fc42.x86_64/src/backends/drm/drm_gpu.cpp:131
#5 KWin::DrmGpu::pageFlipHandler (fd=22, sequence=0, sec=2683, usec=631215, crtc_id=36, user_data=0x7f9d9800a470) at /usr/src/debug/kwin-6.1.90-4.fc42.x86_64/src/backends/drm/drm_gpu.cpp:566
#6 0x00007f9dd7739580 in drmHandleEvent (fd=22, evctx=0x7fffbef87cc0) at ../xf86drmMode.c:1070
#7 0x00007f9dda8f0b8c in KWin::DrmGpu::dispatchEvents (this=<optimized out>) at /usr/src/debug/kwin-6.1.90-4.fc42.x86_64/src/backends/drm/drm_gpu.cpp:581
#8 0x00007f9dd794c872 in void doActivate<false>(QObject*, int, void**) () from /lib64/libQt6Core.so.6
#9 0x00007f9dd795a44d in QSocketNotifier::activated(QSocketDescriptor, QSocketNotifier::Type, QSocketNotifier::QPrivateSignal) () from /lib64/libQt6Core.so.6
#10 0x00007f9dd795ac5b in QSocketNotifier::event(QEvent*) () from /lib64/libQt6Core.so.6
#11 0x00007f9dd8c3d218 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /lib64/libQt6Widgets.so.6
#12 0x00007f9dd78e6e08 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /lib64/libQt6Core.so.6
#13 0x00007f9dd7aa68c6 in QEventDispatcherUNIXPrivate::activateSocketNotifiers() () from /lib64/libQt6Core.so.6
#14 0x00007f9dd7aa71d4 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Core.so.6
#15 0x00007f9dd8656492 in QUnixEventDispatcherQPA::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Gui.so.6
#16 0x00007f9dd78f3b43 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /lib64/libQt6Core.so.6
#17 0x00007f9dd78ef9fc in QCoreApplication::exec() () from /lib64/libQt6Core.so.6
#18 0x0000564cb5f588de in main ()
(gdb) frame 4
#4 KWin::DrmGpu::presentationClock (this=0xcb71386c291b0381) at /usr/src/debug/kwin-6.1.90-4.fc42.x86_64/src/backends/drm/drm_gpu.cpp:131
131 return m_presentationClock;
(gdb) p this
$1 = (const KWin::DrmGpu * const) 0xcb71386c291b0381
(gdb) p *this
Cannot access memory at address 0xcb71386c291b0381
(gdb) p m_presentationClock
Cannot access memory at address 0xcb71386c291b03ad
(gdb) frame 5
#5 KWin::DrmGpu::pageFlipHandler (fd=22, sequence=0, sec=2683, usec=631215, crtc_id=36, user_data=0x7f9d9800a470) at /usr/src/debug/kwin-6.1.90-4.fc42.x86_64/src/backends/drm/drm_gpu.cpp:566
566 std::chrono::nanoseconds timestamp = convertTimestamp(gpu->presentationClock(), CLOCK_MONOTONIC,
(gdb) p gpu
$3 = (KWin::DrmGpu * const) 0xcb71386c291b0381
(gdb) p *gpu
Cannot access memory at address 0xcb71386c291b0381

The idle state I mentioned involved a dimming or shutting off of the VM screen with a lock screen possibly briefly shown. In Power Management in System Settings, the Dim automatically setting was 5 minutes and Turn off screen was 10 minutes by default. The VM's screen was dimmer than usual after kwin crashed.

Does it happen often enough?
(In reply to Vlad Zahorodnii from comment #2)
> Does it happen often enough?

I saw this problem about 1 in 4 times that I let the VM's screen go black by leaving it idle. The problem might involve a race condition. The gpu pointer in KWin::DrmGpu::pageFlipHandler might've been corrupted or freed due to the pageflip timeouts and then used in KWin::DrmGpu::presentationClock. Should I report this problem to Kernel Bugzilla or some other kernel mailing list? Thanks.

I had set Dim automatically to 30 seconds and Turn off screen to 1 minute in Power Management in System Settings before the times when the problem didn't happen. Lock screen automatically was 5 minutes, the default, in Screen Locking in System Settings.

I reproduced the crash with Plasma 6.2.0 by setting Dim automatically to 30 seconds and Turn off screen to 1 minute in Power Management in System Settings and Lock screen automatically to 1 minute, and leaving the VM idle for a minute. The VM screen shut off, I moved the mouse and the lock screen was shown for about a second, then Plasma appeared with drkonqi showing the kwin crash with the same kind of trace. So the lock screen might need to be shown after the screen shut off for the problem to happen.

Okay, I will try to reproduce it by following those steps.

I can't reproduce the crash, unfortunately.

From the sentry bug report, it's interesting to see "kwin_wayland_drm: atomic commit failed: Invalid argument" in the logs.

(In reply to Vlad Zahorodnii from comment #7)
> From the sentry bug report, it's interesting to see "kwin_wayland_drm:
> atomic commit failed: Invalid argument" in the logs.

I reproduced the problem again with Plasma 6.2.0 with Lock screen set to 1 minute and Shut off screen set to 1 minute and submitted it with drkonqi as an automatic report, which is what you might have referred to. The journal in the 1-2 minutes before the four crashes of this type showed "kwin_wayland_wrapper[3539]: waiting got error - 16, slow gpu or hang?" then "kwin_wayland_drm: Pageflip timed out! This is a kernel bug" repeatedly, with "kwin_wayland_drm: No drm events for gpu "/dev/dri/card1" within last 30 seconds" once. "kwin_wayland_drm: atomic commit failed: Invalid argument" was the last error before the crash in that automatic report I made, but it wasn't shown before the other three crashes. The pageflip timeouts and gpu errors might need to happen enough times when the screen turned off and on and the lock screen appeared. Then the gpu pointer in KWin::DrmGpu::pageFlipHandler might've been invalid so that when it was dereferenced by std::chrono::nanoseconds timestamp = convertTimestamp(gpu->presentationClock(), CLOCK_MONOTONIC, ... in it, the crash happened.

The lock screen sometimes wasn't shown after the screen shut off and turned on again when I set Lock screen to 1 minute and Shut off screen to 1 minute, so reproducing the problem might be more likely if one sets the lock screen time to be shorter than the Shut off screen time, as is the default, so that the lock screen definitely has time to start first. So I set Lock screen to 1 minute and Shut off screen to 2 minutes, and reproduced the problem as before. I submitted an automatic report for that crash also. I didn't see this crash in VMs with 3D acceleration disabled using the llvmpipe mesa driver and virtio-gpu kernel driver, or on bare metal with the radeonsi mesa driver and amdgpu kernel driver. So the problem might be specific to the virgl and virtio-gpu combination. Thanks.

I've enabled hardware acceleration too. I see no kwin warnings at all when the screen is turned off.

I do notice that the screen is not turned off completely though, i.e. sometimes it's dim after 2 minutes.

(In reply to Vlad Zahorodnii from comment #10)
> I do notice that the screen is not turned off completely though, i.e.
> sometimes it's dim after 2 minutes

I saw the kwin crashes about 30-50% of the time after the VM screen turned off and there were messages like "Display not connected" on a black screen in GNOME Boxes, then I moved the mouse over the VM screen and the lock screen was briefly shown and Plasma was shown with drkonqi handling the kwin crash. If the VM's screen didn't shut off completely like that, the problem didn't usually happen. I reproduced the crash again and "kwin_wayland_drm: atomic commit failed: Invalid argument" was the last error before the crash. I'll submit the automatic report. There were lock screen journal errors like "kscreenlocker_greet[4594]: Failed to write to the pipe: Bad file descriptor." at times. The screen incorrectly remained dim after the kwin crashes. Given that the kwin and gpu errors/warnings in the journal started appearing 1-2 minutes before the crashes, they might be related to the automatic dimming, which I had set to 30 seconds for the last 3 kwin crashes I saw. Sometimes the kwin and gpu journal errors were shown, but kwin didn't crash. I think the virgl driver passes OpenGL calls to the GPU, so there might be differences based on the GPU and its drivers. My system has an AMD A10-9620P CPU with integrated Radeon R5 GPU made in 2017. Thanks.

The VM's screen might've been more likely to shut off automatically when I minimized GNOME Boxes, then left it idle at least for the Turn off screen time in System Settings, maximized it again, and moved the cursor over its screen. The kwin crash might be more frequent when doing that.

I tested on git-master Wayland using the settings in comment 4:
Dim automatically to 30 seconds
Turn off screen to 1 minute in Power Management in System Settings
Lock screen automatically to 1 minute
and leaving the system idle for a minute. I let the screen dim and then shut off quite a few times over several hours but I wasn't able to reproduce the crash.

*** This bug has been marked as a duplicate of bug 496015 ***
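
Note on the hypothesis discussed in this report: in frame 5 the gpu value (0xcb71386c291b0381) differs from user_data (0x7f9d9800a470), which would be consistent with gpu having been read out of an object that user_data points to after that object was freed or overwritten. The report does not confirm this. The sketch below only illustrates that general hazard and one possible guard, namely checking an opaque callback pointer against a registry of still-pending objects before dereferencing it. Every name in it (FakeGpu, FakeCommit, PendingCommits, handlePageFlip) is hypothetical and is not KWin or libdrm code.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-ins for illustration only; NOT KWin's real classes.
struct FakeGpu {
    int presentationClock; // placeholder clock id
};

struct FakeCommit {
    FakeGpu *gpu; // read by the handler, so it is garbage if the commit is stale
};

// Registry of commits that are still waiting for their page-flip event.
class PendingCommits {
public:
    void add(FakeCommit *commit) { m_pending.insert(commit); }
    void remove(FakeCommit *commit) { m_pending.erase(commit); }

    // Only compares the address; never dereferences userData.
    bool contains(void *userData) const {
        return m_pending.count(static_cast<FakeCommit *>(userData)) != 0;
    }

private:
    std::unordered_set<FakeCommit *> m_pending;
};

// Shaped like an event callback that receives an opaque user_data pointer.
// Returns false for an event whose commit is no longer pending, instead of
// dereferencing a pointer that may be stale. On success, writes the clock id.
bool handlePageFlip(const PendingCommits &pending, void *userData, int *clockOut)
{
    assert(clockOut != nullptr);
    if (!pending.contains(userData)) {
        return false; // late or unknown event: ignore it
    }
    FakeCommit *commit = static_cast<FakeCommit *>(userData);
    *clockOut = commit->gpu->presentationClock;
    return true;
}
```

With this arrangement, a commit that has been dropped (for example after a timeout) is removed from the registry first, so an event arriving later for the same user_data is ignored rather than dereferenced. The sketch does not handle an address being reused by a newer commit, which a real implementation would also need to consider.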