Summary: | amdgpu: GPU reset crash loop | ||
---|---|---|---|
Product: | [Plasma] kwin | Reporter: | Matteo De Carlo <matteo.dek> |
Component: | core | Assignee: | KWin default assignee <kwin-bugs-null> |
Status: | RESOLVED UPSTREAM | ||
Severity: | crash | CC: | agurenko, contact, ennokoester, etfaker, firlaevhans.fiete, hasezoey, kde, maxicarlos08, nate, reuben_p, scheiter, star579avatar, xaver.hugl |
Priority: | NOR | ||
Version: | 5.24.4 | ||
Target Milestone: | --- | ||
Platform: | Arch Linux | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Attachments: | dmesg output during boot; dmesg output after GPU reset; gpu reset and kwin_wayland triggering another; systemd journal from startup to hardreset with GPU reset and kwin_wayland loop; dmesg from startup to hardreset with GPU reset and kwin_wayland loop | ||
Description
Matteo De Carlo
2022-04-28 14:56:56 UTC
> Reset the gpu (accidentally, can I manually reset it to test?)
You can do it intentionally with
sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
Does the reset loop also happen if you do that?
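To answer that question from a saved log instead of by eye, one option is to count how often KWin prints its reset warning (the message `kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.` quoted further down in this report). This is a minimal sketch that assumes the journal was saved as plain text, for example with `journalctl -b > journal.txt`. The helper names and the default threshold of 5 are hypothetical choices for illustration, not part of KWin or any existing tool.

```shell
#!/usr/bin/env bash
# Sketch: detect KWin's GPU-reset loop in a plain-text log.
# Assumption: the log was saved with something like `journalctl -b > journal.txt`.
# count_kwin_reset_warnings and looks_like_reset_loop are hypothetical helpers.

# The warning KWin prints after a reset it did not cause (copied from this report).
KWIN_RESET_MSG='A graphics reset not attributable to the current GL context occurred'

# Print how many lines of the given log contain the warning.
count_kwin_reset_warnings() {
  local log_file="$1"
  # -c counts matching lines, -F matches the message as a fixed string.
  # grep exits non-zero when nothing matches (still printing 0), so ignore the status.
  grep -cF "$KWIN_RESET_MSG" "$log_file" || true
}

# Succeed (exit status 0) if the warning appears at least THRESHOLD times.
looks_like_reset_loop() {
  local log_file="$1"
  local threshold="${2:-5}"   # arbitrary default cutoff, chosen for illustration
  local count
  count=$(count_kwin_reset_warnings "$log_file")
  [ "${count:-0}" -ge "$threshold" ]
}
```

For example, `looks_like_reset_loop journal.txt && echo "KWin appears stuck in the reset loop"`. A handful of occurrences would suggest a one-off recovery attempt, while a count that keeps growing matches the loop described in this report.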
Created attachment 153517
dmesg output during boot
Added in the unlikely case that something interesting happens before the manual GPU reset.
Created attachment 153518
dmesg output after GPU reset
Thanks for the tip. I have the same problem occasionally and cannot pin it to any particular event yet. Yes, I can reproduce the problem that way. Logs attached.
kwin version: 5.26.2.1-4
kernel version: 6.0.6.arch1-1
Let me know if I can provide anything else. For me this started after I switched from X to Wayland 1-2 months ago. That could be a red herring, though, as I also update my packages semi-regularly every few weeks.

*** Bug 459872 has been marked as a duplicate of this bug. ***

I also have a GPU hang/reset similar to 459872 (possibly a kernel or hardware issue, https://gitlab.freedesktop.org/drm/amd/-/issues/2068). KWin cannot recover from the reset and keeps printing the following warning, followed by some graphics information:
> kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.

My problems went away in December; it does not happen to me anymore, so I am fine. But for completeness' sake I tried to trigger the problematic behavior manually again with the gpu_recover command mentioned above, and sure enough KWin spun out of control again. So if any logs or similar are needed, it looks like I can still provide them pretty easily.

*** Bug 465514 has been marked as a duplicate of this bug. ***

I've recently been getting similar KWin hangs with an Intel iGPU whenever it resets (very frequently):
kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:87b2bef9, in plasmashell [2627]
kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
kernel: i915 0000:00:02.0: [drm] plasmashell[2627] context reset due to GPU hang
kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.bin version 70.5.1
kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
kernel: i915 0000:00:02.0: [drm] HuC authenticated
kernel: i915 0000:00:02.0: [drm] GuC submission enabled
kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
kwin_wayland[2443]: kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.
kwin_wayland[2443]: OpenGL vendor string: Intel
kwin_wayland[2443]: OpenGL renderer string: Mesa Intel(R) Graphics (ADL GT2)
kwin_wayland[2443]: OpenGL version string: 4.6 (Core Profile) Mesa 22.3.6
kwin_wayland[2443]: OpenGL shading language version string: 4.60
kwin_wayland[2443]: Driver: Intel
kwin_wayland[2443]: GPU class: Unknown
kwin_wayland[2443]: OpenGL version: 4.6
kwin_wayland[2443]: GLSL version: 4.60
kwin_wayland[2443]: Mesa version: 22.3.6
kwin_wayland[2443]: X server version: 1.22.1
kwin_wayland[2443]: Linux kernel version: 6.1.15
kwin_wayland[2443]: Requires strict binding: no
kwin_wayland[2443]: GLSL shaders: yes
kwin_wayland[2443]: Texture NPOT support: yes
kwin_wayland[2443]: Virtual Machine: no
... and then everything starting from the first "kwin_wayland" line is repeated infinitely.

Created attachment 157727
gpu reset and kwin_wayland triggering another
I occasionally also have a GPU reset that KDE is unable to recover from, but this time the recovery of kwin_wayland triggered another GPU reset. After the second reset, all KWin display processes seem to get "stuck": they no longer try to recover or update anything, but they still exist and play audio just fine (for example, the bell on shutdown).
Note: during shutdown, the terminal that briefly shows messages before reboot (such as "Watchdog timeout") displayed fine.
PS: after the GPU reset, all displays quickly recovered to show something of an image (the last frame that was rendered), but with heavily messed-up colors.
Displays:
- 1 HDMI 1080p
- 1 DP 1080p
- 1 DP 1440p MAIN
System information:
Operating System: Manjaro Linux
KDE Plasma Version: 5.26.5
KDE Frameworks Version: 5.103.0
Qt Version: 5.15.8
Kernel Version: 6.2.7-2-MANJARO (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800X 8-Core Processor
Memory: 15.5 GiB of RAM
Graphics Processor: AMD Radeon RX Vega
Manufacturer: ASUS
GPU: AMD Vega 64
Attached is the log from before the reset, the reset(s) themselves, and slightly after the last reset.
I think I am having the same issue: when running Vulkan graphics, sooner or later my GPU resets and the loop begins. While it resets, the screen just shows the last state before the freeze and keeps turning on and off.
System information:
Operating System: Arch Linux
KDE Plasma Version: 5.27.3
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8
Kernel Version: 6.2.9-zen1-1-zen (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 5500U with Radeon Graphics
Memory: 13.5 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: LENOVO
Product Name: 82LN
System Version: IdeaPad 5 15ALC05
Created attachment 158022
systemd journal from startup to hardreset with GPU reset and kwin_wayland loop
Created attachment 158023
dmesg from startup to hardreset with GPU reset and kwin_wayland loop
I've been experiencing this issue for some time now; I will leave some more logs in case they are needed.
System information:
Operating System: Arch Linux
KDE Plasma Version: 5.27.4
KDE Frameworks Version: 5.105.0
Qt Version: 5.15.8
Kernel Version: 6.3.0-rc4 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 9 6900HS with Radeon Graphics
Memory: 46,3 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG Zephyrus G14 GA402RK_GA402RK
System Version: 1.0

To summarize what's happening on this topic:
1. For Intel users, a kernel bug causes this: https://gitlab.freedesktop.org/drm/intel/-/issues/8310
2. For AMD users, a RadeonSI bug causes this: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290
There are still some issues in KWin that can cause it to crash when a GPU reset happens, but this bug report specifically will be fixed by driver updates. If you experience a crash when a GPU reset happens, please file a separate bug report for it with a backtrace.

*** Bug 472213 has been marked as a duplicate of this bug. ***