Bug 459872 - Session freezes entirely and doesn't recover after AMD GPU reset caused by VAAPI
Status: RESOLVED DUPLICATE of bug 453147
Alias: None
Product: kwin
Classification: Plasma
Component: platform-drm
Version: 5.25.5
Platform: Fedora RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: wayland
Depends on:
Blocks:
 
Reported: 2022-09-30 19:51 UTC by Firlaev-Hans
Modified: 2023-06-04 20:19 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Excerpt from system journal (8.98 KB, text/plain)
2022-09-30 19:51 UTC, Firlaev-Hans

Description Firlaev-Hans 2022-09-30 19:51:39 UTC
Created attachment 152525
Excerpt from system journal

SUMMARY
Every now and then, VAAPI video decoding (in Firefox, in particular) triggers a GPU reset on my AMD iGPU.
Whenever that happens, the screen goes black for a second and then comes back but is entirely frozen. Sometimes I'm still able to switch to a tty, sometimes not. But in any case, the Plasma session never recovers.

STEPS TO REPRODUCE
1. Trigger a GPU reset by any means (for me, VAAPI video decoding in Firefox occasionally does it)

OBSERVED RESULT
KWin freezes entirely (but doesn't crash?)

EXPECTED RESULT
KWin should be able to recover from the GPU reset.

SOFTWARE/OS VERSIONS
Operating System: Fedora Linux 36
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.5
Kernel Version: 5.19.11-200.fc36.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 4 × AMD Ryzen 3 3200U with Radeon Vega Mobile Gfx
Memory: 6.7 GiB of RAM
Graphics Processor: AMD Radeon Vega 3 Graphics

ADDITIONAL INFORMATION
I have attached an excerpt of the system journal from the time of the GPU reset.
It shows no indication that KWin crashed.
The GPU resets successfully, and Plasmashell detects it and claims to restart its GPU process.
KWin never explicitly says anything about the reset at all, but for some reason it continuously prints OpenGL information to the journal several times a second. This goes on for about a minute, until I switch to a TTY. Then it stops, but it resumes as soon as I try to switch back.
Comment 1 Vlad Zahorodnii 2022-10-03 08:57:25 UTC
It looks like kwin and plasmashell are stuck thinking that the OpenGL context has been lost. Do you know how to reliably trigger a gpu reset? Would reading /sys/kernel/debug/dri/0/amdgpu_gpu_reset work? or does vaapi do something different?
Comment 2 Firlaev-Hans 2022-10-03 15:17:03 UTC
(In reply to Vlad Zahorodnii from comment #1)
> It looks like kwin and plasmashell are stuck thinking that the OpenGL
> context has been lost. Do you know how to reliably trigger a gpu reset?
> Would reading /sys/kernel/debug/dri/0/amdgpu_gpu_reset work? or does vaapi
> do something different?

Running
> sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
does in fact seem to cause the exact same symptoms.
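
For anyone else trying to reproduce this, the command above can be wrapped in a small helper. This is only a sketch: the function name `trigger_gpu_recover` is made up for illustration, and the default path assumes the AMD GPU is DRM device 0, as in the command above. On other machines the index may differ.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): read the amdgpu
# debugfs node mentioned above to request a GPU recovery.
# Assumption: the AMD GPU is DRM device 0; pass another path as $1 if not.
trigger_gpu_recover() {
    node="${1:-/sys/kernel/debug/dri/0/amdgpu_gpu_recover}"
    if [ ! -r "$node" ]; then
        echo "cannot read $node (not root, debugfs not mounted, or different DRM index?)" >&2
        return 1
    fi
    # Reading the node is what requests the reset; its content is printed as-is.
    cat "$node"
}
```

The function has to be called from a root shell, because debugfs is normally readable by root only. Expect the session to freeze as described in this report, so save your work first.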
Comment 3 Zamundaaa 2022-10-13 19:43:50 UTC
According to some debugging we did, KWin doesn't appear to be doing anything wrong; it's a driver issue. See https://gitlab.freedesktop.org/mesa/mesa/-/issues/7460
Comment 4 Nate Graham 2022-10-17 22:20:40 UTC
In https://gitlab.freedesktop.org/mesa/mesa/-/issues/7460#note_1595216, Mesa folks say we need to handle it; re-opening.
Comment 5 Vlad Zahorodnii 2023-01-06 09:36:14 UTC

*** This bug has been marked as a duplicate of bug 453147 ***