Summary: | The current Wayland GPU recovery experience (AMD) is not ideal with AMS disabled | ||
---|---|---|---|
Product: | [Plasma] kwin | Reporter: | fililip <team> |
Component: | wayland-generic | Assignee: | KWin default assignee <kwin-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | agurenko, kde, nate, xaver.hugl |
Priority: | NOR | Keywords: | qt6 |
Version First Reported In: | 5.92.0 | ||
Target Milestone: | --- | ||
Platform: | Arch Linux | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
fililip
2024-01-15 13:50:32 UTC
Don't know if it's related to the issue (sorry if it isn't), but I tried applying this MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27097 on top of Mesa 24.0.0-rc1 (along with the linked patches against Linux 6.7) and then resetting the GPU. Now it looks completely broken. The gfx ring keeps soft-recovering (with one display; with two, the whole machine freezes and even SSH doesn't work), but KWin keeps reset-looping the GPU. The entire session is unusable, and getting back to a TTY requires SIGKILLing KWin several times.
GPU reset handling is something that's very WIP throughout the stack, as you can see from the fact that we have so many pending requests. The fact that we get a second reset implies the problem is lower in the stack. So far this feels like an upstream problem.
Would it be better if KWin exited after N resets?
> GPU reset handling is something that's very WIP throughout the stack
By "the stack", do you mean KWin, upstream, or both? I thought KWin already had GPU recovery support on Wayland, though I might be wrong (unless what's currently there is experimental, which would explain why it's so hit-or-miss). What is the state of Wayland GPU recovery on other vendors' GPUs, though? Does Intel work better? (Asking out of curiosity.)
> Would it be better if KWin exited after N resets?
Perhaps, if the resets happened very close to one another in time.
KWin has very good GPU reset handling, which I've tested a lot, both voluntarily and involuntarily (amdgpu has been far too reset-happy for the last two weeks or so). The problem is further upstream: specifically, amdgpu isn't great at GPU resets, and until recently Mesa also interpreted the spec for this incorrectly.
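The "exit after N resets" idea, restricted to resets that happen close together in time, amounts to a sliding-window limiter. Below is a minimal sketch in Python under that assumption. `ResetLoopGuard`, `max_resets`, and `window_seconds` are hypothetical names, and this is not how KWin actually handles GPU resets.

```python
from collections import deque


class ResetLoopGuard:
    """Hypothetical policy: give up after too many GPU resets in a short window.

    This is NOT KWin's code; it only illustrates the
    "exit after N resets" idea discussed in the thread.
    """

    def __init__(self, max_resets: int, window_seconds: float) -> None:
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        # Timestamps (in seconds) of recent resets, oldest first.
        self._timestamps = deque()

    def record_reset(self, now: float) -> bool:
        """Record a reset at time `now`; return True if recovery should stop."""
        self._timestamps.append(now)
        # Forget resets that are older than the sliding window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        # Too many resets inside the window looks like a reset loop.
        return len(self._timestamps) >= self.max_resets
```

With `max_resets=3` and `window_seconds=10.0`, three resets at 0 s, 1 s, and 2 s trip the guard, while resets spaced 20 s apart never do. What "giving up" should mean (exiting the compositor or something else) is a policy question the thread leaves open.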
> What is the state of Wayland GPU recovery on other vendors' GPUs though? Does Intel work better?
It's better on both Intel and NVIDIA, as in most cases they reset only the affected app. amdgpu will soon gain the ability to do the same, though.
Oh, I noticed one important thing: I was using the legacy DRM API (KWIN_DRM_NO_AMS=1) for tearing support. After disabling it, everything works fine on the latest Mesa 24.1 (dev), even on Plasma 5. Sorry for the trouble.
> amdgpu will gain the ability to do the same soon though
That's amazing news! Thank you for your continued effort.
Thanks! Should we keep this open to improve the non-AMS experience, or say that if you want a good experience, you just need to have AMS enabled?
Personally, I don't think there's a point in maintaining legacy modesetting (even though there's still no tearing support with atomic) unless it's actually used on some devices. It's great for my non-VRR laptop: with a 60 Hz display I can live with the broken recovery mechanism, since in my opinion tearing is much better than stuttering. The reason I had it enabled in the first place was to test a bunch of games with both VRR and tearing on, to find the frame rate limit I should set to avoid sporadic frame time drops below 6.06 ms, which caused tearing/stuttering. Once I was happy with 160 FPS, I forgot to unset the environment variable, and that's how I ran into the issue.
All right, thanks!
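The 6.06 ms threshold mentioned in the comment about frame rate limits is consistent with a 165 Hz maximum refresh rate. The thread does not state the display's refresh rate, so that is an assumption. Under it, a 160 FPS cap keeps the nominal frame time above the threshold:

$$
t_{\min} = \frac{1000\ \text{ms}}{165} \approx 6.06\ \text{ms}, \qquad
t_{160} = \frac{1000\ \text{ms}}{160} = 6.25\ \text{ms} > t_{\min}
$$

Capping a few FPS below the maximum refresh rate leaves some headroom for frame time variance, so individual frames are less likely to finish faster than the top of the VRR range allows.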