When games put the GPU under high load and have vsync enabled (which seems to be a must on Wayland to prevent stutter, in either fifo or mailbox mode), latency options lower than "prefer smoother animations" cause frame drops. Manually setting "prefer smoother animations" each time a game is played is rather inconvenient, and it also seems to have higher input lag than Xorg with suspended compositing and vsync done entirely by the game.
STEPS TO REPRODUCE
1. Start Plasma Wayland and leave the latency option at "balanced", or set it even more aggressively in favor of lower latency.
2. Start a game with vsync that has high GPU load (e.g. super sampling via configurable resolution scale in Hitman 2) and watch fps and frame time graph via Mangohud.
There are dropped frames and thus stuttering/reduced performance with the aforementioned latency options, whereas "prefer smoother animations" shows normal performance (though with higher input lag than on Xorg with suspended compositing).
Ideally, games on Wayland in fullscreen should work as well as on Xorg with automatic suspension of compositing, i.e. without any frame drops or additional input lag from compositor vsync.
Linux/KDE Plasma: Arch 5.10
KDE Plasma Version: kwin git-master v5.19.90.r466.g2d1994e06
With sway, latency for games in fullscreen seems to be lower than with kwin's "prefer smoother animations" and there are no frame drops.
(In reply to tempel.julian from comment #1)
> With sway, latency for games in fullscreen seems to be lower than with
> kwin's "prefer smoother animations" and there are no frame drops.
What about windowed mode?
(In reply to Vlad Zahorodnii from comment #2)
> What about windowed mode?
With sway, latency seems to be higher in windowed mode (probably due to direct scanout in fullscreen?).
With kwin, fullscreen and windowed feel equally laggy or stuttery, depending on the latency option, even though I patched the direct scanout MR into my kwin build.
By the way, I accidentally tested with amdvlk instead of the mesa drivers. With mesa, the frame time graph looks better with lower latency options, though presentation on screen is still negatively affected in terms of stutter.
With super-sampling, I see no difference in the frame rate between the lowest and the highest latency policy settings. I've also traced kwin to check whether some frames are not rendered on time, but that doesn't seem to be the case, at least on my machine. I also have a Radeon RX 5700 XT.
Created attachment 134988 [details]
lowest latency + no super-sampling
Even with the lowest latency policy, the frame rate is at 144Hz. In some parts it drops to lower values, but usually it stays around 144Hz.
Created attachment 134989 [details]
lowest latency + super-sampling
The GPU load increases dramatically, as expected, and the frame rate is usually somewhere between 80-90fps.
Created attachment 134990 [details]
smoothest animations + super-sampling
Choosing the extremely high latency doesn't improve the frame rate (on my machine). Similar to the extremely low latency policy, the frame rate is usually between 80-90fps.
Created attachment 134992 [details]
hitman 2 running on sway
Things look no different when Hitman 2 is running on sway.
@tempel.julian Can you provide the specs of your machine? Also, what kernel and drivers do you use?
First: Thanks a lot for your efforts trying to reproduce!
I've re-tested and re-confirmed the issue with
-RX 5700 XT on linux 5.10.8 from stable Arch repo
-radv mesa-git 21.0.0_devel.133381.5331b1d9456
-regular Proton 5.13-5
-1440p 60Hz (I suggest limiting the display to 60Hz, as it makes vsync stutter easier to perceive.)
-full refresh rate vsync enabled in the game (it's fifo)
I recommend setting
dxgi.maxFrameLatency = 1
dxgi.numBackBuffers = 3
in "steamapps/common/HITMAN2/Retail/dxvk.conf". The former option reduces rendering frame time variance (independently from vsync), whereas the latter ensures correct fifo triple buffer vsync behavior when fps are below refresh rate (unless amdvlk is used instead of radv :( ). When you run the game on Xorg with compositing suspended, the frame time graph of mangohud should now be completely flat (both when vsync caps fps and below that cap) and it should look as smooth as it can.
It seems that it might be random whether the Wayland stutter issue makes the frame time graph look "aliased" or flat, however it looks stuttery anyway.
Does it really look as smooth for you @60Hz with high latency reduction vs. lower settings/Xorg?
The latency option won't work on X11 if compositing is disabled, but I'll try to reproduce the stuttering issues with the aforementioned dxgi options. As for `dxgi.maxFrameLatency`, I suggest cranking it up a little bit. If some frame is running late, it's better to start painting a new frame anyway. It will cost you some latency, but the frame rate should be more stable.
Created attachment 134995 [details]
bad performance with lower latency
Created attachment 134996 [details]
good performance with prefer smoother animations
RadeonSI and the AMD Windows D3D11 driver force a CPU prerender limit of 1 by default, and in years of usage I haven't yet been able to link any issue to it. :)
I've attached two more screenshots that show the issue with RADV, fifo vsync and the aforementioned tweaks.
Btw: It seems any setting change in the kwin kcm add
to my kwinrc.
What's the refresh rate of your monitor? Can you post the output of `qdbus org.kde.KWin /KWin supportInformation`?
It's 59.95Hz with default EDID or 85Hz with overclock. Both refresh rates seem to be affected equally.
Created attachment 134998 [details]
TIL it's possible to overclock a monitor.
Based on the support information you provided, kwin assumes that your monitor has a refresh rate of 60Hz. I suspect this is the problem.
Can you provide the output of `kscreen-doctor -o`? Also, if you un-overclock your monitor, does this issue still persist?
The above qdbus information was gathered without the custom EDID active, so only the default 60Hz is available as a refresh rate. That's what I meant when I said it also happens with 60Hz (default EDID), i.e. independently of any overclocking or other kinds of tampering.
My layman impression is that the issue is due to the latency option/kwin's swapchain handling not leaving game output "untouched", as far as this is possible on Wayland. But for whatever reason this works better on sway (but btw. not on Gnome either, it stutters as well).
Created attachment 135003 [details]
Yeah, with 60Hz, the frame rate issues are more noticeable. The problem is that kwin starts compositing on time and it finishes recording rendering commands before the next vblank, but it takes a while for those commands to be executed on GPU.
I guess we need to insert OpenGL queries after all to measure how long it takes to render a frame on the GPU side.
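A rough way to picture the problem described above is the sketch below. It is only a simplified model, not kwin's actual scheduling logic: the function name and all timing values are hypothetical, and it assumes the GPU executes the compositor's commands strictly after they have been recorded.

```python
def frame_misses_vblank(refresh_hz, start_before_vblank_ms,
                        cpu_record_ms, gpu_execute_ms):
    """Return True if a composited frame is not ready by its target vblank.

    Simplified model (assumption): compositing starts
    `start_before_vblank_ms` before the vblank, recording commands on the
    CPU takes `cpu_record_ms`, and the GPU then needs `gpu_execute_ms`
    to actually execute them.
    """
    period_ms = 1000.0 / refresh_hz
    # The compositor cannot start earlier than the previous vblank.
    budget_ms = min(start_before_vblank_ms, period_ms)
    # The frame is ready only once the GPU has finished, not when the
    # CPU has finished recording.
    return cpu_record_ms + gpu_execute_ms > budget_ms


# Hypothetical numbers: 60Hz display, compositing starts 4 ms before vblank.
print(frame_misses_vblank(60, 4.0, 1.0, 2.0))   # idle GPU
print(frame_misses_vblank(60, 4.0, 1.0, 6.0))   # GPU busy with a game
print(frame_misses_vblank(60, 16.0, 1.0, 6.0))  # same load, earlier start
```

In this model the CPU-side recording always finishes well before the vblank, yet the second call still reports a missed frame, because only the GPU execution time pushes it past the deadline. That is the quantity a GPU-side timer query would have to measure.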
(In reply to tempel.julian from comment #19)
> My layman impression is that the issue is due to the latency option/kwin's
> swapchain handling not leaving game output "untouched", as far as this is
> possible on Wayland. But for whatever reason this works better on sway (but
> btw. not on Gnome either, it stutters as well).
I wonder why this works better on sway. Did you set the `max_render_time` option?
(In reply to Vlad Zahorodnii from comment #22)
> I wonder why this works better on sway. Did you set the `max_render_time`
> option?
Glad you can confirm with kwin.
I didn't change sway's rendering parameters. Of course subjective observations are always prone to human error, but it looked normally smooth to me.
Ah, I see. If the max_render_time option is not set, latency will be high if I recall correctly. Can you set the max_render_time option to 3 or 4?
For more details about the max_render_time option, you could refer to sway-output(5).
With 85Hz, I need to go as high as 12 (oddly close to the frame time?) to guarantee that the frame time graph is always flat. 11 seems to be mostly good enough, yet at times there still seem to be peaks in the graph. 10 definitely is too low, it causes double buffering-like behavior.
Still, 12 seems to feel a little more direct than the default/unlimited value, but this is really getting close to placebo self-tricking.
However, I was quite sure that even the default setting felt more direct than kwin with "prefer smoother animations".
Yeah, my point was that it's not fair to compare kwin with the lowest latency preset against sway with high latency on. For what it's worth, the lowest latency option is equivalent to setting the max_render_time option to a value between 3 and 5 (assuming that the refresh rate of your monitor is 60Hz).
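The relationship between these values and the refresh rate can be worked out with a little arithmetic. The sketch below assumes the behavior described in sway-output(5), namely that compositing is delayed so that `max_render_time` milliseconds remain before the next refresh. The helper names are hypothetical, and clamping the start to the previous vblank is an assumption of this model, not documented sway behavior.

```python
def frame_period_ms(refresh_hz):
    """Duration of one refresh cycle in milliseconds."""
    return 1000.0 / refresh_hz


def render_start_offset_ms(refresh_hz, max_render_time_ms):
    """Milliseconds after the previous vblank at which compositing starts.

    Assumption: the start cannot be earlier than the previous vblank,
    so the result is clamped to 0.
    """
    return max(0.0, frame_period_ms(refresh_hz) - max_render_time_ms)


print(round(frame_period_ms(85), 2))             # 11.76
print(round(render_start_offset_ms(85, 10), 2))  # 1.76
print(round(render_start_offset_ms(85, 12), 2))  # 0.0
print(round(render_start_offset_ms(60, 3), 2))   # 13.67
print(round(render_start_offset_ms(60, 5), 2))   # 11.67
```

Under these assumptions, a value of 12 at 85Hz already exceeds the frame period of about 11.76 ms, which may explain why it behaves so much like the default/unlimited setting. At 60Hz, values of 3 to 5 mean compositing only starts in roughly the last third of the refresh cycle.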
Perhaps latency reduction can set itself automatically to a less aggressive value once direct scanout turns active? So one wouldn't have to switch latency values for playing games and regular desktop usage.
However, the current latency of "prefer smoother animations" wouldn't be too great as a lasting solution either, as I wouldn't want to play my competitive moba game with the increased lag on Wayland vs. Xorg with suspended compositing. :(
Unless you're testing my MR, there is no direct scanout happening yet. It could even improve the situation a bit (as no rendering on KWin's side is happening).
I had patched direct scanout into my local build of kwin, yes. It didn't make things noticeably better (or worse) though; latency in games was still higher with "prefer smoother animations" vs. Xorg with suspended compositing.
Not saying direct scanout didn't lower latency, but if it did, it wasn't enough for me to notice. So perhaps the largest share of latency is caused by presentation delays and not rendering?
Turns out you haven't actually been using direct scanout because of what I assume to be a Wine bug. See (and test) https://invent.kde.org/plasma/kwin/-/merge_requests/728
As discussed on GitLab, !728 indeed fixes the issue for fullscreen (now also with Wine). :)
A remaining issue would be to achieve better frame presentation for windowed gaming (or perhaps some other 3D workload) too. While it naturally can't beat fullscreen with direct scanout, it'd still be nice if it were more bearable under high GPU load in windowed mode too. Though I can't judge whether it makes sense at this point to put effort into it, since that probably means less time spent on improvements in other important areas.
This should be fixed in Plasma 6 with https://invent.kde.org/plasma/kwin/-/merge_requests/4293 and https://invent.kde.org/plasma/kwin/-/merge_requests/4397