Bug 431707

Summary: setting new latency option to balance or other options even more in favor of latency causes frame drops in gaming on Wayland
Product: [Plasma] kwin Reporter: tempel.julian
Component: compositing Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: bugs.kde.org.facelift226, katyaberezyaka, kdebugs, nate, rainer, tempel.julian, xaver.hugl
Priority: NOR    
Version: git master   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: lowest latency + no super-sampling
lowest latency + super-sampling
smoothest animations + super-sampling
hitman 2 running on sway
bad performance with lower latency
good performance with prefer smoother animations
kwininfo log
kscreen log

Description tempel.julian 2021-01-16 20:07:44 UTC
SUMMARY

When there is high GPU load by games and vsync is enabled for them (which seems to be a must on Wayland to prevent stutter, either fifo or mailbox mode), latency options with lower latency than "prefer smoother animations" cause frame drops. Manually setting "prefer smoother animations" each time a game is played is rather inconvenient and it also seems to have higher input lag than on Xorg with suspended compositing and vsync done entirely by the game.

STEPS TO REPRODUCE
1. Start Plasma Wayland and leave latency option at "balanced" or set it even more aggressive in favor of lower latency.
2. Start a game with vsync that has high GPU load (e.g. super sampling via configurable resolution scale in Hitman 2) and watch fps and frame time graph via Mangohud.

OBSERVED RESULT

There are dropped frames and thus stuttering/reduced performance with the aforementioned latency options, whereas "prefer smoother animations" shows normal performance (with higher input lag than on Xorg with suspended compositing though):
https://invent.kde.org/plasma/kwin/uploads/fe965708e03d4b86ce535a1a9bb29dff/Screenshot_20210115_142349.png
https://invent.kde.org/plasma/kwin/uploads/6d1aba27cd85f6ecf4e7fadda8864f25/Screenshot_20210115_142316.png


EXPECTED RESULT

Ideally, games on Wayland in fullscreen should work as well as on Xorg with automated suspending of compositing, i.e. without any frame drops and additional input lag by compositor vsync.


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch 5.10
KDE Plasma Version: kwin git-master v5.19.90.r466.g2d1994e06
Comment 1 tempel.julian 2021-01-16 21:50:04 UTC
With sway, latency for games in fullscreen seems to be lower than with kwin's "prefer smoother animations" and there are no frame drops.
Comment 2 Vlad Zahorodnii 2021-01-18 07:59:16 UTC
(In reply to tempel.julian from comment #1)
> With sway, latency for games in fullscreen seems to be lower than with
> kwin's "prefer smoother animations" and there are no frame drops.

What about windowed mode?
Comment 3 tempel.julian 2021-01-18 15:09:21 UTC
(In reply to Vlad Zahorodnii from comment #2)
> What about windowed mode?
With sway, latency seems to be higher in windowed mode (probably due to direct scanout in fullscreen?).
With kwin, fullscreen and windowed feel equally laggy or stuttery, depending on the latency option, even though I patched the direct scanout MR into my kwin build.

By the way, I accidentally tested with amdvlk instead of the mesa drivers. With mesa, the frame time graph looks better with lower latency options, though presentation on screen is still negatively affected in terms of stutter.
Comment 4 Vlad Zahorodnii 2021-01-19 07:49:13 UTC
With super-sampling, I see no difference in the frame rate when the latency policy is set to the lowest and the highest value. I've also traced kwin to check if some frames are not rendered on time, but that doesn't seem to be the case, at least on my machine. I also have a Radeon RX 5700 XT.
Comment 5 Vlad Zahorodnii 2021-01-19 07:50:44 UTC
Created attachment 134988 [details]
lowest latency + no super-sampling

Even with the lowest latency policy, the frame rate is at 144Hz, in some parts it drops to lower values, but usually it stays around 144Hz.
Comment 6 Vlad Zahorodnii 2021-01-19 07:52:02 UTC
Created attachment 134989 [details]
lowest latency + super-sampling

The GPU load increases dramatically, as expected, and the frame rate is usually somewhere between 80-90fps.
Comment 7 Vlad Zahorodnii 2021-01-19 07:53:32 UTC
Created attachment 134990 [details]
smoothest animations + super-sampling

Choosing the extremely high latency doesn't improve the frame rate (on my machine). Similar to the extremely low latency policy, the frame rate is usually between 80-90fps.
Comment 8 Vlad Zahorodnii 2021-01-19 08:24:07 UTC
Created attachment 134992 [details]
hitman 2 running on sway

Things look no different when Hitman 2 is running on sway.
Comment 9 Vlad Zahorodnii 2021-01-19 08:25:25 UTC
@tempel.julian Can you provide the specs of your machine? Also, what kernel and drivers do you use?
Comment 10 tempel.julian 2021-01-19 13:11:27 UTC
First: Thanks a lot for your efforts trying to reproduce!

I've re-tested and re-confirmed the issue with
-RX 5700 XT on linux 5.10.8 from stable Arch repo
-radv mesa-git 21.0.0_devel.133381.5331b1d9456
-regular Proton 5.13-5
-1440p 60Hz (I suggest limiting the display to 60Hz, as it makes vsync stutter easier to perceive.)
-full refresh rate vsync enabled in the game (it's fifo)

I recommend setting

dxgi.maxFrameLatency = 1
dxgi.numBackBuffers = 3

in "steamapps/common/HITMAN2/Retail/dxvk.conf". The former option reduces rendering frame time variance (independently from vsync), whereas the latter ensures correct fifo triple buffer vsync behavior when fps are below refresh rate (unless amdvlk is used instead of radv :( ). When you run the game on Xorg with compositing suspended, the frame time graph of mangohud should now be completely flat (both when vsync caps fps and below that cap) and it should look as smooth as it can.

It seems that it might be random whether the Wayland stutter issue makes the frame time graph look "aliased" or flat, however it looks stuttery anyway.
Does it really look as smooth for you @60Hz with high latency reduction vs. lower settings/Xorg?
Comment 11 Vlad Zahorodnii 2021-01-19 13:53:43 UTC
The latency option won't work on X11 if compositing is disabled, but I'll try to reproduce the stuttering issues with the aforementioned dxgi options. As for `dxgi.maxFrameLatency`, I suggest cranking it up a little bit. If some frame is running late, it's better to start painting a new frame anyway. It will cost you some latency, but the frame rate should be more stable.
Comment 12 tempel.julian 2021-01-19 14:19:55 UTC
Created attachment 134995 [details]
bad performance with lower latency
Comment 13 tempel.julian 2021-01-19 14:20:20 UTC
Created attachment 134996 [details]
good performance with prefer smoother animations
Comment 14 tempel.julian 2021-01-19 14:23:30 UTC
RadeonSI and the AMD Windows D3D11 driver force a CPU prerender limit of 1 by default, and I haven't been able to link any issue to it in years of usage. :)

I've attached two more screenshots that show the issue with RADV, fifo vsync and the aforementioned tweaks.

Btw: It seems any setting change in the kwin kcm adds

[KDE]
AnimationDurationFactor=0.5

to my kwinrc.
Comment 15 Vlad Zahorodnii 2021-01-19 15:32:24 UTC
What's the refresh rate of your monitor? Can you post the output of `qdbus org.kde.KWin /KWin supportInformation`?
Comment 16 tempel.julian 2021-01-19 16:04:35 UTC
It's 59.95Hz with default EDID or 85Hz with overclock. Both refresh rates seem to be affected equally.
Comment 17 tempel.julian 2021-01-19 16:05:04 UTC
Created attachment 134998 [details]
kwininfo log
Comment 18 Vlad Zahorodnii 2021-01-19 17:55:26 UTC
TIL it's possible to overclock a monitor. 

Based on the support information you provided, kwin assumes that your monitor has a refresh rate of 60Hz. I suspect this is the problem.

Can you provide the output of `kscreen-doctor -o`? Also, if you un-overclock your monitor, does this issue still persist?
Comment 19 tempel.julian 2021-01-19 19:55:23 UTC
The above qdbus information was collected without the custom EDID active, so only the default 60Hz refresh rate is available. That's what I meant when I said it also happens at 60Hz (default EDID), i.e. independently of any overclock or other kind of tampering.
My layman impression is that the issue is due to the latency option/kwin's swapchain handling not leaving game output "untouched", as far as this is possible on Wayland. But for whatever reason this works better on sway (but btw. not on Gnome either, it stutters as well).
Comment 20 tempel.julian 2021-01-19 19:55:53 UTC
Created attachment 135003 [details]
kscreen log
Comment 21 Vlad Zahorodnii 2021-01-20 10:14:22 UTC
Yeah, with 60Hz, the frame rate issues are more noticeable. The problem is that kwin starts compositing on time and it finishes recording rendering commands before the next vblank, but it takes a while for those commands to be executed on GPU.

I guess we need to insert opengl queries after all to measure how long it takes to render a frame on the GPU side.
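
The gap described above can be illustrated with a toy model. This is only a sketch: the class and function names are hypothetical, it is not kwin code, and the timings are made-up numbers. It shows how a scheduler that only accounts for the CPU time spent recording commands can believe a frame is on time even though GPU execution pushes it past vblank.

```python
from dataclasses import dataclass


@dataclass
class FrameTiming:
    """Illustrative timings for one composited frame (all in milliseconds)."""
    start_before_vblank_ms: float  # how long before vblank compositing starts
    cpu_record_ms: float           # CPU time to record the rendering commands
    gpu_execute_ms: float          # GPU time to actually execute those commands


def cpu_estimate_on_time(frame: FrameTiming) -> bool:
    # What the compositor sees if it only measures command recording on the CPU.
    return frame.cpu_record_ms <= frame.start_before_vblank_ms


def actually_on_time(frame: FrameTiming) -> bool:
    # The frame only reaches the screen if the GPU also finishes before vblank.
    total_ms = frame.cpu_record_ms + frame.gpu_execute_ms
    return total_ms <= frame.start_before_vblank_ms


# Hypothetical case: low-latency setting (late start) while a game saturates the GPU.
busy_gpu = FrameTiming(start_before_vblank_ms=4.0, cpu_record_ms=1.0, gpu_execute_ms=5.0)
print("CPU-side estimate on time:", cpu_estimate_on_time(busy_gpu))
print("Actually on time:", actually_on_time(busy_gpu))
```

Measuring the GPU side (for example with timer queries, as suggested above) would supply the `gpu_execute_ms` term that the CPU-only estimate is missing.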
Comment 22 Vlad Zahorodnii 2021-01-20 11:57:19 UTC
(In reply to tempel.julian from comment #19)
> My layman impression is that the issue is due to the latency option/kwin's
> swapchain handling not leaving game output "untouched", as far as this is
> possible on Wayland. But for whatever reason this works better on sway (but
> btw. not on Gnome either, it stutters as well).

I wonder why this works better on sway. Did you set the `max_render_time` option?
Comment 23 tempel.julian 2021-01-20 13:39:15 UTC
(In reply to Vlad Zahorodnii from comment #22)
> I wonder why this works better on sway. Did you set the `max_render_time`
> option?
Glad you can confirm with kwin.
I didn't change sway's rendering parameters. Of course subjective observations are always prone to human error, but it looked normally smooth to me.
Comment 24 Vlad Zahorodnii 2021-01-20 13:53:56 UTC
Ah, I see. If the max_render_time option is not set, latency will be high if I recall correctly. Can you set the max_render_time option to 3 or 4?

For more details about the max_render_time option, you could refer to sway-output(5).
Comment 25 tempel.julian 2021-01-20 15:02:59 UTC
With 85Hz, I need to go as high as 12 (oddly close to the frame time?) to guarantee that the frame time graph is always flat. 11 seems to be mostly good enough, yet at times there still seem to be peaks in the graph. 10 definitely is too low, it causes double buffering-like behavior.
Still, 12 seems to feel a little more direct than the default/unlimited value, but this is really getting close to placebo self-tricking.
However, I was quite sure that even the default setting felt more direct than kwin with "prefer smoother animations".
Comment 26 Vlad Zahorodnii 2021-01-21 07:32:54 UTC
Yeah, my point was that it's not fair to compare kwin with the lowest latency preset against sway with high latency on. For what it's worth, the lowest latency option is equivalent to setting the max_render_time option to a value between 3 and 5 (assuming that the refresh rate of your monitor is 60Hz).
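
For reference, the numbers in the last few comments can be checked with a little arithmetic. This is a minimal sketch, assuming the semantics documented in sway-output(5), where `max_render_time` is the number of milliseconds before the display refresh at which compositing starts. The function names are illustrative, and clamping the delay to zero when the value exceeds the frame period is an assumption, not confirmed sway behavior.

```python
def frame_period_ms(refresh_hz: float) -> float:
    """Time between two vblanks in milliseconds."""
    return 1000.0 / refresh_hz


def composite_delay_ms(refresh_hz: float, max_render_time_ms: float) -> float:
    """How long after a vblank the compositor waits before compositing.

    Assumption: a max_render_time at or above the frame period leaves no
    room to wait, so the delay is clamped to zero.
    """
    return max(0.0, frame_period_ms(refresh_hz) - max_render_time_ms)


# Refresh rates and max_render_time values mentioned in this thread.
for hz, mrt in [(60, 3), (60, 5), (85, 10), (85, 12)]:
    print(
        f"{hz}Hz, max_render_time={mrt}: "
        f"period={frame_period_ms(hz):.2f} ms, "
        f"delay={composite_delay_ms(hz, mrt):.2f} ms"
    )
```

At 85Hz the frame period is roughly 11.8 ms, so under these assumptions a value of 12 leaves no delay at all. That would be consistent with 12 behaving much like the default/unlimited setting.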
Comment 27 tempel.julian 2021-01-21 11:26:51 UTC
Perhaps latency reduction can set itself automatically to a less aggressive value once direct scanout turns active? So one wouldn't have to switch latency values for playing games and regular desktop usage.

However, the current latency of "prefer smoother animations" wouldn't be too great as a lasting solution either, as I wouldn't want to play my competitive moba game with the increased lag on Wayland vs. Xorg with suspended compositing. :(
Comment 28 Zamundaaa 2021-01-22 15:25:55 UTC
Unless you're testing my MR, there is no direct scanout happening yet. It could even improve the situation a bit (as no rendering on KWin's side is happening).
Comment 29 tempel.julian 2021-01-22 15:52:53 UTC
I had patched direct scanout into my local build of kwin, yes. It didn't make things noticeably better (or worse) though; latency in games was still higher with "prefer smoother animations" vs. Xorg with suspended compositing.
Not saying direct scanout didn't lower latency, but if it did, it wasn't enough for me to notice. So perhaps the largest share of latency is caused by presentation delays and not rendering?
Comment 30 Zamundaaa 2021-02-23 16:49:57 UTC
Turns out you haven't actually been using direct scanout because of what I assume to be a Wine bug. See (and test) https://invent.kde.org/plasma/kwin/-/merge_requests/728
Comment 31 tempel.julian 2021-02-24 21:20:27 UTC
As discussed on Gitlab, !728 indeed fixes the issue for (now also Wine) fullscreen. :)

A remaining issue would be to achieve better frame presentation for windowed gaming (or perhaps some other 3D workload) too. While it naturally can't beat fullscreen with direct scanout, it'd still be nice if it were more bearable under high GPU load in windowed mode. Though I can't judge whether it makes sense at this point to put effort into it, since that in turn probably means less time spent on improvements in other important areas.