Bug 450914

Summary: Wayland, games on Nvidia are force vsynced
Product: [Plasma] kwin
Reporter: Alexander Streng <streng.alexander>
Component: wayland-generic
Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED UPSTREAM
Severity: normal
CC: ales.astone, ekurzinger, kodatarule, miranda, nate, nexustcrax, patrik.lk, qydwhotmail, raul.chaves, sampingu02, tomblackwhite, xaver.hugl
Priority: NOR
Version: 5.24.3
Target Milestone: ---
Platform: Arch Linux
OS: Linux
URL: https://forums.developer.nvidia.com/t/nvidia-bug-kde-wayland-games-are-force-vsynced/237880
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Alexander Streng 2022-02-27 07:56:37 UTC
SUMMARY
When playing games in a Wayland session on an Nvidia GPU, games are forcibly vsynced, unlike on X11 or in a Wayland session on AMD.


STEPS TO REPRODUCE
1. Select the Wayland session
2. Launch a game with some kind of FPS overlay, like DXVK or MangoHud


OBSERVED RESULT
In-game vsync is off, but the game is still locked to 60 FPS.

EXPECTED RESULT
In-game vsync is off and the game runs at an unlocked framerate.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:  Manjaro KDE
(available in About System)
KDE Plasma Version: 5.24.2
KDE Frameworks Version: 
Qt Version: 

ADDITIONAL INFORMATION
Comment 1 Zamundaaa 2022-02-27 19:37:38 UTC
KWin is not involved with application VSync, beyond providing infrastructure so that applications can match timing. This is most likely a bug in the NVIDIA driver, or (rather unlikely though) in Xwayland.
Comment 2 Alexander Streng 2022-02-27 19:44:50 UTC
Ah, I just thought that it might be a bug with kwin since I've heard that Gnome doesn't have this issue in Wayland session 🤔
Comment 3 Samuel 2022-03-03 05:55:57 UTC
(In reply to Alexander Streng from comment #2)
> Ah, I just thought that it might be a bug with kwin since I've heard that
> Gnome doesn't have this issue in Wayland session 🤔

Yes. GNOME Wayland doesn't have this issue. I doubt that this is really an upstream bug.
https://www.youtube.com/watch?v=mATihNvCizY
Comment 4 Samuel 2022-03-06 13:13:31 UTC
I can get unlocked FPS in Sway (wlroots) too, in Apex Legends... Re-opening to confirm whether it really is an upstream bug.
Comment 5 Zamundaaa 2022-03-09 20:15:56 UTC
*** Bug 451066 has been marked as a duplicate of this bug. ***
Comment 6 Zamundaaa 2022-03-09 20:30:15 UTC
Erik, do you know anything about where this could be coming from?
The only difference that I see between the compositors in question is that KWin doesn't implement presentation-time... but that should not even be remotely relevant for providing mailbox presentation.
Comment 7 Samuel 2022-03-10 06:54:08 UTC
I also noticed something interesting. In Fullscreen and Borderless modes, VSync is being forced in Plasma, while Windowed mode doesn't have this problem. Windowed mode has uncapped FPS, similar to other compositors.
Comment 8 Erik Kurzinger 2022-03-10 16:21:45 UTC
Tried this with a debug NVIDIA driver, and it looks like the application is spending most of its time waiting for the PresentIdleNotify event from Xwayland. If vsync is disabled this should get delivered pretty much immediately, and as Samuel said, for windowed applications it does, but for full-screen applications for some reason we're only getting one every vblank interval.

Probably the important difference there is that Xwayland will use the "copy" path in xwayland-present.c for windowed applications but will use the "flip" path for full-screen / borderless applications. So like in the former case Xwayland will copy from the application's buffer to one of its own buffers, and in the latter case it will pass the application's buffer directly to Kwin.

As I understand it, with the copy path, the idle notification is sent immediately after the copy completes, but with the flip path it's only sent after the buffer has been released by Kwin. Is there possibly a difference between NVIDIA and Mesa that could cause Kwin to delay releasing these buffers? There is one known bug in our DRM driver where we don't report the correct timestamp for page flips (it will just be 0), not sure if that could be relevant.
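
The difference between the two paths can be sketched with a toy timing model. This is only an illustration, not Xwayland or driver code: it assumes the application must receive an idle notification for every presented frame before it can render the next one, that the copy itself takes no time, and that on the flip path the compositor releases a buffer exactly once per refresh interval. The function name and the 2 ms render time are made up for the example.

```python
def frames_in_one_second(path, render_us=2_000, refresh_us=16_666):
    """Count frames presented in one simulated second (times in microseconds).

    path="copy": the idle notification arrives as soon as the copy is done
                 (modelled here as instantaneous).
    path="flip": the idle notification arrives only at the next vblank,
                 when the compositor releases the buffer.
    """
    if path not in ("copy", "flip"):
        raise ValueError("path must be 'copy' or 'flip'")
    now = 0
    frames = 0
    while True:
        now += render_us  # the application renders and presents one frame
        if now > 1_000_000:
            return frames
        frames += 1
        if path == "flip":
            # Assumed behaviour: wait for the idle notification, which is
            # only delivered at the next refresh boundary.
            now = (now // refresh_us + 1) * refresh_us


print(frames_in_one_second("copy"), frames_in_one_second("flip"))
```

Under these assumptions the copy path is limited only by render time (500 frames), while the flip path collapses to one frame per refresh interval (60 frames), which matches the windowed versus full-screen behaviour reported above.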
Comment 9 Samuel 2022-03-11 11:25:59 UTC
(In reply to Erik Kurzinger from comment #8)

> Probably the important difference there is that Xwayland will use the "copy"
> path in xwayland-present.c for windowed applications but will use the "flip"
> path for full-screen / borderless applications. So like in the former case
> Xwayland will copy from the application's buffer to one of its own buffers,
> and in the latter case it will pass the application's buffer directly to
> Kwin.

So is this a KWin bug, GNOME and Sway not using this "flip" path, or an NVIDIA driver bug?
Comment 10 Zamundaaa 2022-03-11 16:52:15 UTC
There is some weird interaction of Xwayland with windowed contents that can cause something similar: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1309
The timer in Xwayland is about 58-59fps, not 60 though, so it's most likely not the same issue. It also happens in other compositors, so there's that...

> Is there possibly a difference between NVIDIA and Mesa that could cause Kwin to delay releasing these buffers?

How many buffers does the NVIDIA driver allocate for a swapchain? While I really doubt that a lack of free buffers would manage to produce an exact, smooth 60fps, Mesa needed to bump the number of buffers in Wayland and Xwayland up to 4 in order to remove some bottlenecks with direct scanout.

@Samuel you can quickly check if direct scanout makes a difference by putting KWIN_DRM_NO_DIRECT_SCANOUT=1 into /etc/environment and rebooting

One other thing that interacts with buffer releases is that KWin does dynamic frame scheduling, which neither Sway nor AFAIK a currently released version of Mutter do; Mutter always renders at vblank and IIRC Sway does so too by default. KWin with default settings starts compositing somewhere in the middle of the frame (depends on GPU load and user settings)

> There is one known bug in our DRM driver where we don't report the correct timestamp for page flips (it will just be 0), not sure if that could be relevant
KWin fakes the timestamp to be the time of receiving the callback if it's not correct, so it probably doesn't make a significant difference.
Comment 11 Samuel 2022-03-12 07:57:36 UTC
(In reply to Zamundaaa from comment #10)

> @Samuel you can quickly check if direct scanout makes a difference by putting KWIN_DRM_NO_DIRECT_SCANOUT=1 into /etc/environment and rebooting

No. I checked with the `env` command that the environment variable is active. With the env var set, Fullscreen and Borderless modes are still stuck at 60 while Windowed mode gives uncapped FPS.
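
Since `env` only shows the environment of the shell it runs in, a complementary check is to read the environment of the compositor process itself. The following is a minimal sketch for Linux, assuming the compositor's process name is `kwin_wayland` and that its `/proc/<pid>/environ` is readable by the current user; the helper names are made up for this example.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def compositor_env(process_name="kwin_wayland"):
    """Return the environment of the first process with this name, or None."""
    proc_root = Path("/proc")
    if not proc_root.is_dir():
        return None  # not a Linux-style /proc filesystem
    for proc in proc_root.iterdir():
        if not proc.name.isdigit():
            continue
        try:
            if (proc / "comm").read_text().strip() == process_name:
                return parse_environ((proc / "environ").read_bytes())
        except OSError:
            continue  # process exited or is not readable; try the next one
    return None
```

Calling `compositor_env()` inside the Plasma session and looking up `"KWIN_DRM_NO_DIRECT_SCANOUT"` in the returned dictionary would show whether the variable actually reached the compositor. A result of `None` means no matching, readable process was found.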
Comment 12 Alessandro Astone 2022-04-01 17:06:18 UTC
Zamundaaa, this is the same bug we discussed here: https://bugs.kde.org/show_bug.cgi?id=448918

Not only is the application being "vsynced" (meaning the game rendering loop runs at VBlank frequency), but when other windows are drawing on a different monitor, the frequency *increases*. I've documented it in detail with examples in that bug report.

Following the comments here I've also tried KWIN_DRM_NO_DIRECT_SCANOUT and XWayland 22.1.1, which includes the fix for https://gitlab.freedesktop.org/xorg/xserver/-/issues/1309, without any difference.

Also reminder that the bug exclusively happens on fullscreen windows.
Comment 13 Alessandro Astone 2022-04-01 17:18:29 UTC
(In reply to Alessandro Astone from comment #12)
> Zamundaaa, this is the same bug we discussed here
> https://bugs.kde.org/show_bug.cgi?id=448918
> 
> Not only is the application being "vsynced" (meaning the game rendering loop
> runs at VBlank frequency), but when other windows are drawing on a different
> monitor, the frequency *increases*. I've documented it in detail with
> examples in that bug report.
> 
> Following the comments here I've also tried KWIN_DRM_NO_DIRECT_SCANOUT and
> XWayland 22.1.1 which includes the fix for
> https://gitlab.freedesktop.org/xorg/xserver/-/issues/1309, without any
> difference.
> 
> Also reminder that the bug exclusively happens on fullscreen windows.

Also, in that conversation I stated that weston-simple-egl was behaving correctly, but I'm unable to reproduce that.
`__GL_SYNC_TO_VBLANK=0 weston-simple-egl -b [-f]` still matches the monitor's refresh rate (even if not fullscreen), and it does not show the behaviour where the framerate increases when another application is drawing on another monitor.
Comment 14 Zamundaaa 2022-05-02 15:08:36 UTC
*** Bug 453268 has been marked as a duplicate of this bug. ***
Comment 15 kodatarule 2022-05-02 19:41:42 UTC
Hmm, there seems to be something more going on with it. https://bugs.kde.org/show_bug.cgi?id=453268 suggests that keeping the app in windowed mode with its borders fixes it; while that works in some cases, it's not always the case.

KWIN_DRM_NO_DIRECT_SCANOUT does manage to fix RetroArch, for example, but most Xwayland apps still have some weird syncing going on. For example, with 2 displays, one 165 Hz and one 144 Hz, the syncing gets confused and results in horrible timings on the main monitor.
What is also more interesting is that it gets even worse once the gpu is taxed to the maximum.
Comment 16 kodatarule 2022-05-14 21:12:04 UTC
Ok, with the new driver, most of the issues seem remedied, but I think once this issue is fixed freesync/gsync should also be looked at as it's working on Sway.
Comment 17 kodatarule 2022-05-14 21:54:34 UTC
(In reply to kodatarule from comment #16)
> Ok, with the new driver, most of the issues seem remedied, but I think once
> this issue is fixed freesync/gsync should also be looked at as it's working
> on Sway.

Ouf, I can't edit. After further testing: no, freesync/gsync doesn't kick in during a Sway session. However, there seems to be some weird buffering going on under the KDE Wayland session that I am unable to reproduce in GNOME/Sway. It's almost like the game is failing to sync or something, which results in weird hitches during gameplay. Not sure if it is related to this entire issue.
Comment 18 Samuel 2022-05-15 09:50:34 UTC
I experienced hitching and some stuttery gameplay in GNOME, Sway and Plasma Wayland as soon as the game's FPS rises above or drops below the screen's refresh rate.
Comment 19 Samuel 2022-05-15 09:55:30 UTC
(Gameplay in Plasma Wayland was smooth because VSync was forced.)
Comment 20 kodatarule 2022-05-19 18:43:29 UTC
Update as of Plasma 5.25 Beta (5.24.90):

On multiple displays it now properly detects the refresh rate and sets proper V-Sync, but it limits the framerate to the monitor's refresh rate (165 FPS at 165 Hz). Other than that it's definitely a step in the right direction, as performance is roughly on par with Xorg, give or take in some cases.
Comment 21 Raul José Chaves 2022-07-03 20:01:33 UTC
(In reply to Samuel from comment #7)
> I also noticed something interesting. In Fullscreen and Borderless modes,
> VSync is being forced in Plasma while the Windowed mode doesn't have this
> problem. Windowed mode has uncapped FPS similar to other compositors.

I'm not experiencing this. Guild Wars 2 is locking my FPS in both windowed and full-screen mode.

KDE on Xorg and Gnome on Wayland are working with uncapped FPS
Comment 22 kodatarule 2022-08-20 11:33:43 UTC
Any update on this? It's been a while, and this hinders performance a lot in some of the more demanding titles...
Comment 23 Samuel 2022-08-23 16:04:04 UTC
Is there anything that we can do to assist in fixing this bug, @Zamundaaa and @Erik? This is a "Wayland Showstopper"-tier bug for gamers running FPS/Shooter/Battle Royale games.
Comment 24 kodatarule 2022-08-23 17:07:16 UTC
(In reply to Samuel from comment #23)
> Is there anything that we can do to assist fixing this bug @Zamundaaa and
> @Erik? This is a "Wayland Showstopper"-tier bug for gamers running
> FPS/Shooter/Battle Royale games.

I second this. If something can be done from our side to speed it up, we would gladly assist. In case it helps: as reported in a previous duplicate thread, if you play a fullscreen video on your 2nd monitor, the game sort of gains performance (but still nowhere near as good as GNOME Wayland or KDE on Xorg), and there's this weird syncing going on with the game, causing a weird framerate.
I haven't found anything else linked to this bug.
Comment 25 Erik Kurzinger 2022-08-23 18:34:16 UTC
So I did look into this a bit further, and can provide some hopefully relevant information...

Firstly, I noticed that, unfortunately, direct scan-out for Xwayland applications doesn't work at all with our driver right now. This is an issue on our end, we're allocating compressed buffers which aren't eligible for scan-out. It should be pretty easy to fix, but that will need to be in an upcoming driver release.

Pertaining to the issue at hand, though, that implies Kwin's direct scan-out behavior is not relevant. Xwayland applications will always be composited.

To follow-up on Zamundaaa's earlier comment, the difference between Mesa and NVIDIA does indeed appear to be due to the fact that Mesa will use 4 buffers in its swapchain if vsync is disabled and the Present extension reports it's using the flipping path. Our driver, on the other hand, will only use two, always. Note that's only for X11 applications with Xwayland, for native Wayland applications we do actually use 3, and, at least in my testing, those aren't capped to the display refresh rate.

I experimented with increasing the number of swapchain buffers to 3, and using mailbox-style logic similar to Mesa, and this does resolve the issue. So that is an option.

However, I still question why Kwin is holding on to both buffers for the entire frame if it's not doing direct scan-out. Like, if we do two wl_surface_attach/damage/commits in a single vblank period, shouldn't the second one cause the first buffer to be released? That's what happens with weston, mutter, wlroots, etc. but apparently not with Kwin.

Now, once we do fix the compressed buffer thing enabling direct scan-out, maybe we will need to add a third swapchain image anyway. That's assuming that Kwin's page flips will still be synchronized to the display refresh rate, which I believe is the case (correct me if I'm wrong).

But is that really the solution users want? Sure, it would technically let the game run at an uncapped refresh rate, but as I understand it the main reason people want to run games that way is to minimize input latency, and they're willing to accept possible tearing as a trade-off (e.g. competitive gamers and whatnot). So would it maybe make more sense to have Kwin do tearing flips instead for direct scan-out applications? In *that* case I think 2 swapchain buffers on the driver side would still be fine, right?
Comment 26 Samuel 2022-08-23 18:57:50 UTC
(In reply to Erik Kurzinger from comment #25)
> But is that really the solution users want? Sure, it would technically let
> the game run at an uncapped refresh rate, but as I understand it the main
> reason people want to run games that way is to minimize input latency, and
> they're willing to accept possible tearing as a trade-off (e.g. competitive
> gamers and whatnot). So would it maybe make more sense to have Kwin do
> tearing flips instead for direct scan-out applications?

There are open MRs for allowing tearing in Wayland:
1. https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/65 ("Main blocker is, KMS doesn't support tearing due to atomic interface" - Simon Ser)
2. https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/103
Comment 27 Zamundaaa 2022-08-23 21:54:46 UTC
> Firstly, I noticed that, unfortunately, direct scan-out for Xwayland applications doesn't work at all with our driver right now. This is an issue on our end, we're allocating compressed buffers which aren't eligible for scan-out. It should be pretty easy to fix, but that will need to be in an upcoming driver release
There's also https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/818 to resolve that properly, so that Xwayland clients can make use of dmabuf feedback indirectly. With that it should be possible to use compressed buffers where beneficial without sacrificing direct scanout.

> I still question why Kwin is holding on to both buffers for the entire frame if it's not doing direct scan-out
I looked at where buffers are referenced and unreferenced, and surface pixmaps are only updated when KWin is actually painting. So it only releases a buffer that was used for compositing right before the next frame is rendered.

This can be fixed but I don't think there's a real upside to it - because of direct scanout, you need at least 3 buffers for fifo and 4 for mailbox anyways.
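
As a rough illustration of why two client buffers end up capped in the composited case, here is a toy model of the release behaviour described above. It is a sketch under stated assumptions, not KWin code: the compositor paints once per refresh interval, keeps the buffer it last painted until the next paint that picks up a newer one, and a committed buffer that is replaced before ever being painted is released immediately (mailbox style). All names and the 2 ms render time are invented for the example.

```python
def simulate_composited(num_buffers, render_us=2_000, refresh_us=16_666,
                        duration_us=1_000_000):
    """Frames a client commits when buffers are released only at paint time."""
    free = num_buffers   # buffers the client may render into
    painted = 0          # 1 if the compositor's pixmap references a buffer
    pending = 0          # 1 if a committed buffer waits for the next paint
    next_paint = refresh_us
    now = 0
    frames = 0

    def run_due_paints():
        nonlocal free, painted, pending, next_paint
        while next_paint <= now:
            if pending:
                free += painted          # old pixmap buffer is released only now
                painted, pending = 1, 0
            next_paint += refresh_us

    while now < duration_us:
        run_due_paints()
        if free == 0:
            now = next_paint             # client stalls until the next paint
            continue
        free -= 1
        now += render_us                 # client renders into a free buffer
        if now > duration_us:
            break
        run_due_paints()
        if pending:
            free += 1                    # unpainted buffer is replaced and freed
        pending = 1
        frames += 1
    return frames


print(simulate_composited(2), simulate_composited(3))
```

In this model a client with 2 buffers settles at about one frame per refresh after a short start-up burst, while a client with 3 buffers is limited only by its render time. That is consistent with the observation in comment 25 that a third swapchain buffer resolves the cap when no direct scanout is involved.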

> Now, once we do fix the compressed buffer thing enabling direct scan-out, maybe we will need to add a third swapchain image anyway
3 buffers isn't enough. In the time between the compositor submitting the next frame to drm and the associated page flip, one buffer is used for direct scanout and another is imported and locked for the next frame. If you commit a new buffer in that time the compositor can't release either buffer yet, so it keeps all 3 buffers locked until the page flip happens.
This problem is made a lot worse by most compositors needing to do compositing in the middle of a refresh cycle (on weak hardware even earlier) to reliably ensure no frames get dropped, so you'd potentially block for quite some time with only 3 buffers in the client.

To solve this problem there are effectively two options. The drm API could gain support for mailbox natively: the compositor could then replace the pending frame with the new buffer, and release the old one immediately without waiting for any page flips. This API is wanted a lot but not too likely to happen in the very near future. The other option is that the client allocates 4 buffers instead of 3.
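
The buffer accounting for direct scanout can be sketched the same way. This is a simplified model, not compositor code: it assumes one buffer is on screen, the compositor submits the latest committed buffer to drm at a fixed lead time before each vblank (`lead_us`, a made-up parameter standing in for mid-cycle compositing), the submitted buffer stays locked until the page flip, and the previously displayed buffer is released at that flip.

```python
def simulate_direct_scanout(num_buffers, render_us=2_000, refresh_us=16_666,
                            lead_us=8_333, duration_us=1_000_000):
    """Return (frames committed, microseconds the client spent stalled)."""
    free = num_buffers
    scanout = 0    # 1 if a buffer is currently on screen
    flipping = 0   # 1 if a buffer was submitted and awaits its page flip
    latest = 0     # 1 if a committed buffer waits for the next submission
    next_submit = refresh_us - lead_us
    next_flip = refresh_us
    now = frames = stalled_us = 0

    def run_due_events():
        nonlocal free, scanout, flipping, latest, next_submit, next_flip
        while min(next_submit, next_flip) <= now:
            if next_submit < next_flip:
                if latest:               # lock the newest buffer for this flip
                    flipping, latest = 1, 0
                next_submit += refresh_us
            else:
                if flipping:             # flip done: old scanout buffer is freed
                    free += scanout
                    scanout, flipping = 1, 0
                next_flip += refresh_us

    while now < duration_us:
        run_due_events()
        if free == 0:                    # nothing to render into: wait for an event
            wake = min(next_submit, next_flip)
            stalled_us += wake - now
            now = wake
            continue
        free -= 1
        now += render_us
        if now > duration_us:
            break
        run_due_events()
        if latest:
            free += 1                    # mailbox: replaced buffer is released
        latest = 1
        frames += 1
    return frames, stalled_us


print(simulate_direct_scanout(3), simulate_direct_scanout(4))
```

With 3 buffers the modelled client stalls in every cycle between the submission and the page flip, because the scanout, locked and newly committed buffers are all held. With 4 buffers it never stalls, which is the second option described above.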

Also see https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15786 for some more discussions about this.

> That's assuming that Kwin's page flips will still be synchronized to the display refresh rate, which I believe is the case (correct me if I'm wrong).
That is correct for all compositors that I know of. Currently there is nothing else one can do with atomic modesetting, and falling back to legacy is not really an option.

> But is that really the solution users want? Sure, it would technically let the game run at an uncapped refresh rate, but as I understand it the main reason people want to run games that way is to minimize input latency, and they're willing to accept possible tearing as a trade-off (e.g. competitive gamers and whatnot). So would it maybe make more sense to have Kwin do tearing flips instead for direct scan-out applications?
Absolutely not. Not all gamers want tearing and mailbox still presents a (small) latency advantage vs FIFO. We also really do not want to restrict direct scanout to games.

> In *that* case I think 2 swapchain buffers on the driver side would still be fine, right?
Yes, but also no. The problem is that the compositor will not necessarily actually do tearing when the client wants it (if the window isn't focused, if the user disabled it, if the buffer format / modifier isn't usable for tearing, or whatever), so you must allocate sufficient buffers for mailbox or you risk stalling the application.
Comment 28 Samuel 2022-08-24 05:50:33 UTC
(In reply to Erik Kurzinger from comment #25)
> It should be pretty easy to fix, but that will need to be in an
> upcoming driver release.

Just to get it clarified, direct scan-out will be fixed by next driver update and capped FPS in KWin will be fixed in the one after the next update?
Comment 29 kodatarule 2022-09-17 22:57:01 UTC
(In reply to Samuel from comment #28)
> (In reply to Erik Kurzinger from comment #25)
> > It should be pretty easy to fix, but that will need to be in an
> > upcoming driver release.
> 
> Just to get it clarified, direct scan-out will be fixed by next driver
> update and capped FPS in KWin will be fixed in the one after the next update?

That would be a good idea. Is the NVIDIA driver going to get the direct scan-out fix first, and after that the Mesa-style mailbox logic along with the correct number of swapchain buffers, or will there be a different workaround for this bug?
Comment 30 kodatarule 2022-10-21 23:42:26 UTC
Any ETA on when this might be resolved? Note: the forced VSync is not the issue. The real problem is that with multiple monitors it cuts the game's performance by 70%, which is more than half, and on demanding titles this is a huge show stopper...
Comment 32 Samuel 2022-12-19 05:51:45 UTC
(In reply to Erik Kurzinger from comment #25)
> It should be pretty easy to fix, but that will need to be in an
> upcoming driver release.

I updated to the latest stable NVIDIA proprietary driver version, 525.60.11, and VSync is still forced in games in fullscreen and borderless modes in Plasma. Is a fix for this planned?
Comment 33 kodatarule 2022-12-24 06:44:08 UTC
Perhaps this should be reported on the NVIDIA forums and tracked there; maybe that would help in finding a solution for this issue.
Comment 34 Samuel 2022-12-28 16:01:57 UTC
Here is the latest news on this bug:
"Hi, I’m currently working on a driver-side change that should resolve the forced vsync issue, as well as enable direct scan-out for eligible Xwayland applications.

Work on Gamma LUT support is also in progress. The implementation is currently under internal code-review."

- Erik Kurzinger
Comment 35 Samuel 2022-12-28 16:02:49 UTC
Here is the NVIDIA Developer forum link for this bug : https://forums.developer.nvidia.com/t/nvidia-bug-kde-wayland-games-are-force-vsynced/237880
Comment 36 Nate Graham 2023-01-02 21:16:01 UTC
Thanks!
Comment 37 Samuel 2023-06-19 16:10:19 UTC
Update: This issue will be fixed in the upcoming NVIDIA 545 Driver (https://forums.developer.nvidia.com/t/nvidia-bug-kde-wayland-games-are-force-vsynced/237880/21).