Bug 481716 - KWin drops frames on nvidia (Wayland)
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: 6.0.0
Platform: Arch Linux (Linux)
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: qt6
Depends on:
Blocks:
 
Reported: 2024-02-23 11:09 UTC by fililip
Modified: 2024-03-15 04:49 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description fililip 2024-02-23 11:09:16 UTC
SUMMARY
There seem to be weird frame drops when using KWin on an Nvidia dGPU (on Wayland).
They occur whether adaptive sync is on or off.

STEPS TO REPRODUCE
1. Move the cursor in a circular fashion
or
2. Launch testufo.com

OBSERVED RESULT
In the case of cursor motion, some cursor images are "dropped" (visible through persistence of vision; easier to see at 120 Hz and above).
With UFOTest, stuttering is noticeable.
(Does not happen on AMD/Intel)

EXPECTED RESULT
Everything should be smooth.

SOFTWARE/OS VERSIONS
Arch Linux 
KDE Plasma Version: 5.93.0
KDE Frameworks Version: 5.249.0
Qt Version: 6.7.0
Kernel Version: 6.7.5-arch1-1 (64-bit)
Graphics Platform: Wayland

ADDITIONAL INFORMATION
I use the nvidia-open-beta modules and nvidia-utils-beta (550.40.07) because my GPU does not work with the 545 series (the driver is too old). KWin was launched with the DRM device env var set to just the nvidia card (/dev/dri/card1, since I also have an AMD GPU in my PC).
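
A minimal sketch of that launch setup. The report does not name the variable; this assumes it is KWin's `KWIN_DRM_DEVICES`, which takes a colon-separated list of DRM device nodes. The helper name `kwin_env` is hypothetical and only builds the environment, it does not start a session.

```python
import os


def kwin_env(devices, base_env=None):
    """Return a copy of the environment with KWIN_DRM_DEVICES set.

    Assumption: KWIN_DRM_DEVICES is the variable meant in the report.
    `devices` is a list of DRM nodes such as "/dev/dri/card1".
    """
    env = dict(os.environ if base_env is None else base_env)
    # KWin expects the device paths joined with ":".
    env["KWIN_DRM_DEVICES"] = ":".join(devices)
    return env


# Restrict KWin to the nvidia card only, as described in the report.
env = kwin_env(["/dev/dri/card1"], base_env={})
print(env["KWIN_DRM_DEVICES"])
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when starting the session from a TTY.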

Is this something that can be fixed in KWin or is it an nvidia specific bug?
Comment 1 fililip 2024-02-29 16:21:47 UTC
Note: after I set my minimum CPU frequency to 1750 MHz, the frame drops no longer occur with cursor motion, but there are observable CPU usage spikes on one thread whenever anything is happening on the screen.
Comment 2 fililip 2024-02-29 18:44:31 UTC
The stutters still occur on Plasma 6.0.0. Will the explicit sync protocol address this? (I do not have a problem with apps glitching out on xwayland, just desktop wide stutters, even in native Wayland apps.)
Comment 3 fililip 2024-03-03 16:22:09 UTC
I have observed something in dmesg when using the Nvidia card with KWin:

[  303.194078] [drm] [nvidia-drm] [GPU ID 0x00000c00] Framebuffer memory not appropriate for scanout
[  303.194258] [drm] [nvidia-drm] [GPU ID 0x00000c00] Framebuffer memory not appropriate for scanout
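
To check how often and when such messages appear, the kernel log can be filtered mechanically. The sketch below parses lines in the format shown above; the regular expression and the function name `parse_nvidia_drm` are illustrative assumptions based only on these two lines, not on any documented nvidia-drm log format.

```python
import re

# Pattern inferred from the two dmesg lines quoted in this report.
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\]\s+\[nvidia-drm\]\s+"
    r"\[GPU ID (?P<gpu>0x[0-9a-fA-F]+)\]\s+(?P<msg>.*)$"
)


def parse_nvidia_drm(lines):
    """Return (timestamp_seconds, gpu_id, message) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((float(m.group("ts")), m.group("gpu"), m.group("msg")))
    return events


sample = [
    "[  303.194078] [drm] [nvidia-drm] [GPU ID 0x00000c00] Framebuffer memory not appropriate for scanout",
    "[  303.194258] [drm] [nvidia-drm] [GPU ID 0x00000c00] Framebuffer memory not appropriate for scanout",
]
events = parse_nvidia_drm(sample)
for ts, gpu, msg in events:
    print(ts, gpu, msg)
```

Comparing the timestamps against the moments when stutters are observed would show whether the two are correlated.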
Comment 4 Zamundaaa 2024-03-04 11:56:58 UTC
> Will the explicit sync protocol address this?

No, it's only about app<->KWin synchronization; synchronization with the kernel is ensured by KWin code.

> Framebuffer memory not appropriate for scanout

Are there any specific situations where that happens? It's pretty much expected that you get some of these when making apps fullscreen.
Comment 5 fililip 2024-03-04 12:11:13 UTC
It only happens when running startplasma-wayland with the env var set, or when connecting a display to the nvidia card. It gets displayed only twice, and making apps fullscreen doesn't trigger it again. I just thought it might be relevant to the stutters.
Comment 6 Zamundaaa 2024-03-04 12:45:24 UTC
If it only happens on startup, it shouldn't be relevant.
I can think of two reasons this could happen:
1. if atomic commits take a long time, the commit thread can miss frames - this was an issue with amdgpu for a while, so maybe NVidia is affected too. I'll have to measure how long committing takes on NVidia
2. the timing is very close; in that case https://invent.kde.org/plasma/kwin/-/merge_requests/5349 should fix this
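
Both hypotheses come down to the same arithmetic: the work done between the start of compositing and the vblank has to fit into the lead time the compositor gave itself. The sketch below is a deliberately simplified model of that reasoning, not KWin's actual scheduling code; the function names and all numbers are illustrative assumptions, not measurements.

```python
def frame_budget_ms(refresh_hz):
    """Time between two vblanks, in milliseconds."""
    return 1000.0 / refresh_hz


def misses_vblank(lead_time_ms, render_ms, commit_ms):
    """True if rendering plus the atomic commit overruns the lead time.

    lead_time_ms: how long before vblank the compositor starts working.
    render_ms:    time spent rendering the frame.
    commit_ms:    time the atomic commit takes in the kernel driver.
    """
    return render_ms + commit_ms > lead_time_ms


print(round(frame_budget_ms(120), 2))   # frame interval at 120 Hz
# Case 1: a slow commit pushes the frame past the deadline.
print(misses_vblank(lead_time_ms=2.0, render_ms=1.2, commit_ms=1.0))
# A fast commit with the same lead time still makes it.
print(misses_vblank(lead_time_ms=2.0, render_ms=1.2, commit_ms=0.5))
```

In this model, case 1 corresponds to `commit_ms` being larger than expected, and case 2 to `lead_time_ms` being chosen with too little margin.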
Comment 7 Zamundaaa 2024-03-04 14:56:42 UTC
I measured commit times, and while they're not amazing, they're far below 1ms (which is not the case for Intel) so I doubt this is the problem.
Can you test https://invent.kde.org/plasma/kwin/-/merge_requests/5349?
Comment 8 fililip 2024-03-04 16:02:57 UTC
> (which is not the case for Intel)

Interesting. I did not have these problems on an Arc A750 at all, even though I was also doing dual-GPU back when I still had it.

What's more, these stutters do not happen on gnome/xorg, but they do on kwin/xorg. They also happen in gamescope (so I doubt the MR you linked me to will do anything about it).

Using the nvidia vulkan driver to render xwayland/native wayland games and presenting them on a screen plugged into the AMD card works just fine; I don't seem to experience the xwayland explicit sync issue even there.

I also noticed very high CPU usage when compositing on the nvidia card, compared to when off-loading. This is probably the strangest part.

I really hope this is an early adopter problem, as this is a new RTX 4070 Super, which was not even properly referred to as such by the nvidia driver (it said Unknown Graphics Device or something like that) before 550. The other thing is that it behaves the same way on nouveau, no improvement there (nouveau also seems not to support AMS and adaptive sync, that's interesting).

I'll try to test the MR soon. If it doesn't fix it, I'll try getting in touch with Nvidia (since I assume there is nothing more you guys can do about it).
Comment 9 fililip 2024-03-05 16:56:50 UTC
Alright, I tested kwin from master today and the issue still seems to exist: very high usage (33-40%) of one CPU thread and occasional stuttering when simply compositing with an app open (VRRTest).
I've also tried blacklisting amdgpu and turning 64-bit decoding on and off in UEFI; nothing I do makes it work.

Additionally, I tested xfce4 (xorg), just for fun, and found out that it works just fine.

I'd say this is definitely Nvidia's fault. They seem to have issues with Wayland compositing in general. (At least for me.)
What's interesting is that using nouveau does not help.

I'll try to report the issue upstream. If nothing gets fixed, that's not a big deal for me, I still have the AMD card for presentation. At least offloading works just fine, and I have CUDA support.

Thank you very much for your effort and sorry for the trouble.

(I have another observation: multi-monitor VRR doesn't seem to work with Nvidia. The stutters still happen with one screen (it's ok with fullscreen though, but the CPU usage is frustrating nonetheless), but VRR breaks completely for me if I attach my non-VRR secondary display to the mix. This is probably not related to the stuttering bug though, so feel free to close this one.)
Comment 10 Nate Graham 2024-03-15 04:49:03 UTC
Thanks for the follow-up!