Bug 490358 - Jerky/stuttering graphics in Plasma 6 with Intel GPU
Summary: Jerky/stuttering graphics in Plasma 6 with Intel GPU
Status: REOPENED
Alias: None
Product: kwin
Classification: Plasma
Component: performance
Version: 6.1.2
Platform: Kubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-16 13:57 UTC by Michael Marley
Modified: 2024-08-05 23:41 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
KWin performance statistics during display stuttering (362.12 KB, text/csv)
2024-07-25 18:00 UTC, Michael Marley
KWin performance statistics during display stuttering at lower CPU load (495.29 KB, text/csv)
2024-07-26 13:29 UTC, Michael Marley

Description Michael Marley 2024-07-16 13:57:31 UTC
SUMMARY
After upgrading from Plasma 5 to Plasma 6 (currently 6.1.2), I am experiencing jerky animation and frame drops in applications that were previously smooth.  I have seen this on Intel GPUs from multiple generations (including but not limited to Broadwell and Coffee Lake), but I have not seen it on an AMD GPU.

STEPS TO REPRODUCE
1. Start Plasma 6 (Wayland session)
2. Launch vkcube-wayland
3. Watch it for a bit.  Sometimes waving the cursor or dragging another window around will induce the jerkiness.

OBSERVED RESULT
The framerate occasionally falls to 30fps for very short intervals and/or frames are dropped, resulting in jerky motion.

EXPECTED RESULT
The framerate should stay at 60fps and frames should not be dropped.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Linux 6.10
KDE Plasma Version: 6.1.2
KDE Frameworks Version: 6.4.0
Qt Version: 6.6.2

ADDITIONAL INFORMATION
I think this may have something to do with the removal of the kwin latency setting and the autodetection of latency, as the jerkiness and stuttering I am seeing now closely mirrors what I would see with Plasma 5 if I set the latency to anything lower than maximum.
Comment 1 Zamundaaa 2024-07-16 16:16:55 UTC
Sounds like cursor updates are causing atomic commits to be too late. Does setting KWIN_FORCE_SW_CURSOR=1 improve the situation?
Comment 2 Michael Marley 2024-07-16 23:22:05 UTC
That doesn't seem to have any effect; I still get the framerate slowdowns and frame drops.
Comment 3 Michael Marley 2024-07-23 17:47:01 UTC
I've been investigating this further, and I built KWin 6.1.3 with https://invent.kde.org/plasma/kwin/-/commit/d7385d441417d8f43cfb09341b5c0ae449ccd219 and https://invent.kde.org/plasma/kwin/-/commit/ce1ba4252552c7f314149e8be87d5d63fca0210c cherry-picked, since they seemed to be related to graphics performance.  Much to my surprise, that actually made my issues significantly worse, with the framerate regularly falling to ~30fps for several seconds at a time.  I then tried disabling triple buffering (export KWIN_DRM_DISABLE_TRIPLE_BUFFERING=1).  That made the issue much, much better, with significantly fewer frame drops and no more ~30fps periods.  So it seems that when KWin tries to use triple buffering, it drops to ~30fps for some reason.  This seems counterintuitive, because triple buffering is supposed to improve performance, but that is nevertheless what I am seeing.
Comment 4 Zamundaaa 2024-07-24 16:36:26 UTC
That's a good data point.

*** This bug has been marked as a duplicate of bug 488843 ***
Comment 5 Michael Marley 2024-07-25 17:16:58 UTC
The fix in https://bugs.kde.org/show_bug.cgi?id=488843 does eliminate the periodic drops to 30fps, but I'm still seeing some stuttering that wasn't there with the latency increased to maximum on 5.x.  I understand that I'm pushing this little integrated GPU to its limits by asking it to drive a 4K screen, but the ability to manually set the latency definitely improved the smoothness.
Comment 6 Zamundaaa 2024-07-25 17:43:45 UTC
Is that even with KWIN_DRM_DISABLE_TRIPLE_BUFFERING=1, or only without it?
Comment 7 Michael Marley 2024-07-25 17:45:05 UTC
It happens no matter whether I disable triple buffering or not.  There doesn't seem to be any difference between modes.
Comment 8 Zamundaaa 2024-07-25 17:51:16 UTC
Okay, then please set KWIN_LOG_PERFORMANCE_DATA=1, reboot, trigger the stutter for a bit, and then upload the .csv file with performance data that KWin creates in your home folder.
Comment 9 Michael Marley 2024-07-25 18:00:33 UTC
Created attachment 171989 [details]
KWin performance statistics during display stuttering

Sure, here's the CSV file.  While capturing this data, I started the system, launched Firefox, launched IntelliJ, shut down both of those applications, and then logged off.  While launching Firefox and especially while launching IntelliJ, it was stuttering quite badly.
Comment 10 Zamundaaa 2024-07-26 11:03:47 UTC
Hmm, effectively all dropped frames are caused by late commits, which means KWin is missing some deadlines on the CPU side.
When you start the apps, does that perhaps come with a lot of CPU usage?
Comment 11 Michael Marley 2024-07-26 13:29:36 UTC
Created attachment 172006 [details]
KWin performance statistics during display stuttering at lower CPU load

Good point, the CPU usage is rather high during that period, though I still get some frame drops even when none of the CPUs are pegged.  Regardless, I captured more performance statistics to hopefully make it more obvious.  This time, I booted the system, started IntelliJ, and waited for the CPU to calm down.  I then launched vkcube-wayland (just to make sure that kwin tried to render at 60fps continuously) and started typing in IntelliJ, which also reproduces the frame drops.  That happened closer to the end of the capture.  The CPU usage was surprisingly high for just typing in an IDE, but it wasn't close to being pegged on any core.

For what it is worth, I'm still pretty sure this didn't happen with 5.x.
Comment 12 Michael Marley 2024-07-26 13:45:18 UTC
For what it is worth, I also tried it with the "performance" CPU governor and I see no difference in the frame drops.
Comment 13 Zamundaaa 2024-07-26 14:29:14 UTC
This capture has about half as many dropped frames as the first one, so that's something at least.

> and started typing in IntelliJ, which also reproduces the frame drops
That probably does autocomplete predictions, which do take some CPU temporarily. If the scheduler is annoying, it might cause the issue...

As in both cases the vast majority of dropped frames are from missing the commit deadline, we probably need to increase the 1.5ms from https://invent.kde.org/plasma/kwin/-/blob/457b3a47ffe91b335ab18170cfb830d35217b9a8/src/backends/drm/drm_commit_thread.cpp#L296 for you.
Could you compile KWin and see if that actually helps? I have an idea for how to automatically decide when that needs to be increased, but I don't really want to blindly make changes without knowing if it's really the problem.
Comment 14 Michael Marley 2024-07-26 15:09:40 UTC
Sure, I can do that.  Do you have a suggestion on what value I should try, or should I just increase it until I don't see any more frame drops?
Comment 15 Michael Marley 2024-07-29 15:00:51 UTC
I ended up trying a few different values.  I started by doubling it to 3000us, which had no effect.  I then doubled it again to 6000us, which had very little or no effect.  I then used 15003us, which is, assuming my understanding of the kwin 5 code is correct, the value that was used in kwin 5 when maximum latency had been configured.  With this value, I get almost no frame drops even while starting Firefox and IntelliJ.

For kicks, I also tried renicing kwin to -20, which had no effect.  I also set it to round-robin scheduling, which also had no effect.
Comment 16 Michael Marley 2024-08-01 17:57:26 UTC
I've done quite a bit more testing and have made some interesting and hopefully relevant findings.  After determining that raising the safetyMargin to 15003ms eliminated most of the frame drops, I attempted to restore the latency configuration that was removed in https://invent.kde.org/plasma/kwin/-/merge_requests/4408, with an additional default option, "Automatic", that uses the current behavior.  I did that successfully (including verifying that the expectedCompositingTime values were actually being set as intended).  However, that had absolutely no effect on the frame drops, so I further modified it to set an expectedCompositingTime so high that it would effectively force KWin to start rendering the next frame as soon as the previous one had been displayed.  That also had no effect.

I then realized that safetyMargin is also used in drm_commit_thread.cpp, independently of its use in renderloop.cpp.  Having determined that, I changed renderloop.cpp to hardcode an addition of 1500us instead of using safetyMargin, and then set safetyMargin to 15003us again.  This behaves the same as, or very similarly to, the previous test where I only set safetyMargin to 15003us without hardcoding the old value in renderloop.cpp, indicating that the frame drops I am seeing are caused by something in drm_commit_thread.cpp.  I'm not well-versed enough in the code to understand the implications of that, however.
Comment 17 Michael Marley 2024-08-01 18:09:50 UTC
(When I said 15003ms, I actually meant 15003us, sorry.)
Comment 18 Michael Marley 2024-08-01 19:29:07 UTC
I just made a startling discovery.  The kwin packaging for Debian/Ubuntu (which I had been using all along) doesn't set CAP_SYS_NICE on kwin_wayland, so the attempt to set a higher priority for the input and output threads was of course failing.  Setting CAP_SYS_NICE manually alone makes a minor difference, but when combined with increasing the safetyMargin to 3000us, it almost entirely stops the frame dropping.

I know one of the maintainers, so I'm going to ask him why CAP_SYS_NICE isn't set.
Comment 19 Zamundaaa 2024-08-02 13:14:56 UTC
Okay, that's good. I mean, it's not good that you need a safety margin of 3ms, but I think it's possible to detect that with some generic code in KWin.