Bug 499609 - Cursor becomes VERY sluggish after a random amount of time
Summary: Cursor becomes VERY sluggish after a random amount of time
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: performance (other bugs)
Version First Reported In: 6.2.4
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-02-06 19:38 UTC by Evert Vorster
Modified: 2025-02-10 13:23 UTC

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Color profile I am using (9.64 KB, application/vnd.iccprofile)
2025-02-06 19:38 UTC, Evert Vorster

Description Evert Vorster 2025-02-06 19:38:56 UTC
Created attachment 178026
Color profile I am using

SUMMARY
Sometimes the cursor becomes very sluggish, with about one position update per second. The only way to fix this was a reboot. 
I can't see anything wrong with the system: nothing untoward in the journal or dmesg, no high load on any CPU core, and both GPUs are idling. There is absolutely no reason it should be doing this...

STEPS TO REPRODUCE
1. Use laptop with dual GPU, and high resolution display. Wayland session, custom ICC color profile, and set display resolution to non-native resolution.
2. Wait. Sometimes minutes, sometimes hours. 

OBSERVED RESULT
Cursor becomes VERY sluggish. Less than one update per second. 

EXPECTED RESULT
Cursor should never become sluggish. 

SOFTWARE/OS VERSIONS
Operating System: Arch Linux 
KDE Plasma Version: 6.2.4
KDE Frameworks Version: 6.8.0
Qt Version: 6.8.0
Kernel Version: 6.12.10-arch1-1.1-g14 (64-bit)
Graphics Platform: Wayland
Processors: 32 × AMD Ryzen 9 7945HX3D with Radeon Graphics
Memory: 62.0 GiB of RAM
Graphics Processor: AMD Radeon 610M
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: ROG Strix G733PYV_G733PYV
System Version: 1.0

ADDITIONAL INFORMATION
Some background info, not sure if any of this will help. 
I pinned this Arch Linux system to the package state from the day before Python 3.13 was released on Arch (early December), for reasons. That explains why I am running such a relatively old version of kwin for an Arch install.
I normally have a Wayland session, and use the fractional scaling to get my high resolution monitor to be readable by my old eyes. 
I was running into a kernel oops in the AMD driver, similar to this one: https://bbs.archlinux.org/viewtopic.php?id=300299
There are two ways around this issue: one is to set a custom ICC color profile, and the other is to boot with the kernel parameter amdgpu.dcdebugmask=0x10.
For the longest time I was running with the kernel parameter, but then I got curious and found the ICC profile attached to this bug. I switched that on and the kernel parameter off. The error stayed away, and the only weirdness I saw was that the screen would sometimes be corrupted on the panel around the clock. Welp, turn off transparency, and all seemed OK.

Now, I am also playing some games, and the scaled monitor reports odd resolutions to the games; I would like them to see exactly 1920x1080. Unfortunately, no scaling option exists that passes this resolution through to games. (My native resolution is 2560x1440, exactly one third more, which cannot be expressed as a whole-number percentage. Math is cruel.)
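
The arithmetic here can be checked quickly. What follows is a minimal sketch in Python, assuming the goal is a logical desktop size of exactly 1920x1080 on the 2560x1440 panel; the function name required_scale is purely illustrative and not part of any KDE tool.

```python
from fractions import Fraction


def required_scale(native: int, target: int) -> Fraction:
    """Scale factor that maps `native` physical pixels to `target` logical pixels."""
    return Fraction(native, target)


# Resolutions taken from the report: native 2560x1440, desired 1920x1080.
scale_w = required_scale(2560, 1920)
scale_h = required_scale(1440, 1080)

# Both axes reduce to the same exact fraction.
print("scale:", scale_w, scale_h)

# Expressed as a percentage, the factor is a repeating decimal,
# so it has no exact whole-number representation.
percent = scale_w * 100
print("percent is a whole number:", percent.denominator == 1)
```

Rounding the factor to 133% would give a logical width of roughly 1925 pixels rather than 1920, which is why no whole-percent setting lands exactly on 1920x1080.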
It is important to note that up to this point, I have not seen the cursor slow down issue. 

I then decided it would be a good idea to just set the screen resolution to 1920x1080, and scaling to 100%. It seemed ideal, because the games would get the nice powers-of-two resolutions, and I could see no downsides to this approach. (I can't see the extra resolution on my monitor anyways... as I mentioned, I am old.)

Unfortunately, about two hours into a session, my cursor became extremely sluggish. I checked CPU loads, and none were busy. I checked dmesg and found nothing untoward; same with the journal. I checked the CPUs with mission-control and ran glances in a terminal, and neither showed anything untoward.
I logged out of the session, and the cursor was still lagging, so it was not anything running as my user.
The only thing that had changed was running at a non-native resolution, and this does not make any sense to me!

Rebooting the laptop brought temporary relief, but sometimes it would only last minutes before the cursor became sluggish again. Tried cold-booting it. 

When running native resolutions and scaling, the issue does not show up. But, I want the perfect power of two resolutions, so I am looking for a solution where I can have a non-native resolution and not have the cursor become sluggish. 

I have now plugged in the amdgpu.dcdebugmask=0x10 kernel parameter, and have not seen the issue since, but I find it hard to believe that this could be a driver issue. I will continue checking, of course. That kernel parameter disables PSR on the amdgpu, and I don't like it, as it messes up my power management on the laptop, which is why I went over to the ICC profile. 
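
To confirm on a given boot whether the parameter is actually in effect, the kernel command line can be inspected. This is a minimal sketch in Python, assuming the parameter is passed on the kernel command line as written above and that bit 0x10 is the PSR-disable bit, as stated in this report; the helper names are hypothetical.

```python
import re
from typing import Optional

# Bit value quoted in this report as disabling PSR on amdgpu.
PSR_DISABLE_BIT = 0x10


def dcdebugmask_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.dcdebugmask value found on a kernel command line, or None."""
    match = re.search(r"(?:^|\s)amdgpu\.dcdebugmask=(\S+)", cmdline)
    if match is None:
        return None
    # Base 0 accepts both hexadecimal (0x10) and decimal (16) spellings.
    return int(match.group(1), 0)


def psr_disabled(cmdline: str) -> bool:
    """True if the command line sets the PSR-disable bit in amdgpu.dcdebugmask."""
    mask = dcdebugmask_from_cmdline(cmdline)
    return mask is not None and bool(mask & PSR_DISABLE_BIT)


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("PSR disabled via dcdebugmask:", psr_disabled(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Recording this result alongside the time the cursor slows down would make it easier to tell which boots had PSR disabled.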

Any ideas of what to check the next time this shows up would be greatly appreciated. Also, if this rings any bells, point me in the direction of the literature and I will happily jump down that rabbit hole.
Comment 1 Zamundaaa 2025-02-06 20:49:47 UTC
> I have now plugged in the amdgpu.dcdebugmask=0x10 kernel parameter, and have not seen the issue since, but I find it hard to believe that this could be a driver issue
Why do you find that hard to believe?

https://gitlab.freedesktop.org/drm/amd/-/issues/2858 is identical to this; I'm not sure why it got closed but you can create a new issue about this at https://gitlab.freedesktop.org/drm/amd/-/issues.
In general amdgpu's PSR support is a bit broken with the latest kernels, I too have it disabled on my laptop to work around a similar issue :/
Comment 2 Evert Vorster 2025-02-07 05:16:21 UTC
Don't just close a ticket without confirming, please.

I read the linked bug report, and calling it identical is wrong. 
Some differences: 
1. The ticket describes the cursor being laggy on one screen basically all the time.
In my case the cursor speed is fine for a random amount of time. Could be minutes, could be hours. THEN it slows down.
2. The ticket describes multiple monitors; on my system it is a single monitor.
3. My issue shows up only when I have the magic trifecta of a non-native resolution, Wayland with a custom ICC profile, and no kernel parameters; the linked issue has it all the time.

I did quite a bit of searching through Google on this issue before I picked kwin as an entry point. I could not find anything remotely like this on the web. 

The reason I picked kwin as an entry point is that it uses Wayland and draws a software cursor.
What I would like from the fine folks here is some pointers on what to check when this bug shows up. Does kwin have a debug mode?
I'll go ahead and open a ticket with the amdgpu guys as well, and attack this issue from both ends.
Comment 3 Evert Vorster 2025-02-10 06:14:41 UTC
Cross-filed here: https://gitlab.freedesktop.org/drm/amd/-/issues/3950
Let's see where this bug actually resides!
Comment 4 Zamundaaa 2025-02-10 13:23:34 UTC
I did confirm that it's a driver bug, and minor differences from another bug report don't change that. If the problem goes away when you disable PSR, it can literally only ever be a kernel bug.
Userspace doesn't even know about PSR; all changes you make in userspace that happen to work around the bug presumably just change kernel heuristics about when to enable PSR.
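
One way to observe this from the outside is to read the driver's PSR state while changing userspace settings. The following is a rough sketch in Python, assuming the amdgpu driver exposes a psr_state file per eDP connector under debugfs; the exact path layout is an assumption that may differ between kernel versions, and reading debugfs normally requires root. The function name read_psr_states is hypothetical.

```python
import glob
import os
from typing import Dict


def read_psr_states(debugfs_dri: str = "/sys/kernel/debug/dri") -> Dict[str, str]:
    """Map each psr_state file found under debugfs_dri to its current contents.

    The <card>/eDP-*/psr_state layout is an assumption about amdgpu's debugfs.
    """
    states: Dict[str, str] = {}
    pattern = os.path.join(debugfs_dri, "*", "eDP-*", "psr_state")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                states[path] = f.read().strip()
        except OSError as err:
            # Keep going so one unreadable connector does not hide the others.
            states[path] = f"unreadable: {err}"
    return states


if __name__ == "__main__":
    found = read_psr_states()
    if not found:
        print("no psr_state files found (not root, or different layout)")
    for path, state in found.items():
        print(path, state)
```

Running this once while the cursor is fine and once while it is sluggish would show whether the reported state differs between the two situations.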

> The reason I picked kwin as an entry point is that it uses Wayland and draws a software cursor
It doesn't draw a software cursor unless you force it to. What you're seeing is the hardware cursor being buggy.