Bug 499560 - plasmashell allocates 500-1500 MB of VRAM if a monitor connects or disconnects
Status: REOPENED
Product: plasmashell
Classification: Plasma
Component: generic-performance
Version: 6.3.2
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: 1.0
Assignee: Plasma Bugs List
Reported: 2025-02-05 19:15 UTC by Kai Krakow
Modified: 2025-02-28 22:56 UTC
Description Kai Krakow 2025-02-05 19:15:08 UTC
SUMMARY

According to nvidia-smi, plasmashell leaks a lot of VRAM, even within a period of less than 20 hours. This also happens while the system is actively used, e.g. during gaming, and eventually leads to poor gaming performance, rendering problems, OBS render dropouts, and occasional application crashes. The instability has improved a lot since I last tried, but it still exists, and the following dmesg output can be observed:

[73453.409090] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[73453.409420] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[73453.426989] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[73453.427009] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[73469.422839] NVRM: Xid (PCI:0000:01:00): 69, pid=508775, name=chrome, Class Error: ChId 0056, Class 0000902d, Offset 0000023c, Data 00000000, ErrorCode 00000004
[73483.852230] NVRM: Xid (PCI:0000:01:00): 69, pid=508839, name=chrome, Class Error: ChId 0056, Class 0000902d, Offset 0000023c, Data 00000000, ErrorCode 00000004

But this is not the point here (nvidia-drivers should probably handle that better); it's just a side effect of plasmashell leaking VRAM at an exceptionally high rate while the system is idle.

I've used `kcmshell6 qtquicksettings` to switch between the automatic and the threaded render loop, then restarted to apply the change. With the threaded render loop, the rate at which plasmashell leaks VRAM seems to be 3-4 times lower. Thus I'm currently running with the threaded render loop.

STEPS TO REPRODUCE

1. Reboot into a Wayland session, open some Chrome windows and Steam, with one control bar per monitor
2. Check VRAM usage with nvidia-smi
3. Turn off the monitors, leave the system idle for 20 hours, then turn the monitors back on
4. Check VRAM usage with nvidia-smi again

OBSERVED RESULT

Wed Feb  5 01:23:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
|    0   N/A  N/A          469044      G   /usr/bin/plasmashell                    452MiB |
+-----------------------------------------------------------------------------------------+

Wed Feb  5 19:47:26 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
|    0   N/A  N/A          469044      G   /usr/bin/plasmashell                   1772MiB |
+-----------------------------------------------------------------------------------------+

VRAM allocations of plasmashell went from 452 MB to 1772 MB.

Overall VRAM usage went from 2062 MB to 3127 MB.
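
For anyone who wants to compare two nvidia-smi snapshots without reading the tables by hand, here is a minimal sketch in Python. It assumes the process rows look exactly like the two rows pasted above; the regular expression and the helper name `process_vram_mib` are illustrative and not part of any NVIDIA tooling. Other driver versions may lay out the table differently.

```python
import re

# Matches one process row of the nvidia-smi table as pasted in this report, e.g.
# |    0   N/A  N/A          469044      G   /usr/bin/plasmashell                    452MiB |
# Assumption: columns are GPU, GI, CI, PID, type, process name, memory in MiB.
ROW = re.compile(
    r"^\|\s*(?P<gpu>\d+)\s+\S+\s+\S+\s+(?P<pid>\d+)\s+(?P<type>\S+)\s+"
    r"(?P<name>.+?)\s+(?P<mib>\d+)MiB\s*\|$"
)


def process_vram_mib(smi_output: str, process_suffix: str = "plasmashell") -> int:
    """Sum the MiB column over all rows whose process name ends with process_suffix."""
    total = 0
    for line in smi_output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group("name").endswith(process_suffix):
            total += int(match.group("mib"))
    return total


# The two plasmashell rows from the snapshots above.
before = "|    0   N/A  N/A          469044      G   /usr/bin/plasmashell                    452MiB |"
after = "|    0   N/A  N/A          469044      G   /usr/bin/plasmashell                   1772MiB |"

growth = process_vram_mib(after) - process_vram_mib(before)
print(f"plasmashell VRAM growth: {growth} MiB")  # prints "plasmashell VRAM growth: 1320 MiB"
```

Passing the full text of an `nvidia-smi` run instead of a single row works the same way, since lines that do not match the row pattern are skipped.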

EXPECTED RESULT

plasmashell VRAM usage should not go far above the initial value, especially if the system is left otherwise idle.

SOFTWARE/OS VERSIONS

Operating System: Gentoo Linux 2.17
KDE Plasma Version: 6.2.5
KDE Frameworks Version: 6.9.0
Qt Version: 6.8.1
Kernel Version: 6.12.12-gentoo (64-bit)
Graphics Platform: Wayland
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 31.1 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3080 Ti/PCIe/SSE2

ADDITIONAL INFORMATION

I'm not sure whether Plasma notifications play a role here; I have multiple websites open which are allowed to display notifications. Also, I'm using the wallpaper-of-the-day feature, which changes the wallpaper every so often. The wallpaper probably changed only once, so it should not have a big impact.

Monitor setup, none rotated, HDR disabled, VRR auto:

1. Left, 16:18 2560x2880, plasma bar at the left
2. Center, 16:9 3840x2160, plasma bar at the bottom
3. Right, 16:9 3840x2160, plasma bar at the right
4. TV, clone of (2), 3840x2160
Comment 1 David Redondo 2025-02-06 15:05:13 UTC
Doesn't happen on my nvidia machine. Please note that you are running a Beta driver.
Comment 2 Kai Krakow 2025-02-06 21:51:18 UTC
Instead of just closing this as "works for me", how about stating which driver version you are using, so I can try it?

Closing it just because it works for you, without giving any hint as to what should work, is not really helpful.

But okay, let's reopen this after the driver becomes stable. I'm sure the problem will still exist then because it existed with previous stable drivers; this driver is just the first version with which I can seriously daily-drive Wayland.
Comment 3 David Redondo 2025-02-07 07:54:27 UTC
I am using 550
Comment 4 Kai Krakow 2025-02-07 14:45:20 UTC
Thanks. I may give 550 a try.

After using the system with high VRAM usage for a while, I noticed that plasmashell's VRAM allocations go down again to the original value.

So it's not a leak per se but high temporary usage. And it may be related to turning the monitors off and back on. I'll observe this with the monitors still off but logged in via SSH, and then report back.

So far, the increased VRAM usage of plasmashell has mostly been gone while using the system. As initially written, the issue is not about the driver complaining about allocation errors during high VRAM usage (that's a driver bug, and the driver should handle that better). This is purely about plasmashell allocating a lot of VRAM, which of course makes the driver's problem more apparent.
Comment 5 Kai Krakow 2025-02-07 15:41:03 UTC
I just logged in remotely. The monitors are off, and the VRAM usage is even below the "normal" usage:

|    0   N/A  N/A          848779      G   /usr/bin/plasmashell                    335MiB |

I'm not at home yet, so I cannot see how this changes the moment I turn the monitors back on. I will try each monitor individually.
Comment 6 Kai Krakow 2025-02-11 18:09:51 UTC
So I've watched plasmashell with `nvidia-smi pmon -s m`, and each time I turn a monitor off and back on, it adds about 500-1500 MB of VRAM allocations to plasmashell. The actual value is not consistent and fluctuates. There's probably some garbage collection going on in the driver in the background: after waiting some minutes and checking again, the VRAM allocation has slowly gone down again.
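
The observation above can be automated with a small Python sketch that polls `nvidia-smi pmon -s m` and reports jumps. Several things here are assumptions rather than facts from this report: that `pmon` prints a `#`-prefixed header naming columns such as `fb` and `command`, that `-c 1` limits it to a single sample, and that 500 MB is a sensible threshold (taken from the lower end of the range mentioned above). The function names are hypothetical.

```python
import subprocess
import time


def parse_pmon(text: str) -> list[dict[str, str]]:
    """Parse `nvidia-smi pmon` text into one dict per process row.

    Column names are read from the first '#' header line instead of being
    hard-coded, because the set of columns is assumed to vary by driver.
    """
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#"):
            if not columns:
                columns = line.lstrip("#").split()
            continue
        if not line.strip() or not columns:
            continue
        # Split so that the last column (the command) keeps embedded spaces.
        fields = line.split(None, len(columns) - 1)
        if len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows


def framebuffer_mb(rows: list[dict[str, str]], command: str = "plasmashell") -> int:
    """Sum the 'fb' column (assumed to be MB) for rows whose command matches."""
    total = 0
    for row in rows:
        name = row.get("command", "").strip().rsplit("/", 1)[-1]
        if name == command and row.get("fb", "-").isdigit():
            total += int(row["fb"])
    return total


def is_spike(previous_mb: int, current_mb: int, threshold_mb: int = 500) -> bool:
    """True if usage grew by at least threshold_mb between two samples."""
    return current_mb - previous_mb >= threshold_mb


def watch(interval_s: float = 5.0, threshold_mb: int = 500) -> None:
    """Poll nvidia-smi forever and print a line whenever plasmashell spikes."""
    previous = None
    while True:
        out = subprocess.run(
            ["nvidia-smi", "pmon", "-s", "m", "-c", "1"],
            capture_output=True, text=True, check=True,
        ).stdout
        current = framebuffer_mb(parse_pmon(out))
        if previous is not None and is_spike(previous, current, threshold_mb):
            print(f"{time.strftime('%H:%M:%S')} plasmashell VRAM jumped "
                  f"{previous} -> {current} MB")
        previous = current
        time.sleep(interval_s)
```

Calling `watch()` in a terminal (or over SSH) while switching a monitor off and on should log a timestamp for each jump, which makes it easier to correlate the spikes with monitor connect and disconnect events.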

But the amount of VRAM allocated - even temporarily - could be (and actually is) quite disruptive to whatever else is currently running. As an example, if my TV goes into deep sleep (which happens some hours after turning it off), it disconnects from the HDMI port and causes severe stutters for 20-30s in games, and some games do not even recover properly because some allocations may stay in system memory.

While we can probably say that the driver should manage memory in a smarter way and should be able to migrate memory back from system memory to VRAM, I'm still wondering why plasmashell allocates such a huge amount of VRAM when a monitor is disconnected or reconnected. This probably shouldn't happen in the first place.

I cannot quite follow the argument that this huge allocation is an issue with the driver, whether it's a beta driver or not. In the end, plasmashell is probably the one asking for VRAM. How the driver cleans up is a completely different matter.

If this issue persists with the stable driver, I'm going to request to reopen this report.

For now, I'm adjusting the title to better reflect what is happening.
Comment 7 Kai Krakow 2025-02-28 22:56:33 UTC
Reopening because the issue can be observed with the stable driver 570.124.04.

Bumping to 6.3.2 because I've also updated Plasma and the problem persists.

Disconnect a monitor and connect it again (my monitors disconnect if I turn them off; I know that some monitors don't do this), and plasmashell VRAM usage increases by 500-1500 MB, potentially disrupting the performance of currently running applications or even causing crashes due to VRAM exhaustion.

TBF, after waiting some minutes (5-20 minutes), the VRAM usage slowly decreases again. But it should not spike in the first place - at least not by such a vast amount just because a new monitor has been connected or one monitor has been disconnected.