Bug 490829 - Use of nvidia-smi causes jittery desktop performance with GSP enabled
Status: RESOLVED UPSTREAM
Alias: None
Product: ksystemstats
Classification: Frameworks and Libraries
Component: General
Version: unspecified
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Plasma Bugs List
Reported: 2024-07-25 19:01 UTC by That One Seong
Modified: 2024-08-15 14:13 UTC
CC List: 6 users

Description That One Seong 2024-07-25 19:01:23 UTC
SUMMARY
Using a System Monitor widget set to monitor NVIDIA GPU activity/stats causes nvidia-smi to be run repeatedly. This appears to be the cause of hitching in the Plasma Wayland session with NVIDIA driver versions 555 and 560.

STEPS TO REPRODUCE
1. Using an NVIDIA GPU without `nvidia.NVreg_EnableGpuFirmware=0` set, have a System Monitor sensor widget on a Plasma panel set to monitor said GPU.
2. Use the Wayland session and observe that window dragging and desktop effects are slower and noticeably choppy.
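
To confirm the precondition in step 1, the kernel command line can be checked for the parameter. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it only inspects the kernel command line. The same option can also be set through a modprobe configuration file, which this check would not see, so a `False` result does not prove that GSP firmware is enabled.

```python
PARAM = "nvidia.NVreg_EnableGpuFirmware"


def gsp_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if the given kernel command line contains PARAM=0.

    Only an exact `nvidia.NVreg_EnableGpuFirmware=0` token counts; any other
    value, or the absence of the parameter, returns False.
    """
    return f"{PARAM}=0" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On a Linux machine, `gsp_disabled_on_cmdline(read_cmdline())` returning `False` means the parameter is not set on the command line, which matches the setup described in step 1.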

OBSERVED RESULT
Plasma desktop performance is choppy, almost as if it were running at half the display refresh rate.

EXPECTED RESULT
Plasma desktop should be smooth.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Kernel 6.9
KDE Plasma Version: 6.1.5
KDE Frameworks Version: 6.4.0
Qt Version: 6.7.2

ADDITIONAL INFORMATION
Might be related to bug #487728, as it also only affects the Wayland session, though it is hard to say whether this also contributes to the Wayland plasmashell crashes. This was suggested by an NVIDIA contributor at https://github.com/NVIDIA/open-gpu-kernel-modules/issues/538#issuecomment-2251021404
Comment 1 Arjen Hiemstra 2024-08-06 11:03:41 UTC
The problem with using the suggested library is that its headers are in a proprietary SDK that cannot be freely distributed, which would make the NVIDIA GPU integration practically unbuildable on most machines. Even if we were to include the header in ksystemstats (which its license doesn't actually allow, though I see some projects do it), we'd still be stuck, since the library itself is bundled with the driver, which is generally also not installed on build machines.

So ultimately, running `nvidia-smi` is pretty much the only way we can support this without introducing a nasty build system issue. And frankly, it seems to me that it's an upstream issue anyway? Running `nvidia-smi` shouldn't have such an impact in the first place?
Comment 2 Milos Tijanic 2024-08-13 16:45:09 UTC
Hey there. Sorry, I misread the relevant code and thought that you were constantly spawning and killing the nvidia-smi process. IIUC that is not the case, you're just running `nvidia-smi pmon` in the background and parsing its output. This bypasses the common problem other monitoring tools had where the setup/teardown was causing the issue.
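
For readers unfamiliar with the approach described above, here is a rough Python sketch of parsing `nvidia-smi pmon`-style output. It is not the ksystemstats code; the function name is hypothetical, and the column layout is an assumption taken from whatever the first `#` header line says, since the exact columns can differ between driver versions.

```python
from typing import Dict, Iterable, List


def parse_pmon(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn pmon-style text lines into one dict per process row.

    Column names come from the header line whose first name is "gpu"
    (an assumption about the output format). Values are kept as strings
    because idle engines may be reported as "-".
    """
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            names = line.lstrip("#").split()
            # The first header names the columns; the units header is skipped.
            if names and names[0] == "gpu":
                columns = names
            continue
        if not columns:
            continue  # no header seen yet, so the row cannot be labelled
        # Limit the split so a command name containing spaces stays whole.
        fields = line.split(None, len(columns) - 1)
        if len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows
```

In a long-running monitor, the lines would typically come from the standard output of a single background process (for example one started with `subprocess.Popen(["nvidia-smi", "pmon"], stdout=subprocess.PIPE, text=True)`), read as they arrive rather than by starting a new process for every sample.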

But, I now see where just `nvidia-smi pmon` can be a cause of stutter, because it is fetching a lot of data out of the GSP via NV2080_CTRL_CMD_PERF_GET_GPUMON_PERFMON_UTIL_SAMPLES_V2. Switching to NVML would not fix this. I'll have to look a bit deeper and see if there's a better way to get the needed info; and/or if this can get fixed in NVML or the driver itself.
Comment 3 Nate Graham 2024-08-15 14:13:30 UTC
Thanks Milos, looking forward to that!