Bug 444042 - Add total and per process GPU usage to system/activity monitor
Summary: Add total and per process GPU usage to system/activity monitor
Status: RESOLVED FIXED
Alias: None
Product: ksystemstats
Classification: Frameworks and Libraries
Component: General (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Plasma Bugs List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-19 13:02 UTC by Tvrtko Ursulin
Modified: 2025-05-21 12:49 UTC
CC List: 6 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
intel_gpu_top prototype showing per client stats in action (110.74 KB, image/png)
2021-10-19 13:02 UTC, Tvrtko Ursulin

Description Tvrtko Ursulin 2021-10-19 13:02:32 UTC
Created attachment 142623
intel_gpu_top prototype showing per client stats in action

This is a feature request which forks into two sub-requests. In the interest of full disclosure, I am an Intel GPU kernel developer, and the second part of the feature request is really about finding an interested userspace project so that I can upstream my kernel feature.

First part of the feature request is to start showing overall GPU engine usage.

In the case of Intel graphics, the data has been available for a few years now via the standard Linux perf API, and also via our own intel_gpu_top from igt-gpu-tools.

One consideration may be how to do this generically for different GPU vendors, but with some abstraction in your code base I think it should be doable. What the design would look like depends on the level of detail you want to show.
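A minimal Python sketch of such a vendor abstraction (ksystemstats itself is C++; Python is used here only for brevity). Every class and method name below is hypothetical, and the fake backends return fixed numbers instead of reading real counters. Returning a mapping keyed by engine name lets a vendor that reports several engines and a vendor that reports a single busy value share one interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class GpuBackend(ABC):
    """Hypothetical per-vendor backend: each vendor reports what it can."""

    @abstractmethod
    def engine_busy(self) -> dict[str, float]:
        """Return busyness in percent, keyed by engine (or engine class) name."""


class FakeIntelBackend(GpuBackend):
    # Placeholder for a backend that would read the i915 perf PMU,
    # which reports several engines separately.
    def engine_busy(self) -> dict[str, float]:
        return {"render": 80.0, "video": 10.0}


class FakeAmdBackend(GpuBackend):
    # Placeholder for a backend that would read a single sysfs busy value.
    def engine_busy(self) -> dict[str, float]:
        return {"gpu": 55.0}


def collect(backends: dict[str, GpuBackend]) -> dict[str, dict[str, float]]:
    # The rest of the monitor depends only on the common interface.
    return {card: backend.engine_busy() for card, backend in backends.items()}
```

For example, collect({"card0": FakeIntelBackend(), "card1": FakeAmdBackend()}) returns one dictionary per card, with a different level of detail for each.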

The second part is actually being able to show per-process GPU usage. For that I am trying to drive a common specification (see https://patchwork.freedesktop.org/series/92574/) and need a userspace client so that I can upstream my kernel work.

I'll attach a screenshot to show a working prototype.

Essentially, I am looking for an interested developer (or developers) to work together on pioneering the second feature on the Linux desktop.
Comment 1 David Redondo 2021-10-20 09:20:02 UTC
Thanks for contacting us. In fact, we are interested in both of those things too!

When writing the new system monitor, we also looked into how we could get GPU statistics, both total and per PID.

For global engine and VRAM usage we have abstractions in place to support different code paths for different vendors.
For example, on AMD we read some sysfs files and on NVIDIA we run nvidia-smi.
Unfortunately, we couldn't figure out a way to get global usage info for Intel GPUs. We concluded that we would need privileges to access perf events across all PIDs (as would running intel_gpu_top and parsing its output).
(I just rechecked, and apparently you now only need CAP_PERFMON instead of CAP_SYS_ADMIN, so maybe we have an opportunity here with a helper binary, like we do for per-process network speeds?)
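A helper binary could check up front whether it is allowed to open system-wide perf events. The sketch below is a simplified approximation of the kernel's rule (it ignores, for example, security-module hooks); the capability numbers come from <linux/capability.h>, and the function names are hypothetical.

```python
# Capability numbers from <linux/capability.h>.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38  # added in Linux 5.8


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def can_open_system_wide_events(status_text: str, paranoid: int) -> bool:
    # Roughly the kernel's rule for CPU-wide perf events (pid == -1):
    # allowed if perf_event_paranoid <= 0, or if the process is
    # "perfmon capable" (CAP_PERFMON, or CAP_SYS_ADMIN as a fallback).
    if paranoid <= 0:
        return True
    caps = effective_caps(status_text)
    return bool(caps & ((1 << CAP_PERFMON) | (1 << CAP_SYS_ADMIN)))


def check_self() -> bool:
    # Live check for the current process; only meaningful on Linux.
    with open("/proc/self/status") as f:
        status = f.read()
    with open("/proc/sys/kernel/perf_event_paranoid") as f:
        paranoid = int(f.read())
    return can_open_system_wide_events(status, paranoid)
```

Granting the helper CAP_PERFMON (rather than CAP_SYS_ADMIN) keeps its privileges limited to performance monitoring.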


A standard interface for per-process GPU usage would indeed be amazing!
We can extend process info dynamically via plugins, although right now we only have one, for NVIDIA, which runs nvidia-smi in a different mode...
We actually found one of your earlier patches for this during our research
(https://lists.freedesktop.org/archives/intel-gfx/2020-September/248062.html) and hoped for something like this in the future.
From glancing at your series, am I right that amdgpu already uses an interface like the one you are proposing, but it's not documented?

I think I can also speak for the other plasma-systemmonitor developers in saying that we are happy to collaborate. We are also available on #plasma; feel free to ping us (DavidRedondo, d_ed or ahiemstra) for a more real-time conversation.
Comment 2 Tvrtko Ursulin 2021-11-02 14:40:07 UTC
Yes, CAP_PERFMON is required for global stats, and then you could either get the data directly using perf_event_open(2) or go through intel_gpu_top. If you go for the latter, note that it has a JSON output mode, which may be handy. Personally I prefer direct/lightweight solutions, but it would be up to you.
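A rough sketch of the direct perf_event_open(2) route for one i915 engine, in Python via ctypes. It assumes x86-64 (the syscall number and the little-endian struct packing are architecture-specific), an integrated GPU whose PMU is named "i915", and sufficient privileges; the helper names are hypothetical. The config encoding follows I915_PMU_ENGINE_BUSY() from the i915 uapi header, and the busy counter is in nanoseconds, so utilisation is the busy-time delta divided by the wall-time delta.

```python
import ctypes
import os
import struct
import time

# Assumptions: x86-64 (perf_event_open is syscall 298 there; other
# architectures use different numbers) and an integrated GPU whose PMU is
# registered under the name "i915".
NR_PERF_EVENT_OPEN = 298
PMU_DIR = "/sys/bus/event_source/devices/i915"

# Engine classes and sample type from <drm/i915_drm.h>.
I915_ENGINE_CLASS_RENDER = 0
I915_ENGINE_CLASS_VIDEO = 2
I915_SAMPLE_BUSY = 0


def engine_busy_config(engine_class: int, instance: int) -> int:
    # Mirrors I915_PMU_ENGINE_BUSY(class, instance):
    # class << 12 | instance << 4 | I915_SAMPLE_BUSY.
    return (engine_class << 12) | (instance << 4) | I915_SAMPLE_BUSY


def open_counter(pmu_type: int, config: int, cpu: int) -> int:
    # Only the first 64 bytes of struct perf_event_attr (PERF_ATTR_SIZE_VER0):
    # u32 type, u32 size, u64 config, everything else left at zero.
    attr = ctypes.create_string_buffer(struct.pack("<IIQ", pmu_type, 64, config), 64)
    libc = ctypes.CDLL(None, use_errno=True)
    # pid == -1 plus a CPU number opens a system-wide counter, which is how
    # uncore-style PMUs such as i915 are used; group_fd -1, flags 0.
    fd = libc.syscall(ctypes.c_long(NR_PERF_EVENT_OPEN), attr, ctypes.c_long(-1),
                      ctypes.c_long(cpu), ctypes.c_long(-1), ctypes.c_ulong(0))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def read_counter(fd: int) -> int:
    # With read_format == 0 a read returns one native-endian u64: busy time in ns.
    return struct.unpack("=Q", os.read(fd, 8))[0]


def busy_percent(busy0: int, busy1: int, t0_ns: int, t1_ns: int) -> float:
    # Utilisation = busy time accumulated / wall time elapsed.
    return 100.0 * (busy1 - busy0) / (t1_ns - t0_ns)


def sample_render_busy(interval_s: float = 1.0) -> float:
    """Sample rcs0 (render engine 0) busyness over an interval; needs real hardware."""
    with open(os.path.join(PMU_DIR, "type")) as f:
        pmu_type = int(f.read())
    with open(os.path.join(PMU_DIR, "cpumask")) as f:
        # The PMU advertises which CPU its counters should be opened on.
        cpu = int(f.read().strip().split(",")[0].split("-")[0])
    fd = open_counter(pmu_type, engine_busy_config(I915_ENGINE_CLASS_RENDER, 0), cpu)
    try:
        b0, t0 = read_counter(fd), time.monotonic_ns()
        time.sleep(interval_s)
        b1, t1 = read_counter(fd), time.monotonic_ns()
    finally:
        os.close(fd)
    return busy_percent(b0, b1, t0, t1)
```

Keeping the counter fd open and sampling it periodically is cheaper than spawning intel_gpu_top, which is the "direct/lightweight" trade-off mentioned above.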

Regarding per-client usage and amdgpu: yes, it has been exporting this data since commit 874442541133 ("drm/amdgpu: Add show_fdinfo() interface") (kernel 5.14, I believe). My proposal is to standardise the exported fields, or at least allow for documenting them in a single place, because at the moment it does not look like i915 and amdgpu would export exactly the same format (ns vs. % over an integration time). But as long as vendors commit to interface stability, it should be workable for userspace.
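A sketch of turning per-client fdinfo into utilisation. It assumes the key names of the common specification being proposed here (drm-driver, drm-client-id, and drm-engine-<name> values in ns), not the earlier amdgpu percentage format, and the function names are hypothetical. Because the values are cumulative nanoseconds, two samples are needed.

```python
from __future__ import annotations


def parse_fdinfo(text: str) -> dict:
    """Extract the driver, client id and per-engine busy time (ns) from DRM fdinfo text."""
    info = {"engines": {}}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "drm-driver":
            info["driver"] = value
        elif key == "drm-client-id":
            info["client_id"] = int(value)
        elif key.startswith("drm-engine-") and value.endswith(" ns"):
            # Keys such as drm-engine-capacity-<name> carry no unit and are skipped.
            info["engines"][key[len("drm-engine-"):]] = int(value[:-3])
    return info


def engine_utilisation(prev: dict, curr: dict, elapsed_ns: int) -> dict[str, float]:
    # Busy nanoseconds gained per engine, as a percentage of wall time.
    return {
        name: 100.0 * (busy - prev["engines"].get(name, 0)) / elapsed_ns
        for name, busy in curr["engines"].items()
    }
```

On a live system the text would come from reading /proc/<pid>/fdinfo/<fd> twice, with elapsed_ns measured via time.monotonic_ns() between the reads.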

I would be very happy to support you with kernel patches to use for developing the prototype, and also with patches to intel_gpu_top, which show how to read and interpret the data.
Comment 3 David Redondo 2022-01-24 07:58:22 UTC
Sorry for taking a while to respond. A while back I made a PoC adding total Intel GPU statistics: https://invent.kde.org/plasma/ksystemstats/-/commit/1cd660c3c5c4f8a73978b79d94e8736298ce1e05. It's a separate binary that can run with CAP_PERFMON, which ksystemstats runs and reads from. I hope I am reading the counters correctly, but it worked at least on my system.

One question is how one would expose "total GPU usage" from those. I observed that adding all engines together can go over 100%. On the other hand, taking the average usage per engine might also leave a wrong impression: when playing a game at full performance, we would hypothetically report only 25%.
Comment 4 Tvrtko Ursulin 2022-01-25 11:59:03 UTC
Cool - do you have a screenshot at hand? :)

"Total GPU usage" is a good question, and I think it doesn't have a good answer. Neither max nor normalized is correct when looking across different GPU engine "classes" (types?). To an extent it is possible to draw a parallel with the CPU world, where there are multiple engine instances of the same class, but only partly, because whereas CPU cores are (for our practical purposes) functionally identical, GPU engines are not.

So if you go for normalized usage, the user might see 25% load while the GPU could truly be maxed out on the only engine that can run the workload in question. The parallel in the CPU world is a user seeing 25% CPU (quad-core example) while running a single-threaded program, which just can't go any faster despite the 25%. If we look at it like that, we can perhaps justify it.

If you go for max, then of course the opposite applies: the GPU can be 100% busy encoding a video stream while the render engine still has plenty of capacity to run the user's game/UI/whatever.

For me, the best answer could be to forgo "Total GPU usage" and show multiple engine classes: "GPU Render", "GPU Video", etc. If you have a single graphing widget, you can overlay the separate graphs on a single canvas.
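To make the trade-off concrete, a small sketch computing the candidate summaries from per-engine busy percentages. The i915-style engine names (rcs0, vcs1, ...), the numbers, and the function names are illustrative assumptions. With one saturated render engine out of four engines, the sum and max both read 100% while the normalized value reads 25%, which is the discrepancy described above; per_class() gives the per-class values that separate "GPU Render"/"GPU Video" graphs would plot.

```python
from __future__ import annotations

from collections import defaultdict


def per_class(busy: dict[str, float]) -> dict[str, float]:
    """Average busyness per engine class, e.g. vcs0 and vcs1 -> 'vcs'."""
    groups: dict[str, list[float]] = defaultdict(list)
    for engine, pct in busy.items():
        # Strip the instance number to get the class name.
        groups[engine.rstrip("0123456789")].append(pct)
    return {cls: sum(vals) / len(vals) for cls, vals in groups.items()}


def summaries(busy: dict[str, float]) -> dict[str, float]:
    classes = per_class(busy)
    return {
        # Adding engines up can exceed 100%, since engines run in parallel.
        "sum": sum(busy.values()),
        # Averaging understates load when only one class is saturated.
        "normalized": sum(busy.values()) / len(busy),
        # Taking the busiest class hides headroom left on the other classes.
        "max": max(classes.values()),
    }
```

For example, summaries({"rcs0": 100.0, "bcs0": 0.0, "vcs0": 0.0, "vecs0": 0.0}) gives a sum and max of 100.0 but a normalized value of 25.0.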

What does KDE do here for other vendors?

As a discussion point, I recently made a quick and dirty RFC against xosview:
https://github.com/tursulin/xosview/commit/c9cca738aeade15d3f46d182a9ca956a88effe72

There I did what I described above, except that I begrudgingly went with "max" for the numeric representation.

For your actual implementation, two things stand out that will need improving. The first is support for multiple GPUs (this has been a thing since Intel entered the discrete market; laptops containing both integrated and discrete Intel GPUs are already available). The second is support for more than a single engine of a class (for instance, the zero in I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 0) selects only the first instance of this engine, while some platforms have more than one).

Again, you can have a peek at my xosview prototype to see how I enumerate GPUs (class GPUList) and count the engines on each (class GPU).
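A sketch of the enumeration step in Python. This is not a port of the xosview code. It assumes the PMU naming used by i915 (a PMU called "i915" for an integrated GPU and "i915_<PCI address>" for other GPUs), and it assumes each events/<engine>-busy file contains a string such as "config=0x2000". The function names are hypothetical. Reading the config from sysfs avoids hard-coding the number of engine instances per platform.

```python
from __future__ import annotations

import os
import re

EVENT_SOURCES = "/sys/bus/event_source/devices"


def list_i915_pmus(root: str = EVENT_SOURCES) -> list[str]:
    # Assumed naming: "i915" for an integrated GPU and "i915_<PCI address>"
    # for each additional (e.g. discrete) GPU.
    return sorted(n for n in os.listdir(root) if n == "i915" or n.startswith("i915_"))


def list_busy_events(pmu_dir: str) -> dict[str, int]:
    """Map engine names such as 'vcs1' to the perf config of their busy counter."""
    events = {}
    events_dir = os.path.join(pmu_dir, "events")
    for name in os.listdir(events_dir):
        # Only "<engine>-busy" events; skips "*.unit" files and other samplers.
        m = re.fullmatch(r"([a-z]+\d+)-busy", name)
        if not m:
            continue
        with open(os.path.join(events_dir, name)) as f:
            # Each event file holds a string such as "config=0x2000".
            events[m.group(1)] = int(f.read().strip().split("=", 1)[1], 16)
    return events
```

Looping list_busy_events() over every entry returned by list_i915_pmus() yields one engine map per GPU, covering both the multi-GPU and the multi-instance cases.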
Comment 5 Arjen Hiemstra 2022-01-25 12:27:31 UTC
> What does KDE do here for other vendors?

For AMD GPUs there is only one value exposed in sysfs, "gpu_busy_percent", so that is what we use. I don't know how the AMDGPU driver computes that value. For NVIDIA, we use the "nvidia-smi" executable, which exposes "SM", "ENC" and "DEC" values that we add together for the final GPU usage value.
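A sketch of the two approaches just described. The gpu_busy_percent path is the amdgpu sysfs attribute. The nvidia-smi parsing assumes "nvidia-smi dmon"-style output (a "#"-prefixed header line naming sm, enc and dec columns) and is not necessarily the exact invocation ksystemstats uses. The function names are hypothetical.

```python
from __future__ import annotations

import glob
import os


def amd_busy_percent(pattern: str = "/sys/class/drm/card*/device/gpu_busy_percent") -> dict[str, int]:
    """Read amdgpu's single busy value for every card that exposes it."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        # .../cardN/device/gpu_busy_percent -> "cardN"
        card = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            result[card] = int(f.read().strip())
    return result


def nvidia_usage(dmon_text: str) -> dict[int, int]:
    """Sum the sm, enc and dec columns per GPU from nvidia-smi dmon-style text."""
    header = None
    usage = {}
    for line in dmon_text.splitlines():
        if line.startswith("#"):
            if header is None:  # the first comment line names the columns
                header = line.lstrip("#").split()
            continue
        fields = line.split()
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        # A "-" marks an unavailable value; count it as 0.
        total = sum(int(row[c]) if row[c].isdigit() else 0 for c in ("sm", "enc", "dec"))
        # Like adding Intel engines together, this sum can exceed 100.
        usage[int(row["gpu"])] = total
    return usage
```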
Comment 6 Tvrtko Ursulin 2022-05-16 10:10:57 UTC
Progress update on per-process GPU utilisation: the common spec and the i915 implementation have been merged and should land in kernel 5.19. At the same time, the AMD and Freedreno drivers have working patch sets in progress that implement the same spec. Those have not been merged upstream yet, but there are no blockers.
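A sketch of how a monitor might walk /proc to attribute GPU time to processes using the common format. It assumes the spec's drm-client-id, drm-pdev and drm-engine-<name> keys; the function name is hypothetical. Because several file descriptors in one process can refer to the same DRM client (for example after dup()), each (drm-pdev, drm-client-id) pair is counted once per process.

```python
from __future__ import annotations

import os
from collections import defaultdict


def scan_drm_clients(proc: str = "/proc") -> dict[int, dict[str, int]]:
    """Sum per-engine busy time (ns) for each PID, counting each DRM client once."""
    per_pid = {}
    for pid in filter(str.isdigit, os.listdir(proc)):
        fdinfo_dir = os.path.join(proc, pid, "fdinfo")
        seen = set()
        engines: dict[str, int] = defaultdict(int)
        try:
            names = os.listdir(fdinfo_dir)
        except OSError:  # process exited, no fdinfo, or permission denied
            continue
        for fd in names:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    pairs = [line.split(":", 1) for line in f.read().splitlines() if ":" in line]
            except OSError:
                continue
            fields = {k: v.strip() for k, v in pairs}
            if "drm-client-id" not in fields:
                continue  # not a DRM file descriptor
            # Duplicated fds within this process refer to the same client.
            client = (fields.get("drm-pdev", ""), fields["drm-client-id"])
            if client in seen:
                continue
            seen.add(client)
            for key, value in fields.items():
                # Cumulative busy time; capacity keys have no " ns" suffix.
                if key.startswith("drm-engine-") and value.endswith(" ns"):
                    engines[key[len("drm-engine-"):]] += int(value[:-3])
        if engines:
            per_pid[int(pid)] = dict(engines)
    return per_pid
```

Calling this twice and dividing each engine's delta by the elapsed wall time gives per-process utilisation.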
Comment 7 David Redondo 2022-05-18 08:27:04 UTC
That's very good news!
Comment 8 Arjen Hiemstra 2025-05-21 12:49:21 UTC
Plasma 6.4 now includes both global GPU statistics for Intel GPUs and per-process statistics for Intel/AMD/NVIDIA.