| Summary: | Add total and per process GPU usage to system/activity monitor | ||
|---|---|---|---|
| Product: | [Frameworks and Libraries] ksystemstats | Reporter: | Tvrtko Ursulin <tursulin> |
| Component: | General | Assignee: | Plasma Bugs List <plasma-bugs-null> |
| Status: | RESOLVED FIXED | ||
| Severity: | wishlist | CC: | ahiemstra, kde, kde, kde, nate, notmart |
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | Other | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: | intel_gpu_top prototype showing per client stats in action | ||
|
Description
Tvrtko Ursulin
2021-10-19 13:02:32 UTC
Thanks for contacting us. In fact those two things are also something we are interested in! When writing the new system monitor we also looked into how we could get GPU statistics, both total and per PID. For global engine and VRAM usage we have abstractions in place to support different code paths for different vendors. For example, on AMD we read some sysfs files and on Nvidia we run nvidia-smi. Unfortunately we couldn't figure out a way to get global usage info for Intel GPUs. We concluded that we would require privileges for accessing perf events across all PIDs (as would running intel_gpu_top and parsing its output). (I just rechecked and apparently you now only need CAP_PERFMON instead of CAP_SYS_ADMIN, so maybe we have an opportunity here with a helper binary like we do for per-process network speeds?)
A standard interface for per-process GPU usage would indeed be amazing! We can extend process info dynamically via plugins, although right now we only have one for Nvidia, which runs nvidia-smi in a different mode... We actually found one of your earlier patches for this during research https://lists.freedesktop.org/archives/intel-gfx/2020-September/248062.html and hoped for something like this in the future. From glancing at your series, am I right that amdgpu already uses an interface like the one you are proposing, but it is not documented?
I think I can also speak for the other plasma-systemmonitor developers when I say that we are happy to collaborate. We are also available on #plasma; feel free to ping us (DavidRedondo, d_ed or ahiemstra) for a more real-time conversation.
Yes, CAP_PERFMON is required for global stats, and then you could either get the data directly using perf_event_open(2) or go through intel_gpu_top. If you go for the latter, note that it has a JSON output mode which may be handy. Personally I prefer direct/light-weight solutions but it would be up to you.
Regarding per client usage and amdgpu, yes, it is exporting this data since commit 874442541133 ("drm/amdgpu: Add show_fdinfo() interface") (kernel 5.14 I believe). My proposal is to standardise the exported fields there, or at least allow them to be documented in a single place. At the moment it does not look like i915 and amdgpu would be exporting exactly the same format (ns vs % over integration time), but as long as vendors commit to interface stability it should be workable for userspace.
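To make the per-client fdinfo idea concrete, here is a minimal Python sketch of how userspace might read such data. It assumes the key names from the kernel's DRM client usage stats documentation (`drm-driver`, `drm-client-id`, `drm-engine-<name>: <value> ns`), which may differ from what a given driver exported when this comment was written. The function names and the sample text are hypothetical and only for illustration.

```python
# Sketch only: the key names below are assumed to follow the kernel's
# drm-usage-stats documentation; verify against the driver you target.

SAMPLE_FDINFO = """\
pos:\t0
flags:\t02100002
drm-driver:\ti915
drm-pdev:\t0000:00:02.0
drm-client-id:\t7
drm-engine-render:\t1000000000 ns
drm-engine-video:\t0 ns
"""


def parse_fdinfo(text):
    """Return (driver, client_id, {engine_name: busy_ns}) from fdinfo text."""
    driver = None
    client_id = None
    engines = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as PCI addresses
        # contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip()
        if key == "drm-driver":
            driver = value
        elif key == "drm-client-id":
            client_id = int(value)
        elif key.startswith("drm-engine-capacity-"):
            continue  # engine count per class, not a busy time
        elif key.startswith("drm-engine-"):
            parts = value.split()
            if len(parts) == 2 and parts[1] == "ns":
                engines[key[len("drm-engine-"):]] = int(parts[0])
    return driver, client_id, engines


def busy_percent(prev_ns, curr_ns, elapsed_ns):
    """Utilisation over one sampling interval, from two cumulative counters."""
    if elapsed_ns <= 0:
        return 0.0
    return 100.0 * (curr_ns - prev_ns) / elapsed_ns
```

Because the counters are cumulative nanoseconds, a monitor would sample each client twice and divide the difference by the wall-clock interval, which is what `busy_percent` does. A driver that exports a ready-made percentage instead would not need that step.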
I would be very happy to support you with kernel patches to use for developing the prototype. And also with patches to intel_gpu_top which kind of shows how to read and interpret the data.
Sorry for taking a while to respond. I made a PoC to add total Intel GPU statistics a while back https://invent.kde.org/plasma/ksystemstats/-/commit/1cd660c3c5c4f8a73978b79d94e8736298ce1e05 , it's a separate binary that can run with CAP_PERFMON and is run and read by ksystemstats. I hope the reading of the counters is correct, but it worked at least on my system. One question is how one would expose "total GPU usage" from those; I observed that adding all engines together can go over 100%. On the other hand, taking the average usage per engine might also leave a wrong impression if someone is playing a game at full performance but we would hypothetically only report 25%.
Cool - do you have a screenshot at hand? :)
"Total GPU usage" is a good question which I think doesn't have a good answer. Neither max nor normalized is correct when looked at across different GPU engine "classes" (types?). To an extent it is possible to draw a parallel with the CPU world, where there are multiple engine instances of the same class. But not fully, because whereas CPU cores are (for our practical purpose) functionally identical, GPU engines are not. So if you go for normalized usage then the user might see 25% load while the GPU is truly maxed out on the only engine which can run the workload in question. The parallel with the CPU world is that the user can see 25% CPU (quad-core example) when running a single-threaded program, so it just can't go any faster despite showing 25%. If we look at it like that we can perhaps justify it. If you go for max then of course the opposite holds - the GPU can be 100% busy encoding a video stream but the render engine might have plenty of capacity to run the user's game/UI/whatever.
For me the best answer could be to forgo "Total GPU usage" and show multiple engine classes - "GPU Render", "GPU video", etc. If you have a single graphing widget then you can overlay separate graphs on a single canvas. What does KDE do here for other vendors?
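The trade-off between summing, averaging and taking the maximum can be illustrated with a small Python sketch. The engine class names and the numbers are made up for illustration, and `summarize` is a hypothetical helper, not code from ksystemstats or intel_gpu_top.

```python
def summarize(engine_busy):
    """Aggregate per-engine busy percentages in three different ways.

    engine_busy maps an engine class name to a list of busy percentages,
    one entry per engine instance of that class.
    """
    # Average the instances within each class first, so a class with
    # several identical engines is reported on a 0-100 scale.
    per_class = {
        name: sum(values) / len(values)
        for name, values in engine_busy.items()
        if values
    }
    totals = list(per_class.values())
    return {
        "per_class": per_class,
        "sum": sum(totals),                       # can exceed 100
        "mean": sum(totals) / len(totals) if totals else 0.0,
        "max": max(totals, default=0.0),
    }


# A game saturating the render engine while everything else is idle.
gaming = summarize({
    "render": [100.0],
    "copy": [0.0],
    "video": [0.0, 0.0],
    "video_enhance": [0.0],
})

# Video encoding at full tilt with a moderately busy desktop.
encoding = summarize({
    "render": [30.0],
    "video": [100.0],
})
```

In the first case the mean reports 25 even though the only engine that matters is saturated. In the second case the sum is 130 and the max is 100, although the render engine still has capacity. Exposing `per_class` directly sidesteps both problems, which matches the suggestion to show engine classes separately.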
For a discussion point, I have recently made a quick and dirty RFC against xosview: https://github.com/tursulin/xosview/commit/c9cca738aeade15d3f46d182a9ca956a88effe72 There I did what I described above, except that I did begrudgingly go for "max" for the numeric representation.
For your actual implementation two things stand out which will need improving. First is support for multiple GPUs (it's a thing since Intel entered the discrete market, with laptops containing both integrated and discrete Intel GPUs already on the market), and second is support for more than a single engine of a class (for instance the zero in I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_VIDEO, 0) is only the first instance of this engine, while some platforms have more than one). Again, you can have a peek at my xosview prototype to see how I enumerate GPUs (class GPUList) and count engines on each (class GPU).
> What does KDE do here for other vendors?
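As a rough illustration of the multiple-instance point, the Python sketch below builds one perf event config value per engine instance. The constants are my reading of the i915 uapi header (`i915_drm.h`) and should be checked against the header you build with. The helper names and the engine counts in the example are hypothetical; real code would discover the counts at runtime.

```python
# Values assumed from the i915 uapi header; verify before relying on them.
I915_SAMPLE_BUSY = 0
I915_PMU_SAMPLE_BITS = 4
I915_PMU_SAMPLE_INSTANCE_BITS = 8
I915_PMU_CLASS_SHIFT = I915_PMU_SAMPLE_BITS + I915_PMU_SAMPLE_INSTANCE_BITS

I915_ENGINE_CLASS_RENDER = 0
I915_ENGINE_CLASS_COPY = 1
I915_ENGINE_CLASS_VIDEO = 2
I915_ENGINE_CLASS_VIDEO_ENHANCE = 3


def engine_busy_config(engine_class, instance):
    """Python equivalent of I915_PMU_ENGINE_BUSY(class, instance)."""
    return (
        (engine_class << I915_PMU_CLASS_SHIFT)
        | (instance << I915_PMU_SAMPLE_BITS)
        | I915_SAMPLE_BUSY
    )


def busy_configs(engine_counts):
    """Return {(class, instance): config} for every engine instance.

    engine_counts maps an engine class to the number of instances the
    GPU has; it is a hypothetical input for this sketch.
    """
    return {
        (engine_class, instance): engine_busy_config(engine_class, instance)
        for engine_class, count in engine_counts.items()
        for instance in range(count)
    }


# Example: a GPU with one render engine and two video engines.
configs = busy_configs({
    I915_ENGINE_CLASS_RENDER: 1,
    I915_ENGINE_CLASS_VIDEO: 2,
})
```

Each config value would then be passed to perf_event_open(2) together with the PMU type of the specific GPU, so a machine with two Intel GPUs needs one such set of events per device.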
For AMD GPUs there is only one value exposed in sysfs, "gpu_busy_percent", so that is what we use. I don't know what the AMDGPU driver uses to expose that value. For Nvidia, we use the "nvidia-smi" executable, which exposes "SM", "ENC" and "DEC" values which we add together for the final GPU usage value.
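A minimal Python sketch of the two approaches described here, for comparison. The sysfs file name `gpu_busy_percent` and the idea that it sits in the GPU's device directory are assumptions to verify on a real system. Both function names are hypothetical and this is not the ksystemstats implementation.

```python
from pathlib import Path


def read_amd_busy_percent(device_dir, file_name="gpu_busy_percent"):
    """Read the single overall busy value amdgpu exposes in sysfs.

    device_dir is the GPU's device directory, for example something
    like /sys/class/drm/card0/device (assumed layout).
    """
    text = Path(device_dir, file_name).read_text()
    return int(text.strip())


def nvidia_total_percent(sm, enc, dec):
    """Combine the SM, ENC and DEC values by adding them together.

    Note that a plain sum can exceed 100 when several units are busy.
    """
    return sm + enc + dec
```

The AMD path yields one number chosen by the driver, while the Nvidia path sums values from separate hardware units, so the two "total" figures are not directly comparable.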
Progress update for the per process GPU utilisation - the common spec and the i915 implementation have been merged and should hit the 5.19 kernel. At the same time the AMD and Freedreno drivers have in-progress working patchsets which implement the same. The two have not yet been merged upstream but there are no blockers.
That's very good news!
Plasma 6.4 now includes both global GPU statistics for Intel GPUs as well as per-process statistics for Intel/AMD/Nvidia. |