Bug 481815

Summary: GPU marked as "GPU 2" even with only one GPU
Product: [Frameworks and Libraries] ksystemstats
Reporter: Jonathan Croteau-Dicaire <jonathan.croteau.dicaire>
Component: General
Assignee: Plasma Bugs List <plasma-bugs-null>
Status: CONFIRMED ---
Severity: minor
CC: ahiemstra, kdedev, nai.xia, nate, plasma-bugs-null, strong.drum0546
Priority: NOR    
Version First Reported In: 6.2.2   
Target Milestone: ---   
Platform: Neon   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=503082
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Sensor list with only GPU 2

Description Jonathan Croteau-Dicaire 2024-02-25 15:22:51 UTC
Created attachment 166092
Sensor list with only GPU 2

SUMMARY
I recently switched from an Nvidia GPU (3070) to an AMD GPU (7900 XT). Now plasma-systemmonitor and the widget show the new GPU as GPU 2 even though no GPU 1 is present. I am not sure whether this is a bug that would occur on my setup regardless, or a remnant of the Nvidia GPU that used to be in this system.
I don't think this is the correct behaviour: if there is only one GPU, it should be marked as GPU 0.

OBSERVED RESULT
The only GPU in the sensor list is GPU 2

EXPECTED RESULT
The only GPU in the sensor list should be GPU 0

SOFTWARE/OS VERSIONS
Operating System: EndeavourOS 
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.12
Kernel Version: 6.7.6-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 3700X 8-Core Processor
Memory: 31.3 GiB of RAM
Graphics Processor: AMD Radeon RX 7900 XT
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: A520M DS3H AC
Comment 1 Jonathan Croteau-Dicaire 2024-02-26 23:13:13 UTC
I completely forgot to mention.
I know my way around programming; I was a Qt developer at a previous job, so I don't necessarily need someone else to write the patch.
A pointer to what to look for (if this is a real issue and not some strange expected behaviour that I fail to understand) could help me make a patch if necessary.
Comment 2 Arjen Hiemstra 2024-02-29 17:15:03 UTC
So to a certain extent this is intentional: The GPU plugin uses the DRM device number to determine what each device is. This is because of a previous bug where just blindly iterating "/dev/drm/card*" resulted in cards changing across reboots because the order wasn't necessarily fixed. See https://invent.kde.org/plasma/ksystemstats/-/blob/master/plugins/gpu/LinuxBackend.cpp?ref_type=heads#L35 for the relevant code. The reason you end up with card 2 is most likely that there's an iGPU as card 1 that we don't support.
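As a rough illustration of this numbering (a sketch based only on the sensor ids and names reported in this thread, not the actual LinuxBackend.cpp code), the mapping from a DRM card node to a sensor id and display name can be modelled as below. The function name `sensor_names_for_card` is hypothetical.

```python
import re
from typing import Tuple


def sensor_names_for_card(card_node: str) -> Tuple[str, str]:
    """Map a DRM node name such as 'card1' to (sensor id, display name).

    Assumption: the sensor id reuses the 0-based DRM card number and the
    display name shows that number plus one.
    """
    match = re.fullmatch(r"card(\d+)", card_node)
    if match is None:
        raise ValueError(f"not a DRM card node: {card_node!r}")
    index = int(match.group(1))
    return f"gpu{index}", f"GPU {index + 1}"


print(sensor_names_for_card("card1"))  # ('gpu1', 'GPU 2')
```

Under this model, a system whose only card node is card1 ends up with a single GPU labelled "GPU 2".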

Ultimately I think the way to solve this isn't to change the indexes or some such, but rather come up with a better way of naming these devices. It's actually rather ugly that we have "GPU x" as names and I'd prefer it if we could instead have something more like a proper device name. We do have a sensor exposing the device name from udev, but I seem to recall we don't use it because it can be rather lengthy. So it might need some extra work to come up with a scheme, but I think it would be a more proper solution.
Comment 3 Jonathan Croteau-Dicaire 2024-02-29 18:37:29 UTC
(In reply to Arjen Hiemstra from comment #2)
> So to a certain extent this is intentional: The GPU plugin uses the DRM
> device number to determine what each device is. This is because of a
> previous bug where just blindly iterating "/dev/drm/card*" resulted in cards
> changing across reboots because the order wasn't necessarily fixed. See
> https://invent.kde.org/plasma/ksystemstats/-/blob/master/plugins/gpu/
> LinuxBackend.cpp?ref_type=heads#L35 for the relevant code. The reason you
> end up with card 2 is most likely that there's an iGPU as card 1 that we
> don't support.

That's pretty strange. My CPU (Ryzen 7 3700X) doesn't have an iGPU, and I removed my NVIDIA card before adding the AMD GPU.
I don't have a /dev/drm/ folder, but I do have a /dev/dri/ folder. It contains only card1 and renderD128 (and a by-path folder).
Comment 4 Jonathan Croteau-Dicaire 2024-02-29 18:49:18 UTC
I think I found something interesting.
The output of the command:
`kstatsviewer --list | grep gpu`
```
gpu/all/usage All GPUs Usage
gpu/gpu1/power1 GPU 2 PPT
gpu/gpu1/totalVram GPU 2 Total Video Memory
gpu/gpu1/temp2 GPU 2 junction
gpu GPU
gpu/gpu1/usage GPU 2 Usage
gpu/all/usedVram All GPUs Used Memory
gpu/gpu1 GPU 2
gpu/gpu1/usedVram GPU 2 Video Memory Used
gpu/gpu1/temperature GPU 2 Temperature
gpu/all/totalVram All GPUs Total Memory
gpu/gpu1/memoryFrequency GPU 2 Memory Frequency
gpu/gpu1/in0 GPU 2 vddgfx
gpu/gpu1/power GPU 2 Power
gpu/all All GPUs
gpu/gpu1/temp3 GPU 2 mem
gpu/gpu1/coreFrequency GPU 2 Frequency
gpu/gpu1/name GPU 2 Name
gpu/gpu1/fan1 GPU 2 Fan 1
```
It seems that the software is correctly detecting my GPU as gpu1, but giving it the name GPU 2.
Comment 5 Jonathan Croteau-Dicaire 2024-02-29 19:04:03 UTC
I cloned the project on my desktop and compiled it.
I see that there are three projects: kstatsviewer, ksystemstats and ksystemstatstest. If I want to investigate why my GPU got the name GPU 2, which one should I use (and with which arguments)? Do you have any pointers or advice on how to set up my environment to investigate and understand why this is happening? ksystemstats seems to have a replace option that I assume could be useful to redo the sensor discovery phase. But I would prefer to have your advice before trying random things.
Comment 6 Jonathan Croteau-Dicaire 2024-02-29 20:28:01 UTC
I found this issue for neofetch: https://github.com/dylanaraps/neofetch/issues/1646
One person suggested this command: `glxinfo -B | grep -Po '(?<=^OpenGL renderer string: ).*(?= \(.*\)$)'`
(https://github.com/dylanaraps/neofetch/issues/1646)
On my side, it seems to return a reasonable GPU name (https://github.com/dylanaraps/neofetch/issues/1646),
but I don't like the idea of relying on the output of a command to get the GPU name.
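For reference, what that grep pattern extracts can be sketched in Python. This is only an illustration of the text processing, not a proposal for how ksystemstats should obtain the name. It uses a capturing group instead of the lookarounds, and the sample line (including the details in parentheses) is hypothetical.

```python
import re
from typing import Optional

# Same idea as the grep pattern: keep the text between the
# "OpenGL renderer string: " prefix and the trailing parenthesised details.
RENDERER_RE = re.compile(r"^OpenGL renderer string: (.*) \(.*\)$", re.MULTILINE)


def renderer_name(glxinfo_output: str) -> Optional[str]:
    """Return the bare renderer name, or None if no line matches."""
    match = RENDERER_RE.search(glxinfo_output)
    return match.group(1) if match else None


# Hypothetical sample line; the driver details in parentheses are made up.
sample = "OpenGL renderer string: AMD Radeon RX 7900 XT (radeonsi, navi31)"
print(renderer_name(sample))  # AMD Radeon RX 7900 XT
```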
Comment 7 Arjen Hiemstra 2024-03-01 09:27:19 UTC
(In reply to Jonathan Croteau-Dicaire from comment #5)
> I cloned the project on my desktop and compiled it.
> I see that there is three project kstatsviewer, ksystemstats and
> ksystemstatstest. If I want to investigate why my GPU got the name GPU 2
> which one should I use (and with which argument) and do you have pointer or
> advice on how to set up my environment to investigate and understand why
> this is happening?

Generally I'd recommend setting up an environment following https://community.kde.org/Get_Involved/development so that you can build all dependencies as well. That said, if you already managed to build ksystemstats that's probably fine as well, it doesn't necessarily depend on the latest version of everything. If you have a build, there is a "prefix.sh" script in the build dir that, when sourced, sets up the right environment variables so the installed application should run correctly.

> ksystemstats seems to have a replace option that could be useful to redo the discovery phase of the sensor, I assume.

For development, I generally run with "--replace --remain", replace for replacing any running instance, remain so it doesn't quit when you don't have anything using it. Starting ksystemstats will actually redo any discovery and initialization that it needs to do for its sensors, there is no cached data.
Comment 8 Arjen Hiemstra 2024-03-01 09:34:00 UTC
> It seems that the software is correctly detecting my gpu as gpu1, but giving it the name GPU 2

Oh huh. That is rather odd.

(In reply to Jonathan Croteau-Dicaire from comment #6)
> I found this merge request :
> https://github.com/dylanaraps/neofetch/issues/1646 for neofetch
> One person sent this suggestion : glxinfo -B | grep -Po '(?<=^OpenGL
> renderer string: ).*(?= \(.*\)$)'
> (https://github.com/dylanaraps/neofetch/issues/1646)
> On my side, it seems to return a reasonable GPU name
> (https://github.com/dylanaraps/neofetch/issues/1646),
> but I don't like the idea of relying on the output of a command to get the
> GPU name

Yeah, we don't want to be running random tools for it. However, it might be possible to figure out where glxinfo gets its name from and use that code directly. Come to think of it, glxinfo just creates an OpenGL context and reads the renderer string from that. We could actually try to do the same, or dig into where the renderer string comes from.

One somewhat trickier case when using the renderer string is dealing with things like the iGPU on my machine, which is just listed as "AMD Radeon Graphics". I also seem to recall that some machines end up with even worse things like "AMD Ryzen Processor with Integrated Radeon Graphics" or somesuch, which is a little too long.
Comment 9 Arjen Hiemstra 2024-03-01 09:38:53 UTC
> On my side, it seems to return a reasonable GPU name (https://github.com/dylanaraps/neofetch/issues/1646),
> but I don't like the idea of relying on the output of a command to get the GPU name

Actually, the name udevadm outputs is the name we already expose as the "name" sensor. Compared to the renderer string it's not the most useful, however: my GPU name is currently listed as "Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]", whereas the renderer string has the right name, "AMD Radeon RX 6700 XT".

Note that you can use kstatsviewer to query ksystemstats for sensors, so "kstatsviewer gpu/gpu0/name" outputs the name sensor for me.
Comment 10 Arjen Hiemstra 2024-03-01 09:42:30 UTC
(In reply to Jonathan Croteau-Dicaire from comment #4)
> I think I found something interesting
> The return of the command : 
> `kstatsviewer --list | grep gpu`
> ```
> gpu/all/usage All GPUs Usage
> gpu/gpu1/power1 GPU 2 PPT
> gpu/gpu1/totalVram GPU 2 Total Video Memory
> gpu/gpu1/temp2 GPU 2 junction
> gpu GPU
> gpu/gpu1/usage GPU 2 Usage
> gpu/all/usedVram All GPUs Used Memory
> gpu/gpu1 GPU 2
> gpu/gpu1/usedVram GPU 2 Video Memory Used
> gpu/gpu1/temperature GPU 2 Temperature
> gpu/all/totalVram All GPUs Total Memory
> gpu/gpu1/memoryFrequency GPU 2 Memory Frequency
> gpu/gpu1/in0 GPU 2 vddgfx
> gpu/gpu1/power GPU 2 Power
> gpu/all All GPUs
> gpu/gpu1/temp3 GPU 2 mem
> gpu/gpu1/coreFrequency GPU 2 Frequency
> gpu/gpu1/name GPU 2 Name
> gpu/gpu1/fan1 GPU 2 Fan 1
> ```
> It seems that the software is correctly detecting my gpu as gpu1, but giving
> it the name GPU 2

The name is actually correct: the sensor ids use 0-based indexing, so the first GPU should be "gpu0", and "gpu1" is displayed as "GPU 2".

> That pretty strange. My CPU (Ryzen 7 3700X) doesn't have any iGPU and I removed my NVIDIA before adding the AMD GPU.
> I don't have a /dev/drm/ folder, but I got a /dev/dri/ folder. I only got card1 and renderD128 in it (and a by-path folder)

Oh I confused "/dev/dri" with "/sys/class/drm", dri is the right one. Since you only have "card1" there it's actually udev already that uses the wrong index, since the first card *should* be "card0".
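A quick way to check for this situation is to look at which card numbers actually exist under "/dev/dri". The following is a minimal sketch; the helper names are hypothetical and it only inspects entry names, using the listing reported in this thread as input.

```python
import re
from typing import List


def drm_card_indices(dri_entries: List[str]) -> List[int]:
    """Return the sorted card numbers found among /dev/dri entry names."""
    indices = []
    for name in dri_entries:
        match = re.fullmatch(r"card(\d+)", name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def numbering_starts_at_zero(dri_entries: List[str]) -> bool:
    """True when the lowest card node is card0 (False if there are no cards)."""
    indices = drm_card_indices(dri_entries)
    return bool(indices) and indices[0] == 0


# Entry names as reported in this thread for the affected system.
entries = ["by-path", "card1", "renderD128"]
print(drm_card_indices(entries))          # [1]
print(numbering_starts_at_zero(entries))  # False
```

On a live system, `os.listdir("/dev/dri")` could supply the entry names.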
Comment 11 Jonathan Croteau-Dicaire 2024-03-01 15:57:11 UTC
Thank you,
I now have a setup where I can code, compile and debug ksystemstats and the GPU plugin.

I was thinking that maybe we should count graphics cards per vendor. Instead of counting GPU 1..2..3..X, we could have AMD 1, NVIDIA 1, INTEL 1, etc. I don't think we should remove the number, to keep things consistent with other sensors. This would help people who have an AMD+NVIDIA, INTEL+NVIDIA or INTEL+AMD setup better identify which GPU is which. The only setup where it would change "nothing" is an AMD+AMD setup.

We could maybe also sort by the PCI port used, to keep things consistent between reboots. This could be implemented independently of my other suggestion.

I am not sure whether I want to change the gpuId (I assume this is the string the monitor app uses to save which GPU it reports on). In my case, I only investigated because my setup broke when I changed my graphics card. At the same time, we could argue that this is a udev issue and not a KDE one. But I also think that it doesn't make any sense to skip numbers in our numbering.
So, I am asking for your feedback on these two suggestions, and on whether we want to change the gpuId too.
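To make the two suggestions concrete, here is a minimal sketch of per-vendor numbering combined with ordering by PCI address. Everything in it is an assumption for illustration: the function name `vendor_labels`, the input format and the sample addresses are hypothetical, and it assumes addresses are fixed-width lowercase hex strings so that plain string sorting gives a stable order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def vendor_labels(gpus: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map PCI address -> label such as 'AMD 1'.

    `gpus` holds (pci_address, vendor) pairs. Devices are ordered by PCI
    address so the labels do not depend on enumeration order.
    """
    counters = defaultdict(int)  # per-vendor running count
    labels = {}
    for address, vendor in sorted(gpus):
        counters[vendor] += 1
        labels[address] = f"{vendor} {counters[vendor]}"
    return labels


# Hypothetical addresses, deliberately listed out of order.
gpus = [
    ("0000:0b:00.0", "NVIDIA"),
    ("0000:03:00.0", "AMD"),
    ("0000:0c:00.0", "NVIDIA"),
]
for address, label in vendor_labels(gpus).items():
    print(address, label)
```

With a single AMD card, this scheme would always yield "AMD 1", regardless of which card number the device node received.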
Comment 12 Nai Xia 2024-11-06 02:32:35 UTC
(In reply to Jonathan Croteau-Dicaire from comment #11)
> Thank you,
> I now have a setup where I can code, compile and debug ksystemstats and the
> GPU plugin.
> 
> I was thinking, maybe we should count graphics card based on vendor. Instead
> of counting GPU 1..2..3..X we could have AMD 1, NVIDIA 1, INTEL 1 etc. I
> don't think we should remove the number to keep things consistent with other
> sensor. This would help people that have an AMD+NVIDIA and INTEL+NVIDIA or
> INTEL+AMD better identify which GPU is witch. The only setup where it would
> change "nothing" is an AMD+AMD setup. 
> 
> We could maybe also use a sort based on pci port used to keep things
> consistent between reboot. This solution could be implemented without doing
> my other suggestion.
> 
> I am not sure if I want to change the gpuId (I assume this is the string
> used by the monitor app to save witch gpu they report). In my case, this is
> because it broke when I changed my graphics card that I investigated what
> happened. But at the same time, we could argue that it is an udev issues and
> not a KDE one. But I also think that it doesn't make any sense to jump
> number in our numbering.
> So, I am asking for your feedback on these two suggestions and if we want to
> change the gpuId too.

Same here. And this "GPU X" name changes between reboots on my Dell XPS 9500 with an NVIDIA card.
It ranges randomly from "GPU 1" to "GPU 3", so my systemmonitor widget keeps losing its source.
Comment 13 Nai Xia 2024-11-30 06:18:43 UTC
Hi guys, any updates on this bug fix?
Or is a fix already on its way to KDE neon?
Comment 14 Lenzoid 2025-05-10 21:30:28 UTC
@Nai Xia, sorry to hear of this. If this issue is still current for you, you could consider opening a new report, since it seems to be a different thing at play here than the original report.
Comment 15 Jonathan Croteau-Dicaire 2025-05-12 01:12:44 UTC
Having a consistent name could be a solution for his issue too, I think.
I just saw that the Info Center is able to tell whether the graphics processor is a discrete one. Maybe we could have two series of names (discrete and integrated) and make sure that each starts at one. That would fix the problem of people not knowing which GPU is the discrete one, and should be stable between reboots for most people.
Comment 16 Nate Graham 2025-05-13 16:18:17 UTC
I can reproduce this on my AMD laptop with only a 780M iGPU. The fancy new GPU usage graph on the history page says "GPU 2"
Comment 17 TraceyC 2025-05-15 19:26:15 UTC
I cannot reproduce this on a Lenovo Flex laptop which only has the original integrated AMD GPU

kstatsviewer shows it as
gpu/gpu0 GPU 1

Same on the Steam Deck.
Comment 18 Nate Graham 2025-05-16 13:13:51 UTC
On my affected system:

$ kstatsviewer --list | grep -i gpu
gpu/gpu1/in1 GPU 2 vddnb
gpu/all All GPUs
gpu GPU
gpu/all/usage All GPUs Usage
gpu/gpu1/usedVram GPU 2 Video Memory Used
gpu/gpu1/temperature GPU 2 Temperature
gpu/gpu1/usage GPU 2 Usage
gpu/gpu1/power GPU 2 Power
gpu/gpu1/in0 GPU 2 vddgfx
gpu/gpu1/totalVram GPU 2 Total Video Memory
gpu/all/totalVram All GPUs Total Memory
gpu/gpu1/name GPU 2 Name
gpu/gpu1/coreFrequency GPU 2 Frequency
gpu/gpu1/memoryFrequency GPU 2 Memory Frequency
gpu/all/usedVram All GPUs Used Memory
gpu/gpu1/power1 GPU 2 PPT
gpu/gpu1 GPU 2


$ ls /dev/dri/
by-path  card1  renderD128
Comment 19 Jonathan Croteau-Dicaire 2025-05-16 14:56:33 UTC
I kind of remember seeing that the render* device was being evaluated before my actual GPU and bumping the count, even though it was supposed to be ignored.
At the time, I thought about simply decrementing the count by one in this situation, but I was not sure it was the solution we would want because, if I am not mistaken, it would break users' custom pages (gpu1 would become gpu0 again).
I will soon have a full week off between two jobs, so if we decide to go this way, I could implement this fix and test it with my setup.
Comment 20 Arjen Hiemstra 2025-05-20 09:50:45 UTC
Moving this to ksystemstats as that's where the paths come from.