Created attachment 171913 [details] Topology of 2970WX ThreadRipper SUMMARY The k10temp implementastion in the cpu_plugin of ksystemstats maps the value of tdie or tctl to all CPU cores. This ignores the fact, that there are multi-die CPUs like ThreadRipper. In `linuxcpuplugin.cpp`, `LinuxCpuPluginPrivate::addSensorsAmd()` is called once for each found k10temp sensor. Each call sets the found sensor for all cpu cores. On ThreadRipper Systems, more then one die is present (see attachment). This looses important information. Also, the k10temp sensor is blacklisted in the lmsensors plugin, despite the fact that this seems to be the only logical way to provide those temperatures. STEPS TO REPRODUCE 1. Run a threadripper system 2. get the temperatures for all CPU cores 3. compare them OBSERVED RESULT all values are identical EXPECTED RESULT the values should differ, as different cores are on different dies with different temperatures SOFTWARE/OS VERSIONS git commit b994c553f2e5d5d235f289c0112f1509b18e4e45 ADDITIONAL INFORMATION This may also be related to missing temperature values over all #474766
(In reply to Thomas Berger from comment #0) > This may also be related to missing temperature values over all #474766 Sorry, i used the wrong ID, here is the correct one: https://bugs.kde.org/show_bug.cgi?id=484019
Output of sensors: # sensors k10temp-pci-\* Di 23 Jul 2024 00:10:59 CEST k10temp-pci-00d3 Adapter: PCI adapter Tctl: +59.4°C Tdie: +32.4°C k10temp-pci-00c3 Adapter: PCI adapter Tctl: +60.0°C Tdie: +33.0°C Tccd1: +59.2°C Tccd2: +59.8°C Tccd3: +57.8°C k10temp-pci-00db Adapter: PCI adapter Tctl: +58.6°C Tdie: +31.6°C k10temp-pci-00cb Adapter: PCI adapter Tctl: +58.9°C Tdie: +31.9°C
Yeah, that would be nice, but there would need to be a reliable way to assign the sensors to the individual cpus. Just out of curiosity, could you post the output of grep -e . -r /sys/bus/pci/devices/0000\:00\:{18,19,1a,1b}.3/ Also, I wonder what are the Tccd* temperatures. Why are they only 3? And why are they so high?
Created attachment 171934 [details] PCI Bus devices
(In reply to Jiri Palecek from comment #3) > Yeah, that would be nice, but there would need to be a reliable way to > assign the sensors to the individual cpus. After further thoughts, i don't think this scales well on user side. Having 48 Temperature values for 4 DIEs, this will most likely get very confusing. I understand that there is the wish to keep the behavior identical across different systems with different CPU types, but AMD works differently, and especially on workstations with a high CPU count, having 4 clearly defined values instead of 12x4 may make more sense. > Just out of curiosity, could you post the output of > > grep -e . -r /sys/bus/pci/devices/0000\:00\:{18,19,1a,1b}.3/ See attachment. > Also, I wonder what are the Tccd* temperatures. Why are they only 3? And why > are they so high? From my understanding, those are the Tctl temperature of the other 3 dies.
Further investigations into https://bugs.kde.org/show_bug.cgi?id=484019 showed, that this "design issue" and the bug are indeed related. The current assumption of the code, that there will always be only one, and exactly one, k10temp sensor per AMD system (which is obviously false) leads to the error seen (missing values).
I would like to propose the following fix: 1. Add a new list of SensorProperties to LinuxAllCpusObject 2. Add each Tdie temperature (one per k10temp) to this list as "DIE%id temperature" 3. If more then one k10temp sensor is found REMOVE the temperature objects from the CPU cores This would allow users to monitor their temperatures on multi-DIE systems, while users with single-DIE CPUs don't have changes in their views. Instead of removing the temperature values from each CPU object, we could calculate an average over all DIE temperatures and provide them there, for consistency, but this may provide a false impression to the user about the current state of his thermals. I am willing to implement those changes and provide a patch.
Dear Maintainer, can please put your search for a perfect solution on hold, and instead go with the working solution. Your attempt to be user friendly has gotten out of bounds. To deny users access to the CPU Temperature sensors while the search for the perfekt solution is going on, isn't exactly user friendly. version 6.1.4 still no working Temp Sensors for multi die cpu There was a Working solution https://invent.kde.org/plasma/ksystemstats/-/merge_requests/84 I was forced with the update to KDE6 to uninstall ksysguard , therefore my patience is running a bit thin now.