Bug 497024

Summary: Reading temperature prevents D3cold mode on Nvidia GPU
Product: [Frameworks and Libraries] ksystemstats Reporter: alborto <alborto>
Component: General Assignee: Plasma Bugs List <plasma-bugs-null>
Status: REPORTED ---    
Severity: normal CC: ahiemstra, john.kizer
Priority: NOR    
Version First Reported In: 6.2.4   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description alborto 2024-12-04 06:15:03 UTC
Hello, I use the Thermal Monitor widget, https://github.com/kotelnik/plasma-applet-thermal-monitor
When I enable Nvidia GPU temperature reading on a laptop with hybrid AMD/Nvidia graphics and the Nvidia Prime drivers, the Nvidia card can no longer enter the D3cold power state.

On GNOME I use the Freon extension, https://github.com/UshakovVasilii/gnome-shell-extension-freon, which shows NA instead of the temperature when the discrete GPU goes into standby.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: EndeavourOS, fully updated as of today
KDE Plasma Version: 6.2.4
KDE Frameworks Version: 6.8.0
Qt Version: 6.8.1
Comment 1 John Kizer 2024-12-15 07:59:49 UTC
Hi - this bug describes an issue with a third-party widget, which would likely need to be investigated by the developers of that software.

Could you provide steps to reproduce this bug (following the guidelines here: https://community.kde.org/Get_Involved/Issue_Reporting#Step_6:_File_a_high-quality_Bugzilla_ticket ) that identify an issue specifically with KDE software?

Thanks,
Comment 2 alborto 2024-12-17 08:57:42 UTC
(In reply to John Kizer from comment #1)
> Hi - this bug describes an issue with a third-party widget, which would
> likely need to be investigated by the developers of that software.
> 
> Could you provide steps to reproduce this bug (following the guidelines
> here:
> https://community.kde.org/Get_Involved/Issue_Reporting#Step_6:_File_a_high-quality_Bugzilla_ticket
> ) that identify an issue specifically with KDE software?
> 
> Thanks,

Hello, I already opened an issue on the Thermal Monitor page, but the developer suggested that the problem could lie in ksystemstats or nvidia-smi:
https://invent.kde.org/olib/thermalmonitor/-/issues/17

Since I use a similar program on GNOME without any problems, I thought I could rule out nvidia-smi, so here we are.

1) Add Thermal Monitor to a panel
2) Open Thermal Monitor's configuration
3) On the Sensor tab, go to Add Sensor
4) Scroll or search for GPU and add GPU Temperature for the discrete Nvidia GPU; in my case it is GPU 1 Temperature
5) Check the power state of the GPUs; I use `cat /sys/class/drm/card*/device/power_state`

Now my discrete GPU is no longer in D3cold but in the D0 state.

When I remove GPU 1 Temperature from the displayed sensors, my discrete GPU goes back to D3cold after a while.
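
To make step 5 easier to read on a machine with several cards, the check can be wrapped in a small script that labels each state with its card name and PCI vendor ID. This is only a sketch: the function name `gpu_power_states` and its optional directory argument are made up for illustration, and it assumes the kernel exposes `power_state` and `vendor` under each card's `device` directory.

```shell
#!/bin/sh
# Sketch: print the PCI power state of every DRM card.
# gpu_power_states is a hypothetical helper name, not part of any tool.
# $1 (optional): sysfs root to scan; defaults to /sys/class/drm.
gpu_power_states() {
    drm_root="${1:-/sys/class/drm}"
    for state_file in "$drm_root"/card*/device/power_state; do
        # Skip unmatched globs and entries without a readable power_state.
        [ -r "$state_file" ] || continue
        dev_dir=$(dirname "$state_file")
        card=$(basename "$(dirname "$dev_dir")")
        vendor="unknown"
        if [ -r "$dev_dir/vendor" ]; then
            vendor=$(cat "$dev_dir/vendor")
        fi
        printf '%s vendor=%s state=%s\n' "$card" "$vendor" "$(cat "$state_file")"
    done
}

gpu_power_states
```

A vendor value of 0x10de is Nvidia and 0x1002 is AMD, so the discrete card can be identified without relying on the card numbering.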

Thanks in advance for your help.
Comment 3 Arjen Hiemstra 2025-01-23 16:17:24 UTC
For NVidia GPUs, we read the output of the `nvidia-smi` program provided by NVidia. If you get the same results when you run that directly, with something like `nvidia-smi dmon -d 2 -s pucm`, I would say that this is a driver bug.
Comment 4 alborto 2025-01-24 15:49:40 UTC
(In reply to Arjen Hiemstra from comment #3)
> For NVidia GPUs, we read the output of the `nvidia-smi` program provided by
> NVidia. If you get the same results when you run that directly, with
> something like `nvidia-smi dmon -d 2 -s pucm`, I would say that this is a
> driver bug.

Hello, it is exactly as you said: nvidia-smi wakes the Nvidia GPU.
I think Thermal Monitor does it differently, but I don't know how.

$ cat /sys/class/drm/card*/device/power_state
D3cold
D0

$ nvidia-smi dmon -d 2 -s pucm
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk     fb   bar1   ccpm 
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz     MB     MB     MB 
    0    752     32      -      9      0      0      0      0      0   6001   1282     14      1      0 
    0     18     33      -      0      0      0      0      0      0   6001   1282     14      1      0 
    0     18     33      -      0      0      0      0      0      0   6001   1282     14      1      0 
    0     18     33      -      0      0      0      0      0      0   6001   1282     14      1      0 
    0     18     33      -      0      0      0      0      0      0   6001   1282     14      1      0 
    0     18     33      -      0      0      0      0      0      0   6001   1282     14      1      0 
    0     12     32      -      0      0      0      0      0      0    405    210     14      1      0 
    0     11     32      -      0      0      0      0      0      0    405    210     14      1      0 
    
$ cat /sys/class/drm/card*/device/power_state
D0
D0
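
For illustration only, one way a sensor reader could avoid waking the card is to check `power_state` first and query `nvidia-smi` only when the GPU is already in D0, printing "N/A" otherwise (similar to what Freon displays). This is a sketch under my own assumptions, not how ksystemstats or Freon are actually implemented; the function name `read_gpu_temp_if_awake` and the `GPU_STATE_FILE` variable are hypothetical, and the path to the Nvidia card's `power_state` file is machine-specific.

```shell
#!/bin/sh
# Sketch: read the GPU temperature only if the device is already awake.
# read_gpu_temp_if_awake and GPU_STATE_FILE are hypothetical names.
# $1: path to the power_state file of the Nvidia card (machine-specific)
# remaining args (optional): command that prints the temperature;
#   if omitted, nvidia-smi is queried for temperature.gpu.
read_gpu_temp_if_awake() {
    state_file="$1"
    shift
    state=$(cat "$state_file" 2>/dev/null)
    if [ "$state" != "D0" ]; then
        # Suspended (or state unreadable): do not touch the driver.
        echo "N/A"
        return 0
    fi
    if [ "$#" -eq 0 ]; then
        nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits
    else
        "$@"
    fi
}

# Runs only when GPU_STATE_FILE is set, for example:
#   GPU_STATE_FILE=/sys/class/drm/card0/device/power_state sh this_script.sh
if [ -n "${GPU_STATE_FILE:-}" ]; then
    read_gpu_temp_if_awake "$GPU_STATE_FILE"
fi
```

There is a small race between the state check and the query, so the card could still be woken occasionally, and a GPU kept in D0 by the query itself would need a longer polling pause before it can suspend again.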