Bug 470474

Summary: new nvidia beta driver (535 series) not showing statistics
Product: [Applications] plasma-systemmonitor Reporter: Marcelo Bossoni <mmbossoni>
Component: general Assignee: KSysGuard Developers <ksysguard-bugs>
Status: RESOLVED FIXED    
Severity: normal CC: ahiemstra, CoolP, jon9097, julien.dlq, kde, kelvie, kx, MrRessiPiyent125, nate, plasma-bugs, sampingu02, ser.sto12, stephen.greenham, theattish, wildreiser, yizel7
Priority: NOR    
Version: 5.27.5   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In: 5.27.7
Sentry Crash Report:

Description Marcelo Bossoni 2023-05-31 00:32:17 UTC
SUMMARY
I've installed new nvidia drivers and card stats reporting seems to be broken again :(

STEPS TO REPRODUCE
1.  Install new nvidia 535 driver

OBSERVED RESULT
All stats showing 0 as result (temp, usage, frequency...)

EXPECTED RESULT
All statistics properly shown

SOFTWARE/OS VERSIONS
Operating System: Arch Linux 
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.106.0
Qt Version: 5.15.9
Kernel Version: 6.3.4-zen3-xanmod1-1 (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5700X 8-Core Processor
Memory: 15.5 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 1070/PCIe/SSE2
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7A37
System Version: 1.0

ADDITIONAL INFORMATION
nvidia-smi dmon -s pucm
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk     fb   bar1   ccpm 
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz     MB     MB     MB 
    0     11     48      -     0      5      0      0      -     -   405    139    512     15      0 
    0     11     48      -     0      5      0      0      -     -   405    139    512     15      0 
    0     11     48      -     0      5      0      0      -     -   405    139    504     15      0 
    0     11     48      -     0      5      0      0      -     -   405    139    504     15      0 
    0     11     48      -     0      5      0      0      -     -   405    139    504     15      0 
    0     11     48      -     0      5      0      0      -     -   405    139    504     15      0 
    0     11     48      -     0      5      0      0      -     -   405    139    496     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    496     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    496     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    488     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    488     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    488     15      0 
    0     11     47      -     0      6      0      0      -     -   405    139    480     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    480     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    480     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    480     15      0 
    0     11     47      -     0      5      0      0      -     -   405    139    472     15      0 
    0     11     47      -     7     11      0      0      -     -   405    139    586     15      0 
    0     13     47      -     1      6      0      0      -     -   405    139    586     15      0 
    0     12     47      -     0      6      0      0      -     -   405    139    586     15      0 
    0     12     47      -     0      5      0      0      -     -   405    139    578     15      0
Comment 1 David Redondo 2023-06-07 07:40:05 UTC
I don't have the driver yet, but it looks like they added new fields.

I think we should look up the indices of the fields we are interested in from the header instead of bailing when we encounter a new one.
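
A minimal sketch of that header-based lookup, written in plain standard C++ rather than the Qt types ksystemstats uses. The function names (splitFields, headerIndices, fieldValue) are hypothetical and not taken from any actual patch; the only assumption about the input is the `nvidia-smi dmon` layout quoted in this report, where the first header line starts with "#" and unsupported values are printed as "-".

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Split a line on runs of whitespace.
std::vector<std::string> splitFields(const std::string &line)
{
    std::istringstream stream(line);
    std::vector<std::string> fields;
    for (std::string field; stream >> field;) {
        fields.push_back(field);
    }
    return fields;
}

// Build a column-name -> index map from the first header line,
// e.g. "# gpu    pwr  gtemp ...". The leading "#" is not counted as a column.
std::unordered_map<std::string, std::size_t> headerIndices(const std::string &headerLine)
{
    std::unordered_map<std::string, std::size_t> indices;
    std::size_t column = 0;
    for (const std::string &name : splitFields(headerLine)) {
        if (name == "#") {
            continue;
        }
        indices.emplace(name, column++);
    }
    return indices;
}

// Read one named value from a data row. Returns false if the column is
// unknown, the row is too short, or the driver printed "-" for it.
// Assumes the remaining values are plain integers, as in the output above.
bool fieldValue(const std::unordered_map<std::string, std::size_t> &indices,
                const std::vector<std::string> &row,
                const std::string &name,
                int &value)
{
    const auto it = indices.find(name);
    if (it == indices.end() || it->second >= row.size() || row[it->second] == "-") {
        return false;
    }
    value = std::stoi(row[it->second]);
    return true;
}
```

With the 535 header quoted in this report, `pclk` resolves to column 11; with an older 12-column header it resolves to column 9. In both cases the caller asks for the field by name, so extra or reordered columns do not matter.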
Comment 2 Bug Janitor Service 2023-06-15 12:34:12 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/ksystemstats/-/merge_requests/57
Comment 3 HeathenHacks 2023-06-17 02:26:15 UTC
Yep. Same issue here. (In reply to Marcelo Bossoni from comment #0)
Comment 4 Arjen Hiemstra 2023-06-18 15:02:45 UTC
*** Bug 471097 has been marked as a duplicate of this bug. ***
Comment 5 Arjen Hiemstra 2023-06-18 15:03:15 UTC
*** Bug 471178 has been marked as a duplicate of this bug. ***
Comment 6 Arjen Hiemstra 2023-06-18 15:03:32 UTC
*** Bug 471193 has been marked as a duplicate of this bug. ***
Comment 7 kx 2023-06-20 07:22:10 UTC
Same issue here too. RTX 3060, and +1 for another user in an external report, also on 3060

nvidia-smi dmon -s pucm
# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec    jpg    ofa   mclk   pclk     fb   bar1   ccpm 
# Idx      W      C      C      %      %      %      %      %      %    MHz    MHz     MB     MB     MB 
    0     41     45      -     0      1      0      0      0      0   7500   1867    882     28      0 
    0     51     46      -     0      1      0      0      0      0   7500   1867    882     28      0 
    0     37     44      -     2      6      0      0      0      0    810    330    881     28      0 
    0     30     44      -     2      7      0      0      0      0    810    217    877     28      0 
    0     29     44      -    26     22      0      0      0      0    405    210    874     28      0 
    0     29     44      -    21     21      0      0      0      0    405    345    874     28      0 
    0     28     44      -     4     13      0      0      0      0    405    345    874     28      0 
    0     28     45      -     4     13      0      0      0      0    405    210    873     28      0 
    0     28     44      -     5     13      0      0      0      0    405    210    873     28      0 
    0     28     44      -     6     13      0      0      0      0    405    210    873     28      0 
    0     28     44      -     5     14      0      0      0      0    405    210    857     28      0 
    0     28     44      -     4     13      0      0      0      0    405    210    857     28      0 
    0     28     44      -     5     13      0      0      0      0    405    210    857     28      0 
    0     28     44      -     5     13      0      0      0      0    405    210    857     28      0
Comment 8 Stephen Greenham 2023-06-20 08:33:29 UTC
I observed yesterday that this affects more than just Plasma System Monitor.

The built-in in-game overlay in Overwatch 2 used to be able to display GPU temperature but now displays 0C.

So I guess the way NVIDIA provides the data has fundamentally changed. Nice of them to tell anyone...
Comment 9 Altamush Nayyer Khan 2023-06-20 08:35:53 UTC
I haven't tested MangoHud yet. This is a sad thing. Nvidia should have informed people about this.

Probably MangoHud won't work well either.
Comment 10 kx 2023-06-20 19:24:56 UTC
(In reply to Altamush Nayyer Khan from comment #9)
> I haven't tested MangoHud yet. This is a sad thing. Nvidia should have
> informed people about this.
> 
> Probably MangoHud won't work well either.

https://j.gifs.com/mlbqYX.gif

I guess that's what happens with proprietary software and companies that don't care about anything but money.

Has anyone reported this to NVIDIA, or is anyone in contact with them? Is there no documentation about this anywhere?
Comment 11 Pascal 2023-06-27 23:22:35 UTC
NVIDIA Driver Version (out of beta now): 535.54.03 on 2080 Super. Kubuntu. KDE Plasma: 5.27.4. KDE Frameworks Version: 5.104.0

Same problem here, but MangoHud seems to be unaffected and can read the info just fine, as it did with the previous 530 driver, and there was no MangoHud update in between.

It also doesn't help to change sensor definitions to "all GPUs" for testing since the data fields (I'm mainly monitoring temperatures and VRAM usage) remain at zero all the time no matter what.
Comment 12 kx 2023-06-28 00:28:23 UTC
I checked it myself, and I'm pretty certain that https://invent.kde.org/plasma/ksystemstats/-/merge_requests/57 should already fix this.

It seems that NVIDIA added columns to the output.

Was: gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Is:  gpu pwr gtemp mtemp sm mem enc dec jpg ofa mclk pclk fb bar1 ccpm

The old parser split each line and skipped it unless it had exactly 12 fields:

if (parts.count() != 12) {
    continue;
}

With 15 columns, every data line is skipped, which might have caused all the fields to be 0. #57 locates the fields by their header names instead of by a fixed position in the split, so this should fix it.
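
To illustrate why a fixed field count rejects every row from the new driver, here is a small sketch in plain standard C++. It is not the Qt code from ksystemstats; countFields and passesFixedCountCheck are hypothetical names, and the default of 12 simply mirrors the quoted check.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Count whitespace-separated fields in one line of dmon output.
std::size_t countFields(const std::string &line)
{
    std::istringstream stream(line);
    std::size_t count = 0;
    for (std::string field; stream >> field;) {
        ++count;
    }
    return count;
}

// Mirrors the quoted check: a row is only used when it has exactly
// expectedFields columns (12 in the snippet above).
bool passesFixedCountCheck(const std::string &line, std::size_t expectedFields = 12)
{
    return countFields(line) == expectedFields;
}
```

A data row from the 535 output in this report has 15 fields, so passesFixedCountCheck returns false for it and the row would never be parsed. A row in the older 12-column layout passes.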
Comment 13 Ananaserio 2023-06-28 09:25:26 UTC
(In reply to kx from comment #12)
> I checked it myself, and I'm pretty certain that
> https://invent.kde.org/plasma/ksystemstats/-/merge_requests/57 should already fix this.

I'm not really familiar with the process here so I apologize for asking the obvious. Am I understanding correctly that now we're just waiting for it to be merged, so (a future) update will fix it?
Comment 14 kx 2023-06-28 10:54:02 UTC
> I'm not really familiar with the process here so I apologize for asking the
> obvious. Am I understanding correctly that now we're just waiting for it to
> be merged, so (a future) update will fix it?

No worries, there's nothing wrong with asking.

You're pretty much right. Think of it as a series of releases (5.27.1, 5.27.2, etc.), each a combined effort to fix bugs, add features, clean up the code, and so on. What goes into a given version depends on what's relevant at the time; it's a continuous development process involving a lot of people and the community in general.

A fix for this bug has been developed and proposed for inclusion in the next possible version. When that will be, I don't actually know myself. Someone has to approve the request first to make sure it's not malicious, follows the code style, meets the quality bar, and so on.

If you want to use the fix before it's officially released, you could build it yourself (from the branch of the merge request I linked) after making sure it's not malicious. That's why open source is so helpful.
Comment 15 Pascal 2023-06-28 13:25:06 UTC
In the name of us newbies, thanks for that explanation and work. Always fun to learn new things. :-)
Comment 16 Ananaserio 2023-06-28 13:45:49 UTC
Indeed, thanks a lot for the explanation!
Comment 17 wildreiser 2023-06-29 04:18:30 UTC
I can also confirm that no info is shown with the latest nvidia driver.
Arch Linux with the latest updates.
Comment 18 David Redondo 2023-06-29 07:40:56 UTC
Git commit 7f9ead6bddfdf6f13a1ea48791f8f5d5c80c6980 by David Redondo.
Committed on 28/06/2023 at 12:24.
Pushed by davidre into branch 'master'.

gpu/nvidia: Discover data fields based on headers

This guards us against the appearance of new fields or if they
ever appear in a different order.
FIXED-IN:5.27.7

M  +34   -16   plugins/gpu/NvidiaSmiProcess.cpp
M  +14   -0    plugins/gpu/NvidiaSmiProcess.h

https://invent.kde.org/plasma/ksystemstats/-/commit/7f9ead6bddfdf6f13a1ea48791f8f5d5c80c6980
Comment 19 David Redondo 2023-06-30 13:55:44 UTC
Git commit 4f7213e6e742b993feeaf300181a67923e60c0f4 by David Redondo.
Committed on 29/06/2023 at 08:38.
Pushed by davidedmundson into branch 'Plasma/5.27'.

gpu/nvidia: Discover data fields based on headers

This guards us against the appearance of new fields or if they
ever appear in a different order.
FIXED-IN:5.27.7

(cherry picked from commit 7f9ead6bddfdf6f13a1ea48791f8f5d5c80c6980)
Because in Qt5 QVector<T>::indexOf only takes T's we have to provide
our own indexOf here.

M  +39   -17   plugins/gpu/NvidiaSmiProcess.cpp
M  +14   -0    plugins/gpu/NvidiaSmiProcess.h

https://invent.kde.org/plasma/ksystemstats/-/commit/4f7213e6e742b993feeaf300181a67923e60c0f4
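
The cherry-pick note above mentions providing a custom indexOf because Qt5's QVector<T>::indexOf only accepts a T. As a rough illustration of what such a helper can look like, here is a sketch in plain standard C++ so that it compiles without Qt. It is not the code from the commit; the name, signature, and the -1 "not found" convention are assumptions.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the position of the first element that compares equal to `value`,
// or -1 if there is none. `value` may have a different type than the
// elements, as long as `element == value` compiles.
template<typename Container, typename Value>
int indexOf(const Container &container, const Value &value)
{
    const auto it = std::find_if(std::begin(container), std::end(container),
                                 [&value](const typename Container::value_type &element) {
                                     return element == value;
                                 });
    if (it == std::end(container)) {
        return -1;
    }
    return static_cast<int>(std::distance(std::begin(container), it));
}
```

For example, a std::vector<std::string> of header names can be searched with a string literal without first converting the literal to the element type.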
Comment 20 yizel7 2023-08-18 22:32:01 UTC
(In reply to David Redondo from comment #19)

Did this fix it for anyone else? I updated to 5.27.7 and it still doesn't work.
Comment 21 Ananaserio 2023-08-18 22:35:34 UTC
It works for me now.
Comment 22 yizel7 2023-08-18 22:38:03 UTC
(In reply to Ananaserio from comment #21)
> It works for me now.

Thank you. It works for me now too; I had to reboot.
Comment 23 Altamush Nayyer Khan 2023-08-19 05:49:11 UTC
The latest update seems to have fixed the issue. I am marking this as fixed.