Bug 427603 - monitoring my NVIDIA does not show any info except temperature (missing output in nvidia-smi dmon)
Status: RESOLVED UNMAINTAINED
Alias: None
Product: ksysguard
Classification: Unmaintained
Component: general
Version: 5.19.90
Platform: Other Linux
Priority: NOR Severity: normal
Target Milestone: ---
Assignee: KSysGuard Developers
Reported: 2020-10-12 17:22 UTC by Mathias Homann
Modified: 2024-09-23 21:00 UTC
CC List: 5 users



Attachments
screenshot of GPU monitoring widget (394.38 KB, image/bmp)
2020-10-12 17:33 UTC, Mathias Homann
nvidia bug report logfile (836.23 KB, application/gzip)
2020-12-01 21:32 UTC, Mathias Homann

Description Mathias Homann 2020-10-12 17:22:20 UTC
SUMMARY


STEPS TO REPRODUCE
1. Create a ksysguardd widget on desktop, line diagram style
2. Add sensors for "GPU 1 Memory Usage %", "GPU 1 Shared Memory Usage %", "GPU 1 Power Usage" and "GPU 1 Temperature"
3. Hit "Apply"

OBSERVED RESULT
Only the GPU temperature display shows any data; the rest stays at 0.

EXPECTED RESULT
I'd expect all sensors to display data - they used to until not too long ago.

SOFTWARE/OS VERSIONS
Operating System: openSUSE Leap 15.2
KDE Plasma Version: 5.19.90
KDE Frameworks Version: 5.75.0
Qt Version: 5.15.1
Kernel Version: 5.3.18-lp152.44-default
OS Type: 64-bit
Processors: 8 × Intel® Core™ i7-4771 CPU @ 3.50GHz
Memory: 31.3 GiB of RAM
Graphics Processor: GeForce GTX 1050/PCIe/SSE2

ADDITIONAL INFORMATION
NVIDIA driver is 450.66, as provided by NVIDIA/openSUSE as an RPM package.
Comment 1 David Edmundson 2020-10-12 17:23:16 UTC
Can I have the output of

nvidia-smi pmon

for a few seconds?
Comment 2 David Edmundson 2020-10-12 17:24:13 UTC
Edit, 

nvidia-smi dmon
Comment 3 Mathias Homann 2020-10-12 17:33:56 UTC
Created attachment 132303 [details]
screenshot of GPU monitoring widget

Screenshot of the GPU monitoring widget.
Comment 4 Mathias Homann 2020-10-12 17:35:40 UTC
lemmy@kumiko:/tmp> nvidia-smi pmon
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0       2440     G     -     -     -     -   X              
    0       3824     G     -     -     -     -   kwin_x11       
    0       4140     G     -     -     -     -   akonadi_archive
    0       4145     G     -     -     -     -   akonadi_google_
    0       4148     G     -     -     -     -   akonadi_imap_re
    0       4151     G     -     -     -     -   akonadi_imap_re
    0       4152     G     -     -     -     -   akonadi_imap_re
    0       4154     G     -     -     -     -   akonadi_imap_re
    0       4157     G     -     -     -     -   akonadi_imap_re
    0       4158     G     -     -     -     -   akonadi_imap_re
    0       4160     G     -     -     -     -   akonadi_imap_re
    0       4163     G     -     -     -     -   akonadi_imap_re
    0       4173     G     -     -     -     -   akonadi_mailfil
    0       4186     G     -     -     -     -   akonadi_sendlat
    0       4187     G     -     -     -     -   akonadi_unified
    0       4421     G     -     -     -     -   nextcloud      
    0       4875     G     -     -     -     -   Keybase --type=
    0      24732     G     -     -     -     -   krunner        
    0      30109     G     -     -     -     -   plasmashell    
    0       2440     G     -     -     -     -   X              
    0       3824     G     -     -     -     -   kwin_x11       
    0       4140     G     -     -     -     -   akonadi_archive
    0       4145     G     -     -     -     -   akonadi_google_
    0       4148     G     -     -     -     -   akonadi_imap_re
    0       4151     G     -     -     -     -   akonadi_imap_re
    0       4152     G     -     -     -     -   akonadi_imap_re
    0       4154     G     -     -     -     -   akonadi_imap_re
    0       4157     G     -     -     -     -   akonadi_imap_re
    0       4158     G     -     -     -     -   akonadi_imap_re
    0       4160     G     -     -     -     -   akonadi_imap_re
    0       4163     G     -     -     -     -   akonadi_imap_re
    0       4173     G     -     -     -     -   akonadi_mailfil
    0       4186     G     -     -     -     -   akonadi_sendlat
    0       4187     G     -     -     -     -   akonadi_unified
    0       4421     G     -     -     -     -   nextcloud      
    0       4875     G     -     -     -     -   Keybase --type=
    0      24732     G     -     -     -     -   krunner        
    0      30109     G     -     -     -     -   plasmashell    
    0       2440     G     -     -     -     -   X              
    0       3824     G     -     -     -     -   kwin_x11       
    0       4140     G     -     -     -     -   akonadi_archive
    0       4145     G     -     -     -     -   akonadi_google_
    0       4148     G     -     -     -     -   akonadi_imap_re
    0       4151     G     -     -     -     -   akonadi_imap_re
Comment 5 Mathias Homann 2020-10-12 17:36:01 UTC
looks like nvidia-smi is broken...?
Comment 6 Mathias Homann 2020-10-12 17:48:19 UTC
I just checked - it's not a "root thing"; running as root gives the same result.
Comment 7 Mathias Homann 2020-10-12 18:01:56 UTC
(In reply to David Edmundson from comment #2)
> Edit, 
> 
> nvidia-smi dmon

oops.

lemmy@kumiko:~> nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    44     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
    0     -    43     -     0     0     -     -  3504  1784
Comment 8 Mathias Homann 2020-10-13 12:31:36 UTC
Interesting detail: on my laptop, which also uses an NVIDIA card, "nvidia-smi dmon" properly reports memory usage...

Any ideas?
Comment 9 Mathias Homann 2020-10-15 05:20:01 UTC
I just got the next NVIDIA driver through openSUSE updates; now I have 450.80.02. Same problem.

Actually it appears to me as if "nvidia-smi [dp]mon" is broken; nvidia-smi without any further parameters shows the memory consumption just fine:

lemmy@kumiko:~> nvidia-smi 
Thu Oct 15 07:19:02 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0  On |                  N/A |
| 35%   42C    P0    N/A /  75W |    389MiB /  1997MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
Comment 10 David Edmundson 2020-10-15 16:48:50 UTC
>Any ideas?

Sounds like upstream is broken.

I'll tag our NVIDIA devs so they can make an upstream report; I don't think there's much we can do from the Plasma side. Sorry.
Comment 11 Erik Kurzinger 2020-10-15 21:29:47 UTC
The mem value reported by "nvidia-smi dmon" refers to memory bandwidth utilization. If the GPU isn't actively rendering anything, it's normal for it to be zero. It should increase if you run a game or something, though. Incidentally, SM refers to streaming multiprocessor, not shared memory. These are what actually run the shader code, and the value reported represents their utilization. Sorry, I know the nvidia-smi documentation could probably make this clearer. Together, the two numbers can be useful for identifying bottlenecks.

If you want to know the video memory usage in the sense described here, you can run "nvidia-smi dmon -s m". It reports values for both framebuffer and bar1.
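
For reference, the output looks roughly like this (illustrative values, not captured from this machine; the exact set of columns can vary by driver version):

nvidia-smi dmon -s m
# gpu    fb  bar1
# Idx    MB    MB
    0   389     5
    0   389     5

Here fb is the framebuffer (video memory) usage in MB.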
Comment 12 Mathias Homann 2020-10-15 21:49:06 UTC
...what is bar1?


and is there a way to have nvidia-smi report only the total framebuffer memory size?
Comment 13 Mathias Homann 2020-10-15 21:50:22 UTC
Wait a minute. When you're running a graphical desktop with all kinds of OpenGL-based eye candy, wouldn't the BANDWIDTH be greater than zero?
Comment 14 Erik Kurzinger 2020-10-15 22:09:23 UTC
Bar1 is the portion of video memory that can be accessed by the CPU over PCIe.

And yes, a graphical desktop environment would be expected to utilize GPU memory bandwidth, but only if you're actually interacting with it. If it's just sitting idle, then it wouldn't be surprising that nvidia-smi reports 0% mem utilization.

Could you try running "nvidia-smi dmon" and moving windows around or activating some desktop effects? This should cause the mem and SM values to increase, unless there really is a bug.
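
If it helps, one way to do that test from two terminals (glxgears here is just an assumed example load; any OpenGL program works):

# terminal 1: watch the utilization counters
nvidia-smi dmon

# terminal 2: generate some GPU load
glxgears

While glxgears is running, the sm and mem columns should rise above zero.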
Comment 15 Erik Kurzinger 2020-10-15 22:13:23 UTC
sorry, forgot to answer this...

> and is there a way to have nvidia-smi report only the total framebuffer memory size?

no, I don't think so, sorry. You could always pipe its output through some other tool like awk, though.
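
For example, something like this (just a sketch; it assumes fb is the second column of the "dmon -s m" output, which may vary by driver version):

nvidia-smi dmon -s m | awk '!/^#/ { print $2 " MB fb used" }'

That prints only the framebuffer usage column, one line per sample.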
Comment 16 David Edmundson 2020-10-15 22:19:55 UTC
Erik, thanks for taking the time to answer. Sounds like there are some issues in our parsing!

Will adjust.
Comment 17 Mathias Homann 2020-10-16 05:12:27 UTC
(In reply to Erik Kurzinger from comment #14)
> Bar1 is the portion of video memory that can be accessed by the CPU over
> PCIe.
> 
> And yes, a graphical desktop environment would be expected to utilize GPU
> memory bandwidth, but only if you're actually interacting with it. If it's
> just sitting idle then it wouldn't be surprising that nvidia-smi reportes 0%
> mem utilization.
> 
> Could you try running "nvidia-smi dmon" and, moving windows around or
> activating some desktop effects? This should cause the mem and SM values to
> increase, unless there really is a bug.

hm.

So on my desktop PC (the one where monitoring doesn't work) I just ran "nvidia-smi dmon" in one xterm and then dragged another one around all over my two screens, with the "wobbly windows when moving" effect turned up all the way to max - and the values stayed at zero.

On the other hand, on my laptop, which uses Optimus technology and is set up for PRIME render offloading (so to my understanding the NVIDIA GPU should really not do anything unless I run a binary explicitly on that card), "nvidia-smi dmon" reports values for sm and mem even when the thing really does nothing, GPU-wise... so something seems to be off...
Comment 18 Mathias Homann 2020-10-16 08:05:04 UTC
(In reply to Mathias Homann from comment #17)
> On the other hand, on my laptop, which uses Optimus technology and is set
> up for PRIME render offloading [...] "nvidia-smi dmon" reports values for
> sm and mem even when the thing really does nothing, GPU-wise... so
> something seems to be off...

OK, I correct this: on my laptop nvidia-smi reports as expected.
Comment 19 Erik Kurzinger 2020-10-16 14:26:50 UTC
OK, that is strange. It's kind of a separate issue from this ksysguard bug, though. Would you mind either sending an email to linux-bugs@nvidia.com or starting a thread on https://forums.developer.nvidia.com/c/gpu-unix-graphics/linux/ ? We can follow up with you there.

If you do so, it would help if you could run the nvidia-bug-report.sh script that we distribute with the driver and attach the file it generates.
Comment 20 Mathias Homann 2020-12-01 21:27:30 UTC
I'm running glxspheres, and "nvidia-smi dmon" and "nvidia-smi pmon" both show only zeros for sm%, mem%, power consumption, and encoder/decoder usage, but running "nvidia-smi" without any parameters shows the actual usage stats.

Is that a bug, or not?
Comment 21 Mathias Homann 2020-12-01 21:32:22 UTC
Created attachment 133791 [details]
nvidia bug report logfile
Comment 22 Mathias Homann 2020-12-01 21:32:46 UTC
(In reply to Erik Kurzinger from comment #19)

> If you do so, it would help if you could run the nvidia-bug-report.sh script
> that we distribute with the driver and attach the file it generates.

mail sent.
Comment 23 Christoph Cullmann 2024-09-23 21:00:00 UTC
KSysGuard is no longer maintained; in Plasma 6, the Plasma System Monitor covers this task.

If your issue still happens with the Plasma 6 replacement, please re-open and we can move this bug to the new product, thanks!