Bug 512769 - Repeatable System crash with dual NvGPUs
Summary: Repeatable System crash with dual NvGPUs
Status: REPORTED
Alias: None
Product: Haruna
Classification: Applications
Component: general (other bugs)
Version First Reported In: 1.4.0
Platform: KDE Linux Linux
Importance: NOR major
Target Milestone: ---
Assignee: george fb
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-11-29 20:43 UTC by obious
Modified: 2025-12-07 17:40 UTC (History)
CC List: 1 user

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Description obious 2025-11-29 20:43:11 UTC
SUMMARY
A dual NVIDIA GPU system, where one GPU is headless and used only for compute, hangs irrecoverably when Haruna is launched. 100% reproducible.

STEPS TO REPRODUCE
1. Use a fully configured dual NVIDIA GPU system (see ADDITIONAL INFORMATION).
2. Launch Haruna with any video media.
3. The system will immediately hang irrecoverably.

OBSERVED RESULT
Launching Haruna crashes the entire graphics stack. All screens freeze and no TTY is available. SSH into the system still works, but kill/pkill -9 has no effect on any Wayland-related application.
A restart/reboot is NOT sufficient. A complete power cycle is required to bring the system back up.

EXPECTED RESULT
Do not crash the graphics stack.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Kubuntu 25.10 (Wayland)
KDE Plasma Version: 6.5.2
KDE Frameworks Version: 6.20.0 
Qt Version: 6.9.2

ADDITIONAL INFORMATION
> mpv --list-options | grep device      
 --alsa-mixer-device              String (default: default)
 --audio-device                   String (default: auto)
 --bluray-device                  String (default: ) [file]
 --cdda-device                    String (default: ) [file]
 --cuda-decode-device             Choices: auto (or an integer) (0 to 2147483647) (default: auto)
 --drm-device                     String (default: ) [file]
 --dvd-device                     String (default: ) [file]
 --vaapi-device                   String (default: /dev/dri/renderD128) <-- probably wrong GPU
 --vulkan-device                  String (default: )

---

> vainfo 
Trying display: wayland
libva info: VA-API version 1.22.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.22 (libva 2.22.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain12             : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointVLD
      VAProfileHEVCMain444_10         : VAEntrypointVLD
      VAProfileHEVCMain444_12         : VAEntrypointVLD

---

> MESA_VK_DEVICE_SELECT=list vulkaninfo
selectable devices:
  GPU 0: 10de:1b06 "NVIDIA GeForce GTX 1080 Ti" discrete GPU 0000:04:00.0 <-- is headless
  GPU 1: 10de:1e84 "NVIDIA GeForce RTX 2070 SUPER" discrete GPU 0000:05:00.0 <-- dual monitors
  GPU 2: 10005:0 "llvmpipe (LLVM 20.1.8, 256 bits)" CPU 0000:00:00.0
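
To check the "probably wrong GPU" note on `--vaapi-device`, the PCI address behind each render node can be compared with the bus IDs listed above. This is a minimal sketch, assuming the standard sysfs layout in which `/sys/class/drm/renderD*/device` is a symlink to the backing PCI device; the function name `render_node_pci` is hypothetical. On most udev-based systems, `ls -l /dev/dri/by-path/` shows the same mapping.

```shell
#!/bin/sh
# Print which PCI device backs each DRM render node, so the bus IDs can be
# compared with the vulkaninfo output in this report
# (0000:04:00.0 = headless GTX 1080 Ti, 0000:05:00.0 = RTX 2070 SUPER).
# Assumption: standard sysfs layout; the optional argument (a DRM class
# directory) exists only so the function can be tested against a fake tree.
render_node_pci() {
    drm_root="${1:-/sys/class/drm}"
    for node in "$drm_root"/renderD*; do
        # Skip the unexpanded glob (or odd entries) when no render node exists.
        [ -e "$node/device" ] || continue
        # "device" links into /sys/devices/...; the last path component of
        # the resolved link is the PCI address, e.g. 0000:04:00.0.
        pci="$(basename "$(readlink -f "$node/device")")"
        echo "$(basename "$node") -> $pci"
    done
}

render_node_pci "$@"
```

If `renderD128` maps to `0000:04:00.0`, then mpv's default VA-API device is the headless GTX 1080 Ti rather than the GPU driving the displays.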

---

> nvidia-smi 
Sat Nov 29 12:40:38 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:04:00.0 Off |                  N/A |
|  0%   25C    P8              8W /  250W |    1213MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 ...    Off |   00000000:05:00.0  On |                  N/A |
| 12%   55C    P0             51W /  215W |    1554MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            7358      C   dsnote                                 1208MiB |
|    1   N/A  N/A            2536      G   /usr/lib/xorg/Xorg                       15MiB |
|    1   N/A  N/A            2820      G   /usr/bin/kwin_wayland                   466MiB |
|    1   N/A  N/A            2889      G   /usr/bin/Xwayland                         3MiB |
|    1   N/A  N/A            2931      G   /usr/bin/ksmserver                        2MiB |
...
Note that Xorg is not running on the headless GPU.
Comment 1 obious 2025-11-29 20:51:26 UTC
I understand that this might ultimately be an NVIDIA driver bug, but I believe the severity of the crash warrants trying to prevent a media player launch from taking down the entire graphics stack. Note that NVIDIA utilities such as `nvidia-smi` and `nvtop` no longer work over SSH once the UI is frozen. Similarly, `kwin_wayland --replace` does not work either.

Please let me know if you need any additional information.