Bug 482256 - wayland doesn't work in systems with dual nvidia gpus
Summary: wayland doesn't work in systems with dual nvidia gpus
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: platform-drm
Version: 5.93.0
Platform: Neon Linux
Importance: NOR grave
Target Milestone: ---
Assignee: KWin default assignee
Keywords: qt6, wayland
 
Reported: 2024-03-02 22:57 UTC by John Salatas
Modified: 2024-10-07 22:58 UTC



Attachments
This is how the wayland session looks like for a new user in first time login (82.59 KB, image/jpeg)
2024-03-02 22:57 UTC, John Salatas
journal log (4.12 KB, text/plain)
2024-03-04 22:33 UTC, John Salatas
plasmashell journal logs (3.68 KB, text/plain)
2024-03-05 02:35 UTC, John Salatas
wayland session (2.21 MB, image/jpeg)
2024-03-05 02:37 UTC, John Salatas
weston-simple-egl (2.11 MB, image/jpeg)
2024-03-06 01:28 UTC, John Salatas

Description John Salatas 2024-03-02 22:57:07 UTC
Created attachment 166321
This is how the wayland session looks like for a new user in first time login

SUMMARY
***
On my system with dual NVIDIA GPUs (RTX A2000 and RTX A5000), Wayland is unusable. If I remove one of the GPUs, everything works as expected.


STEPS TO REPRODUCE
1. Have a system with dual nvidia GPUs

OBSERVED RESULT
The screen is corrupted and flickers, as seen in the attached screenshot.

EXPECTED RESULT
It should work as it does with dual GPUs in X11, or with a single GPU in Wayland.

SOFTWARE/OS VERSIONS
Operating System: KDE neon 6.0
KDE Plasma Version: 6.0.0
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2
Kernel Version: 6.5.0-21-generic (64-bit)
Graphics Platform: Wayland
Processors: 20 × Intel® Xeon® W-2255 CPU @ 3.70GHz
Memory: 62.5 GiB of RAM
Graphics Processor: NVIDIA RTX A5000/PCIe/SSE2

ADDITIONAL INFORMATION

Please let me know if and how I can help. I'm willing to spend some time fixing this issue, but it seems very low level for me, so I may need a lot of guidance.

Thanks!
Comment 1 Zamundaaa 2024-03-04 13:46:16 UTC
That's odd, to say the least. Are there any warnings in KWin's log when this happens? You can access it with
> journalctl --user-unit plasma-kwin_wayland --boot 0
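The journal for a whole boot can be long, so one way to narrow it down is to keep only lines that look like warnings or errors. This is a minimal sketch, not an official KWin tool: `filter_suspicious` is a hypothetical helper, and its keyword list is a guess rather than anything KWin documents.

```shell
# Hypothetical helper: keep only lines that look like warnings or errors.
# Reads log text on stdin. The keyword list is an assumption.
filter_suspicious() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -iE 'error|warn|fail|critical' || true
}

# Usage (needs a running Plasma Wayland session):
#   journalctl --user-unit plasma-kwin_wayland --boot 0 --no-pager | filter_suspicious
```

An empty result only means none of the guessed keywords appeared; the full log is still worth attaching.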
Comment 2 John Salatas 2024-03-04 22:33:55 UTC
Created attachment 166406
journal log

See the attached journal logs. I'm not sure what I'm supposed to look for :\
Comment 3 John Salatas 2024-03-04 23:22:42 UTC
Just in case: the following line doesn't seem to appear in yesterday's journal logs (the ones from a single-GPU boot, in which Wayland was working)

kwin_wayland[1368]: qt.dbus.integration: QDBusConnection: couldn't handle call to Teardown, no slot matched

The error line below appears in either case 

kwin_wayland_wrapper[1368]: src/gbm_drv_common.c:130: GBM-DRV error (get_bytes_per_component): Unknown or not supported format: 875708754

Hope it helps.
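A comparison like the one above (single-GPU boot versus dual-GPU boot) can be done mechanically. The following is a rough sketch under a few assumptions: both logs were saved to files, each line has the `name[PID]: message` shape seen in the lines quoted above, and `normalize_log` and `lines_only_in_second` are hypothetical helper names.

```shell
# Hypothetical helper: strip the "[PID]" after the process name so the same
# message compares equal across boots, then sort and de-duplicate.
normalize_log() {
    sed -E 's/\[[0-9]+\]:/:/' | sort -u
}

# Hypothetical helper: print lines that appear only in the second log file.
lines_only_in_second() {
    a=$(mktemp)
    b=$(mktemp)
    normalize_log < "$1" > "$a"
    normalize_log < "$2" > "$b"
    comm -13 "$a" "$b"    # column 3 only: lines unique to the second file
    rm -f "$a" "$b"
}

# Usage (file names are placeholders):
#   lines_only_in_second single-gpu.log dual-gpu.log
```

Logs saved with timestamps will differ on every line, so this works best on output without them (for example `journalctl -o cat`, which prints only the message text).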
Comment 4 Zamundaaa 2024-03-05 01:54:47 UTC
The gbm error is a minor driver bug, but it's unrelated. The dbus one is also unrelated.

Judging by the lack of errors in KWin + the image, the problem might be on the app side - maybe the apps are using the wrong GPU. I don't know if anything about that would be logged, but if so, please attach the output of
> journalctl --user-unit plasma-plasmashell --boot 0

As some additional things you could check, can you start apps that don't use the GPU at all, like Konsole? Also, if you set the
> KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0
environment variable to force KWin to use the other GPU, does that change anything?
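To know which physical GPU sits behind `card0` and `card1` before changing the order, the DRM nodes can be listed with their PCI vendor IDs. This is a minimal sketch assuming the usual Linux sysfs layout under `/sys/class/drm`; `list_drm_cards` and `join_drm_devices` are hypothetical helper names, and where the variable has to be exported so that KWin sees it at session start depends on the distribution and is not covered here.

```shell
# Hypothetical helper: list DRM card nodes with their PCI vendor ID
# (0x10de is NVIDIA). Assumes the standard sysfs layout.
list_drm_cards() {
    for card in /sys/class/drm/card[0-9]*; do
        case "$card" in *-*) continue ;; esac    # skip connectors such as card0-DP-1
        [ -r "$card/device/vendor" ] || continue
        printf '/dev/dri/%s vendor=%s\n' "${card##*/}" "$(cat "$card/device/vendor")"
    done
}

# Hypothetical helper: join device paths with ':' in the order given.
join_drm_devices() {
    ( IFS=:; printf '%s\n' "$*" )
}

# Usage:
#   list_drm_cards
#   export KWIN_DRM_DEVICES="$(join_drm_devices /dev/dri/card1 /dev/dri/card0)"
```

With two cards from the same vendor, the vendor ID alone will not tell them apart; `readlink -f /sys/class/drm/card0/device` shows the PCI address, which can be matched against `lspci` output.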
Comment 5 John Salatas 2024-03-05 02:03:28 UTC
> As some additional things you could check, can you start apps that don't use the GPU at all, like Konsole?

Sorry, I forgot to mention that. If I press Alt+F2 and type konsole, it runs as expected. It just leaves window traces (as shown in the screenshot I attached) if I move it around. With Konsole open I can then run Kate and even Firefox. The applications that are corrupted seem to be the ones based on QML (or Kirigami?), such as the ones I tried: systemsettings, discover, elisa and spectacle.

I'll test the other suggestions soon.....
Comment 6 John Salatas 2024-03-05 02:35:24 UTC
Created attachment 166407
plasmashell journal logs

Please see the attached plasmashell journal log.

Setting KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0 didn't change anything.
Comment 7 John Salatas 2024-03-05 02:37:13 UTC
Created attachment 166408
wayland session

Just in case, here is a Wayland session with Konsole and Firefox running.
Comment 8 Zamundaaa 2024-03-06 00:53:16 UTC
Okay, qml apps not working definitely points in the direction of the OpenGL driver. Does a simpler egl app like weston-simple-egl work?
Comment 9 Zamundaaa 2024-03-06 00:54:39 UTC
As one more thing, does
> KWIN_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1
(card numbers reversed) change anything? It would be weird if the other order were used as the default, but it's not impossible.
Comment 10 John Salatas 2024-03-06 01:01:45 UTC
> does
> KWIN_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1
> (card numbers reversed) change anything? 

Tried that already and it seems like my computer used LSD :) 

I'll check weston-simple-egl soon...
Comment 11 John Salatas 2024-03-06 01:28:52 UTC
Created attachment 166452
weston-simple-egl

weston-simple-egl doesn't work either. It shows a similar issue to the QML apps (see attached).

So I guess it's not really a plasma issue. Right?
Comment 12 Zamundaaa 2024-03-06 01:32:17 UTC
Yeah, this is definitely an NVIDIA driver bug. I think the correct place to report it is the NVIDIA forums.
Comment 13 John Salatas 2024-03-06 01:33:28 UTC
Thanks so much for your effort!
Comment 14 John Salatas 2024-10-07 22:58:22 UTC
Ummm..... sorry for spamming this. I want (just in case someone else stumbles upon it) to mention that with the latest plasma version (6.1.5) and the nvidia drivers 560 the problem is solved.