Bug 478251

Summary: kwin_wayland with nvidia 545 driver drops user back to login after a few seconds
Product: [Plasma] kwin Reporter: Barry Scott <barry>
Component: wayland-generic Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED WORKSFORME    
Severity: grave CC: ekurzinger, fanzhuyifan, nate, ngompa, warp-spam_kde, xaver.hugl
Priority: NOR    
Version First Reported In: 5.27.9   
Target Milestone: ---   
Platform: Other   
OS: Other   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=2252447
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description Barry Scott 2023-12-08 09:28:00 UTC
SUMMARY
Originally reported to Fedora https://bugzilla.redhat.com/show_bug.cgi?id=2252447

When I log in there is a long pause with a black screen and a cursor blinking in the top-left corner. Then the KDE Plasma desktop appears.
I can start apps and they work. But after about 10 seconds I get thrown back to the login screen.
Here are the journalctl --user logs:

2023-12-01T17:50:34+0000 plasmashell[4703]: qt.qpa.wayland: Wayland does not support QWindow::requestActivate()
2023-12-01T17:50:35+0000 plasmashell[4703]: QString::arg: 2 argument(s) missing in org.barrys-emacs.scm-workbench
2023-12-01T17:50:35+0000 systemd[4184]: Started app-org.barrys\x2demacs.scm\x2dworkbench-bf73e11b6a40477fb99b021e4439cbb9.scope - SCM Workbench.
2023-12-01T17:50:36+0000 kwin_wayland[4567]: kf.service.services: The desktop entry file "/usr/share/applications/qemu.desktop" has Type= "Application" but has no Exec field.
2023-12-01T17:50:36+0000 kwin_wayland[4567]: kf.service.services: The desktop entry file "/usr/share/applications/org.freedesktop.Xwayland.desktop" has Type= "Application" but has no Exec field.
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "alsa_output.pci-0000_00_1f.3.analog-stereo.monitor"
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_wayland_drm: Atomic commit failed! Invalid argument
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_wayland_drm: Presentation failed! Invalid argument
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_core: Applying KScreen config failed!
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "alsa_output.pci-0000_00_1f.3.analog-stereo"
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "alsa_output.pci-0000_00_1f.3.analog-stereo.monitor"
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "@DEFAULT_SINK@"
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "@DEFAULT_SOURCE@"
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "@DEFAULT_SINK@"
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "@DEFAULT_SOURCE@"
2023-12-01T17:50:46+0000 plasmashell[4703]: org.kde.plasma.pulseaudio: No object for name "auto_null.monitor"
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_wayland_drm: Atomic commit failed! Permission denied
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_wayland_drm: Presentation failed! Permission denied
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_core: Applying KScreen config failed!
2023-12-01T17:50:46+0000 kwin_wayland[4567]: kwin_core: Applying KScreen config failed!
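To pull just the compositor's DRM and KScreen failures out of a long `journalctl --user` dump like the one above, a filter along these lines can help. This is a sketch: the function name `kwin_drm_errors` is hypothetical, and the pattern assumes the `kwin_wayland[PID]: category: message` layout of the lines quoted here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kwin_wayland DRM / KScreen failure lines.
# Reads journal text on stdin, for example:
#   journalctl --user -b | kwin_drm_errors
kwin_drm_errors() {
    # Match "kwin_wayland[<pid>]: kwin_wayland_drm: ..." and
    # "kwin_wayland[<pid>]: kwin_core: ..." but not other kwin categories
    # such as kf.service.services.
    grep -E 'kwin_wayland\[[0-9]+]: (kwin_wayland_drm|kwin_core): '
}
```

On the log above this would keep the "Atomic commit failed!", "Presentation failed!" and "Applying KScreen config failed!" lines and drop the pulseaudio and desktop-entry noise.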

Here is the output of dmesg

$ dmesg | grep -i nvidia
[  +0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.2-201.fc39.x86_64 root=UUID=f160dd82-834b-4cfa-8ee7-9c159b2a1b7b ro rootflags=subvol=root rd.luks.uuid=luks-904db66b-db23-4719-bbf6-fb596c23d831 initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[  +0.000011] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.2-201.fc39.x86_64 root=UUID=f160dd82-834b-4cfa-8ee7-9c159b2a1b7b ro rootflags=subvol=root rd.luks.uuid=luks-904db66b-db23-4719-bbf6-fb596c23d831 initcall_blacklist=simpledrm_platform_driver_init nvidia-drm.modeset=1 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau
[  +0.011260] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1c.4/0000:06:00.1/sound/card1/input11
[  +0.000614] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1c.4/0000:06:00.1/sound/card1/input12
[  +0.000584] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1c.4/0000:06:00.1/sound/card1/input13
[  +0.000633] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1c.4/0000:06:00.1/sound/card1/input14
[  +0.079026] nvidia: loading out-of-tree module taints kernel.
[  +0.000535] nvidia: module license 'NVIDIA' taints kernel.
[  +0.000520] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  +0.000528] nvidia: module license taints kernel.
[  +0.113438] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[  +0.001374] nvidia 0000:06:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[  +0.048729] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  545.29.06  Thu Nov 16 01:59:08 UTC 2023
[  +0.066443] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[  +0.066863] nvidia-uvm: Loaded the UVM driver, major device number 511.
[  +0.037940] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  545.29.06  Thu Nov 16 01:47:29 UTC 2023
[  +0.005739] [drm] [nvidia-drm] [GPU ID 0x00000600] Loading driver
[  +1.072910] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:06:00.0 on minor 0
[  +0.000021] nvidia 0000:06:00.0: vgaarb: deactivate vga console
[  +0.127493] fbcon: nvidia-drmdrmfb (fb0) is primary device
[  +0.012371] nvidia 0000:06:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
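The command line above passes `nvidia-drm.modeset=1`, which the proprietary driver needs for Wayland compositors such as kwin_wayland. To confirm the setting reached the loaded module, and not only the boot command line, the module parameter can be read back from sysfs. A minimal sketch: the helper name `nvidia_modeset_enabled` is hypothetical, and it assumes the `nvidia_drm` module exposes its `modeset` parameter under `/sys/module`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if nvidia-drm kernel modesetting is active.
# The sysfs path is a parameter so the check can be pointed at a test file.
nvidia_modeset_enabled() {
    local param=${1:-/sys/module/nvidia_drm/parameters/modeset}
    # The file is absent when nvidia_drm is not loaded at all.
    [ -r "$param" ] || return 1
    # The kernel prints boolean module parameters as Y or N.
    [ "$(cat "$param")" = "Y" ]
}

# Usage:
#   nvidia_modeset_enabled && echo "modeset on" || echo "modeset off"
```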


Reproducible: Always


STEPS TO REPRODUCE
1. upgrade to akmod-nvidia-545.29.06-1.fc39.x86_64
2. build nvidia drivers
3. reboot
4. login to plasma (wayland)
5. there is a pause before the desktop appears
6. start some apps; I use Konsole and the SCM Workbench app
7. after 10 seconds you are returned to the login screen
8. new login attempts will not load a desktop

OBSERVED RESULT
kwin_wayland fails and the session drops back to the login screen

EXPECTED RESULT
kwin_wayland stays running and the session remains logged in

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: Fedora 39
(available in About System)
KDE Plasma Version: kwin-wayland-5.27.9-3.fc39.x86_64
KDE Frameworks Version: 
Qt Version: qt5-qtbase-gui-5.15.11-7.fc39.x86_64 etc
RPMFusion NVidia drivers: akmod-nvidia-545.29.06-1.fc39.x86_64

ADDITIONAL INFORMATION
I have downgraded kwin and its dependents to kwin-wayland-5.27.8-1.fc39.x86_64.
This still shows the issue of being thrown out of the desktop back to the login screen.
But it does not happen every time. After a few attempts (usually fewer than 5) I can get logged in and stay logged in.

If there is a ~30 s delay from login to the desktop appearing, then after a further ~10 s the user is thrown back to the login screen, every time.
If the desktop appears in under 3 s, I stay logged in without further issue.

With 5.27.9 I could not stay logged in.

Downgrading from the nvidia 545 drivers to the 535 drivers does not fix the issue.
Comment 1 Liz 2023-12-11 09:00:15 UTC
I am seeing what appears to be the same failure.

To provide a bit more information on what is failing: there appear to be two different things going wrong, though it is quite possible that the second problem is related to the first.

This is all in kwin 5.27.9 as packaged in Debian sid.

In src/backends/drm/drm_gbm_surface.cpp, GbmSurface::swapBuffers calls eglSwapBuffers and then gbm_surface_lock_front_buffer.

The call to gbm_surface_lock_front_buffer fails, without any indication as to why.

When that fails, we eventually get to GbmSurface::~GbmSurface, which calls gbm_surface_destroy. That triggers a glibc error about something either being freed twice or memory being corrupted.

That is a fatal error inside libc6, and it kills kwin.

What I can't figure out, so far at least, is _why_.

I am also on an NVIDIA RTX 3060, running the 545.23.08 driver.

I would love a minimum viable test application for EGL on GBM to make it easier to test success/failure.
Comment 2 Zamundaaa 2023-12-11 14:22:56 UTC
This isn't actionable as-is, though it sounds like a bug in the driver. Please attach a backtrace of the crash.
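The trace Liz posts in comment 3 has the layout of systemd-coredump output ("Stack trace of thread ...", "Module ... from deb"), so one way to produce such a backtrace is `coredumpctl`. A minimal sketch, assuming systemd-coredump is installed and enabled; the wrapper name `kwin_crash_report` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the stored backtrace of the most recent crash
# of a process (kwin_wayland by default) recorded by systemd-coredump.
kwin_crash_report() {
    local comm=${1:-kwin_wayland}
    if ! command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl not found; is systemd-coredump installed?" >&2
        return 2
    fi
    # 'info' selects the newest dump matching the command name and prints
    # its metadata and per-thread stack traces.
    coredumpctl --no-pager info "$comm"
}

# Usage:
#   kwin_crash_report > kwin-backtrace.txt
```

For a fuller trace, `coredumpctl debug kwin_wayland` opens the same dump in gdb, where `bt full` prints every frame with its locals, provided debug symbols are installed.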
Comment 3 Liz 2023-12-15 01:58:30 UTC
Barry, can you try: sudo ln -s /usr/lib/x86_64-linux-gnu/nvidia/current/nvidia-drm_gbm.so /usr/lib/x86_64-linux-gnu/gbm/
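As far as can be told from this thread, the symlink works by making NVIDIA's GBM backend visible to Mesa's `libgbm`, which looks in its `gbm/` directory for a backend file named after the DRM driver, here `nvidia-drm_gbm.so`. A quick presence check, as a sketch: the helper name `gbm_backend_present` is hypothetical, and the default directory is the Debian multiarch path from the ln -s command above; other distributions use a different libdir.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the NVIDIA GBM backend is reachable in a
# GBM backend directory (default: Debian multiarch path).
gbm_backend_present() {
    local dir=${1:-/usr/lib/x86_64-linux-gnu/gbm}
    # -e follows symlinks, so a dangling symlink counts as missing.
    [ -e "$dir/nvidia-drm_gbm.so" ]
}

# Usage:
#   gbm_backend_present || echo "nvidia-drm_gbm.so not found"
```

On a 64-bit Fedora install the directory would likely be passed explicitly, e.g. `gbm_backend_present /usr/lib64/gbm`.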

As for the stack trace: creating the above symlink fixed kwin not working on Wayland, but the stack trace I was getting before that is below.

Of note, I added a number of debug log entries, so the offsets won't exactly match lines in an unmodified build.

Module libsystemd.so.0 from deb systemd-255-1.amd64
Module libzstd.so.1 from deb libzstd-1.5.5+dfsg2-2.amd64
Module libudev.so.1 from deb systemd-255-1.amd64
Stack trace of thread 3325:
#0  0x00007f0d23ca80fc __pthread_kill_implementation (libc.so.6 + 0x8a0fc)
#1  0x00007f0d23c5a472 __GI_raise (libc.so.6 + 0x3c472)
#2  0x00007f0d23c444b2 __GI_abort (libc.so.6 + 0x264b2)
#3  0x00007f0d23c451ed __libc_message (libc.so.6 + 0x271ed)
#4  0x00007f0d23cb1a75 malloc_printerr (libc.so.6 + 0x93a75)
#5  0x00007f0d23cb3ad0 _int_free (libc.so.6 + 0x95ad0)
#6  0x00007f0d23cb616f __GI___libc_free (libc.so.6 + 0x9816f)
#7  0x00007f0d269d27e9 _ZN4KWin10GbmSurfaceD2Ev (libkwin.so.5 + 0x3d27e9)
#8  0x00007f0d267c682e _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE19_M_release_last_useEv (libkwin.so.5 + 0x1c682e)
#9  0x00007f0d269cf438 _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv (libkwin.so.5 + 0x3cf438)
#10 0x00007f0d269cf712 operator() (libkwin.so.5 + 0x3cf712)
#11 0x00007f0d269cfec4 _ZNK4KWin18EglGbmLayerSurface13createSurfaceERK5QSizeRK4QMapIj7QVectorImEE (libkwin.so.5 + 0x3cfec4)
#12 0x00007f0d269d0b9d _ZN4KWin18EglGbmLayerSurface12checkSurfaceERK5QSizeRK4QMapIj7QVectorImEE (libkwin.so.5 + 0x3d0b9d)
#13 0x00007f0d269d13fe _ZN4KWin18EglGbmLayerSurface16renderTestBufferERK5QSizeRK4QMapIj7QVectorImEE (libkwin.so.5 + 0x3d13fe)
#14 0x00007f0d269c78c6 _ZN4KWin11EglGbmLayer15checkTestBufferEv (libkwin.so.5 + 0x3c78c6)
#15 0x00007f0d269eab28 _ZN4KWin11DrmPipeline21commitPipelinesAtomicERK7QVectorIPS0_ENS0_10CommitModeERKS1_IPNS_9DrmObjectEE (libkwin.so.5 + 0x3eab28)
#16 0x00007f0d269da33f _ZN4KWin6DrmGpu13testPipelinesEv (libkwin.so.5 + 0x3da33f)
#17 0x00007f0d269db424 _ZN4KWin6DrmGpu19checkCrtcAssignmentE7QVectorIPNS_12DrmConnectorEERKS1_IPNS_7DrmCrtcEE (libkwin.so.5 + 0x3db424)
#18 0x00007f0d269db38f _ZN4KWin6DrmGpu19checkCrtcAssignmentE7QVectorIPNS_12DrmConnectorEERKS1_IPNS_7DrmCrtcEE (libkwin.so.5 + 0x3db38f)
#19 0x00007f0d269db38f _ZN4KWin6DrmGpu19checkCrtcAssignmentE7QVectorIPNS_12DrmConnectorEERKS1_IPNS_7DrmCrtcEE (libkwin.so.5 + 0x3db38f)
#20 0x00007f0d269db424 _ZN4KWin6DrmGpu19checkCrtcAssignmentE7QVectorIPNS_12DrmConnectorEERKS1_IPNS_7DrmCrtcEE (libkwin.so.5 + 0x3db424)
#21 0x00007f0d269dbdd0 _ZN4KWin6DrmGpu24testPendingConfigurationEv (libkwin.so.5 + 0x3dbdd0)
#22 0x00007f0d269dc9db _ZN4KWin6DrmGpu13updateOutputsEv (libkwin.so.5 + 0x3dc9db)
#23 0x00007f0d269b520c _ZN4KWin10DrmBackend13updateOutputsEv (libkwin.so.5 + 0x3b520c)
#24 0x00007f0d24d062b2 n/a (libQt5Core.so.5 + 0x3062b2)
#25 0x00007f0d267db7f4 _ZN4KWin10Compositor10setupStartEv (libkwin.so.5 + 0x1db7f4)
#26 0x00007f0d267dcdd4 _ZN4KWin17WaylandCompositor5startEv (libkwin.so.5 + 0x1dcdd4)
#27 0x00007f0d24cf9940 _ZN7QObject5eventEP6QEvent (libQt5Core.so.5 + 0x2f9940)
#28 0x00007f0d24362f32 _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x162f32)
#29 0x00007f0d24ccc748 _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5 + 0x2cc748)
#30 0x00007f0d24ccfe51 _ZN23QCoreApplicationPrivate16sendPostedEventsEP7QObjectiP11QThreadData (libQt5Core.so.5 + 0x2cfe51)
#31 0x00007f0d24d25115 _ZN20QEventDispatcherUNIX13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5 + 0x325115)
#32 0x000055bc0100e8c1 _ZN23QUnixEventDispatcherQPA13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (kwin_wayland + 0x1508c1)
#33 0x00007f0d24ccb0fb _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5 + 0x2cb0fb)
#34 0x00007f0d24cd38a4 _ZN16QCoreApplication4execEv (libQt5Core.so.5 + 0x2d38a4)
#35 0x000055bc00f1636f main (kwin_wayland + 0x5836f)
#36 0x00007f0d23c456ca __libc_start_call_main (libc.so.6 + 0x276ca)
#37 0x00007f0d23c45785 __libc_start_main_impl (libc.so.6 + 0x27785)
#38 0x000055bc00f18461 _start (kwin_wayland + 0x5a461)

Stack trace of thread 3335:
#0  0x00007f0d23d19a1f __GI___poll (libc.so.6 + 0xfba1f)
#1  0x00007f0d22714277 n/a (libglib-2.0.so.0 + 0x5a277)
#2  0x00007f0d22714930 g_main_context_iteration (libglib-2.0.so.0 + 0x5a930)
#3  0x00007f0d24d27d4a _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5 + 0x327d4a)
#4  0x00007f0d24ccb0fb _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5 + 0x2cb0fb)
#5  0x00007f0d24ad9c52 _ZN7QThread4execEv (libQt5Core.so.5 + 0xd9c52)
#6  0x00007f0d26da87ab n/a (libQt5DBus.so.5 + 0x177ab)
#7  0x00007f0d24adaeb1 n/a (libQt5Core.so.5 + 0xdaeb1)
#8  0x00007f0d23ca63ec start_thread (libc.so.6 + 0x883ec)
#9  0x00007f0d23d26a5c __clone3 (libc.so.6 + 0x108a5c)

Stack trace of thread 3339:
#0  0x00007f0d23d19a1f __GI___poll (libc.so.6 + 0xfba1f)
#1  0x00007f0d22714277 n/a (libglib-2.0.so.0 + 0x5a277)
#2  0x00007f0d22714930 g_main_context_iteration (libglib-2.0.so.0 + 0x5a930)
#3  0x00007f0d24d27d4a _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5 + 0x327d4a)
#4  0x00007f0d24ccb0fb _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5 + 0x2cb0fb)
#5  0x00007f0d24ad9c52 _ZN7QThread4execEv (libQt5Core.so.5 + 0xd9c52)
#6  0x00007f0d24adaeb1 n/a (libQt5Core.so.5 + 0xdaeb1)
#7  0x00007f0d23ca63ec start_thread (libc.so.6 + 0x883ec)
#8  0x00007f0d23d26a5c __clone3 (libc.so.6 + 0x108a5c)
ELF object binary architecture: AMD x86-64
Comment 4 Bug Janitor Service 2023-12-30 03:46:04 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 5 Bug Janitor Service 2024-01-14 03:45:29 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!