Bug 396066

Summary: Wayland session is coring right after login [amdgpu/DisplayPort]
Product: [Plasma] kwin Reporter: Shmerl <shtetldik>
Component: wayland-generic Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED UPSTREAM    
Severity: normal CC: kde, mustafa1024m, simonandric5, subdiff
Priority: NOR    
Version: 5.13.1   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   

Description Shmerl 2018-07-01 21:24:10 UTC
I just tried the Wayland session with KDE Plasma 5.13.1, and it dumps core right after login and falls back to sddm.

That's what I see in dmesg:

[  176.359816] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956
[  176.814144] QThread[2620]: segfault at f ip 00007f30f30b0e60 sp 00007f30e506e4f0 error 4 in libwayland-client.so.0.3.0[7f30f30a9000+d000]
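
A minimal sketch of how the segfault line above can be decoded, assuming the usual x86 kernel format (`segfault at ADDR ip IP sp SP error CODE in OBJECT[BASE+SIZE]`). The function name `decode_segfault` is made up for illustration. The computed offset is relative to the start of the mapping, which is not necessarily the file offset of the library.

```python
import re

# Pattern for the kernel's x86 user-space segfault line, as quoted above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: split a dmesg segfault line into its parts."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Instruction pointer relative to the start of the mapping.
        "offset_in_mapping": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),  # 0 = page not present
        "write": bool(err & 2),         # 0 = read access
        "user_mode": bool(err & 4),     # 1 = fault happened in user mode
    }

line = ("[  176.814144] QThread[2620]: segfault at f ip 00007f30f30b0e60 "
        "sp 00007f30e506e4f0 error 4 in libwayland-client.so.0.3.0[7f30f30a9000+d000]")
info = decode_segfault(line)
print(info["object"], hex(info["offset_in_mapping"]), hex(info["fault_address"]))
```

Read this way, the line describes a user-mode read of the unmapped address 0xf, at offset 0x7e60 into the libwayland-client mapping.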

Is it a Mesa problem or something with KWin?

Current Debian testing x86_64, Plasma 5.13.1 pulled from unstable. GPU: AMD Vega 56, connected over DisplayPort.

OpenGL renderer string: Radeon RX Vega (VEGA10, DRM 3.25.0, 4.17.0-trunk-amd64, LLVM 6.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.1.2
Comment 1 Martin Flöser 2018-07-02 04:16:37 UTC
Please try to get a backtrace from KWin.
Comment 2 Shmerl 2018-07-02 04:21:34 UTC
(In reply to Martin Flöser from comment #1)
> Please try to get a backtrace from KWin.

What is a good way to attach a debugger to it before it crashes?
Comment 3 Shmerl 2018-07-02 22:17:02 UTC
I wrote a script that intercepts the kwin_wayland launch and attaches gdb to it. But when I continue in gdb, it simply exits rather than segfaulting.

Where can I find KWin logs?

That's what I see in: $HOME/.local/share/sddm/wayland-session.log

startplasmacompositor: Starting up...
dbus-daemon[1791]: [session uid=1000 pid=1791] Activating service name='org.freedesktop.systemd1' requested by ':1.1' (uid=1000 pid=1893 comm="dbus-update-activation-environment --systemd --all")
dbus-daemon[1791]: [session uid=1000 pid=1791] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
dbus-update-activation-environment: warning: error sending to systemd: org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
No backend specified through command line argument, trying auto resolution
FATAL ERROR: backend failed to initialize, exiting now
startplasmacompositor: Shutting down...
startplasmacompositor: Done.
Comment 4 Martin Flöser 2018-07-03 04:25:08 UTC
The log says that the platform plugin failed to initialize. Could it be that systemd/logind don't function correctly?
Comment 5 Shmerl 2018-07-03 04:30:17 UTC
It functions OK for the X11 session at least. And that Wayland client core did appear in dmesg before.
Comment 6 Shmerl 2018-07-04 23:28:03 UTC
I tried to start Plasma session manually from tty, and got this:

startplasmacompositor: Starting up...
dbus-update-activation-environment: warning: error sending to systemd: org.freedesktop.DBus.Error.InvalidArgs: Invalid environment assignments
No backend specified through command line argument, trying auto resolution
FATAL ERROR: backend failed to initialize, exiting now
startplasmacompositor: Shutting down...
startplasmacompositor: Done.
Comment 7 David Edmundson 2018-07-04 23:35:24 UTC
try dbus-launch startplasmacompositor instead
Comment 8 Shmerl 2018-07-04 23:46:50 UTC
startplasmacompositor: Starting up...
dbus-update-activation-environment: warning: error sending to systemd: org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
No backend specified through command line argument, trying auto resolution
FATAL ERROR: backend failed to initialize, exiting now
Segmentation fault
startplasmacompositor: Shutting down...
startplasmacompositor: Done.

I tried it a few times - that segfault doesn't happen every time, and I don't see any cores.
Comment 9 Shmerl 2018-07-05 00:47:57 UTC
Is there some way to force core generation? I'm not sure why it's not created, ulimit is set to unlimited.
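
A rough sketch for checking the two settings that usually decide whether a core file appears: the `RLIMIT_CORE` limit of the process and the kernel's `core_pattern`. The helper names are made up for illustration. Note that the limit is inherited per process, so `ulimit` in an interactive shell does not necessarily reflect the limit of a session started by the display manager; the check would have to run inside that session.

```python
import resource

def describe_core_pattern(pattern):
    """Hypothetical helper: explain where a given core_pattern sends a dump."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading pipe means the kernel feeds the dump to a helper program
        # instead of writing a file.
        parts = pattern[1:].split()
        return "piped to helper: " + (parts[0] if parts else "")
    if "/" in pattern:
        return "written to path template: " + pattern
    return "written to the crashing process's working directory as: " + pattern

def core_settings():
    """Return (soft limit, hard limit, description) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        with open("/proc/sys/kernel/core_pattern") as f:
            where = describe_core_pattern(f.read())
    except OSError:
        where = "core_pattern not readable"
    return soft, hard, where

if __name__ == "__main__":
    soft, hard, where = core_settings()
    unlimited = resource.RLIM_INFINITY
    print("soft limit:", "unlimited" if soft == unlimited else soft)
    print("hard limit:", "unlimited" if hard == unlimited else hard)
    print("core dumps are", where)
```

If the pattern is piped to a helper, no `core` file is expected in the working directory even with an unlimited limit; the dump is handed to that helper instead.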
Comment 10 Shmerl 2018-07-05 00:51:16 UTC
I also copied startplasmacompositor and edited kwin launch there to use:

gdb -ex run --args /usr/bin/kwin_wayland --xwayland --libinput --exit-with-session=/usr/lib/x86_64-linux-gnu/libexec/startplasma

It hung the system completely for me.
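
One possible alternative, sketched under the assumption that the hang came from gdb waiting at an interactive prompt on a terminal the compositor had taken over (this is a guess, not confirmed anywhere in this report): run gdb in batch mode and send the backtrace to a log file. The log path and the helper name are made up; the kwin_wayland arguments are the ones from the command above.

```python
import shlex

def gdb_batch_command(program_argv, log_path):
    """Hypothetical helper: build an unattended gdb command line."""
    return [
        "gdb", "-batch",                         # exit after the listed commands
        "-ex", "set logging file " + log_path,   # where gdb output is copied
        "-ex", "set logging on",
        "-ex", "run",                            # start the program
        "-ex", "thread apply all bt",            # runs once the program stops
        "--args", *program_argv,
    ]

kwin_argv = [
    "/usr/bin/kwin_wayland", "--xwayland", "--libinput",
    "--exit-with-session=/usr/lib/x86_64-linux-gnu/libexec/startplasma",
]
cmd = gdb_batch_command(kwin_argv, "/tmp/kwin_wayland-gdb.log")  # path is an example
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into the copied startplasmacompositor script in place of the plain kwin_wayland call. If the program exits normally instead of crashing, the backtrace command simply reports that there is no stack.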
Comment 11 Martin Flöser 2018-07-05 04:23:59 UTC
According to the log, KWin is not segfaulting but exiting because the DRM backend is not working.
Comment 12 Shmerl 2018-07-05 04:28:09 UTC
So what exactly can be segfaulting using QThread and libwayland-client.so.0.3.0?
Comment 13 Shmerl 2018-07-10 02:23:36 UTC
It could be a bug in the amdgpu driver. I'd appreciate any other suggestions on how to debug it.
Comment 14 Shmerl 2018-07-13 14:27:20 UTC
Corresponding amdgpu/dri bug: https://bugs.freedesktop.org/show_bug.cgi?id=107213
Comment 15 Shmerl 2018-09-14 13:10:04 UTC
I managed to make it produce a core. It's from kwin_wayland. After installing needed debug symbol packages, here is a backtrace:

Core was generated by `/usr/bin/kwin_wayland --xwayland --libinput --exit-with-session=/usr/lib/x86_64'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007eff59760f30 in wl_closure_init (message=message@entry=0x7, size=size@entry=52, num_arrays=num_arrays@entry=0x7eff5140858c, args=args@entry=0x0) at ../src/connection.c:562
562     ../src/connection.c: No such file or directory.
[Current thread is 1 (Thread 0x7eff51409700 (LWP 7249))]
(gdb) bt
#0  0x00007eff59760f30 in wl_closure_init (message=message@entry=0x7, size=size@entry=52, num_arrays=num_arrays@entry=0x7eff5140858c, args=args@entry=0x0) at ../src/connection.c:562
#1  0x00007eff59761aa0 in wl_connection_demarshal (connection=0x7eff440053e0, size=size@entry=52, objects=objects@entry=0x7eff440052e8, message=0x7) at ../src/connection.c:698
#2  0x00007eff5975fae8 in queue_event (len=52, display=0x7eff44005270) at ../src/wayland-client.c:1364
#3  read_events (display=0x7eff44005270) at ../src/wayland-client.c:1466
#4  wl_display_read_events (display=display@entry=0x7eff44005270) at ../src/wayland-client.c:1549
#5  0x00007eff59760169 in wl_display_dispatch_queue (display=0x7eff44005270, queue=0x7eff44005338) at ../src/wayland-client.c:1788
#6  0x00007eff5d123933 in KWayland::Client::ConnectionThread::Private::<lambda()>::operator() (__closure=0x7eff44009550) at ./src/client/connection_thread.cpp:129
#7  QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, KWayland::Client::ConnectionThread::Private::setupSocketNotifier()::<lambda()> >::call (arg=<optimized out>, 
    f=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:128
#8  QtPrivate::Functor<KWayland::Client::ConnectionThread::Private::setupSocketNotifier()::<lambda()>, 0>::call<QtPrivate::List<>, void> (arg=<optimized out>, f=...)
    at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:238
#9  QtPrivate::QFunctorSlotObject<KWayland::Client::ConnectionThread::Private::setupSocketNotifier()::<lambda()>, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=<optimized out>, this_=0x7eff44009540, r=<optimized out>, a=<optimized out>, ret=<optimized out>)
    at /usr/include/x86_64-linux-gnu/qt5/QtCore/qobjectdefs_impl.h:421
#10 0x00007eff5e606910 in QtPrivate::QSlotObjectBase::call (a=0x7eff514087d0, r=0x564a80be84f0, this=0x7eff44009540) at ../../include/QtCore/../../src/corelib/kernel/qobjectdefs_impl.h:376
#11 QMetaObject::activate(QObject*, int, int, void**) () at kernel/qobject.cpp:3754
#12 0x00007eff5e606dd7 in QMetaObject::activate (sender=sender@entry=0x7eff44009440, m=m@entry=0x7eff5e863c60 <QSocketNotifier::staticMetaObject>, 
    local_signal_index=local_signal_index@entry=0, argv=argv@entry=0x7eff514087d0) at kernel/qobject.cpp:3633
#13 0x00007eff5e611ff9 in QSocketNotifier::activated (this=this@entry=0x7eff44009440, _t1=<optimized out>, _t2=...) at .moc/moc_qsocketnotifier.cpp:136
#14 0x00007eff5e612341 in QSocketNotifier::event (this=0x7eff44009440, e=0x7eff51408a30) at kernel/qsocketnotifier.cpp:266
#15 0x00007eff5e9cb4a1 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#16 0x00007eff5e9d2ae0 in QApplication::notify(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#17 0x00007eff5e5dd579 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at ../../include/QtCore/5.11.1/QtCore/private/../../../../../src/corelib/thread/qthread_p.h:307
#18 0x00007eff5e62fe4a in QCoreApplication::sendEvent (event=0x7eff51408a30, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:234
#19 socketNotifierSourceDispatch(_GSource*, int (*)(void*), void*) () at kernel/qeventdispatcher_glib.cpp:106
#20 0x00007eff5a647287 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007eff5a6474c0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x00007eff5a64754c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007eff5e62f223 in QEventDispatcherGlib::processEvents (this=0x7eff44000b20, flags=...) at kernel/qeventdispatcher_glib.cpp:423
#24 0x00007eff5e5dc24b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at ../../include/QtCore/../../src/corelib/global/qflags.h:140
#25 0x00007eff5e42b176 in QThread::exec() () at ../../include/QtCore/../../src/corelib/global/qflags.h:120
#26 0x00007eff5e434d47 in QThreadPrivate::start(void*) () at thread/qthread_unix.cpp:367
#27 0x00007eff5efb5f2a in start_thread (arg=0x7eff51409700) at pthread_create.c:463
#28 0x00007eff5e0fdedf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
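
A small worked check of the addresses, under two assumptions that the backtrace alone does not prove: that `struct wl_message` is laid out as `{ name, signature, types }` with 8-byte pointers on x86_64, and that `wl_closure_init` reads `message->signature` first. With `message=0x7` as shown in frames #0 and #1, that read would touch address 0xf, which is the fault address in the dmesg line from the original description (a different run, so this is only a consistency check).

```python
# Assumption: x86_64, so every pointer field is 8 bytes wide.
POINTER_SIZE = 8

# Assumed field order of struct wl_message; all three fields are pointers.
WL_MESSAGE_FIELDS = ["name", "signature", "types"]

def field_address(struct_ptr, field):
    """Address of `field` inside a wl_message located at `struct_ptr`."""
    return struct_ptr + WL_MESSAGE_FIELDS.index(field) * POINTER_SIZE

message_ptr = 0x7  # value of `message` in frames #0 and #1
print(hex(field_address(message_ptr, "signature")))  # prints 0xf
```

A `message` pointer of 0x7 is not a plausible heap or data address, so the bogus value was most likely already present when `queue_event` looked up the message for the incoming event.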
Comment 16 Shmerl 2018-09-14 15:10:09 UTC
Corresponding wayland-client bug: https://gitlab.freedesktop.org/wayland/wayland/issues/56
Comment 17 Roman Gilg 2018-09-14 23:58:18 UTC
This is likely an upstream bug. If they say otherwise, please reopen.
Comment 18 Shmerl 2018-09-26 03:19:51 UTC
It can be a downstream issue with Debian specifically (since according to Mesa developers, it doesn't happen on Arch). So I also opened a Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=909636
Comment 19 Mustafa Muhammad 2018-10-10 08:06:00 UTC
(In reply to Shmerl from comment #18)
> It can be a downstream issue with Debian specifically (since according to
> Mesa developers, it doesn't happen on Arch). So I also opened a Debian bug:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=909636

I'm not a developer, but I suggest booting the latest Neon ISO to check whether it works on your hardware. If it does, it is an upstream or distribution issue.
Comment 20 Shmerl 2018-10-11 16:44:01 UTC
It looks like it's related to https://bugs.freedesktop.org/show_bug.cgi?id=107978

My monitor (Dell U2413) has a setting for toggling DisplayPort 1.2. When I disable it, Wayland Plasma session isn't crashing anymore and is logging in properly!
Comment 21 Martin Flöser 2018-10-11 17:01:25 UTC
Great that you finally found the reason.
Comment 22 Shmerl 2018-12-07 06:34:40 UTC
It's fixed now for amdgpu in Linux kernel master, and the fix should be available in the 4.20 release.
Comment 23 Shmerl 2018-12-09 18:04:44 UTC
It is still a KWin bug though, since the session shouldn't be crashing when there are no outputs found (which is the case here for a split moment caused by the amdgpu bug above).
Comment 24 Martin Flöser 2018-12-09 18:53:00 UTC
(In reply to Shmerl from comment #23)
> It is still a KWin bug though, since the session shouldn't be crashing when
> there are no outputs found (which is the case here for a split moment caused
> by amdgpu bug above).

No, this is working as intended. KWin exits gracefully as on the DRM platform we require screens to be present. KWin cannot start up without screens.
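
For anyone wanting to observe the "no outputs" state from outside KWin, here is a rough sketch that lists the DRM connectors the kernel reports as connected via sysfs. This is not how KWin itself enumerates outputs (it talks to the DRM API directly); the function name is made up, and the connector directory naming (`card0-DP-1` and similar) is assumed to follow the usual `/sys/class/drm` layout.

```python
import glob
import os

def connected_connectors(drm_root="/sys/class/drm"):
    """Return names of DRM connectors whose sysfs status file reads 'connected'."""
    found = []
    pattern = os.path.join(drm_root, "card*-*", "status")
    for status_path in sorted(glob.glob(pattern)):
        try:
            with open(status_path) as f:
                status = f.read().strip()
        except OSError:
            continue  # connector vanished or is unreadable; skip it
        if status == "connected":
            found.append(os.path.basename(os.path.dirname(status_path)))
    return found

if __name__ == "__main__":
    names = connected_connectors()
    print(names if names else "no connected outputs")
```

Run in a loop from a tty while toggling the monitor's DisplayPort 1.2 setting, this could show whether the connector briefly drops to disconnected.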