Bug 395381 - Crashing Xwayland locks-up kwin_wayland keyboard and mouse
Summary: Crashing Xwayland locks-up kwin_wayland keyboard and mouse
Status: CLOSED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: 5.13.0
Platform: Arch Linux Linux
Importance: NOR crash
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Duplicates: 395322
Depends on:
Blocks:
 
Reported: 2018-06-14 16:58 UTC by James
Modified: 2018-06-23 20:17 UTC (History)
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description James 2018-06-14 16:58:34 UTC
Arch Linux
 kwin 5.13.0-1
 xorg-server-xwayland 1.20.0-7
 Intel Graphics i915/i965

startplasmacompositor starts fine.  Repeated selection of Application Launcher from Panel eventually results in Xwayland crash and cursor lock-up.  Log has:

 plasmashell[922]: org.kde.plasmaquick: Applet "Application Launcher" loaded after 0 msec
 plasmashell[922]: org.kde.plasmaquick: Increasing score for "Application Launcher" to 100
 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
 systemd[1]: Started Process Core Dump (PID 1328/UID 0).
 kuiserver5[1083]: The X11 connection broke (error 1). Did the X11 server die?
...
 kdeinit5[882]: kdeinit5: sending SIGTERM to children.
 kdeinit5[882]: kdeinit5: Exit.
...
 systemd-coredump[1329]: Process 835 (Xwayland) of user 1000 dumped core.
...


Remote login and kill -9 are required to kill kwin_wayland and return a virtual terminal.

Is this an Xwayland issue?
Comment 1 James 2018-06-14 17:19:19 UTC
Also, "startplasmacompositor" has this on the virtual terminal screen:

Xwayland: ../xorg-server-1.20.0/hw/xwayland/xwayland-present.c:520: xwl_present_flips_stop: Assertion `xwl_window->present_window == window' failed.
Comment 2 Martin Flöser 2018-06-14 19:55:39 UTC
Please report to X developers.
Comment 3 James 2018-06-15 14:39:56 UTC
It is one thing for the Xwayland X server to be buggy and to crash.  It is another thing that the kwin_wayland Wayland compositor locks-up the user interface and becomes terminally unusable as a consequence of the otherwise unrelated Xwayland X server crashing.  In this respect, kwin_wayland is broken and needs to be fixed.

Additionally, there is the question of why Xwayland was not automatically restarted by kwin_wayland, when kwin_wayland is run with the "--xwayland" option.
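
For illustration only, a restart-on-crash loop of the kind the question above has in mind could look like the following C++ sketch. It is not KWin code: superviseSketch and its parameters are hypothetical names, and the sketch ignores the harder problem that X11 clients connected to a crashed server lose their connection and state regardless of whether the server is restarted.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical supervisor: runs `child` in a forked process and starts it
// again whenever it is killed by a signal (for example SIGABRT from a failed
// assertion), up to `maxRestarts` times. Returns the number of restarts.
inline int superviseSketch(const std::function<int(int attempt)> &child,
                           int maxRestarts) {
    int restarts = 0;
    for (;;) {
        const pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0) {
            // Child process: run the payload and leave without flushing
            // buffers inherited from the parent.
            _exit(child(restarts));
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) {
            return restarts;  // could not wait for the child; give up
        }
        const bool crashed = WIFSIGNALED(status);
        if (!crashed || restarts >= maxRestarts) {
            return restarts;  // clean exit, or restart budget exhausted
        }
        ++restarts;
    }
}
```

A caller would pass a function that execs or runs the supervised program; the attempt argument only exists so that an example payload can behave differently on each start.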
Comment 4 James 2018-06-15 15:32:33 UTC
Freedesktop Bugs 106930 - xwl_present_flips_stop: Assertion `xwl_window->present_window == window' failed.
https://bugs.freedesktop.org/show_bug.cgi?id=106930
Comment 5 ahc 2018-06-15 16:28:13 UTC
*** Bug 395322 has been marked as a duplicate of this bug. ***
Comment 6 ahc 2018-06-15 16:39:10 UTC
James, I'm experiencing the same issue. It takes more than 30 seconds for the Plasma desktop to appear, and once it does, my computer freezes after the Application Launcher is used twice. I've updated Xwayland to the latest git commit, but I am still experiencing the same issue. The funny thing is that Xwayland version 1.20.0 used to work with Plasma 5.12.5 before the upgrade to version 5.13.0. I think kwin_wayland might be causing the crash.
Comment 7 Martin Flöser 2018-06-15 19:17:56 UTC
There's nothing we can do about a crashing X server. We try to detect it, but if it happens during an X call KWin freezes and there's nothing we can do about it. The issue needs to be fixed in X; we cannot fix it. If you are not satisfied with the quality of Xwayland you have the option to run KWin without it.
Comment 8 James 2018-06-15 22:22:20 UTC
> if it happens during an X call KWin freezes and there's nothing we can do about it.

That's an amazingly lame excuse!  kwin makes a call to an outside application, and what?  kwin blocks and just waits for it to return?  I wish you were just making a joke, but this is not acceptable.

And, what's at stake?  Locking-up the entire user interface hardware, so that the machine is unusable without a remote login?  And that's not important enough to exert some additional thought and a few extra lines of code?

You need to ask for help, or let someone else handle this issue.
Comment 9 Martin Flöser 2018-06-16 05:08:49 UTC
Resetting to upstream. Please do not reopen. The fault is not in our software and there's nothing we can do. I investigated this issue years ago and implemented what is possible.
Comment 10 ahc 2018-06-17 09:55:24 UTC
I've been using kwin 5.13.0 with liquidshell and have had no Xwayland crash so far.
Comment 11 ahc 2018-06-21 17:53:53 UTC
Martin is right, it is a Xwayland bug, and nothing to do with kwin or plasma. I'm able to use plasma-5.13.1 without this issue by setting QT_QPA_PLATFORM to QT_QPA_PLATFORM=wayland. For the time being I've patched plasma-workspace to set QT_QPA_PLATFORM to wayland while I wait for the Xwayland bug to be fixed. Here is the patch:

--- a/shell/main.cpp	2018-06-21 19:32:15.233320641 +0200
+++ b/shell/main.cpp	2018-06-21 19:32:57.733320349 +0200
@@ -63,13 +63,8 @@
 
     QQuickWindow::setDefaultAlphaBuffer(true);
 
-    const bool qpaVariable = qEnvironmentVariableIsSet("QT_QPA_PLATFORM");
     KWorkSpace::detectPlatform(argc, argv);
     QApplication app(argc, argv);
-    if (!qpaVariable) {
-        // don't leak the env variable to processes we start
-        qunsetenv("QT_QPA_PLATFORM");
-    }
     KLocalizedString::setApplicationDomain("plasmashell");
 
     // The executable's path is added to the library/plugin paths.
--- a/krunner/main.cpp	2018-06-21 19:35:05.889986144 +0200
+++ b/krunner/main.cpp	2018-06-21 19:35:40.136652576 +0200
@@ -44,14 +44,9 @@
     qunsetenv("QT_DEVICE_PIXEL_RATIO");
     QCoreApplication::setAttribute(Qt::AA_DisableHighDpiScaling);
 
-    const bool qpaVariable = qEnvironmentVariableIsSet("QT_QPA_PLATFORM");
     KWorkSpace::detectPlatform(argc, argv);
     QQuickWindow::setDefaultAlphaBuffer(true);
     QApplication app(argc, argv);
-    if (!qpaVariable) {
-        // don't leak the env variable to processes we start
-        qunsetenv("QT_QPA_PLATFORM");
-    }
     KLocalizedString::setApplicationDomain("krunner");
 
     KQuickAddons::QtQuickSettings::init();
--- a/startkde/startplasmacompositor.cmake	2018-06-21 19:37:31.296651839 +0200
+++ b/startkde/startplasmacompositor.cmake	2018-06-21 19:38:21.243318161 +0200
@@ -218,6 +218,10 @@
 XDG_CURRENT_DESKTOP=KDE
 export XDG_CURRENT_DESKTOP
 
+#enforce wayland QPA
+QT_QPA_PLATFORM=wayland
+export QT_QPA_PLATFORM
+
 # kwin_wayland can possibly also start dbus-activated services which need env variables.
 # In that case, the update in startplasma might be too late.
 if which dbus-update-activation-environment >/dev/null 2>/dev/null ; then
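
To make the effect of the removed C++ lines easier to follow, here is a minimal standalone sketch of the pattern they implement: remember whether QT_QPA_PLATFORM was set by the user, let platform detection set it for this process, then unset it again so that child processes do not inherit it. This is not the plasma-workspace code. detectPlatformSketch is a hypothetical stand-in for KWorkSpace::detectPlatform (its behaviour here is an assumption), and reading the variable stands in for constructing QApplication.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdlib.h>  // setenv/unsetenv (POSIX)
#include <string>

// Hypothetical stand-in for KWorkSpace::detectPlatform(): if the user has not
// chosen a platform, pick one from the session type (assumed behaviour).
inline void detectPlatformSketch(const std::string &sessionType) {
    if (std::getenv("QT_QPA_PLATFORM") == nullptr) {
        const char *platform = (sessionType == "wayland") ? "wayland" : "xcb";
        setenv("QT_QPA_PLATFORM", platform, /*overwrite=*/1);
    }
}

// Same shape as the lines the patch removes. Returns the platform this
// process would use; afterwards the environment is as the user left it.
inline std::string initWithoutLeaking(const std::string &sessionType) {
    assert(!sessionType.empty());
    const bool qpaVariable = std::getenv("QT_QPA_PLATFORM") != nullptr;
    detectPlatformSketch(sessionType);

    // Stand-in for "QApplication app(argc, argv)", which reads the variable.
    const char *value = std::getenv("QT_QPA_PLATFORM");
    const std::string used = value ? value : "";

    if (!qpaVariable) {
        // don't leak the env variable to processes we start
        unsetenv("QT_QPA_PLATFORM");
    }
    return used;
}
```

With the unset step removed, as in the patch, and with QT_QPA_PLATFORM=wayland exported by the start script, processes started from the shell inherit the wayland setting instead of falling back to their default platform.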
Comment 12 James 2018-06-23 02:30:49 UTC
> Martin is right, it is a Xwayland bug, and nothing to do with kwin or plasma.
> I'm able to use plasma-5.13.1 without this issue by setting QT_QPA_PLATFORM
> to QT_QPA_PLATFORM=wayland.

Thanks for the work-around.  Still, would you please help me to understand what is going on here?

First, how does selecting a button on the Plasma Panel - presumably a Wayland-based app - have anything to do with calling-out to Xwayland, an X11 server emulator?

Am I basically misunderstanding kwin_wayland?  Is kwin_wayland actually just kwin running over Xwayland?

Second, if kwin_wayland is *not* simply kwin running over Xwayland, then why would a call to Xwayland not simply be run in a non-blocking manner?

And third, how does setting QT_QPA_PLATFORM=wayland bypass the Xwayland bug, in this particular instance, for example, with selection of the Application Launcher from the Panel?  Is the KDE Panel an X11 application?

I did find "Progress on Plasma Wayland for 5.13" by Roman Gilg, and Martin's "Unsetting QT_QPA_PLATFORM environment variable by default".  My more selfish take-away from Martin's comments would be that everyone is dealing with a work-around for a Qt bug, one that has not been resolved even in Qt 5.11.  Even then, though, that does not explain why kwin_wayland is making blocking calls to Xwayland, to say nothing of the security implications, the chance of completely locking the user interface.
Comment 13 Martin Flöser 2018-06-23 06:36:10 UTC
The QT_QPA_PLATFORM env variable has nothing to do with this issue. By having fewer applications run on X11 it's just less likely that the issue in X11 happens. It doesn't change anything. There's a bug in XWayland and we cannot work around it.

Of course we are using the non-blocking xcb library to interact with XWayland. Unfortunately xcb is not robust for these kind of issues. Normally when X dies, all applications die as well, so that xcb is still blocking is not a problem. For KWin_Wayland it is a problem, but we cannot do anything about it.
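
As a generic illustration of the distinction being discussed (waiting indefinitely for a reply versus waiting with a bound), the following C++ sketch uses plain POSIX poll() on a socket. It is not xcb or KWin code: waitForReply and WaitResult are hypothetical names, and nothing in this thread establishes whether such a bounded wait could be applied to the xcb calls in question.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum class WaitResult { Ready, TimedOut, PeerGone, Error };

// Waits at most timeoutMs milliseconds for the peer on `fd` to send data.
// A plain blocking read() would instead wait for as long as the peer stays
// silent while keeping the connection open.
inline WaitResult waitForReply(int fd, int timeoutMs) {
    assert(fd >= 0);
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    const int rc = poll(&pfd, 1, timeoutMs);
    if (rc < 0) {
        return WaitResult::Error;
    }
    if (rc == 0) {
        return WaitResult::TimedOut;  // peer is silent; caller can recover
    }
    if (pfd.revents & POLLIN) {
        // Readable means either data or end-of-file; peek to tell them apart
        // without consuming anything.
        char byte = 0;
        const ssize_t n = recv(fd, &byte, 1, MSG_PEEK);
        if (n > 0) {
            return WaitResult::Ready;
        }
        return n == 0 ? WaitResult::PeerGone : WaitResult::Error;
    }
    if (pfd.revents & (POLLHUP | POLLERR)) {
        return WaitResult::PeerGone;
    }
    return WaitResult::Error;
}
```

The caller decides what to do on TimedOut or PeerGone, for example dropping the connection and continuing to serve other clients.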
Comment 14 James 2018-06-23 14:00:32 UTC
Obviously, the presumption that "we are using the non-blocking xcb library" is a false assumption, since, in what follows, the assertion is tacitly made that, in fact, the xcb library *is* blocking.  It is a contradiction to say that "the xcb library is non-blocking" and then that "the xcb library is blocking".

Saying both that "xcb is not robust for these kind of issues" and then that "xcb is still blocking is not a problem" is also a contradiction in reasoning.

"xcb is not robust" *is* a problem.  This is not the 20th Century any more.  The world has moved along, and no matter that xcb was released in 2001.  X11 is out.  Wayland is in.  xcb needs to grow-up.

And then saying "we cannot do anything about it" is just not true.  KDE is not the bastard step child of xcb, just lucky to survive on crumbs.  Don't put-up with brain-dead problems caused by xcb.  Tell them to fix it!

Or, do not use xcb.  Only punk mechanics blame their tools for bad workmanship!  Use something that works, and stop blaming other people.

Still, none of that explains why the kwin_wayland calls to xcb are blocking.  Nor does it explain why the Plasma Panel is calling out to xcb in the first place.
Comment 15 ahc 2018-06-23 15:00:05 UTC
In my case I patched plasma-workspace (i.e. krunner) to not call Xwayland by defaulting to QT_QPA_PLATFORM=wayland, which is why I'm not experiencing any Xwayland crash. When you run startplasmacompositor, the script invokes Xwayland: "/bin/kwin_wayland --xwayland --libinput --exit-with-session=/lib/startplasma". I must point out that I don't use any Xorg server. Plasma doesn't run completely on Wayland, so Xwayland is required, I guess. For instance, I had to patch plasma-desktop to build kcm_mouse, cutting out the Xorg code. This has been a long-awaited feature for me.
Comment 16 Martin Flöser 2018-06-23 20:17:34 UTC
If you are not satisfied with the quality of Xwayland you have the option to run KWin without it.