Bug 392376

Summary: Wayland socket buffer gets filled up and application terminates when GUI thread was blocked
Product: [Plasma] kwin Reporter: Martin Kostolný <clearmartin>
Component: compositing Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED UPSTREAM    
Severity: normal CC: johan.helsing, magiblot, nate
Priority: NOR    
Version: git master   
Target Milestone: ---   
Platform: Archlinux   
OS: Linux   
URL: https://gitlab.freedesktop.org/wayland/wayland/-/issues/159
See Also: https://bugs.kde.org/show_bug.cgi?id=433218
Latest Commit: Version Fixed In:

Description Martin Kostolný 2018-03-26 20:01:01 UTC
Original bug report here: https://bugreports.qt.io/browse/QTBUG-66997

Please read the original report; it also includes a minimal application with steps to reproduce the issue. The cmake version of the minimal app is "qt-application.tar.gz".

This ticket is more of a discussion starter. Unfortunately I don't understand the specifics, so the open question is whether the issue should be fixed in the compositor (KWin), in Qt, or in both.
Comment 1 Martin Flöser 2018-03-27 04:22:01 UTC
For KWin there's nothing to do here. It's the task of the client to ensure it handles the events. KWin does not even know that the client stopped processing.
Comment 2 Johan Klokkhammer Helsing 2018-03-27 08:23:16 UTC
Maybe you can send more ping events and stop sending pointer events when a client doesn't answer? I think this is what Weston does. It would probably solve the problem in almost all cases. (I was not able to reproduce a crash on Weston)

It's currently not possible (without significant hacks) to read Wayland events without also dispatching them, so there's not much we can really do except tell application code to stop blocking the GUI thread.

From what I can tell, you're also going to have the same problem with blocking GTK clients as they handle events the same way we do.
Comment 3 Martin Flöser 2018-03-27 15:51:15 UTC
Sending all input events is IMHO a feature. The application has a chance to catch up on the events, after being unblocked and nothing is lost.

We can add more pings, sure, but how does that help? Instead of the app crashing, it gets a kill -9 from the user. We ping and provide a guitar to kill the application.

IMHO this is neither a problem with the toolkit nor with the compositor. Running blocking tasks in the main GUI thread was a bad idea 15 years ago and still is. And Qt in particular makes it extremely easy to move heavy computations out into a thread. QtConcurrent::run in combination with a QFutureWatcher eliminates all GUI freezing.
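
A minimal sketch of that pattern, under stated assumptions: it uses std::async from the C++ standard library as a stand-in for QtConcurrent::run, and polls the future with wait_for where a Qt program would instead connect to the QFutureWatcher finished() signal. The names heavy_computation, LoopResult and run_event_loop_with_worker are hypothetical and only serve the illustration.

```cpp
#include <chrono>
#include <cstdint>
#include <future>

// Hypothetical stand-in for a long-running task (e.g. parsing a large file).
std::uint64_t heavy_computation(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        sum += i * i;
    }
    return sum;
}

struct LoopResult {
    std::uint64_t value;      // result delivered by the worker thread
    unsigned events_handled;  // iterations the simulated event loop performed
};

// Runs the heavy task on a worker thread while the calling ("GUI") thread
// keeps spinning a simulated event loop instead of blocking on the result.
LoopResult run_event_loop_with_worker(std::uint64_t n) {
    std::future<std::uint64_t> result =
        std::async(std::launch::async, heavy_computation, n);

    unsigned events = 0;
    do {
        // A real GUI thread would dispatch pending Wayland events here,
        // which keeps the client side of the socket drained.
        ++events;
    } while (result.wait_for(std::chrono::milliseconds(1)) !=
             std::future_status::ready);

    return {result.get(), events};
}
```

Calling run_event_loop_with_worker(1000) returns the sum of squares from 1 to 1000 in value, while events_handled counts how often the loop got a chance to run in the meantime. The point of the design is that the calling thread never waits longer than one short poll interval.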
Comment 4 Martin Flöser 2018-03-27 15:52:06 UTC
Interesting auto completion: gui becomes guitar
Comment 5 Martin Kostolný 2018-05-13 20:54:03 UTC
Thanks for investigating! And sorry for my late response. This is more an informative update.

I've tried a few things recommended by Johan Helsing (https://bugreports.qt.io/browse/QTBUG-66997).

1) Increasing max_dgram_qlen did not seem to help
2) The proposed temporary fix in qtwayland improved the situation but did not resolve it entirely

Just for info: I'm sure the issue is happening on Weston as well.

I also agree it is the application's responsibility to stay responsive. But I fear that even when all heavy lifting is done outside the GUItar:) thread, there may still be situations where this happens. For example, I get crashes when I open a bigger text file in Kate.

It gets worse when the CPU is already under load from other hungry processes. In that situation, GUI-demanding actions such as moving the mouse up and down on the Kate minimap, which constantly generates a tooltip with a text preview, crash Kate as well. But maybe that can also be fixed by the proposed QtConcurrent::run and QFutureWatcher usage. I'm not sure.

Anyway, it seems I'm the only Wayland user hit by this issue, so we can probably wait and see whether somebody else complains. I was merely trying to raise this issue before Plasma Wayland reaches a wider audience :).
Comment 6 magiblot 2020-04-21 04:53:37 UTC
This issue can easily be hit by anyone with an HDD. Dolphin, Systemsettings, Kate, Falkon... almost every KDE application is vulnerable to this. This is my most frequent crash in Wayland sessions.

I don't know if this adds anything new to the discussion, but I tracked down the origin of the crash in libwayland.

Qt applications crash from QWaylandDisplay::checkError() after wl_display_dispatch_pending returns a negative value. So I looked into the library to see what was going on.

The error within libwayland takes place when recvmsg returns -1 with errno = 104 ("Connection reset by peer") in wl_os_recvmsg_cloexec (wayland-os.c). This result propagates through wl_connection_read (connection.c) until it is handled by read_events (wayland-client.c).

I don't know how Wayland or KWin work, so my questions might not make a lot of sense: does the above mean that the connection is reset voluntarily by KWin, or is it a consequence of the buffer filling up? Can KWin do anything to prevent the connection from breaking?
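
The "buffer filling up" half of that question can be illustrated in isolation. The sketch below is not libwayland or KWin code. It assumes a plain AF_UNIX stream socket pair in which one end plays a client that never reads, and it shows that a non-blocking sender eventually gets EAGAIN once the kernel buffer is full. What a compositor does at that point (for example, dropping the connection) is deliberately not modelled. FillResult and fill_unread_socket are hypothetical names.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct FillResult {
    std::size_t bytes_queued;  // bytes the kernel accepted before refusing more
    int last_errno;            // errno reported by the send() that failed
};

// Writes into one end of a socket pair whose other end is never read,
// imitating a sender whose peer has stopped processing events.
FillResult fill_unread_socket() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return {0, errno};
    }

    // Make the sending end non-blocking so a full buffer is reported
    // as an error instead of stalling the sender.
    int flags = fcntl(fds[0], F_GETFL, 0);
    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

    char chunk[4096] = {};
    std::size_t total = 0;
    int err = 0;
    for (;;) {
        ssize_t n = send(fds[0], chunk, sizeof chunk, MSG_NOSIGNAL);
        if (n < 0) {
            err = errno;  // expected: EAGAIN / EWOULDBLOCK once the buffer is full
            break;
        }
        total += static_cast<std::size_t>(n);
    }

    close(fds[0]);
    close(fds[1]);
    return {total, err};
}
```

On Linux, fill_unread_socket() is expected to return a non-zero bytes_queued together with EAGAIN (or EWOULDBLOCK) in last_errno. The exact byte count depends on the system's socket buffer settings, so it is not stated here.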
Comment 7 magiblot 2020-06-09 13:26:14 UTC
I guess the following comment by Pekka Paalanen from https://gitlab.freedesktop.org/wayland/wayland/-/issues/159 can be considered the opinion of Wayland developers on the issue:

> I still think the first step is to ensure the ping/pong protocol works,
> detects stalls fast enough (e.g. ping should be triggered by first input
> event since the last pong + small timeout), and actually leads to stopping
> input events in the compositor. That is relatively easy to do and should
> go a long way.
This would also make it possible to show "Unresponsive application" dialogs for Wayland clients (assuming it is not implemented yet).
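
The policy in the quote can be sketched as a small state machine. This is only an illustration of the idea, not KWin or Weston code: StallDetector and its members are hypothetical names, and timestamps are assumed to come from a monotonic millisecond clock supplied by the caller.

```cpp
#include <cstdint>

// Tracks one client. The first input event since the last pong triggers a
// ping; if no pong arrives within the timeout, further input is withheld.
class StallDetector {
public:
    explicit StallDetector(std::uint64_t timeout_ms) : timeout_ms_(timeout_ms) {}

    // Called when an input event is ready for the client at time now_ms.
    // Returns true if the event should still be sent to the client.
    bool on_input_event(std::uint64_t now_ms) {
        if (!ping_pending_) {
            // First input event since the last pong: send a ping alongside it.
            ping_pending_ = true;
            ping_sent_ms_ = now_ms;
            ++pings_sent_;
            return true;
        }
        // A ping is outstanding: keep delivering only until the timeout expires.
        return now_ms - ping_sent_ms_ <= timeout_ms_;
    }

    // Called when the client answers the outstanding ping.
    void on_pong() { ping_pending_ = false; }

    // True once a ping has gone unanswered for longer than the timeout; this
    // is where an "Unresponsive application" dialog could be shown.
    bool unresponsive(std::uint64_t now_ms) const {
        return ping_pending_ && now_ms - ping_sent_ms_ > timeout_ms_;
    }

    unsigned pings_sent() const { return pings_sent_; }

private:
    std::uint64_t timeout_ms_;
    std::uint64_t ping_sent_ms_ = 0;
    bool ping_pending_ = false;
    unsigned pings_sent_ = 0;
};
```

With a 100 ms timeout, an event at t=0 sends the first ping, an event at t=50 is still delivered, and an event at t=150 is withheld until on_pong() is called. Withholding events instead of queuing them is what keeps the socket buffer from filling while the client is stalled.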