Bug 498837 - A notification Xwayland has crashed was sometimes shown when Plasma 6.2.90 started even though Xwayland didn't appear to have crashed
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: xwayland (other bugs)
Version First Reported In: 6.2.90
Platform: Fedora RPMs Linux
Importance: NOR crash
Target Milestone: ---
Assignee: KWin default assignee
Keywords: qt6
Duplicates: 499473
Reported: 2025-01-18 07:53 UTC by Matt Fagnani
Modified: 2025-02-04 21:57 UTC

Attachments
Journal of Plasma startup with Xwayland has crashed notification and debugging enabled (289.98 KB, text/plain)
2025-01-21 23:55 UTC, Matt Fagnani

Description Matt Fagnani 2025-01-18 07:53:38 UTC
SUMMARY

I started Plasma 6.2.90 on Wayland using the Fedora Rawhide KDE live images Fedora-KDE-Desktop-Live-Rawhide-20250113.n.0.x86_64.iso and Fedora-KDE-Desktop-Live-Rawhide-20250118.n.0.x86_64.iso, both in QEMU/KVM VMs in GNOME Boxes and on bare metal. An "Xwayland has crashed" notification was sometimes shown when Plasma 6.2.90 started, even though Xwayland didn't appear to have crashed. I saw the notification several times in about 35 Plasma starts. The notification popup had KWin Window Manager as its title, so I'm assigning this to kwin. coredumpctl and the journal didn't show any crashes, and ps aux | grep Xwayland showed that Xwayland was running. This might be a race condition in which Xwayland hadn't yet started when the check behind the notification ran. I didn't see this notification with Plasma 6.2.5 or earlier.

STEPS TO REPRODUCE
1. Download the Fedora Rawhide live image Fedora-KDE-Desktop-Live-Rawhide-20250118.n.0.x86_64.iso from https://koji.fedoraproject.org/koji/buildinfo?buildID=2629884
2. In a Fedora 41 KDE installation, start GNOME Boxes
3. Boot Fedora-KDE-Desktop-Live-Rawhide-20250118.n.0.x86_64.iso in a QEMU/KVM VM in GNOME Boxes with UEFI enabled, 3D acceleration enabled, and 4 GiB RAM.
4. If the notification wasn't shown, log out of Plasma in the VM and log in to Plasma until it does. Alternatively, reboot the VM until the notification appears.

OBSERVED RESULT
An "Xwayland has crashed" notification was sometimes shown when Plasma 6.2.90 started, even though Xwayland didn't appear to have crashed.

EXPECTED RESULT
No notification about Xwayland crashing should've been shown.

SOFTWARE/OS VERSIONS

Linux/KDE Plasma: Fedora Rawhide/42
KDE Plasma Version: 6.2.90
KDE Frameworks Version: 6.10.0
Qt Version: 6.8.1

ADDITIONAL INFORMATION
Comment 1 Vlad Zahorodnii 2025-01-21 10:07:56 UTC
> started even though Xwayland didn't appear to have crashed

How did you confirm that Xwayland hasn't crashed?
Comment 2 Matt Fagnani 2025-01-21 18:00:25 UTC
coredumpctl and the journal didn't show any crashes. ps aux | grep Xwayland showed Xwayland was running.

The openQA test KDE-live-iso desktop_notifications_live@uefi failed with "Xwayland has crashed" shown in the Notifications applet at https://openqa.fedoraproject.org/tests/3151128#step/desktop_notifications/19
Comment 3 Matt Fagnani 2025-01-21 23:55:20 UTC
Created attachment 177580
Journal of Plasma startup with Xwayland has crashed notification and debugging enabled

The "Xwayland has crashed" notification appears to come from XwaylandLauncher::handleXwaylandFinished https://invent.kde.org/plasma/kwin/-/blob/master/src/xwayland/xwaylandlauncher.cpp#L270

void XwaylandLauncher::handleXwaylandFinished(int exitCode, QProcess::ExitStatus exitStatus)
{
    qCDebug(KWIN_XWL) << "Xwayland process has quit with exit status:" << exitStatus << "exit code:" << exitCode;

#if KWIN_BUILD_NOTIFICATIONS
    KNotification::event(QStringLiteral("xwaylandcrash"), i18n("Xwayland has crashed"));
#endif

In Plasma 6.2.90 in a VM, I put QT_LOGGING_RULES="*.debug=true;qt*.debug=false" in /etc/environment and restarted Plasma until the "Xwayland has crashed" notification was shown. Xwayland hit a fatal server error, "Cannot write display number to fd 72", kwin failed to establish the XCB connection, and Xwayland quit with exit code 1.

Jan 21 23:22:47 kwin_wayland_wrapper[9053]: (EE)
Jan 21 23:22:47 kwin_wayland_wrapper[9053]: Fatal server error:
Jan 21 23:22:47 kwin_wayland_wrapper[9053]: (EE) Cannot write display number to fd 72
Jan 21 23:22:47 kwin_wayland_wrapper[9053]: (EE)
Jan 21 23:22:47 kwin_wayland[8992]: kwin_xwl: Failed to establish the XCB connection (error 1)
Jan 21 23:22:47 kwin_wayland[8992]: kwin_xwl: Xwayland process has quit with exit status: QProcess::NormalExit exit code: 1

Xwayland didn't create a core dump. I'm attaching the journal from the Plasma session with the Xwayland has crashed notification.
Comment 5 Adam Williamson 2025-02-03 18:23:30 UTC
*** Bug 499473 has been marked as a duplicate of this bug. ***
Comment 6 Adam Williamson 2025-02-03 18:24:43 UTC
Filed https://bugzilla.redhat.com/show_bug.cgi?id=2343580 to track this downstream (and propose as a release blocker, as we have a rule about no crash notifications).
Comment 7 Adam Williamson 2025-02-03 18:41:56 UTC
Notes on the notification itself: even if it's true that XWayland exited abnormally, this seems like a very poor notification. What is a user supposed to *do* with this - and only this - information? Some users won't know what XWayland is. Even those who do may not know what it exiting abnormally means for them in practical terms. On my test system, when reproducing this issue and then checking running processes, there *is* a running XWayland process, which seems to imply that maybe it is automatically restarted and that restart works fine - but the notification doesn't convey that information. Think about, for instance, someone who's using Bazzite because they learned about it on Youtube, they go to desktop mode, they see this notification: that's not a great experience!

If the notification:
i) appeared after we'd given up restarting xwayland(? assuming that's what happens)
ii) didn't lie and say it had crashed when it hadn't
iii) told me, or linked to somewhere that told me, what xwayland not running means in practical terms and what I might be able to do about it

...then it'd be a lot more useful and a better experience.
Comment 8 Adam Williamson 2025-02-03 20:54:55 UTC
I think there's definitely an underlying bug here with the "Cannot write display number" messages. In Fedora openQA testing we're also getting frequent failures of the user switch test. I tried that out manually too, and saw the same issue - when I switch users, it takes a long time for the second user's session to start (so long that openQA gives up, but in local testing I found it *does* come up after a while). In the logs I see that error four times, then a message from kwin saying it's giving up on Xwayland after four tries - and indeed the second user's session has no Xwayland process. And, of course, sometimes I get at least some of those errors just logging in as the first user (which is when we see this notification).

I tested an F41 install for comparison. There, on two attempts, user switching worked flawlessly, both sessions have an Xwayland process, and there were zero "Cannot write display number" errors in either user session on either attempt. So we should probably dig further into what's going on there, and how it's changed since F41.
Comment 9 Vlad Zahorodnii 2025-02-03 21:01:10 UTC
(In reply to Adam Williamson from comment #7)
> Notes on the notification itself: even if it's true that XWayland exited
> abnormally, this seems like a very poor notification. What is a user
> supposed to *do* with this - and only this - information? Some users won't

The goal is to give the user some feedback on why some of their open applications disappeared.

Either way, after a painful debugging session, I think I know what regressed. I'll make an MR tomorrow.
Comment 10 Adam Williamson 2025-02-03 21:14:04 UTC
oh, awesome, I'll look forward to that with interest, then.
Comment 11 Vlad Zahorodnii 2025-02-04 09:17:43 UTC
Git commit 60275f4dbe1a7648b855de3fa76b2f6b378d086b by Vlad Zahorodnii.
Committed on 04/02/2025 at 09:03.
Pushed by vladz into branch 'master'.

xwayland: Keep ready fd until Xwayland is stopped

Xwayland writes the display name in two steps:

- first write the display name
- then write "\n"

Xwayland will quit if either write() fails. If the XwaylandLauncher
closes its endpoint of the displayfd after the first write(), then the
second write() may fail.

Given that we just want to know when Xwayland is ready, the ready fd
can be kept around until Xwayland is stopped.

The issue was introduced by 8e42599149bbd6e4e76baddd04b1c403131354ba
(kind of). Prior to that commit, kwin had been leaking the read endpoint
of the displayfd to Xwayland. So, even if kwin closes its endpoint,
Xwayland has it too so both write()s will succeed.

M  +7    -8    src/xwayland/xwaylandlauncher.cpp
M  +0    -2    src/xwayland/xwaylandlauncher.h

https://invent.kde.org/plasma/kwin/-/commit/60275f4dbe1a7648b855de3fa76b2f6b378d086b
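
The failure mode described in the commit message can be reproduced outside kwin with an ordinary pipe. The following is only a minimal sketch, assuming the displayfd behaves like a plain POSIX pipe; it is not kwin or Xwayland code. The names TwoStepResult and simulateDisplayFdHandshake are hypothetical, and both sides run in one process so that the ordering is deterministic rather than racy.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Outcome of the two-step "display name, then newline" write sequence.
struct TwoStepResult {
    bool firstWriteOk;
    bool secondWriteOk;
    int secondErrno; // errno from the second write(), 0 on success
};

// Hypothetical helper: plays the writer (Xwayland's role) and the reader
// (the launcher's role) on one pipe. With keepReaderOpen == false, the
// reader closes its end as soon as the first chunk has arrived.
TwoStepResult simulateDisplayFdHandshake(bool keepReaderOpen)
{
    // Process-wide: without this, writing to a pipe that has no readers
    // terminates the process with SIGPIPE instead of returning EPIPE.
    std::signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (pipe(fds) != 0) {
        return {false, false, errno};
    }
    const int readFd = fds[0];
    const int writeFd = fds[1];

    TwoStepResult result{false, false, 0};

    // Step 1: the writer sends the display number.
    result.firstWriteOk = write(writeFd, "1", 1) == 1;

    // The reader sees data and treats that as "ready".
    char buf[8];
    const ssize_t got = read(readFd, buf, sizeof(buf));
    (void)got;
    if (!keepReaderOpen) {
        close(readFd); // last read end is gone from here on
    }

    // Step 2: the writer sends the trailing newline.
    result.secondWriteOk = write(writeFd, "\n", 1) == 1;
    result.secondErrno = result.secondWriteOk ? 0 : errno;

    if (keepReaderOpen) {
        close(readFd);
    }
    close(writeFd);
    return result;
}
```

Calling simulateDisplayFdHandshake(false) should report a failed second write with EPIPE, while simulateDisplayFdHandshake(true) should report both writes succeeding. The second case stands in both for the fix (the launcher keeps the fd until Xwayland stops) and for the older behaviour, where a leaked copy of the read end kept the pipe readable.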
Comment 12 Vlad Zahorodnii 2025-02-04 10:53:10 UTC
Git commit 8502be80d0c6f8b3e081efb8390b9793b43cca41 by Vlad Zahorodnii.
Committed on 04/02/2025 at 09:35.
Pushed by vladz into branch 'Plasma/6.3'.

xwayland: Keep ready fd until Xwayland is stopped

Xwayland writes the display name in two steps:

- first write the display name
- then write "\n"

Xwayland will quit if either write() fails. If the XwaylandLauncher
closes its endpoint of the displayfd after the first write(), then the
second write() may fail.

Given that we just want to know when Xwayland is ready, the ready fd
can be kept around until Xwayland is stopped.

The issue was introduced by 8e42599149bbd6e4e76baddd04b1c403131354ba
(kind of). Prior to that commit, kwin had been leaking the read endpoint
of the displayfd to Xwayland. So, even if kwin closes its endpoint,
Xwayland has it too so both write()s will succeed.


(cherry picked from commit 60275f4dbe1a7648b855de3fa76b2f6b378d086b)

Co-authored-by: Vlad Zahorodnii <vlad.zahorodnii@kde.org>

M  +7    -8    src/xwayland/xwaylandlauncher.cpp
M  +0    -2    src/xwayland/xwaylandlauncher.h

https://invent.kde.org/plasma/kwin/-/commit/8502be80d0c6f8b3e081efb8390b9793b43cca41
Comment 13 Adam Williamson 2025-02-04 21:57:30 UTC
Cool, thanks! I'll get that backported to Fedora and see if the tests clear up. I still think the notification could be improved, but that looks good.