Bug 442846

Summary: Blocking calls to Xwayland can make kwin freeze
Product: [Plasma] kwin Reporter: p d <pizzadude>
Component: wayland-generic Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: auxsvr, chermnykh2001, dev+kde, fanzhuyifan, fincer89, fleury.corentin, heri+kde, imilarsky, jay, kde, kdebugreport, kdedev, kode54, leftcrane, m, mosaulp, nate, nathaniel.graham, nicolas.fella, philipp.reichmuth, postix, sjurberengal+kde, team, wengxt, xaver.hugl, ximwix, zwjmazza
Priority: HI Keywords: wayland
Version: 5.22.5   
Target Milestone: ---   
Platform: Fedora RPMs   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=449948
Latest Commit: Version Fixed In: 6.2.3
Sentry Crash Report:
Attachments: kwin_wayland backtrace
journalctl log
kwin_wayland backtrace
Xwayland backtrace

Description p d 2021-09-23 15:38:47 UTC
SUMMARY
I use Fedora 34 KDE on a ThinkPad T580 with an i5-7300U CPU and Intel HD Graphics 620. When I lock the screen and leave the laptop for a long period of time (a few hours), there is a 50% chance that, when I come back and unlock the screen, my whole system is frozen. This only happens on Wayland. While the system is frozen, I can't use Ctrl+Alt+F[number] to switch to a TTY either, and none of the keys I press get a response. If I'm lucky and spam a bunch of keys, the system unfreezes, but most of the time I have to force a power-off.

The problem doesn't happen if I don't leave the system on the lock screen for a long period of time, or if I close the laptop lid and let it sleep instead of keeping it open.

STEPS TO REPRODUCE
1. On Wayland, leave a laptop with Intel HD Graphics on the lock screen for a long period of time
2. Come back to computer and unlock the screen
3. Full system freeze

OBSERVED RESULT

System freeze upon unlock after idle for a few hours


EXPECTED RESULT

System not to freeze

SOFTWARE/OS VERSIONS

Operating System: Fedora 34
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.2
Kernel Version: 5.13.19-200.fc34.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 4 × Intel® Core™ i5-7300U CPU @ 2.60GHz
Memory: 7.6 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 620

ADDITIONAL INFORMATION
I use BTRFS.
Comment 1 Zamundaaa 2021-09-24 06:29:14 UTC
Can you still log in remotely via ssh?
Comment 2 p d 2021-09-25 21:41:37 UTC
I will let you know if I can login via ssh next time I encounter the issue.
Comment 3 Zachary Mazza 2021-10-20 18:59:52 UTC
Commenting to confirm that I also experience this issue. 

When I encounter this issue, if I wait long enough (about 10 minutes), my desktop usually starts responding again. I assume this has to do with how long I leave my computer idle, as the few times I've been forced to restart it were when it had been idle for the entire night, rather than for less than an hour.

Basically, the less time my computer spends idling with a locked screen, the faster (and more likely) it will stop being frozen after I log back in.
Comment 4 p d 2021-10-22 10:49:38 UTC
I haven't had this issue in a while.

Sometimes I launch a separate Wayland session (Ctrl + Alt + F<number>, log in as a different user, run startplasma-wayland), then switch back. I wonder if that has something to do with the issue.
Comment 5 Vlad Zahorodnii 2021-10-27 07:57:29 UTC
(In reply to p d from comment #4)
> I haven't had this issue in a while.
> 
> Sometimes I launch a separate wayland session via (Ctrl + Alt + F<number>
> and log in as different user, startplasma-wayland), then switch back. I
> wonder if that has something to do with the issue.

Maybe. It would be helpful to see kwin_wayland's backtrace while the session is frozen.
Comment 6 Zachary Mazza 2021-11-05 23:42:38 UTC
Created attachment 143264 [details]
kwin_wayland backtrace

I tried creating a backtrace for kwin_wayland. Let me know if I did anything wrong or if it's not helpful.

I should've probably mentioned earlier that I am experiencing this issue on a dedicated AMD GPU (AMD Radeon RX 5700), so this is not an Intel-only issue.
Comment 7 Berengal 2021-11-09 11:44:41 UTC
I have the same issue on Arch. I don't need the system to be locked for very long either. Sometimes it stays frozen until I restart, but sometimes it only stays frozen for a few seconds.

I enabled sshd before the last time it happened and managed to log in during the event. The ssh session worked fine; only the graphics were frozen. In top I saw that Xwayland was using 100% CPU. I tried to kill it, but it didn't respond. Killing it with kill -9 didn't restore the system; it was still frozen. journalctl and dmesg didn't print anything suspicious, and I couldn't see anything else odd in top either. I only had a phone available to ssh with, so poking around in a terminal was painful. Restarting sddm worked and I got the login screen, but after logging into the Plasma Wayland session I only got a black screen with no output, and I had to restart the PC to get it working properly again.

Operating System: Arch Linux
KDE Plasma Version: 5.23.1
KDE Frameworks Version: 5.87.0
Qt Version: 5.15.2
Kernel Version: 5.14.16-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen 9 5900X 12-Core Processor
Memory: 62,7 GiB of RAM
Graphics Processor: AMD Radeon RX 6900 XT
Comment 8 Manuel de la Fuente 2021-11-12 23:38:44 UTC
I can reproduce this using Ryzen + Radeon. It hard-locks the computer: you can't switch to another TTY and can't use REISUB either. You can only hard reset, and that seemingly breaks both X and SDDM, so you're stuck with startplasma-wayland. It still happens even after doing a rollback with Snapper.

Even with the automatic lock screen (after $i++ minutes) disabled, the computer actually still freezes. plasmashell dies and the background goes black; in my case, QtWidgets apps switch between the regular font and a smaller bitmap one every second. You can only move the cursor: the cursor shape still changes, and UI elements in the apps themselves react to hover after a bit of a delay, but you can't interact with anything. TTY switching and REISUB still don't work either, for some reason.

Operating System: openSUSE Tumbleweed
CPU: AMD Ryzen 9 5900X
GPU: AMD Radeon RX6700XT
RAM: 31.6 GiB

Sorry for not adding more information about the software versions; I'll add it tomorrow in case it's necessary, but it's the most recent stable release since this is Tumbleweed.
Comment 9 Zachary Mazza 2021-11-20 01:14:03 UTC
I believe this issue has something to do with Xwayland and/or Firefox running under Xwayland: after I set Firefox to run on Wayland by default, I stopped experiencing the freezes entirely.
Comment 10 p d 2021-11-22 12:11:02 UTC
That may be the case: in the past few months I also set Firefox to run natively on Wayland and haven't experienced the crash since.
Comment 11 Nate Graham 2021-11-22 14:43:35 UTC
Interesting info, thanks.

FWIW I also use Firefox in native wayland mode and can't reproduce the issue.
Comment 12 Corentin Fleury 2021-11-28 08:55:14 UTC
Created attachment 144024 [details]
journalctl log

I don't think it's related to Firefox since I was able to reproduce this by just logging in and staying idle for 10 minutes. Here's the journalctl log.
Comment 13 Elias 2021-12-02 18:35:47 UTC
I can reproduce this bug, but I think the cause has nothing to do with the screen locker itself. The freeze happens at the moment the monitor is sent to sleep (after 10 minutes by default). So if you set this to 1 minute and deactivate screen locking, the bug happens without a lock screen involved. Everything freezes, but the mouse cursor can still be moved.

The strange thing is that it does not happen on all monitors:
- I can reproduce this problem on a DELL S2421HN connected via HDMI to an AMD graphics card. 
- On an older DELL model everything is fine, as it is on a very old Fujitsu screen connected via DVI.

Workaround:
De-activate automatic monitor power saving.
Comment 14 kdebugreport 2022-01-20 00:17:12 UTC
I believe this is also happening to me. I don't lock the screen, so I get freezes when I wake my monitors from standby. I notice that Xwayland maxes out one thread when this happens, even on the latest beta.

I'd love to help solve this; is there any diagnostic info I can provide? I'm using a 6900 XT with two high-refresh-rate monitors connected via DP.
Comment 15 Vlad Zahorodnii 2022-01-20 13:48:31 UTC
(In reply to Zachary Mazza from comment #6)
> Created attachment 143264 [details]
> kwin_wayland backtrace
> 
> I tried creating a backtrace for kwin_wayland. Let me know if I did anything
> wrong or if it's not helpful.
> 
> I should've probably mentioned earlier that I am experiencing this issue on
> a dedicated AMD GPU (AMD Radeon RX 5700), so this is not an Intel only issue.

Interesting...

Thread 1 (Thread 0x7fde9dff1980 (LWP 1805) "kwin_wayland"):
#0  0x00007fdea33e1cdf in __GI___poll (fds=0x7ffc04a87168, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fdea4545c1a in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007fdea4547d0f in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007fdea4547e25 in xcb_wait_for_reply () from /lib/x86_64-linux-gnu/libxcb.so.1
#4  0x00007fdea4f47803 in ?? () from /lib/x86_64-linux-gnu/libKF5WindowSystem.so.5
#5  0x00007fdea4f4994d in NETWinInfo::update(QFlags<NET::Property>, QFlags<NET::Property2>) () from /lib/x86_64-linux-gnu/libKF5WindowSystem.so.5
#6  0x00007fdea4f4b810 in NETWinInfo::event(xcb_generic_event_t*, QFlags<NET::Property>*, QFlags<NET::Property2>*) () from /lib/x86_64-linux-gnu/libKF5WindowSystem.so.5
#7  0x00007fdea53a7d97 in KWin::X11Client::windowEvent(xcb_generic_event_t*) () from /lib/x86_64-linux-gnu/libkwin.so.5
#8  0x00007fdea53a95e2 in KWin::Workspace::workspaceEvent(xcb_generic_event_t*) () from /lib/x86_64-linux-gnu/libkwin.so.5

It appears that kwin_wayland is waiting for a reply from Xwayland that never arrives. Ideally, kwin shouldn't make any blocking calls to Xwayland, but that's not doable at the moment.
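Frame #0 shows `poll()` called with `timeout=-1`, so the wait for Xwayland's reply has no upper bound. A minimal C sketch of the difference a finite timeout makes, assuming a Unix-domain socket pair stands in for the X connection; `wait_readable_bounded`, `demo_silent_peer` and `demo_answering_peer` are hypothetical helpers, not libxcb or KWin functions, and they only check readiness rather than parsing X replies:

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a libxcb or KWin API): wait for `fd` to become
 * readable for at most `timeout_ms` milliseconds, instead of the unbounded
 * poll(..., timeout=-1) seen in the backtrace.
 * Returns 1 if poll() reported readiness (data arrived, or the peer closed
 * so read() would return 0), 0 on timeout, -1 on error. */
int wait_readable_bounded(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR); /* retry if a signal interrupted us */

    if (n < 0 || (pfd.revents & (POLLERR | POLLNVAL)))
        return -1;
    if (n == 0)
        return 0;                      /* nobody answered in time */
    return 1;
}

/* Demo: the peer never answers, so the bounded wait times out (returns 0)
 * where poll(..., -1) would block forever. */
int demo_silent_peer(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int r = wait_readable_bounded(sv[0], 50);
    close(sv[0]);
    close(sv[1]);
    return r;
}

/* Demo: the peer writes one byte, so the wait returns 1 right away. */
int demo_answering_peer(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int r = -1;
    if (write(sv[1], "x", 1) == 1)
        r = wait_readable_bounded(sv[0], 50);
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

A timeout only turns an indefinite hang into an error the caller must handle. libxcb's `xcb_wait_for_reply()` takes no timeout argument, so this is not something a caller can simply pass to it; not making blocking calls to Xwayland at all remains the more fundamental fix.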
Comment 16 kdebugreport 2022-01-22 05:50:29 UTC
(In reply to Vlad Zahorodnii from comment #15)
> [...]
> it appears like kwin_wayland waits for a reply from xwayland, but there's no
> any. ideally, kwin shouldn't make any blocking calls to xwayland, but it's
> not doable atm

I can also confirm this bug. Setting Firefox to use Wayland instead of Xwayland has completely eliminated the crashes.
Comment 17 Vlad Zahorodnii 2022-01-26 15:07:32 UTC
*** Bug 449084 has been marked as a duplicate of this bug. ***
Comment 18 Vlad Zahorodnii 2022-01-26 15:13:42 UTC
Can somebody please get a backtrace of Xwayland while kwin_wayland is frozen? Maybe kwin_wayland and Xwayland are deadlocked.
Comment 19 Dereck 2022-02-03 04:04:02 UTC
Created attachment 146208 [details]
kwin_wayland backtrace
Comment 20 Dereck 2022-02-03 04:04:27 UTC
Created attachment 146209 [details]
Xwayland backtrace
Comment 21 Dereck 2022-02-03 04:09:58 UTC
(In reply to Vlad Zahorodnii from comment #18)
> Can somebody get the backtrace of Xwayland when kwin_wayland is frozen
> please? Maybe both kwin_wayland and Xwayland are in a deadlock

I have added a kwin_wayland backtrace and an Xwayland backtrace from when I encountered this issue after unlocking. In my case, I had an application running via mono while the desktop was locked. (I can reliably reproduce this issue by leaving this mono application running, locking the computer, and coming back after 10 or so minutes; I also have a backtrace of that mono process if it would be useful.)
Comment 22 Vlad Zahorodnii 2022-02-04 15:14:29 UTC
I wonder what Xwayland is doing here:

#0  0x000055fb476d78bf in ?? ()
#1  0x000055fb476cf9ac in ?? ()
#2  0x000055fb4761ba42 in ?? ()
#3  0x000055fb4761bb03 in ?? ()
Comment 23 David Edmundson 2022-03-30 14:44:33 UTC
*** Bug 451570 has been marked as a duplicate of this bug. ***
Comment 24 David Edmundson 2022-03-30 14:44:49 UTC
*** Bug 425779 has been marked as a duplicate of this bug. ***
Comment 25 Weng Xuetian 2022-04-16 00:27:59 UTC
It seems that I'm also affected by this.
For some other, probably unrelated, reason, unplugging an external monitor causes Xwayland to crash with SIGSEGV on my setup, and in this case I observed kwin freezing as a result.

I attached gdb to Xwayland and saw that it received a SIGSEGV; after detaching gdb, I noticed that Xwayland had become a zombie process.

The relevant stack trace in kwin looks similar to the existing ones.
It seems that kwin is blocked waiting for the reply to some X request and never returns.

I'd say we should try to manage the Xwayland process in a separate thread; otherwise we may fail to reap the Xwayland process while kwin is stuck in a blocking call.
#0  0x00007f18b7bd32af in poll () at /usr/lib/libc.so.6
#1  0x00007f18b8d8963b in  () at /usr/lib/libxcb.so.1
#2  0x00007f18b8d8b08f in  () at /usr/lib/libxcb.so.1
#3  0x00007f18b8d8b1a2 in xcb_wait_for_reply () at /usr/lib/libxcb.so.1
#4  0x00007f18ba58339a in KWin::Workspace::updateXStackingOrder() () at /usr/lib/libkwin.so.5
#5  0x00007f18ba583479 in KWin::Workspace::xStackingOrder() const () at /usr/lib/libkwin.so.5
#6  0x00007f18ba512243 in KWin::Compositor::windowsToRender() const () at /usr/lib/libkwin.so.5
#7  0x00007f18ba5127be in KWin::Compositor::composite(KWin::RenderLoop*) () at /usr/lib/libkwin.so.5
#8  0x00007f18b84af463 in  () at /usr/lib/libQt5Core.so.5
#9  0x00007f18ba4d07a7 in KWin::RenderLoop::frameRequested(KWin::RenderLoop*) () at /usr/lib/libkwin.so.5
#10 0x00007f18ba5a1c18 in KWin::RenderLoopPrivate::dispatch() () at /usr/lib/libkwin.so.5
#11 0x00007f18b84af463 in  () at /usr/lib/libQt5Core.so.5
#12 0x00007f18b84b169f in QTimer::timeout(QTimer::QPrivateSignal) () at /usr/lib/libQt5Core.so.5
#13 0x00007f18b84a2766 in QObject::event(QEvent*) () at /usr/lib/libQt5Core.so.5
#14 0x00007f18b8f3c1c6 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/libQt5Widgets.so.5
#15 0x00007f18b847e5aa in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/libQt5Core.so.5
#16 0x00007f18b84c9dd5 in QTimerInfoList::activateTimers() () at /usr/lib/libQt5Core.so.5
#17 0x00007f18b84ca272 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#18 0x0000557e3c5a97e2 in  ()
#19 0x00007f18b847688b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5
#20 0x00007f18b8481fd7 in QCoreApplication::exec() () at /usr/lib/libQt5Core.so.5
#21 0x0000557e3c4b832a in  ()
#22 0x00007f18b7afa310 in __libc_start_call_main () at /usr/lib/libc.so.6
#23 0x00007f18b7afa3c1 in __libc_start_main_impl () at /usr/lib/libc.so.6
#24 0x0000557e3c4b9ab5 in  ()
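The suggestion above, reaping the Xwayland process from a separate thread so that a blocked main thread cannot leave it as a zombie, can be sketched in plain POSIX C. This is a minimal sketch assuming pthreads and a forked child standing in for Xwayland; `struct reaper`, `reaper_main` and `demo_reap_in_thread` are hypothetical names, and this is not how KWin necessarily manages Xwayland:

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch, not KWin code: state shared with the reaper thread. */
struct reaper {
    pid_t pid;       /* child process to reap (stands in for Xwayland) */
    int exit_code;   /* set by the thread; -1 if the child did not exit normally */
};

/* Thread body: block in waitpid() until the child terminates. Only this
 * thread blocks, so the child is reaped (no zombie is left behind) even if
 * the main thread is stuck elsewhere. */
static void *reaper_main(void *arg)
{
    struct reaper *r = arg;
    int status = 0;
    pid_t got;

    do {
        got = waitpid(r->pid, &status, 0);
    } while (got < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    r->exit_code = (got == r->pid && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
    return NULL;
}

/* Demo: fork a child that exits with `code`, reap it from a separate thread,
 * and return the exit code the reaper thread observed (-1 on failure). */
int demo_reap_in_thread(int code)
{
    struct reaper r = { .pid = -1, .exit_code = -1 };
    pthread_t tid;

    r.pid = fork();
    if (r.pid < 0)
        return -1;
    if (r.pid == 0)
        _exit(code);                      /* child: exit immediately */

    if (pthread_create(&tid, NULL, reaper_main, &r) != 0) {
        waitpid(r.pid, NULL, 0);          /* don't leave a zombie on failure */
        return -1;
    }
    /* A real program would keep running its event loop here; the demo
     * simply waits for the reaper thread to finish. */
    pthread_join(tid, NULL);
    return r.exit_code;
}
```

Reaping only clears the zombie entry; it does not close any sockets, so on its own it would not wake a thread blocked in `poll()` on the X connection.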
Comment 26 Weng Xuetian 2022-04-16 00:34:35 UTC
Another observation: if Xwayland is killed or crashes, every new X client just blocks on X. I don't think this should happen.

I suspect kwin might not be releasing the socket fd (or something similar) properly, so X clients, including kwin itself, keep waiting on the X socket indefinitely.
Comment 27 Dennis Schridde 2022-04-18 20:17:14 UTC
(In reply to Weng Xuetian from comment #25)
> It seems that I'm also affected by this.
> For some other probably unrelated reason, unplug external monitor would
> cause SIGSEGV on Xwayland on my setup,  but I observed that kwin freeze due
> to this in this case. 
> [...]

https://bugs.kde.org/show_bug.cgi?id=449948 might be related.
Comment 28 Nate Graham 2022-04-19 15:01:56 UTC
Seems related indeed.
Comment 29 Vlad Zahorodnii 2022-11-16 09:45:49 UTC
*** Bug 461755 has been marked as a duplicate of this bug. ***
Comment 30 leftcrane 2023-02-07 11:33:29 UTC
I've been observing this issue for the past year on my KDE install. After waking the computer, I can only unlock it 50% of the time; the rest of the time input gets ignored. This means that when using KDE I have to do a hard reboot at least once a day. Luckily most of my work is in the cloud, so I basically use it as a browser terminal. If I used it as a full desktop, it would be impossible to get anything done because I'd constantly be losing work to hard reboots.

Also, there is a ten percent chance that the laptop fails to sleep after you tell the session to suspend it. This means your laptop is 100% certain to eventually fry in your backpack if you're the type to close the lid and forget about it, expecting it to do what a normal machine always would.
Comment 31 Zamundaaa 2024-09-17 11:07:18 UTC
*** Bug 475322 has been marked as a duplicate of this bug. ***
Comment 32 Zamundaaa 2024-09-24 13:19:52 UTC
*** Bug 492428 has been marked as a duplicate of this bug. ***
Comment 33 Bug Janitor Service 2024-10-27 23:26:22 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/6705
Comment 34 Vlad Zahorodnii 2024-10-28 11:51:22 UTC
Git commit 40d202976d832adcd720c933c2cf21e755a65475 by Vlad Zahorodnii.
Committed on 28/10/2024 at 11:41.
Pushed by vladz into branch 'master'.

xwayland: Fix a couple of file descriptor leaks

The -wm file descriptor leak prevents the poll() function from returning
POLLHUP when Xwayland dies.

The POLLHUP status will be set when all file descriptors for the other
end point are closed.

M  +6    -0    src/xwayland/xwaylandlauncher.cpp

https://invent.kde.org/plasma/kwin/-/commit/40d202976d832adcd720c933c2cf21e755a65475
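The commit message's explanation, that POLLHUP is only reported once every descriptor for the other end point is closed, can be reproduced with a Unix-domain socket pair on Linux. A minimal sketch, assuming an `AF_UNIX` stream socket stands in for the X connection and a `dup()`ed descriptor stands in for the leaked `-wm` file descriptor; `hangup_reported` and `demo_leaked_fd` are hypothetical names, not KWin code:

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if poll() currently reports POLLHUP on `fd`, 0 if it does not,
 * -1 on error. The zero timeout means the check never blocks. */
static int hangup_reported(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return (pfd.revents & POLLHUP) ? 1 : 0;
}

/* Demo: sv[0] plays KWin's end of the connection and sv[1] the end handed
 * to Xwayland. `leaked` is an extra descriptor for Xwayland's end that was
 * never closed (a stand-in for the leaked -wm fd). Stores whether POLLHUP
 * is reported after Xwayland's own fd is closed (*before_fix) and after
 * the leaked copy is closed as well (*after_fix). Returns 0 on success. */
int demo_leaked_fd(int *before_fix, int *after_fix)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int leaked = dup(sv[1]);
    if (leaked < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[1]);                          /* "Xwayland dies" */
    *before_fix = hangup_reported(sv[0]);  /* leaked copy keeps the peer open */

    close(leaked);                         /* plug the leak */
    *after_fix = hangup_reported(sv[0]);   /* every fd for the peer is now closed */

    close(sv[0]);
    return 0;
}
```

On Linux, calling `demo_leaked_fd(&before, &after)` should leave `before == 0` and `after == 1`: as long as any descriptor for the peer stays open, a thread polling the other end is never told that the peer has gone away.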
Comment 35 Vlad Zahorodnii 2024-10-28 12:03:38 UTC
Git commit 80cd83abeafae11ae2dc22622186910a17c7e7ab by Vlad Zahorodnii.
Committed on 28/10/2024 at 11:54.
Pushed by vladz into branch 'Plasma/6.2'.

xwayland: Fix a couple of file descriptor leaks

The -wm file descriptor leak prevents the poll() function from returning
POLLHUP when Xwayland dies.

The POLLHUP status will be set when all file descriptors for the other
end point are closed.


(cherry picked from commit 40d202976d832adcd720c933c2cf21e755a65475)

Co-authored-by: Vlad Zahorodnii <vlad.zahorodnii@kde.org>

M  +6    -0    src/xwayland/xwaylandlauncher.cpp

https://invent.kde.org/plasma/kwin/-/commit/80cd83abeafae11ae2dc22622186910a17c7e7ab