SUMMARY (I assume) Some kind of internal effects change causes animations to be added with a null EffectWindow* key in m_animations. I've also observed crashes in AnimationEffect::paintWindow() - however I don't have a stack trace or line number for this, as I was using the ArchLinux 5.21.90 packages without debug info. STEPS TO REPRODUCE 1. Use an xwayland jetbrains IDE for some time 2. Eventually right clicking the source code to bring up the context menu will crash kwin 3. The crash can take minutes or hours of coding to occur, but it's almost always triggered for me by right clicking in the jetbrains IDE. OBSERVED RESULT She's dead, Jim. Core was generated by `kwin_wayland --wayland_fd 4 --xwayland /usr/lib/startplasma-waylandsession'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007fa0303c6853 in KWin::AnimationEffect::updateLayerRepaints (this=this@entry=0x55e9754325f0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/libkwineffects/kwinanimationeffect.cpp:925sometimes 925 const QRect geo = entry.key()->expandedGeometry(); [Current thread is 1 (Thread 0x7fa02953b840 (LWP 1747))] (gdb) bt #0 0x00007fa0303c6853 in KWin::AnimationEffect::updateLayerRepaints() (this=this@entry=0x55e9754325f0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/libkwineffects/kwinanimationeffect.cpp:925 #1 0x00007fa0303c74cf in KWin::AnimationEffect::postPaintScreen() (this=0x55e9754325f0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/libkwineffects/kwinanimationeffect.cpp:703 #2 0x00007fa0306a8351 in KWin::EffectsHandlerImpl::postPaintScreen() (this=0x55e974fb35d0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/effects.cpp:419 #3 0x00007fa03073f0a7 in KWin::Scene::paintScreen(int*, QRegion const&, QRegion const&, QRegion*, QRegion*, KWin::RenderLoop*, QMatrix4x4 const&) (this=this@entry=0x55e9747823c0, mask=mask@entry=0x7ffd2f6b78b4, damage=..., repaint=..., updateRegion=updateRegion@entry=0x7ffd2f6b78b8, validRegion=validRegion@entry=0x7ffd2f6b78c0, renderLoop=<optimized out>, projection=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/scene.cpp:206 #4 0x00007fa0263f31f1 in KWin::SceneOpenGL::paint(int, QRegion const&, QList<KWin::Toplevel*> const&, KWin::RenderLoop*) (this= 0x55e9747823c0, screenId=0, damage=<optimized out>, toplevels=<optimized out>, renderLoop=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/plugins/scenes/opengl/scene_opengl.cpp:696 #5 0x00007fa03066d388 in KWin::Compositor::composite(KWin::RenderLoop*) (this=<optimized out>, renderLoop=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/composite.cpp:628 #6 0x00007fa02eba83bb in () at /usr/lib/libQt5Core.so.5 #7 0x00007fa03061cb72 in KWin::RenderLoop::frameRequested(KWin::RenderLoop*) (this=<optimized out>, _t1=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/build/src/kwin_autogen/EWIEGA46WW/moc_renderloop.cpp:206 #8 0x00007fa03072b703 in KWin::RenderLoopPrivate::dispatch() (this=0x55e974f66ed0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/renderloop.cpp:162 #9 0x00007fa02eba83bb in () at /usr/lib/libQt5Core.so.5 #10 0x00007fa02ebac38b in QTimer::timeout(QTimer::QPrivateSignal) () at /usr/lib/libQt5Core.so.5 #11 0x00007fa02eb9e07f in QObject::event(QEvent*) () at /usr/lib/libQt5Core.so.5 #12 0x00007fa02faffd62 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/libQt5Widgets.so.5 #13 0x00007fa02eb7100a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/libQt5Core.so.5 #14 0x00007fa02ebc8fad in QTimerInfoList::activateTimers() () at /usr/lib/libQt5Core.so.5 #15 0x00007fa02ebc7489 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5 #16 0x000055e972c2d47e in QUnixEventDispatcherQPA::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () #17 0x00007fa02eb6f97c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5 #18 0x00007fa02eb77ee4 in QCoreApplication::exec() () at /usr/lib/libQt5Core.so.5 #19 0x000055e972bc73c6 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/main_wayland.cpp:719 (gdb) EXPECTED RESULT kwin_wayland doesn't crash. SOFTWARE/OS VERSIONS Operating System: Arch Linux KDE Plasma Version: 5.21.90 KDE Frameworks Version: 5.82.0 Qt Version: 5.15.2 Kernel Version: 5.12.9-arch1-1 (64-bit) Graphics Platform: Wayland Processors: 12 × AMD Ryzen 5 PRO 4650U with Radeon Graphics Memory: 14.9 GiB of RAM Graphics Processor: AMD RENOIR ADDITIONAL INFORMATION Speculation: I know Vlad has been doing a lot of work for 5.22 in the area of effects and scene graph changes, I assume a bug has crept in from this work at some point. 5.21 never had this problem for me. If I get further traces with different error locations related to effects/animations, I'll post them with this ticket. I've built the current master with debug in case the issue from 5.21.90 had been fixed upstream, but it hasn't (and I needed a backtrack), which is why I'm reporting this against current master.
Another related backtrace. Similar cause, the line in question is "entry.key()->addLayerRepaint(entry->second);" This time the trigger for the crash was a code completion popup in PyCharm. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000300000001 in ?? () [Current thread is 1 (Thread 0x7f50f658b840 (LWP 12571))] (gdb) bt #0 0x0000000300000001 in () #1 0x00007f50fd4174c3 in KWin::AnimationEffect::postPaintScreen() (this=0x55fa962bccb0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/libkwineffects/kwinanimationeffect.cpp:713 #2 0x00007f50fd6f8351 in KWin::EffectsHandlerImpl::postPaintScreen() (this=0x55fa95f75110) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/effects.cpp:419 #3 0x00007f50fd78f0a7 in KWin::Scene::paintScreen(int*, QRegion const&, QRegion const&, QRegion*, QRegion*, KWin::RenderLoop*, QMatrix4x4 const&) (this=this@entry=0x55fa9552d240, mask=mask@entry=0x7fff6b332894, damage=..., repaint=..., updateRegion=updateRegion@entry=0x7fff6b332898, validRegion=validRegion@entry=0x7fff6b3328a0, renderLoop=<optimized out>, projection=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/scene.cpp:206 #4 0x00007f50f466b1f1 in KWin::SceneOpenGL::paint(int, QRegion const&, QList<KWin::Toplevel*> const&, KWin::RenderLoop*) (this= 0x55fa9552d240, screenId=0, damage=<optimized out>, toplevels=<optimized out>, renderLoop=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/plugins/scenes/opengl/scene_opengl.cpp:696 #5 0x00007f50fd6bd388 in KWin::Compositor::composite(KWin::RenderLoop*) (this=<optimized out>, renderLoop=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/composite.cpp:628 #6 0x00007f50fbbf83bb in () at /usr/lib/libQt5Core.so.5 #7 0x00007f50fd66cb72 in KWin::RenderLoop::frameRequested(KWin::RenderLoop*) (this=<optimized out>, _t1=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/build/src/kwin_autogen/EWIEGA46WW/moc_renderloop.cpp:206 #8 0x00007f50fd77b703 in KWin::RenderLoopPrivate::dispatch() (this=0x55fa95e12a90) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/renderloop.cpp:162 #9 0x00007f50fbbf83bb in () at /usr/lib/libQt5Core.so.5 #10 0x00007f50fbbfc38b in QTimer::timeout(QTimer::QPrivateSignal) () at /usr/lib/libQt5Core.so.5 #11 0x00007f50fbbee07f in QObject::event(QEvent*) () at /usr/lib/libQt5Core.so.5 #12 0x00007f50fcb4fd62 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/libQt5Widgets.so.5 #13 0x00007f50fbbc100a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/libQt5Core.so.5 #14 0x00007f50fbc18fad in QTimerInfoList::activateTimers() () at /usr/lib/libQt5Core.so.5 #15 0x00007f50fbc17489 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5 #16 0x000055fa9400747e in QUnixEventDispatcherQPA::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () #17 0x00007f50fbbbf97c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5 #18 0x00007f50fbbc7ee4 in QCoreApplication::exec() () at /usr/lib/libQt5Core.so.5 #19 0x000055fa93fa13c6 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/main_wayland.cpp:719 (gdb)
I'm not sure if it's related - but sometimes kwin_wayland won't crash - but instead all my xwayland clients die (I guess xwayland itself crashes), and I'm not able to start an xwayland client again without logging out of my session and back in. Perhaps if the xwayland server crashes midway through a render xwayland related animations lose their EffectWindow? I don't know how that works internally, I'm merely speculating - and xwayland dying may be unrelated to this ticket.
As is, I don't see how kwin can crash in AnimationEffect::postPaintScreen(). EffectWindow should always be valid. Unless xwayland crashes for some reason and no EffectsHandler::windowDeleted signal is emitted.
What can I do to help provide more details/debug info that might help with this?
Annoyingly, I'm still seeing weird behaviour after time with xwayland and sometimes kwin crashing and restarting the session - however, I no longer get coredumps for kwin, and I've no idea why that is. The only evidence I have that a segfault occurred is in dmesg. I've set a huge max size on my core dumps, so no processes should be excluded from dumping, I'm not sure what else to do.
Ok, the issue with core dumps was that my HDD was almost full. I collected another trace today with the same problem, this time kwin was built from the 5.22 branch - commit: https://invent.kde.org/plasma/kwin/-/commit/9169445b9bbb91f27577baf7ecb247c2e31d5b0a Core was generated by `kwin_wayland --wayland_fd 4 --xwayland /usr/lib/startplasma-waylandsession'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f200b5e4c4d in KWin::AnimationEffect::postPaintScreen (this=0x55b1bd381e60) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/libkwineffects/kwinanimationeffect.cpp:713 warning: Source file is more recent than executable. b713 entry.key()->addLayerRepaint(entry->second); [Current thread is 1 (Thread 0x7f2003fb5240 (LWP 1757))] (gdb) bt #0 0x00007f200b5e4c4d in KWin::AnimationEffect::postPaintScreen() (this=0x55b1bd381e60) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/libkwineffects/kwinanimationeffect.cpp:713 #1 0x00007f200b8c12c1 in KWin::EffectsHandlerImpl::postPaintScreen() (this=0x55b1bcf2b6d0) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/effects.cpp:419 #2 0x00007f200b95843f in KWin::Scene::paintScreen(int*, QRegion const&, QRegion const&, QRegion*, QRegion*, KWin::RenderLoop*, QMatrix4x4 const&) (this=this@entry=0x55b1bc6bf580, mask=mask@entry=0x7ffc6f25b034, damage=..., repaint=..., updateRegion=updateRegion@entry=0x7ffc6f25b038, validRegion=validRegion@entry=0x7ffc6f25b040, renderLoop=<optimized out>, projection=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/scene.cpp:209 #3 0x00007f2001510981 in KWin::SceneOpenGL::paint(int, QRegion const&, QList<KWin::Toplevel*> const&, KWin::RenderLoop*) (this= 0x55b1bc6bf580, screenId=0, damage=<optimized out>, toplevels=<optimized out>, renderLoop=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/plugins/scenes/opengl/scene_opengl.cpp:695 #4 0x00007f200b8862f8 in KWin::Compositor::composite(KWin::RenderLoop*) (this=<optimized out>, renderLoop=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/composite.cpp:628 #5 0x00007f2009da13bb in () at /usr/lib/libQt5Core.so.5 #6 0x00007f200b8359d2 in KWin::RenderLoop::frameRequested(KWin::RenderLoop*) (this=<optimized out>, _t1=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/build/src/kwin_autogen/EWIEGA46WW/moc_renderloop.cpp:206 #7 0x00007f200b943453 in KWin::RenderLoopPrivate::dispatch() (this=0x55b1bc63f910) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/renderloop.cpp:162 #8 0x00007f2009da13bb in () at /usr/lib/libQt5Core.so.5 #9 0x00007f2009da538b in QTimer::timeout(QTimer::QPrivateSignal) () at /usr/lib/libQt5Core.so.5 #10 0x00007f2009d9707f in QObject::event(QEvent*) () at /usr/lib/libQt5Core.so.5 #11 0x00007f200ab46d62 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/libQt5Widgets.so.5 #12 0x00007f2009d6a00a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/libQt5Core.so.5 #13 0x00007f2009dc1fad in QTimerInfoList::activateTimers() () at /usr/lib/libQt5Core.so.5 #14 0x00007f2009dc0489 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5 #15 0x000055b1bbcfc86e in QUnixEventDispatcherQPA::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () #16 0x00007f2009d6897c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/libQt5Core.so.5 #17 0x00007f2009d70ee4 in QCoreApplication::exec() () at /usr/lib/libQt5Core.so.5 #18 0x000055b1bbc96406 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at /home/lewis/Documents/kwin-arch/kwin/src/kwin/src/main_wayland.cpp:727
Created attachment 139625 [details] Backtrace Have the same issue. As in Lewis' case it happens randomly when using CLion and any popup window is diplayed (context menu, code completion popup, etc). I've also had it happen once when opening a new window in Chromium. I am also attaching a more detailed backtrace. OS: Fedora 34 Frameworks: 5.83.0-1 Plasma: 5.22.1-1
Possibly related issue: https://bugs.kde.org/show_bug.cgi?id=440094
*** Bug 440094 has been marked as a duplicate of this bug. ***
Been going through it, it's crashing on `entry.key()->addLayerRepaint(entry->second);` kwinanimationeffect.cpp:668 in master, matched from Plasma/5.22 branch. I am unable to reproduce or see why it's crashing though. I tried Clion and it worked seemingly fine, I guess I'd have to use it for longer? @Lewis, you mentioned that you were using 5.21.90, maybe you could try 5.22.90? Could you mention which effects you have enabled?
It crashes seemingly randomly. Sometimes it takes just a few minutes, sometimes it takes several hours. For completion, I've been having the issue on all plasma versions 5.22.x. My desktop effects on Fedora 34 with Plasma 5.22.4 are (as reported by `qdbus-qt5 org.kde.KWin /Effects loadedEffects`: kwin4_effect_windowaperture kwin4_effect_translucency kwin4_effect_squash kwin4_effect_sessionquit zoom kwin4_effect_morphingpopups kwin4_effect_maximize kwin4_effect_logout kwin4_effect_login kwin4_effect_fullscreen kwin4_effect_frozenapp kwin4_effect_fadingpopups kwin4_effect_fade kwin4_effect_dimscreen kwin4_effect_dialogparent slidingpopups slide screenshot desktopgrid colorpicker wobblywindows presentwindows highlightwindow blur contrast startupfeedback kscreen
Created attachment 141776 [details] Backtrace on 5.22.4 I'm also attaching a new backtrace I got just today. Hopefully it helps too.
(In reply to Aleix Pol from comment #10) > @Lewis, you mentioned that you were using 5.21.90, maybe you could try > 5.22.90? > > Could you mention which effects you have enabled? Exactly the same symptoms for me with Pycharm and CLion. After some time (sometimes half an hour, sometimes several hours) triggering the intellisense popup will eventually trigger this error. I haven't seen this crash with any other xwayland applications however. I've been running kwin master (updated once or twice a week) more or less since I reported this bug, however since 5.22.90 was tagged I've changed to Plasma/5.23. My effects: $ qdbus-qt5 org.kde.KWin /Effects loadedEffects kwin4_effect_windowaperture kwin4_effect_squash zoom kwin4_effect_sessionquit kwin4_effect_morphingpopups kwin4_effect_maximize kwin4_effect_logout kwin4_effect_login kwin4_effect_fullscreen kwin4_effect_frozenapp kwin4_effect_fadingpopups kwin4_effect_fade kwin4_effect_dialogparent slidingpopups slide screenshot desktopgrid colorpicker presentwindows highlightwindow blur contrast startupfeedback screentransform kscreen
Created attachment 141780 [details] Backtrace from 5.22.90 This is a backtrace from Plasma 5.22.90 from the Arch Linux kde-unstable repo, with the KWin package rebuilt from source to provide debug symbols. The crash happened randomly while using Jetbrains IntelliJ IDEA. My effects setup is mostly default, I only enabled the new Overview effect. Effect list: - kwin4_effect_windowaperture - kwin4_effect_squash - kwin4_effect_sessionquit - kwin4_effect_morphingpopups - kwin4_effect_maximize - kwin4_effect_logout - kwin4_effect_login - kwin4_effect_fullscreen - kwin4_effect_frozenapp - kwin4_effect_fadingpopups - kwin4_effect_fade - kwin4_effect_dialogparent - zoom - slidingpopups - slide - screenshot - desktopgrid - colorpicker - presentwindows - overview - highlightwindow - blur - contrast - startupfeedback - screenedge - screentransform - kscreen I analysed the core dump with GDB - the crash happened in kwin4_effect_fade. Redacted log from the gdb session (at the innermost stack frame): (gdb) set print pretty on (gdb) set print object on (gdb) p *this $1 = (KWin::ScriptedEffect) { ... members of KWin::ScriptedEffect: ... m_effectName = { d = 0x55f534b8f590 }, ... } (gdb) p this->m_effectName->d $3 = (QString::Data *) 0x55f534b8f590 (gdb) set $data_start = (char*)this->m_effectName->d + this->m_effectName->d->offset (gdb) set $len = this->m_effectName->d->size (gdb) set $i = 0 (gdb) while $i < 2*$len >printf "%c", *($data_start + $i) >set $i = $i + 2 >end kwin4_effect_fade When the crash happens again, I'll check if it was the same effect. If so, I'll disable it and test some more. Some information about that EffectWindow pointer: (gdb) p *(entry.i->key) $45 = { <QObject> = {<No data fields>}, members of KWin::EffectWindow: static staticMetaObject = { d = { superdata = { direct = 0x7f9fae97c740 <QObject::staticMetaObject> }, stringdata = 0x7f9fafda5ae0 <qt_meta_stringdata_KWin__EffectWindow>, data = 0x7f9fafda55a0 <qt_meta_data_KWin__EffectWindow>, static_metacall = 0x7f9fafd8ced0 <KWin::EffectWindow::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0 } }, d = { d = 0x20a00000282 } } (gdb) p &(entry.i->key->d) $33 = (QScopedPointer<KWin::EffectWindow::Private, QScopedPointerDeleter<KWin::EffectWindow::Private> > *) 0x55f53571a4d0 (gdb) p entry.i->key->d.d $34 = (KWin::EffectWindow::Private *) 0x20a00000282 (gdb) p *(entry.i->key->d.d) Cannot access memory at address 0x20a00000282
Additional information about that EffectWindow, as I forgot it's actually an instance of EffectWindowImpl and didn't dump members of the latter. Both the toplevel and sw pointers lead to inaccessible memory addresses. (gdb) p *(KWin::EffectWindowImpl *)entry.i->key ... members of KWin::EffectWindowImpl: static staticMetaObject = { d = { superdata = { direct = 0x7f9fafdb2900 <KWin::EffectWindow::staticMetaObject> }, stringdata = 0x7f9fb01ae880 <qt_meta_stringdata_KWin__EffectWindowImpl>, data = 0x7f9fb01ae840 <qt_meta_data_KWin__EffectWindowImpl>, static_metacall = 0x7f9faff9e1a0 <KWin::EffectWindowImpl::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0 } }, toplevel = 0x2390000043d, sw = 0x20a00000282, dataMap = { { d = 0x2390000043d, e = 0x2390000043d } }, managed = false, waylandClient = false, x11Client = false ...
Okay, it crashed again. The backtrace looks the same as the last one. The crash is again in kwin4_effect_fade: (gdb) p this $3 = (KWin::ScriptedEffect * const) 0x55f805fa1440 (gdb) set $data_start = (char*)this->m_effectName->d + this->m_effectName->d->offset (gdb) set $len = this->m_effectName->d->size (gdb) set $i = 0 (gdb) while $i < 2*$len >printf "%c", *($data_start + $i) >set $i = $i + 2 >end kwin4_effect_fade I'll test the Glide window open/close animation now, as it's a native one and may behave differently. I suppose there's no point in testing Scale as it's a scripted effect and it will probably have the same problem Fade has.
This time, however, it seems EffectWindow was not completely destroyed. Unlike in the last crash, the data inside EffectWindow is not complete garbage. For instance, the addresses in toplevel and EffectWindow::Private pointers look more sensible, dataMap contains valid pointers to QHashData::shared_null, and x11Client is true. (gdb) p *(EffectWindowImpl*)(entry.i->key) $10 = { <KWin::EffectWindow> = { <QObject> = {<No data fields>}, members of KWin::EffectWindow: static staticMetaObject = { ... }, d = { d = 0x55f806b0fa80 } }, members of KWin::EffectWindowImpl: static staticMetaObject = { ... }, toplevel = 0x55f80706bcf0, sw = 0x0, dataMap = { { d = 0x7fb16aa6e5c0 <QHashData::shared_null>, e = 0x7fb16aa6e5c0 <QHashData::shared_null> } }, managed = true, waylandClient = false, x11Client = true } Inspection of toplevel: (gdb) p *(Toplevel*)$ew.toplevel { <QObject> = {<No data fields>}, members of KWin::Toplevel: static staticMetaObject = { ... }, m_frameGeometry = { x1 = 640, y1 = 547, x2 = 1030, y2 = 654 }, m_clientGeometry = { x1 = 640, y1 = 547, x2 = 1030, y2 = 654 }, m_bufferGeometry = { x1 = 640, y1 = 547, x2 = 1030, y2 = 654 }, m_visual = 59, bit_depth = 24, info = 0x55f806a48670, ready_for_painting = true, m_internalFBO = { value = 0x0, d = 0x0 }, m_internalImage = <incomplete type>, m_internalId = { data1 = 3760014763, data2 = 36097, data3 = 19896, data4 = "\246\035\236h\342\216t\223" }, m_client = { m_window = 18875715, m_destroy = false, m_logicGeometry = { x1 = 0, y1 = 0, x2 = -1, y2 = -1 } }, is_shape = false, effect_window = 0x0, m_shadow = 0x0, resource_name = { d = 0x55f806b26260 }, resource_class = { d = 0x55f805f58430 }, m_clientMachine = 0x55f806442ca0, m_wmClientLeader = 18874376, opaque_region = { d = 0x7fb16b3071e0 <QRegion::shared_empty> }, m_shapeRegion = { d = 0x55f806cc0a30 }, m_shapeRegionIsValid = true, m_output = 0x55f805b4d020, m_skipCloseAnimation = false, m_surfaceId = 0, m_surface = 0x0, m_screenScale = 1, m_opacity = 1, m_stackingOrder = 5 }
I didn't get a crash so far, but I revisited the previous coredump, as it seems weird that the EffectWindow was at least a bit intact this time. Disassembly of the current instruction and the one before it: 0x7fb16c02e4c2 <...postPaintScreenEv+1682> mov (%rdi),%rax > 0x7fb16c02e4c5 <...postPaintScreenEv+1685> call *0x90(%rax) The RDI register contained the address of the EffectWindow, same as in entry.key EffectWindow::addLayerRepaint is virtual, so the address of the actual function EffectWindowImpl::addLayerRepaint has to be read from the vtable. (In this case) the first few bytes of EffectWindow should contain a vtable ptr, which is then read into RAX by 'mov (%rdi),%rax'. The EffectWindowImpl::addLayerRepaint function pointer is then expected to exist at RAX + 0x90. It looks like the vtable pointer points to an invalid location: (gdb) p/x $rax $23 = 0x55f806f2e (gdb) x/x $rax 0x55f806f2e: Cannot access memory at address 0x55f806f2e And vtable+0x90 is also unreadable: (gdb) x/x $rax+0x90 0x55f806fbe: Cannot access memory at address 0x55f806fbe In the coredump from yesterday, the situation is the same: 0x7f9fafd984c2 <...postPaintScreenEv+1682> mov (%rdi),%rax > 0x7f9fafd984c5 <...postPaintScreenEv+1685> call *0x90(%rax) (gdb) p/x $rax $1 = 0x55f06b86305a (gdb) x/x $rax 0x55f06b86305a: Cannot access memory at address 0x55f06b86305a (gdb) x/x $rax+0x90 0x55f06b8630ea: Cannot access memory at address 0x55f06b8630ea Yesterday the EffectWindow was destroyed completely, and the member variables looked pretty much random. In today's crash it seems like the EffectWindow was in the process of getting deleted, as some of the member variables looked intact. The vtable however already got corrupted, which made the call result in a segfault.
Created attachment 141801 [details] Code completion popup window IDs in the debug console Update: It appears that the crashes are highly correlated with the amount of popup windows opened in total during a session of coding. Each new popup causes an increment in the hexadecimal window ID and the window's caption (win1, win2, win3, etc.) seen in the KWin debug console. For me, the crash happens somewhere around win300. I reproduced the crash three times in a row by typing a name of some object then repeatedly mashing dot and backspace keys, so that the autocompletion popups flash rapidly. Because the window IDs increase so quickly and the problem happens around a similar amount of open popup windows, could this be some sort of overflow problem? Maybe this could explain the weird corruption described in my previous two comments, where some fields of the EffectWindow even looked sensible, but the vtable located at the beginning of memory allocated for the EffectWindow was completely ruined.
This short script can be used to trigger a crash: #!/bin/bash win_id=$(sort <(xdotool search --name "Content window") <(xdotool search --class "jetbrains-idea") | uniq -d) sleep 10 while : do xdotool key --window $win_id period sleep 0.1 xdotool key --window $win_id BackSpace done After opening a project in IDEA and typing something like 'System' in a line, run the script and switch back to the IntelliJ IDEA window. For me, with automated keymashing it takes 420-450 popups to crash the Wayland session. On X11, it's completely stable even though the window identifiers also rapidly increase. I terminated the test after around 1000 popups.
This time KWin didn't crash completely, but entered a weird state. Plasmashell stopped working, but most of the applications were still running, including IDEA. I stopped the test, because if it continued it would most likely result in a crash. It is not possible to restart plasmashell right now, nor launch any new program. These errors appeared in wayland-session.log when plasmashell crashed: file descriptor expected, object (308), message get_icon(7h) error in client communication (pid 496515) QMetaProperty::read: Unable to handle unregistered datatype 'KWin::SessionState' for property 'KWin::EffectsHandlerImpl::sessionState' wl_display@1: error 1: invalid arguments for org_kde_plasma_window@308.get_icon This appeared in terminal when trying to restart plasmashell: wl_display@1: error 1: invalid arguments for wl_shm@81.create_pool The Wayland connection experienced a fatal error: Invalid argument And around the same time, this appeared in wayland-session.log: file descriptor expected, object (81), message create_pool(nhi) error in client communication (pid 498931) QMetaProperty::read: Unable to handle unregistered datatype 'KWin::SessionState' for property 'KWin::EffectsHandlerImpl::sessionState' While typing this message, Firefox and a bunch of other things have crashed. The following got logged to wayland-session.log: file descriptor expected, object (30), message add(huuuuu) error in client communication (pid 499807) wl_display@1: error 1: invalid arguments for zwp_linux_buffer_params_v1@30.add [266 00:35:02.451499] [glfw error 65544]: Wayland: fatal display error: Invalid argument file descriptor expected, object (30), message add(huuuuu) error in client communication (pid 499827) wl_display@1: error 1: invalid arguments for zwp_linux_buffer_params_v1@30.add [266 00:35:06.638735] [glfw error 65544]: Wayland: fatal display error: Invalid argument file descriptor expected, object (30), message add(huuuuu) error in client communication (pid 499851) file descriptor expected, object (30), message add(huuuuu) error in client communication (pid 499907) file descriptor expected, object (30), message add(huuuuu) error in client communication (pid 499954) wl_display@1: error 1: invalid arguments for zwp_linux_buffer_params_v1@30.add [266 00:35:18.863341] [glfw error 65544]: Wayland: fatal display error: Invalid argument file descriptor expected, object (63), message add(huuuuu) error in client communication (pid 499370) file descriptor expected, object (7), message create_pool(nhi) error in client communication (pid 500198) file descriptor expected, object (7), message create_pool(nhi) error in client communication (pid 500233) kwin_core: Could not trust "/usr/bin/plasma-browser-integration-host" sha "" "" This is followed by 6649 repetitions of this line: failed to accept: Too many open files
Created attachment 141806 [details] Screencast demonstrating the rising amount of open files by KWin Regarding the previous error about too many open files: This screencast shows the count of open files during the popup window test. It is steadily rising, and does not decrease after stopping the test, nor after quitting IntelliJ IDEA completely. If the test gets stopped at, say, 1000 open fds, after resuming it a crash happens pretty quickly. The screencast ends when KWin crashes, which happens after around 1400 file descriptors get opened. In the following message I will attach a .tar.gz archive containing the output of lsof -p $KWIN_PID taken every second since starting the test. The offending descriptors are mostly pipes.
Created attachment 141807 [details] Outputs of lsof -p $KWIN_PID taken every second, from the test start to KWin crash Additional logs for #22
Created attachment 141808 [details] Timelapse of lsof count, after increasing the open fd limit to 8192 This shows the test running for much longer (5 minutes), after max file descriptors for KWin were increased. It could run for a bit longer as fds were not exhausted yet. The interesting thing is how again plasmashell crashed early. This proves that things just go crazy after the file descriptor limit is exhausted, and there is no direct problem in AnimationEffect. KWin crashes in that place, but the real problem is the file descriptor leak and all kinds of chaotic behaviour it can cause. Are there any logs I can provide to help debug this issue?
This is pretty weird. I just tried running IDEA in KWin only, started from a TTY. The file descriptor amount does not skyrocket, and everything is stable. It also works fine in nested KWin inside a Plasma session. Seems like some other Plasma component is causing KWin to leak file descriptors. I tried killing various combinations of Plasma processes, and it looks like killing plasmashell, xdg-desktop-portal, and xdg-desktop-portal-kde helps. KWin does not leak file descriptors anymore, and it does not crash. I guess plasmashell and xdg-desktop-portal-kde could be trying to read some information about the popup window, but failure might not be handled correctly and new pipe descriptors get created over and over during each retry attempt until KWin crashes.
Firstly can I just say Ash, thanks for your in depth analysis! It's been super educational for me to see your process and how you use GDB to debug the issue. Nice find with the fds, I can confirm that that is the same for me. Watching the fd count while opening/closing popups in pycharm shows that fds are increasing until they reach ~the soft limit of 1024 on my system before plasma goes crazy/kwin crashes. I've just set the soft/hard fd limit to a million and will see if that prevents pycharm/clion crashing kwin as regularly. I'll post my findings after some testing. (Less than ideal workaround for sure, but if that temporarily resolves this problem until it can be fixed - I can live with that!) > wl_display@1: error 1: invalid arguments for > zwp_linux_buffer_params_v1@30.add > [266 00:35:18.863341] [glfw error 65544]: Wayland: fatal display error: > Invalid argument Looks exactly like the same symptoms I had for https://bugs.kde.org/show_bug.cgi?id=438381, so I'd go so far as to say that issue is related to this issue.
The file descriptor flood comes from PlasmaWindow::Private::iconChangedCallback (kwayland/src/client/plasmawindowmanagement.cpp:655, master branch). This can be easily verified by catching pipe2 in the plasmashell process - the backtrace will show this function. Occurences of pipe2 in the strace output for plasmashell correlate with the amount of KWin's file descriptors. After calling the org_kde_plasma_window_get_icon interface, the FDs get copied over to KWin's process using Wayland's proxy magic, which seems to be using the sendmsg and recvmsg syscalls with SCM_RIGHTS ancillary messages, which enable passing open descriptors between processes. The function which gets called after marshaling the FDs by the Wayland's proxy thingy is PlasmaWindowInterfacePrivate::org_kde_plasma_window_get_icon (kwayland-server/src/server/plasmawindowmanagement_interface.cpp:437) Things get really odd now. Both of those functions look fine to me - it doesn't look like either the server side or the client side would leave a descriptor open after finishing its work. I walked through both the icon read and write in GDB, and everything was seemingly handled correctly for the cases I observed. Despite no errors, many of these supposedly closed descriptors did still appear in /proc/$KWIN_PID/fd. I have no idea what could be going on. For a test, I replaced org_kde_plasma_window_get_icon with a function that only calls close(fd). Somehow, the KWin process still had rapidly increasing amounts of FDs, even though this time it was just supposed to close them ASAP.
Ok, so setting the hard/soft fd limit to a high value appears to be an effective solution for me for now until this issue is resolved. Using Ash's xdotool script modified for pycharm, I got the kwin fd count up over 50,000 (took ~35 minutes) and no kwin crash, plasmashell is also still working fine before I stopped the xdotool script. I measured the fd's open by kwin using watch -n 1 "sudo ls -la /proc/`pidof kwin_wayland`/fd | wc -l" $ ulimit -Sn 1048576 $ ulimit -Hn 1048576 $ sudo ls -la /proc/`pidof kwin_wayland`/fd | wc -l 50074 For posterity sake, I changed my limits by editing /etc/security/limits.conf and adding the following lines to the end and rebooting. lewis soft nofile 1048576 lewis hard nofile 1048576
Created attachment 141874 [details] A C program that reproduces the bug I figured out which behaviour of the Jetbrains IDEs causes the bug to occur and I wrote a program that replicates it, so the bug can be reproduced reliably and without the need to install any software. This program will repeatedly create a specified amount of windows, waiting some time before destroying the current window and creating the next one. After that, it will wait a specified amount of time and repeat the whole thing. The amount of windows and the time constants are optional program arguments. The default settings are supposed to replicate a realistic scenario that could happen during the IDE usage. They are as follows: - (arg1) burst size: 2 - (arg2) window lifetime (ms): 16 - (arg3) post-burst wait time (ms): 3000 The short burst length of 2 was motivated by observing the popup windows flicker without any content sometimes, before the actual popup appeared. On my machine, this burst size and window lifetime causes the KWin's file descriptor count to increase by 3 each repetition, same as when a popup is opened normally in a Jetbrains IDE. Of course, if such settings do not cause bug reproduction in your environment, start by increasing the burst size. If the bug still isn't reproduced with a high amount of created windows, start decreasing the window lifetime. Some observations: 1. If the window lifetime is large enough to reliably observe the window's icon appearing on the taskbar, this bug does not occur. On my machine, a window lifetime of 60ms is the minimum one that does not cause KWin's file descriptor amount to notably rise at any point in time, even with a burst size of 99999 windows. 2. Window lifetimes less or equal to 4ms will cause KWin to lag, up to the point of a complete freeze (which resolves during the wait time). Of course this is a terribly unrealistic scenario and no application would ever do that, but watch out when testing. Compilation: $ gcc reproduce_438097.c -lX11 -o reproduce_438097 Usage $ ./reproduce_438097 [burst size] [window lifetime] [post-burst wait time]
> The short burst length of 2 was motivated by observing the popup windows > flicker without any content sometimes, before the actual popup appeared. Post scriptum: in fact, when GDB is attached to plasmashell in non-stop mode and a breakpoint is set in PlasmaWindow::Private::iconChangedCallback with commands specified so that it does not pause execution, for instance: break PlasmaWindow::Private::iconChangedCallback commands silent printf "hello\n" end , we can observe the string "hello" appearing twice when a single popup window appears, and also twice when a single popup disappears, suggesting there was another window that appeared for several miliseconds before getting destroyed.
(In reply to Ash Blake from comment #30) > break PlasmaWindow::Private::iconChangedCallback > commands > silent > printf "hello\n" > end Oops, forgot a continue there. break PlasmaWindow::Private::iconChangedCallback commands silent printf "hello\n" cont end
Created attachment 141875 [details] A Python script that parses strace output for FD information I wrote a script to parse strace output and abbreviate it, displaying only close() calls and recvmsg() calls, but filtered by cmsg_type=SCM_RIGHTS and abbreviated so that only the cmsg_data part containing newly received file descriptors is visible. After terminating the script with Ctrl+C, it will display all the file descriptors that have been received by the KWin process, but not closed. This is the output in a bug-free situation (the reproducing program was ran with a burst size of 20 and window lifetime of 100ms, which does not trigger the bug) $ sudo strace -e trace=recvmsg,close -p `pidof kwin_wayland` 2>&1 | python process_strace.py close(5670) = 0 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670],...) = 84 recvmsg(48,...,cmsg_data=[5670],...) = 8 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 recvmsg(44,...,cmsg_data=[5670],...) = 152 close(5670) = 0 recvmsg(48,...,cmsg_data=[5670, 5675],...) = 16 ^CThese descriptors were left open: 5670, 5675 The mentioned descriptors have been however closed shortly after. This is the output with also 20 windows, but a lifetime of 16ms: $ sudo strace -e trace=recvmsg,close -p `pidof kwin_wayland` 2>&1 | python process_strace.py recvmsg(44,...,cmsg_data=[5696],...) = 152 close(5696) = 0 recvmsg(48,...,cmsg_data=[5696, 5697],...) = 16 recvmsg(44,...,cmsg_data=[5696],...) = 152 close(5696) = 0 recvmsg(48,...,cmsg_data=[5696, 5697],...) = 16 recvmsg(44,...,cmsg_data=[5700],...) = 152 close(5700) = 0 recvmsg(48,...,cmsg_data=[5700, 5701],...) = 16 recvmsg(44,...,cmsg_data=[5700],...) = 152 close(5700) = 0 recvmsg(44,...,cmsg_data=[5700],...) = 152 close(5700) = 0 recvmsg(48,...,cmsg_data=[5700, 5701],...) = 16 recvmsg(44,...,cmsg_data=[5702],...) = 152 close(5702) = 0 recvmsg(44,...,cmsg_data=[5702],...) = 152 close(5702) = 0 recvmsg(48,...,cmsg_data=[5702, 5703],...) = 16 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 close(5704) = 0 recvmsg(44,...,cmsg_data=[5704],...) = 152 recvmsg(48,...,cmsg_data=[5705, 5706],...) = 16 close(5704) = 0 ^CThese descriptors were left open: 5696, 5697, 5700, 5701, 5702, 5703, 5705, 5706 All of these descriptors were open after waiting a few seconds and checking, and are still open while I'm typing this.
Thanks for the great analysis, Ash! This is definitely very strange. At quick glance, I don't see how file descriptors can be leaked in kwin.
(In reply to Vlad Zahorodnii from comment #33) > Thanks for the great analysis, Ash! This is definitely very strange. At > quick glance, I don't see how file descriptors can be leaked in kwin. Well, it doesn't seem to be KWin that leaks them. It happens along the way in libwayland. This one is perhaps more useful - same test as above, but only one window gets created with a 16ms lifetime. It appears this is actually enough to induce the bug, and patterns are more visible. close(4071) = 0 recvmsg(44,...,cmsg_data=[4071],...) = 152 close(4071) = 0 recvmsg(44,...,cmsg_data=[4034],...) = 152 close(4034) = 0 recvmsg(48,...,cmsg_data=[4034, 4035],...) = 16 recvmsg(44,...,cmsg_data=[4071],...) = 152 close(4071) = 0 recvmsg(48,...,cmsg_data=[4071, 4079],...) = 16 recvmsg(44,...,cmsg_data=[4150],...) = 152 close(4150) = 0 recvmsg(48,...,cmsg_data=[4150, 4608],...) = 16 recvmsg(44,...,cmsg_data=[4610],...) = 152 close(4610) = 0 recvmsg(48,...,cmsg_data=[4610, 4611],...) = 16 recvmsg(44,...,cmsg_data=[4612],...) = 152 close(4612) = 0 recvmsg(48,...,cmsg_data=[4612, 4615],...) = 16 recvmsg(44,...,cmsg_data=[4618],...) = 152 close(4618) = 0 recvmsg(48,...,cmsg_data=[4618, 4619],...) = 16 recvmsg(44,...,cmsg_data=[4620],...) = 152 close(4620) = 0 recvmsg(48,...,cmsg_data=[4620, 4621],...) = 16 ^CThese descriptors were left open: 4034, 4035, 4071, 4079, 4150, 4608, 4610, 4611, 4612, 4615, 4618, 4619, 4620, 4621 KWin does close every descriptor it receives. However, sometimes the process does not receive one descriptor, but two - and KWin itself is not aware of the second one, nor is it supposed to. This is the bug-free scenario (100ms lifetime): close(4622) = 0 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 recvmsg(44,...,cmsg_data=[4622],...) = 152 close(4622) = 0 recvmsg(48,...,cmsg_data=[4622, 5710],...) = 16 ^CThese descriptors were left open: 4622, 5710 (And those two descriptors got closed shortly after) Here the same thing happens, however things happen slowly enough so that the descriptor can get reused, hence the fd amount is not rising. It seems like this is not KWin's fault, but it's Wayland that is doing something weird when marshaling those descriptors. org_kde_plasma_window_get_icon is supposed to accept one file descriptor, and plasmashell does give it exactly one descriptor. Things get messed up along the way, and this is not an issue anywhere in the KDE code. While researching the topic of SCM_RIGHTS, I stumbled upon this link: https://gist.github.com/kentonv/bc7592af98c68ba2738f4436920868dc This part sounds like it could be a problem here: > However, as always, recvmsg() calls on the receiving end don't necessarily > map 1:1 to sendmsg() calls. Messages can be coalesced or split. Sounds like things can get mixed up when messages are getting sent fast enough.
Aha, I know what it is... It's a bug in our handling of inert resources. If a plasma window is deleted, our wrapper code won't call the handler and therefore the file descriptor will be leaked.
Created attachment 141876 [details] Potential solution (untested) If you apply the attached patch, does the issue go away?
(In reply to Vlad Zahorodnii from comment #36) > Created attachment 141876 [details] > Potential solution (untested) > > If you apply the attached patch, does the issue go away? Yes, I just tested it with 1000 windows and there are no leaked descriptors. In the strace output, there are now close() calls for every FD. When using a Jetbrains IDE everything works fine as well.
+1, your patch prevents the uncontrolled growth of fds for me Vlad! Thanks!
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwayland-server/-/merge_requests/299
Git commit ca07c765f50f37bdf5d213234ef6c012af794587 by Vlad Zahorodnii. Committed on 27/09/2021 at 18:02. Pushed by vladz into branch 'master'. Automatically close fds for inert resources If the plasma window resource is inert, the associated get_icon handler won't be invoked. This results in leaked file descriptors. With this change, the corresponding close() function call will be generated to avoid leaking file descriptors and potentially crashing the session due to kwin running out of fds. M +7 -0 src/tools/qtwaylandscanner.cpp https://invent.kde.org/plasma/kwayland-server/commit/ca07c765f50f37bdf5d213234ef6c012af794587
Git commit 208b7bb4ed911913ba76b73370310703e0032e81 by Vlad Zahorodnii. Committed on 27/09/2021 at 19:23. Pushed by vladz into branch 'Plasma/5.23'. Automatically close fds for inert resources If the plasma window resource is inert, the associated get_icon handler won't be invoked. This results in leaked file descriptors. With this change, the corresponding close() function call will be generated to avoid leaking file descriptors and potentially crashing the session due to kwin running out of fds. (cherry picked from commit ca07c765f50f37bdf5d213234ef6c012af794587) M +7 -0 src/tools/qtwaylandscanner.cpp https://invent.kde.org/plasma/kwayland-server/commit/208b7bb4ed911913ba76b73370310703e0032e81
*** Bug 438381 has been marked as a duplicate of this bug. ***