Bug 349921 - KWin periodically hangs, mostly when opening a lot of windows at once
Summary: KWin periodically hangs, mostly when opening a lot of windows at once
Status: RESOLVED DUPLICATE of bug 351839
Alias: None
Product: kwin
Classification: Plasma
Component: compositing
Version: git master
Platform: Compiled Sources Linux
Importance: NOR crash
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-05 09:41 UTC by Armin K.
Modified: 2016-01-19 11:12 UTC
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
qdbus-qt5 org.kde.KWin /KWin supportInformation (5.47 KB, text/plain)
2015-07-05 09:46 UTC, Armin K.
gdb backtrace of the crash (2.69 KB, text/plain)
2015-07-05 10:16 UTC, Armin K.

Description Armin K. 2015-07-05 09:41:27 UTC
The hang happens intermittently; I can't figure out what really triggers it.

I have found out that when I start quite a number of apps at once right after login, kwin freezes while trying to display them. I can kill and restart kwin and everything goes back to normal. The same thing happens with Plasma 5.3.x (x = 0, 1 and 2) and with Frameworks/Plasma git master from 2 days ago.

Reproducible: Sometimes

Steps to Reproduce:
1. Log in
2. Click Firefox, Thunderbird and Konsole from the taskbar
3. You can move the cursor, but the skeleton window of one of the apps remains on screen and nothing else responds: clicking on the taskbar does nothing, and moving the cursor to the top-left corner doesn't bring up the overview (or whatever that's called).
4. Switch to tty2, log in as a normal user, killall -9 kwin_x11
5. Switch to Plasma session
6. Alt+Space -> kwin_x11 -> all normal.



I am also using Qt-5.5.0 (I was using 5.4.1 and 5.4.2 too, it happened with those versions too), xorg-server git master from 2 days ago (Updated when I updated Plasma to git master, but same thing happened with 1.17.1 and 1.17.2), Mesa-10.6.1 (again, same thing happened with 10.5.x series), xf86-video-intel-2.99.917 with UXA acceleration, and Linux-4.1.0 (although it also happened with 4.0 and 3.19 series).
Comment 1 Armin K. 2015-07-05 09:46:18 UTC
Created attachment 93481
qdbus-qt5 org.kde.KWin /KWin supportInformation

Here's also the output of qdbus-qt5 org.kde.KWin /KWin supportInformation
Comment 2 Thomas Lübking 2015-07-05 09:56:47 UTC
can you suspend/resume the compositor (SHIFT+Alt+F12) and does that clear the stage as well?
Comment 3 Armin K. 2015-07-05 09:57:36 UTC
I haven't tried that, but I will next time the hang occurs.
Comment 4 Armin K. 2015-07-05 10:05:35 UTC
Ok, logged out, logged back in, tried to start the same apps, and as expected, kwin hung, and suspending the compositor didn't work.

But, I've learned something rather interesting. It doesn't hang, it crashes and it fails to properly restart itself.

I have seen the following in the "ps aux" output:

armin     1802  0.0  0.4 412868 27696 ?        S    11:58   0:00 /usr/bin/kwin_x11 --crashes 1
armin     1803  0.0  0.6 475612 36572 ?        S    11:58   0:00 /usr/lib/libexec/drkonqi -platform xcb -display :0 --appname kwin_x11 --apppath /usr/bin --signal 11 --pid 1606 --appversion 5.3.90 --programname KWin --bugaddress submit@bugs.kde.org --startupid 0

But, given that the WM is unavailable, I can't get to drkonqi at all, and restarting kwin_x11 makes drkonqi disappear.
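
For anyone who wants to check for this state from a tty, here is a minimal sketch that scans `ps aux` text for the two processes shown above: a kwin_x11 restarted with `--crashes N`, and a drkonqi instance holding a crashed kwin_x11. The function name `summarize` is made up for illustration, and the sample input is just the two lines from this comment; the column layout is assumed to be the usual 11-column `ps aux` format.

```python
import os
import re

# Sample lines copied from the ps aux output in this comment.
PS_OUTPUT = """\
armin     1802  0.0  0.4 412868 27696 ?        S    11:58   0:00 /usr/bin/kwin_x11 --crashes 1
armin     1803  0.0  0.6 475612 36572 ?        S    11:58   0:00 /usr/lib/libexec/drkonqi -platform xcb -display :0 --appname kwin_x11 --apppath /usr/bin --signal 11 --pid 1606 --appversion 5.3.90 --programname KWin --bugaddress submit@bugs.kde.org --startupid 0
"""


def summarize(ps_text):
    """Return (restarted_pids, crashes) found in ps aux text.

    restarted_pids: PIDs of kwin_x11 instances started with --crashes N.
    crashes: (signal, crashed_pid) pairs that drkonqi reports for kwin_x11.
    """
    restarted, crashes = [], []
    for line in ps_text.splitlines():
        # ps aux prints 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or unexpected format
        pid, cmd = int(fields[1]), fields[10]
        program = os.path.basename(cmd.split()[0])
        if program == "kwin_x11" and re.search(r"--crashes \d+", cmd):
            restarted.append(pid)
        elif program == "drkonqi" and "--appname kwin_x11" in cmd:
            sig = re.search(r"--signal (\d+)", cmd)
            old = re.search(r"--pid (\d+)", cmd)
            if sig and old:
                crashes.append((int(sig.group(1)), int(old.group(1))))
    return restarted, crashes


print(summarize(PS_OUTPUT))
```

With the sample above this reports PID 1802 as the restarted instance and signal 11 for the crashed PID 1606, which is what suggests a crash rather than a hang.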

Also, while in frozen state, the qdbus-qt5 command doesn't work. Or rather, it does, but it outputs the following:

Error: org.freedesktop.DBus.Error.NameHasNoOwner
Name org.kde.KWin is currently not owned by anyone.
Comment 5 Armin K. 2015-07-05 10:16:46 UTC
Created attachment 93483
gdb backtrace of the crash

Here's also the gdb backtrace when kwin crashes/hangs
Comment 6 Armin K. 2015-07-05 10:25:55 UTC
Also, an interesting observation. I have found out that opening a lot of windows at once isn't the only way to reproduce the problem.

When I log into Plasma and switch to tty2 to attach gdb to kwin_x11, it instantly crashes and I can get the backtrace (I got the backtrace that way, I'm lucky drkonqi keeps it alive). But, after restarting kwin_x11 after the first crash, the problem isn't reproducible anymore, neither by opening a lot of windows at once nor by switching to another VT.

It's possible that kwin_x11 tries to use something that wasn't initialized at that moment, either from Qt or somewhere else.
Comment 7 Thomas Lübking 2015-07-05 11:39:07 UTC
(In reply to Armin K. from comment #4)
> But, given that the WM is unavailable, I can't get to drkonqi at all, and
> restarting kwin_x11 makes drkonqi disappear.
I'd rather say this is (if at all) a bug #348834 in drkonqi - re/starting the WM doesn't make any windows disappear.


> Name org.kde.KWin is currently not owned by anyone.
Since there's no running kwin process, that's not much of a surprise.

> Here's also the gdb backtrace when kwin crashes/hangs
The backtrace looks like the actual crash is in another thread.

> It's possible that kwin_x11 tries to use something that wasn't initialized
> at that moment, either from Qt or somewhere else.
I'll bet your right arm on EGL (see the other bug) - please try GLX ("kcmshell5 kwincompositing")
Comment 8 Armin K. 2015-07-05 12:13:38 UTC
(In reply to Thomas Lübking from comment #7)
> (In reply to Armin K. from comment #4)
> > But, given that the WM is unavailable, I can't get to drkonqi at all, and
> > restarting kwin_x11 makes drkonqi disappear.
> I'd rather say this is (if at all) a bug #348834 in drkonqi - re/starting
> the WM doesn't make any windows disappear.
> 

Well, killing the process it's trying to debug/report certainly killed drkonqi too. At least in my case. 

> 
> > Name org.kde.KWin is currently not owned by anyone.
> Since there's no running kwin process, that's not much of a surprise.
> 

There is, as shown in comment 4. But I believe that's an already crashed process trapped by drkonqi/kcrash. What made no sense to me is why it didn't restart kwin properly. Bug or feature?

> > Here's also the gdb backtrace when kwin crashes/hangs
> The backtrace looks like the actual crash is in another thread.
> 
> > It's possible that kwin_x11 tries to use something that wasn't initialized
> > at that moment, either from Qt or somewhere else.
> I'll bet your right arm on EGL (see the other bug) - please try GLX
> ("kcmshell5 kwincompositing")

Okay, switched to GLX from EGL (I have no idea why I selected EGL). So far, trying the usual stuff that crashed kwin didn't trigger any crash.

Still, EGL stuff ought to be fixed sometime, even if switching to GLX fixes my problem (or note that using EGL is buggy, whichever seems easier).
Comment 9 Thomas Lübking 2015-07-05 12:55:32 UTC
(In reply to Armin K. from comment #8)
> There is, as shown in comment 4.
*running* - as opposed to "stopped" ;-)

> restart kwin properly. Bug or feature?
From the dupe, the crashed kwin process (at least in this zombie mode) fails to release the WM selection; therefore the new kwin process cannot take it and immediately stops (that's a requirement for WMs since it needs to be a singleton process) => the bug is that for whatever reason the kwin process held by drkonqi doesn't get rid of the WM selection.

Usually this should happen automagically, because the selection is claimed via a helper window and when kwin crashes, the connection to the X11 server gets cut and the X11 server would remove that window - and the selection with it...
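
To make the reasoning concrete, here is a toy model of the behaviour described above. It is only a sketch: `ToyXServer` and its methods are invented for illustration and are not real X11 or xcb calls, and the model assumes (as described) that a new kwin simply gives up when the selection is still owned.

```python
class ToyXServer:
    """Toy stand-in for the X11 server; none of these methods are real X11/xcb calls."""

    def __init__(self):
        self.selection_owner = None  # client that currently owns the WM selection

    def claim_wm_selection(self, client):
        """Grant the selection only if nobody owns it (the singleton rule)."""
        if self.selection_owner is not None:
            return False
        self.selection_owner = client
        return True

    def connection_closed(self, client):
        """The server removes the client's helper window, and the selection with it."""
        if self.selection_owner == client:
            self.selection_owner = None


def restart_succeeds(old_connection_closed):
    """Model one crash followed by an automatic restart of the window manager."""
    server = ToyXServer()
    server.claim_wm_selection("kwin (crashed)")
    if old_connection_closed:
        server.connection_closed("kwin (crashed)")
    return server.claim_wm_selection("kwin --crashes 1")


print(restart_succeeds(old_connection_closed=True))   # True: the usual case
print(restart_succeeds(old_connection_closed=False))  # False: crashed process still holds its connection
```

The second call corresponds to the situation in this report, where the crashed process is kept around and its connection to the X11 server apparently never gets cut.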

> Still, EGL stuff ought to be fixed sometimes
We'd need a backtrace for that (the thread you presented just informs us that something could not connect to the X11 server, and that's simply because kwin just crashed) - at least to see whether that's something in kwin or in the driver.
Comment 10 Armin K. 2015-07-05 13:07:48 UTC
(In reply to Thomas Lübking from comment #9)
> (In reply to Armin K. from comment #8)
> > There is, as shown in comment 4.
> *running* - as opposed to "stopped" ;-)
> 
> > restart kwin properly. Bug or feature?
> From the dupe, the crashed kwin process (at least in this zombie mode) fails
> to release the WM selection; therefore the new kwin process cannot take it
> and immediately stops (that's a requirement for WMs since it needs to be a
> singleton process) => the bug is that for whatever reason the kwin process
> held by drkonqi doesn't get rid of the WM selection.
> 
> Usually this should happen automagically, because the selection is claimed
> via a helper window and when kwin crashes, the connection to the X11 server
> gets cut and the X11 server would remove that window - and the selection
> with it...
> 

I have "disabled" drkonqi by exporting KDE_DEBUG=1 as advised on irc. Now, kwin crashed again, and no drkonqi instance was started. I saw kwin restarted itself (ps shows kwin_x11 started with --crashes 1), but it wasn't working properly, ie no borders, decorations, etc. I had to kill it myself and start it again from krunner. Is this related to what you said here, or is it something else entirely?

> > Still, EGL stuff ought to be fixed sometimes
> We'd need a backtrace for that (the thread you presented just informs us
> that something could not connect the X11 server, and that's simply because
> kwin just crashed) - at least to see whether that's something in kwin or in
> the driver.

Ok, I'll try utilizing systemd coredumps for this.
Comment 11 Thomas Lübking 2015-07-18 14:39:23 UTC
(In reply to Armin K. from comment #10)

> it wasn't working properly, ie no borders, decorations, etc.
Sounds (if you could still Alt+left mouse button move windows) like no proper decoration plugin lib was found - which is rather weird if it works when restarting it by hand...

I don't think this has direct relation, but could indicate a broken installation where different plugin libs are resolved depending on the startup environment.

> Ok, I'll try utilizing systemd coredumps for this.
Any success on this?
Comment 12 Armin K. 2015-07-21 12:27:16 UTC
(In reply to Thomas Lübking from comment #11)
> (In reply to Armin K. from comment #10)
> 
> > it wasn't working properly, ie no borders, decorations, etc.
> Sounds (if you could still Alt+left mouse button move windows) like no
> proper decoration plugin lib was found - what's rather weird if it works
> when restarting it by hand...
> 
> I don't think this has direct relation, but could indicate a broken
> installation where different plugin libs are resolved depending on the
> startup environment.
> 

I've seen kwin crashing but properly restarting itself with drkonqi disabled. It seems the issue went away.

> > Ok, I'll try utilizing systemd coredumps for this.
> Any success on this?

I was away for some time and didn't use my Linux install. I've built all plasma packages from git on Sunday and have now switched KWin to use EGL instead of GLX.

But I did notice one thing: a KWin crash doesn't produce a line like the following in the system log:

Jul 19 18:19:02 krejzi kernel: kactivitymanage[11604]: segfault at 7f25b19e2770 ip 00007f25b1c28711 sp 00007ffeec54c3d8 error 4 in libQt5Sql.so.5.5.0[7f25b1c14000+3f000]

systemd-coredump only catches these, afaik. Still, I'll try and see what can be done (unless the issue went away with the latest snapshot, that is).
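
As a side note on reading such kernel lines, here is a minimal sketch that pulls the fields out of the segfault message quoted above and computes the offset of the faulting instruction inside the library (instruction pointer minus the mapping base). The function name `parse_segfault` is made up, and the pattern only assumes the x86-64 message layout seen in that one line.

```python
import re

# The kernel log line quoted in this comment.
LINE = (
    "Jul 19 18:19:02 krejzi kernel: kactivitymanage[11604]: segfault at 7f25b19e2770 "
    "ip 00007f25b1c28711 sp 00007ffeec54c3d8 error 4 in libQt5Sql.so.5.5.0[7f25b1c14000+3f000]"
)

PATTERN = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the interesting fields of a kernel segfault line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "lib": m.group("lib"),
        "offset": ip - base,  # position of the faulting instruction within the mapping
    }


info = parse_segfault(LINE)
print(info["comm"], info["pid"], info["lib"], hex(info["offset"]))
```

The resulting offset could then be looked up with a tool such as addr2line against the library with debug symbols, assuming the mapping starts at the beginning of the library file.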
Comment 13 Armin K. 2015-07-23 08:55:48 UTC
(In reply to Armin K. from comment #12)
> (In reply to Thomas Lübking from comment #11)
> > (In reply to Armin K. from comment #10)
> > 
> > > it wasn't working properly, ie no borders, decorations, etc.
> > Sounds (if you could still Alt+left mouse button move windows) like no
> > proper decoration plugin lib was found - what's rather weird if it works
> > when restarting it by hand...
> > 
> > I don't think this has direct relation, but could indicate a broken
> > installation where different plugin libs are resolved depending on the
> > startup environment.
> > 
> 
> I've seen kwin crashing but properly restarting itself with drkonqi
> disabled. It seems the issue went away.
> 
> > > Ok, I'll try utilizing systemd coredumps for this.
> > Any success on this?
> 
> I was away for some time and didn't use my linux install. I've built all
> plasma packages from git on Sunday and have now switched KWin to use EGL
> instead of GLX.
> 
> But, I did notice one thing: KWin crash doesn't produce something like
> 
> Jul 19 18:19:02 krejzi kernel: kactivitymanage[11604]: segfault at
> 7f25b19e2770 ip 00007f25b1c28711 sp 00007ffeec54c3d8 error 4 in
> libQt5Sql.so.5.5.0[7f25b1c14000+3f000]
> 
> In the system log, and systemd-coredump only catches these afaik. Still,
> I'll try and see what can be done (unless the issue went away with latest
> snapshot that is).

Ok, ignore this comment. I was able to get kwin_x11 to crash and generate a backtrace using the core obtained from coredumpctl.

I did rebuild Qt5 (most of it, anyways) with debug symbols (although in release mode as debug mode would make other KDE components crash, so it was built with -O2 -g and not stripped). Same goes for kwin, except it was built in true debug mode, no optimization. Still, I'm missing some symbols and have no idea where they come from. If anyone has any pointers about them, please let me know what I need to rebuild/install.

Core was generated by `kwin_x11'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe1bdd1d469 in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:36
36      ../sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
(gdb) bt
#0  0x00007fe1bdd1d469 in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:36
#1  0x00007fe1c39d8986 in KCrash::defaultCrashHandler (sig=11) at /home/armin/src/frameworks/kcrash-5.12.0/src/kcrash.cpp:409
#2  <signal handler called>
#3  ref (this=0x0) at /usr/include/qt5/QtCore/qrefcount.h:54
#4  toQString (this=<optimized out>) at jsruntime/qv4string_p.h:81
#5  toQString (this=<optimized out>) at jsruntime/qv4string_p.h:141
#6  QV4::Heap::StringObject::StringObject (this=<optimized out>, engine=0x28ccc40, val=...) at jsruntime/qv4stringobject.cpp:91
#7  0x00007fe1c28566e3 in alloc<QV4::StringObject, QV4::ExecutionEngine*, QV4::Value> (arg2=..., arg1=0x28ccc40, this=<optimized out>)
    at jsruntime/qv4mm_p.h:117
#8  QV4::ExecutionEngine::newStringObject (this=0x28ccc40, value=...) at jsruntime/qv4engine.cpp:552
#9  0x00007fe1c28edc7f in QV4::Runtime::getProperty (engine=0x28ccc40, object=..., nameIndex=<optimized out>) at jsruntime/qv4runtime.cpp:679
#10 0x00007fe10e762089 in ?? ()
#11 0x00007fe1c34ba6a2 in QQuickItem::staticMetaObject () from /usr/lib/libQt5Quick.so.5
#12 0x00007fe1071b5238 in ?? ()
#13 0x0000000002e66e10 in ?? ()
#14 0x00007fe1071b5230 in ?? ()
#15 0x00000000028ccc40 in ?? ()
#16 0x00007fe1bbe8b199 in CallConstructor (this=<synthetic pointer>, tc=<synthetic pointer>)
    at ../../include/QtCore/5.5.0/QtCore/private/../../../../../src/corelib/kernel/qvariant_p.h:339
#17 FilteredConstructor (this=<optimized out>, tc=<synthetic pointer>)
    at ../../include/QtCore/5.5.0/QtCore/private/../../../../../src/corelib/kernel/qvariant_p.h:362
#18 delegate<QIcon> (this=<synthetic pointer>) at ../../include/QtCore/5.5.0/QtCore/private/../../../../../src/corelib/kernel/qvariant_p.h:383
#19 switcher<void, QVariantConstructor<(anonymous namespace)::GuiTypesFilter> > (data=0x0, type=<optimized out>, logic=<synthetic pointer>)
    at ../../include/QtCore/5.5.0/QtCore/private/../../../../../src/corelib/kernel/qmetatypeswitcher_p.h:68
#20 (anonymous namespace)::construct (x=0x2ba9620, copy=0x7ffc9df477a0) at kernel/qguivariant.cpp:101
#21 0x00000000028ccc40 in ?? ()
#22 0x00000000028ccc40 in ?? ()
#23 0x0000000002e9dc78 in ?? ()
#24 0x0000000000000003 in ?? ()
#25 0x00000000028a8a10 in ?? ()
#26 0x00007fe1c2cdb520 in ?? () from /usr/lib/libQt5Qml.so.5
#27 0x00007fe1071b51d0 in ?? ()
#28 0x00000000028ccc40 in ?? ()
#29 0x00007ffc9df48220 in ?? ()
#30 0x00007fe10e2c40f0 in ?? ()
#31 0x0000000000000000 in ?? ()
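
When reading a trace like this, the frames above `<signal handler called>` belong to the crash handler, so the first frame after that marker is usually the interesting one. A minimal sketch for picking it out of gdb `bt` text follows; the function name `faulting_frame` is made up, and the sample is just the first five frames from the trace above.

```python
import re

# First frames of the backtrace above.
SAMPLE = """\
#0  0x00007fe1bdd1d469 in raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/pt-raise.c:36
#1  0x00007fe1c39d8986 in KCrash::defaultCrashHandler (sig=11) at /home/armin/src/frameworks/kcrash-5.12.0/src/kcrash.cpp:409
#2  <signal handler called>
#3  ref (this=0x0) at /usr/include/qt5/QtCore/qrefcount.h:54
#4  toQString (this=<optimized out>) at jsruntime/qv4string_p.h:81
"""

# "#N", optional "0xADDR in ", then the rest of the frame description.
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+)$")


def faulting_frame(backtrace):
    """Return (number, text) of the first frame after '<signal handler called>', or None."""
    seen_handler = False
    for line in backtrace.splitlines():
        m = FRAME.match(line.strip())
        if m is None:
            continue  # wrapped continuation lines such as "at jsruntime/qv4mm_p.h:117"
        number, text = int(m.group(1)), m.group(2)
        if seen_handler:
            return number, text
        if text == "<signal handler called>":
            seen_handler = True
    return None


print(faulting_frame(SAMPLE))
```

For the trace above this picks frame #3, `ref (this=0x0)`, i.e. a reference count being incremented through a null pointer.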

The coredumpctl output shows a shorter trace:

                Stack trace of thread 618:
                #0  0x00007fe1bdd1d469 raise (libpthread.so.0)
                #1  0x00007fe1c39d8986 _ZN6KCrash19defaultCrashHandlerEi (libKF5Crash.so.5)
                #2  0x00007fe1ba3874a0 __restore_rt (libc.so.6)
                #3  0x00007fe1c28ae192 _ZN9QtPrivate8RefCount3refEv (libQt5Qml.so.5)
                #4  0x00007fe1c28566e3 _ZN3QV413MemoryManager5allocINS_12StringObjectEPNS_15ExecutionEngineENS_5ValueEEEPNT_4DataET0_T1_ (libQt5Qml.so.5)
                #5  0x00007fe1c28edc7f _ZN3QV47Runtime11getPropertyEPNS_15ExecutionEngineERKNS_5ValueEi (libQt5Qml.so.5)
                #6  0x00007fe10e762089 n/a (n/a)
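
The symbols in the coredumpctl trace are mangled C++ names. A tool such as c++filt decodes them fully; the sketch below only decodes the leading length-prefixed components of a nested name, which is enough to match these frames against the gdb backtrace. The function name `nested_name` is made up, and template arguments and parameter types are deliberately dropped.

```python
def nested_name(mangled):
    """Decode the leading components of an Itanium-ABI nested name (_ZN<len><name>...).

    Decoding stops at the first character that is not a length prefix, so
    template arguments and parameter types are not included in the result.
    Returns None if the symbol does not start with "_ZN".
    """
    if not mangled.startswith("_ZN"):
        return None
    i, parts = 3, []
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1  # consume the decimal length prefix
        length = int(mangled[i:j])
        parts.append(mangled[j:j + length])
        i = j + length
    return "::".join(parts)


for symbol in (
    "_ZN6KCrash19defaultCrashHandlerEi",
    "_ZN9QtPrivate8RefCount3refEv",
    "_ZN3QV47Runtime11getPropertyEPNS_15ExecutionEngineERKNS_5ValueEi",
):
    print(nested_name(symbol))
```

Decoded this way, frames #1, #3 and #5 of the coredumpctl trace line up with KCrash::defaultCrashHandler, the RefCount ref call and QV4::Runtime::getProperty in the gdb backtrace.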
Comment 14 Thomas Lübking 2015-07-23 13:07:10 UTC
The problem is (likely)
> Plugin: org.kde.kwin.aurorae
> Theme: kwin4_decoration_qml_plastik

but the particular crash could have been introduced with Qt 5.5 (at least there *was* a regression in alpha versions):
https://bugreports.qt.io/browse/QTCREATORBUG-14595

it's supposed to be fixed in the release, but that doesn't mean there's no other bug introduced in that context.

=> switch to the breeze decoration (the supportInformation say that you're using the breeze plugin, NOT the aurorae plugin with some breeze theme) and see whether you can re-cause
a) the crash (unlikely)
b) the stall (most interesting question)
Comment 15 Armin K. 2015-07-23 13:18:06 UTC
(In reply to Thomas Lübking from comment #14)
> The problem is (likely)
> > Plugin: org.kde.kwin.aurorae
> > Theme: kwin4_decoration_qml_plastik
> 
> but the particular crash could be introduced with Qt5.5 (at least there
> *was* a regression in alpha versions:
> https://bugreports.qt.io/browse/QTCREATORBUG-14595
> 
> it's supposed to be fixed in the release, but that doesn't mean there's no
> other bug introduced in that context.
> 

I was bitten by that bug in a way that plasma wouldn't start. I can confirm the issue was fixed in Qt-5.5-rc. However, my original issue happened with Qt-5.4.1 and Qt-5.4.2 too, although I'm not sure if the backtrace is similar.

> => switch to the breeze decoration (the supportInformation say that you're
> using the breeze plugin, NOT the aurorae plugin with some breeze theme) and
> see whether you can re-cause
> a) the crash (unlikely)
> b) the stall (most interesting question)

How do I switch to breeze decoration? I have selected breeze in systemsettings wherever it was possible. I do remember a similar bug due to using aurorae engine which was around before kdecoration2 port

https://bugs.kde.org/show_bug.cgi?id=341110

But it doesn't seem to be the same issue.
Comment 16 Armin K. 2015-07-23 13:22:57 UTC
(In reply to Armin K. from comment #15)
> (In reply to Thomas Lübking from comment #14)
> > => switch to the breeze decoration (the supportInformation say that you're
> > using the breeze plugin, NOT the aurorae plugin with some breeze theme) and
> > see whether you can re-cause
> > a) the crash (unlikely)
> > b) the stall (most interesting question)
> 
> How do I switch to breeze decoration?

Ok, never mind. I found it. It's really ugly and unusable for me, but it does say org.kde.breeze now instead of aurorae. I can handle it long enough to try and reproduce the crash.
Comment 17 Thomas Lübking 2015-07-23 13:24:41 UTC
you can configure it to a certain degree - notably turn the buttons smaller ;-)
Comment 18 Armin K. 2015-07-31 10:10:54 UTC
After some time of using the breeze instead of aurorae engine or whatever it's called, there wasn't a single crash.
Comment 19 Thomas Lübking 2015-07-31 14:37:34 UTC
and about hangs (ie. the original report)?
Comment 20 Armin K. 2015-08-04 21:05:01 UTC
(In reply to Thomas Lübking from comment #19)
> and about hangs (ie. the original report)?

No. The hangs stopped as soon as I disabled drkonqi.
Comment 21 Thomas Lübking 2016-01-19 11:12:53 UTC
Marking as dupe for the core segfault; the drkonqi issue is hopefully resolved in 5.6

*** This bug has been marked as a duplicate of bug 351839 ***