Bug 346337

Summary: kwin exited when I toggled compositing off
Product: [Plasma] kwin Reporter: Peter Cordes <peter>
Component: compositing Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED WORKSFORME    
Severity: normal Keywords: triaged
Priority: NOR    
Version: 5.2.2   
Target Milestone: ---   
Platform: Kubuntu   
OS: Linux   

Description Peter Cordes 2015-04-18 13:57:15 UTC
I'm on Kubuntu 15.04  5.2.2a-0ubuntu1

I had mpv (fork of mplayer) running fullscreen on screen1, not composited (because of enabling the "don't composite fullscreen windows" setting), and I had a bunch of windows open.  Including a windowed OpenGL program (xmoto) on screen0.  I forget which window had the focus when I pressed ctrl+alt+f12.

My first reaction was "wow, KDE is really shit without compositing", but then I realized that the window manager had actually exited, and it was a bug. :P

~/.xsession-errors has:
(early stuff from session init)

kwin_core: screens:  2 desktops:  1
kwin_core: Done.
Trying to use rootObject before initialization is completed, whilst using setInitializationDelayed. Forcing completion
kwin_core: Initializing OpenGL compositing
completeShutdownOrCheckpoint called
kf5.kinit.klauncher: appId= ":1.20" newAppId= ":1.20" pendingAppId= "*.polkit-kde-authentication-agent-1"
kf5.kinit.klauncher: appName= "20"
kf5.kinit.klauncher: appId= ":1.21" newAppId= ":1.21" pendingAppId= "*.polkit-kde-authentication-agent-1"
kf5.kinit.klauncher: appName= "21"
kwin_core: Choosing GLXFBConfig 0xb0 X visual 0x216 depth 24 RGBA 8:8:8:0 ZS 0:0
kf5.kinit.klauncher: appId= "org.kde.ActivityManager" newAppId= "org.kde.ActivityManager" pendingAppId= "*.polkit-kde-authentication-agent-1"
kf5.kinit.klauncher: appName= "ActivityManager"
org.kde.kactivities.activities: Starting the KDE Activity Manager daemon QDateTime("2015-04-16 21:44:00.011 ADT Qt::LocalTime")
OpenGL vendor string:                   X.Org
OpenGL renderer string:                 Gallium 0.4 on AMD BARTS
OpenGL version string:                  3.0 Mesa 10.5.2
OpenGL shading language version string: 1.30
Driver:                                 R600G
GPU class:                              NI
OpenGL version:                         3.0
GLSL version:                           1.30
Mesa version:                           10.5.2
X server version:                       1.17.1
Linux kernel version:                   3.19
Requires strict binding:                no
GLSL shaders:                           yes
Texture NPOT support:                   yes
Virtual Machine:                        no
Direct rendering: true 

kwin_core: 0x0: OpenGL debug output initialized
kwin_core: Color correction: false
New PolkitAgentListener  0x10d2b20

...

kwin_core: Creating window pixmap failed:  8
kwin_core: Creating window pixmap failed:  8
kwin_core: Creating window pixmap failed:  8
kwin_core: Creating window pixmap failed:  8
   (this seems to happen all the time, with no obvious ill effects)
...
kwin_core: Unredirecting: 'ID: 102760450 ;WMCLASS: "mpv" : "gl" ;Caption: "mpv - foo.mp4" ' 
 (This was probably a few minutes before kwin died, since I assume this was done when I fullscreened mpv.)
kf5.kiconthemes: Warning: could not find "unknown" icon for size 16
kf5.kiconthemes: Warning: could not find "unknown" icon for size 16
kf5.kiconthemes: Warning: could not find "unknown" icon for size 16
kf5.kiconthemes: Warning: could not find "unknown" icon for size 16
kwin_x11: Couldn't find current GLX or EGL context.

And this was the last I heard from that kwin.  I assume that error was what led it to exit.

I was able to fix my desktop by running a new kwin_x11.

I can't reproduce the problem, even with all the same windows open.  With/without mpv having the focus.  Kwin just prints   
kwin_core: Releasing compositor selection

and everything works.

Reproducible: Couldn't Reproduce
Comment 1 Thomas Lübking 2015-04-18 15:27:09 UTC
> I can't reproduce the problem

Did you (the system) at any point run an update during that session (pot. involving glibc, Mesa, Qt or KDE libraries)?
Comment 2 Peter Cordes 2015-04-18 16:14:27 UTC
I looked at the upgrades done between booting up and when I saw the problem, and according to /var/log/aptitude, and /var/log/syslog, I did find a couple potential suspects:

[INSTALL] kdemultimedia-kio-plugins:amd64

[UPGRADE] libglib2.0-0:amd64 2.44.0-1 -> 2.44.0-1ubuntu2

and

[UPGRADE] libgudev-1.0-0:amd64 1:219-7ubuntu1 -> 1:219-7ubuntu2
[UPGRADE] libmysqlclient18:amd64 5.6.23-1~exp1~ubuntu5 -> 5.6.24-0ubuntu1
[UPGRADE] libpam-systemd:amd64 219-7ubuntu1 -> 219-7ubuntu2
[UPGRADE] libsystemd0:amd64 219-7ubuntu1 -> 219-7ubuntu2
[UPGRADE] libudev1:amd64 219-7ubuntu1 -> 219-7ubuntu2
[UPGRADE] mysql-client-core-5.6:amd64 5.6.23-1~exp1~ubuntu5 -> 5.6.24-0ubuntu1
[UPGRADE] mysql-common:amd64 5.6.23-1~exp1~ubuntu5 -> 5.6.24-0ubuntu1
[UPGRADE] mysql-server-core-5.6:amd64 5.6.23-1~exp1~ubuntu5 -> 5.6.24-0ubuntu1
[UPGRADE] systemd:amd64 219-7ubuntu1 -> 219-7ubuntu2
[UPGRADE] systemd-sysv:amd64 219-7ubuntu1 -> 219-7ubuntu2
[UPGRADE] udev:amd64 219-7ubuntu1 -> 219-7ubuntu2


Also, holy crap, Kubuntu logs my Alt+Tabs to syslog??

I didn't get a "kwin has crashed, report this bug" popup or anything (although IDK if that was impossible because of having no window manager).
Comment 3 Thomas Lübking 2015-04-18 20:01:06 UTC
"Normally", KWin should have performed a crash-restart (and you should have gotten a crash report)
Since neither happened, I thought some update may have junked mapped memory.

libglib would be a candidate, it's linked by libQt5Core

About everything that was kDebug in SC4 is now either qCDebug or qDebug - I'm not sure whether we've a GUI for qCDebug right now.

See http://doc.qt.io/qt-5/qloggingcategory.html for details, kwin_x11 is (mostly) the KWIN_CORE category.
Comment 4 Peter Cordes 2015-04-19 18:56:49 UTC
> "Normally", KWin should have performed a crash-restart (and you should have gotten a crash report)
> Since neither happened, I thought some update may have junked mapped memory.

dpkg replaces files by rename(2)ing new versions on top of old ones.  Existing processes still have the code from the deleted files mapped.  Debian/Ubuntu systems would break horribly if they overwrote libraries in place while they were being used, and dpkg wouldn't be able to upgrade itself.  (ETXTBSY).
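To illustrate the rename(2) point, here is a minimal Python sketch, assuming a POSIX filesystem.  The file names are made up for the demo and this is not dpkg's actual code; it only shows that a mapping taken before the rename keeps the old contents while the path now refers to the new file.

```python
import mmap
import os
import tempfile


def replace_while_mapped():
    """Map a file, rename a new version over it, and return both views."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names, chosen only for illustration.
        target = os.path.join(d, "libdemo.so")
        staged = os.path.join(d, "libdemo.so.new")

        with open(target, "wb") as f:
            f.write(b"old version")

        # Stand-in for a running process that has the old file mapped.
        with open(target, "rb") as f:
            old_map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        # Write the new version under a temporary name, then rename it
        # on top of the old one.  The old inode stays alive while mapped.
        with open(staged, "wb") as f:
            f.write(b"new version")
        os.rename(staged, target)

        still_mapped = old_map[:]      # contents seen through the old mapping
        with open(target, "rb") as f:
            on_disk = f.read()         # contents seen by anyone opening the path now
        old_map.close()
        return still_mapped, on_disk


print(replace_while_mapped())  # (b'old version', b'new version')
```

Overwriting the file in place instead of renaming would change the bytes under the existing mapping, which is exactly what the rename approach avoids.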

It's possible that when kwin switched to non-compositing mode, it did something that depended on a newly-execced process accessing the internals of something using / created by an old process.  That's the only way I can see that this would explain it.

Hmm, I just reproduced this again.  But it was after KDE had become semi-wedged already.  (Clicks on the panel, and right-click on the desktop, did nothing.  I got into this state after some messing around with the panel config / sizing.)

I'm getting kwin_x11 stderr output like:

kwin_core: No QQuickWindow assigned yet
QXcbConnection: XCB error: 3 (BadWindow), sequence: 53686, resource id: 46150097, major code: 18 (ChangeProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 53692, resource id: 46150097, major code: 3 (GetWindowAttributes), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 53693, resource id: 46150097, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 53696, resource id: 46150097, major code: 3 (GetWindowAttributes), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 53697, resource id: 46150097, major code: 14 (GetGeometry), minor code: 0
areKeySymXsDepressed:  any of  2
0 : keySymX=0x "ffe9"  i= 8  mask=0x "1"  keymap[i]=0x "1

when I alt+tab (using the "grid" window display effect).  Errors like this were all over the place in .xsession-errors before I tried disabling compositing to see if it would crash kwin_x11.  (I hadn't previously used that keybind this session, but I had been running mpv fullscreen on one of my monitors sometimes.  No OpenGL programs were running this time when I disabled compositing.)

My desktop is usable, since alt-tab with the grid switcher works.  The panel is updating when windows minimize / unminimize, but it's not accepting clicks.  (not even in the launcher.)  Fortunately I don't care about that, since I start everything from screen(1) in a konsole anyway. :P

So I can leave this running for a while.  Should I open a new bug for the panel-not-responding thing?  And to save a round-trip, what processes should I look for that might be missing if something crashed?  So I can try restarting something to see if that brings the panel back.
Comment 5 Thomas Lübking 2015-04-19 19:57:24 UTC
(In reply to Peter Cordes from comment #4)
> > "Normally", KWin should have performed a crash-restart (and you should have gotten a crash report)
> > Since neither happened, I thought some update may have junked mapped memory.
> 
> dpkg replaces files by rename(2)ing new versions on top of old ones. 

Do you have deeper insight here?
Given the tremendous amount of bug reports à la "xyz crashed while updating" KDE receives all over the place, one (me, for one) frankly suspects that dpkg/anything in the (it *does* seem ubuntu specific) update stack truncates files on updates (causing "strange" segfaults the next time anything from that file needs to be loaded into the cache)

> I'm getting kwin_x11 stderr output like:

Bad window errors mean that some client (possibly kwin) attempted an operation on an already destroyed window. That's somehow "ok" given that X11 is an async protocol (we'd have to perform a horrible amount of syncs and server grabs to prevent that) and generally harmless.

Stacktraces would be better, but a kwin that does not crash-restart sounds bad enough on its own (and I cannot really explain that except for corrupted ((disk)) memory...)
 
> My desktop is usable, since alt-tab with the grid switcher works. 
That means kwin did NOT crash (or restarted implicitly)

> is updating when windows minimize / unminimize, but it's not accepting
> clicks.
That means either
- the pointer is grabbed (run 'xdotool key "XF86LogGrabInfo"' and check /var/log/Xorg.0.log)
- plasmashell (?) does not receive/process X11 events
- the taskbar plasmoid is broken

Do other parts in the desktop/panel interpret mouseevents?

> Should I open a new bug for the panel-not-responding thing?
Yes please, the panel/taskbar doesn't belong to kwin (but plasmashell)

> I look for that might be missing if something crashed?
kwin_x11 is the windowmanager (move, restack, switch windows etc.), plasmashell the desktop+panels

> restarting something to see if that brings the panel back.
-> restart plasmashell AFTER checking the cause ;-)
Comment 6 Peter Cordes 2015-04-19 22:00:06 UTC
> Do you have deeper insight here?
> Given the tremendous amount of bug reports à la "xyz crashed while updating" KDE receives all
> over the place, one (me, for one) frankly suspects that dpkg/anything in the (it *does* seem
> ubuntu specific) update stack truncates files on updates (causing "strange" segfaults the next
> time anything from that file needs to be loaded into the cache)

I can guarantee that your guess about this is wrong.  lsof after an upgrade will show lots of processes with mappings for /usr/lib/* (deleted).  Unix filesystem semantics support doing this perfectly well.  You'd only have a problem when an already-running process (with old libraries) loaded a new module or something with dlopen.  Or if some inter-process communication thing had an ABI change.
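For anyone who wants to see the "(deleted)" mappings without lsof, here is a rough Linux-only Python sketch.  It assumes /proc/self/maps is available (it returns an empty list otherwise), and the temporary file is just a stand-in for a library that got replaced.

```python
import mmap
import os
import tempfile


def deleted_mappings():
    """Return this process's /proc/self/maps lines that name a deleted file."""
    if not os.path.exists("/proc/self/maps"):   # not Linux, or no procfs
        return []

    # Hypothetical stand-in for a shared library that is mapped by a process.
    fd, path = tempfile.mkstemp(suffix=".so")
    try:
        os.write(fd, b"x" * mmap.PAGESIZE)
        mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)

    # Remove the name; the mapping keeps the file contents alive.
    os.unlink(path)
    try:
        with open("/proc/self/maps") as maps:
            return [line.rstrip() for line in maps if "(deleted)" in line]
    finally:
        mapping.close()


for line in deleted_mappings():
    print(line)
```

On a typical Linux system the printed lines should include the temporary file's path followed by "(deleted)", the same marker lsof shows for replaced libraries.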

The usual reason for problems after upgrading is that the running code gets out of sync with data files on disk.  e.g. a firefox update might bring changes to non-executable files on disk, along with code changes to expect the new file format.

 IDK KDE well enough to suggest anything about how KDE might tend to break from upgrades, and I'm not too surprised to hear it happens, but modifying mapped libraries isn't the reason.  Dynamic linking is all done before main() is called; there's no more opening files by name to resolve new symbols after that.  (except for dlopen).
Comment 7 Peter Cordes 2015-04-19 22:10:05 UTC
> > My desktop is usable, since alt-tab with the grid switcher works. 
> That means kwin did NOT crash (or restarted implicitly)

Oh, sorry, that's what I get for not just opening a new bug.  I meant the panel breakage wasn't blocking me from using my desktop.  Of course all my window decorations and task switching ability went away when kwin_x11 crashed.  And when I started it again from a konsole, it would have told me there was already a window manager if the old one had still been running.

I tried again to reproduce it while I had strace -f attached to kwin, but it didn't crash when toggling compositing this time.  Next time I'm going to test it, I'll attach strace or gdb first.

I guess I should also turn on more debugging in kdebugdialog.
Comment 8 Peter Cordes 2015-04-19 23:42:02 UTC
> Yes please, the panel/taskbar doesn't belong to kwin (but plasmashell)

Thanks for the pointer to save me hunting down what component to file the bug against. :)  And for suggesting some ideas to help me add useful details.

reported as https://bugs.kde.org/show_bug.cgi?id=346379
Comment 9 Martin Flöser 2015-04-20 06:08:59 UTC
>  IDK KDE well enough to suggest anything about how KDE might tend to break
> from upgrades, and I'm not too surprised to hear it happens, but modifying
> mapped libraries isn't the reason.  Dynamic linking is all done before
> main() is called; there's no more opening files by name to resolve new
> symbols after that.  (except for dlopen).

Could it be loading of plugins? E.g. a plugin got updated and then the new one 
gets loaded, which was compiled against the newer versions?
Comment 10 Peter Cordes 2015-04-20 14:01:10 UTC
> Could it be loading of plugins? E.g. a plugin got updated and then the new one
> gets loaded, which was compiled against the newer versions?

 Yes, absolutely.  Any KDE processes that load/unload code after startup (with dlopen) are highly vulnerable to breakage from ABI skew on upgrades.  I assume plugins can access a lot of internals, so their ABI isn't limited to public library ABIs that get kept stable.
Comment 11 Thomas Lübking 2015-04-20 21:05:04 UTC
(In reply to Peter Cordes from comment #10)
> highly vulnerable to breakage from ABI skew on upgrades.
I doubt this is ABI related at all, but what could happen is that

foo links bar.so -> bar.so.1
bar.so gets updated to bar.so.2
foo dlopens plugin.so
plugin.so links bar.so -> bar.so.2

We'd then (because bar.so.2 isn't already opened) have resolved the same symbols from bar.so.1 and bar.so.2.

This does however not explain this kind of bug *shrug*
https://bugs.kde.org/show_bug.cgi?id=298219
Comment 12 Peter Cordes 2015-04-21 00:50:34 UTC
(In reply to Thomas Lübking from comment #11)
> (In reply to Peter Cordes from comment #10)
> > highly vulnerable to breakage from ABI skew on upgrades.
> I doubt this is ABI related at all, but what could happen is that
 
> foo links bar.so -> bar.so.1
> bar.so gets updated to bar.so.2
> foo dlopens plugin.so
> plugin.so links bar.so -> bar.so.2

 
 Right, I guess ABI isn't the right word, when it's different versions of a library being incompatible with itself.  (e.g. a new member  added to an internal data structure between bar.so.1 and bar.so.2.)

But yea, this would lead to problems.

Oh, I just looked at the man page for dlopen().  I think lazy binding is the default.  I assume it doesn't have to re-open the file by name to resolve function names when they're first used, though.  If it did, performance would probably suck too much.

You're right that external references in dlopen()ed objects are resolved from that library's dependency list, before using existing symbols in the calling process.  So a module that passes (pointers to) opaque data structures between itself and the main process could break when those opaque data structures are handled by code in a library that got upgraded.
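A toy Python sketch of that failure mode, assuming a made-up structure where the newer library version added a member in front of the existing ones.  The layouts and function names are invented for illustration and don't correspond to any real KDE or Qt type.

```python
import struct

# Hypothetical layouts, invented for this example:
OLD_LAYOUT = "<ii"    # "bar.so.1": width, height
NEW_LAYOUT = "<iii"   # "bar.so.2": flags, width, height (new member added in front)


def make_with_new_code(width, height, flags=1):
    """Code built against the new version writes the three-member layout."""
    return struct.pack(NEW_LAYOUT, flags, width, height)


def read_with_old_code(blob):
    """Code built against the old version reads the first two ints as (width, height)."""
    return struct.unpack_from(OLD_LAYOUT, blob)


blob = make_with_new_code(800, 600)
print(read_with_old_code(blob))  # (1, 800): flags is misread as width, width as height
```

Nothing crashes at the point of the mismatch; the old code just silently works with wrong values, which is why this kind of breakage can show up far from its cause.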
Comment 13 Andrew Crouthamel 2018-09-25 21:45:46 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days, the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please set the bug status as REPORTED so that the KDE team knows that the bug is ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 14 Andrew Crouthamel 2018-10-27 03:33:35 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 30 days. The bug is now closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!