Summary: | Frequent (un)-mapping of window causes memory-"leak" and crash | ||
---|---|---|---|
Product: | [Plasma] kwin | Reporter: | Clemens Eisserer <linuxhippy> |
Component: | general | Assignee: | KWin default assignee <kwin-bugs-null> |
Status: | RESOLVED UPSTREAM | ||
Severity: | crash | CC: | adaptee, cfeck, dl.zerocool, philipp-dev |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Fedora RPMs | ||
OS: | Unspecified | ||
Latest Commit: | Version Fixed In: | ||
Attachments: | test causing kwin to struggle; screenshot of kwin profile while spinning; sysprof screenshot; sysprof profile of kwin beeing slow mapping/unmapping window; An screenshot of kcachegrind; dropping events; massif output |
Description
Clemens Eisserer
2009-05-12 20:26:10 UTC
Created attachment 33589 [details]
test causing kwin to struggle
Do you use compositing? If so, is the behaviour reproducible without compositing?

can't reproduce here (nvidia, 185.15 beta, today's trunk), neither w/ nor w/o compositing. kwin sucks my cpu, but there's no increase in system or video memory usage (according to xrestop) -> deco problem?

I see it without compositing, with the plastik decoration. I'll try another one. By the way, I don't see a memory increase in xrestop, but I do in top.

also happens with the "redmond" decoration

"shrug" so either it has meanwhile been fixed (in KWin, Qt or X11 - used versions?) or... err... ...i guess you errr... know... valgrind? :-\

btw. filed a Qt bug about this

sorry, wrong bug - kde's bugzilla sends me to the wrong bug report after I press "Commit".

Still happens with the latest rawhide packages (KDE-4.2.90 / Qt-4.5.1). I even switched to vesa+nofb to see whether it could be a driver-related problem. When running the map-demo for only a short time and then killing it, kwin continues to suck 100% cpu. I ran sysprof while kwin was misbehaving and attached the result as a screenshot.

Created attachment 34744 [details]
screenshot of kwin profile while spinning
Take a look at whether this is related to this bug: https://bugs.kde.org/show_bug.cgi?id=201445

no, unfortunately not related ... seems like this will stay unfixed for a long time :-/

i wonder whether this could be related to the placement or focus strategy - so: which do you use?

Unfortunately this still happens with KDE-4.5 beta 1. I did a sysprof profile while kwin was using 100% cpu and already 700 MB RSS (sysprof profile attached, as well as a screenshot). How can I find out the focus strategy?

Created attachment 47655 [details]
sysprof screenshot
Created attachment 47656 [details]
sysprof profile of kwin beeing slow mapping/unmapping window
Just tried the test from comment #1. I switched to the Ctrl+Alt+F1 console and checked "top": kwin sucked in all available memory (over 1.5 GB) until it got OOM-killed (I have 2 GB RAM and no swap, so it got there pretty quickly).
This happens with compositing both disabled and enabled.
X11/intel drivers from the openSUSE 11.2 update repo
Qt 4.7 branch and KDE from trunk, built daily

any news on this?

Created attachment 52415 [details]
An screenshot of kcachegrind
I ran a modified version of the test program (it unmaps/maps only 8000 times) with kwin under callgrind. The high CPU usage happens for me both with and without effects, but kwin does not eat memory.
The CPU usage is mainly in KWin::updateXTime().
I cannot attach the callgrind file; it is > 1 MiB compressed.
In my experiment, the test program from comment #1 makes kwin consume a huge amount of memory (observed through htop in a tty). I'm using KDE SC 4.8 beta1, Qt 4.8 rc1 and nvidia driver 290.10

Wow, reported this bug 30 months ago.

Ok, i meanwhile can reproduce it locally, maybe due to one of the many xserver updates in the meantime - i do not know.

First of all the testcase - while a valid stress test and w/o any criticism of Clemens, whose testcase has already helped to fix another nasty issue (many thanks again, really) - is functionally malware.
It pipes ("half a bazillion") maps & unmaps into the X11 server until the X event queue cannot bear them anymore and forces a flush. [1]
Since the window wasn't ever mapped, KWin receives only Map requests, and all at once.

THE ISSUE ON KWIN's side is that, for some reason i'm currently looking for, it isn't aware that the client is already created [2], so it creates an enormous amount of client objects for the very same window, which is then unmapped and released *once* - at best ;-)

The "leak" is a natural result of the assumption that an unmap needs to remove one client object, not all matching ones, which this very exploit proves false.

[1] append "XSync(display, False);" to the loop and the "memleak" is magically gone - yes, i can meanwhile reproduce it here :-)
[2] "Client* c = findClient(WindowMatchPredicate(e->xmaprequest.window));" returns NULL

Created attachment 66351 [details]
dropping events
Ok, the attached patch drops all maprequest events that occur in a row for the same window (add yourself "qDebug() << idiots;" before the return...)
BUT: that doesn't deal with that "leak"!
The former description was btw. not all correct.
What actually (reproducibly) happens is two proper MapRequest / Client::UnmapNotifyEvent() calls, and from then on Client::manage() refuses to deal with this client (createClient() fails, the client object should be deleted. I assume that this happens, because it apparently has no direct impact on this "leak")
What is however remarkable is, that the patch _can_ hold back the leak for "some" time, ie. sometimes 100 ms, sometimes 100 secs :s
... seems valgrind is unavoidable *sigh*
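
For illustration only: a minimal Python sketch of the event-dropping idea described above, i.e. discarding MapRequest events that arrive in a row for the same window. This is not the attached patch, and it says nothing about how KWin or Xlib peek at their queues; the Event type, its field names and drop_repeated_map_requests() are hypothetical names made up for this sketch.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in for an X event; not an Xlib structure."""
    kind: str    # "MapRequest" or "UnmapNotify" (labels used by this sketch only)
    window: int  # window id the event refers to


def drop_repeated_map_requests(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events, skipping a MapRequest that directly follows another
    MapRequest for the same window."""
    previous = None
    for event in events:
        is_repeat = (
            previous is not None
            and event.kind == "MapRequest"
            and previous.kind == "MapRequest"
            and previous.window == event.window
        )
        if not is_repeat:
            yield event
        # Track the last event seen, whether or not it was dropped.
        previous = event


if __name__ == "__main__":
    burst = [Event("MapRequest", 1)] * 5 + [Event("UnmapNotify", 1)]
    for kept in drop_repeated_map_requests(burst):
        print(kept)
```

A filter like this only reduces the work per burst. It does not limit how fast events arrive, which fits the observation above that the patch holds back the "leak" only for some time.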
Created attachment 66352 [details]
massif output

Ah, Blast! It's in Xlib, no fun and no cheers for me :(

Attached is a massif report (that's btw. a valgrind tool - see comment #6. Use ms_print if you want to look into the report, it's more human compatible).

A typical snapshot looks like:

#-----------
snapshot=81
#-----------
time=3129431873
mem_heap_B=149992025
mem_heap_extra_B=11817175
mem_stacks_B=0
heap_tree=detailed
n2: 149992025 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 146925792 0x48FFE72: _XEnq (in /usr/lib/libX11.so.6.3.0)
  n2: 146925792 0x48FC9CA: ??? (in /usr/lib/libX11.so.6.3.0)
   n2: 93518672 0x48FD84E: _XReply (in /usr/lib/libX11.so.6.3.0)
    n2: 93506712 0x48E2506: XGetWindowProperty (in /usr/lib/libX11.so.6.3.0)
     n1: 93481336 0x50D6D38: NETWinInfo::update(unsigned long const*) (netwm.cpp:4073)
      n1: 93481336 0x50D9904: NETWinInfo::event(_XEvent*, unsigned long*, int) (netwm.cpp:3925)
       n1: 93481336 0x408B361: KWin::Client::windowEvent(_XEvent*) (events.cpp:563)
        n1: 93481336 0x408BABF: KWin::Workspace::workspaceEvent(_XEvent*) (events.cpp:291)
         n1: 93481336 0x407CBB0: KWin::Application::x11EventFilter(_XEvent*) (main.cpp:359)
          n1: 93481336 0x596F9A2: ??? (in /usr/lib/libQtGui.so.4.7.4)
           n1: 93481336 0x597CF35: QApplication::x11ProcessEvent(_XEvent*) (in /usr/lib/libQtGui.so.4.7.4)
            n1: 93481336 0x59A8F86: ??? (in /usr/lib/libQtGui.so.4.7.4)
             n1: 93481336 0x566EB6B: QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/libQtCore.so.4.7.4)
              n0: 93481336 0xBE9BE2D2: ???
     n0: 23712 in 26 places, all below massif's threshold (01.00%)
    n0: 11960 in 7 places, all below massif's threshold (01.00%)
   n2: 53403064 0x48FD476: _XEventsQueued (in /usr/lib/libX11.so.6.3.0)
    n1: 53386424 0x48EDD66: XEventsQueued (in /usr/lib/libX11.so.6.3.0)
     n1: 53386424 0x59A9021: ??? (in /usr/lib/libQtGui.so.4.7.4)
      n1: 53386424 0x566EB6B: QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/libQtCore.so.4.7.4)
       n0: 53386424 0xBE9BE2D2: ???
    n0: 16640 in 2 places, all below massif's threshold (01.00%)
 n0: 3066233 in 1325 places, all below massif's threshold (01.00%)
================

So it's likely in whatever there is in "???" between _XReply and _XEnq (or _XEnq)

Equipped with this knowledge, i was curious whether the exploit could also harm - say, e.g. - openbox, and guess what: it sucks up memory even much faster (which is - to be fair - because openbox can deal with maprequests much faster than kwin)

PS: memcheck does however not report significant leaks, like maybe 6 kB when kwin took ~600 MB

will this be reported upstream, so the xcb/xlib devs can work on it?

hmmmmm... googled around, look here for a historic document:
http://lists.trolltech.com/qt-interest/1997-08/thread00077-0.html
The mountainview oracle really knows many items on "gg:_XEnq leak" :)

Given that this is (according to local memcheck; i'm willing to grant that kwin leaks some kB itself) no actual leak, i'd rather gather some more information before making a fool of myself reporting this upstream.

_XEnq obviously hooks events into the application's event queue.
Now there's your little bastard which pushes those events at an enormous pace which no other ("real") client can keep up with, so the queue of unhandled events grows and grows and grows .... that's not nice, but quite expected.

Two questions remain:
1. why didn't it happen to me (or Jaime) then? (some former protection in X*Map which triggered a sync??)
2. while it's pretty expected that the event queue grows, why are the events not dealt with or XSync(dpy, True)'d afterwards (iff it's the events in the queue which occupy the memory)

Additional information regarding (2):
W/o the event dropper, kwin took a lot of CPU cycles after the exploit was killed (presumably to work off the event queue) - at least *here* this is gone with the event dropper.
KWin/Xlib sucks the memory, but if i don't let it swap out, the machine is calm as soon as i kill the exploit.
(confirmation of this difference would be nice, wink-wink)

More additional info (event cookies, apparently there's a cookie jar - could be totally unrelated, though):
http://www.mail-archive.com/xorg-devel@lists.x.org/msg01155.html
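
The queue-growth argument above can be checked with a toy model: a producer that enqueues events faster than the consumer handles them. This is a rough Python sketch with made-up numbers, not a model of Xlib internals. The sync flag is only a crude stand-in for whatever throttling an XSync() in the test loop provides (the producer waits until the backlog is empty); it makes no claim about how XSync actually interacts with the window manager. queue_backlog() and ticks_to_drain() are hypothetical helper names.

```python
def queue_backlog(ticks, produced_per_tick, handled_per_tick, sync=False):
    """Return the number of unhandled events after each tick.

    sync=True is an assumption of this sketch: the producer only sends a
    new batch once the previous one has been handled completely.
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        if not sync or backlog == 0:
            backlog += produced_per_tick
        # The consumer handles at most handled_per_tick events per tick.
        backlog -= min(backlog, handled_per_tick)
        history.append(backlog)
    return history


def ticks_to_drain(backlog, handled_per_tick):
    """Ticks needed to work off a backlog once the producer is gone
    (ceiling division)."""
    return -(-backlog // handled_per_tick)


if __name__ == "__main__":
    flood = queue_backlog(5, produced_per_tick=10, handled_per_tick=3)
    print("no sync:", flood)
    print("sync:   ", queue_backlog(5, 10, 3, sync=True))
    print("ticks to drain after kill:", ticks_to_drain(flood[-1], 3))
```

Without throttling the backlog grows linearly for as long as the producer runs, and the consumer keeps working for a while after the producer is killed. That matches the CPU usage seen after killing the test, but it does not answer question 2, i.e. why the memory is not given back once the queue is empty.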