Bug 192470

Summary:	Frequent (un)-mapping of window causes memory-"leak" and crash
Product:	[Plasma] kwin	Reporter:	Clemens Eisserer <linuxhippy>
Component:	general	Assignee:	KWin default assignee <kwin-bugs-null>
Status:	RESOLVED UPSTREAM
Severity:	crash	CC:	adaptee, cfeck, dl.zerocool, philipp-dev
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Fedora RPMs
OS:	Unspecified
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	test causing kwin to struggle; screenshot of kwin profile while spinning; sysprof screenshot; sysprof profile of kwin being slow mapping/unmapping window; A screenshot of kcachegrind; dropping events; massif output

Description Clemens Eisserer 2009-05-12 20:26:10 UTC

Version:            (using KDE 4.2.2)
Installed from:    Fedora RPMs

Frequent unmapping/mapping of a window causes kwin to become completely unresponsive and to eat memory like there's no tomorrow - see the attached testcase.
I ran it for some time, and it was killed when no free memory was available.

This is not a real-world use case, but it still shouldn't happen.

Comment 1 Clemens Eisserer 2009-05-12 20:28:11 UTC

Created attachment 33589 [details]
test causing kwin to struggle

Comment 2 Martin Flöser 2009-05-12 20:39:30 UTC

Do you use compositing? If yes, is the behaviour reproducible without compositing?

Comment 3 Thomas Lübking 2009-05-12 20:55:48 UTC

can't reproduce here (nvidia, 185.15 beta, today's trunk), neither with nor without compositing.
kwin sucks my cpu, but there's no increase in system or video memory usage (according to xrestop)
-> deco problem?

Comment 4 Clemens Eisserer 2009-05-12 20:58:45 UTC

I see it without compositing, with the plastik decoration. I'll try another one.
By the way, I don't see a memory increase in xrestop, but I do in top.

Comment 5 Clemens Eisserer 2009-05-12 21:36:24 UTC

also happens with "redmond" decoration

Comment 6 Thomas Lübking 2009-05-12 22:39:32 UTC

"shrug"
so either it has been fixed in the meantime (in KWin, Qt or X11 - which versions do you use?) or... err... ...i guess you errr... know... valgrind? :-\

Comment 7 Clemens Eisserer 2009-05-25 22:27:34 UTC

btw. filed a Qt bug about this

Comment 8 Clemens Eisserer 2009-05-25 22:28:18 UTC

sorry, wrong bug - kde's bugzilla sends me to the wrong bug-report after I press "Commit".

Comment 9 Clemens Eisserer 2009-06-22 17:59:22 UTC

Still happens with the latest rawhide packages (KDE-4.2.90 / Qt-4.5.1). I even switched to vesa+nofb to see whether it could be a driver-related problem.

Even when running the map-demo for only a short time and then killing it, kwin continues to suck 100% cpu.
I ran sysprof while kwin was misbehaving and attached the result as a screenshot.

Comment 10 Clemens Eisserer 2009-06-22 17:59:59 UTC

Created attachment 34744 [details]
screenshot of kwin profile while spinning

Comment 11 Yvan Da Silva 2009-07-27 23:59:50 UTC

Check whether this is related to the following bug:
https://bugs.kde.org/show_bug.cgi?id=201445

Comment 12 Clemens Eisserer 2009-07-28 00:04:27 UTC

no, unfortunately not related ... seems like this will stay unfixed for a long time :-/

Comment 13 Thomas Lübking 2009-07-28 00:48:07 UTC

i wonder whether this could be related to the placement or focus strategy - so: which do you use?

Comment 14 Clemens Eisserer 2010-06-04 01:42:51 UTC

Unfortunately this still happens with KDE-4.5 beta 1. I recorded a sysprof profile while kwin was using 100% cpu and already 700 MB RSS (the sysprof profile is attached, as well as a screenshot).

How can I find out the focus strategy?

Comment 15 Clemens Eisserer 2010-06-04 01:45:50 UTC

Created attachment 47655 [details]
sysprof screenshot

Comment 16 Clemens Eisserer 2010-06-04 01:46:22 UTC

Created attachment 47656 [details]
sysprof profile of kwin being slow mapping/unmapping window

Comment 17 Christoph Feck 2010-06-04 02:13:22 UTC

Just tried the test from comment #1. I went to Ctrl+Alt+F1 console, checked "top" and kwin sucked in all available memory (over 1.5 gig), until it got OOM killed (I have 2 gig RAM and no swap, so it got there pretty quickly).

This happens with compositing both disabled and enabled.

X11/intel drivers from openSUSE 11.2 update repo
Qt 4.7 branch and KDE from trunk built daily

Comment 18 Clemens Eisserer 2010-09-24 12:21:28 UTC

any news on this?

Comment 19 Jaime Torres 2010-10-11 17:18:35 UTC

Created attachment 52415 [details]
A screenshot of kcachegrind

I ran a modified version of the test program (changed to unmap/map only 8000 times) with kwin under callgrind. The high cpu usage happens for me with and without effects, but kwin does not eat memory.

The cpu usage is mainly in KWin::updateXTime().
I cannot attach the callgrind file; it is > 1 MiB compressed.

Comment 20 Jekyll Wu 2011-12-02 12:29:43 UTC

In my experiment, the test program from comment #1 makes kwin consume a huge amount of memory (observed through htop in a tty).

I'm using KDE SC 4.8 beta1, Qt 4.8 rc1 and nvidia driver 290.10

Comment 21 Clemens Eisserer 2011-12-02 12:39:35 UTC

Wow, reported this bug 30 months ago.

Comment 22 Thomas Lübking 2011-12-02 19:44:36 UTC

Ok, i can meanwhile reproduce it locally, maybe due to one of the many xserver updates in the meantime - i do not know.

First of all the testcase - while a valid stress test, and without any criticism of Clemens, whose testcase has already helped to fix another nasty issue (many thanks again, really) - is functionally malware.

It pipes ("half a bazillion") maps & unmaps into the X11 server until the X event queue cannot bear them anymore and forces a flush. [1]
Since the window wasn't ever mapped, KWin receives only Map requests, and all at once.

THE ISSUE ON KWIN's side is that, for some reason i'm still looking for, it isn't aware that the client has already been created [2], so it creates an enormous number of client objects for the very same window, which is then unmapped and released *once* - at best ;-)

The "leak" is a natural result of the assumption that an unmap needs to remove one client object, not all matching ones - an assumption that this very exploit makes false.

[1] append "XSync(display, False);" to the loop and the "memleak" is magically gone - yes, i can meanwhile reproduce it here :-)
[2] "Client* c = findClient(WindowMatchPredicate(e->xmaprequest.window));" returns NULL
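
As a rough illustration of the hypothesis stated in this comment (which comment 23 partly revises), here is a toy Python model. It is not KWin code: ToyWorkspace, map_request, unmap_notify and leftover_clients are made-up names, and the lookup_works flag merely stands in for the lookup in [2] coming back empty.

```python
# Toy model only; every name here is hypothetical, not taken from KWin.

class ToyWorkspace:
    def __init__(self, lookup_works):
        self.clients = []                 # one entry (the window id) per client object
        self.lookup_works = lookup_works

    def find_client(self, window):
        # With lookup_works=False the lookup always fails, mimicking
        # findClient() returning NULL as described in footnote [2].
        if self.lookup_works and window in self.clients:
            return window
        return None

    def map_request(self, window):
        # A new client object is created whenever the lookup fails.
        if self.find_client(window) is None:
            self.clients.append(window)

    def unmap_notify(self, window):
        # Assumption under test: one unmap releases exactly one client object.
        if window in self.clients:
            self.clients.remove(window)


def leftover_clients(map_requests, lookup_works):
    """Send many map requests for one window, then a single unmap."""
    ws = ToyWorkspace(lookup_works)
    for _ in range(map_requests):
        ws.map_request(0x2A)
    ws.unmap_notify(0x2A)
    return len(ws.clients)
```

With a working lookup the single unmap leaves nothing behind; with a failing lookup all but one of the objects stay allocated, which is the kind of growth the comment calls a "leak".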

Comment 23 Thomas Lübking 2011-12-03 22:59:53 UTC

Created attachment 66351 [details]
dropping events

Ok, the attached patch drops all MapRequest events that occur in a row for the same window (add "qDebug() << idiots;" before the return yourself...)
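
The attached patch itself is not reproduced here. The idea of dropping repeated requests can be sketched in a few lines of Python, assuming events are plain (type, window) tuples rather than real XEvents; drop_repeated_map_requests is a hypothetical helper name.

```python
# Hypothetical sketch of coalescing; not the code from the attached patch.

def drop_repeated_map_requests(events):
    """Keep the first MapRequest of each consecutive run for the same window.

    events is a list of (event_type, window_id) tuples.
    Returns (kept_events, number_of_dropped_events).
    """
    kept = []
    dropped = 0
    for event in events:
        is_repeat = (
            bool(kept)
            and event[0] == "MapRequest"
            and kept[-1] == event      # same type and same window as the previous one
        )
        if is_repeat:
            dropped += 1
        else:
            kept.append(event)
    return kept, dropped
```

Only directly consecutive duplicates are dropped, so a MapRequest that follows an unmap for the same window is still delivered.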

BUT: that doesn't deal with that "leak"!

The former description was btw. not entirely correct.
What actually (reproducibly) happens are two proper MapRequest / Client::UnmapNotifyEvent() calls, and from then on Client::manage() refuses to deal with this client (createClient() fails and the client object should be deleted; i assume that it is, because this apparently has no direct impact on the "leak")

What is remarkable, however, is that the patch _can_ hold back the leak for "some" time, i.e. sometimes 100 ms, sometimes 100 secs :s

... seems valgrind is unavoidable *sigh*

Comment 24 Thomas Lübking 2011-12-04 00:04:15 UTC

Created attachment 66352 [details]
massif output

Ah, Blast! It's in Xlib, no fun and no cheers for me :(

Attached is a massif report (massif is btw. a valgrind tool - see comment #6).
Use ms_print if you want to look into the report; its output is more human compatible.

A typical snapshot looks like:
#-----------
snapshot=81
#-----------
time=3129431873
mem_heap_B=149992025
mem_heap_extra_B=11817175
mem_stacks_B=0
heap_tree=detailed
n2: 149992025 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 146925792 0x48FFE72: _XEnq (in /usr/lib/libX11.so.6.3.0)
  n2: 146925792 0x48FC9CA: ??? (in /usr/lib/libX11.so.6.3.0)
   n2: 93518672 0x48FD84E: _XReply (in /usr/lib/libX11.so.6.3.0)
    n2: 93506712 0x48E2506: XGetWindowProperty (in /usr/lib/libX11.so.6.3.0)
     n1: 93481336 0x50D6D38: NETWinInfo::update(unsigned long const*) (netwm.cpp:4073)
      n1: 93481336 0x50D9904: NETWinInfo::event(_XEvent*, unsigned long*, int) (netwm.cpp:3925)
       n1: 93481336 0x408B361: KWin::Client::windowEvent(_XEvent*) (events.cpp:563)
        n1: 93481336 0x408BABF: KWin::Workspace::workspaceEvent(_XEvent*) (events.cpp:291)
         n1: 93481336 0x407CBB0: KWin::Application::x11EventFilter(_XEvent*) (main.cpp:359)
          n1: 93481336 0x596F9A2: ??? (in /usr/lib/libQtGui.so.4.7.4)
           n1: 93481336 0x597CF35: QApplication::x11ProcessEvent(_XEvent*) (in /usr/lib/libQtGui.so.4.7.4)
            n1: 93481336 0x59A8F86: ??? (in /usr/lib/libQtGui.so.4.7.4)
             n1: 93481336 0x566EB6B: QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/libQtCore.so.4.7.4)
              n0: 93481336 0xBE9BE2D2: ???
     n0: 23712 in 26 places, all below massif's threshold (01.00%)
    n0: 11960 in 7 places, all below massif's threshold (01.00%)
   n2: 53403064 0x48FD476: _XEventsQueued (in /usr/lib/libX11.so.6.3.0)
    n1: 53386424 0x48EDD66: XEventsQueued (in /usr/lib/libX11.so.6.3.0)
     n1: 53386424 0x59A9021: ??? (in /usr/lib/libQtGui.so.4.7.4)
      n1: 53386424 0x566EB6B: QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (in /usr/lib/libQtCore.so.4.7.4)
       n0: 53386424 0xBE9BE2D2: ???
    n0: 16640 in 2 places, all below massif's threshold (01.00%)
 n0: 3066233 in 1325 places, all below massif's threshold (01.00%)


================

So it's likely in whatever there is in "???" between _XReply and _XEnq (or in _XEnq itself)
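
To put the snapshot in proportion, the byte counts above can be turned into shares of the heap. The numbers are copied from snapshot 81; the variable names and the share() helper are only illustrative.

```python
# Byte counts copied from snapshot 81 above; the names are illustrative.
total_heap = 149992025              # mem_heap_B
via_xenq = 146925792                # everything allocated in _XEnq
via_xgetwindowproperty = 93481336   # _XEnq reached from NETWinInfo::update via XGetWindowProperty/_XReply
via_xeventsqueued = 53386424        # _XEnq reached from the Qt event loop via XEventsQueued


def share(part, whole=total_heap):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for label, size in [
    ("_XEnq (all callers)", via_xenq),
    ("via XGetWindowProperty", via_xgetwindowproperty),
    ("via XEventsQueued", via_xeventsqueued),
]:
    print(f"{label:<26}{share(size):6.2f}% of heap")
```

Roughly 98% of the heap sits in _XEnq allocations, and the larger part of that is reached through XGetWindowProperty, i.e. while _XReply is waiting for a reply.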

Equipped with this knowledge, i was curious whether the exploit could also harm - say - openbox, and guess what:
it sucks up memory even faster (which is - to be fair - because openbox can handle MapRequests much faster than kwin)

Comment 25 Thomas Lübking 2011-12-04 00:06:21 UTC

PS:
memcheck, however, does not report significant leaks: maybe 6 kB when kwin had taken ~600 MB

Comment 26 Clemens Eisserer 2011-12-04 00:06:58 UTC

will this be reported upstream, so the xcb/xlib devs can work on it?

Comment 27 Thomas Lübking 2011-12-04 01:03:41 UTC

hmmmmm... googled around, look here for a historic document:
http://lists.trolltech.com/qt-interest/1997-08/thread00077-0.html
The mountainview oracle really knows many items on "gg:_XEnq leak" :)

Given that this is (according to local memcheck, i'm willing to grant that kwin leaks some kb itself) no actual leak, i'd rather gather some more information before making a fool out of myself reporting this upstream.

_XEnq obviously hooks events into the application's event queue.
Now there's your little bastard which pushes those events at an enormous pace which no other ("real") client can keep up with, so the queue of unhandled events grows and grows and grows .... that's not nice, but quite expectable.
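
The growth argument can be made concrete with a toy producer/consumer model in Python. This is not Xlib: simulate_backlog and its per-tick rates are invented for illustration and only assume that each queued event occupies memory until it is handled.

```python
from collections import deque


def simulate_backlog(ticks, produced_per_tick, handled_per_tick):
    """Toy model: a client enqueues events faster than they are handled.

    Returns (final_queue_length, peak_queue_length), where the queue
    length is sampled at the end of every tick.
    """
    queue = deque()
    peak = 0
    for tick in range(ticks):
        # The test client pushes its events for this tick.
        for n in range(produced_per_tick):
            queue.append((tick, n))
        # The window manager handles as many as it can in the same tick.
        for _ in range(min(handled_per_tick, len(queue))):
            queue.popleft()
        peak = max(peak, len(queue))
    return len(queue), peak
```

In this model the backlog grows linearly whenever production outpaces handling and stays empty otherwise; throttling the producer, which is presumably what the XSync() in footnote [1] of comment #22 amounts to, corresponds to the second case.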

Two questions remain:
1. why didn't it happen to me (or Jaime) then? (some former protection in X*Map which triggered a sync??)
2. while it's pretty expectable that the event queue grows, why are the events not dealt with or XSync(dpy, True)'d afterwards (if indeed it's the events in the queue which occupy the memory)

Additional information regarding (2)
W/o the event dropper, kwin took a lot of CPU cycles after the exploit was killed (presumably to work through the event queue) - at least *here* this is gone with the event dropper. KWin/Xlib sucks the memory, but if i don't let it swap out, the machine is calm as soon as i kill the exploit. (confirmation of this difference would be nice, wink-wink)

More additional info (event cookies, apparently there's a cookie jar - could be totally unrelated, though)
http://www.mail-archive.com/xorg-devel@lists.x.org/msg01155.html