Bug 213669 - Significant CPU penalty for using kwin effects - CPU usage drops significantly when using compiz
Status: CLOSED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: compositing
Version: 4.3.2
Platform: Fedora RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
Reported: 2009-11-08 12:21 UTC by Björn Ruberg
Modified: 2018-03-25 19:08 UTC

Attachments
sysprof profile with kwin running without dri and with shared memory (274.21 KB, application/octet-stream)
2010-03-13 12:54 UTC, Björn Ruberg
sysprof profile with compiz running - video runs inside of me-tv (125.83 KB, application/octet-stream)
2010-03-15 00:03 UTC, Björn Ruberg

Description Björn Ruberg 2009-11-08 12:21:35 UTC
Version:            (using KDE 4.3.2)
Compiler:          gcc 4.4.2 
OS:                Linux
Installed from:    Fedora RPMs

(This is probably known, but as I keep reading articles about improving kwin performance, I want to remind everyone that there is still room for improvement.)

I'm using kwin on i965 and i950. Performance has always been much worse than with compiz, but it has become acceptable. However, I have issues with video playback.

When watching DVB-T, X and kwin produce a CPU load of around 20% with desktop effects activated. Without effects the load is 10% (as it is when I use kwin). The desktop effects are slowed down a little; the biggest issue is that the video stutters as soon as an animation runs.
Randomly (but much too often) the CPU load suddenly climbs to 40% and stays there. Much of the load comes from X and kwin. By deactivating effects I can bring it back to 10%.

The situation gets worse as soon as I use a multihead configuration. That brings the load above 50% (one core is fully used) and the desktop gets too slow to work with.

All this is not happening when using compiz. I'm running Fedora 12 - but this is a problem I've seen since KDE 4.1
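The per-process load figures quoted in this report can be sampled from the command line. The following is only a sketch, assuming a Linux /proc filesystem; the helper names `cpu_ticks` and `cpu_percent` are made up for illustration and are not part of any KDE or X tool.

```shell
#!/bin/sh
# cpu_ticks PID: print utime+stime (in clock ticks) of one process.
# Hypothetical helper; reads the Linux-specific /proc/PID/stat file.
cpu_ticks() {
    [ -r "/proc/$1/stat" ] || return 1
    # Drop "pid (comm) " first, because the command name may contain spaces.
    # After that, stat field 14 (utime) is $12 and field 15 (stime) is $13.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# cpu_percent PID SECONDS: average CPU use over the interval,
# expressed as a percentage of one core.
cpu_percent() {
    hz=$(getconf CLK_TCK)
    before=$(cpu_ticks "$1") || return 1
    sleep "$2"
    after=$(cpu_ticks "$1") || return 1
    awk -v a="$before" -v b="$after" -v hz="$hz" -v s="$2" \
        'BEGIN { printf "%.1f\n", 100 * (b - a) / (hz * s) }'
}
```

For example, `cpu_percent "$(pgrep -x kwin)" 10` would average kwin's load over ten seconds. The process names are assumptions: the X server may show up as `X` or `Xorg` depending on the distribution. Sampling with effects on and off, and again under compiz, gives numbers that can be compared directly.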
Comment 1 Martin Flöser 2009-11-08 13:24:43 UTC
are you watching fullscreen videos or videos in windows?
Comment 2 Björn Ruberg 2009-11-08 13:30:47 UTC
Both. Fullscreen or not does not make a difference. Just the size. A small video runs a little smoother than a big one.
Comment 3 Martin Flöser 2009-11-08 13:48:50 UTC
And in Compiz it's smooth with both fullscreen and window? Or is the performance better when watching fullscreen? Btw I expect you use the same video application when using Compiz or KWin, right?
Comment 4 Björn Ruberg 2009-11-08 13:58:02 UTC
Yes, on compiz the CPU load is the same when having fullscreen or windowed video. 
I use meTV with both window managers. (Had kaffeine running too, same problem).
Comment 5 Björn Ruberg 2010-02-28 01:05:23 UTC
I tried in KDE 4.4 and see the situation much improved. The performance is better now. 

The difference between activated and deactivated effects is: with effects on, kwin causes less than 10% CPU usage. After switching effects off, this CPU usage by kwin is gone.

If this is normal, this can be closed.
Comment 6 Björn Ruberg 2010-03-02 13:19:20 UTC
Forget it. Just watching TV and X fully utilizes a CPU core when having kwin effects on. Deactivating effects reduces load to normal.
Comment 7 Thomas Lübking 2010-03-02 16:47:49 UTC
in ~/.kde/share/config/kwinrc, [Compositing] section, try setting
UnredirectFullscreen=true
then call "qdbus org.kde.kwin /KWin reconfigure" to apply changes.

this should improve the cpu load when playing fullscreen (i.e. also no windows on top) and at least used to be the default.

the load derives from the pixmap -> texture conversion (try the XRender backend, the cpu load should lower dramatically)
KWin knows 3 strategies for this, their performance depends on the system's architecture

in the advanced tab switch between "Texture From Pixmap" and "Shared Memory" (in your case the latter is likely better)

while at it, try the impact of toggling "Enable direct rendering", don't use trilinear filtering on this chip.

last: your KDE is a little dated and there've been bugs in some plugins that caused infinite repaints under certain circumstances.

to be sure, disable:
- minimize
- magic lamp
- slide back
- wobbly windows
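The kwinrc edit described in this comment can also be scripted. The following is a minimal sketch that uses only sh and awk; `set_ini_key` is a hypothetical helper written for this example, not a KDE tool, and it assumes the plain `Key=Value` layout of KDE config files.

```shell
#!/bin/sh
# set_ini_key FILE SECTION KEY VALUE
# Hypothetical helper: sets KEY=VALUE inside [SECTION] of an INI-style file,
# creating the file, the section, or the key if any of them is missing.
set_ini_key() {
    file=$1 section=$2 key=$3 value=$4
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    awk -v sec="[$section]" -v key="$key" -v val="$value" '
        # A section header: close the target section if we are leaving it.
        /^\[.*\]$/ {
            if (in_sec && !done) { print key "=" val; done = 1 }
            in_sec = ($0 == sec)
            if (in_sec) seen = 1
            print
            next
        }
        # An existing entry for the key inside the target section: replace it.
        in_sec && index($0, key "=") == 1 {
            if (!done) { print key "=" val; done = 1 }
            next
        }
        { print }
        END {
            # The key was never written: the section is last in the file or absent.
            if (!done) {
                if (!seen) print sec
                print key "=" val
            }
        }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A call such as `set_ini_key "$HOME/.kde/share/config/kwinrc" Compositing UnredirectFullscreen true`, followed by `qdbus org.kde.kwin /KWin reconfigure`, would then apply the setting. The path is an assumption: some distributions keep the config under ~/.kde4 instead, so check which directory is actually in use first.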
Comment 8 Björn Ruberg 2010-03-02 21:49:04 UTC
Thomas, many thanks for looking at this. I'm currently running KDE 4.4.0 - not 4.3.2 anymore.

I'll go through your advice:

>in ~/.kde/share/config/kwinrc, [Compositing] section, try setting
>UnredirectFullscreen=true
>then call "qdbus org.kde.kwin /KWin reconfigure" to apply changes.

I had no [Compositing] section - I added what you wrote and ran the qdbus command. Nothing changed.

> the load derives from the pixmap -> texture conversion (try the XRender
> backend, the cpu load should lower dramatically)

Right, switching to XRender helped lower the CPU load. But graphics effects are generally slower with that backend, so it is not an option.

> in the advanced tab switch between "Texture From Pixmap" and "Shared Memory"
> (in your case the latter is likely better)

I tried all three, but it didn't matter. CPU load remained high.

> while at it, try the impact of toggling "Enable direct rendering", don't use
> trilinear filtering on this chip.

DRI had no impact. I use "nearest" as filter method.

>last: your KDE is a little dated and there've been bugs in some plugins that
>caused infinite repaints under certain circumstances.

Probably not a problem in 4.4.0 anymore. But I deactivated them all and it didn't matter.
Comment 9 Thomas Lübking 2010-03-02 22:12:44 UTC
> I had not compositing section - I added what you wrote and did the qdbus
> command. Nothing changed.

after changing various settings, check that you have it now - along with several other entries. otherwise you might be facing conflicting config dirs (~/.kde vs ~/.kde4 is pretty common)

However, unredirection should lower the X11/KWin cpu load.
Maybe meTV provides no real fullscreen mode. try e.g. mplayer and press "f"

Last resort: before playing fullscreen videos, press alt+shift+f12 to suspend compositing (and again to resume)
You could also automate this via a script when starting meTV

does the "random" cpu load increase remain?
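The idea of scripting the suspend/resume around the player could look roughly like this. It is a sketch under one loud assumption: that this KWin version exposes a `toggleCompositing` method on `/KWin` over D-Bus, i.e. the same action the alt+shift+f12 shortcut triggers. Check with `qdbus org.kde.kwin /KWin` before relying on it. The function names are made up for this example.

```shell
#!/bin/sh
# ASSUMPTION: org.kde.kwin /KWin offers a toggleCompositing method in this
# KWin version. List the available methods with: qdbus org.kde.kwin /KWin
toggle_compositing() {
    qdbus org.kde.kwin /KWin toggleCompositing
}

# run_suspended COMMAND [ARGS...]
# Hypothetical wrapper: toggles compositing off, runs the command, toggles
# compositing back on, and returns the command's exit status.
run_suspended() {
    toggle_compositing || return 1
    "$@"
    status=$?
    toggle_compositing
    return $status
}
```

Something like `run_suspended me-tv` would then replace the plain player call (the binary name is an assumption). Because this toggles rather than sets the state, it does the wrong thing if compositing is already suspended when the script starts.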
Comment 10 Björn Ruberg 2010-03-02 22:25:05 UTC
Hah, you are right! My config directory was .kde4, not .kde. UnredirectFullscreen=true did indeed help me to some extent. The CPU usage of Xorg dropped from 50% to 30%. But that is still not perfect - deactivating effects makes it drop to around 10%.

To be clear, I'm not watching in fullscreen at the moment, but have a TV window above all others. If I logged out of X and back in again the X load would probably decrease - but only for a while; it goes back to these very high values after some time.
Comment 11 Björn Ruberg 2010-03-09 21:59:47 UTC
This is far from fixed. I just had 60% X usage (on a dual core!) - and deactivating effects made it drop as usual to 9%. UnredirectFullscreen=true is still part of my config.
Comment 12 Thomas Lübking 2010-03-09 22:43:49 UTC
well, sorry to say but video playback is stressful the way X11 composite is designed - if i don't unredirect windows my cpu load is equally HIGH with either compiz or kwin (actually compiz needs a little more due to the indirect rendering)
your dual core will hardly help you, afaik X runs this single threaded

the unredirection can only take place for _real_ fullscreen mode, there must be no other window on top and the window must not be translucent.

in some sort of shameless self advertisement i'd ask you to download sth. called "beclock" from kde-look (it's a clock implemented as kwin composite plugin)
simple rule: if you can see the clock, you're not unredirecting.

if you are unredirecting, it's very unlikely that kwin causes any extra cpu load except for a plugin or glib running wild
(add "export QT_NO_GLIB=1" to ~/.xprofile, log in and out - glib sometimes keeps polling, you could also just run
QT_NO_GLIB=1 kwin --replace &
from a terminal)
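Adding the export to ~/.xprofile can be done so that repeated runs do not pile up duplicate lines. A small sketch; `ensure_line` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Hypothetical helper: appends LINE to FILE unless an identical line exists.
ensure_line() {
    [ -f "$1" ] || : > "$1"
    if ! grep -qxF -- "$2" "$1"; then
        # Terminate an unterminated last line so the new line is not glued to it.
        [ -n "$(tail -c 1 "$1")" ] && echo >> "$1"
        printf '%s\n' "$2" >> "$1"
    fi
}
```

It would be used as `ensure_line "$HOME/.xprofile" 'export QT_NO_GLIB=1'`, followed by logging out and back in.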
Comment 13 Björn Ruberg 2010-03-09 22:47:33 UTC
I don't run fullscreen video and don't want to. Does using your clock still make sense then?

I stated in my initial report that compiz is much, much better than kwin here. It causes 20% X load on the CPU. That is much better than kwin - and that's why I'm reporting this bug for kwin.
Comment 14 Thomas Lübking 2010-03-09 23:03:53 UTC
(In reply to comment #13)
> I don't run fullscreen video and don't want to. Does using your clock still
> make sense then?
nope.

> I stated in my inital report that compiz is much much better than kwin here. It
> makes 20% X load on the CPU.
What about the compiz load compared to the kwin load then? (playing with the settings and nvidia driver option, i can shift the load between WM and X11 nearly at will)

> That is much better than kwin - and that's why I'm reporting this bug for kwin.
Last guess: compiz has some video playback plugin providing yuv12 - i've no idea what it does (apparently nothing here) but
a) do you have it enabled?
b) is it related?
Comment 15 Martin Flöser 2010-03-09 23:16:08 UTC
Unfortunately this is very much driver-related. I saw it this week when updating my NVIDIA driver lowered the CPU usage for video watching significantly (X + kwin around 10%, before it was > 20%). I know it sounds very much like "other people's fault", but Compiz has been around much longer and is a kind of regression test for driver developers. Most of this can only be fixed in the driver (especially as you mention the CPU usage of X). It's not kwin's fault if we uncover bugs in other parts of the stack.
Comment 16 Björn Ruberg 2010-03-09 23:50:02 UTC
So this has to be reported to the x11-intel-developers?
Comment 17 Fredrik Höglund 2010-03-10 00:30:55 UTC
Use sysprof to measure where the problem is. Guessing and speculating about it is pointless.
Comment 18 Björn Ruberg 2010-03-11 22:22:11 UTC
Okay, I ran sysprof. It mostly told me that playing a video with compiz uses 20 to 30% less CPU power. That much was already known.

There are several components causing this. There is ioctl - it uses about 10% more with kwin. It is obvious that driver libraries are using much more CPU power. libdrm_intel.so is draining 7% more CPU. libdri_core.so and i965_dri.so are adding 3 to 5% too. None of that shows up with compiz.

But there is one function that uses less CPU with kwin than with compiz: memcpy. It uses 1.5% CPU with kwin and 3% with compiz. Could that tell us something?

Compiz effects are much smoother by the way.
Comment 19 Martin Flöser 2010-03-11 22:33:57 UTC
I see two possible reasons:
1. the driver is optimized for Compiz. This is likely as Compiz was the first OpenGL based window manager.
2. you use different settings in Compiz and kwin. So please compare the settings for direct rendering, texture from pixmap etc. In case you activate direct rendering it might not be enough to just change the setting, but you have to restart kwin. In case of questions, Thomas is most often able to answer them ;-)
Comment 20 Thomas Lübking 2010-03-11 23:48:24 UTC
just to re-ensure:
all the mentioned cpu load (whether on the kernel or the driver) is induced by X11, right? we're not talking about kwin calls.

also (just to ensure as you didn't explicitly answer) the combined load (X11 + WM) is significantly higher with KWin, yes?

and:
you're running under an otherwise equal environment (i.e. starting compiz in a full kde session or kwin or compiz from a blank X server)
there's also no other terminal / debug output (of eg. the videoplayer - mplayer is extremely talkative and makes konsole stress the X server with output updates) in use.

finally: you're actually redirecting under both WMs (unlike kwin, compiz _can_ unredirect non fullscreen windows, they can just no more be translucent, _not_ on top, etc. then. to rule this out, try playing the movie with a windowalpha < 1.0)

given we have such a fair test and running KWin causes more cpu load on X11 than compiz, there're three possibilities:

1) the intel driver does the quake optimization
2) compiz detects the driver and has better internal settings management (due to age and user base)
3) compiz avoids (in this driver) expensive calls and kwin does not

3) 
- libdri and libdrm are the direct rendering libraries, they're most likely not in use with compiz as you will probably use indirect rendering there*
-> if dri is overly expensive, avoid it by not using direct rendering in kwin**
- the difference on ioctl can indicate that the driver uses memory mapping instead for compiz, because either 
a) you're running tfp on one and shm on the other** (uuunfaair... ;-) or 
b) this is due to the (different) way compiz and kwin initialize the gl context (no idea about that atm) or:
c) intel uses the quake trick -> 1)

1)
a very simple solution can be to rename "compiz" to "compiz.real", "kwin" to "compiz" and run that (BE WARNED, i've never tried it and kwin/kde actually might not like it, resulting in a segfault - should not, but could)
Though i heavily doubt the intel driver does such, detecting running processes was (and probably is) very common for GPU vendors to "optimize" for various games. the result is that one could speed up many "minor" games by renaming the executable to "quake.exe" :-\

2)
would be the (for you) unfortunate reason, as it means you must run around, test settings and then can pass your observations upstream - or wait for somebody else with such a chip to come around...

*you must run compiz w/o --indirect-rendering and not use effects like (static) blurring. also the "smoothness" is a good hint for indirect rendering, though things will start to stick on high external cpu load, which ideally is not the case with direct rendering

**i do recall that you mentioned neither the texture generation nor the dri setting had significant impact, but we don't have a useful comparison w/o equal paths.
Comment 21 Björn Ruberg 2010-03-12 12:17:58 UTC
I can confirm what you stated. The bad CPU load comes from the X process. I test the WMs in the same session, starting them from the command line. It makes no difference to the CPU load whether the video window is transparent or not. It makes no difference for kwin either whether I select "Direct Rendering enabled" or not. CPU usage is the same from my observation.

I further investigated now and can say that the problem cannot be reduced to video playback. When I start a fresh KDE session and play TV-video, the CPU usage with kwin effects is 5% higher than when using compiz.

But the very high CPU usage is not just caused by video. I now see that it is my browser (chromium now, but I had the same symptoms with firefox earlier) which makes X drain 13% CPU. When I switch off kwin effects or change to compiz that immediately drops to 8%.

I observed the following rule: The more windows I have open, the bigger the CPU penalty is for using kwin effects. Deactivating effects or switching to compiz (with its smooth and fast effects :) ) always makes CPU usage drop significantly.

One more observation: If I restart kwin in a running KDE session, its CPU usage is higher than before. It is not cumulative, but the kwin instance started with KDE always uses less CPU.

And I tested for the "quake"-optimization. I cannot see a difference in X11-CPU usage no matter whether kwin is started as "kwin" or "compiz"
Comment 22 Thomas Lübking 2010-03-12 17:10:07 UTC
(In reply to comment #21)
> But the very high cpu usages are not just caused by video. I now see that it is
> my browser (chromium now, but had the same sympthoms with firefox earlier)
> which makes X drain 13% CPU
The source of the pixmap/texture conversion is irrelevant. (in the case of a browser i suspect there's some flash or animated gif to trigger constant updates, yesno? very fast terminal updates will cause the same problem)

could you run another sysprof (now that you're experienced ;-) for kwin without direct rendering and using SHM (supposing you used TFP before) and check the calls (notably the dri libs should no more be in use and maybe the ioctls are replaced by memory mappings - you had no significant ioctl with compiz at all, right? - so sth. else must cause the CPU load)

> I oberserved the following rule: The more windows I have open, the bigger the
> CPU penalty is for using kwin effects.
of course the costs scale with the amount of mapped windows. overscale in kwin could be induced by
a) some plugin (notably persistent as e.g. translucency or the shadows - esp. translucency will prevent clipping and large shadows extend the clip region)
b) the indirect deco rendering*

to rule out the latter you'd have to unborder all windows from the alt+f3 menu, "advanced" submenu.
as this is about (fast) pixmap reallocation, you should watch the output of xrestop for peaks during e.g. video playback.

> One more overservation: If I restart kwin in a running KDE session, its CPU
> usage is higher then before. It is not cumulating, but the kwin instance
> started with KDE is always using less CPU.
notice that compiz usually starts a decoration subprocess (emerald, kde4-window-decorator) that is _not_ ended when replacing compiz (and could continue causing load on X11)

> And I tested for the "quake"-optimization. I cannot see a difference in X11-CPU
was just a shot in the dark - as mentioned: that's very bad style and hopefully not used in an OSS driver =D

one final thing about your testsuite: (just had another look at the OP)
do you test pure video playback or videoplayback during some (a particular) animation?

*some "kludge" to provide compositing as well as non compositing while at least emerald just paints the decos into the gl context, we shall get rid of it ;-)
Comment 23 Björn Ruberg 2010-03-13 12:54:22 UTC
Created attachment 41586 [details]
sysprof profile with kwin running without dri and with shared memory
Comment 24 Björn Ruberg 2010-03-13 12:54:43 UTC
I switched several times between tfp, shm and the alternative; it never had any effect. I have now run sysprof without DRI and with SHM, and simply attach its output here.

I tried to turn off ALL plugins - but that did not lower the cpu usage.

I tried to undecorate all windows as you suggested - but that did not lower CPU usage.

My video playback is TV from a DVB-T stick.
Comment 25 Thomas Lübking 2010-03-14 15:12:13 UTC
X spends all time in libdrm and apparently for buffer allocation, notably flushing....???!

I assume that compiz (has a workaround and) just reuses textures for equal size updates (like the entire video window ;-) to avoid this, could you post a sysprof for compiz as well?
Comment 26 Björn Ruberg 2010-03-15 00:03:55 UTC
Created attachment 41639 [details]
sysprof profile with compiz running - video runs inside of me-tv
Comment 27 Thomas Lübking 2010-03-15 22:15:33 UTC
hummm, X spends all time in libdrm and apparently for buffer allocation, notably
flushing (tm) with compiz as well - just far less often (as i've to assume to be the reason for the lower cpu usage)

so

a) this seems to be a hog, watch out for libdrm updates (for better performance in general)
b) we need to check whether we can avoid re/allocations
Comment 28 Björn Ruberg 2010-03-16 16:01:18 UTC
What to do now?

I tried to investigate on which hardware and software this problem occurs. Test setup: watching TV in kaffeine in a KDE 4.4 session.

This bug is about my Latitude d630 with i965 gpu running Fedora 12. I have tried a Netbook with i950 chipset running Fedora 12 too. 
The problem exists there too, but differently. X is using 7% more CPU power when running kwin compared with compiz. This additional load comes from libglx.so

On an older FSC laptop with the old i915 chipset I did not have this problem. compiz and kwin load on the cpu was the same. Actually compiz caused 10% more usage on X11 than kwin - but kwin itself used 10% cpu. All together both were the same. The kwin effects were also smoother on that machine than on the newer chips. But this one was running Ubuntu 9.10 with KDE 4.4

Fedora 12 and Ubuntu 9.10 both use intel driver 2.9. But Fedora has kernel 2.6.32 and xorg-server 1.7.5, while Ubuntu uses kernel 2.6.31 and xorg-server 1.6.5

Will try to get a comparison of Ubuntu 9.10 and Fedora 12 on the same hardware, if it is useful.
Comment 29 Alexander 2010-05-21 20:46:26 UTC
Hi, I have the same problem. VLC loads the CPU at 5-7% but kwin and X at 15-20%. After disabling effects VLC loads the CPU on the same 5% and X at 1-2%.

Arch Linux (x86_64) with latest updates on AMD and nvidia 9500GT with proprietary driver. Maybe it could be related to https://bugs.kde.org/show_bug.cgi?id=234463 ?
Comment 30 Björn Ruberg 2010-05-28 22:22:22 UTC
I noticed a significant improvement by:
- deactivating vertical synching
- updating to Fedora 13 and intel-driver 2.11 

Effects are much faster now as well. I close this. Thanks for your attention!
Comment 31 Alexander 2010-05-29 07:41:12 UTC
Hey, and what about me? I have a different system, but problem still there. Vertical syncing already deactivated, but Kwin and X have stable CPU usage at 15-20%, when watching a video in VLC or Mplayer.
Comment 32 Björn Ruberg 2010-05-29 07:57:43 UTC
To give the devs a chance to keep an overview, you should open a new bug providing all the information I was asked for here.
Comment 33 Martin Flöser 2018-03-25 19:00:39 UTC
There's no reason to close such old bugs. In general we don't close bugs.
Comment 34 Gregor Mi 2018-03-25 19:08:26 UTC
(In reply to Martin Flöser from comment #33)
> There's no reason to close such old bugs. In general we don't close bugs.

Oh. I didn't know. Thanks for the hint. So RESOLVED and CLOSED mean the same in the KDE bugtracker?