Bug 258529

Summary: kde plasma framework completely crashed
Product: [Plasma] kwin Reporter: davidblunkett <dav1dblunk3tt>
Component: compositing Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: agateau
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description davidblunkett 2010-12-02 00:38:42 UTC
Version:           unspecified (using KDE 4.4.4) 
OS:                Linux

I'm not sure exactly what has happened - the panel has gone, the wallpaper has gone, X is still working, kompz is working when I turn it on (alt-shift-f12) but nothing else is running.  I can start applications from the command line ok but the desktop appears absent otherwise.  Strangely whatever happened took out the printing as well...

It seems to stem from the compositing having a bad day: more than 15 windows, and one that was scrolling with a lot of text, seemed to consume most of the CPU.  I turned off compositing, which made things way faster, and a minute or so later the panel blinked on and off a few times, restarted with a different colour scheme, the wallpaper blinked on and off and then both disappeared.

Reproducible: Sometimes

Steps to Reproduce:
Hard to reproduce, but once you have a few windows, especially busy konsole ones and a bit of opengl here and there, the compositing tends to fall over after a few hours and things like this often follow, but there is no hard and fast rule.  KDE 4 is very flaky

Actual Results:  
KDE gets more and more flaky over time and the compositing gets slower and slower until it is a significant drain on resources, then things start crashing

Expected Results:  
kde should be fast and rock solid regardless of the number of windows, how much text is scrolling and how long the computer has been up.

Text scrolling in konsole should never slow the execution of a program in konsole but when compositing is on this is a major factor on speed
Comment 1 Thomas Lübking 2010-12-02 12:45:00 UTC
first of all: please do not collect bugs.
---------------------------------------
you observed that 
a) compositing slows down over the time
b) compositing has a general performance impact on screen updates
c) plasma-desktop crashed.

whatever you might think and w/o having seen a backtrace i'm pretty sure those are unrelated issues. putting them into one report makes it hard to track/deal with them.

on the crash:
--------------
a) we need a backtrace to say anything specific *
b) you toggled compositing what causes a theme update in plasma-desktop, which then terminated (probably rather segfaulted)
c) version is 4.4.4 -> w/o seeing a trace i'd bet that it's the pixmapcache bug
(#182026 or #230170 - there seems a remaining issue)

*http://techbase.kde.org/Development/Tutorials/Debugging/How_to_create_useful_crash_reports

regarding compositing performance:
---------------------------------
if you use opengl compositing every screen update has to be converted from the native X11 format into an opengl texture. this is an expensive routine and depending on your CPU/GPU can have major speed impact.
you may try the XRender backend where such transformations are not required, but w/o detailed information on your GPU and kwin settings it's not possible to provide further hints.

regarding compositing getting slower over the time:
-------------------------------------------------
this sounds like a memory leak, not necessarily in kwin but maybe the driver/Xorg
once again detailed information on your GPU/driver combination is required and you'll have to use "xrestop" as well as "top" to determine whether, and if so where, there is a leak.
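
A minimal sketch of the "top" half of that monitoring, assuming a Linux /proc filesystem: it samples a process's resident memory (the VmRSS line in /proc/<pid>/status) and flags steady growth. It only sees system RAM, so pixmap usage inside the X server still needs "xrestop". The function names, the sampling interval and the growth threshold are illustrative assumptions, not part of any KDE tool.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


def looks_like_leak(samples_kb, min_growth_kb=50_000):
    """Heuristic (assumed threshold): never shrinking and grown by min_growth_kb."""
    if len(samples_kb) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_shrinks and samples_kb[-1] - samples_kb[0] >= min_growth_kb


def watch(pid, interval_s=60, count=30):
    """Sample a process's RSS every interval_s seconds and report the trend."""
    samples = []
    for _ in range(count):
        rss = read_rss_kb(pid)
        if rss is not None:  # kernel threads have no VmRSS line
            samples.append(rss)
        time.sleep(interval_s)
    return samples, looks_like_leak(samples)
```

Calling watch() with the process id of kwin (or of the X server) for half an hour gives a first hint whether system memory is growing at all; a flat result does not rule out a leak on the GPU side.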
Comment 2 davidblunkett 2010-12-02 19:40:38 UTC
I'm on a dual core Intel(R) Core(TM)2 CPU 6600 with 8Gb RAM and a GeForce 8800 GTS graphics card.  It is pretty fast and shouldn't be showing a slowdown. It certainly has acceptable speed initially but grinds after a few hours.  No sign of a memory leak of any significance, although disk usage in root seems to grow, but I'm not sure where at the moment. 

I doubt it is the graphics driver at fault - the same nvidia driver under kde 3.5 stays fast for long periods of time (ie months of uptime normal cut short by power cuts, kde4 doesn't seem stable over 24hrs by comparison).

As far as further back tracing - well it is too much of a pain in the arse already and everything is too inconsistent and occurs after too long a period to diagnose systematically.  

As regards "don't collect bugs", do you mean you'd just prefer not to hear about bugs except for the easy ones? I don't think quality is served by not reporting.
Comment 3 Thomas Lübking 2010-12-02 20:12:56 UTC
(In reply to comment #2)
> I'm on a dual core Intel(R) Core(TM)2 CPU 6600 with 8Gb RAM and a GeForce 8800
> GTS graphics card.  It is pretty fast and shouldn't be showing a slowdown.

a) GeForce is a trademark as well... :-P
b) 

> It certainly has acceptable speed initially but grinds after a few hours.  No
> sign of a memory leak of any significance although disk usage in root seems to
> grow but I'm not sure where at the moment. 
there're two different kinds of RAM. if your VRAM starts mapping to disk you're lost anyway.
try to call "nvidia-settings -a PixmapCache=0; nvidia-settings -a PixmapCache=1"
this flushes the cache nvidia uses for ARGB pixmaps.

> I doubt it is the graphics driver at fault - the same nvidia driver under kde
> 3.5 stays fast for long periods of time (ie months of uptime normal cut short
> by power cuts, kde4 doesn't seem stable over 24hrs by comparison).
FYI - kde4 makes completely different calls to the GPU driver than kde3 / kompmgr (was xrender only) or even compiz - that's no argument.

> As far as further back tracing - well it is too much of a pain in the arse
> already and everything is too inconsistent and occurs after too long a period
> to diagnose systematically.  
backtracing a crash requires you to click "details" in the dialog that shows up afterwards. if that's too much i'm sorry.

detecting the source of performance issues is more a kind of bisecting. for a start, check whether the above-mentioned flush helps (be warned that the cache refills over time, so you'll have to redo the call). the next step would be to identify the effect plugin that causes it.
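
since the cache refills, the flush can be repeated on a timer. a rough sketch, assuming nvidia-settings is on the PATH and that the PixmapCache attribute behaves as described above; the function names, the interval and the number of rounds are made up for illustration.

```python
import subprocess
import time

# The two calls quoted earlier in this comment: disable, then re-enable the cache.
FLUSH_COMMANDS = [
    ["nvidia-settings", "-a", "PixmapCache=0"],
    ["nvidia-settings", "-a", "PixmapCache=1"],
]


def flush_pixmap_cache(run=subprocess.run):
    """Run both commands in order and return their exit codes."""
    return [run(cmd, check=False).returncode for cmd in FLUSH_COMMANDS]


def flush_periodically(interval_s=300, rounds=12, run=subprocess.run, sleep=time.sleep):
    """Flush the cache `rounds` times, waiting interval_s seconds after each flush."""
    for _ in range(rounds):
        flush_pixmap_cache(run)
        sleep(interval_s)
```

if performance recovers right after each flush and degrades again in between, that points at the driver-side cache rather than at a particular effect.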

> As regards "don't collect bugs", do you mean you'd just prefer not to hear
> about bugs except for the easy ones?

"don't collect bugs" means: please do not combine different issues in one report, it makes tracking unnecessarily hard and i thought the follow-up lines would have explained that.

> I don't think quality is served by not reporting.
actually i know nobody who'd even remotely think such nonsense.
welcome to the club then.
Comment 4 davidblunkett 2010-12-03 09:17:03 UTC
"backtracing a crash requires you to click "details" in the dialog that shows up
afterwards. if that's too much i'm sorry."

There was no dialog - nothing that resembled any reporting or crash handling occurred.  The panel blinked on and off and then disappeared, then the wallpaper did the same. If there had been any dialog then I'd have collected and reported the data, but there wasn't.

As far as I know this is a single issue - it occurred at a single time and the compositing observation prior to the crash seems related - I have had similar problems before but it has never crashed like this without the compositing grinding to a halt first.

I'm not sure what you are on about ram / vram - but there is 8Gb RAM installed and only a small swap. In this case the point is there was plenty of RAM and no swapping going on so memory starvation is unlikely the cause even if there was a memory leak.

BTW I don't see a problem using trademarks here - it is after all what the GPU reports.
Comment 5 Thomas Lübking 2010-12-03 14:34:22 UTC
(In reply to comment #4)
> I'm not sure what you are on about ram / vram
VideoRAM - stuff on your GPU. As soon as data gets mapped from there to the system RAM (the 8GB) you'll notice a significant performance loss.
To monitor the current memory consumption of X11 you'll therefore have to use "xrestop" since "top" doesn't reflect the non system RAM usage. The OpenGL textures reside in the same physical RAM but are logically separated away.
As soon as the VRAM is however full, the GPU should start mapping memory to sysram (so you'll sooner or later notice such leaks in "top" as well)

> In this case the point is there was plenty of RAM and no
> swapping going on so memory starvation is unlikely the cause even if there was
> a memory leak.
Actually the dialog-less crash points in a different direction. The kernel might have shot the process to free memory.
Notice that pixmap data is pretty large and for a high update rate, a single leak can grow incredibly fast.
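
To put rough numbers on that, here is a back-of-the-envelope sketch. The 1680x1050 pixmap size and the 30 updates per second are assumed figures for illustration, not measurements from this system; the only fixed input is that an ARGB pixmap takes 4 bytes per pixel.

```python
BYTES_PER_PIXEL = 4  # ARGB: one byte each for alpha, red, green, blue


def pixmap_bytes(width, height):
    """Size of one uncompressed ARGB pixmap."""
    return width * height * BYTES_PER_PIXEL


def leaked_bytes(width, height, updates_per_second, seconds):
    """Total size if one such pixmap is leaked on every update."""
    return pixmap_bytes(width, height) * updates_per_second * seconds


one = pixmap_bytes(1680, 1050)                 # assumed full-screen pixmap
per_minute = leaked_bytes(1680, 1050, 30, 60)  # assumed 30 updates/s for 60 s
print(f"one pixmap: {one / 2**20:.1f} MiB")
print(f"leaked per minute: {per_minute / 2**30:.1f} GiB")
```

Under those assumptions a single leaking code path would be measured in gigabytes per minute, which is why even a rarely hit leak can exhaust memory within hours.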

In case you believe the leak is in kwin compositing (directly or through some driver/lib) you should start out by disabling all effects (but keep compositing enabled) and check whether the issue remains.
- If yes, try to disable trilinear filtering. (advanced tab)
- If not, you'll have to determine the troublemaker.
You can use the "show paint" plugin to figure whether you've constant screen updates in some region (though the content doesn't update)
This could hint at the cause of the general slowdown.
Good candidates (aside from TLF) for leaks are (persistent) shader effects - since 4.4 had no blur this would esp. be "sharpen" (which, in case you use it, you shouldn't anyway, since the nvidia driver can do this far better internally)

> BTW I don't see a problem using trademarks here - it is after all what the GPU
> reports.
I was just making fun of the "Intel(R) Core(TM)2" and the fact that you didn't grant nvidia these marks, and that nvidia and intel don't like each other too much, sorry ;-)
Comment 6 davidblunkett 2010-12-18 19:07:55 UTC
This problem appears to be due to /tmp overfilling - it has been hard to diagnose because a reboot or emergency filesystem cleanup has been required.  However, in the last incident it was observed that /tmp filled because of a very large /tmp/qt_temp.XX1234 file.  

I think, but am not certain, that this might be the work of gwenview - mostly these files are temporary, but in this case /tmp overfilled due to one file >8Gb in size (this is a mystery because there is nothing I was doing with gwenview that involved any file remotely this size).
Comment 7 Thomas Lübking 2010-12-18 22:04:48 UTC
(In reply to comment #6)
> /tmp/qt_temp.XX1234 file.

*sigh* - those files are usually created by kio, could be anything then...
wild shot given the size: do you run nepomuk/strigi?

-> you could add some sort of cronjob/minidaemon to check all /tmp/qt* file sizes and yell a warning if one gets quite huge, invoking lsof | grep to figure out what has access to it (best dump that info into a log, in case you lose your system the very next moment)
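
a minimal sketch of such a check, meant to be run from cron or a loop. it assumes lsof is installed; the 100 MB limit, the function names and the log location are arbitrary choices for illustration.

```python
import glob
import os
import subprocess


def find_large_files(pattern="/tmp/qt*", limit_bytes=100 * 1024 * 1024):
    """Return sorted (path, size) pairs for matching files larger than limit_bytes."""
    hits = []
    for path in glob.glob(pattern):
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # file vanished between glob and stat
        if size > limit_bytes:
            hits.append((path, size))
    return sorted(hits)


def who_has_open(path):
    """Ask lsof which processes hold the file open; empty string if lsof is missing."""
    try:
        result = subprocess.run(
            ["lsof", path], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout


def check_once(log_path, pattern="/tmp/qt*", limit_bytes=100 * 1024 * 1024):
    """Append one log entry per oversized file; return the number of hits."""
    hits = find_large_files(pattern, limit_bytes)
    with open(log_path, "a") as log:
        for path, size in hits:
            log.write(f"{path} is {size} bytes\n{who_has_open(path)}\n")
    return len(hits)
```

the log should live outside /tmp (for example in the home directory), otherwise it can no longer be written once /tmp is full.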
Comment 8 davidblunkett 2010-12-19 00:27:48 UTC
Nepomuk / strigi - I removed this a long time ago.

I've been monitoring /tmp all day and done everything I can remember on the crash days and apart from gwenview (which creates and removes these files while converting formats for display) I've seen nothing yet.

I'll keep looking and let you know if I catch something.
Comment 9 davidblunkett 2010-12-20 01:55:09 UTC
Well no crash so far... I'm catching all qt_temp > 1Mb that exist for >1s and the only processes that make big files seem to be gwenview and gs (which I guess gwenview calls to render postscript).  Might it be possible that something going wrong here could lead to a runaway temp file?
Comment 10 Thomas Lübking 2010-12-20 14:28:53 UTC
adding gwenview maintainer:

@Aurelien:
please check comments #6 - #9
anything familiar? any idea?
Comment 11 Aurelien Gateau 2010-12-28 00:12:13 UTC
(In reply to comment #10)
> adding gwenview maintainer:
> 
> @Aurelien:
> please check comments #6 - #9
> anything familiar? any idea?

Gwenview relies on Qt image format plugins to decode images (or in this case .ps files). The bug probably resides in kimg_eps.
Comment 12 davidblunkett 2011-01-24 15:04:22 UTC
As a general comment: in addition to the crash caused by /tmp filling, I originally experienced a slowing down and grinding to a halt of the desktop, especially with compositing turned on.  While the crash will cause the computer generally to grind to a halt, the general slowing does seem to be an additional problem as I get this without /tmp filling. 

I've tried the suggested "nvidia-settings -a PixmapCache=0; nvidia-settings -a PixmapCache=1" which has a beneficial effect <sometimes> but even when used every few minutes I can still experience very poor compositing performance and compositing turning itself off.
Comment 13 davidblunkett 2011-08-23 08:00:27 UTC
This no longer occurs for me in kde 4.6