Bug 261323

Summary: EffectFrame can freeze desktop on NVIDIA
Product: [Plasma] kwin
Reporter: Peacey <peaceyall>
Component: scene-opengl
Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED FIXED
Severity: crash
CC: adaptee, anish.7, barafu, bugz57, burn.till.skid, conardcox, courteville, dmitchgm, e.lex, erik.dobak, gbin, gpiez, jorge.adriano, KaiUweBroulik2, nils, perrantrevan, rm, steven.v.bael, taril_laszlo, thilo
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Archlinux   
OS: Linux   
Latest Commit:
Version Fixed In: 4.6.3
Attachments: Backtrace for comment #33, option 2
nvidia-settings -q all
top -bid 1 > ~/cpu_use.log
top -bid 1 > ~/cpu_use.log
Flush buffer before delete of pixmaps

Description Peacey 2010-12-26 22:28:39 UTC
Version:           unspecified (using KDE 4.5.90) 
OS:                Linux

After activating Flip Switch and choosing a window, the desktop becomes unresponsive for a while. After some time, the animation starts ending very slowly. Even after the animation ends, the desktop remains so slow that it is unusable.

This doesn't happen when I am cycling through windows in Flip Switch, only after you select a window. Also, this happens with Flip Switch only. It doesn't happen with any other effect.

The only way to fix this is to switch to vt1 with ctrl+alt+f1 and then restart KDE.

Reproducible: Always

Steps to Reproduce:
Initiate Flip Switch. Choose a window.

Actual Results:  
The effect starts to end and the desktop becomes unresponsive for a while. Then the desktop becomes extremely slow.

Expected Results:  
The effect ends smoothly and the desktop is responsive.

Using NVIDIA 260.19.29
Comment 1 bill p. (aka google01103) 2011-01-06 21:50:56 UTC
ditto (?) openSuse 11.3 64bit Using NVIDIA 260.19.29

my entire system froze as soon as I hit ctrl+tab and I had to use the power button to reboot
Comment 2 Martin Flöser 2011-01-13 21:01:18 UTC
I am able to reproduce with fglrx (which clears the NVIDIA driver of blame). It only happens for me if the system is currently on "Show desktop". So
1. Use Flip Switch to Show Desktop (works fine)
2. Use Flip Switch again and select the desktop (breaks)

This might be related to the fact that all windows are unminimized at that moment which causes the hang.
Comment 3 Martin Flöser 2011-01-13 21:02:42 UTC
I can also reproduce with box switch, which validates the "caused by unminimizing" hypothesis
Comment 4 Martin Flöser 2011-01-18 22:30:51 UTC
*** Bug 261069 has been marked as a duplicate of this bug. ***
Comment 5 Jekyll Wu 2011-01-27 09:31:53 UTC
I can reproduce this with flip switch in 4.6, which makes mouse, keyboard and screen totally unresponsive.

This happens only after you finally select one window.
Comment 6 Guillaume BINET 2011-01-27 18:14:09 UTC
I just reproduced it 100% of the time on 3 different machines on 4.6 on gentoo / 3 different nvidia cards: 
- activate flip switch as effect for window switching
- maintain alt-tab for several seconds
- release alt-tab

-> locks and flickering will appear
Comment 7 Thomas Lübking 2011-01-27 22:46:43 UTC
not here :-(
can you try to disable all other effects and only keep flipping active?
what about a not animated decoration (try eg. kde2 or laptop)
Comment 8 Guillaume BINET 2011-01-28 17:06:36 UTC
(In reply to comment #7)
> not here :-(
> can you try to disable all other effects and only keep flipping active?
> what about a not animated decoration (try eg. kde2 or laptop)

Just tried it with laptop theme and only this effect activated.
It still locks.
Just before the windows get back to their final place (you can see the windows frozen with a little perspective on them)
Comment 9 Thomas Lübking 2011-01-30 00:54:37 UTC
*** Bug 264704 has been marked as a duplicate of this bug. ***
Comment 10 Martin Flöser 2011-01-30 12:32:11 UTC
*** Bug 264836 has been marked as a duplicate of this bug. ***
Comment 11 Thomas Lübking 2011-02-01 21:54:12 UTC
*** Bug 265116 has been marked as a duplicate of this bug. ***
Comment 12 Anish Bhatt 2011-02-02 00:19:07 UTC
If it helps, the nvidia module drops this in dmesg every time it happens 
Jan 28 00:05:43 cerveza kernel: NVRM: Xid (0001:00): 13, 0003 00000000 00008597 00001b0c 1000f010 00000040
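To check whether another machine logs the same driver error, the kernel log can be scanned for lines shaped like the one above. This is only a minimal Python sketch: the function name `find_xid_events` and the regular expression are illustrative assumptions that cover the line layout quoted in this report, and other driver versions may format the message differently.

```python
import re

# Matches lines such as:
#   NVRM: Xid (0001:00): 13, 0003 00000000 ...
# The layout is taken from the log line quoted in this report.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<bus>[0-9a-fA-F:.]+)\): (?P<code>\d+),\s*(?P<data>.*)"
)


def find_xid_events(log_text):
    """Return a list of (bus_id, xid_code, raw_data) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("bus"), int(match.group("code")), match.group("data").strip())
            )
    return events
```

Feeding it the captured output of `dmesg` (or a syslog file) gives one tuple per Xid line, which makes it easier to compare the codes reported by different people in this thread.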
Comment 13 Anish Bhatt 2011-02-02 01:29:59 UTC
May not be related, but a little googling shows that in the past users have gotten rid of the Xid messages by changing the gtk theme used. (Not using qtcurve for example)
Comment 14 Anish Bhatt 2011-02-22 01:17:13 UTC
Still persists with nvidia 260.19.36, seen with the nvidia-beta (270.26) driver as well.
Comment 15 Martin Flöser 2011-02-27 12:27:17 UTC
I am able to reproduce this issue with my NVIDIA system and just attached gdb 
to the hung process:
#0  0x00007f2af2e20650 in ?? () from /usr/lib/libGLcore.so.1
#1  0x00007f2af2e2079b in ?? () from /usr/lib/libGLcore.so.1
#2  0x00007f2af2b09b2f in ?? () from /usr/lib/libGLcore.so.1
#3  0x00007f2af26fce06 in ?? () from /usr/lib/libGLcore.so.1
#4  0x00007f2b0066d467 in KWin::SceneOpenGL::flushBuffer (this=0x1eddee0, mask=0, damage=...) at /opt/kde/src/KDE/git/kde-workspace/kwin/scene_opengl_glx.cpp:627
#5  0x00007f2b0066cf68 in KWin::SceneOpenGL::paint (this=0x1eddee0, damage=..., toplevels=...) at /opt/kde/src/KDE/git/kde-workspace/kwin/scene_opengl_glx.cpp:552
#6  0x00007f2b00654b16 in KWin::Workspace::performCompositing (this=0x1ef8cd0) at /opt/kde/src/KDE/git/kde-workspace/kwin/composite.cpp:419
#7  0x00007f2b00654628 in KWin::Workspace::timerEvent (this=0x1ef8cd0, te=0x7fff7bc44900) at /opt/kde/src/KDE/git/kde-workspace/kwin/composite.cpp:367
#8  0x00007f2afdd4ad69 in QObject::event (this=0x1ef8cd0, e=0x1a) at kernel/qobject.cpp:1183
#9  0x00007f2afce82a8c in QApplicationPrivate::notify_helper (this=0x1e01040, receiver=0x1ef8cd0, e=0x7fff7bc44900) at kernel/qapplication.cpp:4396
#10 0x00007f2afce8862d in QApplication::notify (this=0x7fff7bc44cf0, receiver=0x1ef8cd0, e=0x7fff7bc44900) at kernel/qapplication.cpp:4277
#11 0x00007f2afeaf7479 in KApplication::notify (this=0x7fff7bc44cf0, receiver=0x1ef8cd0, event=0x7fff7bc44900) at /opt/kde/src/KDE/git/kdelibs/kdeui/kernel/kapplication.cpp:311
#12 0x00007f2b005db924 in KWin::Application::notify (this=0x7fff7bc44cf0, o=0x1ef8cd0, e=0x7fff7bc44900) at /opt/kde/src/KDE/git/kde-workspace/kwin/main.cpp:364
#13 0x00007f2afdd39a0c in QCoreApplication::notifyInternal (this=0x7fff7bc44cf0, receiver=0x1ef8cd0, event=0x7fff7bc44900) at kernel/qcoreapplication.cpp:732
#14 0x00007f2afdd68cf2 in sendEvent (this=0x1e01a50) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:215
#15 QTimerInfoList::activateTimers (this=0x1e01a50) at kernel/qeventdispatcher_unix.cpp:602
#16 0x00007f2afdd68e2c in QEventDispatcherUNIX::processEvents (this=0x1ddd7f0, flags=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
) at kernel/qeventdispatcher_unix.cpp:923
#17 0x00007f2afcf35a2d in QEventDispatcherX11::processEvents (this=<value optimized out>, flags=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
) at kernel/qeventdispatcher_x11.cpp:152
#18 0x00007f2afdd38732 in QEventLoop::processEvents (this=<value optimized out>, flags=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
) at kernel/qeventloop.cpp:149
#19 0x00007f2afdd38b1c in QEventLoop::exec (this=0x7fff7bc44c20, flags=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.
) at kernel/qeventloop.cpp:201
#20 0x00007f2afdd3cbbb in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1009
#21 0x00007f2b005dca10 in kdemain (argc=2, argv=0x7fff7bc45328) at /opt/kde/src/KDE/git/kde-workspace/kwin/main.cpp:522
#22 0x0000000000400956 in main (argc=2, argv=0x7fff7bc45328) at /opt/kde/build/KDE/git/kde-workspace/kwin/kwin_dummy.cpp:3

It seems to be stuck in an endless loop caused by flushBuffer.
Comment 16 Thomas Lübking 2011-02-27 14:24:20 UTC
tried to replace glXWaitGL with glFinish or at least prepend a glFlush?
Comment 17 Martin Flöser 2011-02-27 16:01:48 UTC
On Sunday 27 February 2011 14:24:21 Thomas Lübking wrote:
> tried to replace glXWaitGL with glFinish or at least prepend a
> glFlush?
just tried both without success
Comment 18 Dan Mitchell 2011-03-05 07:38:31 UTC
I have the same problem.  I'm running gentoo linux with KDE 4.6.  I get the same errors as Anish, and experience a "hard freeze" that requires me to turn off my PSU to reboot.  I can reproduce it in about 30 seconds from boot, and is definitely the cube causing the problem.  Nvidia-drivers default portage emerge.  

Sometimes ctrl-alt-f1 can be done before a total system hang.
Comment 19 Dan Mitchell 2011-03-05 07:52:21 UTC
Sorry, my post was meant for bug 266182.
Comment 20 perrantrevan 2011-03-05 14:37:19 UTC
I have a similar problem but it is triggered by the desktop cube effect. The effect launches ok and I can rotate the cube, however, when exiting the effect the slowness and lock-up happens.

This is reproducible with nvidia 260 and 270 drivers.
Comment 21 Thomas Lübking 2011-03-05 17:23:17 UTC
@perrantrevan:
see bug #266182

@Dan:
i'm  increasingly confident that these are dupes...
Comment 22 Martin Flöser 2011-03-06 09:16:41 UTC
If we don't get it fixed till 4.6.2 I'm going to disable Flip Switch and Cube 
for NVIDIA users.
Comment 23 Anish Bhatt 2011-03-09 06:36:07 UTC
That seems like a wrong approach, especially considering this problem didn't exist in 4.5.
Couldn't the code simply be reverted to the working copy from 4.5 ? For me at least the problem started on the 4.5->4.6 upgrade, not an nvidia upgrade
Comment 24 Martin Flöser 2011-03-09 07:08:02 UTC
If it were that simple... FlipSwitch has hardly seen any changes in 4.6. Most changes are in the underlying stack and with the changes after 4.6 there are many more than 100 commits to pick from. The reason for the misbehaviour must be found - if we cannot find it we must make compromises.
Comment 25 Thilo-Alexander Ginkel 2011-03-10 23:55:16 UTC
On my system (KDE 4.6.1 using the nvidia driver version 260.19.06), disabling direct rendering in the system settings works around the issue.

I am not sure whether that is related, but even with direct rendering disabled I can still trigger video corruption and 
  NVRM: Xid (0001:00): 13, 0004 00000000 00008297 00001514 00000000 00000040
log entries when selecting "System Settings" -> "Window Behavior" -> "Task Switcher" -> "Apply" (with the effect set to "Flip Switch").
Comment 26 Thomas Lübking 2011-03-11 00:28:17 UTC
Did you use trilinear filtering?
Do you use "accurate" scaling now?
Comment 27 Thilo-Alexander Ginkel 2011-03-11 07:33:42 UTC
> --- Comment #26 from Thomas Lübking <thomas luebking gmail com>  2011-03-11 00:28:17 ---
> Did you use trilinear filtering?

There is no "Trilinear Filtering" option in 4.6. If you let me know in
which config file this option manifests itself, I can double-check
using a backup created under 4.5.x which option I was using back then
(which did not freeze anything).

> Do you use "accurate" scaling now?

Yes.
Comment 28 Thomas Lübking 2011-03-11 13:44:24 UTC
[Compositing]
GLTextureFilter

It's the same key now and then, just the backing function changed (and thus the name)
If you did not change settings you likely used TLF then (but if you've a config backup to look up, that's oc much better ;-)

Another related thing is v'syncing - do you use it? What if you disable it (indirect rendering does not support v'syncing, so it's silently disabled)
Comment 29 Gunther Piez 2011-03-11 14:01:33 UTC
For me it seems only to happen in the final zoom after the effect ends. I can use the "desktop cube rotation" as effect for desktop switching without any problems, but as soon as I use the explicit cube or flip switch, the desktop hangs in the final phase of the animation, the "zoom back to original size".
Comment 30 Thomas Lübking 2011-03-11 14:13:50 UTC
a) we're (at least i am) not 100% sure that those are actually dupes - just a feeling
b) more interesting than the personal way to reproduce (it does not happen here, i've tried _really_ hard) would be the source of this.

->
a) try w/o v'sync
b) try with eg. an enabled "sharpen" effect (don't use it in general - the nvidia driver can do this better, just for testing)
c) try with "crisp" or "smooth" scaling (instead of accurate)
Comment 31 Thilo-Alexander Ginkel 2011-03-11 20:32:09 UTC
My config from end of January (KDE 4.5.x) looks like this:

[Compositing]
AnimationSpeed=3
Backend=OpenGL
CheckIsSafe=true
DisableChecks=false
Enabled=true
GLDirect=true
GLMode=SHM
GLTextureFilter=2
GLVSync=true
HiddenPreviews=5
OpenGLIsUnsafe=false
XRenderSmoothScale=false

Currently, this config is in place (KDE 4.6.1):

[Compositing]
AnimationSpeed=3
Backend=OpenGL
CheckIsSafe=true
DisableChecks=false
Enabled=true
GLDirect=false
GLMode=SHM
GLTextureFilter=2
GLVSync=true
HiddenPreviews=5
OpenGLIsUnsafe=false
UnredirectFullscreen=true
XRenderSmoothScale=false

I'll try to figure out which setting triggers this issue and provide an update once I have any new findings.
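Comparing the two [Compositing] blocks by eye is error-prone, so a small diff helps to see what actually changed between the 4.5.x and 4.6.1 setups. The following is a rough Python sketch using the standard `configparser` module; the helper names `load_section` and `diff_settings` are made up for illustration, and the two strings simply repeat the settings posted above.

```python
import configparser


def load_section(text, section="Compositing"):
    """Parse INI-style text and return one section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (GLDirect, not gldirect)
    parser.read_string(text)
    return dict(parser[section])


def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for differing keys; None marks a missing key."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


KDE_45 = """[Compositing]
AnimationSpeed=3
Backend=OpenGL
CheckIsSafe=true
DisableChecks=false
Enabled=true
GLDirect=true
GLMode=SHM
GLTextureFilter=2
GLVSync=true
HiddenPreviews=5
OpenGLIsUnsafe=false
XRenderSmoothScale=false
"""

KDE_461 = """[Compositing]
AnimationSpeed=3
Backend=OpenGL
CheckIsSafe=true
DisableChecks=false
Enabled=true
GLDirect=false
GLMode=SHM
GLTextureFilter=2
GLVSync=true
HiddenPreviews=5
OpenGLIsUnsafe=false
UnredirectFullscreen=true
XRenderSmoothScale=false
"""

if __name__ == "__main__":
    for key, (old, new) in diff_settings(load_section(KDE_45), load_section(KDE_461)).items():
        print(f"{key}: {old} -> {new}")
```

For these two blocks only `GLDirect` (the manual workaround) and the new `UnredirectFullscreen` key differ, so the remaining settings can be ruled out as the difference between the working and the freezing setup.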
Comment 32 Thomas Lübking 2011-03-11 20:53:29 UTC
re-enable dri, then try v'sync first - it's auto deactivated by indirect rendering - then choose another scaling method. (or do both at once and if it helps, reactivate one OR the other for testing)
Comment 33 Thilo-Alexander Ginkel 2011-03-11 21:19:54 UTC
(In reply to comment #32)
> re-enable dri, then try v'sync first - it's auto deactivated by indirect
> rendering - then choose another scaling method. (or do both at once and if it
> helps, reactivate one OR the other for testing)

Done, which resulted in the following config:


[Compositing]
AnimationSpeed=3
Backend=OpenGL
CheckIsSafe=true
DisableChecks=false
Enabled=true
GLDirect=true
GLMode=SHM
GLTextureFilter=2
GLVSync=false
HiddenPreviews=5
OpenGLIsUnsafe=false
UnredirectFullscreen=true
XRenderSmoothScale=false

Unfortunately, this is still freezing.

Overall, the only configuration, which did not result in freezes involved
1) disabling DRI
2) using the following config:

[Compositing]
AnimationSpeed=3
Backend=OpenGL
CheckIsSafe=true
DisableChecks=false
Enabled=false
GLDirect=true
GLMode=Fallback
GLTextureFilter=1
GLVSync=false
HiddenPreviews=5
OpenGLIsUnsafe=false
UnredirectFullscreen=false
XRenderSmoothScale=false

The latter resulted in plenty of KWin crashes, but no freezes (I will attach the backtrace in a separate reply), which may or may not be related to this issue.
Comment 34 Thilo-Alexander Ginkel 2011-03-11 21:31:36 UTC
Created attachment 57885 [details]
Backtrace for comment #33, option 2
Comment 35 Thomas Lübking 2011-03-11 21:48:04 UTC
"GLTextureFilter=2" <- at least here the "accurate" filtering is still active (but not in the second config, so i'm not sure what's been tested)

"GLMode=SHM" <- no, just use TFP instead.

"GLMode=Fallback" <- crashes are "normal" ;-)

"GLTextureFilter=1" <- tested this with enabled dri?

"HiddenPreviews=5" <- out of curiosity: does setting "always" for "keep window thumbnails" (advanced tab) have any impact?
Comment 36 Thilo-Alexander Ginkel 2011-03-11 22:18:49 UTC
Next test run with:

GLDirect=true
GLMode=TFP
GLVSync=false
GLTextureFilter={0|1}
HiddenPreviews={4|6}

=> Still always freezes.
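The brace notation above stands for every combination of the varied keys. A short Python sketch that expands it, in case someone wants to tick off the runs one by one; `build_matrix` is a hypothetical helper name, and the values are the ones listed in this comment.

```python
from itertools import product

# Settings held constant and settings varied in the test run described above.
FIXED = {"GLDirect": "true", "GLMode": "TFP", "GLVSync": "false"}
VARIED = {"GLTextureFilter": ["0", "1"], "HiddenPreviews": ["4", "6"]}


def build_matrix(fixed, varied):
    """Return one settings dict per combination of the varied values."""
    names = list(varied)
    combos = []
    for values in product(*(varied[name] for name in names)):
        settings = dict(fixed)
        settings.update(zip(names, values))
        combos.append(settings)
    return combos


if __name__ == "__main__":
    for settings in build_matrix(FIXED, VARIED):
        print(" ".join(f"{key}={value}" for key, value in settings.items()))
```

With two values for each of the two keys this gives four configurations, all of which froze according to the result above.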
Comment 37 Thomas Lübking 2011-04-12 18:25:48 UTC
can someone encountering this issue or bug #266182 please dump and attach his nvidia settings?
"nvidia-settings -q all > nvidiasettings.txt"
Comment 38 Thilo-Alexander Ginkel 2011-04-12 18:33:21 UTC
Created attachment 58854 [details]
nvidia-settings -q all
Comment 39 Thomas Lübking 2011-04-12 19:07:07 UTC
Ok, please run "nvidia-settings"
- "OpenGL Settings": disable "Sync to VBlank" (allow flipping should be ok, only try to disable if the rest fails)
- "Antialiasing Settings": Don't override anything (texture sharpening should be ok, treat like flipping)

restart "kwin --replace &" afterwards and check whether the issue remains
Comment 40 Thilo-Alexander Ginkel 2011-04-12 19:23:25 UTC
> --- Comment #39 from Thomas Lübking <thomas luebking gmail com>  2011-04-12 19:07:07 ---
> Ok, please run "nvidia-settings"
> - "OpenGL Settings": disable "Sync to VBlank" (allow flipping should be ok,
> only try to disable if the rest fails)
> - "Antialiasing Settings": Don't override anything (texture sharpening should
> be ok, treat like flipping)
>
> restart "kwin --replace &" afterwards and check whether the issue remains

Wow, problem solved (didn't have to touch the fallback settings mentioned).

Anti-aliasing however seems to be disabled now, so the image quality
leaves some room for improvement. Still, much better than a freeze!
:-)
Comment 41 Massimiliano Torromeo 2011-04-12 19:38:38 UTC
I already had "sync to vblank" disabled and no override in the antialiasing settings, but flip switch still froze.

Once I disabled "Allow flipping" and restarted kwin it worked!
Comment 42 Thilo-Alexander Ginkel 2011-04-12 19:42:20 UTC
One question remains, though (despite not being very KDE-related):

How would I apply these changes by default? The nvidia driver does not seem to provide any option to persistently change these options through an xorg.conf setting [1].

[1] http://us.download.nvidia.com/XFree86/Linux-x86_64/260.19.44/README/xconfigoptions.html
Comment 43 Thilo-Alexander Ginkel 2011-04-12 19:50:13 UTC
(In reply to comment #41)
> I already had "sync to vblank" disabled and no override in the antialiasing
> settings, but flip switch still froze.
> 
> Once I disabled "Allow flipping" and restarted kwin it worked!

Well, "Flip Switch" ceased showing any problems even without touching "Flip Switch", but "Desktop Cube" kept freezing until I disabled the latter.

Currently, everything is working smoothly even with VBlank and Anti-Aliasing enabled, but Flip Switch disabled.
Comment 44 Thilo-Alexander Ginkel 2011-04-12 19:52:52 UTC
(In reply to comment #43)
> (In reply to comment #41)
> > I already had "sync to vblank" disabled and no override in the antialiasing
> > settings, but flip switch still froze.
> > 
> > Once I disabled "Allow flipping" and restarted kwin it worked!
> 
> Well, "Flip Switch" ceased showing any problems even without touching "Flip
> Switch", but "Desktop Cube" kept freezing until I disabled the latter.
> 
> Currently, everything is working smoothly even with VBlank and Anti-Aliasing
> enabled, but Flip Switch disabled.

Ehm... The last sentence was supposed to read: "Currently, everything is working smoothly even with VBlank and Anti-Aliasing enabled, but *Allow Flipping* disabled."

Sorry for the confusion.
Comment 45 Thilo-Alexander Ginkel 2011-04-12 20:28:25 UTC
Bad news: While the changes proposed in comment #39 seem to reduce the error's probability, they do not prevent it from happening under all circumstances (which can also be seen in the dmesg output that will still contain nvidia driver bug entries from time to time, such as: "NVRM: Xid (0001:00): 13, 0003 00000000 00008297 00001b0c 1000f010 00000040").
Comment 46 Thomas Lübking 2011-04-12 21:09:18 UTC
The settings are persistent (but on runtime level and i think even per user)

Some comments on them (maybe we should have such in some tech doc)
a) "Sync to VBlank" - this setting has hardly any impact on kwin compositing since it only intercepts the glXSwapBuffers call which is only made during fullscreen effects like "flip switch", the cube etc.
The kwin setting should work all the time and since the two may interfere (see bug #269816) and cause visible lags the nvidia feature should NOT be enabled with kwin.

b) Antialiasing: this is rather resource (both, memory & gpu) intense. Martin said he has tried it for cover switch etc. but it was too heavy even then. Maybe we should try again ;-)
Anyway it does not make any sense on the regular desktop display (but slows down things) and should only be enabled dynamically when an effect actually transforms the matrix in a non trivial (scaling) way.
If the alternative is that ppl. globally enable it, we should rather offer it from kwin side (even if disabled by default)

c) Anisotropic filtering: Is only relevant when a texture (window) is rotated (cover/flip/cube)
If at all you'd get a very minor improvement on steep angles in those effects.

a) b) and c) can be overridden by environment variables and are mostly meant for games which do not provide such settings.

d) Flipping is much more complicated. It can quite accelerate the buffer swapping but that's hardly used by kwin anyway (except in GLES) and differs between GeForce and Quadro.
On GeForce GPUs it should not work w/o nvidias global __GL_SYNC_TO_VBLANK anyway or if the (only!) GL client is not fullscreen or (partially) obscured.
On Quadro GPUs the vblank/fullscreen/obscurance restrictions do not apply.

In other words:
it should actually not apply on a (geforce) desktop at all and rarely on a quadro one.
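For the environment-variable override mentioned for a), b) and c): a per-process override can be tried without touching nvidia-settings at all. The Python sketch below only sets `__GL_SYNC_TO_VBLANK`, the one variable named in this comment; the function names are illustrative assumptions, and the names of the variables for antialiasing and anisotropic filtering should be looked up in the NVIDIA driver README rather than guessed.

```python
import os
import subprocess


def driver_env_overrides(sync_to_vblank, base_env=None):
    """Return a copy of the environment with the NVIDIA vsync override set."""
    env = dict(os.environ if base_env is None else base_env)
    # "0" disables the driver-level sync to vblank for this process, "1" enables it.
    env["__GL_SYNC_TO_VBLANK"] = "1" if sync_to_vblank else "0"
    return env


def restart_kwin(env):
    """Replace the running window manager, using the given environment."""
    return subprocess.Popen(["kwin", "--replace"], env=env)
```

Calling `restart_kwin(driver_env_overrides(False))` from a terminal inside the session would restart kwin with the driver-side vsync switched off for that process only, which keeps the test separate from the global nvidia-settings state.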

@ Massimiliano: do you have a quadro chip?

Sth. seems broken in that feature and it looks like it causes #266182 as well (it's the common item in the settings posted there - we'll see soon)
Comment 47 Massimiliano Torromeo 2011-04-12 21:15:14 UTC
(In reply to comment #46)
> @ Massimiliano: do you have a quadro chip?

No,
VGA compatible controller: nVidia Corporation G96 [GeForce 9600M GS] (rev a1)
Comment 48 e3k 2011-04-12 21:48:34 UTC
same issue here with gentoo kde 4.6.2 and 
VGA compatible controller: nVidia Corporation G94 [GeForce 9600 GT] (rev a1)
Comment 49 Thomas Lübking 2011-04-12 21:57:42 UTC
*** Bug 266182 has been marked as a duplicate of this bug. ***
Comment 50 Thomas Lübking 2011-04-12 21:59:45 UTC
@e3k
and issue fixed by disabling "allow flipping" in nvidia-settings?
Comment 51 alexander 2011-04-12 23:52:12 UTC
(In reply to comment #50)
> @e3k
> and issue fixed by disabling "allow flipping" in nvidia-settings?

No, see last comments in bug 266182

#41
"It seems that disabling Allow Flipping only helps temporarily. I've got the
next freeze about 10 min. after having disabled it. I checked three times and I
could reproduce (had to reboot the box...). It simply takes longer until the
freeze comes."

#42
"Yes, I can confirm. In addition, when press space button and cube disappears
the screen becomes black for a moment, like a black flash."


Has anyone tried with the new nvidia driver 270.41.03 ?
Comment 52 Anish Bhatt 2011-04-12 23:53:21 UTC
I'd tried it 270 when it was in beta, no luck. Will try again as soon as I get access to the nvidia machine.
Comment 53 Thilo-Alexander Ginkel 2011-04-12 23:56:41 UTC
Is there anyone, who can confirm the existence of this issue with a GeForce 7300SE card? I have a spare card of this type that I would provide to an interested developer to aid debugging. I don't have the card installed right now (but a GeForce 9500 GT), which is why testing for the existence of the issue for the 7300GS is a little tricky...
Comment 54 Thomas Lübking 2011-04-13 00:41:41 UTC
It does not happen here on a  G73 (7600GT) so you should check the card since there's a good chance that it doesn't affect a G72 either :-(

I do however have some XID error messages in dmesg, so it actually might be just a gtk+ issue (see comments #12 / #13)
Comment 55 e3k 2011-04-13 17:39:36 UTC
alexander i use: x11-drivers/nvidia-drivers-260.19.44

all the 270.x.x drivers are currently hard masked in gentoo portage and I don't want to mess with that.
Comment 56 e3k 2011-04-14 21:51:24 UTC
alexander i tried now x11-drivers/nvidia-drivers-270.41.03 as they were now unmasked. same issue.
Comment 57 alexander 2011-04-15 19:31:52 UTC
(In reply to comment #56)
> alexander i tried now x11-drivers/nvidia-drivers-270.41.03 as they were now
> unmasked. same issue.

Ok, thank you very much e3k. 
So, we have no hope to solve..
Comment 58 GiorgioP 2011-04-19 22:26:54 UTC
I can reproduce this problem with my Chakra linux MS5 KDE 4.6.2, nvidia driver 260.19.44
Comment 59 Ralph Moenchmeyer 2011-04-23 17:36:18 UTC
Same problem here - Opensuse 11.4. KDE 4.6.2, graphics card Nvidia GTX 460,  Nvidia driver: 270.41.06

Deactivation of "Allow Flipping" in nvidia-settings does not help. 

However, a deactivation of "direct rendering" in KDE's "systemsettings" for Open GL options makes the problem disappear.
Comment 60 Ralph Moenchmeyer 2011-04-23 17:59:19 UTC
Additional remarks with respect to my previous comment #59: 

The "freeze" like behaviour occurs only when leaving the cube animation - not during the animation itself.

When returning to the normal desktop mouse and keyboard get unresponsive until - after a while - the system deactivates desktop effects.  

The freeze like behaviour is accompanied by a 100% spike of CPU activity on one or more of my CPU cores. I can sometimes see this in gkrell just before everything seems to freeze.

Of course, deactivation of "direct rendering" is no real solution as it leads to other disadvantages as e.g. a decline in performance.
Comment 61 Thomas Lübking 2011-04-23 20:55:15 UTC
can you try to log the cpu load?
open a terminal and fire "top -bid 1 > ~/cpu_use.log"
this will write the active processes and their load to the file cpu_use.log in your home dir once per second (the "1" parameter)
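Once such a log exists, the interesting question is which process peaked during the freeze. A rough Python sketch for that follows; it assumes the usual `top` batch layout with a header row containing `PID`, `%CPU` and `COMMAND` columns, and the function name `peak_cpu_by_command` is made up for illustration.

```python
def peak_cpu_by_command(top_log):
    """Return {command: highest %CPU seen} from `top -b` style output."""
    peaks = {}
    cpu_col = cmd_col = None
    for line in top_log.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row tells us where the %CPU and COMMAND columns are.
        if fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
            cpu_col, cmd_col = fields.index("%CPU"), fields.index("COMMAND")
            continue
        # Process rows start with a numeric PID; summary lines are skipped.
        if cpu_col is None or not fields[0].isdigit() or len(fields) <= cmd_col:
            continue
        try:
            cpu = float(fields[cpu_col].replace(",", "."))
        except ValueError:
            continue
        command = fields[cmd_col]
        peaks[command] = max(cpu, peaks.get(command, 0.0))
    return peaks
```

Reading `~/cpu_use.log` and passing its contents to this function gives one peak value per command name, so it is quick to see whether kwin or X was the one burning CPU.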

also see the last comment in bug #266182 about moving libkwinnvidiahack out of the way. (but having an idea whether and how much kwin or X take of your cpu under this condition would be helpful)
Comment 62 Thomas Lübking 2011-04-23 21:19:09 UTC
ERRRR scratch that!!
kwin won't load w/o libkwinnvidiahack anymore...

so just have a look for the cpu load if you can spare the time
Comment 63 Thilo-Alexander Ginkel 2011-04-24 00:22:52 UTC
Created attachment 59261 [details]
top -bid 1 > ~/cpu_use.log

Log while the system is frozen (but responds via ssh).

What is underrepresented in these statistics is a family of processes that frequently shows up when running top interactively: "migration/x". In the interactive top output that typically consumes > 50% CPU.

@Thomas: To narrow down why you can't reproduce the issue: Which Linux kernel and X version are you running? Do you use a multi-core CPU?
Comment 64 Ralph Moenchmeyer 2011-04-24 10:19:36 UTC
Created attachment 59270 [details]
top -bid 1 > ~/cpu_use.log
Comment 65 Ralph Moenchmeyer 2011-04-24 10:22:31 UTC
    (In reply to comment #61)
> can you try to log the cpu load?
> open a terminal and fire "top -bid 1 > ~/cpu_use.log"
> this will write the active processes and their load to the file cpu_use.log in
> your home dir once per second (the "1" parameter)
> 

I have an i7 950 quadcore processor (turboboost deactivated) on a Gigabyte X58 UD5 with triple channel RAM access (3 x 4 GB RAM), graphics card Nvidia GTX 460, Nvidia driver: 270.41.06. 

See my attachment where the high CPU period corresponds to the freeze period until the 3D desktop effects are deactivated. Hope that - together with Thilo-Alexander's log - helps to analyze the situation.
Comment 66 Ralph Moenchmeyer 2011-04-24 10:33:51 UTC
Regarding my log: 

It may not be that easy to interpret it - therefore some hints what I did: 

I started the log. Then used Ctrl-F11 to start the cube animation. After that I rotated the cube with 8 desktop planes several times. 

Then I used "Enter" to stop the cube animation and to return to the normal desktop. The mouse could then still be moved but no clicks or keyboard input were handled by the system. I clicked on several windows and buttons on the screen. A second later even the mouse pointer froze at its position. Then, several seconds later the KWin message regarding the deactivation of Desktop effects appeared. After that I stopped the log.
Comment 67 Thomas Lübking 2011-04-24 12:18:08 UTC
a) the logs suggest that thilo's using indirect rendering and ralph uses direct rendering - confirmed? in case:
@thilo: does indirect rendering still work around the flipswitch? (but not the cube issue)
@ralph: does it for you? what about the cube under this condition? (sorry if you've answered this before)

In summary the entire load would go somewhere to the driver, you can use sysprof for more details, but the nvidia function names are apparently scrambled or just stripped - ie. w/o useful names :-(
You could also gdb attach kwin and take a backtrace - you'll likely end up somewhere in glXSwapBuffers

WARNING: the below will "freeze" the compositor for sure (thus prevent screen update)
-> use from VT1 only ( "(gdb) " is gdb's prompt)
----------------------
pidof kwin
12345
gdb
(gdb) attach 12345
(gdb) bt
-----------


b) 2.6.38-ARCH, xorg-server 1.10.1, nvidia 270.41.06, single-core cpu, AGP based 7600GT
(inb4 anybody laughs: /my/ system runs fluidly... ;-)
Option      "TripleBuffer" "false" / "true" has no impact in this regard here.

c) ftr: i've reopened the cube bug #266182 - so if you're not experiencing the flip issue but the cube one (and while they're probably still related) rather stick to that one

d) the option "NoFlip" "true" should set the opengl driver flip-on-swap (the setting is NOT related to the flipswitch effect at all) on the server level, no idea whether it can effectively be changed at runtime then
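The interactive gdb steps in a) can also be run non-interactively, which shortens the time the compositor stays stopped. A minimal Python sketch, assuming gdb is installed and that the script is started from VT1 or over ssh; the helper names are hypothetical, while `-p`, `-batch` and `-ex` are regular gdb command line options.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that attaches to pid, prints a backtrace and exits."""
    return ["gdb", "-p", str(int(pid)), "-batch", "-ex", "bt"]


def capture_backtrace(pid, timeout=60):
    """Run gdb in batch mode and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

The pid is the number printed by `pidof kwin`, as in the manual steps. Since gdb leaves the process again when the batch run ends, kwin should continue afterwards, provided it was not already stuck in the driver.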
Comment 68 Ralph Moenchmeyer 2011-04-24 13:07:53 UTC
Yes, I used direct rendering and my test referred to the cube. Without direct rendering there is no problem with the cube, desktop switching and returning to the chosen desktop. With direct rendering the system "freezes" for a period of several seconds as described in comments 20, 60 and 66.   

I commented the cube problem here because the corresponding bug was marked as a duplicate of this one. Sorry, if this led to some confusion.  

By "flip switching" you probably mean the effect by which you flip through a 3D (diagonal) sequence of active windows to choose one ? 

If yes: 

My desktop freezes during "flip switch" for window selection, too, and in a very similar (if not identical) way as with the cube animation for desktop selection. The freeze occurs in both cases when returning to the normal desktop - not during the animation itself. The desktop gets unresponsive until Kwin deactivates desktop effects.    

One difference, however, is that with the "flip switch freeze" I see horizontal artifacts running over the screen. The keyboard remains active for one second in the chosen window. A mouse click leads to a total "freeze" until the deactivation of 3D desktop effects by Kwin. I also see a spike in CPU activity during the freeze at the end of the flip switch animation. 

This is so similar that I think the reason for both the "cube" and the "flip switch" problem since KDE 4.6.2 probably is one and the same. 

I should add that I do not have any problem with the second flip switch animation - that one where the 3D windows move horizontally from a stack on the right side to a stack on the left side. I do not know what that animation is called in English. This second animation for choosing a window works without any problems. Maybe this helps to narrow down the line of investigation. 

I should add, too, that I never experienced any of these animation problems with KDE 4.5.x. From there I went directly to KDE 4.6.2.
Comment 69 Martin Flöser 2011-04-24 13:13:27 UTC
I just did a test with cube as I had an idea what might be causing it and it would be nice if some 
people could test it. I disabled "Show desktop name" in the cube effect and did not have a 
freeze. Afterwards I enabled it again and was hit by the freeze.

For some time I had thought that EffectFrame might be the culprit as it is more or less the only 
thing which got changed in 4.6 concerning both Cube and FlipSwitch. Could someone please 
confirm the experience.

And maybe we should duplicate the two bugs again. I am quite certain that cube and flip switch 
freezes are caused by the same issue.
Comment 70 Thilo-Alexander Ginkel 2011-04-24 13:19:40 UTC
(In reply to comment #67)
> a) the logs suggest that thilo's using indirect rendering and ralph uses direct
> rendering - confirmed? in case:
> @thilo: does indirect rendering still work around the flipswitch? (but not the
> cube issue)

Most of the time. I hit the freeze after making some effects changes (where even disabling DRI does not always help).

BTW, considering that your system has a single-core CPU, I tested the hypothesis that a multi-core CPU is required to trigger the issue. This hypothesis seems to be correct, as disabling all but one core (using maxcpus=1 on the kernel command line) makes the freezes go away (even with DRI and VSync enabled). However, I still get the nVidia Xid kernel log entries where I would usually have gotten a freeze:

NVRM: Xid (0001:00): 13, 0003 00000000 00008297 00000f10 44960000 00000040

@All: Is there anyone who is getting these freezes with DRI enabled and only a single(-core) CPU?

@Thomas: If you - by chance - have an Intel socket 775 mainboard, let me know - I have some spare dual-core hardware available that I could lend you to aid debugging.
Comment 71 Thilo-Alexander Ginkel 2011-04-24 13:47:52 UTC
(In reply to comment #69)
> I just did a test with cube as I had an idea what might be causing it and it
> would be nice if some people could test it. I disabled "Show desktop name" in
> the cube effect and did not have a freeze. Afterwards I enabled it again and
> was hit by the freeze.
> 
> For some time I had thought that EffectFrame might be the culprit as it is more
> or less the only thing which got changed in 4.6 concerning both Cube and
> FlipSwitch. Could someone please confirm the experience.

Yep, confirmed. DRI+VSync enabled+EffectFrame causes Xid entries and sporadic freezes. Both disappear if I disable the EffectFrame (both for the Desktop Cube as well as the FlipSwitch).
Comment 72 Martin Flöser 2011-04-24 14:35:55 UTC
I think I understand the issue now.

During FlipSwitch we render fullscreen updates, which use the code path that uses the waitsync. So the last frame (including the EffectFrame) is issued but is not rendered immediately; it is synced first.

During the postscreen handling the EffectFrame's internal pixmaps are deleted. This happens before the EffectFrame is actually rendered, and that is what causes the issue. I tried simply not freeing the EffectFrame in postscreen and can no longer reproduce it.

I will now look into a proper patch for the issue. I assume it will also fix the bad drawable errors generally visible with the NVIDIA blob. That sounds like the same issue to me.
Comment 73 Martin Flöser 2011-04-24 14:53:26 UTC
Created attachment 59274 [details]
Flush buffer before delete of pixmaps

For me this patch solves the issue. If someone could please confirm, I will push to master and branch.
Comment 74 Thilo-Alexander Ginkel 2011-04-24 16:54:54 UTC
(In reply to comment #73)
> Created an attachment (id=59274) [details]
> Flush buffer before delete of pixmaps
> 
> For me this patch solves the issue. If someone could please confirm I will push
> to master and branch

Looks good so far. No freezes, no Xid errors. Updated Kubuntu packages are available at: https://launchpad.net/~thilo.ginkel/+archive/kde-4.6.x
Comment 75 Martin Flöser 2011-04-24 17:25:47 UTC
*** Bug 266182 has been marked as a duplicate of this bug. ***
Comment 76 Martin Flöser 2011-04-24 17:27:48 UTC
*** Bug 266182 has been marked as a duplicate of this bug. ***
Comment 77 Richard Cox 2011-04-24 19:35:27 UTC
(In reply to comment #73)
> Created an attachment (id=59274) [details]
> Flush buffer before delete of pixmaps
> 
> For me this patch solves the issue. If someone could please confirm I will push
> to master and branch

I can confirm this patch fixes the cube issue on my system:

Gentoo: amd64
Kernel:  2.6.28
nvidia-drivers:  270.41.03
xorg-server:  1.10.1
KDE:  4.6.2

No freezes and no NVRM message in my dmesg output.
Comment 78 Richard Cox 2011-04-24 19:49:12 UTC
(In reply to comment #77)
> (In reply to comment #73)
> > Created an attachment (id=59274) [details]
> > Flush buffer before delete of pixmaps
> > 
> > For me this patch solves the issue. If someone could please confirm I will push
> > to master and branch
> 
> I can confirm this patch fixes the cube issue on my system:
> 
> Gentoo: amd64
> Kernel:  2.6.28
> nvidia-drivers:  270.41.03
> xorg-server:  1.10.1
> KDE:  4.6.2
> 
> No freezes and no NVRM message in my dmesg output.

I should add: flip switch has no issues either, though I had never used it before now.

Great job fixing this!
Comment 79 Martin Flöser 2011-04-24 20:15:16 UTC
Git commit b3737884c0ca8ad8004c39f9d0e38572cde0dc57 by Martin Gräßlin.
Committed on 24/04/2011 at 20:20.
Pushed by graesslin into branch 'master'.

Perform glFlush before deleting the EffectFrame's pixmaps

On NVIDIA it is possible that the actual rendering gets delayed to
after the deletion of the pixmap during the end of fullscreen effects.
This was causing freezes. By using glFlush before deleting the pixmaps
we can ensure that the pixmap is not needed anymore after the pixmaps
are deleted.

BUG: 261323
FIXED-IN: 4.6.3

M  +1    -0    kwin/scene_opengl.cpp     

http://commits.kde.org/kde-workspace/b3737884c0ca8ad8004c39f9d0e38572cde0dc57
Comment 80 Martin Flöser 2011-04-24 20:17:28 UTC
Git commit 007fdd04c7d0fcfc29a476846380544d95693276 by Martin Gräßlin.
Committed on 24/04/2011 at 20:20.
Pushed by graesslin into branch 'KDE/4.6'.

Perform glFlush before deleting the EffectFrame's pixmaps

On NVIDIA it is possible that the actual rendering gets delayed to
after the deletion of the pixmap during the end of fullscreen effects.
This was causing freezes. By using glFlush before deleting the pixmaps
we can ensure that the pixmap is not needed anymore after the pixmaps
are deleted.

BUG: 261323
FIXED-IN: 4.6.3

M  +1    -0    kwin/scene_opengl.cpp     

http://commits.kde.org/kde-workspace/007fdd04c7d0fcfc29a476846380544d95693276
Comment 81 alexander 2011-04-24 21:40:16 UTC
Thank you so much ;)
Comment 82 Martin Flöser 2011-04-30 10:46:47 UTC
*** Bug 262139 has been marked as a duplicate of this bug. ***
Comment 83 Thomas Lübking 2011-04-30 14:49:13 UTC
*** Bug 272053 has been marked as a duplicate of this bug. ***
Comment 84 Jorge Adriano 2011-04-30 20:46:08 UTC
Thank you for that Martin!
Comment 85 Thomas Lübking 2011-05-11 12:05:01 UTC
*** Bug 272928 has been marked as a duplicate of this bug. ***