Bug 343551

Summary:	Kwin hangs, stops drawing the screen and starts using 100% cpu inside nvidia-glcore after modifying compositing settings
Product:	[Plasma] kwin	Reporter:	Simeon Bird <bladud>
Component:	scene-opengl	Assignee:	KWin default assignee <kwin-bugs-null>
Status:	RESOLVED FIXED
Severity:	normal	CC:	adloconwy+kdebug, auxsvr, bladud, sergio.callegari, simonandric5, sombragris
Priority:	NOR	Keywords:	drkonqi
Version First Reported In:	unspecified	Flags:	thomas.luebking: NVIDIA+
Target Milestone:	---
Platform:	unspecified
OS:	Linux
See Also:	https://bugs.kde.org/show_bug.cgi?id=346116 https://bugs.kde.org/show_bug.cgi?id=348753
Latest Commit:	1de1e80d5077157fc25503c4699969c57929795d	Version Fixed In:	5.3
Sentry Crash Report:
Attachments:	Backtrace when kwin is stuck and screen is not drawing; Patch to 'fix' hang on deletion by not deleting object; stop stap control before deleting sync objects; Another patch to fix the hang by manually triggering the xcb fence; Updated patch to fix hang by triggering fence

Description Simeon Bird 2015-01-30 02:26:51 UTC

Application: systemsettings5 (5.2.0)

Qt Version: 5.4.0
Operating System: Linux 3.18.4-1-ARCH x86_64
Distribution: "Arch Linux"

-- Information about the crash:
- What I was doing when the application crashed:

1. Open system settings to the compositor page. 
2. Examine page for a few seconds
3. hit the button to return to overview - overview displays, but crash occurs shortly afterwards.

- Custom settings of the application:
I am using the nvidia binary driver, v304.125. 
rendering backend is opengl v3.1 in system settings
opengl interface is glx
scale method is 'accurate'

The crash can be reproduced every time.

-- Backtrace:
Application: System Settings (systemsettings5), signal: Segmentation fault
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[Current thread is 1 (Thread 0x7faf0ec277c0 (LWP 9063))]

Thread 5 (Thread 0x7faefcc35700 (LWP 9064)):
#0  0x00007faf0b3a344d in poll () from /usr/lib/libc.so.6
#1  0x00007faf098539f2 in ?? () from /usr/lib/libxcb.so.1
#2  0x00007faf0985556f in xcb_wait_for_event () from /usr/lib/libxcb.so.1
#3  0x00007faeff6093f9 in ?? () from /usr/lib/qt/plugins/platforms/libqxcb.so
#4  0x00007faf0ba195ee in ?? () from /usr/lib/libQt5Core.so.5
#5  0x00007faf08150754 in ?? () from /usr/lib/libGL.so.1
#6  0x00007faf08fce314 in start_thread () from /usr/lib/libpthread.so.0
#7  0x00007faf0b3ac24d in clone () from /usr/lib/libc.so.6

Thread 4 (Thread 0x7faeeea71700 (LWP 9080)):
#0  0x00007faf0ec094c0 in update_get_addr () from /lib64/ld-linux-x86-64.so.2
#1  0x00007faf0ba184f2 in ?? () from /usr/lib/libQt5Core.so.5
#2  0x00007faf0bc5a60a in ?? () from /usr/lib/libQt5Core.so.5
#3  0x00007faf08ab121d in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
#4  0x00007faf08ab1bbb in ?? () from /usr/lib/libglib-2.0.so.0
#5  0x00007faf08ab1dac in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#6  0x00007faf0bc5b08c in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#7  0x00007faf0bc01532 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#8  0x00007faf0ba14664 in QThread::exec() () from /usr/lib/libQt5Core.so.5
#9  0x00007faf0ba195ee in ?? () from /usr/lib/libQt5Core.so.5
#10 0x00007faf08150754 in ?? () from /usr/lib/libGL.so.1
#11 0x00007faf08fce314 in start_thread () from /usr/lib/libpthread.so.0
#12 0x00007faf0b3ac24d in clone () from /usr/lib/libc.so.6

Thread 3 (Thread 0x7faeda3df700 (LWP 9180)):
#0  0x00007faf0b3a344d in poll () from /usr/lib/libc.so.6
#1  0x00007faf08ab1c94 in ?? () from /usr/lib/libglib-2.0.so.0
#2  0x00007faf08ab2022 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#3  0x00007faee5325cf6 in ?? () from /usr/lib/libgio-2.0.so.0
#4  0x00007faf08ad85f5 in ?? () from /usr/lib/libglib-2.0.so.0
#5  0x00007faf08150754 in ?? () from /usr/lib/libGL.so.1
#6  0x00007faf08fce314 in start_thread () from /usr/lib/libpthread.so.0
#7  0x00007faf0b3ac24d in clone () from /usr/lib/libc.so.6

Thread 2 (Thread 0x7faedb5ee700 (LWP 9380)):
#0  0x00007faf0b3a344d in poll () from /usr/lib/libc.so.6
#1  0x00007faece53570c in ?? () from /usr/lib/libusb-1.0.so.0
#2  0x00007faf08fce314 in start_thread () from /usr/lib/libpthread.so.0
#3  0x00007faf0b3ac24d in clone () from /usr/lib/libc.so.6

Thread 1 (Thread 0x7faf0ec277c0 (LWP 9063)):
[KCrash Handler]
#5  0x00007faef1492ab0 in QQuickWindow::maybeUpdate() () from /usr/lib/libQt5Quick.so.5
#6  0x00007faef147ed08 in QQuickItemPrivate::dirty(QQuickItemPrivate::DirtyType) () from /usr/lib/libQt5Quick.so.5
#7  0x00007faef1488e85 in ?? () from /usr/lib/libQt5Quick.so.5
#8  0x00007faf0bc344ba in QObject::event(QEvent*) () from /usr/lib/libQt5Core.so.5
#9  0x00007faef1487a0b in QQuickItem::event(QEvent*) () from /usr/lib/libQt5Quick.so.5
#10 0x00007faf0d0a1d8c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/libQt5Widgets.so.5
#11 0x00007faf0d0a7370 in QApplication::notify(QObject*, QEvent*) () from /usr/lib/libQt5Widgets.so.5
#12 0x00007faf0bc03a9b in QCoreApplication::notifyInternal(QObject*, QEvent*) () from /usr/lib/libQt5Core.so.5
#13 0x00007faf0bc05adb in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () from /usr/lib/libQt5Core.so.5
#14 0x00007faf0bc5ac83 in ?? () from /usr/lib/libQt5Core.so.5
#15 0x00007faf08ab1a0d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#16 0x00007faf08ab1cf8 in ?? () from /usr/lib/libglib-2.0.so.0
#17 0x00007faf08ab1dac in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#18 0x00007faf0bc5b077 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#19 0x00007faf0bc01532 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#20 0x00007faf0bc08f0c in QCoreApplication::exec() () from /usr/lib/libQt5Core.so.5
#21 0x000000000040e64c in main ()

Reported using DrKonqi

Comment 1 Thomas Lübking 2015-01-30 22:46:39 UTC


*** This bug has been marked as a duplicate of bug 343543 ***

Comment 2 Thomas Lübking 2015-01-30 23:01:51 UTC

Have you been in the "compositing" (where you can change the backend etc.) or the "effects" (where you can switch on/off wobbly windows etc.) kcm?

Comment 3 Simeon Bird 2015-01-31 00:42:41 UTC

The compositing, where you can change the backend. 
In plasma 5.1 the crash was also present, but took down kwin as well.

Comment 4 Simeon Bird 2015-01-31 01:15:55 UTC

I spoke too soon - actually changing the settings causes kwin and kded5 to freeze and take 100% of a cpu each.

Comment 5 Thomas Lübking 2015-01-31 01:22:38 UTC

Changing what settings?
There's a reported eventloop recursion for kwin in the tabbox, see bug #340294
There's also a bug report for kded5, seems caused by the powerdevil module, see bug #337674

Neither would be related to this crash - it's something in QML - apparently rather the "new" QML context than the closed one (as the compositing kcm doesn't use it)
But I've not yet checked whether the overview really uses QML.

Comment 6 Simeon Bird 2015-01-31 01:37:46 UTC

Just now it crashed checking the box labelled "skip compositing for full-screen windows". 

I suspect that changing the backend or any of the other compositing settings would also crash (as in plasma 5.1). 

If you like I can check whether it still crashes with nouveau.

Comment 7 Thomas Lübking 2015-01-31 01:39:34 UTC

*systemsettings* crashed with *that* backtrace for altering a kwin setting??

Comment 8 Simeon Bird 2015-01-31 01:51:58 UTC

Sorry, I was imprecise.

systemsettings crashes with this backtrace when exiting the compositing kcm after not changing a setting.

If I do change a setting in the compositing kcm, both kwin and kded5 hang. This may involve a crash, but it may not - it hangs and there is no backtrace window.

Comment 9 Simeon Bird 2015-01-31 01:54:29 UTC

In fact when kwin hangs there is no crash at all, at least not one which creates a coredump.

Comment 10 Simeon Bird 2015-01-31 02:27:00 UTC

When kwin hangs with 100% cpu usage, I obtained a backtrace by attaching gdb to the hung process. The top four lines looked like:

sched_yield /usr/lib/libc6
???   /usr/lib/libnvidia-glcore.so.304.125
???   /usr/lib/libnvidia-glcore.so.304.125
???   /usr/lib/libnvidia-glcore.so.304.125

and below that was kwin, which doesn't have symbols at the moment.

Note that I have enabled Triple buffering in xorg.conf as suggested in https://bugs.kde.org/show_bug.cgi?id=322060 

If I export __GL_YIELD="USLEEP" instead, the top lines in the backtrace are nanosleep and usleep.

Comment 11 Simeon Bird 2015-01-31 02:59:31 UTC

Created attachment 90827 [details]
Backtrace when kwin is stuck and screen is not drawing

Comment 12 Simeon Bird 2015-01-31 03:16:05 UTC

Created attachment 90828 [details]
Patch to 'fix' hang on deletion by not deleting object

This patch fixes the kwin hang for me, at the cost of leaking memory. 
It seems that this must be a bug in the nvidia driver?

I found this by googling:
https://www.opengl.org/discussion_boards/showthread.php/171741-NVIDIA-bug-in-glDeleteSync

Comment 13 Simeon Bird 2015-01-31 03:19:21 UTC

Re-opened and updated title, since I realise there are two different bugs here.

Comment 14 Thomas Lübking 2015-01-31 18:42:40 UTC

Would either of those patches do as well? (1st one preferably)

diff --git a/scene_opengl.cpp b/scene_opengl.cpp
index 7584dd5..486af4d 100644
--- a/scene_opengl.cpp
+++ b/scene_opengl.cpp
@@ -120,6 +120,8 @@ SyncObject::SyncObject()
 
 SyncObject::~SyncObject()
 {
+    if (m_state == Waiting)
+        glFinish();
     xcb_sync_destroy_fence(connection(), m_fence);
     glDeleteSync(m_sync);


-----------------------------------

diff --git a/scene_opengl.cpp b/scene_opengl.cpp
index 7584dd5..e095157 100644
--- a/scene_opengl.cpp
+++ b/scene_opengl.cpp
@@ -412,6 +412,7 @@ SceneOpenGL::~SceneOpenGL()
     // do cleanup after initBuffer()
     SceneOpenGL::EffectFrame::cleanup();
     if (init_ok) {
+        glFinish();
         delete m_syncManager;
 
         // backend might be still needed for a different scene

Comment 15 Simeon Bird 2015-01-31 19:04:15 UTC

I tried both patches. Unfortunately they don't make a difference.

Comment 16 Simeon Bird 2015-01-31 19:04:52 UTC

(I tried both in turn, not both at once)

Comment 17 Thomas Lübking 2015-01-31 23:54:01 UTC

Ok, let's be more explicit on our needs ;-)

loki:/src/KDE4/kwin/> git diff scene_opengl.cpp
diff --git a/scene_opengl.cpp b/scene_opengl.cpp
index 7584dd5..08256be 100644
--- a/scene_opengl.cpp
+++ b/scene_opengl.cpp
@@ -120,6 +120,8 @@ SyncObject::SyncObject()
 
 SyncObject::~SyncObject()
 {
+//     if (m_state == Waiting)
+        glFinish();
     xcb_sync_destroy_fence(connection(), m_fence);
     glDeleteSync(m_sync);
 
@@ -412,6 +414,7 @@ SceneOpenGL::~SceneOpenGL()
     // do cleanup after initBuffer()
     SceneOpenGL::EffectFrame::cleanup();
     if (init_ok) {
+        m_backend->makeCurrent();
         delete m_syncManager;
 
         // backend might be still needed for a different scene

Comment 18 Simeon Bird 2015-02-01 04:41:13 UTC

Hm. That didn't help either (strangely).

Comment 19 Thomas Lübking 2015-02-02 00:21:47 UTC

*grrrr*
Maybe we can trick it through the swap interval....

=> Does this also happen if you set the tearing prevention to "none"?

Comment 20 Simeon Bird 2015-02-02 05:14:17 UTC

If I set tearing prevention to 'none' the problem is fixed!

Comment 21 Thomas Lübking 2015-02-02 14:45:11 UTC

Created attachment 90873 [details]
stop stap control before deleting sync objects

Ok, attached is a larger (and untested!) patch.

Comment 22 Simeon Bird 2015-02-02 16:50:21 UTC

Ah, apologies. Turning off tearing prevention doesn't actually make a difference; I was running with the patch from comment 12 by accident.

I also tried the patch from comment 21 and it didn't fix it either.

Thanks for your help

Comment 23 Thomas Lübking 2015-02-04 14:30:26 UTC

*** Bug 343773 has been marked as a duplicate of this bug. ***

Comment 24 sombragris 2015-02-04 15:24:11 UTC

Well, I filed a bug report for a different use case (bug 343773) and it has been marked as a duplicate of this bug. In my case, Kwin started eating 100% CPU and not drawing anything. Killing kwin_x11 and setting the rendering engine to XRender gave me a usable desktop (after a restart).

This is a regression, since the bug was not present in the latest stable kwin from Plasma 4.

Comment 25 sombragris 2015-02-04 15:24:55 UTC

I wish to add that I am also using the nVidia 304.125 proprietary legacy driver.

Comment 26 Fredrik Höglund 2015-02-09 18:11:35 UTC

The backtrace shows that the driver is busy-waiting for something in glDeleteSync().
What do you suppose that function could be waiting for?

Comment 27 Thomas Lübking 2015-02-09 22:24:11 UTC

(In reply to Fredrik Höglund from comment #26)
> What do you suppose that function could be waiting for?

You believe a nice "xcb_flush(connection());" could do?

Comment 28 Simeon Bird 2015-02-14 17:45:20 UTC

Sorry, that didn't work either [xcb_flush(connection()); just before the glDeleteSync]
I also tried it in conjunction with the glFinish patch above

Comment 29 Fredrik Höglund 2015-02-14 18:52:01 UTC

(In reply to Thomas Lübking from comment #27)
> (In reply to Fredrik Höglund from comment #26)
> > What do you suppose that function could be waiting for?
> 
> You believe a nice "xcb_flush(connection());" could do?

You didn't answer my question.

Comment 30 Thomas Lübking 2015-02-14 20:38:02 UTC

(In reply to Fredrik Höglund from comment #29)

> You didn't answer my question.

I thought I did ;-)
Since we already ruled out waiting for the retrace and it's apparently not the fence, I could only imagine it's waiting to get the context active - but I've no favorite supposition (or had)

The driver is not supposed to block here, so it could be any kind of (incl. internal) mutex.

Comment 31 Simeon Bird 2015-02-15 04:45:00 UTC

Created attachment 91085 [details]
Another patch to fix the hang by manually triggering the xcb fence.

The hang occurs when the sync is in the Ready or Resetting state. It seems that nvidia doesn't like it if the gl sync is deleted or waited on before the xcb fence has been triggered.

This patch fixes it - there is no theory behind this, just trial and error. It also seems to me that if wait() is called sufficiently quickly after trigger() there will also be a hang.

Comment 32 Fredrik Höglund 2015-02-16 00:27:53 UTC

(In reply to Simeon Bird from comment #31)
> Created attachment 91085 [details]
> Another patch to fix the hang by manually triggering the xcb fence.
> 
> The hang occurs when the sync is in the Ready or Resetting state. It seems
> that nvidia doesn't like it if the gl sync is deleted or waited on before
> the xcb fence has been triggered.
> 
> This patch fixes it - there is no theory behind this, just trial and error.
> It also seems to me that if wait() is called sufficiently quickly after
> trigger() there will also be a hang.

Your patch is absolutely correct, but some of the comments in the code are not. What glDeleteSync() is clearly waiting for is for the fence to become signaled, and that is never going to happen unless kwin tells the X server to trigger it. So it's not really correct to say that we need to manually trigger the fence; it's not something that can happen automatically. It would be a very serious bug if it ever did.

The comment above xcb_flush() is also not exactly correct. If the xcb_flush() call is left out, glDeleteSync() will wait for the fence to become signaled, but the trigger request will be stuck in the output buffer and never sent to the X server. So glDeleteSync() ends up waiting indefinitely.

There is no need to call wait() before deleting the fence. The purpose of wait() is to prevent the GPU from executing future draw commands before the fence is signaled, and that's not relevant here. There may be a similar hazard between calling wait() and glDeleteSync() without a glFlush() in-between as with calling trigger() and glDeleteSync() without an xcb_flush() in-between. If you want to make sure that the fence is signaled before you call glDeleteSync(), you should call finish() instead of wait().
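
Editorial sketch: the ordering described in this comment (queue the trigger request, flush it to the X server, and only then delete the sync object) can be modelled with a small toy program. Everything below (ToyServer, ToyConnection, deleteSyncWouldReturn) is hypothetical and merely stands in for xcb_sync_trigger_fence(), xcb_flush() and glDeleteSync(). It is not the patch attached to this bug, and it assumes nothing about the driver beyond what the comment states.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Toy model only: none of these types are the real xcb or OpenGL API.
struct ToyServer {
    bool fenceTriggered = false;

    // The server only learns about a request once it has been flushed to it.
    void process(const std::string &request) {
        if (request == "trigger_fence")
            fenceTriggered = true;
    }
};

class ToyConnection {
public:
    explicit ToyConnection(ToyServer &server) : m_server(server) {}

    // Stands in for xcb_sync_trigger_fence(): the request is only queued
    // in the client-side output buffer, nothing reaches the server yet.
    void triggerFence() { m_outputBuffer.push("trigger_fence"); }

    // Stands in for xcb_flush(): hands every queued request to the server.
    void flush() {
        while (!m_outputBuffer.empty()) {
            m_server.process(m_outputBuffer.front());
            m_outputBuffer.pop();
        }
    }

private:
    ToyServer &m_server;
    std::queue<std::string> m_outputBuffer;
};

// Stands in for glDeleteSync() as observed in this report: returns true if
// the call would return, false if it would keep waiting for the fence.
bool deleteSyncWouldReturn(const ToyServer &server) {
    return server.fenceTriggered;
}
```

In this model, calling triggerFence() alone leaves deleteSyncWouldReturn() false, because the request is still sitting in the output buffer; only after flush() does the deletion stop waiting. That mirrors the "stuck in the output buffer" case above, under the stated assumptions.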

Comment 33 Simeon Bird 2015-02-16 05:50:54 UTC

> Your patch is absolutely correct, but some of the comments in the code are
> not.

Ok, I'll update the comments and post a new version. Incidentally, is this actually an nvidia bug?
ie, does the standard call for glDeleteSync not to block? If the answer is yes, should the patch
be made conditional on the nvidia driver somehow?

> There is no need to call wait() before deleting the fence. The purpose of
> wait() is to prevent the GPU from executing future draw commands before the
> fence is signaled, and that's not relevant here. 

What I was worried about was in some other case - if trigger() is called and then insertWait() is called immediately afterwards as part of the normal draw routines. This would be a classic race condition and would lead to an occasional unrepeatable hang. But maybe it isn't possible for this to happen without something equivalent to xcb_flush?

> There may be a similar
> hazard between calling wait() and glDeleteSync() without a glFlush()
> in-between as with calling trigger() and glDeleteSync() without an
> xcb_flush() in-between. If you want to make sure that the fence is signaled
> before you call glDeleteSync(), you should call finish() instead of wait().

That's actually fine (I checked when debugging)

Comment 34 Fredrik Höglund 2015-02-17 23:37:28 UTC

(In reply to Simeon Bird from comment #33)
> > Your patch is absolutely correct, but some of the comments in the code are
> > not.
> 
> Ok, I'll update the comments and post a new version. Incidentally, is this
> actually an nvidia bug?
> ie, does the standard call for glDeleteSync not to block? If the answer is
> yes, should the patch
> be made conditional on the nvidia driver somehow?

I would say that the OpenGL specification strongly implies that glDeleteSync should not block, but it doesn't explicitly say that it's not allowed to. My guess is that there's some limitation that prevents the NVIDIA driver from knowing when it's safe to delete the sync object without blocking on the fence. Triggering the fence before deleting it is not a big deal though, so I wouldn't bother with making it conditional on the NVIDIA driver. It's the only driver that implements the GL_EXT_x11_sync_object extension anyway.

> > There is no need to call wait() before deleting the fence. The purpose of
> > wait() is to prevent the GPU from executing future draw commands before the
> > fence is signaled, and that's not relevant here. 
> 
> What I was worried about was in some other case - if trigger() is called and
> then insertWait() is called immediately afterwards as part of the normal
> draw routines. This would be a classic race condition and would lead to an
> occasional unrepeatable hang. But maybe it isn't possible for this to happen
> without something equivalent to xcb_flush?

That's a good question. It shouldn't matter if the command buffer that signals the fence is submitted after the command buffer that waits for it, as long as both command buffers are able to execute concurrently. This is of course hardware dependent, but all current NVIDIA GPUs should have multiple hardware contexts. The best way to test this is probably to call glWaitSync() and glFlush(), and then tell the X server to trigger the fence. If that results in a GPU hang, we need to make sure that the X server has processed the trigger request before we call glWaitSync(). It might be a good idea to do that anyway for the sake of robustness.

Comment 35 Thomas Lübking 2015-02-19 14:21:03 UTC

(In reply to Fredrik Höglund from comment #32)
> What glDeleteSync() is clearly waiting for is for the fence to become signaled

(excuse my stupidity)
Is this any "clearly" beyond hindsight?
From out of all options, I considered this to be the least reasonable one (would that mean the driver is in pre-emptive waiting condition - and what would be the runtime implications for windows that never trigger the fence?)

Comment 36 Fredrik Höglund 2015-02-20 17:51:21 UTC

(In reply to Thomas Lübking from comment #35)
> (In reply to Fredrik Höglund from comment #32)
> > What glDeleteSync() is clearly waiting for is for the fence to become signaled
>
> (excuse my stupidity)
> Is this any "clearly" beyond hindsight?
> From out of all options, I considered this to be the least reasonable one

A fence is a synchronization primitive that is inserted in the command stream so that you can wait for it and know that all prior commands have completed. So when the function that deletes the associated sync object waits indefinitely for something, my first thought is that it is waiting for the fence. Especially when you consider that at least some of these sync objects are in an unsignaled state, and no fence command has been set in the command stream that will signal them. That Simeon's patch fixes the problem proves the theory.

> (would that mean the driver is in pre-emptive waiting condition - and what
> would be the runtime implications for windows that never trigger the fence?)

Windows don't trigger fences. The fences are triggered from Compositor::performCompositing() immediately after fetching and resetting the damage region, so we can know that the damage has landed in the window textures before we render them. When we are about to render the first damaged window, we insert a command to wait for the fence. If there are no damaged windows, we don't trigger or wait for any fences.
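
Editorial sketch: the per-frame fence handling described in this comment can be written down as a small state model. The names here (FenceState, FrameFence, compositeFrame) are hypothetical and chosen for illustration only. KWin's real SyncObject has more states and talks to the X server and the GL driver, none of which is modelled.

```cpp
#include <cassert>
#include <vector>

// Illustrative model; these are not KWin's actual classes.
enum class FenceState { Ready, TriggerSent, Waiting };

struct FrameFence {
    FenceState state = FenceState::Ready;

    // Modelled after "trigger the fence right after fetching the damage".
    void trigger() {
        assert(state == FenceState::Ready);
        state = FenceState::TriggerSent;
    }

    // Modelled after "insert a command to wait for the fence".
    void insertWait() {
        assert(state == FenceState::TriggerSent);
        state = FenceState::Waiting;
    }
};

// Runs one compositing pass. Each entry of `damaged` says whether the
// corresponding window has pending damage. Returns the final fence state.
FenceState compositeFrame(const std::vector<bool> &damaged) {
    FrameFence fence;

    bool anyDamage = false;
    for (bool d : damaged)
        anyDamage = anyDamage || d;

    // No damaged windows: the fence is neither triggered nor waited on.
    if (anyDamage)
        fence.trigger();

    for (bool d : damaged) {
        if (!d)
            continue;
        // The wait is inserted once, before the first damaged window.
        if (fence.state == FenceState::TriggerSent)
            fence.insertWait();
        // ... rendering of the window would happen here ...
    }
    return fence.state;
}
```

A frame with no damaged windows leaves the fence in Ready, while any frame with at least one damaged window ends in Waiting, which matches the description that fences are only triggered and waited on when there is damage to render.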

Comment 37 Simeon Bird 2015-02-28 22:46:38 UTC

Created attachment 91353 [details]
Updated patch to fix hang by triggering fence

Ok, here is a patch with updated comments. How does this get into kwin? Should I open a review board, or do you just take it?

Comment 38 auxsvr 2015-03-29 10:22:38 UTC

Under similar conditions I get the following backtrace while kwin_x11 uses 100% CPU:

#0  0x00007f5a77d62a17 in sched_yield () at /lib64/libc.so.6
#1  0x00007f5a671d5e4e in  () at /usr/lib64/libnvidia-glcore.so.304.125
#2  0x00007f5a671d68f6 in  () at /usr/lib64/libnvidia-glcore.so.304.125
#3  0x00007f5a66fb5c2f in  () at /usr/lib64/libnvidia-glcore.so.304.125
#4  0x00007f5a7795d25e in KWin::SyncObject::~SyncObject() (this=0xbd7ef8, __in_chrg=<optimized out>)
    at /usr/src/debug/kwin-5.2.2/scene_opengl.cpp:124
#5  0x00007f5a779612cc in KWin::SceneOpenGL::~SceneOpenGL() (this=0xbd7ee0, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8/array:81
#6  0x00007f5a779612cc in KWin::SceneOpenGL::~SceneOpenGL() (this=0xbd7ee0, __in_chrg=<optimized out>)
    at /usr/src/debug/kwin-5.2.2/scene_opengl.cpp:242
#7  0x00007f5a779612cc in KWin::SceneOpenGL::~SceneOpenGL() (this=0xba53a0, __in_chrg=<optimized out>)
    at /usr/src/debug/kwin-5.2.2/scene_opengl.cpp:415
#8  0x00007f5a77961359 in KWin::SceneOpenGL2::~SceneOpenGL2() (this=0xba53a0, __in_chrg=<optimized out>)
    at /usr/src/debug/kwin-5.2.2/scene_opengl.cpp:966
#9  0x00007f5a77948657 in KWin::Compositor::finish() (this=this@entry=0x8e8490) at /usr/src/debug/kwin-5.2.2/composite.cpp:337
#10 0x00007f5a77948c04 in KWin::Compositor::suspend(KWin::Compositor::SuspendReason) (this=0x8e8490, reason=<optimized out>)
    at /usr/src/debug/kwin-5.2.2/composite.cpp:508
#11 0x00007f5a75e4503f in QMetaObject::activate(QObject*, int, int, void**) (a=0x7fff200490b0, r=0x8a58c0, this=0xcf0ea0)
    at ../../src/corelib/kernel/qobject_impl.h:124
#12 0x00007f5a75e4503f in QMetaObject::activate(QObject*, int, int, void**) (sender=0xc338a0, signalOffset=<optimized out>, local_signal_index=<optimized out>, argv=0x7fff200490b0) at kernel/qobject.cpp:3702
#13 0x00007f5a76acc662 in QAction::triggered(bool) () at /usr/lib64/libQt5Widgets.so.5
#14 0x00007f5a76aceb48 in QAction::activate(QAction::ActionEvent) () at /usr/lib64/libQt5Widgets.so.5

Should I file a new report?

Comment 39 Thomas Lübking 2015-03-29 10:27:21 UTC

(In reply to auxsvr from comment #38)
> Under similar conditions I get the following backtrace while kwin_x11 uses
> 100% CPU:

With 5.3?
Otherwise it's most likely this bug and should be fixed/worked around in 5.3

Comment 40 auxsvr 2015-03-29 11:57:01 UTC

I'm sorry, I didn't see that 5.3 fixes this. I'm on 5.2.2.

Comment 41 Sergio 2015-05-02 14:27:04 UTC

Can someone clarify if this is expected to be fixed in 5.3? I have just upgraded my system to kubuntu 15.04, which uses plasma 5 and lets either 5.2 (default) or 5.3 (via a dedicated repository) be installed. Unfortunately, with neither of them do I succeed in using kwin_x11 with opengl glx on a system with an nvidia GeForce 7025 / nForce 630 and the nvidia 304 legacy driver, which reports opengl 2.1 on this hardware.

Comment 43 Simeon Bird 2015-05-02 15:11:18 UTC

It is fixed for me on 5.3 - same driver and glx

Comment 44 Thomas Lübking 2015-05-02 15:58:01 UTC

>  with neither of them I succeed in using kwin_x11 with opengl glx
Are you sure it's for this particular bug?
This one's caused by a hanging fence sync and apparently triggered by invoking the config module.
It should be work-a-roundable by
   export KWIN_EXPLICIT_SYNC=0; kwin_x11 --replace &
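
Editorial sketch: a workaround of this kind is typically an environment-variable check made before the sync objects are set up. The helper below is hypothetical (the name explicitSyncEnabled and the exact parsing rule are assumptions). The thread only shows that setting KWIN_EXPLICIT_SYNC=0 disables the feature, not how KWin reads the variable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: explicit sync stays enabled unless the
// KWIN_EXPLICIT_SYNC environment variable is set to exactly "0".
bool explicitSyncEnabled() {
    const char *value = std::getenv("KWIN_EXPLICIT_SYNC");
    return value == nullptr || std::strcmp(value, "0") != 0;
}
```

With this rule an unset variable keeps the default behaviour, so only users who explicitly export KWIN_EXPLICIT_SYNC=0 opt out of the fence-based synchronisation.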

Comment 45 Sergio 2015-05-03 23:16:37 UTC

Tried... seems to work with the workaround on 5.3. So I guess it is not fixed in 5.3, or at least not in Kubuntu's 5.3...

Comment 46 Thomas Lübking 2015-05-04 09:14:19 UTC

Do you get 100% cpu load instead?
If so, can you gdb into kwin and check where it hangs?
If not, the sync fences may cause an "unrelated" problem for you.

Comment 47 Sergio 2015-05-04 11:22:35 UTC

After the utopic->vivid upgrade, I get the machine in over 60% iowait, 0 cpu load, but I do not think this is related. Possibly it is another (and more serious) issue with ubuntu vivid.

When kwin is hung, continuously switching to a virtual console and back to the X11 screen with ALT+FN lets one do operations in steps...  The machine is using the KDE 5.3 ppa right now. Setting the environment variables makes it almost usable (apart from the other issues mentioned above, like the 60% iowait and io-related processes getting stuck).  In any case, the machine is now off and I am reinstalling it as trusty with kde 4 or mint with cinnamon soon because I cannot afford keeping it off, so unfortunately I will not be able to do more tests or provide much further information. It is a pity that kubuntu decided to make so many changes at the same time in this upgrade, because I really do not have enough time right now to try decoupling the problems.

Comment 48 adlo 2015-05-09 01:33:29 UTC

This bug still seems to exist in Plasma 5.3 on Arch Linux.

Comment 49 Thomas Lübking 2015-06-05 19:13:08 UTC

Let's track remaining issues with this feature and the nvidia legacy driver in  bug #348753

It would be great if somebody encountering this could gdb into kwin_x11 and check where it's hanging.
Usual suspects would be glDeleteSync calls in libkwineffects/kwinglutils.cpp