Bug 321760

Summary: KWin triggers X crash
Product: [Plasma] kwin
Reporter: Hrvoje Senjan <hrvoje.senjan>
Component: compositing
Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED UPSTREAM    
Severity: normal
CC: cfeck, fredrik
Priority: NOR    
Version: 4.10.90   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments:
  glxinfo
  KWin support info (with XRender backend)
  Output when trying to activate openGL1/legacy
  Screenshot

Description Hrvoje Senjan 2013-06-29 12:22:53 UTC
I guess it's a Mesa/X bug, but maybe it can be worked around in KWin.
Pulled out my ancient laptop to test beta2 and found that activating OpenGL 1.2 compositing causes the behaviour described in the summary.

Reproducible: Always

Steps to Reproduce:
1. Activate OpenGL 1.2 compositing on this kind of ancient laptop/graphics chip (852GM/855GM)
Actual Results:  
One finds oneself looking at a nice kdm screen (or similar ;-)

Expected Results:  
Either fail to activate compositing cleanly or activate it successfully

This was in Xorg.0.log.old:

[   476.698] (EE) 0: /usr/bin/Xorg (xorg_backtrace+0x49) [0x81eb889]
[   476.698] (EE) 1: /usr/bin/Xorg (0x8048000+0x1a7806) [0x81ef806]
[   476.698] (EE) 2: linux-gate.so.1 (__kernel_rt_sigreturn+0x0) [0xb779940c]
[   476.698] (EE) 3: /usr/lib/libdricore9.1.3.so.1 (_mesa_VertexAttrib2dvNV+0x36) [0xb68ecc96]
[   476.698] (EE) 4: /usr/lib/xorg/modules/extensions/libglx.so (0xb6e7c000+0x10a1a) [0xb6e8ca1a]
[   476.698] (EE) 5: /usr/lib/xorg/modules/extensions/libglx.so (0xb6e7c000+0x43fce) [0xb6ebffce]
[   476.698] (EE) 6: /usr/lib/xorg/modules/extensions/libglx.so (0xb6e7c000+0x472db) [0xb6ec32db]
[   476.699] (EE) 7: /usr/bin/Xorg (0x8048000+0x379dd) [0x807f9dd]
[   476.699] (EE) 8: /usr/bin/Xorg (0x8048000+0x24e2a) [0x806ce2a]
[   476.699] (EE) 9: /lib/libc.so.6 (__libc_start_main+0xf5) [0xb72dd525]
[   476.699] (EE) 10: /usr/bin/Xorg (0x8048000+0x251f9) [0x806d1f9]
Comment 1 Hrvoje Senjan 2013-06-29 12:23:43 UTC
Created attachment 80849 [details]
glxinfo
Comment 2 Hrvoje Senjan 2013-06-29 12:25:43 UTC
Created attachment 80850 [details]
KWin support info (with XRender backend)
Comment 3 Martin Flöser 2013-06-29 12:29:08 UTC
Sorry, not much we can do.  KWin should not be able to crash X at all (privilege escalation). Also, we do not see where exactly in KWin it crashes. All we see from the crash is that it's related to glx. Unfortunately we need to access glx to figure out which driver it is. So in order to figure out whether it's safe, we need to call the unsafe functionality.

The good part is: KWin can crash X only once. We have protection against such breakage.
Comment 4 Hrvoje Senjan 2013-06-29 12:32:28 UTC
OK, as said, it is an upstream bug. (Reported only because this worked with 4.10.x: yes, poorly on this poor chip, but without crashing.)
Sorry for bothering ;-)
Comment 5 Thomas Lübking 2013-06-29 12:42:10 UTC
> OpenGL renderer string: Mesa DRI Intel(R) 852GM/855GM x86/MMX/SSE2

> glLegacy: false

-> Does it also happen with setting compositor to "OpenGL 1.2"?
Fwiw: xrender provides likely much better performance on the chip anyway.
Comment 6 Hrvoje Senjan 2013-06-29 17:12:08 UTC
(In reply to comment #5)
> -> Does it also happen with setting compositor to "OpenGL 1.2"?
Yes. That's the one i tried

> Fwiw: xrender provides likely much better performance on the chip anyway.
Yes, it does. Only, in the case of 4.11, it's the only composited experience on this chip :-)
Comment 7 Thomas Lübking 2013-06-29 20:28:38 UTC
(In reply to comment #6)

> Yes. That's the one i tried
Hum? - At least the support info seems to suggest otherwise:

> glLegacy: false

> Yes, it does. Only in case of 4.11 it's the only composited experience on
> this chip :-)

Tried kwin_gles? Afaik it should definitely outperform GL 2.0 on that chip and maybe even GL 1.2
Comment 8 Hrvoje Senjan 2013-06-30 10:17:20 UTC
(In reply to comment #7)
> Hum? - At least the support info seems to suggest otherwise:
> 
> > glLegacy: false
Ah, i guess i should have explained better. I've tried both GL1.2 and GL2; however, after getting those crashes, i removed kwinrc (thinking maybe some leftover cruft was present)

> Tried kwin_gles? Afaik it should definitely outperform GL 2.0 on that chip
> and maybe even GL 1.2
Will try that one also. Though, with a clean kwinrc, plasma seems to take rather long to load (waited 3-4 minutes, then gave up).  Also, it starts normally with OpenGLIsUnsafe=true
Comment 9 Hrvoje Senjan 2013-06-30 10:57:15 UTC
Created attachment 80862 [details]
Output when trying to activate openGL1/legacy

OK. Updated to the latest git snapshots of Mesa/X, and the crashes are gone. Still, i'm not able to activate OpenGL compositing.  Attached output with MESA_DEBUG (yes, not really useful)
Thomas, no need to bother with this one. Even if i got it running, the performance would probably be really poor anyway. Thanks for helping anyway! :-)
Comment 10 Thomas Lübking 2013-06-30 11:10:57 UTC
> kwin(5043) KWin::GlxBackend::init: Direct rendering: false
This seems bad to begin with.
Do you have an environment variable set for this? (LIBGL_ALWAYS_INDIRECT)
Comment 11 Christoph Feck 2013-06-30 11:17:06 UTC
The performance of OpenGL on i945 was okay in 4.10, actually a bit faster than XRender on my system.
Comment 12 Hrvoje Senjan 2013-06-30 11:30:42 UTC
(In reply to comment #10)
> > kwin(5043) KWin::GlxBackend::init: Direct rendering: false
> This seems bad to begin with.
> Do you have an environment variable set for this? (LIBGL_ALWAYS_INDIRECT)
No.  At least i can't see it. Tried with LIBGL_ALWAYS_INDIRECT=no; that seems to activate OpenGL, however, no decos are drawn (tried both graphicssystems)

(In reply to comment #11)
> The performance of OpenGL on i945 was okay in 4.10, actually a bit faster
> than XRender on my system.

That could be, but that's already a much newer chip than this one here ;-)
Comment 13 Thomas Lübking 2013-06-30 11:50:38 UTC
(In reply to comment #11)
> The performance of OpenGL on i945 was okay in 4.10, actually a bit faster
> than XRender on my system.
2Gen (830) ./. 3Gen (945) - not even comparable architecture =)
(and initially, i even feared for some 810 chip, ie 1Gen)
eg. 2Gen has no HW T&L

(In reply to comment #12)
> No.  At least i can't see it. Tried with LIBGL_ALWAYS_INDIRECT=no, that
There's also KWIN_DIRECT_GL, set it to "1" to enforce direct rendering.

> seems to activate openGL, however, no decos are drawn (tried both
> graphicssystems)
Drawn or available? Do they appear when you suspend the compositor, and can you interact with them (despite being invisible)?
Does that apply to all decorations?
Comment 14 Christoph Feck 2013-06-30 11:56:29 UTC
Here, the last good SHA was 1035a20b4ec89779992514cb4141a7f8af312873, and it failed to work after the commits including 2938b3548572442f74af613f2f8776f6e478578d.

Fredrik asked me to bisect the change that broke it, but understanding I have to recompile/reboot after each change, I was too lazy to do it (also considering that this is my work machine).

Hrvoje, if you feel bored...

> That could be, but that's already a much newer chip than this one here ;-)

Oh, right, I was thinking 955, not 855...
Comment 15 Hrvoje Senjan 2013-06-30 12:02:12 UTC
> (In reply to comment #12)
> > No.  At least i can't see it. Tried with LIBGL_ALWAYS_INDIRECT=no, that
> There's also KWIN_DIRECT_GL, set it to "1" to enforce direct rendering.
That creates
kwin(1165) kdemain: KWIN_DIRECT_GL set, not forcing LIBGL_ALWAYS_INDIRECT=1
and KWin seems to be stuck, e.g. can't move windows, etc. Can suspend compositing though.

> > seems to activate openGL, however, no decos are drawn (tried both
> > graphicssystems)
> Drawn or available? Do they appear when you suspend the compositor and can
> you interact with them (despite invisible)
> Does that apply to all decorations?
Yes - i've tried oxygen, laptop, and Plastik
Comment 16 Hrvoje Senjan 2013-06-30 12:09:07 UTC
(In reply to comment #13)
> Drawn or available? Do they appear when you suspend the compositor and can
> you interact with them (despite invisible)
> Does that apply to all decorations?

Drawn. Will attach a screenshot of how oxygen looks when one grabs a window (they are drawn correctly with suspended compositing)
Comment 17 Hrvoje Senjan 2013-06-30 12:12:07 UTC
Created attachment 80865 [details]
Screenshot

appearance with openGL1

(In reply to comment #14)
> Hrvoje, if you feel bored...
I guess a vacation would be in order to manage bisecting on this machine :-)
Will see if i can try it
Comment 18 Thomas Lübking 2013-06-30 12:23:44 UTC
Tried on a GMA945 - latest KWin apparently completely breaks GLX, and GLES is "glitchy" (graphics errors to the degree of being unusable)
Comment 19 Hrvoje Senjan 2013-06-30 12:42:37 UTC
(In reply to comment #17)
> Will see if i can try it
After 15 minutes, cmake reports [ 14%]. Sorry, i really can't bisect every commit in between. Chose a "random" one to maybe narrow it down, but can't help further. If it were broken for ati, i could use my "main" machine; there it could be done.
Comment 20 Thomas Lübking 2013-06-30 13:00:06 UTC
This seems to be some mess-up around direct rendering.
If I enforce indirect rendering:

KWIN_DIRECT_GL= LIBGL_ALWAYS_INDIRECT=1 kwin --replace &

everything is fine (and direct rendering accurately reported as "false")

But if i *try* to enforce direct rendering

KWIN_DIRECT_GL=1 LIBGL_ALWAYS_INDIRECT= kwin --replace &

direct rendering is incorrectly (?!) reported as "no" while everything works

Also everything works if i call (unsetting env)
KWIN_DIRECT_GL= LIBGL_ALWAYS_INDIRECT= kwin --replace &
Direct rendering is still reported "no"

Now, if i just omit any environment:
kwin --replace &

There's no rendering/update and direct rendering is still reported as "no".
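
A detail that may matter when repeating the experiments above: in a shell, `VAR= command` sets the variable to an empty string rather than removing it, so the invocations are not equivalent from the program's point of view. The following minimal C++ sketch shows how a program could tell the three cases apart with std::getenv. The helper name classifyEnv is hypothetical, and whether KWin or Mesa actually treat an empty value differently from an unset one is an assumption that this thread does not confirm.

```cpp
#include <cstdlib>

// The three states an environment variable can be in when a program starts.
enum class EnvState { Unset, Empty, Value };

// Hypothetical helper: classify a variable using only the standard std::getenv.
EnvState classifyEnv(const char *name)
{
    const char *value = std::getenv(name);
    if (value == nullptr)
        return EnvState::Unset;  // e.g. plain "kwin --replace"
    if (*value == '\0')
        return EnvState::Empty;  // e.g. "KWIN_DIRECT_GL= kwin --replace"
    return EnvState::Value;      // e.g. "KWIN_DIRECT_GL=1 kwin --replace"
}
```

A program that only tests whether getenv returns a non-null pointer would treat the Empty and Value cases alike, which is one possible (unverified) reason why the "empty" and "omitted" invocations behave differently.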
Comment 21 Thomas Lübking 2013-06-30 13:44:13 UTC
@Fredrik
This is because of

ctx = glXCreateContextAttribsARB(display(), fbconfig, 0, direct, attribs_legacy);

in glxbackend.cpp - could be that intel/mesa falsely announces glXCreateContextAttribsARB - or the attributes ("0") may be wrong/insufficient?

@Everyone else: try commenting out those lines (glxbackend.cpp:~190) and see whether that fixes it for you (Christoph: i assume "yes", Hrvoje: i have no idea ;-)

//        if (!ctx)
//            ctx = glXCreateContextAttribsARB(display(), fbconfig, 0, direct, attribs_legacy);
Comment 22 Thomas Lübking 2013-06-30 13:58:07 UTC
const int attribs_legacy[] = {
            GLX_CONTEXT_FLAGS_ARB, GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0
        };

This seems to work as a replacement as well, but i have no idea about the "legacy" implications.
Comment 23 Fredrik Höglund 2013-06-30 15:40:34 UTC
The backtrace shows that the crash happened in glVertexAttrib2dvNV(). But it's actually the indirect dispatch table in the server that's broken, since kwin never calls this function.

The function kwin actually called is glFramebufferTexture2D(), and the crash happened when glVertexAttrib2dv() tried to dereference the pointer argument, which in reality isn't a pointer, but a GLenum set to GL_COLOR_ATTACHMENT0.

The following change probably fixes it:

--- a/kwin/paintredirector.cpp
+++ b/kwin/paintredirector.cpp
@@ -29,6 +29,7 @@ DEALINGS IN THE SOFTWARE.
 #include "deleted.h"
 #include "effects.h"
 #include <kwinglutils.h>
+#include <kwinglplatform.h>
 #include <kwinxrenderutils.h>
 #include <kdebug.h>
 #include <QPaintEngine>
@@ -322,7 +323,7 @@ OpenGLPaintRedirector::OpenGLPaintRedirector(Client *c, QWidget *widget)
     for (int i = 0; i < TextureCount; ++i)
         m_textures[i] = NULL;
 
-    if (!s_fbo && GLRenderTarget::supported())
+    if (!s_fbo && GLRenderTarget::supported() && GLPlatform::instance()->isDirectRendering())
         glGenFramebuffers(1, &s_fbo);
 
     PaintRedirector::resizePixmaps();
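
The mix-up described at the start of this comment can be pictured with a toy model. The sketch below is not Mesa or X server code: the two-entry tables, the slot number and the handler bodies are invented for illustration, and only the two enum values are taken from the OpenGL headers. It shows how a client and a server that disagree about which function lives in a slot end up interpreting the same argument differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

constexpr std::uint32_t kGlFramebuffer = 0x8D40;       // GL_FRAMEBUFFER
constexpr std::uint32_t kGlColorAttachment0 = 0x8CE0;  // GL_COLOR_ATTACHMENT0

// Each handler reports how it would interpret its second argument.
using Handler = std::string (*)(std::uint32_t, std::uint32_t);
using DispatchTable = std::array<Handler, 2>;

std::string toHex(std::uint32_t value)
{
    std::ostringstream out;
    out << "0x" << std::hex << value;
    return out.str();
}

// Stand-in for glFramebufferTexture2D: the second argument is an enum.
std::string framebufferTexture2D(std::uint32_t, std::uint32_t attachment)
{
    return "use enum " + toHex(attachment) + " as attachment point";
}

// Stand-in for glVertexAttrib2dv: the second argument is a pointer.
std::string vertexAttrib2dv(std::uint32_t, std::uint32_t pointerBits)
{
    return "dereference address " + toHex(pointerBits);
}

// Hypothetical slot number the client uses for glFramebufferTexture2D.
constexpr std::size_t kSlot = 1;

// The client's idea of the table and the (broken) server's idea disagree.
const DispatchTable clientTable = {vertexAttrib2dv, framebufferTexture2D};
const DispatchTable serverTable = {framebufferTexture2D, vertexAttrib2dv};

// Look the handler up by slot, the way an indirect request is dispatched.
std::string dispatch(const DispatchTable &table, std::size_t slot,
                     std::uint32_t a0, std::uint32_t a1)
{
    return table.at(slot)(a0, a1);
}
```

Dispatching slot kSlot with (kGlFramebuffer, kGlColorAttachment0) through clientTable treats the second argument as an enum, while the same request through serverTable treats it as an address. In the real server that address is dereferenced, which is where the backtrace ends.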
Comment 24 Hrvoje Senjan 2013-06-30 18:20:57 UTC
I'll try the patches. But at the earliest sometime tomorrow
Comment 25 Thomas Lübking 2013-06-30 19:42:49 UTC
(In reply to comment #23)
> -    if (!s_fbo && GLRenderTarget::supported())
> +    if (!s_fbo && GLRenderTarget::supported() &&
> GLPlatform::instance()->isDirectRendering())

Leaving the crash aside, this does *not* fix the issue of the GMA945 (and apparently also the i830) dropping into (rather unwanted anyway) indirect rendering, which seems to be the root cause of the actual troubles.

Consequently, the patch also does *not* fix the issue that this indirect context created by glXCreateContextAttribsARB (with an only "0" attribute list) does not paint/update on the GMA945 at all. See comments #21 & #22

Whatever the issue in the server for this might be, the current state breaks a (still quite important?) target architecture - if in doubt, i can open a new bug ;-)
Comment 26 Hrvoje Senjan 2013-07-01 01:45:27 UTC
(In reply to comment #25)
> Whatever the issue in the server for this might be, the current state breaks
> a (still quite important?) target architecture - i can open a new bug in
> doubt ;-)

Maybe i'm not assuming correctly, but does this affect all of OpenGL legacy?

When trying https://bugs.kde.org/show_bug.cgi?id=314602#c48
->kwin(2290) KWin::GlxBackend::initBuffer: Buffer visual (depth  24 ): 0x "29"
OpenGL vendor string:                   ATI Technologies Inc.
OpenGL renderer string:                 ATI Mobility Radeon HD 3650
OpenGL version string:                  2.1 (3.3.11672 Compatibility Profile Context)
OpenGL shading language version string: 
Driver:                                 Catalyst
Driver version:                         2.1
GPU class:                              R600
OpenGL version:                         2.1
GLSL version:                           0.0
X server version:                       1.12.4
Linux kernel version:                   3.9.8
Direct rendering:                       no
Requires strict binding:                yes
GLSL shaders:                           yes
Texture NPOT support:                   yes
Virtual Machine:                        no
kwin(2290) KWin::GlxBackend::init: Direct rendering: false
kwin(2290) KWin::checkGLError: GL error ( Init ):  "GL_INVALID_ENUM" 
kwin(2290): OpenGL 1 compositing setup failed

With KWIN_DIRECT_GL=1 i get working openGL1; with LIBGL_ALWAYS_INDIRECT=1/0 there is no openGL1 compositing; with "as is" there is also no openGL1 compositing (the output above)

Will test patch from comment 21
Comment 27 Hrvoje Senjan 2013-07-01 02:12:56 UTC
(In reply to comment #26)
> Will test patch from comment 21
Did not change anything here (that is with ati card/catalyst)
Comment 28 Thomas Lübking 2013-07-01 13:00:22 UTC
(In reply to comment #27)
> Did not change anything here (that is with ati card/catalyst)

"Older"™ Catalyst drivers (most of those before 9.0, as reported by the ati config tool) "legally" use indirect rendering because of various reported bugs.

It seems indirect rendering broke "somewhen" (in either kwin or mesa or wherever) and the (i hope: falsely) acquired indirect context only exposed that.
The "patches" from comment #21 or #22 should lead to a direct context and a working system on the i830 chip. Same for the OSS radeon driver.
...Hopefully.

(In reply to comment #26)
> Maybe i'm not assuming correctly, but this affects all of opengl legacy?
No, you are assuming correctly.
At least it seems every driver that has GLX_ARB_create_context (presumably: *all*) and is called with the "legacy" attribute list might struggle on this by getting you an indirect context (doesn't happen on the nvidia blob, but afaics the blob simply never creates indirect contexts in the first place)
Comment 29 Hrvoje Senjan 2013-07-01 15:34:02 UTC
(In reply to comment #14)
> Here, the last good SHA was 1035a20b4ec89779992514cb4141a7f8af312873, and it
This one also creates a non-working openGL1 with legacy catalyst

kwin(11923): Could not find a framebuffer configuration for depth 32. 
kwin(11923) KWin::OpenGLBackend::setFailed: Creating the OpenGL rendering failed:  "Could not initialize the drawable configs" 
kwin(11923): Failed to initialize compositing, compositing disabled
the same with enforcing KWIN_DIRECT_GL=1
Comment 30 Thomas Lübking 2013-07-01 18:49:17 UTC
(In reply to comment #29)
> > Here, the last good SHA was 1035a20b4ec89779992514cb4141a7f8af312873, and it
> This one also creates a non-working openGL1 with legacy catalyst
> 
> kwin(11923): Could not find a framebuffer configuration for depth 32. 

Likely bug #317972 - unrelated ;-)
Comment 31 Fredrik Höglund 2013-07-01 21:11:31 UTC
(In reply to comment #22)
> const int attribs_legacy[] = {
>             GLX_CONTEXT_FLAGS_ARB, GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB,
> GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0
>         };
> 
> Seems to do instead as well, but i've no idea on "legacy" implications.

This is not a valid attribute list, so it causes glXCreateContextAttribs() to fail and kwin to fall back to glXCreateNewContext(). The attribute list is not even zero terminated.
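
Attribute lists for glXCreateContextAttribsARB are read as (name, value) pairs, and the terminating 0 has to appear in a name position. A minimal sketch of that rule, applied to the list from comment 22, follows. The numeric constants are copied from the GLX_ARB_create_context and GLX_ARB_create_context_profile extension specifications so that the example needs no GL headers. checkAttribList and ListReport are hypothetical helpers, not part of KWin or Mesa, and the set of accepted names is limited to the four this sketch knows about.

```cpp
#include <cstddef>
#include <vector>

// Attribute names (values from the extension specifications).
constexpr int kContextMajorVersion = 0x2091;      // GLX_CONTEXT_MAJOR_VERSION_ARB
constexpr int kContextMinorVersion = 0x2092;      // GLX_CONTEXT_MINOR_VERSION_ARB
constexpr int kContextFlags = 0x2094;             // GLX_CONTEXT_FLAGS_ARB
constexpr int kContextProfileMask = 0x9126;       // GLX_CONTEXT_PROFILE_MASK_ARB
// Attribute values (bit flags), not names.
constexpr int kForwardCompatibleBit = 0x0002;     // GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB
constexpr int kCompatibilityProfileBit = 0x0002;  // GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB

struct ListReport {
    bool terminated;  // a 0 was found in a name position
    bool namesKnown;  // every name position held a name this sketch recognises
};

bool isKnownName(int name)
{
    return name == kContextMajorVersion || name == kContextMinorVersion
        || name == kContextFlags || name == kContextProfileMask;
}

// Walk the list two entries at a time. Unlike a real GLX implementation,
// this walker is bounded by the vector size and never reads past the end.
ListReport checkAttribList(const std::vector<int> &list)
{
    ListReport report{false, true};
    for (std::size_t i = 0; i < list.size(); i += 2) {
        if (list[i] == 0) {
            report.terminated = true;
            break;
        }
        if (!isKnownName(list[i]))
            report.namesKnown = false;
    }
    return report;
}
```

For the list from comment 22 the walker sees the pair (GLX_CONTEXT_FLAGS_ARB, 0x2), then treats the next 0x2 as a name with the trailing 0 as its value, and runs out of entries without finding a terminator. A list such as {kContextProfileMask, kCompatibilityProfileBit, 0} passes both checks in this sketch.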

(In reply to comment #25)
> (In reply to comment #23)
> > -    if (!s_fbo && GLRenderTarget::supported())
> > +    if (!s_fbo && GLRenderTarget::supported() &&
> > GLPlatform::instance()->isDirectRendering())
> 
> Leaving the crash aside, this does *not* fix the issue of the GMA945 (and
> apparently also the i830) dropping into (rather unwanted anyway) indirect
> rendering what seems to be the root cause of effective troubles.

The patch is meant to fix the issue this bug report is about.

I find it pretty annoying when people discuss different problems in the same bug report, since it inevitably leads to confusion about which issue comments refer to. And you don't know what has actually been fixed when the bug report is closed. 

> By this the patch does also *not* fix the issue, that this indirect context
> created by glXCreateContextAttribsARB (with an only "0" attribute list) does
> not paint/update on the GMA945 at all. See comments #21 & #22
> 
> Whatever the issue in the server for this might be, the current state breaks
> a (still quite important?) target architecture - i can open a new bug in
> doubt ;-)

That regression has been in master for at least two months. During that time only a single person has reported it, and that person (a KDE developer) was unwilling to run kwin in konsole and answer the simple question of whether kwin prints any error messages or not.

So an important architecture for whom I wonder? I'm not sure it is for the ones who still use it.

(In reply to comment #28)
> At least it seems every driver that has GLX_ARB_create_context (presumably:
> *all*) and calls it with the "legacy" attribute list might struggle on this
> by getting you an indirect context (doesn't happen on the nvidia blob, but
> afaics the blob simply never creates indirect contexts in the first place)

If glXCreateNewContext() is working for you, then maybe you should try stepping through glXCreateContextAttribs() in gdb and see if you can figure out where things go wrong.

Keep in mind that glXIsDirect() returns false when it fails, for example because the context is invalid.

I would also look into the stream of GL_INVALID_ENUM and GL_INVALID_OPERATION errors in the debug output.
Comment 32 Thomas Lübking 2013-07-01 22:30:24 UTC
(In reply to comment #31)
see bug #321843
Comment 33 Christoph Feck 2013-07-02 00:22:34 UTC
I can confirm that the patch from comment #21 works on my machine. It fixes both the X server crash with Mesa 9.1.3 as well as the display freezing I saw with older Mesa 9.0.x from openSUSE 12.3. Didn't test the other patches.

> answer the simple question of whether kwin prints any error messages or not.

I told you that the display freezes, so I couldn't see any output. Only rebooting resolved it.
Comment 34 Fredrik Höglund 2013-07-02 00:57:31 UTC
(In reply to comment #33)
> I told you that the display freezes, so I couldn't see any output. Only
> rebooting resolved it.

IIRC I asked which commit introduced the regression, if there were any errors printed on stderr, if it happens with both kwin and kwin_gles, and whether the compositing type setting made any difference.

You replied that you couldn't test any of these things because you had now installed an older snapshot. And that was the last I heard from you on the subject.

Toggling compositing with the hotkey should get the display back, and reveal any debug output.
Comment 35 Hrvoje Senjan 2013-07-02 19:03:27 UTC
Upstream report
https://bugs.freedesktop.org/show_bug.cgi?id=66359
(about original issue - crash)