Bug 241402

Summary: kwin freezes when changing related settings in systemsettings while compositing is active
Product: [Plasma] kwin
Reporter: Rainer Kastl <rainer.kastl>
Component: compositing
Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED UPSTREAM    
Severity: normal
CC: anshulajain, bjoern, both1970, cruzki123, disp.reg.bugs.kde, echukwuogor, finex, fredrik, ggrabler, glad.deschrijver, iansamit, ipstanistreet, jay, jpsinthemix, kate_baggins, kde, kde, marcus, mboquien, micuintus, msnkipa, over1pixel, pete, rahul, rdieter, ronald, schiv, shane, sven.burmeister, wengxt, wstephenson, yogeshm.007
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit:
Version Fixed In:
Attachments: kwin output
Log from the freeze
partial .xsession-errors log
Patch to fix the freezes in combination with reverting rev 1137668
241402 patch
kdebase-workspace-4.5.1-compositingprefs_detect.patch
glx detect/contect setting test code
Another possible fix
Idea to explore for sidestepping Mesa bug(s)
compositing settings fix and resource cleanup

Description Rainer Kastl 2010-06-11 11:37:23 UTC
Created attachment 47891 [details]
kwin output

Version:           unspecified (using Devel) 
OS:                Linux

I'm using [kde-unstable] from archlinux (4.4.85/KDE 4.5 Beta 2).

Whenever I change settings in systemsettings related to kwin (like switching the default tiling layout), kwin freezes. The mouse cursor is still visible and can be moved, but clicks aren't registered and the display does not get updated anymore. Video stops playing, but audio continues. This only happens when compositing is activated.

Attached is the kwin log; the effects start to unload when I press Ctrl-C in the terminal.

Reproducible: Always

Steps to Reproduce:
1. Make sure compositing is activated.
2. Start systemsettings and change a value related to kwin.
3. Click apply.

Actual Results:  
Display stops updating, clicks are not registered.

Expected Results:  
No freeze. ;)

xorg-server: 1.8.1-1
xf86-video-ati: 6.13.0-1
mesa, ati-dri, libgl: 7.8.1-3
Comment 1 Rainer Kastl 2010-06-11 11:40:23 UTC
Only happens when using OpenGL, XRender is fine.
Comment 2 Carlos Berroterán 2010-06-12 02:35:58 UTC
I'm getting the same problem. Using Arch and the kde-unstable repository with an Intel GMA965 (X3100). I get the freeze if I try to change the fonts or the window decoration, too.

xf86-video-intel: 2.11.0-2
The other packages are the same as the reporter's.
Comment 3 Rainer Kastl 2010-06-22 15:21:30 UTC
Created attachment 48226 [details]
Log from the freeze

Log with debug messages.
Comment 4 Christoph Feck 2010-06-22 15:33:56 UTC
I updated this weekend from openSUSE 11.2 (which had xserver 1.7.x, Mesa 7.6 I think) to openSUSE 11.3 RC1 (which has xserver 1.8, Mesa 7.8), WITHOUT recompiling KDE from trunk, and I sometimes get freezes now, too. I am pretty sure it is an upstream bug (in X11 server or Mesa).
Comment 5 Ekeluo Chukwuogor 2010-07-12 22:20:01 UTC
I'm about to commit ****** because of this. As a workaround I open warzone2100 (just launch it by typing warz in krunner, no need to see it), then exit after the menu shows. This gets kwin back on track but does nothing good for memory consumption. Just about anything related to kwin (style, theme, effect, decoration) causes this for me. Running RC1 on Kubuntu 10.04.1.
Comment 6 Thomas Lübking 2010-07-12 22:41:19 UTC
just suspend/resume compositing instead (SHIFT+ALT+F12)

if this is reproducible for you, as it is for the OP, please try to just call

qdbus org.kde.kwin /KWin reconfigure

also
-> test another decoration and
-> disable all effect plugins to see whether the issue remains then.
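For repeated testing, the reconfigure call above can be scripted. This is a minimal sketch in Python, assuming `qdbus` is on the PATH. The helper names `reconfigure_command` and `trigger_reconfigure` are made up for illustration; only the qdbus invocation itself comes from this comment.

```python
import subprocess


def reconfigure_command():
    """Return the qdbus call from this comment as an argument list."""
    return ["qdbus", "org.kde.kwin", "/KWin", "reconfigure"]


def trigger_reconfigure(run=subprocess.run):
    """Ask KWin to reload its configuration.

    Returns True if qdbus exited with status 0. `run` is injectable
    (hypothetical convenience) so the helper can be exercised without
    a running KDE session.
    """
    result = run(reconfigure_command(), capture_output=True, text=True)
    return result.returncode == 0
```

Calling `trigger_reconfigure()` once with compositing active and once with it suspended should show whether the reconfigure path alone is enough to trigger the freeze.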

"style" (and maybe "theme" as of plasma-desktop "theme" as well) are not kwin related, btw.
Comment 7 Ekeluo Chukwuogor 2010-07-22 14:26:35 UTC
Calling 'qdbus org.kde.kwin /KWin reconfigure' causes it every time. Changing decos didn't affect it (tried bespin, polyester).

Thanks for the suspend/resume tip though.
Comment 8 Weng Xuetian 2010-08-12 21:29:39 UTC
Happened on my ATI HD 3450 with the open source driver (kernel 2.6.34, 2.6.35, mesa 7.8.1 or 7.9-git)
But kwin works well without this issue with my Nvidia Go 6200 (official driver).

...But this did not happen with 4.4, though. Maybe some new OpenGL API usage introduced in 4.5 causes this.
Comment 9 Thomas Lübking 2010-08-12 21:43:56 UTC
see comment #4
Comment 10 Weng Xuetian 2010-08-12 21:55:39 UTC
(In reply to comment #9)
> see comment #4

I also think it's an upstream issue, but can this problem be solved in a similar way as bug #243181 (https://bugs.kde.org/show_bug.cgi?id=243181)?

I think quite a lot of cards will be affected by this problem...
Comment 11 Thomas Lübking 2010-08-12 22:14:49 UTC
If the bug is in mesa we'd likely have to blacklist _every_ driver except nvidia... 
doesn't sound like an option to me :-)

also the blacklist is intended for weak GPUs, not broken drivers.

plus at least for the intel driver it seems as if there's a really severe issue that breaks things when using two GL contexts in a short time frame (i.e. you can segfault kwin by launching glxgears or so...), so the driver needs to be fixed anyway, since this is no longer a kwin issue...
Comment 12 Martin Flöser 2010-08-12 23:17:21 UTC
> also the blacklist is intended for weak GPUs, not broken drivers.
It's also meant for broken drivers, that's why the version is included. But 
it's done in a way that assumes that a new driver version will fix the 
problems.
Comment 13 Ian Smith 2010-08-14 08:51:29 UTC
I have the same problem.

BTW, there is some confusion out in the forums between this bug and https://bugzilla.novell.com/show_bug.cgi?id=615649. Is there a relationship?
Comment 14 Weng Xuetian 2010-08-14 10:03:11 UTC
I'm not sure... but my card is ATI HD 3450.

Has any nvidia user who uses nouveau encountered this problem? Maybe this can help us determine whether it is related to mesa or a specific dri driver.
Comment 15 John Stanley 2010-08-15 11:25:15 UTC
(In reply to comment #14)
> I'm not sure... but my card is ATI HD 3450.
> 
> Is any nvidia user who use nouveau encounter this problem? Maybe this can help
> us determine whether it is related to mesa or a specific dri driver.

(In reply to comment #0)

I build i686-GNU/Linux systems, and I saw the same behavior with kde 4.4.95 (Linux-2.6.35/gcc-4.5.1/glibc-2.12.1/xorg-server 1.8.2/Mesa-7.8.2/xf86-video-intel.2.12.0), and now see the same with kde 4.5.0. Downgrading the intel driver to 2.11.0 and 2.10.0 had no effect.

Note that for kde 4.4.{2,3,4,5} (Linux-2.6.33.7/gcc-4.4.3/glibc-2.11.1/xorg-server 1.8.1/Mesa-7.8.1/xf86-video-intel.2.10.0), this problem was not present.

I have an Intel Corporation 82915G/GV/910GL Integrated Graphics Controller.
Comment 16 John Stanley 2010-08-15 13:01:20 UTC
Created attachment 50575 [details]
partial .xsession-errors log 

Attached is a partial .xsession-errors log related to this issue (there were no associated errors in Xorg.0.log)
Comment 17 John Stanley 2010-08-15 13:05:02 UTC
Comment on attachment 50575 [details]
partial .xsession-errors log 

This is a segment of the partial .xsession-errors file resulting when this bug occurs. There were no associated Xorg.0.log errors.
Comment 18 John Stanley 2010-08-15 13:11:31 UTC
Oh, forgot to give the Qt version: with kde 4.4.95 I used Qt-4.7rc2, and with kde 4.5.0 I downgraded to Qt-4.6.3 hoping that would resolve the problem, but nope.
Comment 19 Thomas Lübking 2010-08-15 13:46:40 UTC
(In reply to comment #18)
> Oh, forgot to give the Qt version: with kde 4.4.95 I used Qt-4.7rc2, and with
> kde 4.5.0 I downgraded to Qt-4.6.3 hoping that would resolve the problem, but
> nope.

as comment #4 mentions:
this has nothing to do with the KDE or Qt version but is somewhere in xorg-server, mesa or the driver.
Up-or-downgrading KDE won't help.
Try to disable kms by passing "i915.modeset=0" to the kernel in grub
Comment 20 Glad Deschrijver 2010-08-16 00:03:40 UTC
I have the same bug as the original poster, with one difference: it does not necessarily occur when using systemsettings; it happens a few minutes after I start KDE, whatever I do (or even if I do nothing). The keyboard also stops working. In Juk the music continues to play until the end of the song, but the following song is not started. When I remove ~/.kde and restart KDE it is even worse, because then the freeze happens even sooner and the mouse cannot be moved anymore.

I am using KDE 4.5.0 from KDEmod/Arch Linux on x86_64 with kernel 2.6.34 and an NVIDIA GeForce GT 240/PCI/SSE2, OpenGL version 3.3.0 NVIDIA 256.44, using the proprietary nvidia driver.

Disabling the blur desktop effect solved the problem until I revisited the "Desktop Effects" configuration in systemsettings and tried to close systemsettings after changing nothing (I don't know if this is persistent since my motivation to test this further is abysmal after already having rebooted 10 times this evening).

I had none of these problems in KDE 4.4.5. The Qt version was and is 4.6.3, xorg-server version was and is 1.8.1, mesa version was and is 7.8.2. Yes, these three packages didn't change during the upgrade from KDE 4.4.5 to KDE 4.5.0; they were already installed since 4 July and I upgraded to KDE 4.5.0 only today, 15 August (I never used the betas or RCs). This contradicts the conclusion of comment #4.
Comment 21 Glad Deschrijver 2010-08-16 00:04:19 UTC
*** This bug has been confirmed by popular vote. ***
Comment 22 Thomas Lübking 2010-08-16 01:09:09 UTC
comment #20 sounds more like bug #247839 (the failure on closing systemsettings could be just random)

However, given that mouse and kbd (including ctrl+alt+backspace, "zapping") are inoperative (contradicting this OP) and you won't be able to "fix" it by suspending/resuming compositing (?!), this is a server halt.

if you can ssh into that machine from another one, there's a good chance that you can either kill the server or at least get a "clean" shutdown.
also you can look at the X11 cpu usage and check dmesg for Xid entries (gpu error lines likely say "NVRM")
Also have a short look at your gpu temp (nvidia-settings -q GPUCoreTemp) - just ruling out it's gotten hot wherever you live ;-)

Next you should determine that it's caused/induced by kwin's (GL) compositing (deactivate it or launch another WM, like "openbox --replace &")
Iff this "fixes" the issue, just deactivate all effects to bisect if a specific one is causing this or it's the general rendering.
Comment 23 John Stanley 2010-08-16 13:05:19 UTC
(In reply to comment #22)

Hi folks,
Ok, for kde-4.5.0, I downgraded all of X (libs/server/drivers), libdrm, and Mesa to the configuration I now use w/o problems under kde-4.4.5. The issue under v4.5.0 remains. My feeling is that it's most likely a kdelibs/kdebase-workspace issue. Also, in the .xsession-errors log I posted earlier, the message:
  systemsettings(3100) EventListener::eventFilter: User of KWidgetItemDelegate should not delete widgets created by createWidgets!

occurs many times when this 'hang' happens. Perhaps it's related to bug 238864, comment #7?
Comment 24 Glad Deschrijver 2010-08-17 00:37:34 UTC
In reply to comment #22:

I only have one computer so I cannot ssh into it from another machine.  Top doesn't show disturbing behavior of X11 before this happens (it happens totally unexpectedly, there is no exaggerated CPU usage before it happens, nor any sluggishness).  I cannot find Xid entries in the dmesg or other log files.  Yesterday after the 10 reboots, I disabled the blur effect and the freeze only occurred once when closing systemsettings (that was immediately after rebooting for the 10th time), but after that I worked for more than one hour without a freeze.  Today the freeze only happens when I activate the blur effect (a few minutes after activating it), when I disable the blur effect the freeze doesn't occur anymore.
Comment 26 John Stanley 2010-08-17 13:55:01 UTC
I'm now in the process of rebuilding 4.5 on a nicely working intel 915 system currently running v4.4.5, without any xorg/libdrm/mesa changes. We'll see how it goes. As far as bug #243181, I suspect it is not related. Anyway, I don't really think "blacklisting" video cards is an appropriate approach; a better approach would be to "blacklist" kde-4.5, downgrade to 4.4.5, and wait for 4.5.2 or 4.5.3. Anyway, 'nuf of that. In actuality, for me (w/intel 915 card), all desktop effects I've tried are working fine as usual (luv the rolling cube for desktop navigation); it's simply that whenever I change a desktop effects setting via "apply", the system hangs. When I keyboard-toggle desktop effects off and then back on, all is well, with changes intact. Nevertheless, in my judgment 4.5 is not stable enough for general use. See bug #246498 and #247839: a lot of people are having similar problems on a variety of video cards.
Comment 27 Thomas Lübking 2010-08-17 16:26:53 UTC
(In reply to comment #26)
> As far as bug #243181, I suspect it not related.
you're not experiencing bug #20 - that's entirely different and has nothing to do with the original bug. it should not even be here :-)

> Anyway, I don't really think "blacklisting" video cards is an appropriate approach; 
Frankly, I personally HATE it but it was the last solution we (ok martin ;-P ) could get into the release when becoming aware of the amount of trouble causing drivers/open gl implementations regarding those two shader effects.

what would be required was an external stress test application to see what your gpu/driver combination can do atm. This is however NOT related to this bug at all.

> a better approach would be to "blacklist" kde-4.5, downgrade to 4.4.5, and wait for 4.5.2 or 4.5.3.
That'd then be your distro's job, and actually some distros seem to do so (because of these issues)

> its simply that whenever I change a desktop effects setting via "apply"
yes, that's this bug - see comments #4 & #6

bug #246498 looks like this one and also (apparently) only affects intel users (ignoring comment #20 here, which is not related) so it might be a dupe.
(there are other intel related bug reports regarding two gl contexts in a short time frame, like playing an opengl game or so)
Comment 28 Thomas Lübking 2010-08-18 14:12:46 UTC
*** Bug 246498 has been marked as a duplicate of this bug. ***
Comment 29 Weng Xuetian 2010-08-18 17:30:54 UTC
#27
Also ati with open source drivers...
My friend and I both suffer from this problem, but we use different ati cards: ati hd 3450 and 4300.
Comment 30 Thomas Lübking 2010-08-18 17:42:43 UTC
does anybody encountering this issue have
- a second computer
- sshd on the "broken" one
- basic gdb knowledge?

when the freeze occurs (apparently those who can trigger it at all can trigger it reliably)
- ssh into the "frozen" machine
- check for the kwin process: "ps -A | grep kwin"
- attach to it: start "gdb", then enter "attach $pid" and wait until debug libs etc. are loaded
- get a backtrace: "bt" (if you've only a text terminal on the non frozen machine, you can run "gdb 2>&1 | tee gdb.log" to dump the gdb session into a log file as well)
- don't forget to "detach", then "quit" gdb and unfreeze the frozen one

thanks
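The gdb steps above could also be run non-interactively over ssh. The following is a rough sketch in Python, not a tested recipe for this bug. It assumes gdb's `-p`, `-batch` and `-ex` options (gdb detaches from an attached process when it exits), and the helper names `find_kwin_pids`, `gdb_backtrace_command` and `collect_backtrace` are invented for this example.

```python
import subprocess


def find_kwin_pids(ps_output):
    """Extract PIDs of processes whose command name starts with "kwin".

    Expects the default `ps -A` layout: PID TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # The header line is skipped because "PID" is not a number.
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1].startswith("kwin"):
            pids.append(int(fields[0]))
    return pids


def gdb_backtrace_command(pid):
    """Build a gdb call that attaches to `pid`, prints a backtrace and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtrace(pid, log_path="gdb.log", run=subprocess.run):
    """Run gdb against `pid` and save its stdout and stderr to `log_path`."""
    result = run(gdb_backtrace_command(pid), capture_output=True, text=True)
    with open(log_path, "w") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    return result.stdout
```

On the frozen machine, the output of `ps -A` would be passed to `find_kwin_pids`, and each PID it returns to `collect_backtrace`. The resulting gdb.log plays the same role as the `tee gdb.log` suggestion above.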
Comment 31 Weng Xuetian 2010-08-18 17:54:24 UTC
In my case there's no need for sshd, because ctrl + alt + Fn works....

And actually, kwin doesn't seem to freeze: if you use your mouse to do something (though it will not be displayed right), then after pressing alt+shift+f12 twice I can see the result of my action...

Another piece of information: if qdbus org.kde.kwin /KWin reconfigure is run again after the freeze, kwin crashes and dmesg shows:
radeon 0000:01:00.0: r600_cs_track_check:280 mask 0x0000000F | 0x0000000F no cb for 0
radeon 0000:01:00.0: r600_packet3_check:1108 invalid cmd stream 526
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
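To pull lines like these out of a long dmesg dump, a small filter is enough. This is only a sketch in Python: the function name `gpu_error_lines` and the matching rules (DRM lines marked `*ERROR*`, or lines logged by the named driver) are assumptions based on the three lines above, not a complete list of what the radeon driver can print.

```python
import re

# Optional "[  123.456789] " kernel timestamp at the start of a dmesg line.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def gpu_error_lines(dmesg_text, driver="radeon"):
    """Return dmesg lines that look like kernel graphics errors.

    A line is kept if it is a DRM error (contains "[drm:" and "*ERROR*")
    or if it was logged by the given driver (starts with "<driver> ").
    Timestamps are stripped from the returned lines.
    """
    hits = []
    for line in dmesg_text.splitlines():
        text = _TIMESTAMP.sub("", line.strip())
        is_drm_error = "[drm:" in text and "*ERROR*" in text
        is_driver_line = text.startswith(driver + " ")
        if is_drm_error or is_driver_line:
            hits.append(text)
    return hits
```

Passing `driver="i915"` instead would be the obvious variation for the intel reports in this thread, again as an assumption about how that driver prefixes its messages.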
Comment 32 Thomas Lübking 2010-08-18 18:31:44 UTC
(In reply to comment #31)
> for my case, no need to sshd, because ctrl + alt + fn works....
there'd be a chance that moving to a VT will resolve the freeze, but since alt+shift+f12 and WM actions are intercepted you'd only see the eventloop anyway...
 
> Another information is, qdbus org.kde.kwin /KWin reconfigure is running again
do you have that backtrace? does kwin crash for you after running (and exiting) some opengl applications (w/o deactivating compositing)?

> after freeze, kwin will crash and dmesg will shows:
does the repainting halt if you disable direct rendering (advanced tab) or pass "nomodeset" to the kernel in grub? (be aware that there're reports for this causing visual glitches, notably on font rendering, see this report on mesa: https://bugs.freedesktop.org/show_bug.cgi?id=28327)
Comment 33 John Stanley 2010-08-19 08:04:34 UTC
Followup to Comment #26:
Finished the build, and how depressing... The desktop settings 'apply' still fails in the same manner, but it's much worse: even though desktop effects are auto-enabled on first boot, and after login the screen fades in properly, desktop grid fails to 'take' most of the time and desktop cube doesn't work at all. Also, after applying a desktop effect settings change, the pc frequently dropped back to the kdm login or hung hard. The pc formerly had kde-4.4.5, with Linux-2.6.33.7, gcc-4.4.3, glibc-2.11.1, Qt-4.6.3, xorg-server-1.8.0, Mesa-7.8.1, libdrm-2.4.20, and xf86-video-intel.2.10.0, and all worked great. All I did was remove v4.4.5 and build v4.5 (upgrading attica-0.1.3 to v0.1.4). The graphics card is an Intel Corporation 82865G Integrated Graphics Controller (rev 02).

The pc I first built kde-4.5.0 on (my earlier comments) has Linux-2.6.35, gcc-4.5.1, glibc-2.12.1, Qt-4.6.3, xorg-server-1.8.2, Mesa-7.8.2, libdrm-2.4.21, and xf86-video-intel.2.12.0.

I know that Linux video is currently a monumental mess, so this could simply be latent bugs in kernel/libdrm/Mesa/Xorg that have only surfaced with kde-4.5.0. Unfortunately, my kde builds are w/o debug so I can't use gdb.

I'll probably be short on time presently, and I want to resurrect the v4.4.5 system, but let me know if there's anything I might do to help.
Comment 34 Rainer Kastl 2010-08-19 10:36:03 UTC
Since I haven't had any free time available since reporting this bug, I didn't report the bug upstream. Can anyone make sure it is reported upstream, or file it there if it isn't?

Thanks!
Comment 35 Weng Xuetian 2010-08-19 10:42:33 UTC
Actually I think most people don't know how to describe this problem to upstream, especially if it is related to the video card driver... Does anyone have an idea which OpenGL API causes this problem? Which API is used in kde 4.5.0, but not in kde 4.4.5?
Comment 36 both1970 2010-08-21 11:52:17 UTC
I have this problem on Slackware 13.1 using 4.5. I can return to normal if I run DISPLAY=:0 kwin --replace from console.
Comment 37 Thomas Lübking 2010-08-22 13:04:09 UTC
for a shot in the dark:
can anybody who is able to reproduce this try to revert this commit:
    http://websvn.kde.org/?view=revision&revision=1137490
recompile and restart kwin, then try again?
Comment 38 John Stanley 2010-08-22 22:55:19 UTC
(In reply to comment #37)
> for a shot in the dark:
> can anybody being able to reproduce this try to revert this commit:
>     http://websvn.kde.org/?view=revision&revision=1137490
> recompile and restart kwin, then try again?

Yup, tried that. I simply created a patch to revert kwinglutils.cpp to v4.4.5, and it has no effect, so the problem unfortunately lies elsewhere. I also tried Linux-2.6.36-rc1, no change there either. I'm now about to rebuild v4.5.0 w/debug so I can use gdb, but it'll take several days. I do have a short backtrace which I got by rebuilding kdebase/-workspace/-runtime w/debug:

(gdb) bt
#0  0xffffe430 in __kernel_vsyscall ()
#1  0xb59764b1 in select () at ../sysdeps/unix/syscall-template.S:82
#2  0xb698a3d3 in qt_safe_select(int, fd_set*, fd_set*, fd_set*, timeval const*) () from /usr/lib/libQtCore.so.4
#3  0xb698e609 in QEventDispatcherUNIX::select(int, fd_set*, fd_set*, fd_set*, timeval*) () from /usr/lib/libQtCore.so.4
#4  0xb5f21dcc in ?? () from /usr/lib/libQtGui.so.4
#5  0xb698f476 in QEventDispatcherUNIXPrivate::doSelect(QFlags<QEventLoop::ProcessEventsFlag>, timeval*) ()
   from /usr/lib/libQtCore.so.4
#6  0xb69900f6 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#7  0xb5f22076 in ?? () from /usr/lib/libQtGui.so.4
#8  0xb6961889 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#9  0xb6961afa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#10 0xb69663df in QCoreApplication::exec() () from /usr/lib/libQtCore.so.4
#11 0xb5e72ab7 in QApplication::exec() () from /usr/lib/libQtGui.so.4
#12 0xb77093bb in kdemain (argc=3, argv=0xbfc529a4)
    at /home/bld/kdebase-workspace-4.5.0-jps_src/kdebase-workspace-4.5.0/kwin/main.cpp:531
#13 0x0804874b in main (argc=3, argv=0xbfc529a4) at /home/bld/kdebase-workspace-4.5.0-jps_src/build/kwin/kwin_dummy.cpp:3
(gdb) detach
Detaching from program: /usr/bin/kwin, process 2221
(gdb) quit
Comment 39 Thomas Lübking 2010-08-22 23:28:11 UTC
yeah it keeps hanging  in the eventfilter.
next candidate to revert would then be http://websvn.kde.org/?view=rev&revision=1137668

check your glx version (NOT the glx client version) (glxinfo | grep -i version) and esp. if it's < 1.3 (intel -> yes :) give it a try =\
you'll however lose direct rendering until it's "officially" available ... "if"  :-(
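The version check above can be done mechanically on saved glxinfo output. This is a minimal sketch in Python, assuming the "server glx version string: X.Y" line format that glxinfo prints. The function names `server_glx_version` and `below_glx_1_3` are made up for this example.

```python
import re


def server_glx_version(glxinfo_text):
    """Return the server GLX version as (major, minor), or None if absent.

    Only the "server glx version string" line is read; the client
    version line is deliberately ignored.
    """
    match = re.search(
        r"^server glx version string:\s*(\d+)\.(\d+)",
        glxinfo_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def below_glx_1_3(glxinfo_text):
    """True if a server GLX version was found and it is lower than 1.3."""
    version = server_glx_version(glxinfo_text)
    return version is not None and version < (1, 3)
```

If `below_glx_1_3` returns True for a system, reverting the revision named above would be the experiment to try. If it returns False, the revert is less likely to matter, going by the reasoning in this comment.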
Comment 40 Thomas Lübking 2010-08-22 23:30:30 UTC
errr.. sorry in case that was too ambiguous:
waiting in the eventloop until sth. interesting happens is what applications usually do, no dead- or livelock, nothing frozen on the CPU (this is why shift+alt+f12 is still intercepted)
Comment 41 John Stanley 2010-08-23 07:11:45 UTC
(In reply to comment #40)
> errr.. sorry in case that was too ambigious:
> waiting in the eventloop until sth. interesting happens is what applications
> usually do, no dead- or livelock, nothing frozen on the CPU (this is why
> shift+alt+f12 is still intercepted)

glx ver is 1.4. Here's the output from glxinfo:
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGI_swap_control, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGIX_visual_select_group, GLX_INTEL_swap_event
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory, 
    GLX_MESA_copy_sub_buffer, GLX_MESA_swap_control, 
    GLX_MESA_swap_frame_usage, GLX_OML_swap_method, GLX_OML_sync_control, 
    GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGI_video_sync, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap, 
    GLX_INTEL_swap_event
GLX version: 1.4
GLX extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
    GLX_MESA_swap_control, GLX_OML_swap_method, GLX_OML_sync_control, 
    GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGI_video_sync, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap, 
    GLX_INTEL_swap_event
OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 915G GEM 20100328 2010Q1 x86/MMX/SSE2
OpenGL version string: 1.4 Mesa 7.8.2
OpenGL extensions:
    GL_ARB_copy_buffer, GL_ARB_depth_texture, GL_ARB_draw_buffers, 
    GL_ARB_draw_elements_base_vertex, GL_ARB_fragment_program, 
    GL_ARB_half_float_pixel, GL_ARB_map_buffer_range, GL_ARB_multisample, 
    GL_ARB_multitexture, GL_ARB_pixel_buffer_object, GL_ARB_point_parameters, 
    GL_ARB_point_sprite, GL_ARB_provoking_vertex, GL_ARB_shader_objects, 
    GL_ARB_shading_language_100, GL_ARB_shading_language_120, GL_ARB_shadow, 
    GL_ARB_sync, GL_ARB_texture_border_clamp, GL_ARB_texture_compression, 
    GL_ARB_texture_cube_map, GL_ARB_texture_env_add, 
    GL_ARB_texture_env_combine, GL_ARB_texture_env_crossbar, 
    GL_ARB_texture_env_dot3, GL_ARB_texture_mirrored_repeat, 
    GL_ARB_texture_non_power_of_two, GL_ARB_texture_rectangle, 
    GL_ARB_transpose_matrix, GL_ARB_vertex_array_object, 
    GL_ARB_vertex_buffer_object, GL_ARB_vertex_program, GL_ARB_vertex_shader, 
    GL_ARB_window_pos, GL_EXT_abgr, GL_EXT_bgra, GL_EXT_blend_color, 
    GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate, 
    GL_EXT_blend_logic_op, GL_EXT_blend_minmax, GL_EXT_blend_subtract, 
    GL_EXT_cull_vertex, GL_EXT_compiled_vertex_array, GL_EXT_copy_texture, 
    GL_EXT_draw_range_elements, GL_EXT_framebuffer_blit, 
    GL_EXT_framebuffer_object, GL_EXT_fog_coord, 
    GL_EXT_gpu_program_parameters, GL_EXT_multi_draw_arrays, 
    GL_EXT_packed_depth_stencil, GL_EXT_packed_pixels, 
    GL_EXT_pixel_buffer_object, GL_EXT_point_parameters, 
    GL_EXT_polygon_offset, GL_EXT_provoking_vertex, GL_EXT_rescale_normal, 
    GL_EXT_secondary_color, GL_EXT_separate_specular_color, 
    GL_EXT_shadow_funcs, GL_EXT_stencil_two_side, GL_EXT_stencil_wrap, 
    GL_EXT_subtexture, GL_EXT_texture, GL_EXT_texture3D, 
    GL_EXT_texture_cube_map, GL_EXT_texture_edge_clamp, 
    GL_EXT_texture_env_add, GL_EXT_texture_env_combine, 
    GL_EXT_texture_env_dot3, GL_EXT_texture_filter_anisotropic, 
    GL_EXT_texture_lod_bias, GL_EXT_texture_object, GL_EXT_texture_rectangle, 
    GL_EXT_vertex_array, GL_3DFX_texture_compression_FXT1, 
    GL_APPLE_client_storage, GL_APPLE_packed_pixels, 
    GL_APPLE_vertex_array_object, GL_APPLE_object_purgeable, 
    GL_ATI_blend_equation_separate, GL_ATI_texture_env_combine3, 
    GL_ATI_separate_stencil, GL_IBM_multimode_draw_arrays, 
    GL_IBM_rasterpos_clip, GL_IBM_texture_mirrored_repeat, 
    GL_INGR_blend_func_separate, GL_MESA_pack_invert, GL_MESA_ycbcr_texture, 
    GL_MESA_window_pos, GL_NV_blend_square, GL_NV_light_max_exponent, 
    GL_NV_packed_depth_stencil, GL_NV_texture_env_combine4, 
    GL_NV_texture_rectangle, GL_NV_texgen_reflection, GL_NV_vertex_program, 
    GL_NV_vertex_program1_1, GL_OES_read_format, GL_SGIS_generate_mipmap, 
    GL_SGIS_texture_border_clamp, GL_SGIS_texture_edge_clamp, 
    GL_SGIS_texture_lod, GL_SUN_multi_draw_arrays

32 GLX Visuals
   visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
 id dep cl sp sz l  ci b ro  r  g  b  a bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------
0x21 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x22 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xbd 24 tc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xbe 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xbf 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xc0 24 tc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc1 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc2 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc3 24 tc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xc4 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xc5 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xc6 24 tc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xc7 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc8 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xc9 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xca 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xcb 24 dc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xcc 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xcd 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xce 24 dc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xcf 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd0 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd1 24 dc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd2 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd3 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd4 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xd5 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xd6 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd7 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xd8 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xd9 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0x8c 32 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None

48 GLXFBConfigs:
   visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
 id dep cl sp sz l  ci b ro  r  g  b  a bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------
0x8d  0 tc  0 16  0 r  .  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0x8e  0 tc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0x8f  0 tc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0x90  0 tc  0 16  0 r  .  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x91  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x92  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x93  0 tc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0x94  0 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0x95  0 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0x96  0 tc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0x97  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0x98  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0x99  0 tc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0x9a  0 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0x9b  0 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0x9c  0 tc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x9d  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x9e  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x9f  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xa0  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0 16 16 16  0  0 0 Slow
0xa1  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xa2  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xa3  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xa4  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xa5  0 dc  0 16  0 r  .  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0xa6  0 dc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0xa7  0 dc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0xa8  0 dc  0 16  0 r  .  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xa9  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xaa  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xab  0 dc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xac  0 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xad  0 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xae  0 dc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xaf  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xb0  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xb1  0 dc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xb2  0 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xb3  0 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xb4  0 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xb5  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xb6  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xb7  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xb8  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0 16 16 16  0  0 0 Slow
0xb9  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xba  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xbb  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xbc  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
Comment 42 John Stanley 2010-08-23 13:00:46 UTC
Ok, finally beginning to pin things down. In kdebase-workspace-4.4.5/kwin/main.cpp, we have:

499     // HACK: This is needed for AIGLX
500     if( qstrcmp( qgetenv( "KWIN_DIRECT_GL" ), "1" ) != 0 )
501         setenv( "LIBGL_ALWAYS_INDIRECT","1", true );

while in kdebase-workspace-4.5.0/kwin/main.cpp, these lines are absent, and in kdebase-workspace-4.5.0/kwin/compositingprefs.cpp, we have:

120 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
121     // HACK: This is needed for AIGLX
122     if( qstrcmp( qgetenv( "KWIN_DIRECT_GL" ), "1" ) != 0 )
123         {
124         // Start an external helper program that initializes GLX and returns
125         // 0 if we can use direct rendering, and 1 otherwise.
126         // The reason we have to use an external program is that after GLX
127         // has been initialized, it's too late to set the LIBGL_ALWAYS_INDIRECT
128         // environment variable.
129         // Direct rendering is preferred, since not all OpenGL extensions are
130         // available with indirect rendering.
131         const QString opengl_test = KStandardDirs::findExe( "kwin_opengl_test" );
132         if ( QProcess::execute( opengl_test ) != 0 )
133             setenv( "LIBGL_ALWAYS_INDIRECT", "1", true );
134         }

while in kdebase-workspace-4.4.5/kwin/compositingprefs.cpp these lines are absent.

If I build kdebase-workspace-4.4.5 with lines 499-501 added, then this entire issue disappears at least for me (i915 graphics).

If I rebuild a vanilla kdebase-workspace-4.4.5, and simply add LIBGL_ALWAYS_INDIRECT=1 to ~/.bash_profile, then this entire issue disappears as well.

Since I never explicitly set KWIN_DIRECT_GL, I assume it's unset, and hence in v4.4.5 LIBGL_ALWAYS_INDIRECT is always set in kwin. For v4.5.0, if KWIN_DIRECT_GL is unset, kwin uses kwin_opengl_test to decide whether to set LIBGL_ALWAYS_INDIRECT, and apparently kwin_opengl_test returns 0 and so LIBGL_ALWAYS_INDIRECT is not set.
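
The difference between the two policies can be sketched as two small predicates. This is only an illustration of the logic quoted above, not kwin code: the function names are hypothetical, the environment value is passed in as std::getenv would return it, and the helper program is stood in for by a callable that returns its exit code.

```cpp
#include <cassert>
#include <cstring>
#include <functional>

// True when KWIN_DIRECT_GL is set to exactly "1" (value as returned by
// std::getenv, i.e. nullptr when the variable is unset).
bool directGlRequested(const char* kwinDirectGl)
{
    return kwinDirectGl != nullptr && std::strcmp(kwinDirectGl, "1") == 0;
}

// 4.4.5 behaviour: force indirect rendering unless the user opted out.
bool forceIndirect445(const char* kwinDirectGl)
{
    return !directGlRequested(kwinDirectGl);
}

// 4.5.0 behaviour: same opt-out, but otherwise defer to the helper's exit
// code (0 = direct rendering usable, anything else = force indirect).
bool forceIndirect450(const char* kwinDirectGl,
                      const std::function<int()>& runOpenglTest)
{
    assert(runOpenglTest); // the caller must supply a helper
    if (directGlRequested(kwinDirectGl))
        return false;      // the helper is not started at all
    return runOpenglTest() != 0;
}
```

With KWIN_DIRECT_GL unset, forceIndirect445 always returns true, while forceIndirect450 returns true only if the helper reports a failure, which matches the behaviour change described in this comment.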

My glxinfo says 'direct rendering: Yes,' so I assume this is why kwin_opengl_test has a good return code. Furthermore, on first kde boot, desktop effects are active and in Desktop Effects-->Advanced 'Enable Direct Rendering' is checked. Note also that, on first kde boots, desktop effects all appear to work. This suggests to me that X/libdrm/Mesa/intel driver are all working, at least on first boots. For me, the issue has only been that making a CHANGE to desktop effects via the system settings dialog, or changes to screen appearance in general, results in the 'hang' if desktop effects are active.

So it looks like kde initially sets up the graphics correctly, but then after that, if changes are requested (perhaps requiring some sort of re-initialization of the video), something goes amiss (not necessarily in kde).

So, can anybody shed light on how kde uses the environment variables KWIN_DIRECT_GL and LIBGL_ALWAYS_INDIRECT, particularly the latter? I'm curious because even with LIBGL_ALWAYS_INDIRECT set, Desktop Effects-->Advanced 'Enable Direct Rendering' is checked, and desktop effects are FAST, really FAST!

ps: I'm still seeing a ton of these messages in .xsession-errors:
systemsettings(3722) EventListener::eventFilter: User of KWidgetItemDelegate should not delete widgets created by createWidgets!
Comment 43 John Stanley 2010-08-23 13:07:27 UTC
In my last comment (#42), I mistyped; replace

If I build kdebase-workspace-4.4.5 with lines 499-501 added, then this entire
issue disappears at least for me (i915 graphics).

If I rebuild a vanilla kdebase-workspace-4.4.5, and simply add
LIBGL_ALWAYS_INDIRECT=1 to ~/.bash_profile, then this entire issue disappears
as well.

with

If I build kdebase-workspace-4.5.0 with lines 499-501 added, then this entire
issue disappears at least for me (i915 graphics).

If I rebuild a vanilla kdebase-workspace-4.5.0, and simply add
LIBGL_ALWAYS_INDIRECT=1 to ~/.bash_profile, then this entire issue disappears
as well.

sorry 'bout that
Comment 44 John Stanley 2010-08-24 10:34:08 UTC
I know very little about opengl, as you could probably tell. I now see that Mesa uses LIBGL_ALWAYS_INDIRECT. This 'hang' issue didn't occur in v4.4.5 because I do not set KWIN_DIRECT_GL and then kwin always sets LIBGL_ALWAYS_INDIRECT. For v4.5.0, with KWIN_DIRECT_GL unset, kwin sets LIBGL_ALWAYS_INDIRECT only if 'kwin_opengl_test' returns nonzero. For me, kwin_opengl_test returns 0 on my desktop as well as on my laptop (both using i915) so LIBGL_ALWAYS_INDIRECT is left unset and the 'hang' issue occurs.

So, in my case, assuming kwin_opengl_test returning 0 is legit, it looks like an upstream issue where Mesa declares direct gl support that in fact does not work. I do get a number of desktop effects, including translucency, but not 'blur' even though it's enabled and no errors are reported (and it's not blacklisted, at least in kwinrc).

Going back to v4.4.5, I find that the Xorg-server-1.8.0/Mesa-7.8.1/libdrm-2.4.20/xf86-video-2.10.0 works, but if I upgrade any or all of these 5 pkgs then all goes to hell: random crashes, etc. (for example, Xorg-server-1.8.2/Mesa-7.8.2/libdrm-2.4.21/xf86-video-2.11.0 fails badly).

Looks to me like it's just the same old, same old Linux video mess, and NOT kde at all.

I'll stick with manually setting LIBGL_ALWAYS_INDIRECT, and carry on.
Comment 45 Shane 2010-08-24 11:44:40 UTC
I am not a programmer but I just want to say thank you to all for looking into this bug.

Also, I want to add that I am using Compiz-fusion as a WM and it is working flawlessly on my Intel 4500M. If anything, my perception is that it is much smoother than KWin... and many effects that do not work in KWin do work in Compiz. E.g. cover switch and zooming. (They do work in KWin but I have to disable the functionality checks).

My point is that it is possible to have compositing working properly with the same underlying libraries, drivers, etc.

Here are the packages I have installed in Arch.

[me@arch ~]$ pacman -Q | grep -e xorg-server -e mesa -e libdrm -e xf86-video
lib32-libdrm 2.4.21-1
lib32-mesa 7.8.2-1
libdrm 2.4.21-2
mesa 7.8.2-1
xf86-video-intel 2.12.0-1
xf86-video-vesa 2.3.0-2
xorg-server 1.8.1.902-1
xorg-server-utils 7.5-5

If a problem exists upstream, can one of us who is more knowledgeable please submit a bug report or a patch to them? I would, but I wouldn't know what I'd be talking about. I would much rather use KWin than Compiz.

Thanks.
Comment 46 FiNeX 2010-08-25 11:31:32 UTC
Will a fix/workaround be included in the upcoming KDE 4.5.1 (tag is tomorrow)?

Otherwise, is it sufficient to set "LIBGL_ALWAYS_INDIRECT" as a manual workaround? Maybe it should be written in the release notes :-)

Thanks all!
Comment 47 Ray Rashif 2010-08-25 14:16:09 UTC
What follows is what I have observed, immediately going from 4.4 to 4.5 with the same Xorg, same Mesa, same Intel, same Dri etc. (Arch Linux):

#1: Fresh $HOME (everything deleted)
#2: Run KDE 4.4 == OK
#3: Fresh $HOME
#4: Upgrade to 4.5
#5: Run KDE 4.5 == NOT OK

This is an Intel GMA 950, on an Intel 945 board (laptop).

The issues, seen gradually since login:

* Compositing is disabled (expected default, similar to 4.4)
* Enable Compositing; SLOW performance (FAST/NORMAL in 4.4)
* Toggle a setting, click Apply; Weird Freeze (NORMAL in 4.4)

** SWITCHING VT and back has NO EFFECT
** Unchecking "Enable direct rendering" FAILS (compositing fails, WORKS in 4.4)
** LIBGL_ALWAYS_INDIRECT=1 (similar to above) naturally FAILS as well
** Xrender WORKS, with direct or indirect rendering

So, I don't know how it could be anything but KWin compositing code (or else some other KDE code). You guys have done stuff that is incompatible with the current and latest open-source ATI/Intel video stack used by not-so-recent hardware. I'm guessing LIBGL_ALWAYS_INDIRECT=1 or unchecking "Enable direct rendering" works with newer hardware, but not for those of us still affected by this bug.
Comment 48 FiNeX 2010-08-25 14:51:33 UTC
I've just tried to set LIBGL_ALWAYS_INDIRECT=1 on my notebook (intel 945) but without any positive effects.

The freeze occurs even without opening systemsettings.

After the login I've opened some applications (konsole, firefox, dolphin) and the desktop has been frozen again :-(

I'm using xorg-server 1.8.1.902-1 (from archlinux repositories).
Comment 49 Médéric Boquien 2010-08-25 15:07:00 UTC
I agree with FiNeX, a workaround is badly needed. I updated to 4.5.0 yesterday (archlinux) on 2 computers, one with an intel and one with an ati card, both running OSS drivers and it gives a disastrous first impression of KDE. On the other hand, on a computer running the nvidia driver no problem to be noted (any kwin developer using an OSS driver?)

@47: it is not necessarily in KDE. Changes in kwin may just have triggered the bug in mesa, it would not be the first time KDE pushes the envelope. Though the effect is really bad here.
Comment 50 Christoph Feck 2010-08-25 15:20:25 UTC
This bug is about KWin freezing when applying KWin related changes in systemsettings. What comment #48 refers to is probably the intel/DRI related kernel bug. Update to 2.6.35 kernel, newest stable Mesa and Xorg, and those random hangs with intel 945 disappear (at least on my machine).
Comment 51 Ray Rashif 2010-08-25 16:17:05 UTC
(In reply to comment #49) 
> @47: it is not necessarily in KDE. Changes in kwin may just have triggered the
> bug in mesa, it would not be the first time KDE pushes the envelope. Though the
> effect is really bad here.

That is exactly what I meant by using the term "incompatible"; applies both ways.
Comment 52 Martin Flöser 2010-08-25 18:51:02 UTC
SVN commit 1167908 by graesslin:

Revert rev 1137490: it caused compositing not working with legacy NVIDIA drivers and might be responsible for freezes when changing config.
BUG: 243991
CCBUG: 241402
FIXED-IN: 4.5.1



 M  +0 -20     kwinglutils.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1167908
Comment 53 Martin Flöser 2010-08-25 18:53:52 UTC
SVN commit 1167909 by graesslin:

Forward port rev 1167908
Revert rev 1137490: it caused compositing not working with legacy NVIDIA drivers and might be responsible for freezes when changing config.
It can be reverted as there is already a better fix for buggy drivers present in 4.5.1.
Did I mention that I love drivers?
CCBUG: 243991
CCBUG: 241402

 M  +0 -20     kwinglutils.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1167909
Comment 54 Martin Flöser 2010-08-25 18:58:05 UTC
(In reply to comment #46)
> Will a fix/workaround be included in the upcoming KDE 4.5.1 (tag is tomorrow)?
I just did a commit, but I do not know if it fixed it. I have to properly investigate the issue but currently I am lacking the time. If I get a patch before the release I will send a notice to the release team.


(In reply to comment #49)
> On the
> other hand, on a computer running the nvidia driver no problem to be noted (any
> kwin developer using an OSS driver?)
Most of the devs are using NVIDIA, but I will switch to an Ati based system soon to feel with my users. But if it's too bad for me I will get an NVIDIA card again.
Comment 55 Christoph Feck 2010-08-25 20:31:17 UTC
Tested r1167909 on trunk, still hangs when applying changes while compositing is active. Toggling compositing resumes.

OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 945GM GEM 20100328 2010Q1 x86/MMX/SSE2
OpenGL version string: 1.4 Mesa 7.8.2
XOrg: 7.5 (server 1.8.0) (intel 2.12.0)
lspci: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller
Comment 56 Thomas Lübking 2010-08-25 21:38:41 UTC
Martin's fix is for the fbo test, while apparently this one's about the inverted LIBGL_ALWAYS_INDIRECT policy ...

kwin used to force it unless you exported KWIN_DIRECT_GL=1, now it prepends a GLX test and only sets it if the test fails. From other bug reports (kwin crashes after closing gl game blablablah ..) I gather, however, that intel/mesa (and maybe ati/mesa) exit gl contexts uncleanly, so there's actually a good chance that the test passes but direct rendering is actually broken, which only shows up after multiple context creations.

it might work to run the test several times then:
inc=0; while `kwin_opengl_test`; do ((++inc)); echo passed; done; echo "failed on $inc"

no idea whether that's it, though *shrug*
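
A bounded variant of that loop, as a rough C++ sketch: it runs a command repeatedly and counts consecutive passes, so it cannot spin forever if the test never fails. The function name and the run limit are hypothetical. Each run is a separate process, so whether this reproduces anything kwin sees after repeated context creation in a single process is an open assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Runs `command` through the shell up to maxRuns times and returns how many
// consecutive runs exited with status 0 before the first failure.
int countConsecutivePasses(const std::string& command, int maxRuns)
{
    assert(maxRuns >= 0);
    int passes = 0;
    while (passes < maxRuns && std::system(command.c_str()) == 0)
        ++passes;
    return passes;
}
```

Calling countConsecutivePasses("kwin_opengl_test", 50) would then report either 50 (never failed) or the number of passes before the first non-zero exit code.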
Comment 57 Martin Flöser 2010-08-26 21:21:42 UTC
*** Bug 246987 has been marked as a duplicate of this bug. ***
Comment 58 Martin Flöser 2010-08-28 14:54:03 UTC
*** Bug 243889 has been marked as a duplicate of this bug. ***
Comment 59 Martin Flöser 2010-08-28 15:12:17 UTC
*** Bug 245400 has been marked as a duplicate of this bug. ***
Comment 60 John Stanley 2010-08-29 03:29:34 UTC
What looks to be the root issue here is that KDE and/or Mesa/xorg video driver/kernel video driver are unable to support Mesa direct opengl for some driver-graphics card configurations (for me Intel Corporation 82865G or 82915G/GV/910GL Integrated Graphics using i915). I believe it's not KDE, probably not Mesa, but rather driver (for me, xf86-video-intel and i915) issue(s).

KDE-4.4.x resolved this issue by forcing Mesa to use indirect opengl (LIBGL_ALWAYS_INDIRECT=1); for cases where Mesa direct opengl is known to work, one could override LIBGL_ALWAYS_INDIRECT=1 by setting KWIN_DIRECT_GL=1 in the kwin environment.

In KDE-4.5.0, LIBGL_ALWAYS_INDIRECT=1 is set only if (a) KWIN_DIRECT_GL=1 is not set in environment, and (b) the kwin_opengl_test fails; the present 'hang' issue occurs when kwin_opengl_test passes, even though Mesa direct opengl does not work for the driver-graphics card configuration being used.

On my system, if I run machtest
(http://wwwvis.informatik.uni-stuttgart.de/machtest/intro.html) and
glean (http://glean.sourceforge.net/) tests using direct opengl, I get many PASSES, but some notable FAILURES. Clearly KDE can do nothing about these direct opengl failures.

In summary, it appears to me that the strategy to be taken here, in KDE, is to strengthen kwin_opengl_test to be able to reliably detect failures in direct opengl (reliably prevent false positives) and to then set LIBGL_ALWAYS_INDIRECT=1. I think little else can be done on the KDE side.
Comment 61 Christoph Feck 2010-09-01 20:06:06 UTC
*** Bug 249778 has been marked as a duplicate of this bug. ***
Comment 62 Martin Flöser 2010-09-05 17:58:00 UTC
Created attachment 51344 [details]
Patch to fix the freezes in combination with reverting rev 1137668

This is a patch to solve all the issues we see with the Intel drivers when indirect rendering is enabled. As I do not have intel hardware the patch is untested.

The patch should go together with reverting svn rev 1137668. It would be nice if someone could try this combination and have a look for the following issues:
* are effects enabled on kwin startup if the selfcheck is enabled (expected behaviour: breaks without patch, works with patch)
* does the desktop freeze when changing settings (expected behaviour: no freeze with the patch)
* does blur and lanczos get enabled without being on the blacklist (expected behaviour: are not enabled)
* is the direct rendering option honoured

Thanks for testing :-)
Comment 63 John Stanley 2010-09-05 19:49:57 UTC
(In reply to comment #62)
> Created an attachment (id=51344) [details]
> Patch to fix the freezes in combination with reverting rev 1137668
> 
> This is a patch to solve all the issues we see with the Intel drivers when
> indirect rendering is enabled. As I do not have intel hardware the patch is
> untested.
> 
> The patch should go together with reverting svn rev 1137668. It would be nice
> if someone could try this combinations and have a look for the following
> issues:
> * are effects enabled on kwin startup if the selfcheck is enabled (expected
> behaviour: breaks without patch, works with patch)
> * does the desktop freeze when changing settings (expected behaviour: no freeze
> with the patch)
> * does blur and lanczos get enabled without being on the blacklist (expected
> behaviour: are not enabled)
> * is the direct rendering option honoured
> 
> Thanks for testings :-)

Sorry to say it, but no change, problem remains...

* are effects enabled on kwin startup... YES (with or w/o patch)
* does the desktop freeze when...        YES (with or w/o patch)
* does blur and lanczos get enabled...   NO  (blacklisted with or w/o patch)
* is the direct rendering option...      YES (with or w/o patch)

This is for kde-4.5.1
Comment 64 Martin Flöser 2010-09-05 20:45:04 UTC
Oh I think I did not make myself clear: if you test this patch ensure that the 
driver is *not* on the blacklist. It's important to know if kwin recognizes 
that those effects should not be loaded. If it is blacklisted the test does 
not say anything
Comment 65 Scott Kitterman 2010-09-06 01:09:13 UTC
I tried this patch on Kubuntu Maverick (KDE 4.5.1) on a Dell mini 10v.  The only other change from the current development snapshot was I'm also running a recent mesa git snapshot (which helps considerably with solving incomplete painting and flashes with compositing enabled).

With a fresh .kde and this patch the initial login is with effects temporarily suspended (and blur is enabled in the desktop effects configuration).  If I disable blur, effects are activated and work well.  I still have the problem of changing effects while effects are enabled causing the screen to freeze.  When I logout and login again (without blur), I get effects.

This is significantly better than what I get with stock 4.5.1 where I can never manage to login with effects enabled.  

In system settings it still says I'm using direct rendering (enable direct rendering is checked) which, if I understand the patch correctly, is not correct.


00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
Comment 66 Scott Kitterman 2010-09-06 05:31:56 UTC
After I tried the patch, I noticed the mention of also reverting rev 1137668. I built another package that also has that reverted. I still get screen freezes when changing settings with effects active. Now effects are always temporarily disabled on login, so this seems less good than the patch without reverting rev 1137668 on my system.
Comment 67 John Stanley 2010-09-06 07:06:16 UTC
(In reply to comment #64)
> Oh I think I did not make myself clear: if you test this patch ensure that the 
> driver is *not* on the blacklist. It's important to know if kwin recognizes 
> that those effects should not be loaded. If it is blacklisted the test does 
> not say anything

Oops, my error, my driver (Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) ) is not blacklisted. So the results are:

* are effects enabled on kwin startup... YES (with or w/o patch)
* does the desktop freeze when...        YES (with or w/o patch)
* does blur and lanczos get enabled...   YES (with or w/o patch)
* is the direct rendering option...      YES (with or w/o patch)

For me, this patch is actually a step backward in that there are now often kwin crashes after making desktop-settings changes.
Comment 68 Scott Kitterman 2010-09-07 04:53:10 UTC
Today I tried mgraesslin's suggestion of also reverting kdesvn 1096554. With that in addition to the patch and the other reversion I get effects on login. The checkbox for blur is checked in the U/I, but it doesn't appear to be active. Trying to change effects with them enabled now gets a kwin crash instead of a freeze (since kwin recovers automatically, this is progress).
Comment 69 John Stanley 2010-09-10 11:38:51 UTC
Created attachment 51510 [details]
241402 patch
Comment 70 John Stanley 2010-09-10 11:41:34 UTC
Ok, been doing some investigation into this issue, and here's what I've found. Looks like a KDE bug, period. First, I'll be referring to kdebase-workspace-4.5.1 code; specifically kdebase-workspace-4.5.1.new/kwin/compositingprefs.{cpp,h}. The problem occurs in CompositingPrefs::detect(), in the following lines:

145     // remember and later restore active context
146     GLXContext oldcontext = glXGetCurrentContext();
147     GLXDrawable olddrawable = glXGetCurrentDrawable();
148     GLXDrawable oldreaddrawable = None;
149     if( hasglx13 )
150         oldreaddrawable = glXGetCurrentReadDrawable();
151 
152     if( initGLXContext() )
153         {
154         detectDriverAndVersion();
155         applyDriverSpecificOptions();
156         }
157     if( hasglx13 )
158         glXMakeContextCurrent( display(), olddrawable, oldreaddrawable, oldcontext );
159     else
160         glXMakeCurrent( display(), olddrawable, oldcontext );
161     deleteGLXContext();

I've added kWarning lines to compositingprefs.cpp and done numerous rebuilds of kdebase-workspace-4.5.1 to see what's happening. What happens on first kde boot (no ~/.kde, etc), is that on the first entry to CompositingPrefs::detect() we have oldcontext = NULL; in this case, all goes well. Once logged in, Desktop effects are active (and working). Now if we select System Settings->Desktop effects, CompositingPrefs::detect() is entered again, and once again, oldcontext = NULL. Next we select the All effects tab, and turn on some effect, say cube animation, and then click on apply; the system 'hangs'/loses mouse focus. However, keyboard focus remains as we all know, and so going to vt1 and looking at ~/.xsession-errors, I find that CompositingPrefs::detect() has been entered a third time and this time oldcontext != NULL.

From the foregoing, I have then found that if lines 156-161 above are replaced with

156         }
       deleteGLXContext();
157     if( hasglx13 )
158         glXMakeContextCurrent( display(), olddrawable, oldreaddrawable, oldcontext );
159     else
160         glXMakeCurrent( display(), olddrawable, oldcontext );
161     //deleteGLXContext();

then the entire issue disappears.  Further, if I modify the code so that a new context is created, used, and then destroyed only if oldcontext = NULL (i.e., initGLXContext used only when oldcontext = NULL), and when oldcontext != NULL, I simply use whatever current context is active, then again, the issue entirely disappears.
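
The second variant (probe with whatever context is already current, and only create and destroy a temporary one when none is) can be modelled without X or GLX. The sketch below is purely illustrative: FakeGlx and detectReusingCurrentContext are hypothetical names, the integer 0 stands in for a NULL GLXContext, and the fake only records the order of operations, it makes no claim about real driver behaviour.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the GLX calls used by CompositingPrefs::detect().
// It only logs what would be called; 0 plays the role of a NULL context.
struct FakeGlx {
    int current = 0;
    int nextId = 1;
    std::vector<std::string> log;

    int  getCurrent() const     { return current; }
    int  createAndMakeCurrent() { current = nextId++; log.push_back("create"); return current; }
    void makeCurrent(int ctx)   { current = ctx; log.push_back("restore"); }
    void destroy(int)           { log.push_back("destroy"); }
    void probeDriver()          { assert(current != 0); log.push_back("probe"); }
};

// Probe the driver, creating a temporary context only when none is current.
template <typename Glx>
void detectReusingCurrentContext(Glx& glx)
{
    const int old = glx.getCurrent();
    if (old != 0) {
        glx.probeDriver();        // reuse the compositor's own context
        return;
    }
    const int temp = glx.createAndMakeCurrent();
    glx.probeDriver();
    glx.destroy(temp);            // delete the temporary context first ...
    glx.makeCurrent(old);         // ... then restore the previous (NULL) one
}
```

In the third entry to detect() described above (oldcontext != NULL), this variant never creates or destroys a context at all, which is the property that seemed to make the hang go away.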

Now, as the code in CompositingPrefs::detect() looks more or less sound (with the exception of possible failures to free some X-related resources such as visinfo and colormap), I think the problem lies in the external process environment. In fact, it is quite simple to put together a simple standalone program which closely mimics CompositingPrefs::detect(); I have done so, and all works as expected, no leaks caught by valgrind, etc. Based on this, I think the problem is not in Mesa/Xorg/libdrm/video drivers.

Since the CompositingPrefs::detect() code depends crucially on the X connection ( display() here ), I suspect that this value may be dynamic even while CompositingPrefs::detect() is running; if this is so, then deleteGLXContext() could end up trying to free stuff from a Display associated at the time with mGLContext. To test this, I have created a patch which basically uses a new X connection in initGLXContext(), rather than display(), and this appears thus far to work nicely.

I'd like to emphasize that this patch (241402 patch) is not necessarily a proper fix, but is intended to hopefully shed light on this issue.

Anyway, anyone with time, and more expertise here, please feel free to jump in.
Comment 71 John Stanley 2010-09-10 11:47:12 UTC
Comment on attachment 51510 [details]
241402 patch

--- kdebase-workspace-4.5.1.old/kwin/compositingprefs.h 2010-01-26 19:22:26.000000000 -0500
+++ kdebase-workspace-4.5.1.new/kwin/compositingprefs.h 2010-09-10 04:55:27.061000036 -0400
@@ -92,6 +92,9 @@
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
     GLXContext mGLContext;
     Window mGLWindow;
+    XVisualInfo *mVisinfo;
+    Colormap mColormap;
+    Display *mDpy;
 #endif
 };
 
--- kdebase-workspace-4.5.1.old/kwin/compositingprefs.cpp       2010-06-24 12:28:18.000000000 -0400
+++ kdebase-workspace-4.5.1.new/kwin/compositingprefs.cpp       2010-09-10 04:55:34.844000003 -0400
@@ -166,6 +166,11 @@
 {
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
     mGLContext = NULL;
+    mDpy       = NULL;
+    mVisinfo   = NULL;
+    mGLWindow  = 0;
+    mColormap  = 0;
+
     KXErrorHandler handler;
     // Most of this code has been taken from glxinfo.c
     QVector<int> attribs;
@@ -175,39 +180,44 @@
     attribs << GLX_BLUE_SIZE << 1;
     attribs << None;
 
-    XVisualInfo* visinfo = glXChooseVisual( display(), DefaultScreen( display()), attribs.data() );
-    if( !visinfo )
+    mDpy = XOpenDisplay(0);
+    if ( !mDpy )
+         {
+         kDebug( 1212 ) << "Error: XOpenDisplay(0) failed";
+         return false;
+         }
+    mVisinfo = glXChooseVisual( mDpy, DefaultScreen( mDpy ), attribs.data() );
+    if( !mVisinfo )
         {
         attribs.last() = GLX_DOUBLEBUFFER;
         attribs << None;
-        visinfo = glXChooseVisual( display(), DefaultScreen( display()), attribs.data() );
-        if (!visinfo)
+        mVisinfo = glXChooseVisual( mDpy, DefaultScreen( mDpy ), attribs.data() );
+        if( !mVisinfo )
             {
             kDebug( 1212 ) << "Error: couldn't find RGB GLX visual";
             return false;
             }
         }
 
-    mGLContext = glXCreateContext( display(), visinfo, NULL, True );
-    if ( !mGLContext )
-    {
+    mGLContext = glXCreateContext( mDpy, mVisinfo, NULL, True );
+    if( !mGLContext )
+        {
         kDebug( 1212 ) << "glXCreateContext failed";
-        XDestroyWindow( display(), mGLWindow );
         return false;
-    }
+        }
 
     XSetWindowAttributes attr;
     attr.background_pixel = 0;
     attr.border_pixel = 0;
-    attr.colormap = XCreateColormap( display(), rootWindow(), visinfo->visual, AllocNone );
+    mColormap = XCreateColormap( mDpy, RootWindow( mDpy, mVisinfo->screen ), mVisinfo->visual, AllocNone );
+    attr.colormap = mColormap;
     attr.event_mask = StructureNotifyMask | ExposureMask;
     unsigned long mask = CWBackPixel | CWBorderPixel | CWColormap | CWEventMask;
     int width = 100, height = 100;
-    mGLWindow = XCreateWindow( display(), rootWindow(), 0, 0, width, height,
-                       0, visinfo->depth, InputOutput,
-                       visinfo->visual, mask, &attr );
+    mGLWindow = XCreateWindow( mDpy, RootWindow( mDpy, mVisinfo->screen ), 0, 0, width, height,
+                       0, mVisinfo->depth, InputOutput, mVisinfo->visual, mask, &attr );
 
-    return glXMakeCurrent( display(), mGLWindow, mGLContext ) && !handler.error( true );
+    return glXMakeCurrent( mDpy, mGLWindow, mGLContext ) && !handler.error( true );
 #else
    return false;
 #endif
@@ -216,10 +226,31 @@
 void CompositingPrefs::deleteGLXContext()
 {
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
-    if( mGLContext == NULL )
-        return;
-    glXDestroyContext( display(), mGLContext );
-    XDestroyWindow( display(), mGLWindow );
+    if( mDpy != NULL )
+        {
+        if( mGLWindow != 0 )
+            {
+            XDestroyWindow( mDpy, mGLWindow );
+            mGLWindow = 0;
+            }
+        if( mGLContext != NULL )
+            {
+            glXDestroyContext( mDpy, mGLContext );
+            mGLContext = NULL;
+            }
+        if( mColormap != 0 )
+            {
+            XFreeColormap( mDpy, mColormap);
+            mColormap = 0;
+            }
+        if( mVisinfo != NULL )
+            {
+            XFree( mVisinfo );
+            mVisinfo = NULL;
+            }
+        XCloseDisplay( mDpy );
+        mDpy = NULL;
+        }
 #endif
 }
 
@@ -319,7 +350,6 @@
     //    }
     }
 
-
 bool CompositingPrefs::detectXgl()
     { // Xgl apparently uses only this specific X version
     return VendorRelease(display()) == 70000001;
Comment 72 John Stanley 2010-09-10 11:51:40 UTC
Comment on attachment 51510 [details]
241402 patch

--- kdebase-workspace-4.5.1.old/kwin/compositingprefs.h 2010-01-26 19:22:26.000000000 -0500
+++ kdebase-workspace-4.5.1.new/kwin/compositingprefs.h 2010-09-10 04:55:27.061000036 -0400
@@ -92,6 +92,9 @@
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
     GLXContext mGLContext;
     Window mGLWindow;
+    XVisualInfo *mVisinfo;
+    Colormap mColormap;
+    Display *mDpy;
 #endif
 };
 
--- kdebase-workspace-4.5.1.old/kwin/compositingprefs.cpp       2010-06-24 12:28:18.000000000 -0400
+++ kdebase-workspace-4.5.1.new/kwin/compositingprefs.cpp       2010-09-10 04:55:34.844000003 -0400
@@ -166,6 +166,11 @@
 {
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
     mGLContext = NULL;
+    mDpy       = NULL;
+    mVisinfo   = NULL;
+    mGLWindow  = 0;
+    mColormap  = 0;
+
     KXErrorHandler handler;
     // Most of this code has been taken from glxinfo.c
     QVector<int> attribs;
@@ -175,39 +180,44 @@
     attribs << GLX_BLUE_SIZE << 1;
     attribs << None;
 
-    XVisualInfo* visinfo = glXChooseVisual( display(), DefaultScreen( display()), attribs.data() );
-    if( !visinfo )
+    mDpy = XOpenDisplay(0);
+    if ( !mDpy )
+         {
+         kDebug( 1212 ) << "Error: XOpenDisplay(0) failed";
+         return false;
+         }
+    mVisinfo = glXChooseVisual( mDpy, DefaultScreen( mDpy ), attribs.data() );
+    if( !mVisinfo )
         {
         attribs.last() = GLX_DOUBLEBUFFER;
         attribs << None;
-        visinfo = glXChooseVisual( display(), DefaultScreen( display()), attribs.data() );
-        if (!visinfo)
+        mVisinfo = glXChooseVisual( mDpy, DefaultScreen( mDpy ), attribs.data() );
+        if( !mVisinfo )
             {
             kDebug( 1212 ) << "Error: couldn't find RGB GLX visual";
             return false;
             }
         }
 
-    mGLContext = glXCreateContext( display(), visinfo, NULL, True );
-    if ( !mGLContext )
-    {
+    mGLContext = glXCreateContext( mDpy, mVisinfo, NULL, True );
+    if( !mGLContext )
+        {
         kDebug( 1212 ) << "glXCreateContext failed";
-        XDestroyWindow( display(), mGLWindow );
         return false;
-    }
+        }
 
     XSetWindowAttributes attr;
     attr.background_pixel = 0;
     attr.border_pixel = 0;
-    attr.colormap = XCreateColormap( display(), rootWindow(), visinfo->visual, AllocNone );
+    mColormap = XCreateColormap( mDpy, RootWindow( mDpy, mVisinfo->screen ), mVisinfo->visual, AllocNone );
+    attr.colormap = mColormap;
     attr.event_mask = StructureNotifyMask | ExposureMask;
     unsigned long mask = CWBackPixel | CWBorderPixel | CWColormap | CWEventMask;
     int width = 100, height = 100;
-    mGLWindow = XCreateWindow( display(), rootWindow(), 0, 0, width, height,
-                       0, visinfo->depth, InputOutput,
-                       visinfo->visual, mask, &attr );
+    mGLWindow = XCreateWindow( mDpy, RootWindow( mDpy, mVisinfo->screen ), 0, 0, width, height,
+                       0, mVisinfo->depth, InputOutput, mVisinfo->visual, mask, &attr );
 
-    return glXMakeCurrent( display(), mGLWindow, mGLContext ) && !handler.error( true );
+    return glXMakeCurrent( mDpy, mGLWindow, mGLContext ) && !handler.error( true );
 #else
    return false;
 #endif
@@ -216,10 +226,31 @@
 void CompositingPrefs::deleteGLXContext()
 {
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
-    if( mGLContext == NULL )
-        return;
-    glXDestroyContext( display(), mGLContext );
-    XDestroyWindow( display(), mGLWindow );
+    if( mDpy != NULL )
+        {
+        if( mGLWindow != 0 )
+            {
+            XDestroyWindow( mDpy, mGLWindow );
+            mGLWindow = 0;
+            }
+        if( mGLContext != NULL )
+            {
+            glXDestroyContext( mDpy, mGLContext );
+            mGLContext = NULL;
+            }
+        if( mColormap != 0 )
+            {
+            XFreeColormap( mDpy, mColormap);
+            mColormap = 0;
+            }
+        if( mVisinfo != NULL )
+            {
+            XFree( mVisinfo );
+            mVisinfo = NULL;
+            }
+        XCloseDisplay( mDpy );
+        mDpy = NULL;
+        }
 #endif
 }
 
@@ -319,7 +350,6 @@
     //    }
     }
 
-
 bool CompositingPrefs::detectXgl()
     { // Xgl apparently uses only this specific X version
     return VendorRelease(display()) == 70000001;
Comment 73 John Stanley 2010-09-10 11:59:40 UTC
Created attachment 51511 [details]
kdebase-workspace-4.5.1-compositingprefs_detect.patch

Sorry, attached the wrong file in attachment 51510 [details]. This is the correct one.
Comment 74 Martin Flöser 2010-09-10 17:15:13 UTC
Thanks for investigating the issue. If some users can confirm that the patch 
fixes the issue I would apply it to trunk (after verifying that it does not 
break on nvidia and fglrx) and backport in about two weeks, so that this issue 
is solved before 4.5.2 is released.
Comment 75 Scott Kitterman 2010-09-10 21:18:17 UTC
Patch does not work for me.  Was it supposed to be applied in tandem with any of the other patches/reverts described in the bug?

This is on current Kubuntu Maverick (with KDE 4.5.1) and a recent Mesa git snapshot.

It did change things slightly.  Previously on this system (Dell mini 10v with Intel 945GME), shift+alt+f12 after a freeze would cause an X crash.  Now it doesn't.  It just does nothing instead.
Comment 76 Thomas Lübking 2010-09-10 22:13:33 UTC
*** Bug 250825 has been marked as a duplicate of this bug. ***
Comment 77 Thomas Lübking 2010-09-11 00:01:22 UTC
(In reply to comment #70)
> From the foregoing, I have then found that if lines 156-161 above are replaced
> with
> 
> 156         }
>        deleteGLXContext();
> 157     if( hasglx13 )
> 158         glXMakeContextCurrent( display(), olddrawable, oldreaddrawable,
> oldcontext );
> 159     else
> 160         glXMakeCurrent( display(), olddrawable, oldcontext );
> 161     //deleteGLXContext();
> 
> then the entire issue disappears.

... which means that the intel driver cannot handle multiple gl contexts (it just destroys the current one; I recall having mentioned that impression before, in bug reports regarding crashes with kwin vs. gl games, so to me this sounds absolutely reasonable)

-> The change is safe and should be applied (still, this is _clearly_ a driver bug, and no, i've not written this code :-) but this won't fix the external issues)

> Further, if I modify the code so that a new context is created, used, and then destroyed only if 
> oldcontext = NULL (i.e., initGLXContext used only when oldcontext = NULL), and when
> oldcontext != NULL, I simply use whatever current context is active, then again, the issue entirely
> disappears.

Because you do not bother the driver with multiple contexts anymore... Since the context is (for whatever reason) always constructed "direct"* and the actual directness is handled by the env var (-> why this??), kwin is unlikely to run contexts on different dpys or visuals, and as long as the detection code does not mess with the context but just queries some values, that's no big deal... but afaics not "correct" either... and a bit wonky if the detection ever starts to impact the "testing" context... =\

* the "True" in "glXCreateContext( display(), visinfo, NULL, True );"

> Now, as the code in CompositingPrefs::detect() looks more or less sound 
... except for the complete break on glXDestroyContext() ... ;-)

> In fact, it is quite simple to put together a simple standalone problem which closely mimics
> CompositingPrefs::detect(); I have done so, and all works as expected, 
... with two open contexts at the same time? (you should attach the code)

> no leaks caught by valgrind, etc.
What does the leak mention refer to? (just the XFree stuff?)

> Based on this, I think the problem is not in Mesa/Xorg/libdrm/video drivers.
Based on your observation I'm damn sure that mesa / the intel driver can only reliably handle _one_ gl context at a time ;-)
I wouldn't mind, but iff this really holds across processes as well, this workaround won't fix composited kwin + other gl clients ... :-(

> Since the CompositingPrefs::detect() code depends crucially the X connection (
> display() here ), I suspect that this value may be dynamic even while
> CompositingPrefs::detect() is running; 

unlikely - display() is QX11Info::display() which returns a value assigned during the QApplication construction - you could debug the value, but i doubt it ever changes once QApplication() was called (which is a requirement to do anything GUI related in Qt)
Since CompositingPrefs is constructed in the options constructor, options are allocated on the heap in the kwin ::Application constructor which inits on the KApplication constructor invoking the QApplication constructor, all should be fine... "should"

Ok, main questions:
- You mentioned that repositioning the deleteGLXContext() call completely fixed it for you (so far): does that still hold?
- Did reconnecting the xserver cause you any further improvement?
- Do you still have to set the LIBGL_ALWAYS_INDIRECT variable or, more importantly, does dri now work for you?

If not, my opinion is
a) commit the deleteGLXContext() repositioning
b) commit the required XFree*()'s
c) do NOT make the code more complex by adding a second display connection, since "0" is not necessarily the proper display string and might just cause further issues... and /theoretically/ it should not be necessary at all

Finally and
@Martin/Lucas/Fredrik/whoever:

Can someone please elaborate why we cannot directly open an indirect context in case allocating the direct context fails, instead of relying on the environment variable? (unrespected by some implementations?)
Comment 78 Martin Flöser 2010-09-11 16:29:02 UTC
> Patch to fix the freezes in combination with reverting rev 1137668
> 
> This is a patch to solve all the issues we see with the Intel drivers when
> indirect rendering is enabled. As I do not have intel hardware the patch is
> untested.
I just want to say that even if the patch would fix the issue for Intel users 
it cannot be applied. I am currently using fglrx with indirect rendering and 
blur is working. That means we cannot assume that we need direct rendering for 
FBO or GLSL (fglrx supports FBO and ARB shaders, but not GLSL in indirect 
rendering). So a check for direct rendering as in the patch would mean 
regressions for users having a working driver.

Now I don't want to favor the proprietary drivers but I do not want to remove 
features for users with working drivers, because there are broken free 
drivers. It just needs to be fixed in the right place and that is the driver 
:-(
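
One way to read the objection above is that each effect should be gated on the capability it actually needs, not on whether the context is direct. Below is a minimal sketch of that idea, assuming capabilities are read from the space-separated extension string that glGetString(GL_EXTENSIONS) returns. The names hasExtension, Capabilities and detectCapabilities are made up for illustration and are not kwin code; the mapping of "ARB shaders" to GL_ARB_fragment_program is likewise an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true if `name` appears as a whole token in a
// space-separated extension list. A plain substring search would also
// match longer extension names that merely begin with `name`.
bool hasExtension(const std::string& extensionList, const std::string& name)
{
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name)
            return true;
    }
    return false;
}

// Hypothetical capability record; the fields mirror the features named in
// the comment above (FBO, ARB shaders, GLSL).
struct Capabilities {
    bool fbo;
    bool arbShaders;
    bool glsl;
};

// Derive capabilities from what the driver advertises, without looking at
// whether the context is direct or indirect.
Capabilities detectCapabilities(const std::string& extensionList)
{
    Capabilities caps;
    caps.fbo = hasExtension(extensionList, "GL_EXT_framebuffer_object");
    caps.arbShaders = hasExtension(extensionList, "GL_ARB_fragment_program");
    caps.glsl = hasExtension(extensionList, "GL_ARB_shading_language_100");
    return caps;
}
```

With an extension list that advertises FBO and ARB fragment programs but not the GLSL extension, this yields the combination described for fglrx with indirect rendering: blur-style effects stay available and GLSL-only effects do not.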
Comment 79 Jay LaCroix 2010-09-11 18:13:31 UTC
I don't understand. Can't you just make an 'if statement' for the fix and put a checkbox called "compatibility mode" that turns the fix on or off? Or maybe if it's Intel anything, the fix is applied, but if it's Nvidia, it's not? I don't see how a work around can't be coded up that makes everyone happy. Sure the issue may be the driver, but are you sure the driver will EVER be fixed? You said yourself this has been going on for years. I refuse to believe a work around that makes everyone happy is not possible.
Comment 80 Thomas Lübking 2010-09-11 21:43:49 UTC
according to the replies, this patch doesn't really fix anything of this bug anyway - it's just a broadsword approach that prevents using some things if something else (weakly related or unrelated) doesn't work, and by this: wrong :-)

Also this very patch is NOT related to comments #70 - #77
Comment 81 John Stanley 2010-09-12 13:00:18 UTC
(In reply to comment #77)
> (In reply to comment #70)
> > From the foregoing, I have then found that if lines 156-161 above are replaced
> > with
> > 
> > 156         }
> >        deleteGLXContext();
> > 157     if( hasglx13 )
> > 158         glXMakeContextCurrent( display(), olddrawable, oldreaddrawable,
> > oldcontext );
> > 159     else
> > 160         glXMakeCurrent( display(), olddrawable, oldcontext );
> > 161     //deleteGLXContext();
> > 
> > then the entire issue disappears.
> 
> ... what means that the intel driver can not handle multiple gl contexts (it
> just destroys the current one - i recall to have mentioned such impression
> before - bug reports reg. crashes on kwin ./. gl games -, so to me this sounds
> absolutely reasonable)
Not so at all; see my attachment to follow (simple code, works great with no memory leaks caught by valgrind).
 
> -> The change is safe and should be applied (still, this is _clearly_ a driver
> bug, and no, i've not written this code :-) but this won't fix the external
> issues)
I think the only thing that's clear here, and rather sad as well, is that for as long as this bug has been open, no one of any appreciable expertise has been able to put any time in. I'm not going to defend 'the drivers'; I know Mesa is a mess, I know the intel driver is far from ideal (if it's anything like their buggy IPW2200 wireless driver code, which I have had the pleasure to look at)...

> > Further, if I modify the code so that a new context is created, used, and then destroyed only if 
> > oldcontext = NULL (i.e., initGLXContext used only when oldcontext = NULL), and when
> > oldcontext != NULL, I simply use whatever current context is active, then again, the issue entirely
> > disappears.
> 
> Because you do not bother the driver with multiple contexts anymore... - since
> the context is (for whatever reason) always constructed "direct"* and the
> actual directness nature is handled by the env var (-> why this??), kwin will
> unlikely run contexts on different dpys or vis and as long as the detection
> code does not mess up with the context but just queries some values, that's no
> big deal... but afaics not "correct" either... and a bit wonky if the dection
> ever starts to impact the "testing" context... =\
> 
> * the "True" in "glXCreateContext( display(), visinfo, NULL, True );"
> 
> > Now, as the code in CompositingPrefs::detect() looks more or less sound 
> ... except for the complete break on glXDestroyContext() ... ;-)
> 
> > In fact, it is quite simple to put together a simple standalone problem which closely mimics
> > CompositingPrefs::detect(); I have done so, and all works as expected, 
> ... with two open contexts at the same time? (you should attach the code)
Again, see my attachment to follow. Of course you don't have two current contexts (in the same process or thread) -- not possible -- but you can have any number of contexts defined (you just switch among them), and you can have multiple 'displays' as well. I don't mean multiple servers; I mean multiple connections to a single X server.
> > no leaks caught by valgrind, etc.
> What does the leak mention refer to? (just the XFree stuff?)
See my attachment -- there are some sample runs -- along with the code -- you can give it a try yourself on your system if you like.
> > Based on this, I think the problem is not in Mesa/Xorg/libdrm/video drivers.
> Based on your observation i'm damn sure that mesa / the intel driver can only
> reliably handle _one_ gl contxt at a time ;-)
Again, dead wrong-- see my attachment
> I wouldn't mind, but iff this really holds across processes as well, this
> workaround won't fix composited kwin + other gl clients ... :-(
> 
> > Since the CompositingPrefs::detect() code depends crucially the X connection (
> > display() here ), I suspect that this value may be dynamic even while
> > CompositingPrefs::detect() is running; 
As you pointed out, my suspicion is probably unfounded; this is the sort of input we need!
> unlikely - display() is QX11Info::display() which returns a value assigned
> during the QApplication construction - you could debug the value, but i doubt
> it ever changes once QApplication() was called (which is a requirement to do
> anything GUI related in Qt)
> Since CompositingPrefs is constructed in the options constructor, options are
> allocated on the heap in the kwin ::Application constructor which inits on the
> KApplication constructor invoking the QApplication constructor, all should be
> fine... "should"
"should"?! Is that a yes or no?
> Ok, main questions:
> - You mentioned that repositioning the deleteGLXContext() call completely fixed
> it for you (so far): does that still hold?
Absolutely
> - Did reconnecting the xserver cause you any further improvement?
Possibly some effects, e.g., the rolling cube is much less choppy now, but this is quite subjective, and there is variability, so not really

> - Do you still have to set the libgl_always_indirect variable or -and more
> important- does dri now work for you?
I stopped setting LIBGL_ALWAYS_INDIRECT a while ago (as I mentioned, I thought). I don't set any opengl environment variables, so it's always direct.
> If not, my opinion is
> a) commit the deleteGLXContext() repositioning
> b) commit the required XFree*()'s
> c) do NOT make the code more compex by adding a second display connection,
> since "0" is not necessarily the proper display string and might just cause
> further issues... and /theoretically/ it should not be necessary at all
As I mentioned, the patch I submitted was NOT A FIX, but rather, I hoped that it might shed light on the issue. Simply moving deleteGLXContext() is NOT a solution -- it does, on my system, stop the immediate issue, but it's not the proper thing to do (it may in fact introduce memory leaks and/or other issues down the road).
> Finally and
> @Martin/Lucas/Fredrik/whoever:
> 
> Can please so. elaborate why we cannot directly open an indirect context in
> case allocating the direct context fails but rely on the evironment variable?
> (unrespected by some implementations?)
This is ok; after all, that's what was done in kde-4.4.x. It's probably the best thing to do until someone figures this thing out. I have several pcs using indirect and they run kde 4.4.x with effects very well.
Really now, I love kde (except for its size!) -- in my opinion, it's by far the best... so, I'm going to continue looking at this, as time permits...
Comment 82 John Stanley 2010-09-12 13:06:24 UTC
Created attachment 51559 [details]
glx detect/contect setting test code 

The code comes first, followed by a couple of test cases, one with direct, the second with indirect rendering active (g++ -lGL -o glxtest glxtest.cpp to build)
Comment 83 John Stanley 2010-09-12 13:32:44 UTC
One other thing: I think this "bug" may very well encompass a lot of issues, most of which are xorg/mesa/drm/driver related -- I'm pretty sure of this. I know from several years of experience that some combinations of these pkgs work really well, but most don't, and for me it has mostly been trial 'n error to fix things, because I know little about the details. I mentioned earlier that while v4.4.x works well for me, if I upgrade any or all of the xorg/mesa/drm/driver pkgs my system can become totally unstable. For example, xf86-video-intel-2.10.0 won't work with libdrm-2.4.21, and with Mesa less than 7.7, and xf86-video-intel-2.12.0 won't play well with xorg-server less than 1.8.0, etc., and it all varies from pc to pc... yuk.

My focus here thus far has been ONLY on the 'hang' resulting from desktop effects changes, nothing else, simply because on my (intel-based) systems this is the most visible problem. There are in fact some stability issues (which may or may not be related, but they can wait). As I mentioned, w/ or w/o direct rendering, I have nice desktop effects -- not all of them (no blur, no explosion, etc), but enough; I'm happy. As I recall, for me, KDE4 wasn't stable until 4.2, so I can wait..
Comment 84 John Stanley 2010-09-12 13:37:27 UTC
(In reply to comment #75)
> Patch does not work for me.  Was it supposed to be applied in tandom with any
> of the other patches/reverts described in the bug?
> 
> This is on current Kubuntu Maverick (with KDE 4.5.1) and a recent Mesa git
> snapshot.
> 
> It did change things slightly.  Previously on this system (Dell mini 10v with
> Intel 945GME), shift+alt+f12 after a freeze would cause an X crash.  Now it
> doesn't.  It just does nothing instead.

No, no other patches. Depressing... You could try rebuilding kdebase-workspace-4.5.1 with only the single change in the location of deleteGLXContext(); (see comment #81) just to see if it helps...

Can you give the versions (revisions if from git) of all of xorg-server, libX11, Mesa, libdrm, the xorg video driver, and the linux kernel?
Comment 85 Fredrik Höglund 2010-09-12 17:52:56 UTC
Created attachment 51562 [details]
Another possible fix

How about this version of the patch?

I have been able to reproduce the bug with the r600c driver, but with this patch everything seems to be working fine.

There are piglit[1] tests for destroying and switching contexts (glx-destroycontext-1 and glx-destroycontext-2), but not for the sequence of calls kwin is using at the moment.

[1] http://cgit.freedesktop.org/piglit
Comment 86 John Stanley 2010-09-12 22:15:36 UTC
(In reply to comment #85)
> Created an attachment (id=51562) [details]
> Another possible fix
> 
> How about this version of the patch?
> 
> I have been able to reproduce the bug with the r600c driver, but with this
> patch everything seems to be working fine.
> 
> There are piglit[1] tests for destroying and switching contexts
> (glx-destroycontext-1 and glx-destroycontext-2), but not for the sequence of
> calls kwin is using at the moment.
> 
> [1] http://cgit.freedesktop.org/piglit

The key point here is the repositioning of the destroycontext line, so we have a 2nd success! Still, it's not a done deal, for the reasons I've mentioned: if oldcontext is not null, then it's equivalent to repositioning the destroycontext line; if oldcontext is null, then detect() will return w/o having freed up X and GL resources -- not good -- which may introduce other problems. See, calling destroycontext prior to restoring oldcontext means that the "destroy" is incomplete (it'll be completed by the makecurrent switch).

The thing to note here is that if a mere repositioning like this has a big effect (as it does on my and your system), then something's amiss. I reported this to the Mesa folks, and they just blew it off, but if you look at the relevant Mesa code, I think they're mistaken. Even so, I still don't see this as clearly implying a Mesa bug as the cause of our problem.
Comment 87 Fredrik Höglund 2010-09-12 23:40:57 UTC
(In reply to comment #86)
> The key point here is the repositioning the destroycontext line since -- so we
> have a 2nd success!  Still, its not a done deal, for the reasons I've
> mentioned: if oldcontext is not null, then its equivalent to repositioning the
> destroycontext line, if oldcontext is null, then detect() will return w/o
> having freed up X and GL resources -- not good -- may introduce other problems.
> See, destroycontext prior to restoring oldcontext, means that the "destroy" is
> incomplete (it'll be completed by the makecurrent switch).

I don't quite see how we could leak anything, but maybe I'm missing something. My patch calls glXMakeCurrent() unconditionally after marking the context created by the detection code for destruction, and that should free it. That it switches to a null context shouldn't make a difference.

> The thing to note here is that if a mere repositioning like this has big effect
> (as it does on my and your system), then somethings amiss - I reported this to
> the Mesa folks, and they just blew it off, but if you look at the relevant Mesa
> code, I think they're mistaken. Even so, I still don't see this as clearly
> implying a Mesa bug as the cause of our problem.

I try not to assume anything before analyzing an issue, but I see nothing that  suggests that the order in which kwin is making these calls is illegal.

I think the best thing to do is to commit this modified patch, and submit new piglit tests for the combinations that aren't being tested currently.
Comment 88 Thomas Lübking 2010-09-13 01:15:26 UTC
you're not introducing leaks by your patch, but they're "present" (colormap & visual - at least the colormap, as it isn't the default one; no idea whether the -default- visual is some global static instance on the server)

The call order is NOT illegal - however john's testcase seems to do similar w/o causing trouble (but it could simply depend on the state of the other gl context, from a rough look, the testcase doesn't do anything but swapping buffers)

glXDestroyContext only frees the id unconditionally - the actual wipeout should not happen while the context is active. (but right afterwards)
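
The lifetime rule stated above (destroying a context that is still current only marks it, and the real teardown happens once it stops being current) can be modelled without any GLX calls. What follows is a toy model, assuming only the behaviour the GLX documentation describes for glXDestroyContext; the class FakeGlx and its methods are hypothetical and say nothing about Mesa internals.

```cpp
#include <set>

// Toy model of GLX context lifetime. Context ids are plain ints and 0 means
// "no context". This is not Mesa code, only the documented rule.
class FakeGlx {
public:
    int createContext()
    {
        const int id = m_nextId++;
        m_alive.insert(id);
        return id;
    }

    // Models glXMakeCurrent: the previous context stops being current, and
    // if it was marked for destruction it is torn down at this point.
    void makeCurrent(int ctx)
    {
        const int previous = m_current;
        m_current = ctx;
        if (previous != 0 && previous != ctx && m_pendingDestroy.count(previous)) {
            m_pendingDestroy.erase(previous);
            m_alive.erase(previous);
        }
    }

    // Models glXDestroyContext: immediate if the context is not current,
    // deferred if it is.
    void destroyContext(int ctx)
    {
        if (ctx == m_current)
            m_pendingDestroy.insert(ctx);
        else
            m_alive.erase(ctx);
    }

    bool isAlive(int ctx) const { return m_alive.count(ctx) != 0; }
    int current() const { return m_current; }

private:
    int m_nextId = 1;
    int m_current = 0;
    std::set<int> m_alive;
    std::set<int> m_pendingDestroy;
};
```

In this model both call orders (destroy, then switch back; or switch back, then destroy) end in the same state: the detection context is gone and the compositing context is alive. That is the sense in which the call order is not illegal, so a driver where the two orders behave differently is deviating from the documented rule.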
Comment 89 John Stanley 2010-09-13 01:20:39 UTC
(In reply to comment #87)
> (In reply to comment #86)
> > The key point here is the repositioning the destroycontext line since -- so we
> > have a 2nd success!  Still, its not a done deal, for the reasons I've
> > mentioned: if oldcontext is not null, then its equivalent to repositioning the
> > destroycontext line, if oldcontext is null, then detect() will return w/o
> > having freed up X and GL resources -- not good -- may introduce other problems.
> > See, destroycontext prior to restoring oldcontext, means that the "destroy" is
> > incomplete (it'll be completed by the makecurrent switch).
> 
> I don't quite see how we could leak anything, but maybe I'm missing something.
> My patch calls glxMakeCurrent() unconditionally after marking the context
> created by the detection code for destruction, and that should free it. That it
> switches to a null context shouldn't make a difference.
> 
The thing is, calling glXMakeCurrent() when there is a current context
implicitly does a glXMakeCurrent( dpy, None, NULL ) prior to making the
new context current. So calling glXMakeCurrent() unconditionally as you do
is perfectly ok, but unnecessary (in other words, the existing detect code
doing the switch back is doing the same thing your patch does, with the
exception of the destroycontext placement).

The reason why I suggest that resources may not be freed is that if you look at
Mesa glxcmds.c and glxcurrent.c, whether destroycontext is called before or
after the context switch DOES appear to make a difference (this, despite the
Mesa folks saying it doesn't) -- and, for unknown reasons, the fact is, on my
system it absolutely makes a difference.

> > The thing to note here is that if a mere repositioning like this has big effect
> > (as it does on my and your system), then somethings amiss - I reported this to
> > the Mesa folks, and they just blew it off, but if you look at the relevant Mesa
> > code, I think they're mistaken. Even so, I still don't see this as clearly
> > implying a Mesa bug as the cause of our problem.
> 
> I try not to assume anything before analyzing an issue, but I see nothing that 
> suggests that the order in which kwin is making these calls is illegal.
> 
> I think the best thing to do is to commit this modified patch, and submit new
> piglit tests for the combinations that aren't being tested currently.
Comment 90 Fredrik Höglund 2010-09-13 01:54:54 UTC
(In reply to comment #88)
> you're not introducing leaks by your patch, but they're "present" (colormap &
> visual -at least as colormap isn't the default one, no idea whether the
> -default- visual is some global static instance on the sever)

Well let's please focus on one bug at a time here.

(In reply to comment #89)
> The thing is calling glxMakeCurrent() when there is a current context,
> implicitly does a glxMakeCurrent( dpy, None, None, NULL) prior to making the
> the new context current. So calling  glxMakeCurrent() unconditionally as you do
> if perfectly ok, but unnecessary (in other words, the existing detect code
> doing the switch back is doing the same thing your patch does, with the
> exception of the destroycontext placement).

It should be unnecessary, but in practice the bug is still reproducible for me without that call. That's why I added it.

> The reason why I suggest that resources may not be freed is that if you look at
> Mesa glxcmds.c and glxcurrent.c, whether destroycontext is called before or
> after the context switch DOES appear to make a difference (this, despite the
> Mesa folks, saying it doesn't)-- and, unknown reasons, the fact is, on my
> system it absolutely makes a difference.

I guess the question then is if it's better to risk leaking a context each time the apply button is clicked (which is not that often), or have kwin freeze or crash. I certainly vote for the former. The leak also wouldn't be our bug in this case.
Comment 91 John Stanley 2010-09-13 03:48:35 UTC
(In reply to comment #90)
> (In reply to comment #88)
> > you're not introducing leaks by your patch, but they're "present" (colormap &
> > visual -at least as colormap isn't the default one, no idea whether the
> > -default- visual is some global static instance on the sever)
> 
> Well let's please focus on one bug at a time here.
> 
> (In reply to comment #89)
> > The thing is calling glxMakeCurrent() when there is a current context,
> > implicitly does a glxMakeCurrent( dpy, None, None, NULL) prior to making the
> > the new context current. So calling  glxMakeCurrent() unconditionally as you do
> > if perfectly ok, but unnecessary (in other words, the existing detect code
> > doing the switch back is doing the same thing your patch does, with the
> > exception of the destroycontext placement).
> 
> It should be unnecessary, but in practice the bug is still reproducible for me
> without that call. That's why I added it.
> 
> > The reason why I suggest that resources may not be freed is that if you look at
> > Mesa glxcmds.c and glxcurrent.c, whether destroycontext is called before or
> > after the context switch DOES appear to make a difference (this, despite the
> > Mesa folks, saying it doesn't)-- and, unknown reasons, the fact is, on my
> > system it absolutely makes a difference.
> 
> I guess the question then is if it's better to risk leaking a context each time
> the apply button is clicked (which is not that often), or have kwin freeze or
> crash. I certainly vote for the former. The leak also wouldn't be our bug in
> this case.

Agreed.
Comment 92 John Stanley 2010-09-15 06:11:16 UTC
(In reply to comment #88)
> you're not introducing leaks by your patch, but they're "present" (colormap &
> visual -at least as colormap isn't the default one, no idea whether the
> -default- visual is some global static instance on the sever)
> 
> The call order is NOT illegal - however john's testcase seems to do similar w/o
> causing trouble (but it could simply depend on the state of the other gl
> context, from a rough look, the testcase doesn't do anything but swapping
> buffers)
> 
> glXDestroyContext only frees the id unconditionally - the actual wipeout should
> not happen while the context is active. (but right afterwards)

I'm now in 100% agreement with the above: I'm totally convinced this is entirely a Mesa issue. Please accept my apologies. I'll probably be submitting one or more patches to Mesa, but who knows how that will go. This appears to be what's happening:
  1) On entry to detect(), if there's no current context then all is well. This happens whenever compositing is not active: at boot, and on bringing up and making a desktop effects change.
  2) Once desktop effects are active, however, just after making a change in effects, detect() is entered when there is a current context (this happens at the prefs.detect() call in Options::reloadCompositingSettings (options.cpp)). In prefs.detect(), the failure occurs at glXDestroyContext() because Mesa then destroys all local drawables for which there does not exist an associated Window on display(), DefaultScreen(display()). This is totally wrong in our case, as the current compositing context has associated drawables also on display(), DefaultScreen(display())! I'm not 100% sure why Mesa doesn't detect this, but I guess it is because the compositing Window was created in a different thread. Anyway, the result is that glXDestroyContext() in detect() wipes out local drawables being used by the compositing context, and hence the 'hang.'

I've patched Mesa to NOT destroy any drawables that are current -- and, oi, it works. Obviously, this is not a complete fix (non-current drawables without existing Windows are still destroyed -- clearly a possible a superset of what should be destroyed).

The reason why repositioning XDestroyContext() works is because (contrary to Mesa doc), calling XDestroyContext() while the context is still current, or after it has been made non-current, are not, even in the end, equivalent. This is a Mesa bug in my judgment. Calling XDestroyContext() while the context is still current, never runs the drawables destruction described above -- this only happens when XDestroyContext() is called on non-current context.

The reason why having detect() use a different Xserver connection as I did in the patch works is that since detect()'s drawables are on a different display (different from the compositing context which is on QX11Info::display()), XDestroyContext() does not destroy drawables on compositing's display.

For me this issue is resolved (I'll simply patch Mesa and be done with it), but I think it'd be nice if we could do something simple in kde to circumvent the problem. We could simply reposition glXDestroyContext() and live with the small associated memory leak (comment #90). Another possibility (which I've verified works) is to modify detect() to check whether there is a current context on entry, and if so, simply use it as is, i.e., add code like:
    if( glXGetCurrentContext() != NULL)
        {
        detectDriverAndVersion();
        applyDriverSpecificOptions();
        return;
        }
In this case, there will be no need to save and restore the current context; a new 'detect' context will then only be employed when there is no current context. As I said, I've tried this and it appears to work fine. Nevertheless, I don't know kde well enough to be sure that this is an appropriate thing to do.
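
For illustration, the control flow of that idea can be sketched as below. This is not the attached patch: FakeGlx, probe() and the integer context ids are hypothetical stand-ins for the GLX calls and for detectDriverAndVersion()/applyDriverSpecificOptions(), used only so the flow can be followed without an X server.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the GLX calls detect() makes; not the real API.
struct FakeGlx {
    int current = 0;                 // 0 plays the role of a NULL context
    int nextId = 1;
    std::vector<std::string> calls;  // records what detect() did

    int getCurrentContext() const { return current; }
    int createContext() { calls.push_back("create"); return nextId++; }
    void makeCurrent(int ctx) { calls.push_back("makeCurrent"); current = ctx; }
    void destroyContext(int) { calls.push_back("destroy"); }
};

// Stand-in for detectDriverAndVersion() + applyDriverSpecificOptions().
void probe(FakeGlx &glx) { glx.calls.push_back("probe"); }

void detect(FakeGlx &glx) {
    // A context is already current (compositing is active): reuse it and
    // never create or destroy a second one.
    if (glx.getCurrentContext() != 0) {
        probe(glx);
        return;
    }
    // No current context: use a temporary one just for the probe.
    const int ctx = glx.createContext();
    assert(ctx != 0);
    glx.makeCurrent(ctx);
    probe(glx);
    glx.makeCurrent(0);
    glx.destroyContext(ctx);
}
```

With a context already current, detect() records only the probe; without one, it records create, makeCurrent, probe, makeCurrent, destroy.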
 
I'll attach a patch for this momentarily, so you see what I mean, but again, it may not be the right thing to do..
Comment 93 John Stanley 2010-09-15 06:28:26 UTC
Created attachment 51659 [details]
Idea to explore for sidestepping Mesa bug(s)
Comment 94 John Stanley 2010-09-15 07:30:04 UTC
I agree with Thomas that the ultimate cause of this issue is probably at least driver-related, since it doesn't occur for everyone. Perhaps, with some 'well-written' drivers the issue simply doesn't come up -- perhaps with intel drivers the 'screen id' is mishandled -- who knows..
Comment 95 John Stanley 2010-09-16 11:23:59 UTC
Finally resolved (I think). When detect() is entered while a current context already exists, at least some of the time this context is associated with a Pixmap, not a Window. Mesa, in glXDestroyContext(), will release drawables not associated with a valid Window for the current display() and screen, so it releases current drawables associated with the Pixmap (which, I think, is associated with active compositing). Anyway, I submitted a one-liner patch to Mesa (https://bugs.freedesktop.org/show_bug.cgi?id=30220), so hopefully things will be fixed by Mesa-7.8.3.
Comment 96 Martin Flöser 2010-09-16 17:43:02 UTC
Thanks for your investigations and the time spent on this issue. I now dare to mark this bug as upstream. If the fix for mesa does not resolve this issue, please reopen.
Comment 97 Scott Kitterman 2010-09-17 01:36:06 UTC
I can confirm the proposed mesa patch solves the problem.  I'm going to see if we can get it in for the next Ubuntu release.  Thank you for working so hard on this.
Comment 98 John Stanley 2010-09-17 02:02:24 UTC
Unfortunately, the Mesa patch I submitted may introduce unacceptable memory leak issues in Mesa, because with it certain Mesa/driver resources may now end up not being released. So it may break other drivers. I guess it's like working on old plumbing: fix a leak here, cause a new leak over there, ...
Comment 99 Jay LaCroix 2010-09-17 03:57:54 UTC
KDE 4.4.x was fine, so whatever changed between that version and this should be reversed. The drivers probably won't be fixed anytime soon, and we shouldn't give a substandard KDE experience while we wait for them to correct their mistakes. There has to be a work around. When it comes to driver developers fixing their mistakes, my experience over the last ten years when it comes to Linux is that they usually don't.
Comment 100 Weng Xuetian 2010-09-17 04:36:06 UTC
Will the patch in #93 be merged in kde? I don't know a lot about opengl or glx, but his explanation of this problem is quite good.

My R600 (Ati hd 3450) works well with this patch.
Comment 101 Martin Flöser 2010-09-17 07:42:00 UTC
> 4.4.x was fine, so whatever changed between that
> version and this should be reversed. The drivers probably won't be fixed
> anytime soon, and we shouldn't give a substandard KDE experience while
> we wait for them to correct their mistakes. There has to be a work
> around.
The change cannot be reverted in the 4.5 cycle, as we plainly and simply cannot guarantee that there won't be regressions. We know that the 4.4 variant crashed instead of freezing, and we know that without the change the desktop becomes unusable, as direct rendering does not support the extensions blur requires. Working around these new problems is probably not possible without introducing regressions - e.g. desktop unusable or blur disabled for users where it worked fine.
Comment 102 Thomas Lübking 2010-09-17 16:01:25 UTC
My 2¢

i oppose patch #93 since it removes the "proper" path to store/swap/test/restore the context [1] but if swapping the destruction order fixes it, this should be applied [2] to workaround mesa.

iff "not bother mesa with two contexts" is somehow mandatory, the current path should imo remain and just be blocked by the "if( glXGetCurrentContext() != NULL )" part (since it will "softcode" the NULL for the makeCurrent call)

[1] no problem atm, but not right either and the production context should not be messed up by testing
[2] the pot. X leaks are a complete different issue and can be fixed regardless
Comment 103 John Stanley 2010-09-18 08:38:52 UTC
(In reply to comment #102)
> My 2¢
> 
> i oppose patch #93 since it removes the "proper" path to
> store/swap/test/restore the context [1] but if swapping the destruction order
> fixes it, this should be applied [2] to workaround mesa.
> 
> iff "not bother mesa with two contexts" is somehow mandatory, the current path
> should imo remain and just be blocked by the "if( glXGetCurrentContext() !=
> NULL )" part (since it will "softcode" the NULL for the makeCurrent call)
> 
> [1] no problem atm, but not right either and the production context should not
> be messed up by testing

I agree, I think it's bad practice, but I just wanted to get input.

> [2] the pot. X leaks are a complete different issue and can be fixed regardless

My understanding now is that no leaks will be incurred by repositioning, because the resources not released by doing this will be released later, when the associated Pixmaps are destroyed. So I think it's the way to go.
Comment 104 John Stanley 2010-09-18 08:48:07 UTC
Created attachment 51774 [details]
compositing settings fix and resource cleanup

I don't think we can/should wait on Mesa for this. It looks like a deep problem in the way Pixmaps are handled, and may have worsened for OpenGL-1.3 and newer. The Mesa patch I submitted may or may not be a solution, because it may screw up other things internally. Anyway, there has been no response from them.

This patch just does the repositioning of glXDestroyContext and the resource releases; it also includes several additional files which have similar resource release issues. Even if Mesa fixes things, everything in this patch should remain ok. Hopefully, others with this issue can give this one a try.
Comment 105 Shane 2010-09-18 09:17:19 UTC
@ John Thanks for submitting a patch. Did you do this on a bug tracker or some place we can go and make some noise about this? I'm no good at coding... but if you need someone to pester them into fixing it... I'm there. Just point me in the right direction.
Comment 106 John Stanley 2010-09-18 09:46:56 UTC
(In reply to comment #105)
> @ John Thanks for submitting a patch. Did you do this on a bug tracker or some
> place we can go and make some noise about this? I'm no good at coding... but if
> you need someone to pester them into fixing it... I'm there. Just point me in
> the right direction.

Yup, comment #95 above, here's the bug report link https://bugs.freedesktop.org/show_bug.cgi?id=30220
Comment 107 Christoph Feck 2010-09-21 14:15:06 UTC
*** Bug 251928 has been marked as a duplicate of this bug. ***
Comment 108 Björn Ruberg 2010-09-21 22:04:23 UTC
Will a workaround go into KDE 4.5.2? I would count this as double major. The KDE 4.5 branch from today still froze for me.
Comment 109 John Stanley 2010-10-05 07:01:33 UTC
As this is 'resolved' I'm not sure this is the appropriate place for this but I'm not sure where else to post it:

Some good news of sorts: I've done some testing with the following config:

  Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (i915)
  xorg-server-1.9.0, xf86-video-2.13.0, libdrm-2.4.22, mesa-7.9-rc2, qt-4.7.0
  kde-4.5.1, linux-2.6.35.7

Up to now 'blur' (on my system, always activated on first kde-desktop boot, but has never worked) hasn't caused any problems. Now, with Mesa-7.9rc2/xorg-server-1.9.0 it causes desktop effects to be too slow, resulting in all effects being turned off. So, if I first manually turn off this effect then desktop effects are activated and the 'hang' issue here is gone.

Conclusion: Problem exists in Mesa-7.8.2/3rc1, but has been fixed in
Mesa-7.9rc2.

Blur, on the other hand, is quite sick (at least on my system), but this is another issue...
Comment 110 Martin Flöser 2010-10-05 08:23:06 UTC
> Conclusion: Problem exists in Mesa-7.8.2/3rc1, but has been fixed in
> Mesa-7.9rc2.
Btw I am rather sure that it only exists in 7.8 as I am unable to reproduce 
this issue with Mesa 7.7 which is used in the Debian system I can use for 
Intel testing.
Comment 111 Thomas Lübking 2010-10-12 15:21:29 UTC
*** Bug 253969 has been marked as a duplicate of this bug. ***
Comment 112 msnkipa 2010-10-12 19:29:25 UTC
>Btw I am rather sure that it only exists in 7.8 as I am unable to reproduce 
>this issue with Mesa 7.7 which is used in the Debian system I can use for 
>Intel testing.

I can confirm this idea, as I got this bug after I updated my OpenSUSE 11.2 (with Mesa 7.7) to 11.2 (with Mesa 7.8).
Comment 113 Thomas Lübking 2010-10-22 16:07:24 UTC
*** Bug 254954 has been marked as a duplicate of this bug. ***
Comment 114 Björn Ruberg 2010-10-24 00:37:40 UTC
I updated from Fedora 13 (mesa 7.8) to Fedora 14 (mesa 7.9). Problem is gone.
Comment 115 Thomas Lübking 2010-11-18 18:05:48 UTC
*** Bug 257243 has been marked as a duplicate of this bug. ***
Comment 116 Anshul Jain 2011-02-14 09:36:33 UTC
I'm using KDE 4.6 on openSUSE 11.3 w/ an Intel 4500 MHD graphics card, and kwin still freezes for me when I use OpenGL. It is fine under XRender.
Comment 117 S. Burmeister 2011-02-14 09:44:09 UTC
Yes, because it is fixed in Mesa 7.9 and 11.3 ships 7.8.
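
For anyone who wants to check which side of that line a system is on, a rough sketch follows. It assumes the GL_VERSION string has the usual Mesa form (for example "2.1 Mesa 7.8.2") and that only the 7.8 series is affected, as reported in the comments above; mesaVersionLooksAffected is a hypothetical helper name, not something in kwin.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns true if a GL_VERSION string names a Mesa 7.8.x
// release, the series this thread reports as affected (7.7 and 7.9 are not).
bool mesaVersionLooksAffected(const char *glVersion) {
    // Look for the "Mesa " marker that Mesa puts into its GL_VERSION string.
    const char *mesa = std::strstr(glVersion, "Mesa ");
    if (mesa == nullptr)
        return false;  // not Mesa, or an unexpected string format

    int major = 0, minor = 0;
    // Parse the "major.minor" part following the marker.
    if (std::sscanf(mesa + std::strlen("Mesa "), "%d.%d", &major, &minor) != 2)
        return false;

    return major == 7 && minor == 8;
}
```

The string to pass in would come from glGetString(GL_VERSION) with a context current, or from the "OpenGL version string" line printed by glxinfo.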