Created attachment 47891
kwin output

Version: unspecified (using Devel)
OS: Linux

I'm using [kde-unstable] from archlinux (4.4.85/KDE 4.5 Beta 2).

Whenever I change settings in systemsettings related to kwin (like switching the default tiling layout), kwin freezes. Mouse cursor is still visible and can be moved, but clicks aren't registered and the display does not get updated anymore. Video stops playing, but audio continues. This only happens when compositing is activated.

Attached is the kwin log, the effects start to unload when I press Ctrl-C in the terminal.

Reproducible: Always

Steps to Reproduce:
1. Make sure compositing is activated.
2. Start systemsettings and change a value related to kwin.
3. Click apply.

Actual Results:
Display stops updating, clicks are not registered.

Expected Results:
No freeze. ;)

xorg-server: 1.8.1-1
xf86-video-ati: 6.13.0-1
mesa, ati-dri, libgl: 7.8.1-3
Only happens when using OpenGL, XRender is fine.
I'm getting the same problem, using Arch and the kde-unstable repository with an Intel GMA965 (X3100). I also get the freeze if I try to change the fonts or the window decoration.

xf86-video-intel: 2.11.0-2
The other packages are the same as the reporter's.
Created attachment 48226
Log from the freeze

Log with debug messages.
I updated this weekend from openSUSE 11.2 (which had xserver 1.7.x, Mesa 7.6 I think) to openSUSE 11.3 RC1 (which has xserver 1.8, Mesa 7.8), WITHOUT recompiling KDE from trunk, and I sometimes get freezes now, too. I am pretty sure it is an upstream bug (in X11 server or Mesa).
I'm about to commit ****** because of this. I open warzone2100 (just launch it by typing warz in krunner, no need to see it), then exit after the menu shows. This gets kwin back on track but does nothing good for memory consumption. Just about anything related to kwin (style, theme, effect, decoration) causes this for me. Running rc1 on kubuntu 10.04.1.
Just suspend/resume compositing instead (SHIFT+ALT+F12).

If this is reproducible for you as it is for the OP, please try to just call
qdbus org.kde.kwin /KWin reconfigure

Also
-> test another decoration and
-> disable all effect plugins
to see whether the issue remains then.

"style" (and maybe "theme", as in the plasma-desktop "theme") are not kwin related, btw.
Calling 'qdbus org.kde.kwin /KWin reconfigure' causes it every time. Changing decos didn't affect it (tried bespin, polyester). Thanks for the suspend/resume tip though.
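For anyone who wants to reproduce this without clicking through systemsettings, the call above can be wrapped in a small function. This is only a sketch: the function name is made up for illustration, and the qdbus call is the one quoted in this thread. On an affected system it is expected to trigger the freeze, so keep SHIFT+ALT+F12 in mind.

```shell
# Ask the running kwin to reload its configuration.
# On affected systems this is expected to trigger the freeze described here.
# "kwin_reconfigure" is a hypothetical helper name, not part of KDE.
kwin_reconfigure() {
    qdbus org.kde.kwin /KWin reconfigure
}

# Usage (from a terminal inside the KDE session):
#   kwin_reconfigure
```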
This happens on my ATI HD 3450 with the open source driver (kernel 2.6.34 and 2.6.35, mesa 7.8.1 or 7.9-git), but kwin works well without this issue on my Nvidia Go 6200 (official driver).

It did not happen with 4.4, though. Maybe some OpenGL API newly used in 4.5 has an issue that causes this.
see comment #4
(In reply to comment #9)
> see comment #4

I also think it's an upstream issue, but can this problem be solved in a similar way to #243181 (https://bugs.kde.org/show_bug.cgi?id=243181)? I think quite a lot of cards will be affected by this problem...
If the bug is in mesa we'd likely have to blacklist _every_ driver except nvidia... doesn't sound like an option to me :-)
Also, the blacklist is intended for weak GPUs, not broken drivers.
Plus, at least for the intel driver there seems to be a really severe issue that breaks when using two GL contexts in a short time frame (i.e. you can segfault kwin by launching glxgears or so...), so the driver needs to be fixed anyway, since this is no longer a kwin issue...
> also the blacklist is intended for weak GPUs, not broken drivers.

It's also meant for broken drivers, that's why the version is included. But it's done in a way that assumes that a new driver version will fix the problems.
I have the same problem. BTW, there is some confusion out in the forums between this bug and https://bugzilla.novell.com/show_bug.cgi?id=615649. Is there a relationship?
I'm not sure... but my card is an ATI HD 3450.

Has any nvidia user running nouveau encountered this problem? That could help us determine whether it is related to mesa or to a specific dri driver.
(In reply to comment #14 and comment #0)

I build i686-GNU/Linux systems, and I saw the same behavior with kde 4.4.95 (Linux-2.6.35/gcc-4.5.1/glibc-2.12.1/xorg-server 1.8.2/Mesa-7.8.2/xf86-video-intel.2.12.0), and now see the same with kde 4.5.0. Downgrading the intel driver to 2.11.0 and 2.10.0 had no effect.

Note that for kde 4.4.{2,3,4,5} (Linux-2.6.33.7/gcc-4.4.3/glibc-2.11.1/xorg-server 1.8.1/Mesa-7.8.1/xf86-video-intel.2.10.0), this problem was not present.

I have an Intel Corporation 82915G/GV/910GL Integrated Graphics Controller.
Created attachment 50575 [details] partial .xsession-errors log Attached is a partial .xsession-errors log related to this issue (there were no associated errors in Xorg.0.log)
Comment on attachment 50575 [details] partial .xsession-errors log This is a segment of the partial .xsession-errors file resulting when this bug occurs. There were no associated Xorg.0.log errors.
Oh, forgot to give the Qt version: with kde 4.4.95 I used Qt-4.7rc2, and with kde 4.5.0 I downgraded to Qt-4.6.3 hoping that would resolve the problem, but nope.
(In reply to comment #18)
> Oh, forgot to give the Qt version: with kde 4.4.95 I used Qt-4.7rc2, and with
> kde 4.5.0 I downgraded to Qt-4.6.3 hoping that would resolve the problem, but
> nope.

As comment #4 mentions: this has nothing to do with the KDE or Qt version but is somewhere in xorg-server, mesa or the driver. Up- or downgrading KDE won't help.
Try to disable kms by passing "i915.modeset=0" to the kernel in grub.
I have the same bug as the original poster, with one difference: it does not necessarily occur when using systemsettings; it happens when I do anything (or even nothing) a few minutes after I started KDE. The keyboard also stops working. In Juk the music continues to play until the end of the song, but the following song is not started. When I remove ~/.kde and restart KDE it is even worse, because then the freeze happens even sooner and the mouse cannot be moved anymore.

I am using KDE 4.5.0 from KDEmod/Arch Linux on x86_64 with kernel 2.6.34 and NVIDIA GeForce GT 240/PCI/SSE2, OpenGL version 3.3.0 NVIDIA 256.44 using the proprietary nvidia driver.

Disabling the blur desktop effect solved the problem until I revisited the "Desktop Effects" configuration in systemsettings and tried to close systemsettings after changing nothing (I don't know if this is persistent since my motivation to test this further is abysmal after already having rebooted 10 times this evening).

I had none of these problems in KDE 4.4.5. The Qt version was and is 4.6.3, xorg-server version was and is 1.8.1, mesa version was and is 7.8.2. Yes, these three packages didn't change during the upgrade from KDE 4.4.5 to KDE 4.5.0; they were already installed since 4 July and I upgraded to KDE 4.5.0 only today, 15 August (I never used the betas or RCs). This contradicts the conclusion of comment #4.
*** This bug has been confirmed by popular vote. ***
comment #20 sounds more like bug #247839 (the failure on closing systemsettings could be just random)

However, given that mouse and kbd (including ctrl+alt+backspace, "zapping") are inoperative (contradicting this OP) and you won't be able to "fix" it by suspending/resuming compositing (?!), this is a server halt.

If you can ssh into that machine from another one, there's a good chance that you can either kill the server or at least get a "clean" shutdown.
Also you can look for the X11 cpu usage and check dmesg for Xid entries (gpu errors likely say "NVRM").
Also have a short look at your gpu temp (nvidia-settings -q GPUCoreTemp) - just ruling out it's gotten hot wherever you live ;-)

Next you should determine whether it's caused/induced by kwin's (GL) compositing (deactivate it or launch another WM, like "openbox --replace &").
Iff this "fixes" the issue, just deactivate all effects to bisect whether a specific one is causing this or it's the general rendering.
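The dmesg check suggested above can be sketched as a small filter. This is a minimal sketch under assumptions: the function name is made up, and it simply keeps lines containing "NVRM" or "Xid", which is how the comment above says nvidia GPU errors are likely to show up. It reads kernel log text on stdin so it can also be run against a saved log.

```shell
# Keep only kernel log lines that look like nvidia GPU errors.
# Reads log text on stdin; prints matching lines, nothing if there are none.
# "gpu_error_lines" is a hypothetical helper name.
gpu_error_lines() {
    grep -E 'NVRM|Xid' || true   # "|| true": no match is not treated as an error
}

# Usage on a live system (over ssh if the display is frozen):
#   dmesg | gpu_error_lines
#   nvidia-settings -q GPUCoreTemp
```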
(In reply to comment #22)

Hi folks,

Ok, for kde-4.5.0, I downgraded all of X (libs/server/drivers), libdrm, and Mesa to the configuration I now use w/o problems under kde-4.4.5. The issue under v4.5.0 remains. My feeling is that it's most likely a kdelibs/kdebase-workspace issue.

Also, in the .xsession-errors log I posted earlier, the message:

systemsettings(3100) EventListener::eventFilter: User of KWidgetItemDelegate should not delete widgets created by createWidgets!

occurs many times when this 'hang' happens. Perhaps it's related to bug 238864, comment #7?
In reply to comment #22: I only have one computer so I cannot ssh into it from another machine. Top doesn't show disturbing behavior of X11 before this happens (it happens totally unexpectedly, there is no exaggerated CPU usage before it happens, nor any sluggishness). I cannot find Xid entries in the dmesg or other log files. Yesterday after the 10 reboots, I disabled the blur effect and the freeze only occurred once when closing systemsettings (that was immediately after rebooting for the 10th time), but after that I worked for more than one hour without a freeze. Today the freeze only happens when I activate the blur effect (a few minutes after activating it), when I disable the blur effect the freeze doesn't occur anymore.
ok, see this bug #243181 and this blog http://blog.martin-graesslin.com/blog/2010/07/blacklisting-drivers-for-some-kwin-effects/
I'm now in the process of rebuilding 4.5 on a nicely working intel 915 system now using v4.4.5, without any xorg/libdrm/mesa changes. We'll see how it goes.

As far as bug #243181 goes, I suspect it is not related. Anyway, I don't really think "blacklisting" video cards is an appropriate approach; a better approach would be to "blacklist" kde-4.5, downgrade to 4.4.5, and wait for 4.5.2 or 4.5.3. Anyway, 'nuf of that.

In actuality, for me (w/intel 915 card), all desktop effects I've tried are working fine as usual (luv the rolling cube for desktop navigation); it's simply that whenever I change a desktop effects setting via "apply", the system hangs. When I keyboard-toggle desktop effects off and then back on, all is well, with changes intact.

Nevertheless, in my judgment 4.5 is not stable enough for general use. See bug #246498 and #247839 -- a lot of people are having similar problems on a variety of video cards.
(In reply to comment #26)
> As far as bug #243181, I suspect it not related.

You're not experiencing the bug from comment #20 - that's entirely different and has nothing to do with the original bug. It should not even be here :-)

> Anyway, I don't really think "blacklisting" video cards is an appropriate approach;

Frankly, I personally HATE it, but it was the last solution we (ok, martin ;-P ) could get into the release when becoming aware of the amount of trouble caused by drivers/OpenGL implementations regarding those two shader effects.
What would be required is an external stress test application to see what your gpu/driver combination can do atm.
This is however NOT related to this bug at all.

> a better approach would be to "blacklist" kde-4.5, downgrade to 4.4.5, and wait for 4.5.2 or 4.5.3.

That'd then be your distro's job, and actually some distros seem to do so (because of these issues).

> its simply that whenever I change a desktop effects setting via "apply"

Yes, that's this bug - see comments #4 & #6.

Bug #246498 looks like this one and also (apparently) only affects intel users (ignoring comment #20 here, which is not related), so it might be a dupe.
(There are other intel related bug reports regarding two GL contexts in a short time frame, like playing an OpenGL game or so.)
*** Bug 246498 has been marked as a duplicate of this bug. ***
Re #27: this also affects ati with open source drivers... My friend and I both suffer from this problem, but we use different ati cards: ati hd 3450 and 4300.
Does anybody encountering this issue have
- a second computer
- sshd on the "broken" one
- basic gdb knowledge?

When the freeze occurs (apparently one can trigger it; if so, then for sure):
- ssh into the "frozen" machine
- check for the kwin process: "ps -A | grep kwin"
- attach to it: start "gdb", enter "attach $pid", wait until debug libs etc. are loaded
- call a backtrace: "bt"
  (if you've only a text terminal on the non frozen machine, you can "gdb 2>&1 | tee gdb.log" to dump the gdb session into a log file as well)
- don't forget to "detach", then "quit" gdb and unfreeze the frozen one

Thanks
In my case there's no need for sshd, because ctrl + alt + fn works... And actually, kwin does not seem to freeze: if you use your mouse to do something (though it will not be displayed correctly), then after pressing alt+shift+f12 twice I can see the result of my action...

Another piece of information: if qdbus org.kde.kwin /KWin reconfigure is run again after the freeze, kwin crashes and dmesg shows:

radeon 0000:01:00.0: r600_cs_track_check:280 mask 0x0000000F | 0x0000000F no cb for 0
radeon 0000:01:00.0: r600_packet3_check:1108 invalid cmd stream 526
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
(In reply to comment #31)
> for my case, no need to sshd, because ctrl + alt + fn works....

There'd be a chance that moving to a VT will resolve the freeze, but since alt+shift+f12 and WM actions are intercepted you'd only see the eventloop anyway...

> Another information is, qdbus org.kde.kwin /KWin reconfigure is running again

Do you have that backtrace?
Does kwin crash for you after running (and exiting) some opengl applications (w/o deactivating compositing)?

> after freeze, kwin will crash and dmesg will shows:

Does the repainting halt if you disable direct rendering (advanced tab) or pass "nomodeset" to the kernel in grub?
(Be aware that there are reports of this causing visual glitches, notably in font rendering; see this report on mesa: https://bugs.freedesktop.org/show_bug.cgi?id=28327)
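The gdb steps above can also be done non-interactively. The following is only a sketch, assuming the window manager process is named exactly "kwin" and that pgrep and gdb are installed on the frozen machine; the function name and default log name are made up for illustration. It uses gdb's batch mode, which runs the given command and then exits, detaching from the process on the way out.

```shell
# Attach gdb to the running kwin, print a backtrace, and save it to a log.
# Meant to be run over ssh while the display is frozen.
# "kwin_backtrace" is a hypothetical helper name; $1 is the log file (default gdb.log).
kwin_backtrace() {
    log=${1:-gdb.log}
    pid=$(pgrep -x kwin | head -n 1)   # assumes the process is named exactly "kwin"
    if [ -z "$pid" ]; then
        echo "no kwin process found" >&2
        return 1
    fi
    # -batch: run the -ex commands, then exit (which detaches from kwin).
    gdb -p "$pid" -batch -ex bt 2>&1 | tee "$log"
}

# Usage:
#   kwin_backtrace kwin-freeze.log
```

Without debug symbols the backtrace will mostly show "??" frames, so this is most useful on builds with debug info.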
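When testing the kernel parameters mentioned in this thread ("i915.modeset=0" and "nomodeset"), it helps to confirm that the parameter actually reached the kernel. A minimal sketch, assuming only those two spellings are of interest; the function name is made up, and the command line is passed as an argument so it can be checked against a saved copy as well.

```shell
# Succeed if the given kernel command line disables kernel mode setting
# via one of the two parameters mentioned in this thread.
# "kms_disabled" is a hypothetical helper name.
kms_disabled() {
    for word in $1; do                 # split the command line on whitespace
        case $word in
            nomodeset|i915.modeset=0) return 0 ;;
        esac
    done
    return 1
}

# Usage on a live system:
#   kms_disabled "$(cat /proc/cmdline)" && echo "KMS is disabled"
```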
Followup to Comment #26:

Finished the build, and how depressing... the desktop settings 'apply' still fails in the same manner, but much worse: even though desktop effects are auto-enabled on first boot, and after login the screen fades in properly, desktop grid fails to 'take' most of the time and desktop cube doesn't work at all. Also, after applying a desktop effect settings change, the pc frequently dropped back to the kdm login, or hung hard.

The pc formerly had kde-4.4.5, with Linux-2.6.33.7, gcc-4.4.3, glibc-2.11.1, Qt-4.6.3, xorg-server-1.8.0, Mesa-7.8.1, libdrm-2.4.20, and xf86-video-intel.2.10.0, and all worked great. All I did was remove v4.4.5 and build v4.5 (upgrading attica-0.1.3 to v0.1.4). The graphics card is an Intel Corporation 82865G Integrated Graphics Controller (rev 02).

The pc I first built kde-4.5.0 on (my earlier comments) has Linux-2.6.35, gcc-4.5.1, glibc-2.12.1, Qt-4.6.3, xorg-server-1.8.2, Mesa-7.8.2, libdrm-2.4.21, and xf86-video-intel.2.12.0.

I know that Linux video is currently a monumental mess, so this could simply be latent bugs in kernel/libdrm/Mesa/Xorg that have only surfaced with kde-4.5.0. Unfortunately, my kde builds are w/o debug so I can't use gdb. I'll probably be short on time presently, and I want to resurrect the v4.4.5 system, but if there's anything I might do to help, let me know.
Since I haven't had any free time available since reporting this bug, I didn't report the bug upstream. Can anyone make sure it is reported or create it if it isn't? Thanks!
Actually I think most people don't know how to describe this problem to upstream, especially if it is related to the video card driver... I'd like to know whether anyone has an idea which opengl api causes this problem. Which api was introduced in kde 4.5.0, but is not in kde 4.4.5?
I have this problem on Slackware 13.1 using 4.5. I can return to normal if I run DISPLAY=:0 kwin --replace from console.
For a shot in the dark:
can anybody who is able to reproduce this try to revert this commit:
http://websvn.kde.org/?view=revision&revision=1137490
recompile and restart kwin, then try again?
(In reply to comment #37)
> for a shot in the dark:
> can anybody being able to reproduce this try to revert this commit:
> http://websvn.kde.org/?view=revision&revision=1137490
> recompile and restart kwin, then try again?

Yup, tried that. I simply created a patch to revert kwinglutils.cpp to v4.4.5, and it has no effect, so the problem unfortunately lies elsewhere. I also tried Linux-2.6.36-rc1 -- no change as well. I'm now about to rebuild v4.5.0 w/debug so I can use gdb, but it'll take several days. I do have a short backtrace which I got by rebuilding kdebase/-workspace/-runtime w/debug:

(gdb) bt
#0  0xffffe430 in __kernel_vsyscall ()
#1  0xb59764b1 in select () at ../sysdeps/unix/syscall-template.S:82
#2  0xb698a3d3 in qt_safe_select(int, fd_set*, fd_set*, fd_set*, timeval const*) () from /usr/lib/libQtCore.so.4
#3  0xb698e609 in QEventDispatcherUNIX::select(int, fd_set*, fd_set*, fd_set*, timeval*) () from /usr/lib/libQtCore.so.4
#4  0xb5f21dcc in ?? () from /usr/lib/libQtGui.so.4
#5  0xb698f476 in QEventDispatcherUNIXPrivate::doSelect(QFlags<QEventLoop::ProcessEventsFlag>, timeval*) () from /usr/lib/libQtCore.so.4
#6  0xb69900f6 in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#7  0xb5f22076 in ?? () from /usr/lib/libQtGui.so.4
#8  0xb6961889 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#9  0xb6961afa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4
#10 0xb69663df in QCoreApplication::exec() () from /usr/lib/libQtCore.so.4
#11 0xb5e72ab7 in QApplication::exec() () from /usr/lib/libQtGui.so.4
#12 0xb77093bb in kdemain (argc=3, argv=0xbfc529a4) at /home/bld/kdebase-workspace-4.5.0-jps_src/kdebase-workspace-4.5.0/kwin/main.cpp:531
#13 0x0804874b in main (argc=3, argv=0xbfc529a4) at /home/bld/kdebase-workspace-4.5.0-jps_src/build/kwin/kwin_dummy.cpp:3
(gdb) detach
Detaching from program: /usr/bin/kwin, process 2221
(gdb) quit
Yeah, it keeps hanging in the eventfilter.
The next candidate to revert would then be http://websvn.kde.org/?view=rev&revision=1137668

Check your glx version (NOT the glx client version) (glxinfo | grep -i version), and esp. if it's < 1.3 (intel -> yes :) give it a try =\
You'll however lose direct rendering until it's "officially" available ... "if" :-(
Errr.. sorry in case that was too ambiguous:
waiting in the eventloop until sth. interesting happens is what applications usually do; no dead- or livelock, nothing frozen on the CPU (this is why shift+alt+f12 is still intercepted).
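The version check suggested above (server GLX version, not the client one, compared against 1.3) can be sketched like this. It is a minimal sketch, assuming glxinfo prints a line of the form "server glx version string: X.Y"; both function names are made up for illustration.

```shell
# Read glxinfo output on stdin and print the server GLX version, e.g. "1.4".
# "server_glx_version" is a hypothetical helper name.
server_glx_version() {
    sed -n 's/^server glx version string: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the given "major.minor" version is older than 1.3.
# "glx_older_than_1_3" is a hypothetical helper name.
glx_older_than_1_3() {
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 3 ]; }
}

# Usage:
#   v=$(glxinfo | server_glx_version)
#   glx_older_than_1_3 "$v" && echo "server GLX $v is older than 1.3"
```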
(In reply to comment #40)
> errr.. sorry in case that was too ambigious:
> waiting in the eventloop until sth. interesting happens is what applications
> usually do, no dead- or livelock, nothing frozen on the CPU (this is why
> shift+alt+f12 is still intercepted)

glx ver is 1.4. Here's the output from glxinfo:

name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap,
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer,
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGI_swap_control,
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer,
    GLX_SGIX_visual_select_group, GLX_INTEL_swap_event
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context,
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory,
    GLX_MESA_copy_sub_buffer, GLX_MESA_swap_control, GLX_MESA_swap_frame_usage,
    GLX_OML_swap_method, GLX_OML_sync_control, GLX_SGI_make_current_read,
    GLX_SGI_swap_control, GLX_SGI_video_sync, GLX_SGIS_multisample,
    GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group,
    GLX_EXT_texture_from_pixmap, GLX_INTEL_swap_event
GLX version: 1.4
GLX extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context,
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer,
    GLX_MESA_swap_control, GLX_OML_swap_method, GLX_OML_sync_control,
    GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGI_video_sync,
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer,
    GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap,
    GLX_INTEL_swap_event
OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 915G GEM 20100328 2010Q1 x86/MMX/SSE2
OpenGL version string: 1.4 Mesa 7.8.2
OpenGL extensions:
    GL_ARB_copy_buffer, GL_ARB_depth_texture, GL_ARB_draw_buffers,
    GL_ARB_draw_elements_base_vertex, GL_ARB_fragment_program,
    GL_ARB_half_float_pixel, GL_ARB_map_buffer_range, GL_ARB_multisample,
    GL_ARB_multitexture, GL_ARB_pixel_buffer_object, GL_ARB_point_parameters,
    GL_ARB_point_sprite, GL_ARB_provoking_vertex, GL_ARB_shader_objects,
    GL_ARB_shading_language_100, GL_ARB_shading_language_120, GL_ARB_shadow,
    GL_ARB_sync, GL_ARB_texture_border_clamp, GL_ARB_texture_compression,
    GL_ARB_texture_cube_map, GL_ARB_texture_env_add,
    GL_ARB_texture_env_combine, GL_ARB_texture_env_crossbar,
    GL_ARB_texture_env_dot3, GL_ARB_texture_mirrored_repeat,
    GL_ARB_texture_non_power_of_two, GL_ARB_texture_rectangle,
    GL_ARB_transpose_matrix, GL_ARB_vertex_array_object,
    GL_ARB_vertex_buffer_object, GL_ARB_vertex_program, GL_ARB_vertex_shader,
    GL_ARB_window_pos, GL_EXT_abgr, GL_EXT_bgra, GL_EXT_blend_color,
    GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate,
    GL_EXT_blend_logic_op, GL_EXT_blend_minmax, GL_EXT_blend_subtract,
    GL_EXT_cull_vertex, GL_EXT_compiled_vertex_array, GL_EXT_copy_texture,
    GL_EXT_draw_range_elements, GL_EXT_framebuffer_blit,
    GL_EXT_framebuffer_object, GL_EXT_fog_coord,
    GL_EXT_gpu_program_parameters, GL_EXT_multi_draw_arrays,
    GL_EXT_packed_depth_stencil, GL_EXT_packed_pixels,
    GL_EXT_pixel_buffer_object, GL_EXT_point_parameters,
    GL_EXT_polygon_offset, GL_EXT_provoking_vertex, GL_EXT_rescale_normal,
    GL_EXT_secondary_color, GL_EXT_separate_specular_color,
    GL_EXT_shadow_funcs, GL_EXT_stencil_two_side, GL_EXT_stencil_wrap,
    GL_EXT_subtexture, GL_EXT_texture, GL_EXT_texture3D,
    GL_EXT_texture_cube_map, GL_EXT_texture_edge_clamp,
    GL_EXT_texture_env_add, GL_EXT_texture_env_combine,
    GL_EXT_texture_env_dot3, GL_EXT_texture_filter_anisotropic,
    GL_EXT_texture_lod_bias, GL_EXT_texture_object, GL_EXT_texture_rectangle,
    GL_EXT_vertex_array, GL_3DFX_texture_compression_FXT1,
    GL_APPLE_client_storage, GL_APPLE_packed_pixels,
    GL_APPLE_vertex_array_object, GL_APPLE_object_purgeable,
    GL_ATI_blend_equation_separate, GL_ATI_texture_env_combine3,
    GL_ATI_separate_stencil, GL_IBM_multimode_draw_arrays,
    GL_IBM_rasterpos_clip, GL_IBM_texture_mirrored_repeat,
    GL_INGR_blend_func_separate, GL_MESA_pack_invert, GL_MESA_ycbcr_texture,
    GL_MESA_window_pos, GL_NV_blend_square, GL_NV_light_max_exponent,
    GL_NV_packed_depth_stencil, GL_NV_texture_env_combine4,
    GL_NV_texture_rectangle, GL_NV_texgen_reflection, GL_NV_vertex_program,
    GL_NV_vertex_program1_1, GL_OES_read_format, GL_SGIS_generate_mipmap,
    GL_SGIS_texture_border_clamp, GL_SGIS_texture_edge_clamp,
    GL_SGIS_texture_lod, GL_SUN_multi_draw_arrays

32 GLX Visuals
   visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
 id dep cl sp sz l  ci b ro  r  g  b  a bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------
0x21 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x22 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xbd 24 tc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xbe 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xbf 24 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xc0 24 tc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc1 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc2 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc3 24 tc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xc4 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xc5 24 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xc6 24 tc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xc7 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xc8 24 tc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xc9 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xca 24 tc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xcb 24 dc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xcc 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xcd 24 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xce 24 dc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xcf 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd0 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd1 24 dc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd2 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd3 24 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xd4 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xd5 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xd6 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xd7 24 dc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xd8 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xd9 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0x8c 32 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None

48 GLXFBConfigs:
   visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
 id dep cl sp sz l  ci b ro  r  g  b  a bf th cl  r  g  b  a ns b eat
----------------------------------------------------------------------
0x8d  0 tc  0 16  0 r  .  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0x8e  0 tc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0x8f  0 tc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0x90  0 tc  0 16  0 r  .  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x91  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x92  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0x93  0 tc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0x94  0 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0x95  0 tc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0x96  0 tc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0x97  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0x98  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0x99  0 tc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0x9a  0 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0x9b  0 tc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0x9c  0 tc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x9d  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x9e  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x9f  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xa0  0 tc  0 16  0 r  y  .  5  6  5  0  0 16  0 16 16 16  0  0 0 Slow
0xa1  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xa2  0 tc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xa3  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xa4  0 tc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0xa5  0 dc  0 16  0 r  .  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0xa6  0 dc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0xa7  0 dc  0 16  0 r  y  .  5  6  5  0  0  0  0  0  0  0  0  0 0 None
0xa8  0 dc  0 16  0 r  .  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xa9  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xaa  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xab  0 dc  0 24  0 r  .  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xac  0 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xad  0 dc  0 24  0 r  y  .  8  8  8  0  0  0  0  0  0  0  0  0 0 None
0xae  0 dc  0 24  0 r  .  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xaf  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xb0  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xb1  0 dc  0 32  0 r  .  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xb2  0 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xb3  0 dc  0 32  0 r  y  .  8  8  8  8  0  0  0  0  0  0  0  0 0 None
0xb4  0 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xb5  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xb6  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xb7  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0  0  0  0  0  0 0 None
0xb8  0 dc  0 16  0 r  y  .  5  6  5  0  0 16  0 16 16 16  0  0 0 Slow
0xb9  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8  0  0  0  0  0 0 None
0xba  0 dc  0 24  0 r  y  .  8  8  8  0  0 24  8 16 16 16  0  0 0 Slow
0xbb  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0xbc  0 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
Ok, finally beginning to pin things down.

In kdebase-workspace-4.4.5/kwin/main.cpp, we have:

499     // HACK: This is needed for AIGLX
500     if( qstrcmp( qgetenv( "KWIN_DIRECT_GL" ), "1" ) != 0 )
501         setenv( "LIBGL_ALWAYS_INDIRECT","1", true );

while in kdebase-workspace-4.5.0/kwin/main.cpp these lines are absent, and in kdebase-workspace-4.5.0/kwin/compositingprefs.cpp we have:

120 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
121     // HACK: This is needed for AIGLX
122     if( qstrcmp( qgetenv( "KWIN_DIRECT_GL" ), "1" ) != 0 )
123         {
124         // Start an external helper program that initializes GLX and returns
125         // 0 if we can use direct rendering, and 1 otherwise.
126         // The reason we have to use an external program is that after GLX
127         // has been initialized, it's too late to set the LIBGL_ALWAYS_INDIRECT
128         // environment variable.
129         // Direct rendering is preferred, since not all OpenGL extensions are
130         // available with indirect rendering.
131         const QString opengl_test = KStandardDirs::findExe( "kwin_opengl_test" );
132         if ( QProcess::execute( opengl_test ) != 0 )
133             setenv( "LIBGL_ALWAYS_INDIRECT", "1", true );
134         }

while in kdebase-workspace-4.4.5/kwin/compositingprefs.cpp these lines are absent.

If I build kdebase-workspace-4.4.5 with lines 499-501 added, then this entire issue disappears, at least for me (i915 graphics). If I rebuild a vanilla kdebase-workspace-4.4.5, and simply add LIBGL_ALWAYS_INDIRECT=1 to ~/.bash_profile, then this entire issue disappears as well.

Since I never explicitly set KWIN_DIRECT_GL, I assume it's unset, and hence in v4.4.5 LIBGL_ALWAYS_INDIRECT is always set in kwin. For v4.5.0, if KWIN_DIRECT_GL is unset, kwin uses kwin_opengl_test to decide whether to set LIBGL_ALWAYS_INDIRECT, and apparently kwin_opengl_test returns 0 and so LIBGL_ALWAYS_INDIRECT is not set. My glxinfo says 'direct rendering: Yes,' so I assume this is why kwin_opengl_test has a good return code.
Furthermore, on first kde boot, desktop effects are active and in Desktop Effects-->Advanced 'Enable Direct Rendering' is checked. Note also that, on first kde boots, desktop effects all appear to work. This suggests to me that X/libdrm/Mesa/intel driver are all working, at least on first boots.

For me, the issue has only been that making a CHANGE to desktop effects via the system settings dialog, or changes to screen appearance in general, results in the 'hang' if desktop effects are active. So it looks like kde initially sets up the graphics correctly, but then after that, if changes are requested (perhaps requiring some sort of re-initialization of the video), something goes amiss (not necessarily in kde).

So, can anybody shed light on how kde uses the environment variables KWIN_DIRECT_GL and LIBGL_ALWAYS_INDIRECT, particularly the latter? I'm curious because even with LIBGL_ALWAYS_INDIRECT set, Desktop Effects-->Advanced 'Enable Direct Rendering' is checked, and desktop effects are FAST, really FAST!

ps: I'm still seeing a ton of these messages in .xsession-errors:

systemsettings(3722) EventListener::eventFilter: User of KWidgetItemDelegate should not delete widgets created by createWidgets!
In my last comment (#42), I mistyped; replace

    If I build kdebase-workspace-4.4.5 with lines 499-501 added, then this entire issue disappears at least for me (i915 graphics). If I rebuild a vanilla kdebase-workspace-4.4.5, and simply add LIBGL_ALWAYS_INDIRECT=1 to ~/.bash_profile, then this entire issue disappears as well.

with

    If I build kdebase-workspace-4.5.0 with lines 499-501 added, then this entire issue disappears at least for me (i915 graphics). If I rebuild a vanilla kdebase-workspace-4.5.0, and simply add LIBGL_ALWAYS_INDIRECT=1 to ~/.bash_profile, then this entire issue disappears as well.

Sorry 'bout that.
I know very little about opengl, as you could probably tell. I now see that Mesa uses LIBGL_ALWAYS_INDIRECT. This 'hang' issue didn't occur in v4.4.5 because I do not set KWIN_DIRECT_GL and then kwin always sets LIBGL_ALWAYS_INDIRECT. For v4.5.0, with KWIN_DIRECT_GL unset, kwin sets LIBGL_ALWAYS_INDIRECT only if 'kwin_opengl_test' returns nonzero. For me, kwin_opengl_test returns 0 on my desktop as well as on my laptop (both using i915) so LIBGL_ALWAYS_INDIRECT is left unset and the 'hang' issue occurs.

So, in my case, assuming kwin_opengl_test returning 0 is legit, it looks like an upstream issue where direct gl support is declared by Mesa, but in fact is not the case. I do get a number of desktop effects, including translucency, but not 'blur' even though it's enabled and no errors are reported (and it's not blacklisted, at least in kwinrc).

Going back to v4.4.5, I find that Xorg-server-1.8.0/Mesa-7.8.1/libdrm-2.4.20/xf86-video-2.10.0 works, but if I upgrade any or all of these pkgs then all goes to hell: random crashes, etc. (for example, Xorg-server-1.8.2/Mesa-7.8.2/libdrm-2.4.21/xf86-video-2.11.0 fails badly). Looks to me like it's just the same old, same old Linux video (what a mess) and NOT kde at all. I'll stick with manually setting LIBGL_ALWAYS_INDIRECT, and carry on.
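The two policies described in the comments above can be written down as a small decision function. This is only a sketch for comparing them: `Policy` and `forcesIndirect` are made-up names, not kwin code, and the function merely models the `qstrcmp( qgetenv( "KWIN_DIRECT_GL" ), "1" )` checks and the `kwin_opengl_test` exit status quoted earlier.

```cpp
#include <cstring>

// Which release's behaviour to model (illustrative names, not from kwin).
enum class Policy { Kde44, Kde45 };

// Returns true when kwin would export LIBGL_ALWAYS_INDIRECT=1.
//   kwinDirectGl   : value of KWIN_DIRECT_GL, or nullptr if it is unset
//   glTestExitCode : exit status of kwin_opengl_test (only used for Kde45)
bool forcesIndirect(Policy policy, const char* kwinDirectGl, int glTestExitCode)
{
    // Both releases leave the variable alone when KWIN_DIRECT_GL is exactly "1".
    if (kwinDirectGl != nullptr && std::strcmp(kwinDirectGl, "1") == 0)
        return false;

    // 4.4.x: indirect rendering is always forced.
    if (policy == Policy::Kde44)
        return true;

    // 4.5.0: indirect rendering is forced only if the helper reports failure.
    return glTestExitCode != 0;
}
```

With KWIN_DIRECT_GL unset and a helper that exits with 0, the 4.4 model forces indirect rendering and the 4.5 model does not, which is the difference the reporter is describing.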
I am not a programmer but I just want to say thank you to all for looking into this bug. Also, I want to add that I am using Compiz-fusion as a WM and it is working flawlessly on my Intel 4500M. If anything, my perception is that it is much smoother than KWin... and many effects that do not work in KWin do work in Compiz. E.g. cover switch and zooming. (They do work in KWin but I have to disable the functionality checks). My point is that it is possible to have compositing working properly with the same underlying libraries, drivers, etc.

Here are the packages I have installed in Arch.

[me@arch ~]$ pacman -Q | grep -e xorg-server -e mesa -e libdrm -e xf86-video
lib32-libdrm 2.4.21-1
lib32-mesa 7.8.2-1
libdrm 2.4.21-2
mesa 7.8.2-1
xf86-video-intel 2.12.0-1
xf86-video-vesa 2.3.0-2
xorg-server 1.8.1.902-1
xorg-server-utils 7.5-5

If a problem exists upstream, can one of us who is more knowledgeable please submit a bug report or a patch to them. I would... but I wouldn't know what I'd be talking about. I would much rather use Kwin than Compiz. Thanks.
Will a fix/workaround be included in the upcoming KDE 4.5.1 (tag is tomorrow)? Otherwise, is it sufficient to set "LIBGL_ALWAYS_INDIRECT" as a manual workaround? Maybe it should be written in the release notes :-) Thanks all!
What follows is what I have observed, immediately going from 4.4 to 4.5 with the same Xorg, same Mesa, same Intel, same Dri etc. (Arch Linux):

#1: Fresh $HOME (everything deleted)
#2: Run KDE 4.4 == OK
#3: Fresh $HOME
#4: Upgrade to 4.5
#5: Run KDE 4.5 == NOT OK

This is an Intel GMA 950, on an Intel 945 board (laptop). The issues, seen gradually since login:

* Compositing is disabled (expected default, similar to 4.4)
* Enable Compositing; SLOW performance (FAST/NORMAL in 4.4)
* Toggle a setting, click Apply; Weird Freeze (NORMAL in 4.4)
** SWITCHING VT and back has NO EFFECT
** Unchecking "Enable direct rendering" FAILS (compositing fails, WORKS in 4.4)
** LIBGL_ALWAYS_INDIRECT=1 (similar to above) naturally FAILS as well
** Xrender WORKS, with direct or indirect rendering

So, I don't know how it could be anything but Kwin compositing code (else, some other KDE code). You guys have done stuff that is incompatible with the current and latest open-source ATI/Intel video stack used by not-so-recent hardware. I'm guessing LIBGL_ALWAYS_INDIRECT=1 or unchecking "Enable direct rendering" works with newer hardware, but not for those of us still affected by this bug.
I've just tried to set LIBGL_ALWAYS_INDIRECT=1 on my notebook (intel 945) but without any positive effects. The freeze occurs even without opening systemsettings. After the login I opened some applications (konsole, firefox, dolphin) and the desktop froze again :-( I'm using xorg-server 1.8.1.902-1 (from archlinux repositories).
I agree with FiNeX, a workaround is badly needed. I updated to 4.5.0 yesterday (archlinux) on 2 computers, one with an intel and one with an ati card, both running OSS drivers and it gives a disastrous first impression of KDE. On the other hand, on a computer running the nvidia driver no problem to be noted (any kwin developer using an OSS driver?) @47: it is not necessarily in KDE. Changes in kwin may just have triggered the bug in mesa, it would not be the first time KDE pushes the envelope. Though the effect is really bad here.
This bug is about KWin freezing when applying KWin related changes in systemsettings. What comment #48 refers to is probably the intel/DRI related kernel bug. Update to the 2.6.35 kernel, newest stable Mesa and Xorg, and those random hangs with intel 945 disappear (at least on my machine).
(In reply to comment #49) > @47: it is not necessarily in KDE. Changes in kwin may just have triggered the > bug in mesa, it would not be the first time KDE pushes the envelope. Though the > effect is really bad here. That is exactly what I meant by using the term "incompatible"; applies both ways.
SVN commit 1167908 by graesslin: Revert rev 1137490: it caused compositing not working with legacy NVIDIA drivers and might be responsible for freezes when changing config. BUG: 243991 CCBUG: 241402 FIXED-IN: 4.5.1 M +0 -20 kwinglutils.cpp WebSVN link: http://websvn.kde.org/?view=rev&revision=1167908
SVN commit 1167909 by graesslin: Forward port rev 1167908 Revert rev 1137490: it caused compositing not working with legacy NVIDIA drivers and might be responsible for freezes when changing config. It can be reverted as there is already a better fix for buggy drivers present in 4.5.1. Did I mention that I love drivers? CCBUG: 243991 CCBUG: 241402 M +0 -20 kwinglutils.cpp WebSVN link: http://websvn.kde.org/?view=rev&revision=1167909
(In reply to comment #46) > Will a fix/workaround be included in the upcoming KDE 4.5.1 (tag is tomorrow)? I just did a commit, but I do not know if it fixed it. I have to properly investigate the issue but currently I am lacking the time. If I get a patch before the release I will send a notice to the release team. (In reply to comment #49) > On the > other hand, on a computer running the nvidia driver no problem to be noted (any > kwin developer using an OSS driver?) Most of the devs are using NVIDIA, but I will switch to an Ati based system soon to feel with my users. But if it's too bad for me I will get an NVIDIA card again.
Tested r1167909 on trunk, still hangs when applying changes while compositing is active. Toggling compositing resumes.

OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 945GM GEM 20100328 2010Q1 x86/MMX/SSE2
OpenGL version string: 1.4 Mesa 7.8.2
XOrg: 7.5 (server 1.8.0) (intel 2.12.0)
lspci: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller
martin's fix is for the fbo test, while apparently this one's about the inverted LIBGL_ALWAYS_INDIRECT policy ... kwin used to force it unless you exported KWIN_DIRECT_GL=1, now it prepends a GLX test and only sets it if the test fails - from other bug reports (kwin crashes after closing gl game blablablah ..) i however gather that intel/mesa (and maybe ati/mesa) uncleanly exit gl contexts - so there's actually a good chance that the test works but direct rendering is actually broken, being unveiled on multiple context creations.

it might work to run the test several times then:

inc=0; while kwin_opengl_test; do ((++inc)); echo passed; done; echo "failed on $inc"

no idea whether that's it, though *shrug*
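The same "run the probe repeatedly" idea could also be sketched in C++, roughly as below. Treat it as an illustration only: `passesBeforeFailure` and the `maxRuns` cap are made-up names (the shell loop above has no cap), and, like that loop, each run is a separate process, so it would only expose a problem that survives from one process to the next.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Runs `command` through the system shell up to `maxRuns` times and returns
// how many consecutive runs exited with status 0 before the first failure.
// In the scenario discussed here the command would be "kwin_opengl_test".
int passesBeforeFailure(const std::string& command, int maxRuns)
{
    assert(maxRuns >= 0);
    int passes = 0;
    while (passes < maxRuns && std::system(command.c_str()) == 0)
        ++passes;
    return passes;
}
```

A result equal to `maxRuns` means no failure was seen within the cap; anything smaller is the number of clean runs before the helper first reported a problem.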
*** Bug 246987 has been marked as a duplicate of this bug. ***
*** Bug 243889 has been marked as a duplicate of this bug. ***
*** Bug 245400 has been marked as a duplicate of this bug. ***
What looks to be the root issue here is that KDE and/or Mesa/xorg video driver/kernel video driver are unable to support Mesa direct opengl for some driver-graphics card configurations (for me Intel Corporation 82865G or 82915G/GV/910GL Integrated Graphics using i915). I believe it's not KDE, probably not Mesa, but rather driver (for me, xf86-video-intel and i915) issue(s).

KDE-4.4.x resolved this issue by forcing Mesa to use indirect opengl (LIBGL_ALWAYS_INDIRECT=1); for cases where Mesa direct opengl is known to work, one could override LIBGL_ALWAYS_INDIRECT=1 by setting KWIN_DIRECT_GL=1 in the kwin environment. In KDE-4.5.0, LIBGL_ALWAYS_INDIRECT=1 is set only if (a) KWIN_DIRECT_GL=1 is not set in the environment, and (b) the kwin_opengl_test fails; the present 'hang' issue occurs when kwin_opengl_test passes, even though Mesa direct opengl does not work for the driver-graphics card configuration being used.

On my system, if I run machtest (http://wwwvis.informatik.uni-stuttgart.de/machtest/intro.html) and glean (http://glean.sourceforge.net/) tests using direct opengl, I get many PASSES, but some notable FAILURES. Clearly KDE can do nothing about these direct opengl failures.

In summary, it appears to me that the strategy to be taken here, in KDE, is to strengthen kwin_opengl_test to be able to reliably detect failures in direct opengl (reliably prevent false positives) and to then set LIBGL_ALWAYS_INDIRECT=1. I think little else can be done on the KDE side.
*** Bug 249778 has been marked as a duplicate of this bug. ***
Created attachment 51344 [details] Patch to fix the freezes in combination with reverting rev 1137668

This is a patch to solve all the issues we see with the Intel drivers when indirect rendering is enabled. As I do not have intel hardware the patch is untested.

The patch should go together with reverting svn rev 1137668. It would be nice if someone could try this combination and have a look for the following issues:
* are effects enabled on kwin startup if the selfcheck is enabled (expected behaviour: breaks without patch, works with patch)
* does the desktop freeze when changing settings (expected behaviour: no freeze with the patch)
* does blur and lanczos get enabled without being on the blacklist (expected behaviour: are not enabled)
* is the direct rendering option honoured

Thanks for testing :-)
(In reply to comment #62)
> Created an attachment (id=51344) [details]
> Patch to fix the freezes in combination with reverting rev 1137668
>
> This is a patch to solve all the issues we see with the Intel drivers when
> indirect rendering is enabled. As I do not have intel hardware the patch is
> untested.
>
> The patch should go together with reverting svn rev 1137668. It would be nice
> if someone could try this combinations and have a look for the following
> issues:
> * are effects enabled on kwin startup if the selfcheck is enabled (expected
> behaviour: breaks without patch, works with patch)
> * does the desktop freeze when changing settings (expected behaviour: no freeze
> with the patch)
> * does blur and lanczos get enabled without being on the blacklist (expected
> behaviour: are not enabled)
> * is the direct rendering option honoured
>
> Thanks for testings :-)

Sorry to say it, but no change, problem remains...

* are effects enabled on kwin startup... YES (with or w/o patch)
* does the desktop freeze when... YES (with or w/o patch)
* does blur and lanczos get enabled... NO (blacklisted with or w/o patch)
* is the direct rendering option... YES (with or w/o patch)

This is for kde-4.5.1
Oh I think I did not make myself clear: if you test this patch ensure that the driver is *not* on the blacklist. It's important to know if kwin recognizes that those effects should not be loaded. If it is blacklisted the test does not say anything
I tried this patch on Kubuntu Maverick (KDE 4.5.1) on a Dell mini 10v. The only other change from the current development snapshot was I'm also running a recent mesa git snapshot (which helps considerably with solving incomplete painting and flashes with compositing enabled).

With a fresh .kde and this patch the initial login is with effects temporarily suspended (and blur is enabled in the desktop effects configuration). If I disable blur, effects are activated and work well. I still have the problem of changing effects while effects are enabled causing the screen to freeze. When I logout and login again (without blur), I get effects. This is significantly better than what I get with stock 4.5.1 where I can never manage to login with effects enabled.

In system settings it still says I'm using direct rendering (enable direct rendering is checked) which, if I understand the patch correctly, is not correct.

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
After I tried the patch, I noticed the mention of also reverting rev 1137668. I built another package that also has that reverted. Still get screen freezes when changing settings with effects active. Now effects are always temporarily disabled on login, so this seems less good than the patch without reverting rev 1137668 on my system.
(In reply to comment #64)
> Oh I think I did not make myself clear: if you test this patch ensure that the
> driver is *not* on the blacklist. It's important to know if kwin recognizes
> that those effects should not be loaded. If it is blacklisted the test does
> not say anything

Oops, my error, my driver (Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) ) is not blacklisted. So the results are:

* are effects enabled on kwin startup... YES (with or w/o patch)
* does the desktop freeze when... YES (with or w/o patch)
* does blur and lanczos get enabled... YES (with or w/o patch)
* is the direct rendering option... YES (with or w/o patch)

For me, this patch is actually a step backward in that there are now often kwin crashes after making desktop-settings changes.
Today I tried mgraesslin's suggestion of also reverting kdesvn 1096554. With that in addition to the patch and the other reversion I get effects on login. The checkbox for blur is checked in the U/I, but it doesn't appear to be active. Trying to change effects with them enabled now gets a kwin crash instead of a freeze (since kwin recovers automatically, this is progress).
Created attachment 51510 [details] 241402 patch
Ok, been doing some investigation into this issue, and here's what I've found. Looks like a KDE bug, period.

First, I'll be referring to kdebase-workspace-4.5.1 code; specifically kdebase-workspace-4.5.1.new/kwin/compositingprefs.{cpp,h}. The problem occurs in CompositingPrefs::detect(), in the following lines:

145     // remember and later restore active context
146     GLXContext oldcontext = glXGetCurrentContext();
147     GLXDrawable olddrawable = glXGetCurrentDrawable();
148     GLXDrawable oldreaddrawable = None;
149     if( hasglx13 )
150         oldreaddrawable = glXGetCurrentReadDrawable();
151
152     if( initGLXContext() )
153         {
154         detectDriverAndVersion();
155         applyDriverSpecificOptions();
156         }
157     if( hasglx13 )
158         glXMakeContextCurrent( display(), olddrawable, oldreaddrawable, oldcontext );
159     else
160         glXMakeCurrent( display(), olddrawable, oldcontext );
161     deleteGLXContext();

I've added kWarning lines to compositingprefs.cpp and done numerous rebuilds of kdebase-workspace-4.5.1 to see what's happening. What happens on first kde boot (no ~/.kde, etc), is that on the first entry to CompositingPrefs::detect() we have oldcontext = NULL; in this case, all goes well. Once logged in, Desktop effects are active (and working).

Now if we select System Settings->Desktop effects, CompositingPrefs::detect() is entered again, and once again, oldcontext = NULL. Next we select the All effects tab, and turn on some effect, say cube animation, and then click on apply; the system 'hangs'/loses mouse focus. However, keyboard focus remains as we all know, and so going to vt1 and looking at ~/.xsession-errors, I find that CompositingPrefs::detect() has been entered a third time and this time oldcontext != NULL.
From the foregoing, I have then found that if lines 156-161 above are replaced with

156         }
        deleteGLXContext();
157     if( hasglx13 )
158         glXMakeContextCurrent( display(), olddrawable, oldreaddrawable, oldcontext );
159     else
160         glXMakeCurrent( display(), olddrawable, oldcontext );
161     //deleteGLXContext();

then the entire issue disappears. Further, if I modify the code so that a new context is created, used, and then destroyed only if oldcontext = NULL (i.e., initGLXContext used only when oldcontext = NULL), and when oldcontext != NULL, I simply use whatever current context is active, then again, the issue entirely disappears.

Now, as the code in CompositingPrefs::detect() looks more or less sound (with the exception of possible failures to free some X-related resources such as visinfo and colormap), I think the problem lies in the external process environment. In fact, it is quite simple to put together a simple standalone program which closely mimics CompositingPrefs::detect(); I have done so, and all works as expected, no leaks caught by valgrind, etc. Based on this, I think the problem is not in Mesa/Xorg/libdrm/video drivers.

Since the CompositingPrefs::detect() code depends crucially on the X connection ( display() here ), I suspect that this value may be dynamic even while CompositingPrefs::detect() is running; if this is so, then deleteGLXContext() could end up trying to free stuff from a Display associated at the time with mGLContext. To test this, I have created a patch which basically uses a new X connection in initGLXContext(), rather than display(), and this appears thus far to work nicely.

I'd like to emphasize that this patch (241402 patch) is not necessarily a proper fix, but is intended to hopefully shed light on this issue. Anyway, anyone with time, and more expertise here, please feel free to jump in.
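To make the difference between the two orderings easier to see, here is a toy sketch in which a hypothetical `CallLog` type stands in for the GLX calls and merely records the order in which they are made. Nothing here is kwin code and no real GLX function is called; `detectOriginalOrder` and `detectReorderedOrder` are made-up names for the two variants of lines 152-161 discussed above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the calls made by CompositingPrefs::detect().
// Each member only records its name so the sequence can be inspected.
struct CallLog {
    std::vector<std::string> calls;
    bool initContext()       { calls.push_back("initGLXContext"); return true; }
    void probeDriver()       { calls.push_back("detectDriverAndVersion"); }
    void restoreOldContext() { calls.push_back("restoreOldContext"); }
    void deleteContext()     { calls.push_back("deleteGLXContext"); }
};

// Order in the 4.5.1 source: make the old context current again,
// then delete the temporary detection context.
void detectOriginalOrder(CallLog& glx)
{
    if (glx.initContext())
        glx.probeDriver();
    glx.restoreOldContext();
    glx.deleteContext();
}

// Order tried in the comment above: delete the temporary context while it
// is still current, and only then restore the old one.
void detectReorderedOrder(CallLog& glx)
{
    if (glx.initContext())
        glx.probeDriver();
    glx.deleteContext();
    glx.restoreOldContext();
}
```

The only difference is whether the temporary context is still the current one at the moment it is deleted, which is the point the reporter's experiment isolates.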
Comment on attachment 51510 [details] 241402 patch

--- kdebase-workspace-4.5.1.old/kwin/compositingprefs.h 2010-01-26 19:22:26.000000000 -0500
+++ kdebase-workspace-4.5.1.new/kwin/compositingprefs.h 2010-09-10 04:55:27.061000036 -0400
@@ -92,6 +92,9 @@
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
         GLXContext mGLContext;
         Window mGLWindow;
+        XVisualInfo *mVisinfo;
+        Colormap mColormap;
+        Display *mDpy;
 #endif
     };
--- kdebase-workspace-4.5.1.old/kwin/compositingprefs.cpp 2010-06-24 12:28:18.000000000 -0400
+++ kdebase-workspace-4.5.1.new/kwin/compositingprefs.cpp 2010-09-10 04:55:34.844000003 -0400
@@ -166,6 +166,11 @@
     {
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
     mGLContext = NULL;
+    mDpy = NULL;
+    mVisinfo = NULL;
+    mGLWindow = 0;
+    mColormap = 0;
+
     KXErrorHandler handler;
     // Most of this code has been taken from glxinfo.c
     QVector<int> attribs;
@@ -175,39 +180,44 @@
     attribs << GLX_BLUE_SIZE << 1;
     attribs << None;
-    XVisualInfo* visinfo = glXChooseVisual( display(), DefaultScreen( display()), attribs.data() );
-    if( !visinfo )
+    mDpy = XOpenDisplay(0);
+    if ( !mDpy )
+        {
+        kDebug( 1212 ) << "Error: XOpenDisplay(0) failed";
+        return false;
+        }
+    mVisinfo = glXChooseVisual( mDpy, DefaultScreen( mDpy ), attribs.data() );
+    if( !mVisinfo )
         {
         attribs.last() = GLX_DOUBLEBUFFER;
         attribs << None;
-        visinfo = glXChooseVisual( display(), DefaultScreen( display()), attribs.data() );
-        if (!visinfo)
+        mVisinfo = glXChooseVisual( mDpy, DefaultScreen( mDpy ), attribs.data() );
+        if( !mVisinfo )
             {
             kDebug( 1212 ) << "Error: couldn't find RGB GLX visual";
             return false;
             }
         }
-    mGLContext = glXCreateContext( display(), visinfo, NULL, True );
-    if ( !mGLContext )
-    {
+    mGLContext = glXCreateContext( mDpy, mVisinfo, NULL, True );
+    if( !mGLContext )
+        {
         kDebug( 1212 ) << "glXCreateContext failed";
-        XDestroyWindow( display(), mGLWindow );
         return false;
-    }
+        }
     XSetWindowAttributes attr;
     attr.background_pixel = 0;
     attr.border_pixel = 0;
-    attr.colormap = XCreateColormap( display(), rootWindow(), visinfo->visual, AllocNone );
+    mColormap = XCreateColormap( mDpy, RootWindow( mDpy, mVisinfo->screen ), mVisinfo->visual, AllocNone );
+    attr.colormap = mColormap;
     attr.event_mask = StructureNotifyMask | ExposureMask;
     unsigned long mask = CWBackPixel | CWBorderPixel | CWColormap | CWEventMask;
     int width = 100, height = 100;
-    mGLWindow = XCreateWindow( display(), rootWindow(), 0, 0, width, height,
-                               0, visinfo->depth, InputOutput,
-                               visinfo->visual, mask, &attr );
+    mGLWindow = XCreateWindow( mDpy, RootWindow( mDpy, mVisinfo->screen ), 0, 0, width, height,
+                               0, mVisinfo->depth, InputOutput, mVisinfo->visual, mask, &attr );
-    return glXMakeCurrent( display(), mGLWindow, mGLContext ) && !handler.error( true );
+    return glXMakeCurrent( mDpy, mGLWindow, mGLContext ) && !handler.error( true );
 #else
     return false;
 #endif
@@ -216,10 +226,31 @@
 void CompositingPrefs::deleteGLXContext()
     {
 #ifdef KWIN_HAVE_OPENGL_COMPOSITING
-    if( mGLContext == NULL )
-        return;
-    glXDestroyContext( display(), mGLContext );
-    XDestroyWindow( display(), mGLWindow );
+    if( mDpy != NULL )
+        {
+        if( mGLWindow != 0 )
+            {
+            XDestroyWindow( mDpy, mGLWindow );
+            mGLWindow = 0;
+            }
+        if( mGLContext != NULL )
+            {
+            glXDestroyContext( mDpy, mGLContext );
+            mGLContext = NULL;
+            }
+        if( mColormap != 0 )
+            {
+            XFreeColormap( mDpy, mColormap);
+            mColormap = 0;
+            }
+        if( mVisinfo != NULL )
+            {
+            XFree( mVisinfo );
+            mVisinfo = NULL;
+            }
+        XCloseDisplay( mDpy );
+        mDpy = NULL;
+        }
 #endif
     }
@@ -319,7 +350,6 @@
     // }
     }
-
 bool CompositingPrefs::detectXgl()
     { // Xgl apparently uses only this specific X version
     return VendorRelease(display()) == 70000001;
Created attachment 51511 [details] kdebase-workspace-4.5.1-compositingprefs_detect.patch Sorry, attached the wrong file in attachment 51510 [details]. This is the correct one.
Thanks for investigating the issue. If some users can confirm that the patch fixes the issue I would apply it to trunk (after verifying that it does not break on nvidia and fglrx) and backport in about two weeks, so that this issue is solved before 4.5.2 is released.
Patch does not work for me. Was it supposed to be applied in tandem with any of the other patches/reverts described in the bug? This is on current Kubuntu Maverick (with KDE 4.5.1) and a recent Mesa git snapshot. It did change things slightly. Previously on this system (Dell mini 10v with Intel 945GME), shift+alt+f12 after a freeze would cause an X crash. Now it doesn't. It just does nothing instead.
*** Bug 250825 has been marked as a duplicate of this bug. ***
(In reply to comment #70)
> From the foregoing, I have then found that if lines 156-161 above are replaced
> with
>
> 156 }
> deleteGLXContext();
> 157 if( hasglx13 )
> 158 glXMakeContextCurrent( display(), olddrawable, oldreaddrawable,
> oldcontext );
> 159 else
> 160 glXMakeCurrent( display(), olddrawable, oldcontext );
> 161 //deleteGLXContext();
>
> then the entire issue disappears.

... which means that the intel driver can not handle multiple gl contexts (it just destroys the current one - i recall having mentioned such an impression before - bug reports regarding crashes on kwin vs. gl games -, so to me this sounds absolutely reasonable)
-> The change is safe and should be applied (still, this is _clearly_ a driver bug, and no, i've not written this code :-) but this won't fix the external issues)

> Further, if I modify the code so that a new context is created, used, and then destroyed only if
> oldcontext = NULL (i.e., initGLXContext used only when oldcontext = NULL), and when
> oldcontext != NULL, I simply use whatever current context is active, then again, the issue entirely
> disappears.

Because you do not bother the driver with multiple contexts anymore... - since the context is (for whatever reason) always constructed "direct"* and the actual directness nature is handled by the env var (-> why this??), kwin will unlikely run contexts on different dpys or vis, and as long as the detection code does not mess up the context but just queries some values, that's no big deal... but afaics not "correct" either... and a bit wonky if the detection ever starts to impact the "testing" context... =\

* the "True" in "glXCreateContext( display(), visinfo, NULL, True );"

> Now, as the code in CompositingPrefs::detect() looks more or less sound

... except for the complete break on glXDestroyContext() ... ;-)

> In fact, it is quite simple to put together a simple standalone problem which closely mimics
> CompositingPrefs::detect(); I have done so, and all works as expected,

... with two open contexts at the same time? (you should attach the code)

> no leaks caught by valgrind, etc.

What does the leak mention refer to? (just the XFree stuff?)

> Based on this, I think the problem is not in Mesa/Xorg/libdrm/video drivers.

Based on your observation i'm damn sure that mesa / the intel driver can only reliably handle _one_ gl context at a time ;-)
I wouldn't mind, but iff this really holds across processes as well, this workaround won't fix composited kwin + other gl clients ... :-(

> Since the CompositingPrefs::detect() code depends crucially the X connection (
> display() here ), I suspect that this value may be dynamic even while
> CompositingPrefs::detect() is running;

unlikely - display() is QX11Info::display() which returns a value assigned during the QApplication construction - you could debug the value, but i doubt it ever changes once QApplication() was called (which is a requirement to do anything GUI related in Qt)
Since CompositingPrefs is constructed in the options constructor, options are allocated on the heap in the kwin ::Application constructor which inits on the KApplication constructor invoking the QApplication constructor, all should be fine... "should"

Ok, main questions:
- You mentioned that repositioning the deleteGLXContext() call completely fixed it for you (so far): does that still hold?
- Did reconnecting the xserver cause you any further improvement?
- Do you still have to set the libgl_always_indirect variable or -and more important- does dri now work for you?

If not, my opinion is
a) commit the deleteGLXContext() repositioning
b) commit the required XFree*()'s
c) do NOT make the code more complex by adding a second display connection, since "0" is not necessarily the proper display string and might just cause further issues... and /theoretically/ it should not be necessary at all

Finally and @Martin/Lucas/Fredrik/whoever:

Can someone please elaborate why we cannot directly open an indirect context in case allocating the direct context fails but rely on the environment variable? (unrespected by some implementations?)
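The "try direct, then fall back to indirect" idea asked about above could look roughly like the sketch below. It is deliberately independent of GLX so that it stands alone: `ContextChoice` and `createDirectThenIndirect` are made-up names, and the `create` callable is assumed to wrap something like `glXCreateContext( dpy, visinfo, NULL, direct ? True : False )` supplied by the caller. Note that the GLX documentation allows an implementation to hand back an indirect context even when a direct one was requested, so real code would probably also want to check `glXIsDirect()` rather than rely on a NULL return alone.

```cpp
#include <functional>

// Outcome of the two-step attempt.
struct ContextChoice {
    void* context;  // nullptr if neither attempt produced a context
    bool  direct;   // true only if the direct attempt succeeded
};

// `create(direct)` is supplied by the caller and returns nullptr on failure.
// Assumption: with GLX it would wrap glXCreateContext() with the last
// argument set from `direct`; dpy and visinfo are the caller's business.
ContextChoice createDirectThenIndirect(const std::function<void*(bool)>& create)
{
    if (void* ctx = create(true))
        return {ctx, true};
    // Direct creation failed: ask explicitly for an indirect context instead
    // of relying on LIBGL_ALWAYS_INDIRECT having been exported beforehand.
    return {create(false), false};
}
```

Whether this would actually help with the drivers discussed here is an open question in the thread; the sketch only shows the control flow being proposed.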
> Patch to fix the freezes in combination with reverting rev 1137668 > > This is a patch to solve all the issues we see with the Intel drivers when > indirect rendering is enabled. As I do not have intel hardware the patch is > untested. I just want to say that even if the patch would fix the issue for Intel users it cannot be applied. I am currently using fglrx with indirect rendering and blur is working. That means we cannot assume that we need direct rendering for FBO or GLSL (fglrx supports FBO and ARB shaders, but not GLSL in indirect rendering). So a check for direct rendering as in the patch would mean regressions for users having a working driver. Now I don't want to favor the proprietary drivers but I do not want to remove features for users with working drivers, because there are broken free drivers. It just needs to be fixed in the right place and that is the driver :-(
I don't understand. Can't you just make an 'if statement' for the fix and put a checkbox called "compatibility mode" that turns the fix on or off? Or maybe if it's Intel anything, the fix is applied, but if it's Nvidia, it's not? I don't see how a work around can't be coded up that makes everyone happy. Sure the issue may be the driver, but are you sure the driver will EVER be fixed? You said yourself this has been going on for years. I refuse to believe a work around that makes everyone happy is not possible.
according to the replies, this patch doesn't really fix anything of this bug anyway - it's just a broadsword approach to prevent using some things if something else (weakly or not at all related) doesn't work, and by this: wrong :-) Also this very patch is NOT related to comments #70 - #77
(In reply to comment #77) > (In reply to comment #70) > > From the foregoing, I have then found that if lines 156-161 above are replaced > > with > > > > 156 } > > deleteGLXContext(); > > 157 if( hasglx13 ) > > 158 glXMakeContextCurrent( display(), olddrawable, oldreaddrawable, > > oldcontext ); > > 159 else > > 160 glXMakeCurrent( display(), olddrawable, oldcontext ); > > 161 //deleteGLXContext(); > > > > then the entire issue disappears. > > ... what means that the intel driver can not handle multiple gl contexts (it > just destroys the current one - i recall to have mentioned such impression > before - bug reports reg. crashes on kwin ./. gl games -, so to me this sounds > absolutely reasonable) > Not so at all (see my attachment to follow (simple code, works great with no memory leaks caught by valgrind). > -> The change is safe and should be applied (still, this is _clearly_ a driver > bug, and no, i've not written this code :-) but this won't fix the external > issues) > I think the only think that clear here, and rather sad as well, is that for as long as this bug has been open, no one of any appreciable expertise has been able to put any time in. I'm not going to defend 'the drivers;' I know Mesa is a mess, I know the intel driver is far from ideal (if its anything like their buggy IPW2200 wireless driver code, which I have had the pleasure to look at)... > > Further, if I modify the code so that a new context is created, used, and then destroyed only if > > oldcontext = NULL (i.e., initGLXContext used only when oldcontext = NULL), and when > > oldcontext != NULL, I simply use whatever current context is active, then again, the issue entirely > > disappears. > > Because you do not bother the driver with multiple contexts anymore... 
- since > the context is (for whatever reason) always constructed "direct"* and the > actual directness nature is handled by the env var (-> why this??), kwin will > unlikely run contexts on different dpys or vis and as long as the detection > code does not mess up with the context but just queries some values, that's no > big deal... but afaics not "correct" either... and a bit wonky if the dection > ever starts to impact the "testing" context... =\ > > * the "True" in "glXCreateContext( display(), visinfo, NULL, True );" > > > Now, as the code in CompositingPrefs::detect() looks more or less sound > ... except for the complete break on glXDestroyContext() ... ;-) > > > In fact, it is quite simple to put together a simple standalone problem which closely mimics > > CompositingPrefs::detect(); I have done so, and all works as expected, > ... with two open contexts at the same time? (you should attach the code) > Again, see my attachment to follow. Of course you don't have two current contexts (in the same process or thread)-- not possible, but you can have any number of contexts defined -- you just switch among them -- and can can have multiple 'displays' as well -- I don't mean multiple servers, I mean multiple connections to a single X server. > > no leaks caught by valgrind, etc. > What does the leak mention refer to? (just the XFree stuff?) > See my attachment -- there are some sample runs -- along with the code -- you can give a try yourself on your system if you like. > > Based on this, I think the problem is not in Mesa/Xorg/libdrm/video drivers. > Based on your observation i'm damn sure that mesa / the intel driver can only > reliably handle _one_ gl contxt at a time ;-) Again, dead wrong-- see my attachment > I wouldn't mind, but iff this really holds across processes as well, this > workaround won't fix composited kwin + other gl clients ... 
:-( > > > Since the CompositingPrefs::detect() code depends crucially the X connection ( > > display() here ), I suspect that this value may be dynamic even while > > CompositingPrefs::detect() is running; > As you pointed out above, my suspicion is probably unfounded, this is the sort of input we need! > unlikely - display() is QX11Info::display() which returns a value assigned > during the QApplication construction - you could debug the value, but i doubt > it ever changes once QApplication() was called (which is a requirement to do > anything GUI related in Qt) > Since CompositingPrefs is constructed in the options constructor, options are > allocated on the heap in the kwin ::Application constructor which inits on the > KApplication constructor invoking the QApplication constructor, all should be > fine... "should" > "should"?! Is that a yes or no? > Ok, main questions: > - You mentioned that repositioning the deleteGLXContext() call completely fixed > it for you (so far): does that still hold? Absolutely. > - Did reconnecting the xserver cause you any further improvement? Possibly some effects, e.g., the rolling cube is much less choppy now, but this is quite subjective, and there is variability, so not really. > - Do you still have to set the libgl_always_indirect variable or -and more > important- does dri now work for you? > I stopped setting LIBGL_ALWAYS_INDIRECT a while ago (as I mentioned, I thought). I don't set any OpenGL environment variables, so it's always direct. > If not, my opinion is > a) commit the deleteGLXContext() repositioning > b) commit the required XFree*()'s > c) do NOT make the code more compex by adding a second display connection, > since "0" is not necessarily the proper display string and might just cause > further issues... and /theoretically/ it should not be necessary at all > As I mentioned, the patch I submitted was NOT A FIX, but rather, I hoped that it might shed light on the issue. 
Simply moving deleteGLXContext() is NOT a solution -- it does, on my system, stop the immediate issue, but it's not the proper thing to do (it may in fact introduce memory leaks and/or other issues down the road). > Finally and > @Martin/Lucas/Fredrik/whoever: > > Can please so. elaborate why we cannot directly open an indirect context in > case allocating the direct context fails but rely on the evironment variable? > (unrespected by some implementations?) This is ok, after all that's what was done in kde-4.4.x. It's probably the best thing to do until someone figures this thing out. I have several pcs using indirect and they run kde 4.4.x with effects very well. Really now, I love kde (except for its size!) -- in my opinion, it's by far the best... so, I'm going to continue looking at this, as time permits...
Created attachment 51559 [details] glx detect/context setting test code The code comes first, followed by a couple of test cases, one with direct, the second with indirect rendering active (g++ -lGL -o glxtest glxtest.cpp to build).
One other thing, I think this "bug" may very well encompass a lot of issues -- most of which are xorg/mesa/drm/driver related -- I'm pretty sure of this. I know from several years of experience that some combinations of these pkgs work really well, but most don't, and for me it has mostly been by trial 'n error to fix things, because I know little about the details. I mentioned earlier that while v4.4.x works well for me, if I upgrade any or all of the xorg/mesa/drm/driver pkgs my system can become totally unstable. For example, xf86-video-intel-2.10.0 won't work with libdrm-2.4.21, and with Mesa less than 7.7, and xf86-video-intel-2.12.0 won't play well with xorg-server less than 1.8.0 etc, etc., and it all varies from pc to pc... yuk. My focus here, thus far, has been ONLY on the 'hang' resulting from desktop effects changes, nothing else -- simply because on my (intel-based) systems, this is the most visible problem. There are in fact some stability issues (which may or may not be related, but they can wait). As I mentioned, w/ or w/o direct rendering, I have nice desktop effects -- not all of them (no blur, no explosion, etc), but enough, I'm happy. As I recall, for me, KDE4 wasn't stable until 4.2, so I can wait..
(In reply to comment #75) > Patch does not work for me. Was it supposed to be applied in tandom with any > of the other patches/reverts described in the bug? > > This is on current Kubuntu Maverick (with KDE 4.5.1) and a recent Mesa git > snapshot. > > It did change things slightly. Previously on this system (Dell mini 10v with > Intel 945GME), shift+alt+f12 after a freeze would cause an X crash. Now it > doesn't. It just does nothing instead. No, no other patches. Depressing... You could try rebuilding kdebase-workspace-4.5.1 with only the single change in the location of deleteGLXContext(); (see comment #81) just to see if it helps... Can you give the versions (revisions if from git) of all of xorg-server, libX11, Mesa, libdrm, the xorg video driver, and the Linux kernel?
Created attachment 51562 [details] Another possible fix How about this version of the patch? I have been able to reproduce the bug with the r600c driver, but with this patch everything seems to be working fine. There are piglit[1] tests for destroying and switching contexts (glx-destroycontext-1 and glx-destroycontext-2), but not for the sequence of calls kwin is using at the moment. [1] http://cgit.freedesktop.org/piglit
(In reply to comment #85) > Created an attachment (id=51562) [details] > Another possible fix > > How about this version of the patch? > > I have been able to reproduce the bug with the r600c driver, but with this > patch everything seems to be working fine. > > There are piglit[1] tests for destroying and switching contexts > (glx-destroycontext-1 and glx-destroycontext-2), but not for the sequence of > calls kwin is using at the moment. > > [1] http://cgit.freedesktop.org/piglit The key point here is the repositioning of the destroycontext line -- so we have a 2nd success! Still, it's not a done deal, for the reasons I've mentioned: if oldcontext is not null, then it's equivalent to repositioning the destroycontext line; if oldcontext is null, then detect() will return w/o having freed up X and GL resources -- not good -- may introduce other problems. See, destroycontext prior to restoring oldcontext means that the "destroy" is incomplete (it'll be completed by the makecurrent switch). The thing to note here is that if a mere repositioning like this has a big effect (as it does on my and your system), then something's amiss - I reported this to the Mesa folks, and they just blew it off, but if you look at the relevant Mesa code, I think they're mistaken. Even so, I still don't see this as clearly implying a Mesa bug as the cause of our problem.
(In reply to comment #86) > The key point here is the repositioning the destroycontext line since -- so we > have a 2nd success! Still, its not a done deal, for the reasons I've > mentioned: if oldcontext is not null, then its equivalent to repositioning the > destroycontext line, if oldcontext is null, then detect() will return w/o > having freed up X and GL resources -- not good -- may introduce other problems. > See, destroycontext prior to restoring oldcontext, means that the "destroy" is > incomplete (it'll be completed by the makecurrent switch). I don't quite see how we could leak anything, but maybe I'm missing something. My patch calls glXMakeCurrent() unconditionally after marking the context created by the detection code for destruction, and that should free it. That it switches to a null context shouldn't make a difference. > The thing to note here is that if a mere repositioning like this has big effect > (as it does on my and your system), then somethings amiss - I reported this to > the Mesa folks, and they just blew it off, but if you look at the relevant Mesa > code, I think they're mistaken. Even so, I still don't see this as clearly > implying a Mesa bug as the cause of our problem. I try not to assume anything before analyzing an issue, but I see nothing that suggests that the order in which kwin is making these calls is illegal. I think the best thing to do is to commit this modified patch, and submit new piglit tests for the combinations that aren't being tested currently.
you're not introducing leaks by your patch, but they're "present" (colormap & visual -at least as colormap isn't the default one, no idea whether the -default- visual is some global static instance on the server) The call order is NOT illegal - however john's testcase seems to do similar w/o causing trouble (but it could simply depend on the state of the other gl context, from a rough look, the testcase doesn't do anything but swapping buffers) glXDestroyContext only frees the id unconditionally - the actual wipeout should not happen while the context is active. (but right afterwards)
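For reference, the lifetime rule described above (glXDestroyContext on a current context only marks it; the real teardown happens once it is no longer current) can be written down as a toy model. This is a hedged sketch of what the GLX documentation promises, not Mesa code: ToyGlx, create, destroy, makeCurrent and detectLeavesCleanState are all made-up names, and context ids are plain ints. Under this rule both call orders discussed in this bug should end in the same state, which is why the difference seen with Mesa 7.8 is suspicious.

```cpp
// Toy model of the documented GLX context lifetime rule.
// NOT Mesa's implementation; every name here is hypothetical.
#include <cassert>
#include <set>

class ToyGlx {
public:
    // Returns a fresh context id (0 is reserved for "no context").
    int create() { int id = next_++; alive_.insert(id); return id; }

    // Models glXDestroyContext: a non-current context dies immediately,
    // a current one is only marked and dies once it is released.
    void destroy(int id) {
        assert(alive_.count(id) != 0);  // destroying an unknown id is a caller error
        if (id == current_) pending_.insert(id);
        else alive_.erase(id);
    }

    // Models glXMakeCurrent: releasing the previous context completes
    // any destruction that was deferred while it was current.
    void makeCurrent(int id) {
        int previous = current_;
        current_ = id;
        if (previous != id && pending_.erase(previous) != 0) alive_.erase(previous);
    }

    bool isAlive(int id) const { return alive_.count(id) != 0; }
    int current() const { return current_; }

private:
    int next_ = 1;
    int current_ = 0;
    std::set<int> alive_;
    std::set<int> pending_;
};

// Runs the detection sequence in either order and reports whether the
// temporary context is gone and the compositing context is current again.
bool detectLeavesCleanState(bool destroyBeforeRestore) {
    ToyGlx glx;
    int compositing = glx.create();
    glx.makeCurrent(compositing);      // kwin is compositing

    int probe = glx.create();          // temporary detection context
    glx.makeCurrent(probe);
    if (destroyBeforeRestore) {
        glx.destroy(probe);            // deferred: probe is still current
        glx.makeCurrent(compositing);  // the release completes the destroy
    } else {
        glx.makeCurrent(compositing);
        glx.destroy(probe);            // immediate: probe is not current
    }
    return !glx.isAlive(probe) && glx.isAlive(compositing)
        && glx.current() == compositing;
}
```

In this model detectLeavesCleanState(true) and detectLeavesCleanState(false) both report a clean state, so any driver where the two orders behave differently is doing extra work in one of the paths that the model (and the documentation) does not describe.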
(In reply to comment #87) > (In reply to comment #86) > > The key point here is the repositioning the destroycontext line since -- so we > > have a 2nd success! Still, its not a done deal, for the reasons I've > > mentioned: if oldcontext is not null, then its equivalent to repositioning the > > destroycontext line, if oldcontext is null, then detect() will return w/o > > having freed up X and GL resources -- not good -- may introduce other problems. > > See, destroycontext prior to restoring oldcontext, means that the "destroy" is > > incomplete (it'll be completed by the makecurrent switch). > > I don't quite see how we could leak anything, but maybe I'm missing something. > My patch calls glxMakeCurrent() unconditionally after marking the context > created by the detection code for destruction, and that should free it. That it > switches to a null context shouldn't make a difference. > The thing is, calling glXMakeCurrent() when there is a current context implicitly does a glXMakeCurrent( dpy, None, NULL ) prior to making the new context current. So calling glXMakeCurrent() unconditionally as you do is perfectly ok, but unnecessary (in other words, the existing detect code doing the switch back is doing the same thing your patch does, with the exception of the destroycontext placement). The reason why I suggest that resources may not be freed is that if you look at Mesa glxcmds.c and glxcurrent.c, whether destroycontext is called before or after the context switch DOES appear to make a difference (this, despite the Mesa folks saying it doesn't) -- and, for unknown reasons, the fact is, on my system it absolutely makes a difference. > > The thing to note here is that if a mere repositioning like this has big effect > > (as it does on my and your system), then somethings amiss - I reported this to > > the Mesa folks, and they just blew it off, but if you look at the relevant Mesa > > code, I think they're mistaken. 
Even so, I still don't see this as clearly > > implying a Mesa bug as the cause of our problem. > > I try not to assume anything before analyzing an issue, but I see nothing that > suggests that the order in which kwin is making these calls is illegal. > > I think the best thing to do is to commit this modified patch, and submit new > piglit tests for the combinations that aren't being tested currently.
(In reply to comment #88) > you're not introducing leaks by your patch, but they're "present" (colormap & > visual -at least as colormap isn't the default one, no idea whether the > -default- visual is some global static instance on the sever) Well let's please focus on one bug at a time here. (In reply to comment #89) > The thing is calling glxMakeCurrent() when there is a current context, > implicitly does a glxMakeCurrent( dpy, None, None, NULL) prior to making the > the new context current. So calling glxMakeCurrent() unconditionally as you do > if perfectly ok, but unnecessary (in other words, the existing detect code > doing the switch back is doing the same thing your patch does, with the > exception of the destroycontext placement). It should be unnecessary, but in practice the bug is still reproducible for me without that call. That's why I added it. > The reason why I suggest that resources may not be freed is that if you look at > Mesa glxcmds.c and glxcurrent.c, whether destroycontext is called before or > after the context switch DOES appear to make a difference (this, despite the > Mesa folks, saying it doesn't)-- and, unknown reasons, the fact is, on my > system it absolutely makes a difference. I guess the question then is if it's better to risk leaking a context each time the apply button is clicked (which is not that often), or have kwin freeze or crash. I certainly vote for the former. The leak also wouldn't be our bug in this case.
(In reply to comment #90) > (In reply to comment #88) > > you're not introducing leaks by your patch, but they're "present" (colormap & > > visual -at least as colormap isn't the default one, no idea whether the > > -default- visual is some global static instance on the sever) > > Well let's please focus on one bug at a time here. > > (In reply to comment #89) > > The thing is calling glxMakeCurrent() when there is a current context, > > implicitly does a glxMakeCurrent( dpy, None, None, NULL) prior to making the > > the new context current. So calling glxMakeCurrent() unconditionally as you do > > if perfectly ok, but unnecessary (in other words, the existing detect code > > doing the switch back is doing the same thing your patch does, with the > > exception of the destroycontext placement). > > It should be unnecessary, but in practice the bug is still reproducible for me > without that call. That's why I added it. > > > The reason why I suggest that resources may not be freed is that if you look at > > Mesa glxcmds.c and glxcurrent.c, whether destroycontext is called before or > > after the context switch DOES appear to make a difference (this, despite the > > Mesa folks, saying it doesn't)-- and, unknown reasons, the fact is, on my > > system it absolutely makes a difference. > > I guess the question then is if it's better to risk leaking a context each time > the apply button is clicked (which is not that often), or have kwin freeze or > crash. I certainly vote for the former. The leak also wouldn't be our bug in > this case. Agreed.
(In reply to comment #88) > you're not introducing leaks by your patch, but they're "present" (colormap & > visual -at least as colormap isn't the default one, no idea whether the > -default- visual is some global static instance on the sever) > > The call order is NOT illegal - however john's testcase seems to do similar w/o > causing trouble (but it could simply depend on the state of the other gl > context, from a rough look, the testcase doesn't do anything but swapping > buffers) > > glXDestroyContext only frees the id unconditionally - the actual wipeout should > not happen while the context is active. (but right afterwards) I'm now in 100% agreement with this: I'm totally convinced this is entirely a Mesa issue. Please accept my apologies.. I'll probably be submitting one or more patches to Mesa, but who knows how that will go. This appears to be what's happening: 1) on entry to detect, if there's no current context then all is well. This happens whenever compositing is not active: at boot, and on bringing and making a desktop effects change 2) once desktop effects are active, however, just after making a change in effects, detect is entered when there is a current context (this happens at the prefs.detect() call in Options::reloadCompositingSettings (options.cpp) ). In prefs.detect(), the failure occurs at glXDestroyContext() because Mesa then destroys all local drawables for which there does not exist an associated Window on display(), DefaultScreen(display()). This is totally wrong in our case as the current compositing context has associated drawables also on display(), DefaultScreen(display())! I'm not 100% sure why Mesa doesn't detect this, but I guess because the compositing Window was created in a different thread. Anyway, the result is that glXDestroyContext() in detect() wipes out local drawables being used by the compositing context, and hence the 'hang.' I've patched Mesa to NOT destroy any drawables that are current -- and, oi, it works. 
Obviously, this is not a complete fix (non-current drawables without existing Windows are still destroyed -- clearly possibly a superset of what should be destroyed). The reason why repositioning glXDestroyContext() works is because (contrary to Mesa doc), calling glXDestroyContext() while the context is still current, or after it has been made non-current, are not, even in the end, equivalent. This is a Mesa bug in my judgment. Calling glXDestroyContext() while the context is still current never runs the drawables destruction described above -- this only happens when glXDestroyContext() is called on a non-current context. The reason why having detect() use a different Xserver connection as I did in the patch works is that since detect()'s drawables are on a different display (different from the compositing context which is on QX11Info::display()), glXDestroyContext() does not destroy drawables on compositing's display. For me this issue is resolved (I'll simply patch Mesa and be done with it), but I think it'd be nice if we could do something simple in kde to circumvent the problem. We could simply reposition glXDestroyContext() and live with the small associated memory leak (comment #90). Another possibility (which I've verified working) is to modify detect() to check if there is a current context on entry, and if so, simply use it as is, i.e., add code like: if( glXGetCurrentContext() != NULL) { detectDriverAndVersion(); applyDriverSpecificOptions(); return; } In this case, there will be no need to save and restore the current context; a new 'detect' context will then only be employed when there is no current context. As I said, I've tried this and it appears to work fine. Nevertheless, I don't know kde well enough to be sure that this is an appropriate thing to do. I'll attach a patch for this momentarily, so you see what I mean, but again, it may not be the right thing to do..
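To make the control flow of that early return easier to see, here is a stand-alone sketch of it. It is only an illustration under stated assumptions: FakeGl, its hasCurrentContext flag (standing in for glXGetCurrentContext() != NULL) and the recorded call names are hypothetical, and nothing here is kwin's or Mesa's real code. The point is simply that with a current context the detection never creates, switches or destroys a context.

```cpp
// Control-flow sketch only. FakeGl and the call names recorded below are
// hypothetical stand-ins; this is neither kwin's nor Mesa's actual code.
#include <cassert>
#include <string>
#include <vector>

struct FakeGl {
    bool hasCurrentContext = false;   // stands in for glXGetCurrentContext() != NULL
    std::vector<std::string> calls;   // what detect() did, in order
};

void detect(FakeGl& gl) {
    if (gl.hasCurrentContext) {
        // Compositing is active: query through the existing context and
        // never create, switch or destroy anything.
        gl.calls.push_back("query");
        return;
    }
    // No current context: a temporary detection context is used as before.
    gl.calls.push_back("create");
    gl.calls.push_back("makeCurrent");
    gl.calls.push_back("query");
    gl.calls.push_back("destroy");
}

// Convenience wrapper: returns the recorded calls for one detect() run.
std::vector<std::string> callsFor(bool compositingActive) {
    FakeGl gl;
    gl.hasCurrentContext = compositingActive;
    detect(gl);
    assert(!gl.calls.empty());        // every path at least queries the driver
    return gl.calls;
}
```

The trade-off raised later in this report still applies: the early-return path runs the queries against the production context, so it is only harmless as long as detection really does nothing but read values.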
Created attachment 51659 [details] Idea to explore for sidestepping Mesa bug(s)
I agree with Thomas that the ultimate cause of this issue is probably at least driver-related, since it doesn't occur for everyone. Perhaps, with some 'well-written' drivers the issue simply doesn't come up -- perhaps with intel drivers the 'screen id' is mishandled -- who knows..
Finally resolved (I think). When detect() is entered when a current context already exists, at least some of the time, this context is associated with a Pixmap, not a Window. Mesa, in glXDestroyContext(), will release drawables not associated with a valid Window for the current display() and screen, so it releases current drawables associated with the Pixmap (which, I think, is associated with active compositing). Anyway, I submitted a one-liner patch to Mesa (https://bugs.freedesktop.org/show_bug.cgi?id=30220) so hopefully things will be fixed by Mesa-7.8.3.
Thanks for your investigations and the time spent on this issue. I now dare to mark this bug as upstream. If the fix for mesa does not resolve this issue, please reopen.
I can confirm the proposed mesa patch solves the problem. I'm going to see if we can get it in for the next Ubuntu release. Thank you for working so hard on this.
Unfortunately, the Mesa patch I submitted may introduce unacceptable memory leak issues in Mesa because with it certain Mesa/driver resources now may end up not being released. So it may break other drivers. I guess it's like working on old plumbing: fix a leak here, cause a new leak over there, ...
KDE 4.4.x was fine, so whatever changed between that version and this should be reversed. The drivers probably won't be fixed anytime soon, and we shouldn't give a substandard KDE experience while we wait for them to correct their mistakes. There has to be a work around. When it comes to driver developers fixing their mistakes, my experience over the last ten years when it comes to Linux is that they usually don't.
Will the patch in #93 be merged in kde? I don't know a lot about opengl or glx, but his explanation of this problem is quite good. My R600 (Ati hd 3450) works well with this patch.
> 4.4.x was fine, so whatever changed between that > version and this should be reversed. The drivers probably won't be fixed > anytime soon, and we shouldn't give a substandard KDE experience while > we wait for them to correct their mistakes. There has to be a work > around. The change cannot be reverted in the 4.5 cycle as we plain and simple cannot guarantee that there won't be regressions. We know that the 4.4 variant crashed instead of freezing and we know that without the change the desktop becomes unusable as direct rendering does not support the extensions blur requires. Working around these new problems is probably not possible without introducing regressions - e.g. Desktop unusable or blur disabled for users where it worked fine.
My 2¢ i oppose patch #93 since it removes the "proper" path to store/swap/test/restore the context [1] but if swapping the destruction order fixes it, this should be applied [2] to workaround mesa. iff "not bother mesa with two contexts" is somehow mandatory, the current path should imo remain and just be blocked by the "if( glXGetCurrentContext() != NULL )" part (since it will "softcode" the NULL for the makeCurrent call) [1] no problem atm, but not right either and the production context should not be messed up by testing [2] the pot. X leaks are a completely different issue and can be fixed regardless
(In reply to comment #102) > My 2¢ > > i oppose patch #93 since it removes the "proper" path to > store/swap/test/restore the context [1] but if swapping the destruction order > fixes it, this should be applied [2] to workaround mesa. > > iff "not bother mesa with two contexts" is somehow mandantory, the current path > should imo remain and just be blocked by the "if( glXGetCurrentContext() != > NULL )" part (since it will "softcode" the NULL for the makeCurrent call) > > [1] no problem atm, but not right either and the production context should not > be messed up by testing I agree, I think it's bad practice, but just wanted to get input. > [2] the pot. X leaks are a complete different issue and can be fixed regardless My understanding now is that there will be no leaks incurred by re-positioning because the resources not released by doing this will be released later when the associated Pixmaps are destroyed. So I think it's the way to go.
Created attachment 51774 [details] compositing settings fix and resource cleanup I don't think we can/should wait on Mesa for this. It looks like a deep problem in the way Pixmaps are handled, and may have worsened for OpenGL-1.3 and newer. The Mesa patch I submitted may or may not be a solution because it may screw up other things internally; anyway, no response from them. This patch just does the repositioning of glXDestroyContext, and resource releases; it also includes several additional files which have similar resource release issues. Even if Mesa fixes things, everything in this patch should remain ok. Hopefully, others with this issue can give this one a try.
@ John Thanks for submitting a patch. Did you do this on a bug tracker or some place we can go and make some noise about this? I'm no good at coding... but if you need someone to pester them into fixing it... I'm there. Just point me in the right direction.
(In reply to comment #105) > @ John Thanks for submitting a patch. Did you do this on a bug tracker or some > place we can go and make some noise about this? I'm no good at coding... but if > you need someone to pester them into fixing it... I'm there. Just point me in > the right direction. Yup, comment #95 above, here's the bug report link https://bugs.freedesktop.org/show_bug.cgi?id=30220
*** Bug 251928 has been marked as a duplicate of this bug. ***
Will a workaround go into KDE 4.5.2? I would count this as double major. The KDE 4.5 branch from today still froze for me.
As this is 'resolved' I'm not sure this is the appropriate place for this but I'm not sure where else to post it: Some good news of sorts: I've done some testing with the following config: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (i915) xorg-server-1.9.0, xf86-video-2.13.0, libdrm-2.4.22, mesa-7.9-rc2, qt-4.7.0 kde-4.5.1, linux-2.6.35.7 Up to now 'blur' (on my system, always activated on first kde-desktop boot, but has never worked) hasn't caused any problems. Now, with Mesa-7.9rc2/xorg-server-1.9.0 it causes desktop effects to be too slow, resulting in all effects being turned off. So, if I first manually turn off this effect then desktop effects are activated and the 'hang' issue here is gone. Conclusion: Problem exists in Mesa-7.8.2/3rc1, but has been fixed in Mesa-7.9rc2. Blur, on the other hand, is quite sick (at least on my system), but this is another issue...
> Conclusion: Problem exists in Mesa-7.8.2/3rc1, but has been fixed in > Mesa-7.9rc2. Btw I am rather sure that it only exists in 7.8 as I am unable to reproduce this issue with Mesa 7.7 which is used in the Debian system I can use for Intel testing.
*** Bug 253969 has been marked as a duplicate of this bug. ***
>Btw I am rather sure that it only exists in 7.8 as I am unable to reproduce >this issue with Mesa 7.7 which is used in the Debian system I can use for >Intel testing. I can confirm this idea, as I got this bug after I updated my OpenSUSE 11.2 (with Mesa 7.7) to 11.3 (with Mesa 7.8).
*** Bug 254954 has been marked as a duplicate of this bug. ***
I updated from Fedora 13 (mesa 7.8) to Fedora 14 (mesa 7.9). Problem is gone.
*** Bug 257243 has been marked as a duplicate of this bug. ***
I'm using KDE 4.6 on Opensuse 11.3 w/ Intel 4500 MHD graphics card, kwin still freezes for me when I use OpenGL. It is fine under XRender.
Yes, because it is fixed in Mesa 7.9 and 11.3 ships 7.8.