Now, I am aware that this is possibly a bug in the display driver that is triggered by KWin. If this is the case, what I'm hoping for here is to gather any relevant information. The bug occurs randomly (but still relatively often) when I try to drag a window from one screen to another. (I'm not sure whether this is specific to multi-screen setups, since I rarely drag windows around on a single screen; at least I haven't seen it happen there.) This can lead to the whole desktop freezing. This seems to be a complete lockup of the GPU: I can't get back to the console even after using sysrq + r. SysRq itself is still working, as shown by the output, and I can sync and reboot the system. Sound continues to play, and the kernel does not print any bug report (so nothing indicates that anything is wrong). I've not seen this happen with any trigger other than dragging a window. I tried with and without the transparency effect. I've also tried OpenGL 3.1 and 2.0, and both GLX and EGL. I think (but I'm not sure here), I've even tried with the compositor switched off. I've seen this issue since the update from 5.4.x to 5.5.0. No kernel update was performed before it started to happen (kernel 4.3; I'm now using 4.4-rc8, but that does not affect this). Mesa was 11.0.4 when I updated KDE and has since been updated to 11.1.0. I've since basically stopped dragging windows and just move them around using the "Move to Screen" functionality, which works fine. Only if I forget about this issue and drag a window anyway am I in real danger of locking up my system (as happened again yesterday).
Reproducible: Sometimes
Created attachment 96536: Last messages from the journal when the freeze occurred
Created attachment 96537: qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation
*cough*
> Compositing
> ===========
> Compositing is not active
> I think (but I'm not sure here), I've even tried with the compositor switched off.
Wow.
> I've seen this issue since the update from 5.4.x to 5.5.0.
https://git.reviewboard.kde.org/r/126266/
(A rather wild guess given the apparent GPU state - the grab should not be required and would not impact visual updates)
> This seems to be a complete lockup of the gpu
dmesg tail (if you can, probably via ssh?)
(In reply to Thomas Lübking from comment #3)
> *cough*
>
> > Compositing
> > ===========
> > Compositing is not active
Hm, that's interesting, I didn't notice that. It seems like I can't enable compositing with EGL anymore. This definitely worked before, but I can't tell when it stopped working. However, yesterday when the problem occurred, I was using the GLX interface. I just switched to give EGL another try, but didn't notice that the compositor was inactive then.
> > I think (but I'm not sure here), I've even tried with the compositor switched off.
> Wow.
Like I said, I'm not sure here, but I will try again when I don't fear data loss.
> > I've seen this issue since the update from 5.4.x to 5.5.0.
> https://git.reviewboard.kde.org/r/126266/
> (A rather wild guess given the apparent GPU state - the grab should not be
> required and would not impact visual updates)
I can give it a try, although the application I see it with most is a GTK one (Firefox).
> > This seems to be a complete lockup of the gpu
> dmesg tail (if you can, probably via ssh?)
I normally have ssh deactivated, but I will activate it and see if I can get anything out of it. I highly doubt it, though: the journal seemed to work, kernel logging seemed to work (it did log the sysrq sync message), and I synced the file system, so any message that would have been in dmesg should have ended up in the journal as well. I could use that to debug some program when the issue happens, but for that I would need upfront information about what to do.
Created attachment 96538: qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation with compositing enabled
Even gtk+ windows should just kick the NETWM moveresize in the WM.
> kwin4_effect_translucency
Try to disable this.
> electricBorderMaximize: true
> electricBorderTiling: true
and this.
Do you have autohiding panels on one of the edges?
PS: and HW acceleration in FF! ("about:config" iirc, filter for "accel")
(In reply to Thomas Lübking from comment #6)
> Even gtk+ windows should just kick the NETWM moveresize in the WM
>
> > kwin4_effect_translucency
> Try to disable this.
Will do.
> > electricBorderMaximize: true
> > electricBorderTiling: true
> and this
OK.
> Do you have autohiding panels on one of the edges?
No.
(In reply to Thomas Lübking from comment #7)
> PS: and HW acceleration in FF! ("about:config" iirc, filter for "accel")
apz.fling_accel_base_mult 1.0
apz.fling_accel_interval_ms 500
apz.fling_accel_supplemental_mult 1.0
layers.acceleration.disabled false
layers.acceleration.draw-fps false
layers.acceleration.force-enabled false
> layers.acceleration.disabled false
layers.acceleration.disabled true
Isn't double negation a pleasure for everyone? ;-)
(In reply to Thomas Lübking from comment #10)
> > layers.acceleration.disabled false
> layers.acceleration.disabled true
This is the first thing I'm trying, and so far it looks good; I haven't had a freeze since. It will take some more time to be sure, though.
> Isn't double negation a pleasure for everyone? ;-)
Oh yeah … ;)
(In reply to Bernd Steinhauser from comment #11)
> (In reply to Thomas Lübking from comment #10)
> > > layers.acceleration.disabled false
> > layers.acceleration.disabled true
> This is the first thing I'm trying and so far it looks good, I haven't had a
> freeze since. Will require some more time to be sure, though.
I was wrong here. The day after I wrote this, I had a freeze with a completely unrelated application that is neither GTK nor Qt (it's actually a Java one). Acceleration in FF was disabled at the time, so this was not it.
The next thing I tried was this:
> > kwin4_effect_translucency
> Try to disable this.
I've had this disabled since the 15th of January. I've been happily dragging windows around and haven't seen a freeze since then. Since that was over 2 weeks ago, I think it's relatively safe to say that the issue was caused by the translucency effect.
Of course, this means that without the compositor enabled this should not happen, so this
> I think (but I'm not sure here), I've even tried with the compositor switched off
would have been wrong?
Are there special OpenGL extensions that translucency requires? Maybe I could have a look at the kernel/mesa changes related to those and see if there were changes that could cause this kind of behavior?
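For reference, a minimal sketch of disabling the translucency effect persistently outside the settings dialog. This assumes the Plasma 5 convention that KWin reads per-plugin `<pluginId>Enabled` keys from the `[Plugins]` group of `~/.config/kwinrc`; the file path and key naming are assumptions here, and KWin presumably has to reload its configuration (or be restarted) before the change applies.

```ini
# ~/.config/kwinrc (assumed location)
# Assumption: effect toggles are stored as "<pluginId>Enabled" keys in [Plugins].
[Plugins]
kwin4_effect_translucencyEnabled=false
```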
(In reply to Bernd Steinhauser from comment #12)
> Are there special OpenGL extensions that translucency requires? Maybe I
> could have a look at the kernel/mesa changes related to that and see if
> there were changes that could cause this kind of behavior?
No, I rather suspect something along the lines of bug #350327. Can you gdb into KWin when this happens and check what it's doing?
https://community.kde.org/KWin/Debugging
Looking at that bug, it seems different to me: for me it leads to a complete freeze (I cannot even kill X with sysrq+k, the secure access key). For allan, things seemed to start working again once he switched to VT7 and back. I will try to reproduce it by using the snapping functions and, if it happens, try to ssh in and gdb the thing.
OK, I tried to reproduce the quick-tiling thing in two ways:
1) Using shortcuts for quick tile left, right and top
2) Dragging the window quickly (works best in the top corner, where KWin switches between side, top-side and top)
Neither of these had any effect other than the window jumping around as it should. There was no freeze, and I do not observe the behaviour described in that bug. So I'm pretty sure this one is different.
No, it's indeed unlikely. gdb is likely to be unspecific as well (still worth a shot ;-), but dmesg of the live system might be very relevant. Reloading the radeon kernel module might also revive the system?
The last time it happened, I had forgotten to enable ssh beforehand, so I could not check. After that I started testing the setting mentioned above, and it didn't happen again. I will re-enable translucency and see if I can reproduce the bug and ssh into the system. Unfortunately, I'm not building radeon as a module, so reloading it won't be possible unless I recompile my kernel.
OK, I'm closing this as an upstream bug. In February I tried some more configurations but could not track it down to anything specific; the configurations didn't seem to change the behaviour. After upgrading Mesa and the kernel, I haven't seen this anymore. I have seen freezes, but those always happened while a video was playing, so they seem to be related to that. Currently I'm using the amdgpu driver, and with it I haven't seen a freeze at all, no matter what settings I use. Thus, I'm now pretty certain that if it still happens, it's an upstream bug in the radeon/radeonsi driver, and if so, it was likely fixed.