Bug 357728 - Dragging window leads to display freeze
Summary: Dragging window leads to display freeze
Status: RESOLVED UPSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: general (other bugs)
Version First Reported In: 5.5.2
Platform: Exherbo Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-09 09:47 UTC by Bernd Steinhauser
Modified: 2016-04-05 15:37 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Last messages from journal when the freeze occurred (1.20 KB, text/plain)
2016-01-09 09:48 UTC, Bernd Steinhauser
qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation (3.57 KB, text/plain)
2016-01-09 09:48 UTC, Bernd Steinhauser
qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation with compositing enabled (5.68 KB, text/plain)
2016-01-09 10:20 UTC, Bernd Steinhauser

Description Bernd Steinhauser 2016-01-09 09:47:29 UTC
I am aware that this is possibly a bug in the display driver that is merely triggered by kwin.
If that is the case, what I'm hoping for here is to gather whatever relevant information I can.

The bug occurs randomly (but still relatively often), when I try to drag a window from one screen to another.
(I'm not sure if this is bound to multiscreen, since I rarely try to drag windows around on a single screen. I haven't seen it there, at least.)
This can lead to the whole desktop freezing. This seems to be a complete lockup of the gpu: I can't get back to the console even after using sysrq + r.
sysrq itself is still working, as shown by the output. I can sync and reboot the system.
Sound will continue to play and the kernel does not print any bug info (so nothing indicates that something is wrong).

I've not seen this happening for any trigger other than dragging a window. I tried with and without the transparency effect.
I've also tried OpenGL 3.1 and 2.0, I've tried GLX and EGL. I think (but I'm not sure here), I've even tried with the compositor switched off.

I've seen this issue since the update from 5.4.x to 5.5.0. No kernel update was performed before it started to happen (the kernel was 4.3; I'm now using 4.4-rc8, but that does not affect this).
mesa was 11.0.4 when updating kde and has since been updated to 11.1.0.

I've since basically stopped dragging windows and just move them around using the "Move to Screen" functionality, which works fine.
Only if I forget about this issue and drag a window anyway am I in real danger of locking up my system (as happened again yesterday).

Reproducible: Sometimes
Comment 1 Bernd Steinhauser 2016-01-09 09:48:30 UTC
Created attachment 96536
Last messages from journal when the freeze occurred
Comment 2 Bernd Steinhauser 2016-01-09 09:48:58 UTC
Created attachment 96537
qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation
Comment 3 Thomas Lübking 2016-01-09 09:57:11 UTC
*cough*

> Compositing
> ===========
> Compositing is not active

> I think (but I'm not sure here), I've even tried with the compositor switched off.
Wow.

> I've seen this issue since the update from 5.4.x to 5.5.0.
https://git.reviewboard.kde.org/r/126266/
(A rather wild guess given the apparent GPU state - the grab should not be required and would not impact visual updates)

> This seems to be a complete lockup of the gpu
dmesg tail (if you can, probably via ssh?)
Comment 4 Bernd Steinhauser 2016-01-09 10:19:34 UTC
(In reply to Thomas Lübking from comment #3)
> *cough*
> 
> > Compositing
> > ===========
> > Compositing is not active
Hm, that's interesting, I didn't notice that. It seems like I can't enable compositing with egl anymore.
This definitely worked before, but I can't tell when it stopped working.
However, yesterday when the problem occurred, I was using the glx interface. I just switched to give egl another try, but didn't notice that the compositor was inactive then.

> 
> > I think (but I'm not sure here), I've even tried with the compositor switched off.
> Wow.
Like I said, I'm not sure here, but will try again when I don't fear data loss.

> > I've seen this issue since the update from 5.4.x to 5.5.0.
> https://git.reviewboard.kde.org/r/126266/
> (A rather wild guess given the apparent GPU state - the grab should not be
> required and would not impact visual updates)
I can give it a try, although the application I see it the most with is gtk (firefox).

> > This seems to be a complete lockup of the gpu
> dmesg tail (if you can, probably via ssh?)
I normally have ssh deactivated, but will activate and see if I can get anything out of it.
I highly doubt that, though. journal seemed to work, kernel logging seemed to work (it did log the sysrq sync message), I synced the file system, so any message that would have been in dmesg should have ended up in the journal as well.

I could use ssh to debug some program when the issue happens, but for that I would need information up front about what to do.
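
Since the journal survived the freeze, the kernel messages of the crashed boot can be read back after rebooting. The following is a minimal sketch of that idea, not something taken from this report: it assumes a persistent systemd journal, and the keyword list is only a guess at what a GPU lockup might log. The function names are hypothetical.

```python
import subprocess

# Assumption: these keywords are a guess at what a GPU lockup would log;
# adjust them for the driver in use.
GPU_KEYWORDS = ("radeon", "drm", "gpu")


def gpu_related(lines, keywords=GPU_KEYWORDS):
    """Return the lines that mention any keyword, case-insensitively."""
    return [ln for ln in lines if any(k in ln.lower() for k in keywords)]


def previous_boot_kernel_log():
    """Read the kernel messages of the previous boot from the systemd journal."""
    # -k: kernel messages only, -b -1: the boot before the current one.
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

After a reboot, `gpu_related(previous_boot_kernel_log())` would list whatever the kernel managed to write before the freeze. An empty result would support the observation that the kernel logged nothing about the lockup.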
Comment 5 Bernd Steinhauser 2016-01-09 10:20:46 UTC
Created attachment 96538
qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation with compositing enabled
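
To avoid overlooking the compositing state in a supportInformation dump, as happened with the first attachment, the status line can be checked programmatically. This is a rough sketch: the "Compositing is not active" wording comes from the dump quoted in comment 3, while the wording for the active case is an assumption, and the function names are hypothetical.

```python
import subprocess


def compositing_active(support_info):
    """Return False/True for an explicit status line, None if none is found."""
    for line in support_info.splitlines():
        text = line.strip()
        if text == "Compositing is not active":
            return False
        # Assumption: the active case is reported with this wording.
        if text.startswith("Compositing is active"):
            return True
    return None


def fetch_support_info():
    """Query the running KWin over D-Bus, using the command from the attachments."""
    result = subprocess.run(
        ["qdbus-qt5", "org.kde.KWin", "/KWin", "org.kde.KWin.supportInformation"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `compositing_active(fetch_support_info())` right after switching between GLX and EGL would show whether the compositor actually came up with the new backend.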
Comment 6 Thomas Lübking 2016-01-09 10:26:04 UTC
Even gtk+ windows should just kick the NETWM moveresize in the WM

> kwin4_effect_translucency
Try to disable this.

> electricBorderMaximize: true
> electricBorderTiling: true
and this

Do you have autohiding panels on one of the edges?
Comment 7 Thomas Lübking 2016-01-09 10:27:01 UTC
PS: and HW acceleration in FF! ("about:config" iirc, filter for "accel")
Comment 8 Bernd Steinhauser 2016-01-09 10:27:58 UTC
(In reply to Thomas Lübking from comment #6)
> Even gtk+ windows should just kick the NETWM moveresize in the WM
> 
> > kwin4_effect_translucency
> Try to disable this.
Will do.

> > electricBorderMaximize: true
> > electricBorderTiling: true
> and this
ok.

> Do you have autohiding panels on one of the edges?
No.
Comment 9 Bernd Steinhauser 2016-01-09 10:30:50 UTC
(In reply to Thomas Lübking from comment #7)
> PS: and HW acceleration in FF! ("about:config" iirc, filter for "accel")

apz.fling_accel_base_mult 1.0
apz.fling_accel_interval_ms 500
apz.fling_accel_supplemental_mult 1.0
layers.acceleration.disabled false
layers.acceleration.draw-fps false
layers.acceleration.force-enabled false
Comment 10 Thomas Lübking 2016-01-09 10:34:03 UTC
> layers.acceleration.disabled false
layers.acceleration.disabled true

Isn't double negation a pleasure for everyone? ;-)
Comment 11 Bernd Steinhauser 2016-01-14 16:36:53 UTC
(In reply to Thomas Lübking from comment #10)
> > layers.acceleration.disabled false
> layers.acceleration.disabled true
This is the first thing I'm trying and so far it looks good, I haven't had a freeze since. Will require some more time to be sure, though.
> Isn't double negation a pleasure for everyone? ;-)
Oh yeah … ;)
Comment 12 Bernd Steinhauser 2016-02-02 18:15:26 UTC
(In reply to Bernd Steinhauser from comment #11)
> (In reply to Thomas Lübking from comment #10)
> > > layers.acceleration.disabled false
> > layers.acceleration.disabled true
> This is the first thing I'm trying and so far it looks good, I haven't had a
> freeze since. Will require some more time to be sure, though.
I was wrong here. The day after I wrote this, I had a freeze with a completely unrelated non-gtk and non-Qt application (it's actually a Java one).
The acceleration in FF was disabled at the time.

So this was not it. The next thing I tried was this:
> > kwin4_effect_translucency
> Try to disable this.
I've disabled this since the 15th of January. I've been happily dragging windows around and I haven't seen a freeze since then.
Since that was over 2 weeks ago, I think it's relatively safe to say that the issue was caused by the translucency effect. Of course this means that without the compositor enabled this should not happen, and this
> I think (but I'm not sure here), I've even tried with the compositor switched off
would have been wrong?

Are there special OpenGL extensions that translucency requires? Maybe I could have a look at the kernel/mesa changes related to that and see if there were changes that could cause this kind of behavior?
Comment 13 Thomas Lübking 2016-02-03 20:51:45 UTC
(In reply to Bernd Steinhauser from comment #12)

> Are there special OpenGL extensions that translucency requires? Maybe I
> could have a look at the kernel/mesa changes related to that and see if
> there were changes that could cause this kind of behavior?

No, I rather suspect something along the lines of bug #350327 - can you gdb into KWin when this happens and check what it's doing?

https://community.kde.org/KWin/Debugging
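
For a frozen display, the gdb step would have to run from an ssh session. One possible way to script it is sketched below. The process name `kwin_x11` and the helper names are assumptions, not taken from this report; the gdb options used (`-p`, `-batch`, `-ex`) are standard.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that dumps all thread backtraces and then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def kwin_pid(process_name="kwin_x11"):
    """Look up the PID with pidof; the process name is an assumption."""
    out = subprocess.run(
        ["pidof", process_name], capture_output=True, text=True, check=True
    )
    return int(out.stdout.split()[0])
```

Running `subprocess.run(gdb_backtrace_command(kwin_pid()))` over ssh would print the backtraces without needing an interactive gdb prompt. Attaching may need root or relaxed ptrace settings, depending on the system.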
Comment 14 Bernd Steinhauser 2016-02-03 21:18:21 UTC
Looking at that bug, it seems different to me in that for me it leads to a complete freeze (I cannot even kill X with sysrq+k, the secure access key). For allan, things seemed to start working again once he switched to VT7 and back.

Will try to reproduce by using the snapping functions and, if that happens, try to ssh in and gdb the thing.
Comment 15 Bernd Steinhauser 2016-02-05 19:51:57 UTC
Ok, tried to reproduce the quick-tiling thing in two ways:
1) Using shortcuts for quick tile left, right and top
2) Dragging the window quickly (works best in the top corner where kwin switches between side, top-side and top)

Neither of these had any effect other than the window jumping around as it should. There was no freeze and I do not observe the behaviour described in that bug.

So I'm pretty sure this one is different.
Comment 16 Thomas Lübking 2016-02-05 21:57:47 UTC
No, it's indeed unlikely.
gdb is likely to be unspecific as well (still worth a shot ;-), but dmesg of the live system might be very relevant.
Reloading the radeon kernel module might also revive the system?
Comment 17 Bernd Steinhauser 2016-02-05 22:31:24 UTC
Last time it happened I had forgotten to enable ssh beforehand, so I could not check. After that I started testing the setting mentioned above and it didn't happen again.
I will reenable translucency and see if I can reproduce the bug and ssh into the system. Unfortunately, I'm not using radeon as a module, so reloading it won't be possible unless I recompile my kernel.
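
Whether a driver can be reloaded at all depends on it being built as a loadable module; built-in drivers do not appear in /proc/modules. A small sketch of that check follows, with hypothetical helper names. It only inspects the module list and does not attempt the reload itself.

```python
def loaded_modules(proc_modules_text):
    """Parse the content of /proc/modules into a set of module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def can_reload(name, proc_modules_text):
    """True if the driver is listed as a loaded module, so modprobe -r could apply."""
    return name in loaded_modules(proc_modules_text)
```

For example, `can_reload("radeon", open("/proc/modules").read())` would return False on a kernel with radeon compiled in, which matches the situation described above.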
Comment 18 Bernd Steinhauser 2016-04-05 15:37:18 UTC
Ok, I'm closing this as an upstream bug.
In February I tried some more configs, but could not track it down to something specific. The configurations didn't seem to change the behaviour.

After upgrading mesa and the kernel, I haven't seen this anymore. I've seen freezes happening, but those always happened when a video was running, and thus seem to be related to that.
Currently I'm using the amdgpu driver and with that I haven't seen a freeze at all no matter what settings I'm using.

Thus, I'm pretty certain now that if it still happens, it's an upstream bug in the radeon/radeonsi driver.
And if so, it was likely fixed.