Bug 357988

Summary: Black screen when reconnecting display
Product: [Plasma] kwin Reporter: Bernd Steinhauser <linux>
Component: general Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED UPSTREAM    
Severity: normal    
Priority: NOR    
Version: 5.5.3   
Target Milestone: ---   
Platform: Exherbo   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Log output when changing the screen
qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation
glxinfo -l

Description Bernd Steinhauser 2016-01-14 16:35:12 UTC
This sounds very similar to bug 353975, but I'm pretty sure that there are two bugs: one in plasmashell and one in kwin, X11, or the driver. Bug 356322 could be related as well, but it is for 4.x.

The bug in plasmashell causes one screen to misbehave: there is no desktop background shown, no context menu available, etc., but windows can be moved there and are usable.

The bug in kwin behaves differently. The screen is completely black; only the mouse pointer is visible if moved there. Windows can be moved there, but will not show up.
Neither `kwin_x11 --replace` nor a restart of plasmashell will fix this. Suspending compositing does not help either.
It will however be fixed when switching to the console (i.e. Ctrl+Alt+F2) and back.
There is no flickering when doing so.

Currently, it happens every time I disconnect and reconnect the screen, but I've also seen a rate of approx. 50% 2 or 3 weeks ago.

Reproducible: Always




The driver is the radeonsi driver. mesa is currently scm, but I have seen this with 11.x as well.
Kernel is 4.4, xorg-server is at 1.18.0.

These are the messages I get when disconnecting the screen. Upon reconnection, there are no messages from kwin directly:
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47691, resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47693, resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47694, resource id: 12582942, major code: 18 (ChangeProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47695, resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47696, resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47697, resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47698, resource id: 12582942, major code: 7 (ReparentWindow), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47699, resource id: 12582942, major code: 6 (ChangeSaveSet), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47700, resource id: 12582942, major code: 2 (ChangeWindowAttributes), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 47701, resource id: 12582942, major code: 10 (UnmapWindow), minor code: 0
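
All of the messages above name the same resource id and differ only in the failed request. A minimal sketch for tallying such lines is shown here; the function name `summarize_xcb_errors` and the regular expression are illustrative assumptions, not part of Qt or KWin, and the sample holds three of the lines quoted above. The id is converted to hexadecimal because X11 tools usually print window ids that way.

```python
import re
from collections import Counter

# Matches the QXcbConnection error lines in the format quoted above.
XCB_ERROR = re.compile(
    r"XCB error: (?P<code>\d+) \((?P<name>\w+)\), "
    r"sequence: (?P<seq>\d+), resource id: (?P<rid>\d+), "
    r"major code: (?P<major>\d+) \((?P<request>\w+)\), "
    r"minor code: (?P<minor>\d+)"
)


def summarize_xcb_errors(lines):
    """Count failed requests per (hex resource id, request name)."""
    counts = Counter()
    for line in lines:
        match = XCB_ERROR.search(line)
        if match:
            key = (hex(int(match["rid"])), match["request"])
            counts[key] += 1
    return counts


# Three of the lines from the log above, copied verbatim.
SAMPLE = [
    "QXcbConnection: XCB error: 3 (BadWindow), sequence: 47691, "
    "resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0",
    "QXcbConnection: XCB error: 3 (BadWindow), sequence: 47693, "
    "resource id: 12582942, major code: 19 (DeleteProperty), minor code: 0",
    "QXcbConnection: XCB error: 3 (BadWindow), sequence: 47698, "
    "resource id: 12582942, major code: 7 (ReparentWindow), minor code: 0",
]

if __name__ == "__main__":
    for (window, request), n in summarize_xcb_errors(SAMPLE).items():
        print(window, request, n)
```
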
Comment 1 Thomas Lübking 2016-01-14 16:37:42 UTC
please dump "qdbus org.kde.KWin /KWin supportInformation" before and when this happens and ideally also when "resolving" the situation.

Does restarting the compositor (SHIFT+Alt+F12, twice) "fix" it?
Comment 2 Bernd Steinhauser 2016-01-14 16:38:06 UTC
Created attachment 96637 [details]
Log output when changing the screen
Comment 3 Bernd Steinhauser 2016-01-14 16:38:26 UTC
Created attachment 96638 [details]
qdbus-qt5 org.kde.KWin /KWin org.kde.KWin.supportInformation
Comment 4 Martin Flöser 2016-01-14 16:41:42 UTC
can you please also output glxinfo -l
Comment 5 Bernd Steinhauser 2016-01-14 16:42:01 UTC
(In reply to Thomas Lübking from comment #1)
> please dump "qdbus org.kde.KWin /KWin supportInformation" before and when
> this happens and ideally also when "resolving" the situation.
Wow, you're fast. :D

During the disconnect, the only difference is the removed screen:
-Number of Screens: 2
+Number of Screens: 3

+Name: DisplayPort-0
+Geometry: 1920,0,1920x1200
+Refresh Rate: 59.9502
+
+Screen 2:
+---------

After the reconnect, the output is identical to the one before the disconnect.

> Does restarting the compositor (SHIFT+Alt+F12, twice) "fix" it?
Nope.
Comment 6 Bernd Steinhauser 2016-01-14 16:42:52 UTC
Created attachment 96639 [details]
glxinfo -l
Comment 7 Bernd Steinhauser 2016-01-14 16:43:51 UTC
(In reply to Bernd Steinhauser from comment #5)
> During the disconnect, the only difference is the removed screen:
> -Number of Screens: 2
> +Number of Screens: 3
> 
> +Name: DisplayPort-0
> +Geometry: 1920,0,1920x1200
> +Refresh Rate: 59.9502
> +
> +Screen 2:
> +---------
Meh, diff in the wrong direction …
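
To keep the direction unambiguous, the two dumps can be compared with the earlier one as the "from" side. The following is only a sketch: the helper name `support_info_diff` is hypothetical, and the two strings are abbreviated stand-ins for the real output of `qdbus org.kde.KWin /KWin supportInformation` taken before and during the disconnect.

```python
import difflib


def support_info_diff(before, during):
    """Return a unified diff going from the 'before' dump to the 'during' dump."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            during.splitlines(),
            fromfile="before",
            tofile="during",
            lineterm="",
            n=0,  # no context lines, only the changes
        )
    )


# Abbreviated stand-ins for the real dumps (values taken from comment 5).
BEFORE = "Number of Screens: 3\nName: DisplayPort-0\nGeometry: 1920,0,1920x1200\n"
DURING = "Number of Screens: 2\n"

if __name__ == "__main__":
    for line in support_info_diff(BEFORE, DURING):
        print(line)
```

With this ordering, lines that vanish on disconnect are prefixed with `-` and lines that only exist while the screen is disconnected are prefixed with `+`.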
Comment 8 Thomas Lübking 2016-01-14 17:36:05 UTC
Sum up:
- kwin has a proper idea of the screen geometries
- not even restarting the compositor (complete rebuild of GL context) "fixes" it
- switching framebuffers (VT2 and back) works

Smells a lot like an X11 driver issue (which only paints the cursor into the framebuffer, but not actual X11)

- What's the state when just suspending the compositor - is the screen black as well?
- What if you even "kwin_x11 --replace &"
- Does this affect all screens or only the displayport one?
Comment 9 Bernd Steinhauser 2016-01-14 18:06:13 UTC
(In reply to Thomas Lübking from comment #8)
> - What's the state when just suspending the compositor - is the screen black
> as well?
No, that works fine.
> - What if you even "kwin_x11 --replace &"
Fine.
> - Does this affect all screens or only the displayport one?
Seems like it's only that screen (an Eizo).
Neither the HDMI one nor the DVI one show that behaviour.

I think the Eizo monitor uses DP 1.2; it gives me a headache from time to time anyway (i.e. the DP connection is cut off when switching off the screen, which is why I'm seeing this even when just leaving for 20 minutes). I don't know if that is the reason, but I will try with the Dell screen, which I would have to switch from HDMI to DisplayPort, though. For that one I can switch DP 1.2 on and off.
Comment 10 Bernd Steinhauser 2016-01-16 09:11:26 UTC
Two more things that might be worth mentioning:
1. I switched to the console with the screen disconnected and back, to see if it occurs as well. After reconnecting the screen, I saw screen corruption. However, using Ctrl+Alt+F2 still fixed it.
2. Once Plasma went wild on me and hung. Then when reconnecting the screen, I saw the wallpaper instead of a black screen. The rest behaved as described: the pointer was visible and I could move windows there, but they did not become visible. (Still fixable with Ctrl+Alt+F2.) This leads to the conclusion that the "black" part of the bug is introduced by Plasma somehow. And it seems like it creates that view with the wallpaper or the black screen that overlays the windows. Is that possible?
Comment 11 Thomas Lübking 2016-01-16 22:20:12 UTC
(In reply to Bernd Steinhauser from comment #10)

> somehow. And it seems like it creates that view with the wallpaper or the
> black screen that overlays the windows. Is that possible?

That would suggest that switching the VT would alter the stack position of the plasma window, would it? Unlikely. Also interaction w/ the plasma window should work (eg. the wheel to change the virtual desktop?) and at least popup menus should still show up on top.

Also you suggested that windows moved to the dead screen are "usable" (ie. react to interaction, resp. can be picked with eg. Alt+LMB and dragged back)

Since this is not related to the compositor, it's more likely a static scanout buffer on that screen, sometimes before and sometimes after a plasma window was moved/created there.
Comment 12 Bernd Steinhauser 2016-01-17 08:48:09 UTC
(In reply to Thomas Lübking from comment #11)
> That would suggest that switching the VT would alter the stack position of
> the plasma window, would it? Unlikely. Also interaction w/ the plasma window
> should work (eg. the wheel to change the virtual desktop?) and at least
> popup menus should still show up on top.
Nope, could only see the background and the mouse pointer.
Context menus don't show up (those are popups, right?).

> 
> Also you suggested that windows moved to the dead screen are "usable" (ie.
> react to interaction, resp. can be picked with eg. Alt+LMB and dragged back)
Yes, that's possible. And not only that. I can see the pointer shape changes. I.e. if I have this window on the dead screen and move the pointer where this text field is supposed to be, I can see the pointer changing to the "I" shape.

(In reply to Thomas Lübking from comment #11)
> Since this is not related to the compositor, it's more likely a static
> scanout buffer on that screen, sometimes before and sometimes after a plasma
> window was moved/created there.
I just tried another thing, maybe that helps to analyze this. The test was motivated by the popup comment above. I wanted to open a popup menu on that screen, namely the "About" dialog of Firefox.
So I moved Firefox half way to that screen (to check if the dialog opens there and not on the other screen) and switched off the screen. (And back on again.)
The interesting thing here was that I could see a screenshot of where the window was (it obviously moved after the disconnect, because kwin moves windows around if you disconnect the output they are shown on). The rest of the screen was black.
However, this is only the case if the window at the time has focus. If I move it there, switch to a different window and then turn the screen off and on again, the screen is completely black.

I might be wrong, but that to me seems like a confirmation that this is a bug in kwin and not the driver?
Comment 13 Thomas Lübking 2016-01-17 11:35:06 UTC
- not even restarting the compositor (complete rebuild of GL context) "fixes" it
- switching framebuffers (VT2 and back) fixes it

I.e. the part of the GL frontbuffer that is related to the particular output is not updated/copied into the scanout buffer until X11 loses and regains the framebuffer.
The screen in question is in the middle (so the context isn't too small) and a compositor restart would cause a complete update of the entire scene in a new GL context unconditionally.

I'm fairly sure that it's not a bug in KWin, sorry.


KWin waits briefly (like 200ms) before moving windows to the remaining workspace, plasmashell (the desktop) might react instantly, so the static framebuffer will be taken at a spot where plasmashell already removed the desktop and kwin has not yet moved the firefox window.

If you
- suspend the compositor
- run glxgears
- move it to be partially on the left and partially on the (middle) DP output and
- re-attach the DP output:
a) glxgears should still cross the screens, otherwise move it into such position
b) how much of glxgears updates?
Comment 14 Bernd Steinhauser 2016-01-17 12:07:25 UTC
(In reply to Thomas Lübking from comment #13)
> If you
> - suspend the compositor
> - run glxgears
> - move it to be partially on the left and partially on the (middle) DP
> output and
> - re-attach the DP output:
> a) glxgears should still cross the screens, otherwise move it into such
> position
Yep.

> b) how much of glxgears updates?
Around 60 fps, no change there. The left part of glxgears is still updated, the right part is just a static image.
Comment 15 Thomas Lübking 2016-01-17 13:16:31 UTC
(In reply to Bernd Steinhauser from comment #14)
> Around 60 fps, no change there. The left part of glxgears is still updated,
> the right part is just a static image.

Well, since it's not kwin specific, we'll have to assume it's in the GL/X11 driver ;-)

Since the uncomposited glxgears runs at 60Hz, I assume you've got (flipping disabling)
Option "TearFree" "on"
set? (Xorg.0.log)
What about turning that off?
Comment 16 Bernd Steinhauser 2016-01-17 14:40:43 UTC
(In reply to Thomas Lübking from comment #15)
> (In reply to Bernd Steinhauser from comment #14)
> > Around 60 fps, no change there. The left part of glxgears is still updated,
> > the right part is just a static image.
> 
> Well, since it's not kwin specific, we'll have to assume it's in the GL/X11
> driver ;-)
> 
> Since the uncomposited glxgears runs at 60Hz, I assume you've got (flipping
> disabling)
> Option "TearFree" "on"
> set? (Xorg.0.log)
> What about turning that off?
Tried that and if I turn that off and do the above steps, all of my screens go black which does not seem to be recoverable. I see these messages in my journal:
Jan 17 15:33:08 orionis kernel: [drm:radeon_dp_link_train] *ERROR* displayport link status failed
Jan 17 15:33:08 orionis kernel: [drm:radeon_dp_link_train] *ERROR* clock recovery failed

However that led me to the other option I set for the radeon driver: DRI3.
I switched to DRI2 and this issue is gone. I did see a black screen when trying, but that was the other Plasma bug. Windows etc. show up on the screen and can be used.

I'll report a bug upstream.
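
For readers who want to try the same workaround, the DRI2 switch described above might look roughly like the fragment below. This is a sketch under assumptions, not the reporter's actual configuration: the `Identifier` value is hypothetical, and the name of the DRI option depends on the xf86-video-ati version (older releases use a boolean `DRI3` option, newer ones an integer `DRI` option), so the radeon(4) man page for the installed driver should be checked first.

```
Section "Device"
    Identifier "Radeon"              # hypothetical name, any identifier works
    Driver     "radeon"
    Option     "TearFree" "on"       # as assumed in comment 15
    Option     "DRI3"     "off"      # assumption: older option name; newer drivers use Option "DRI" "2"
EndSection
```

Whether the setting took effect can be checked in Xorg.0.log after restarting the X server.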
Comment 17 Bernd Steinhauser 2016-02-18 04:33:31 UTC
FYI, a bug was reported upstream, the issue was found and got fixed:
https://bugs.freedesktop.org/show_bug.cgi?id=93746