Bug 353428 - KWin freezes until drkonqi is closed from a terminal
Summary: KWin freezes until drkonqi is closed from a terminal
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: compositing
Version: 5.4.1
Platform: Fedora RPMs Linux
Importance: NOR crash
Target Milestone: ---
Assignee: KWin default assignee
URL: https://git.reviewboard.kde.org/r/126...
Keywords:
Duplicates: 357536 358416 358891 359473
Depends on:
Blocks:
 
Reported: 2015-10-02 03:48 UTC by Robin Laing
Modified: 2016-04-18 12:18 UTC
CC List: 9 users

See Also:
Latest Commit:
Version Fixed In: 5.6
thomas.luebking: ReviewRequest+


Attachments
text file of the kwin related software. (68 bytes, text/plain)
2015-10-02 03:50 UTC, Robin Laing
strace file of kwin freezing. (46 bytes, text/plain)
2015-10-02 03:56 UTC, Robin Laing
strace of system freeze. (48 bytes, text/plain)
2015-10-02 03:57 UTC, Robin Laing
KWin before crash as a reference. (34.99 KB, text/plain)
2015-10-07 03:34 UTC, Robin Laing
gdb during freeze of the same session as the before crash post. (38.17 KB, text/plain)
2015-10-07 03:36 UTC, Robin Laing
gdb trace of frozen session (38.08 KB, text/plain)
2015-11-14 04:43 UTC, Robin Laing
gdb trace for non-crashed first session. (32.88 KB, text/plain)
2015-11-14 04:56 UTC, Robin Laing
Trace of the locked kwin (32.03 KB, text/plain)
2016-04-17 22:23 UTC, Vincenzo Di Massa

Description Robin Laing 2015-10-02 03:48:05 UTC
Twice in about 10 minutes tonight, kwin froze and wouldn't refresh the display.  I can move the mouse around but I cannot click on any object or enter any text.  Keyboard doesn't do anything to the display either.

If I open a terminal window and kill the drkonqi process, the desktop starts working again.

drkonqi is listed in the output of ps.

The command line for drkonqi is:

/usr/libexec/drkonqi -platform xcb -display :0 --appname kwin_X11 --apppath /usr/bin --signal 11 --pid 16418 --appversion 5.4.1 --programname kwin  --bugaddress submit @Bugs.kde..... (cannot see the rest)

Happened a second time after logging out and back in.
PID 18080





Reproducible: Always

Steps to Reproduce:
1. Log in to the system.
2. Do something with a window, or trigger a second window such as a pop-up; the desktop then freezes.

Actual Results:  
The desktop freezes except for mouse movement.  I can press Ctrl+Alt+F(x) to get a terminal window.  If I go back to the F1 screen, the second screen is now displayed on the first screen.

After killing the drkonqi process, the screen works again.  Repeated twice.  This used to occur on a 386 version of Fedora 22.

Expected Results:  
No freezes.

The present two crashes seem to be related to a second window opening on the desktop.  Maybe two windows.

I ran strace on the two frozen processes tonight.

Will attach files.
Comment 1 Robin Laing 2015-10-02 03:50:20 UTC
Created attachment 94805
text file of the kwin related software.
Comment 2 Robin Laing 2015-10-02 03:56:17 UTC
Created attachment 94806
strace file of kwin freezing.
Comment 3 Robin Laing 2015-10-02 03:57:31 UTC
Created attachment 94807
strace of system freeze.
Comment 4 Thomas Lübking 2015-10-02 07:20:38 UTC
The attachments contain no actual data.
They are also not relevant; what matters is getting the backtrace from kwin (it should not crash).

The process "freeze" is a bug in DrKonqi: the kwin process isn't frozen but SIGSTOP'ped (there was a comment in some kwin bug indicating this happens). Sending the KWin process a SIGCONT will continue (and kill) it. (Stopping is required by drkonqi to conditionally obtain the backtrace.)

Please gdb into the crashed & stopped KWin process and dump the output, here's a cheat sheet:
https://community.kde.org/KWin/Debugging
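
The stop/continue mechanics described in this comment can be tried out on a harmless child process. What follows is only a minimal C++ sketch, assuming a POSIX system with WCONTINUED support (e.g. Linux); the helper names spawn_sleeper, stop_and_confirm, continue_and_confirm and terminate_and_reap are made up for illustration and are not part of KWin or DrKonqi. Unlike a crashed KWin, the sleeper here simply keeps running after SIGCONT.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that only sleeps; it stands in for the stopped process.
// Returns the child's pid, or -1 if fork() failed.
pid_t spawn_sleeper() {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) pause();  // child: wait for signals forever
    }
    return pid;
}

// Send SIGSTOP and wait until the kernel reports the child as stopped.
bool stop_and_confirm(pid_t pid) {
    if (pid <= 0) return false;  // never signal pid 0 or -1 by accident
    if (kill(pid, SIGSTOP) != 0) return false;
    int status = 0;
    if (waitpid(pid, &status, WUNTRACED) != pid) return false;
    return WIFSTOPPED(status);
}

// Send SIGCONT (what "kill -SIGCONT <pid>" does) and confirm the resume.
bool continue_and_confirm(pid_t pid) {
    if (pid <= 0) return false;
    if (kill(pid, SIGCONT) != 0) return false;
    int status = 0;
    if (waitpid(pid, &status, WCONTINUED) != pid) return false;
    return WIFCONTINUED(status);
}

// Clean up the demo child.
void terminate_and_reap(pid_t pid) {
    if (pid <= 0) return;
    kill(pid, SIGKILL);
    waitpid(pid, nullptr, 0);
}
```

Calling spawn_sleeper(), then stop_and_confirm(), then continue_and_confirm() on the returned pid walks through the same state changes that the stopped KWin process goes through, just without the pending crash.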
Comment 5 Robin Laing 2015-10-03 04:16:43 UTC
Next time it happens I will do that.  It happened twice in a short time last night.  I will print out the cheat sheet for reference since my desktop is frozen.
Comment 6 Thomas Lübking 2015-10-03 07:18:28 UTC
Notice that since kwin already crashed, you do not want to "continue" it.
Comment 7 Robin Laing 2015-10-06 00:12:00 UTC
I have done what I can.  I get nothing when running gdb.  Even if I continue, the automatic bug reporter states that there is nothing to upload.

As a side question, is there an easy way to get the debug rpms to update at the same time as the main packages?  Should I enable the debuginfo in the yum.repos files?  I had a freeze, but due to the updates I needed to update all the debuginfo files to continue.
Comment 8 Robin Laing 2015-10-07 03:15:53 UTC
Happened multiple times tonight.  Tried to find something with drkonqi or the plasma desktop but nothing provided any data.

What I did find interesting was that the issue seems to occur when a new window or dialog box appears.  Password for Firefox password manager or Thunderbird.  Even a cookie request for Firefox will have the screen freeze.

If drkonqi is killed, sometimes the screen doesn't work as expected.  Had an experience where Thunderbird opened at the top left corner of the screen with no titlebar and when the password prompt appeared, I could click on it but not enter any text into the password box.

Tried to retrace the pid for KWin but it was not running.  

I now have a process kwin_X11 --crashes 1

I can open a terminal window with no issues.  I still have a working desktop and I am typing this on the same crashed desktop.
Comment 9 Robin Laing 2015-10-07 03:34:25 UTC
Created attachment 94869
KWin before crash as a reference.

As I can almost create a crash at will, I decided to create a before crash trace.
Comment 10 Robin Laing 2015-10-07 03:36:27 UTC
Created attachment 94870
gdb during freeze of the same session as the before crash post.

This is the gdb data during the freeze.  It is the same process as the kwin trace before the crash.
Comment 11 Robin Laing 2015-10-07 03:41:56 UTC
As I can almost create the crash at will with an updated machine as of last night, I am trying to find what I can about the issue.

When there is a freeze, there are two kwin_x11 processes.  In ps they are kwin_x11 and /usr/bin/kwin_x11 . I am assuming that the extra process is part of the call process.  I wonder if this causes a conflict on the screen refresh and thus freezes the screen because it doesn't know what to do.

After killing drkonqi, the second kwin_x11 process is the one that remains.

Hope this helps.
Comment 12 Thomas Lübking 2015-10-07 06:20:47 UTC
The crash is bug #351839 - don't use the aurorae decoration engine but breeze.
=> crashes gone?

(Notice that the bug is claimed fixed, but it's most likely a bug in Qt and we didn't fix anything about it)

Since some dialogs seem to be a reliable trigger, can you please obtain the outputs of "xprop" and "xwininfo" on them (you may use the breeze decoration at this time, I'm interested in window hints that may restrict/cause some titlebar features) - run each command from konsole and click the window after the cursor has turned into a "+"
Comment 13 Villu Ruusmann 2015-11-04 11:34:57 UTC
I am experiencing exactly the same problem: drkonqi freezes my desktop.

The problem shows up every time the computer is rebooted, typically during the first minute of the KDE session. Also, the only solution is to open the terminal using Ctrl+Alt+F2, do "ps x", and kill the drkonqi process. When switching back to the desktop, things resume their operations as if nothing had happened. So far, the problem occurs only once during a session.

The problem appeared after upgrading from Fedora 20 to 22. Today, I upgraded to Fedora 23 and the problem still persists.
Comment 14 Thomas Lübking 2015-11-04 22:23:48 UTC
Brainstorm:
DrKonqi will SIGSTOP us before we release the WM selection, so we had probably better release the WM selection (or just xcb_disconnect(connection())) in Application::crashHandler()

Unfortunately, this will likely mean copying the crashHandler into main_x11 and main_wayland, since we probably don't want to poke around in the vtable once we hit a segfault?!
Comment 15 Martin Flöser 2015-11-05 07:31:36 UTC
Hmm, calling into xcb in the crash handler also sounds dangerous. What if that crashes?

Anyway: releasing the WM selection sounds more robust to me.
Comment 16 Villu Ruusmann 2015-11-05 08:29:23 UTC
I think that in my case KWin is crashing very early in the session (e.g. when opening the first decorated window) because of the graphics card setup. I'm running Fedora 23/KDE on a Lenovo laptop that has an NVIDIA graphics card that is supported by the dreaded "nouveau" driver.

Anyway, after killing the drkonqi process the system appears to function perfectly, so I'm somewhat reluctant to try to reconfigure my graphics card and/or its drivers.
Comment 17 Robin Laing 2015-11-13 01:01:01 UTC
I am having an issue with KWin freezing under a different circumstance.   If KWin can block access to the keyboard and mouse, it may be related.  I am trying to recreate it on my system and it may be related to multiple sessions as I cannot recreate the issue with only one session running.

If I open a session that has already had the problem with drkonqi, if the second session crashes, the keyboard is locked and the mouse as well.  I can ssh into the computer and kill all processes related to the account except KWin.  Both times it has happened were very late at night and I was not up to trying to debug the kwin session.

Is it possible that what Thomas Lübking describes in comment 14 is causing something in the background and leaving the system in a strange state when drkonqi is killed?

If I can recreate the problem (when I am not trying to get to bed) and get a backtrace, I will create a new bug report.
Comment 18 Thomas Lübking 2015-11-13 08:00:59 UTC
If there's a remaining (stopped) kwin_x11 process, it's possible that it still holds an input grab or an invisible window catching all mouse input (but global shortcuts should remain operative in the latter case)

The resolution would in either case just be to "kill -SIGCONT" the stopped kwin process (it should then die and disconnect from the server, releasing all grabs and losing all windows implicitly)
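
Whether a leftover process really is stopped can be checked on Linux by reading /proc/<pid>/stat: the field right after the parenthesised command name is the process state, and 'T' means stopped by a signal. Below is a minimal C++ sketch of that check, assuming a Linux /proc filesystem; the function names state_from_stat_line and is_stopped are invented for this illustration.

```cpp
#include <fstream>
#include <string>

// Extract the state character from one line of /proc/<pid>/stat.
// The layout is "pid (comm) state ..."; comm may itself contain spaces
// or parentheses, so look for the last ')' rather than splitting on spaces.
// Returns '?' if the line does not have the expected shape.
char state_from_stat_line(const std::string &line) {
    const std::string::size_type close = line.rfind(')');
    if (close == std::string::npos || close + 2 >= line.size())
        return '?';
    return line[close + 2];
}

// True if /proc/<pid>/stat reports state 'T' (stopped by a signal).
// Returns false if the file cannot be read, e.g. the process is gone.
bool is_stopped(long pid) {
    std::ifstream stat("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(stat, line))
        return false;
    return state_from_stat_line(line) == 'T';
}
```

For a made-up line such as "14161 (kwin_x11) T 1 14161", state_from_stat_line yields 'T'. The same letter shows up in the STAT column of ps for a stopped process.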
Comment 19 Robin Laing 2015-11-14 04:37:01 UTC
Okay, just after I read your email, I froze the second session.  One started with sddm and I was able to get to a point that I could cleanly shutdown using an ssh session.  Full keyboard and monitor locked out.  I did get a trace before shutting down.

I will upload them.
Comment 20 Robin Laing 2015-11-14 04:43:34 UTC
Created attachment 95487
gdb trace of frozen session

This is the session trace for the kwin session (second session running) that froze keyboard and mouse.  The screen saver did kick in though.

Session was started using the sddm session control.  

System froze when audacity crashed and I tried to kill the process.  The error message asking me to wait or terminate the program had popped up.  System froze at the exact time.

Previous (second) session freezes seemed to occur when there was a screen change from a slide show or a video.

I tried to kill the kwin process but I finally could kill this session by killing the related sddm-helper for the second kwin session.  I could then kill the kwin session.  Crashed session was killed but I didn't have to clean up everything via a terminal before doing a clean shutdown.
Comment 21 Robin Laing 2015-11-14 04:56:23 UTC
Created attachment 95488
gdb trace for non-crashed first session.

This is the trace of kwin for the still working session during the crash.  It was created before the second session had been killed.

I am just providing it as a reference.
Comment 22 Robin Laing 2015-11-27 03:33:16 UTC
Okay, I am still having this issue but it is only in an account that I am carrying over from pre-F21 and up.  A new account that was set up after F21 doesn't seem to have the same issues.

I have deleted lots of old config files but I still have the issue.

I guess I have to wipe the configurations and start from scratch.
Comment 23 Villu Ruusmann 2015-12-04 09:12:38 UTC
Has any work been done on this? I've been applying all Fedora 23 updates, and I just realized that my issue has probably gone away - there has been no need to kill drkonqi during last sessions.
Comment 24 Thomas Lübking 2015-12-05 16:11:45 UTC
No, but this is not a deterministic error (it depends on when the crash occurs and when the next xcb flush takes place) - I personally saw it once or twice only.
Also kwin should not crash in the first place, and maybe your distro has simply backported the patch that prevents the 0x0 QScreen deref segfault ... (fewer crashes, fewer problems with crash handling ;-)
Comment 25 bzi@samadhi-institute.org 2015-12-12 10:26:03 UTC
I experience absolutely similar behavior: the display freezes and only the mouse moves. However, I am not able to Strg-Alt-Fx (USB keyboard) to a terminal window. So the only way to recover is a reset.
Comment 26 Robin Laing 2015-12-16 02:59:31 UTC
For part of the discussion.

I wonder if others that are having this issue have upgraded and those that don't have a problem are using a clean install.

I have a couple of test accounts on the same machine but never seem to have the issue on those test accounts.  Only on my personal account.

I have tried to clean up the configuration files but with things moving, I have not found everything.  I am loath to set up a fully clean personal account since the settings are not fully back to where they were.

FWIW, it happened to me last night so this issue is not FIXED.
Comment 27 Robin Laing 2015-12-16 03:18:06 UTC
Bzi, comment 25

I use a USB keyboard and can Ctrl+Alt+Fx to the terminal.  I usually use F5 out of years of habit.  I hope your comment is a typo.  Try to Ctrl+Alt+Fx before the lockup and see if that works.

I have had some strange system pauses with media files from time to time and have no clue what is causing it.  In some cases it seems that the machine has died when it hasn't.

Do you have another computer to ssh into the "frozen" machine?  I have used this to troubleshoot issues and to get out of complicated issues.
Comment 28 Thomas Lübking 2015-12-16 08:13:20 UTC
The inability to switch to another VT means a problem in the kernel (at least framebuffer control)
Nobody said this is fixed and I assume you hit this on "your" account because some config there crashes kwin in the first place (aurorae decoration? There are plenty of bug reports on the QML V4 engine)

See comment #14 - unfortunately releasing the WM selection w/o calling into the xcb connection is a bit "tricky" ;-) (comment #15)
Comment 29 Bert Zimpel 2015-12-16 08:40:52 UTC
Hello Robin,

thank you for your comment!
>  I usually use F5 out of years of habit. I hope your comment is a typo. (In reply to Robin Laing from comment #27)
Fx represents F1 to F6, the available terminal screens on my system. Hope that helps to clarify the "typo".

Bert
Comment 30 Bert Zimpel 2015-12-16 09:47:22 UTC
(In reply to Thomas Lübking from comment #28)
> The inability to switch to another VT means a problem in the kernel (at
> least framebuffer control)

Hello Thomas,

partially right, however the freeze and subsequently the inability to change to a terminal window does +not+ occur in Xfce (alternative window manager). I had to change to Xfce as the KDE system always tends to freeze at the worst point (when files are "open") in a productive process.

Bert
Comment 31 Thomas Lübking 2015-12-16 12:29:09 UTC
I do not question that the KWin GL context "somehow™" triggers this condition - but it really sounds like a kernel bug (and is hard to debug for this)
You could simply avoid the ugly conditions by e.g. switching to the XRender compositing backend (afair, xfce uses xrender exclusively for compositing)
Comment 32 Thomas Lübking 2015-12-16 12:30:06 UTC
PS, dev note: we could investigate "stealing" the WM selection from the restarted instance if there's a crashcounter flag in the args.
Comment 33 Thomas Lübking 2015-12-16 15:18:02 UTC
In theory™, this should do:

diff --git a/main_x11.cpp b/main_x11.cpp
index 2c13743..1736f0d 100644
--- a/main_x11.cpp
+++ b/main_x11.cpp
@@ -173,7 +173,8 @@ void ApplicationX11::performStartup()
                                                                                                                  maskValues)));
         if (!redirectCheck.isNull()) {
             fputs(i18n("kwin: another window manager is running (try using --replace)\n").toLocal8Bit().constData(), stderr);
-            ::exit(1);
+            if (!wasCrash()) // if this is a crash-restart, DrKonqi may have stopped the process w/o killing the connection
+                ::exit(1);
         }
 
         createInput();
@@ -185,7 +186,7 @@ void ApplicationX11::performStartup()
     });
     // we need to do an XSync here, otherwise the QPA might crash us later on
     Xcb::sync();
-    owner->claim(m_replace, true);
+    owner->claim(m_replace || wasCrash(), true);
 
     createAtoms();
 }



But actually just stopping kwin isn't enough - and the behavior is still not deterministic.
So let's wait and see whether I eventually get a stopped kwin + no resolution....
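
To make the intent of the draft patch easier to follow, here is a small self-contained C++ model of its decision logic. It is a sketch only: crash_count, StartupDecision and decide are hypothetical names that do not exist in KWin, the argument parsing is an assumption based on the "--crashes 1" command line seen in comment 8, and no X11 or KSelectionOwner calls are made.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the number following "--crashes" in the
// argument list, or 0 if the flag is absent or not followed by a number.
int crash_count(const std::vector<std::string> &args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--crashes") {
            try {
                return std::stoi(args[i + 1]);
            } catch (...) {
                return 0;
            }
        }
    }
    return 0;
}

// Outcome of the startup checks modelled after the draft patch.
struct StartupDecision {
    bool exitBecauseOtherWm;  // give up: another window manager is running
    bool forceClaim;          // take the WM selection even if it is owned
};

// otherWmDetected: the redirect check found an existing window manager.
// replaceRequested: the user passed --replace.
// crashes: value of the crash counter (0 for a normal start).
StartupDecision decide(bool otherWmDetected, bool replaceRequested, int crashes) {
    const bool wasCrash = crashes > 0;
    // On a crash-restart the "other" WM is assumed to be the old, stopped
    // instance, so startup continues and the selection is claimed by force.
    return {otherWmDetected && !wasCrash, replaceRequested || wasCrash};
}
```

In this model a normal start that finds another window manager exits, while a crash-restart (crash counter above zero) continues and claims the selection by force, which is the behavior the patch aims for.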
Comment 34 Robin Laing 2015-12-17 04:16:27 UTC
No, it was the "Strg-Alt-Fx" you wrote that I hope was a typo.  

Your freezing is similar to a situation that I run into every once in a while that is different.  My desktop will totally freeze but it is not related to this bug.

I have ssh'd into the machine but the X server won't refresh the screen.  I think it is related to the motherboard having integrated graphics.  Hasn't happened lately but I have had situations where the system seemed to move to a crawl.
Comment 35 Robin Laing 2015-12-17 04:23:47 UTC
Noticed this before but didn't report it.

If I type in 
    ps aux |grep kwin
response is 
   robin    14161  0.4  0.5 3471124 170420 ?      Sl   Dec14  16:41 /usr/bin/kwin_x11 --crashes 1

This is after a freeze but no reboot.

The other open session that didn't crash shows:
   test_acct   16890  0.0  0.2 3090448 79320 ?       Sl   Dec15   1:20 kwin_x11

As stated before.  The test_acct never seems to go into the freeze mode.

As I have said before, it is normally when a pop-up dialog is being called.  Example: Firefox attempting to send a pop-up to reload the previous session.

Another thing that I have noticed is the pop-up will appear in the top left corner of the screen after killing drkonqi.
Comment 36 Robin Laing 2016-01-02 19:49:12 UTC
Just an update.

After major problems with KDE after the latest KDE updates, I decided to wipe out all my old configuration files and start my account from scratch.  Since doing this, I have not had a single crash.

When time permits, I will play and see if I can narrow the cause down to a single file as I didn't delete all the files.

Hope to move to Fedora 23 tomorrow and hope everything is still working.
Comment 37 Thomas Lübking 2016-01-14 10:52:53 UTC
*** Bug 357536 has been marked as a duplicate of this bug. ***
Comment 38 Thomas Lübking 2016-01-14 23:18:55 UTC
Git commit 69aa80750f8d61a5db6311c33751461041a260d5 by Thomas Lübking.
Committed on 14/01/2016 at 22:40.
Pushed by luebking into branch 'master'.

force restart on crash

We don't want to actively release claims on segfaults, but then
drkonqi can stop us while we're still holding the WM privs.

=> If KWin performs a crash-restart, it forcefully takes WM privs
(since the old instance shall be replaced for quite sure)
Related: bug 348834, bug 353030

REVIEW: 126741
FIXED-IN: 5.6

M  +3    -2    main_x11.cpp

http://commits.kde.org/kwin/69aa80750f8d61a5db6311c33751461041a260d5
Comment 39 Thomas Lübking 2016-01-23 11:24:52 UTC
*** Bug 358416 has been marked as a duplicate of this bug. ***
Comment 40 Thomas Lübking 2016-02-01 21:02:44 UTC
*** Bug 358891 has been marked as a duplicate of this bug. ***
Comment 41 Thomas Lübking 2016-02-16 18:24:52 UTC
*** Bug 359473 has been marked as a duplicate of this bug. ***
Comment 42 Vincenzo Di Massa 2016-04-17 22:23:20 UTC
Created attachment 98434
Trace of the locked kwin

This trace is probably of little use, also because the lock happened literally 500 km away from my external screen.
Comment 43 Thomas Lübking 2016-04-18 07:47:28 UTC
The backtrace doesn't look like the kwin process had segfaulted at all - sure this is the SIGSTOP'd kwin process?
Comment 44 Vincenzo Di Massa 2016-04-18 12:18:48 UTC
Comment on attachment 98434
Trace of the locked kwin

Sorry I posted it on the wrong bug report.