Bug 372963 - plasmashell does not handle changes of the primary screen
Summary: plasmashell does not handle changes of the primary screen
Status: RESOLVED FIXED
Alias: None
Product: plasmashell
Classification: Plasma
Component: generic-multiscreen (show other bugs)
Version: 5.8.4
Platform: Other Linux
: NOR grave
Target Milestone: 1.0
Assignee: Aleix Pol
URL:
Keywords:
: 373098 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-11-26 17:31 UTC by Jan-Matthias Braun
Modified: 2016-12-15 13:51 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Remove a bizzare include. (355 bytes, patch)
2016-11-27 20:36 UTC, Jan-Matthias Braun
Details
Reimplement desktopForScreen without using m_ScreenPool, now providing correct results and fixing the primary output switching problem. (1015 bytes, patch)
2016-11-27 20:37 UTC, Jan-Matthias Braun
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan-Matthias Braun 2016-11-26 17:31:56 UTC
Plasmashell does not cope with changes of the primary screen.

1. First of all, it ignores primary screen changes at runtime when two screens are connected.
    Scenario: I have an external screen (DP-0) connected to my Laptop (LVDS-0)
    LVDS-0 is primary. When I do
        xrandr --output DP-0 --primary
    the shell flickers (goes black and redisplays on the same screen) but still
    all widgets stay on LVDS-0. When reverting with
        xrandr --output LVDS-0 --primary
    it doesn't even flicker.

    As mentioned in other bugs, a restart of plasmashell displays at the right
    display, i.e., DP-0. Now the behaviour is identical, the shell flickers
    when switching to LVDS-0.

    This behaviour is rock-stable. I can repeat it as often as I want.

2. When I configure the DP-0 to be the primary screen and restart plasmashell
    I do have the desired configuration. But when I now disconnecting DP-0,
    plasmashell may crash (cmp #340904). The backtrace is appended below.
    .xsession-error shows:
        kscreen.kded: Change detected
        kscreen.kded: KScreen::Output( 647   "LVDS-0" connected enabled QPoint(0,0) QSize(1600, 900) "648" )
        KCrash: Attempting to start /usr/bin/plasmashell from kdeinit
        KCrash: Application 'plasmashell' crashing...
    Of course, after the restart everything is back up on the laptop panel.
    Mostly, but sometimes plasma has a hickup and stops working. This is
    especially tragic when you rely on suspend if the lid is closed,
    as all this behaviour is only working when the gui is not crashed.

    In the aftermath, when now replugging the DP-0, we see the behaviour from
    1.: plasmashell on LVDS-0 flickers, but although the primary is correctly
    communicated, the widgets do not move. The logs from .xsession-errors:
        Old primary output: QScreen(0x2196ba0, name="LVDS-0") New primary output: QScreen(0x32d6130, name="DP-0")
    show that plasmashell does recognize the new situation.


    The funny thing is, when plasmashell is restarted after the crash, 
    all widgets stays on the wrong screen.

Now the backtrace:
Thread 1 "plasmashell" received signal SIGSEGV, Segmentation fault.
0x00007f6f30310574 in PlasmaQuick::ContainmentView::containment() const () from /usr/lib64/libKF5PlasmaQuick.so.5
(gdb) bt
#0  0x00007f6f30310574 in PlasmaQuick::ContainmentView::containment() const () from /usr/lib64/libKF5PlasmaQuick.so.5
#1  0x0000000000440321 in ShellCorona::screenForContainment(Plasma::Containment const*) const ()
#2  0x0000000000440276 in ShellCorona::screenForContainment(Plasma::Containment const*) const ()
#3  0x00007f6e8ae637ee in NotificationsApplet::onScreenChanges() ()
   from /usr/lib64/qt5/plugins/plasma/applets/plasma_applet_notifications.so
#4  0x00007f6f2e1d9d1c in QtPrivate::QSlotObjectBase::call (a=0x7ffdf02a3980, r=0x1d11310, this=<optimized out>)
    at ../../include/QtCore/../../src/corelib/kernel/qobject_impl.h:130
#5  QMetaObject::activate (sender=0x7f6f2f325e20 <(anonymous namespace)::Q_QGS_g_kwmInstanceContainer::innerFunction()::holder>,
    signalOffset=<optimized out>, local_signal_index=<optimized out>, argv=<optimized out>) at kernel/qobject.cpp:3723
#6  0x00007f6f28378118 in NETEventFilter::nativeEventFilter(QByteArray const&, void*, long*) ()
   from /usr/lib64/qt5/plugins/kf5/org.kde.kwindowsystem.platforms/KF5WindowSystemX11Plugin.so
#7  0x00007f6f2e1ac22f in QAbstractEventDispatcher::filterNativeEvent (this=<optimized out>, eventType=...,
    message=message@entry=0x7f6f24003300, result=result@entry=0x7ffdf02a3b48) at kernel/qabstracteventdispatcher.cpp:466
#8  0x00007f6f296c8a04 in QXcbConnection::handleXcbEvent (this=this@entry=0x7d0780, event=event@entry=0x7f6f24003300)
    at qxcbconnection.cpp:1103
#9  0x00007f6f296cc3ee in QXcbConnection::processXcbEvents (this=0x7d0780) at qxcbconnection.cpp:1735
#10 0x00007f6f2e1da821 in QObject::event (this=0x7d0780, e=<optimized out>) at kernel/qobject.cpp:1263
#11 0x00007f6f2edaf32c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib64/libQt5Widgets.so.5
#12 0x00007f6f2edb4739 in QApplication::notify(QObject*, QEvent*) () from /usr/lib64/libQt5Widgets.so.5
#13 0x00007f6f2e1aef18 in QCoreApplication::notifyInternal2 (receiver=0x7d0780, event=event@entry=0x7f6f24003350)
    at kernel/qcoreapplication.cpp:988
#14 0x00007f6f2e1b15ad in QCoreApplication::sendEvent (event=0x7f6f24003350, receiver=<optimized out>)
    at kernel/qcoreapplication.h:231
#15 QCoreApplicationPrivate::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0, data=0x7b9860)
    at kernel/qcoreapplication.cpp:1649
#16 0x00007f6f2e1b1a28 in QCoreApplication::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0)
    at kernel/qcoreapplication.cpp:1503
#17 0x00007f6f2e200ae3 in postEventSourceDispatch (s=0x806c30) at kernel/qeventdispatcher_glib.cpp:276
#18 0x00007f6f2c3d5917 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#19 0x00007f6f2c4502d8 in g_main_context_iterate.isra.42.lto_priv () from /usr/lib64/libglib-2.0.so.0
#20 0x00007f6f2c3d7a3c in g_main_context_iteration () from /usr/lib64/libglib-2.0.so.0
#21 0x00007f6f2e200eef in QEventDispatcherGlib::processEvents (this=0x804500, flags=...) at kernel/qeventdispatcher_glib.cpp:423
#22 0x00007f6f2e1ad04a in QEventLoop::exec (this=this@entry=0x7ffdf02a4140, flags=..., flags@entry=...) at kernel/qeventloop.cpp:210
#23 0x00007f6f2e1b548d in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1261
#24 0x000000000041c0ea in main ()
Comment 1 Jan-Matthias Braun 2016-11-26 17:34:05 UTC
Stupid me: please ignore this fragment. It survived some editing but should
have been deleted.

>     The funny thing is, when plasmashell is restarted after the crash, 
>     all widgets stays on the wrong screen.
Comment 2 Jan-Matthias Braun 2016-11-26 17:45:22 UTC
When disconnecting DP-0, in my debug-build I do see
    ASSERT: "m_desktopViewforId.value(idx) == desktopView" in file plasma-workspace-5.8.4/shell/shellcorona.cpp

situated in ShellCorona::removeDesktop. This method is called after primaryOutputChanged() returned.

When reconnecting DP-0, I do not gain any new information. primaryOutputChanged() is called and returns cleanly.

So my guess is: primaryOutputChanged() does not work as intended. But the crash on the disconnect actually leads to a restart which leads to the intended result.
Comment 3 Jan-Matthias Braun 2016-11-26 17:46:59 UTC
Side-Note: I am using qt 5.7, gcc 6.2, frameworks 5.28, plasma 5.8.4, and applications 16.08.3.
Comment 4 Jan-Matthias Braun 2016-11-27 11:11:28 UTC
Hi!

I will try to dig through this, slowly... Right now I am trying to understand what happens in removeDesktop to trigger the Q_ASSERT in hopes that I find a good indication of what goes against expectations when the primary display switches away from the disconnected screen.

So. In my case, when removing DP-0, removeDesktop is called with
    DesktopView: 0x19e2fd0
    desktopView->screenToFollow()->name() is LVDS-0
[Comment: This seems reasonable, as the view on LVDS-0 is going to be replaced by the one from DP-0. On the other hand, removeDesktop was called by screenRemoved, where I would have expected the QScreen for DP-0 to be removed. Thus I am assuming now a mixup in the screen numbering, when the logical screen order is modified in terms of the primary screen. But as I have no clue of the underlying APIs, what they should do etc. pp., this is just a wild guess.]

In the end, this situation leads to (after mapping via the ScreenPool)
    idx = 0

Now, I cannot say much on panels, as I don't use one for like 10 years. Thus I will ignore the next loop.

Of course, the desktop view associated to idx via m_desktopViewforId now doesn't match the method parameter. Still, I don't what should happen in this case, e.g., should the primary view be transferred from the disconnected screen to the still available, or should the first view reconfigure itself to take on the contents of the other screen. Thus, I don't know if the assert is wrong. But as stated above, my intuition is, that the screen numbering in the internal representations mixes up.

Hope this helps,

Jan
Comment 5 Jan-Matthias Braun 2016-11-27 19:38:20 UTC
There aren't many (0!) comments indicating what should happen.
But from looking at other parts of the code,
    QScreen *oldPrimary = m_desktopViewforId.value(0)->screen();

the primary DesktopView always has id 0.

Now, primaryOutputChanged is called before removeDesktop. As far as I understand it, the screen for the primary desktopView is switched and thus the assert does
not hold anymore, when removeDesktop is called now.
Comment 6 Jan-Matthias Braun 2016-11-27 20:35:44 UTC
Okay. Looks like the m_screenPool-Numbering is inconsistend with the m_desktopViewforId-Numbering. THerefore, in primaryOutputChanged, the id for the old as well as the new primary DesktopView is == 0 and no swap takes place!

In general, desktopForScreen does not return the correct DesktopView. This should be the reason, why the assert fails. When patching desktopForScreen, the Q_ASSERT does not explode.

When using the patched desktopForScreen insted of the m_screenPool in primaryOutputChanged, then it does not explode, too, and the primary desktop is at the right place.

I will attach the patch now.
Comment 7 Jan-Matthias Braun 2016-11-27 20:36:19 UTC
Created attachment 102480 [details]
Remove a bizzare include.

This patch removes a ktexteditor include in shellcorona.cpp.
Comment 8 Jan-Matthias Braun 2016-11-27 20:37:11 UTC
Created attachment 102481 [details]
Reimplement desktopForScreen without using m_ScreenPool, now providing correct results and fixing the primary output switching problem.
Comment 9 Jan-Matthias Braun 2016-11-27 20:43:15 UTC
Of course, for the final patches I forgot to go to the root folder. They have to be applied directly in the shell folder...
Comment 10 Jan-Matthias Braun 2016-11-28 17:32:20 UTC
Unluckily, this does not solve problems with another Testcase:
Config 1: VGA-0 + DP-4 (primary)
Config 2: LVDS-0 (primary)

Here things are still going south and plasma falls out.
No matter if I remove the laptop in a running state from the docking station or in sleep mode, chances are high, that plasmashell doesn't even register all screen changes and is in a somewhat bizarre state.
For example in Config 2, only the part of the desktop view is shown, where DP-4 overlaps with the coordinate system of LVDS-0.
Comment 11 Jakub Gocoł 2016-11-30 19:11:23 UTC
I think this patch https://phabricator.kde.org/D3519 will fix your problem. I hope it will be merged soon.
Comment 12 Jan-Matthias Braun 2016-11-30 20:58:07 UTC
(In reply to Jakub Gocoł from comment #11)
> I think this patch https://phabricator.kde.org/D3519 will fix your problem.
> I hope it will be merged soon.

Hi! I found your patch and am already testing with it. Unluckily, testing involves not only plasma crashes but sometimes freezes when suspending & switching monitors quite often. This means, I won't have useful results before next week.
Especially in the case where the primary screen is on an external monitor and LVDS-0 (the internal display) seems to crash in any combination of all our patches. :-( The switch-displays-logic seems to go boinc in this case. But this case is really annoying to reproduce.
For the simple case with one additional external display I do think so, too. :-)
Comment 13 Jakub Gocoł 2016-11-30 21:37:26 UTC
Try to reproduce and then describe simple scenarios in as few steps as possible. Because you described so many scenarios, I don't understand what exactly works and what doesn't work. 

1. In first message, in point 1, you described scenario with change of primary screen. Did my patch resolved that issue for you?
2. Can you provide exact steps to crash plasma? I don't understand what exactly do you do.
Comment 14 Marco Martin 2016-12-12 11:49:48 UTC
*** Bug 373098 has been marked as a duplicate of this bug. ***
Comment 15 Jan-Matthias Braun 2016-12-12 21:27:41 UTC
(In reply to Jakub Gocoł from comment #11)
> I think this patch https://phabricator.kde.org/D3519 will fix your problem.
> I hope it will be merged soon.

Hi!

I was unable to reproduce this crash anymore. It looks like your patch fixes the problem. I don't know what happend with the last crash I had seen.

Thanks!
Comment 16 Marco Martin 2016-12-15 13:28:49 UTC
(In reply to Jakub Gocoł from comment #11)
> I think this patch https://phabricator.kde.org/D3519 will fix your problem.
> I hope it will be merged soon.

any news on this? it's marked as accepted since a while but still not submitted, can you push it?
Comment 17 Marco Martin 2016-12-15 13:41:01 UTC
(In reply to Jan-Matthias Braun from comment #8)
> Created attachment 102481 [details]
> Reimplement desktopForScreen without using m_ScreenPool, now providing
> correct results and fixing the primary output switching problem.

can you push it?
the patch makes sense, i can push it if needed
Comment 18 David Edmundson 2016-12-15 13:42:31 UTC
Git commit bdfa0f3fb726f4c639d2ea86d829c25c89779025 by David Edmundson, on behalf of Jakub Gocoł.
Committed on 15/12/2016 at 13:42.
Pushed by davidedmundson into branch 'Plasma/5.8'.

Move updating of primary screen in screenpool after fetching its id

Summary: We need old id of new primary screen. After we update primary screen in screenpool, it will always return id = 0. It causes invalid m_desktopViewforId mapping and  panel doesn't move next time when we change primary screen.

Test Plan:
Preconditions:
Computer with 1 display, running plasmashell

Test steps:
1. Connect one external screen (first screen is primary)
2. Change primary screen to second screen
3. Change primary screen back to first screen
4.  Unplug second screen

Expected:
In step 3 panel moves to first screen
In step 4 plasmashell keeps running

Actual (before change):
In step 3 panel remains on second display
In step 4 plasmashell crashes

Reviewers: #plasma, davidedmundson

Reviewed By: #plasma, davidedmundson

Subscribers: plasma-devel

Tags: #plasma

Differential Revision: https://phabricator.kde.org/D3519

M  +1    -2    shell/shellcorona.cpp

https://commits.kde.org/plasma-workspace/bdfa0f3fb726f4c639d2ea86d829c25c89779025
Comment 19 Jan-Matthias Braun 2016-12-15 13:51:16 UTC
(In reply to Marco Martin from comment #17)
> (In reply to Jan-Matthias Braun from comment #8)
> > Created attachment 102481 [details]
> > Reimplement desktopForScreen without using m_ScreenPool, now providing
> > correct results and fixing the primary output switching problem.
> 
> can you push it?
> the patch makes sense, i can push it if needed

As I was fiddling in unknown code, of which I didn't (and still don't) know exactly know everywhere how it should behave, I am unsure about the use of my patch. I do have the impression, that the crash which occurs when the new primary display was not in use before, e.g., when two external displays are disconnected and the laptop display is reactivated to become the primary display, was connected to my try on a fix.

This could be possible, because in this case the used hash shouldn't provide a reference to the new primary screen.

But I did not test this hypothesis, because the other patch on phabricater seems to fix the issue better. I think, I saw this crash without my patch too., but ... this whole multi-display thingy is quite unstable in my experience, still, and therefore ugly to test & debug.

You see, I am somewhat guarded against using my patch. The other patch does good, so lets go with that. Probably that guy knew better. :-)