Bug 361437 - startkde -> startplasmacompositor -> startkde = "blackscreen"
Summary: startkde -> startplasmacompositor -> startkde = "blackscreen"
Status: RESOLVED FIXED
Alias: None
Product: plasmashell
Classification: Plasma
Component: general (show other bugs)
Version: 5.7.0
Platform: Arch Linux Linux
: NOR crash
Target Milestone: 1.0
Assignee: David Edmundson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-06 01:51 UTC by James
Modified: 2016-07-11 00:41 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description James 2016-04-06 01:51:29 UTC
Arch Linux
plasma-workspace 5.6.1-1
systemd 229-3
AMD/ATI Redwood PRO Radeon HD 5570 with triple display
Intel Core2 Quad CPU    Q6600

I'm not sure where this report should go. I am submitting it here only because plasma-workspace owns startplasmacompositor. I seem to have found some kind of workaround.

Running startplasmacompositor just once will cause a previously working X11 KDE to fail repeatedly thereafter.  "Replacing" the socket /var/run/user/1000/bus seems to restore normal X11 KDE functionality.  This is a problem because anyone who just "dabbles" with Wayland KDE can make their X11 KDE unusable.  Rebooting the machine is an extreme but effective method of "replacing" the "bus" socket.

Problem:
1) run startx with startkde - it starts normally, with the splashscreen disappearing quickly.
2) exit
3) run startplasmacompositor - it starts normally, but the splashscreen hangs for about 25 seconds before fading.
4) exit
5) run startx with startkde - it starts with a splashscreen which hangs for about 25 seconds.
Then, fade to "blackscreen".
The mouse works, X applications can be run, but the desktop is unresponsive.

After trying things like deleting ~/.cache and ~/.kde4, which made no difference, I tried:
$ rm -R /var/run/user/1000/
which allowed startkde to produce a working desktop.

It make no difference to delete the files like
/var/run/user/1000/klauncherXM6740.1.slave-socket

And it makes no difference to kill the process like
/usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation

Upgrading from 5.6.0 to 5.6.1 makes no difference.

After some trial and error, I found it necessary to restart the process
/usr/lib/systemd/systemd --user

It is not enough to log out, which leaves this process running, and it is not effective to kill it and try to run it manually, which gives permission errors. So, it is necessary to find the process with something like "$ ps lU 1000", then "$ kill <pid>", then log out, then log back in. After all of that, startx with startkde will function again.

Rebooting the machine will have an equivalent effect, but that is an extreme remedy.

It seems odd to me that logging out will not automatically kill the "systemd --user" process, in particular, when there are no other logins on other virtual terminals, but I suppose that that is a systemd thing.

I notice that "systemd --user" creates the file "/var/run/user/1000/bus", so this may be a dbus issue.

But then, killling "systemd --user" also - finally - kills other "lurking" background user processes. Still, comparing the process list after exiting a working startkde session and after a startplasmacompositor session, the only change is a new process id for
/usr/bin/pulseaudio --daemonize=no
and
/usr/lib/pulse/gconf-helper
Killing these processes before startkde make no difference. So I would expect that this is a dbus related issue.


Reproducible: Always
Comment 1 James 2016-04-07 01:26:20 UTC
I've now upgraded to 5.6.2.
Something I've noticed:

Prior to the upgrade, and continuing after the upgrade, I have now gotten kde into some mode where, independent of startplasmacompositor, kactivitymanagerd always segfaults on kde exit, and plasmashell fails to start on startx with startkde.  But interestingly, there is a difference before and after running startplasmacompositor.

Before running startplasmacompositor, I can open a terminal and run plasmashell manually, and it seems to act normally.  After running startplasmacompositor, plasmashell still does not start automatically, but when I start it manually, it complains about a now missing kglobalaccel5, and, though it will run, it appears not to do anything, leaving the desktop black and unresponsive.  So there is some effect on plasmashell, before and after running startplasmacompositor.

Killling "systemd --user", then logging out and logging in, still restores the working plasmashell behaviour.
Comment 2 Martin Flöser 2016-04-07 10:28:11 UTC
before running startplasmacompositor please use
export $(dbus-launch)
Comment 3 James 2016-04-07 14:18:24 UTC
Ok, thanks.
After exiting startkde the first time, this being the context for startplasmacompositor, there is:
ps lU 1000
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4  1000 22450     1  20   0  34048  4624 ep_pol Ss   ?          0:00 /usr/lib/systemd/systemd --user
5  1000 22455 22450  20   0 105628  1984 -      S    ?          0:00 (sd-pam)
4  1000 22460 22443  20   0  16696  4668 wait   Ss   tty1       0:00 -bash
0  1000 22501 22450  20   0  32956  3968 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation
0  1000 22579 22450  20   0 478892 13260 poll_s Sl   ?          0:00 /usr/lib/telepathy/mission-control-5
0  1000 22621 22450  20   0 176596  4996 poll_s Sl   ?          0:00 /usr/lib/dconf/dconf-service
0  1000 22629 22450  20   0 747796 32172 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-daemon
0  1000 22678 22450  20   0 270808  5528 poll_s Ssl  ?          0:00 /usr/lib/gvfs/gvfsd
0  1000 22684 22450  20   0 338908  7364 futex_ Sl   ?          0:00 /usr/lib/gvfs/gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
0  1000 22689 22450  20   0 280960  6096 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-identity-service
0  1000 22791 22450  20   0  60676  5588 poll_s S    ?          0:00 /usr/lib/GConf/gconfd-2
0  1000 23027 22450  20   0 339876  5812 poll_s Ssl  ?          0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher
0  1000 23051 23027  20   0  32576  3132 ep_pol S    ?          0:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
1  1000 26669     1  20   0 165076   856 poll_s Ss   ?          0:00 gpg-agent --homedir /home/james/.gnupg --use-standard-socket --daemon
0  1000 27759 22450   9 -11 436200 11620 poll_s S<sl ?          0:00 /usr/bin/pulseaudio --daemonize=no
0  1000 27777 27759  20   0  81868  3460 poll_s S    ?          0:00 /usr/lib/pulse/gconf-helper
0  1000 27790 22460  20   0  32320  3152 -      R+   tty1       0:00 ps lU 1000

Then:
$ set -xv
$ export $(dbus-launch)
export $(dbus-launch)
dbus-launch
++ dbus-launch
+ export DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-Zc69OZLGUs,guid=fdba438e0e61f1b8b0061bc257065e65 DBUS_SESSION_BUS_PID=27829
+ DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-Zc69OZLGUs,guid=fdba438e0e61f1b8b0061bc257065e65
+ DBUS_SESSION_BUS_PID=27829
$ set -

So now, there are three dbus-daemon's running:
22501 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation
23051 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
27829 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session

Then, startplasmacompositor.  Here, with kwin_wayland, I can start plasmashell manually, and it gives a working desktop, on one "desktop" of four, on one display, which is I think normal for now.

After exiting from startplasmacompositor, which is the context for the second startkde, there is:
$ ps lU 1000
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4  1000 22450     1  20   0  34048  4624 ep_pol Ss   ?          0:00 /usr/lib/systemd/systemd --user
5  1000 22455 22450  20   0 105628  1984 -      S    ?          0:00 (sd-pam)
4  1000 22460 22443  20   0  16696  4672 wait   Ss   tty1       0:00 -bash
0  1000 22501 22450  20   0  32956  3968 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation
0  1000 22579 22450  20   0 478892 13260 poll_s Sl   ?          0:00 /usr/lib/telepathy/mission-control-5
0  1000 22621 22450  20   0 176596  4996 poll_s Sl   ?          0:00 /usr/lib/dconf/dconf-service
0  1000 22629 22450  20   0 747796 32172 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-daemon
0  1000 22678 22450  20   0 270808  5528 poll_s Ssl  ?          0:00 /usr/lib/gvfs/gvfsd
0  1000 22684 22450  20   0 404444  7364 futex_ Sl   ?          0:00 /usr/lib/gvfs/gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
0  1000 22689 22450  20   0 280960  6096 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-identity-service
0  1000 22791 22450  20   0  60676  5588 poll_s S    ?          0:00 /usr/lib/GConf/gconfd-2
0  1000 23027 22450  20   0 339876  5812 poll_s Ssl  ?          0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher
0  1000 23051 23027  20   0  32576  3132 ep_pol S    ?          0:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
1  1000 26669     1  20   0 165076   856 poll_s Ss   ?          0:00 gpg-agent --homedir /home/james/.gnupg --use-standard-socket --daemon
0  1000 27759 22450   9 -11 436200 11684 poll_s S<sl ?          0:00 /usr/bin/pulseaudio --daemonize=no
0  1000 27777 27759  20   0  81868  3460 poll_s S    ?          0:00 /usr/lib/pulse/gconf-helper
1  1000 27829     1  20   0  32836  2724 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
0  1000 27929     1  20   0 478896 13440 poll_s Sl   ?          0:00 /usr/lib/telepathy/mission-control-5
0  1000 27938     1  20   0 747796 32272 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-daemon
0  1000 27944     1  20   0 270808  5948 poll_s Sl   ?          0:00 /usr/lib/gvfs/gvfsd
0  1000 27957     1  20   0 280960  6276 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-identity-service
0  1000 28089 22460  20   0  34440  3068 -      R+   tty1       0:00 ps lU 1000

Then, startx with startkde, and run plasmashell manually from konsole:
$ plasmashell
Couldn't start kglobalaccel from org.kde.kglobalaccel.service: QDBusError("org.freedesktop.DBus.Error.Spawn.ChildExited", "Process org.kde.kglobalaccel exited with status 1")
Failed to connect to the kglobalaccel daemon QDBusError("org.freedesktop.DBus.Error.Spawn.ChildExited", "Process org.kde.kglobalaccel exited with status 1")
Failed to connect to the kglobalaccel daemon QDBusError("org.freedesktop.DBus.Error.Spawn.ChildExited", "Process org.kde.kglobalaccel exited with status 1")
Failed to connect to the kglobalaccel daemon QDBusError("org.freedesktop.DBus.Error.Spawn.ChildExited", "Process org.kde.kglobalaccel exited with status 1")
kscreen: starting external backend launcher for ""
kscreen: Failed to request backend: "org.freedesktop.DBus.Error.Spawn.ChildExited" : "Process org.kde.KScreen exited with status 1"
Error found while setting up ShellCorona's KScreen:  "Failed to prepare backend"

Desktop is "black" and unresponsive, even though plasmashell is running.  So then try:

^C
$ kglobalaccel5 
And, in another terminal:
$ plasmashell
kscreen: starting external backend launcher for ""
kscreen: Failed to request backend: "org.freedesktop.DBus.Error.Spawn.ChildExited" : "Process org.kde.KScreen exited with status 1"
Error found while setting up ShellCorona's KScreen:  "Failed to prepare backend"

Desktop is still "black" and unresponsive while plasmashell is running.

So, "export $(dbus-launch)" before startplasmacompositor has made no difference.

BTW, I see that /usr/bin/startkde has:
XDG_DATA_DIRS="/usr/share:/usr/share:/usr/local/share"

which looks like a typo, especially where the comments says:
# Make sure that the KDE prefix is first in XDG_DATA_DIRS
Comment 4 James 2016-04-12 19:56:57 UTC
Ok - finally - yes "export $(dbus-launch)" - BUT - apparently there is a running:
 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation

that comes up with "startkde" that plasmashell cannot use.  So, instead, first kill the running dbus-daemon, then run:
 export $(dbus-launch)

and then run plasmashell, and it works again.

There is some odd relationship to startplasmacompositor, in that running plasmacompositor causes the subsequent startkde to start a dbus-daemon that plasmashell cannot use, with "connection refused" messages in the log.

After starting and then exiting from startplasmacompositor, there are:
$ ps waxl|grep dbus-daemon
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4    81   490     1  20   0  33520  2792 -      Ss   ?          0:04 /usr/bin/dbus-daemon --system --address=systemd: --nofork
     --nopidfile --systemd-activation
0  1000 32450 32420  20   0  32916  3784 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork
   --nopidfile --systemd-activation

$ ps waxl|grep 32420
4  1000 32420     1  20   0  33956  4520 ep_pol Ss   ?          0:00 /usr/lib/systemd/systemd --user
5  1000 32422 32420  20   0 105164  1772 -      S    ?          0:00 (sd-pam)
0  1000 32450 32420  20   0  32916  3784 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork
   --nopidfile --systemd-activation
0  1000 32520 32420  20   0 478900 15480 poll_s Sl   ?          0:00 /usr/lib/telepathy/mission-control-5
0  1000 32529 32420  20   0 748468 32348 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-daemon
0  1000 32536 32420  20   0 270808  5572 poll_s Ssl  ?          0:00 /usr/lib/gvfs/gvfsd
0  1000 32541 32420  20   0 338908  7328 futex_ Sl   ?          0:00 /usr/lib/gvfs/gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
0  1000 32550 32420  20   0 280960  8292 poll_s Sl   ?          0:00 /usr/lib/gnome-online-accounts/goa-identity-service
0  1000 32633 32420   9 -11 446820 12624 poll_s S<sl ?          0:04 /usr/bin/pulseaudio --daemonize=no
0  1000 32637 32420  20   0  60676  5132 poll_s S    ?          0:00 /usr/lib/GConf/gconfd-2

and then, because I've discovered that this second dbus-daemon is causing problems, suppose:
$ kill 32450
$ ps lU 1000
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000   325 32430  20   0  34440  2980 -      R+   tty1       0:00 ps lU 1000
4  1000 32420     1  20   0  34048  4532 ep_pol Ss   ?          0:00 /usr/lib/systemd/systemd --user
5  1000 32422 32420  20   0 105164  1772 -      S    ?          0:00 (sd-pam)
4  1000 32430 32413  20   0  16696  4732 wait   Ss   tty1       0:00 -bash
0  1000 32633 32420   9 -11 446820 12624 poll_s S<sl ?          0:05 /usr/bin/pulseaudio --daemonize=no
0  1000 32635 32633  20   0  82000  5692 poll_s S    ?          0:00 /usr/lib/pulse/gconf-helper

Then after startkde, there is:
$ ps waxl|grep dbus-daemon
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000   374 32420  20   0  32836  3672 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation
4    81   490     1  20   0  33520  2792 -      Ss   ?          0:04 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
...

So now, two dbus-daemon processes have been started, one by the PID 1 "systemd --system", and one by my login "systemd --user".

Plasmashell will not run properly.  Perhaps it tries to use the "wrong" dbus-daemon?

I can kill "my" PID 374 dbus-daemon, and plasmashell still does not run properly.

So then, "export dbus-launch --exit-with-session", which has
+ export DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-kW9FbJrDeq,guid=c39cc3939fe85074db1f6c09570d400c DBUS_SESSION_BUS_PID=1195
DBUS_SESSION_BUS_WINDOWID=10485761
+ DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-kW9FbJrDeq,guid=c39cc3939fe85074db1f6c09570d400c
+ DBUS_SESSION_BUS_PID=1195
+ DBUS_SESSION_BUS_WINDOWID=10485761

and there is a new dbus-daemon session:
1  1000  1195     1  20   0  32576   324 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session

And plasmashell now runs normally.

Does that make any sense, why plasmashell will not run with the first dbus-daemon, started with "startkde"?

And then, what is the effect being caused by previously running startplasmacompositor?

If I start over, after killing all user processes, then log out and log in, what is the dbus-daemon state when startkde comes up before startplasmacompositor?

After login:
$ ps lU 1000
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4  1000  1991     1  20   0  33956  4476 ep_pol Ss   ?          0:00 /usr/lib/systemd/systemd --user
5  1000  1995  1991  20   0 105496  2016 -      S    ?          0:00 (sd-pam)
4  1000  2000  1983  20   0  16696  4892 wait   Ss   tty1       0:00 -bash
0  1000  2007  2000  20   0  34440  3148 -      R+   tty1       0:00 ps lU 1000

After startkde
$ ps waxl|grep dbus-daemon
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4    81   490     1  20   0  33520  2792 -      Ss   ?          0:04 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
0  1000  2044  1991  20   0  32836  3812 ep_pol Ss   ?          0:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation
0  1000  2298  2259  20   0  10760  2168 pipe_w S+   pts/2      0:00 grep dbus-daemon

Comparing the form of the commands, the "not working with plasmashell" PID 374, and the "works with plasmashell" PID 2044, they are exactly the same.  But the first cannot be used by plasmashell, and the second can.

When plasmashell fails, in the log there are things like:
Apr 12 10:39:42 topaz kglobalaccel5[31802]: Failed to create display (Connection refused)
Apr 12 10:39:43 topaz kscreen_backend_launcher[31807]: Failed to create display (Connection refused)

and "strace plasmashell" has:
write(2, "kscreen: Failed to request backe"..., 132kscreen: Failed to request backend: "org.freedesktop.DBus.Error.Spawn.ChildExited" : "Process org.kde.KScreen exited with status 1"

The log message is not very useful, because it fails to state "what connection" is being refused.  That is a bug on its own.  But is the connection being referenced to the dbus-daemon?  And then, why or how would the connection be refused?
Comment 5 James 2016-04-22 19:39:50 UTC
Now plasma-workspace 5.6.3-1

Still strange.  The environment does not change, before and after running startplasmacompositor.
systemd does not clear the files in /tmp/, just logging out and in again, so /tmp/ files do not seem a likely cause.

I fixed my problem with plasmshell not starting automatically.  That was caused by a fragile and obfuscated startkde, not setting XDG environment variables, when XDG_CONFIG_DIRS has already been set in .bash_profile to some incompatible value.  startkde has:

# TODO: Use GenericConfigLocation once we depend on Qt 5.4
scriptpath=`qtpaths-qt5 --paths ConfigLocation | tr ':' '\n' | sed 's,$,/plasma-workspace,g'`

which can, apparently, fail when XDG_CONFIG_DIRS has already been set.  But that is a separate issue.

Any thoughts about how to make plasmashell produce some actually usable debugging output, describing why it has "hung", and not started the other kde background processes?
Comment 6 James 2016-07-11 00:41:32 UTC
Ok, whatever was causing this issue, I just tested again with Arch plasma-workspace 5.7.0-1, and the issue seems to be no more.  I suppose that we can call this "fixed" and let it go at that.