Bug 439873 - Switching users isn't working on Wayland
Summary: Switching users isn't working on Wayland
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: 5.22.4
Platform: Neon Linux
Importance: HI normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: wayland
Depends on:
Blocks:
 
Reported: 2021-07-15 08:52 UTC by techxgames
Modified: 2021-09-28 18:07 UTC

See Also:
Latest Commit:
Version Fixed In: 5.23


Attachments
Backtrace (git master) (11.18 KB, text/plain)
2021-09-26 14:17 UTC, Ash Blake
Screen recording showing the problem (1.56 MB, video/mp4)
2021-09-26 14:19 UTC, Ash Blake
Another backtrace, crash in DrmPipeline::populateAtomicValues (14.32 KB, text/plain)
2021-09-27 00:40 UTC, Ash Blake
Debugging session with both good and bad VT switches (27.12 KB, text/markdown)
2021-09-27 13:50 UTC, Ash Blake
KWin DRM log messages (18.96 KB, text/x-log)
2021-09-27 16:56 UTC, Ash Blake
KWin DRM log from another machine (AMD GPU) (9.74 KB, text/x-log)
2021-09-27 17:03 UTC, Ash Blake
KWin DRM log messages from full Plasma session (6.61 KB, text/plain)
2021-09-27 20:32 UTC, Ash Blake

Description techxgames 2021-07-15 08:52:55 UTC
SUMMARY
Switching users causes the screen to go black. Had to do a hard reboot.

STEPS TO REPRODUCE
1. Open Application Launcher
2. Select Switch User

OBSERVED RESULT
Screen goes black and freezes.

EXPECTED RESULT
Should be able to switch users normally

SOFTWARE/OS VERSIONS
Operating System: KDE neon Unstable Edition
KDE Plasma Version: 5.22.80
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.3
Kernel Version: 5.8.0-59-generic (64-bit)
Graphics Platform: Wayland
Processors: 4 × Intel® Core™ i7-4510U CPU @ 2.00GHz
Memory: 15.5 GiB of RAM
Graphics Processor: Mesa DRI Intel® HD Graphics 4400

ADDITIONAL INFORMATION
Not sure if being on Wayland has anything to do with it.
Comment 1 Nate Graham 2021-08-03 18:04:37 UTC
I *think* this may be a known issue on Wayland. Does it work if you switch users from X11?
Comment 2 Nate Graham 2021-08-03 18:20:33 UTC
Yeah, known issue. We can use this to track it. Added to https://community.kde.org/Plasma/Wayland_Showstoppers
Comment 3 David Edmundson 2021-09-20 16:16:31 UTC
>Does work if you switch users from X11?

Marking as needs info.

Also, let's avoid tagging Wayland regressions as VHI; it's not our default setup.
Comment 4 Nate Graham 2021-09-20 16:23:04 UTC
(In reply to David Edmundson from comment #3)
> >Does work if you switch users from X11?
> 
> Marking as needs info.
Yeah, user switching works for me on X11.

On Wayland, I get to the login screen, but logging into the other user fails: after I enter the other user's password and click the login button, I get kicked back to the lock screen of my existing session. When I unlock, no apps are running, as if KWin crashed in the background. However, I do not see any coredumpctl logs about a KWin crash. I do have a ksmserver crash, though. The only relevant part of the backtrace is this:

#8  0x000000000040a621 in main (argc=<optimized out>, argv=0x7ffdc1355bb8)
    at /home/nate/kde/src/plasma-workspace/ksmserver/main.cpp:214


> Also, lets avoid tagging wayland regressions as VHI, it's not our default
> setup.
OK.
Comment 5 Nate Graham 2021-09-20 17:37:36 UTC
In fact, ksmserver doesn't even want to launch at all, even manually:

~/kde/usr/bin/ksmserver
org.kde.kf5.ksmserver: Cannot connect to the X server
qt.qpa.xcb: could not connect to display :1
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, xcb.

Aborted (core dumped)
Comment 6 Nate Graham 2021-09-20 17:39:08 UTC
Actually, maybe it logged into the other user after all, as `reboot` tells me another user is logged in:

$ reboot
User konqi is logged in on tty4.
Please retry operation after closing inhibitors and logging out other users.
Alternatively, ignore inhibitors and users with 'systemctl reboot -i'.

Maybe the bug is that it didn't succeed in logging *me* into that user?
Comment 7 Ash Blake 2021-09-26 14:08:33 UTC
(In reply to Nate Graham from comment #4)
> On Wayland, I get to the login screen, but logging into the other user
> fails; after I enter the other user's password and click the login button I
> get kicked back to the lock screen of my existing session. 

This works for me, the new user's session starts every time.
However, either the old user's session or the new user's session will crash when switching.
I have kwin_wayland coredumps from both users and I'll upload the backtraces soon.

(In reply to Nate Graham from comment #6)
> Maybe the bug is that it didn't succeed in logging *me* into that user?
(In reply to Nate Graham from comment #5) 
> ~/kde/usr/bin/ksmserver

It probably failed to log in as the other user because you tried to run the same development KDE session, and the other user wouldn't be able to read and execute anything in your home directory. startplasma-wayland won't run, so the new user will only be running the systemd user daemon and whatever stuff it started.

Give konqi execute access to your home directory so he can cd into it, and make him the group of $HOME/kde so he can read and execute things in there:
$ setfacl -m u:konqi:x $HOME
$ chown -R $USER:konqi $HOME/kde
Comment 8 Ash Blake 2021-09-26 14:17:17 UTC
Created attachment 141922 [details]
Backtrace (git master)

The old user's and new user's backtraces look the same.
Comment 9 Ash Blake 2021-09-26 14:19:15 UTC
Created attachment 141923 [details]
Screen recording showing the problem
Comment 10 Bug Janitor Service 2021-09-26 22:43:19 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/plasma-workspace/-/merge_requests/1082
Comment 11 Ash Blake 2021-09-27 00:40:04 UTC
Created attachment 141939 [details]
Another backtrace, crash in DrmPipeline::populateAtomicValues

Another crash on VT switch. obj pointed to an unreadable location.
Comment 12 Ash Blake 2021-09-27 00:59:02 UTC
(In reply to Ash Blake from comment #11)
> Created attachment 141939 [details]


The contents of m_gpu look odd: a cursor size of 262147x458757, an insanely high file descriptor, and some suspicious-looking addresses.
It seems like DrmGpu got destroyed, but it still got used somehow.



(gdb) p *m_gpu
$11 = {
  ...
  m_backend = 0x7000700060007,
  m_eglBackend = {
    wp = {
      d = 0x7000600070005,
      value = 0x6000700020007
    }
  },
  m_devNode = {
    d = 0x3000100020003
  },
  m_cursorSize = {
    wd = 262147,
    ht = 458757
  },
  m_fd = 262145,
  m_deviceId = 1970354902204420,
  m_atomicModeSetting = 6,
  m_useEglStreams = false,
  m_gbmDevice = 0x7000100050007,
  m_eglDisplay = 0x700070003,
  m_presentationClock = 327687,
  m_socketNotifier = 0x7000700070007,
  m_addFB2ModifiersSupported = 5,
  m_planes = {
    d = 0x7000100030007
  },
  m_crtcs = {
    d = 0x556bab5abab0
  },
  m_connectors = {
    d = 0x556bab6581b0
  },
  m_pipelines = {
    d = 0x556bab56b5b0
  },
  m_drmOutputs = {
    d = 0x556babd27990
  },
  m_outputs = {
    d = 0x556babe865c0
  },
  m_leaseOutputs = {
    d = 0x7f08040086f0
  },
  m_leaseDevice = 0x556bab56b330
}
Comment 13 Ash Blake 2021-09-27 01:12:32 UTC
And in the crash from the getProp call in KWin::DrmPipeline::setSyncMode, m_crtc has been a null pointer in at least two backtraces:

(gdb) p *m_pipeline
$2 = {
  m_output = 0x5592ffd79a3b,
  m_gpu = 0x5597a695a010,
  m_connector = 0x18,
  m_crtc = 0x0,
  m_primaryPlane = 0x0,
  m_primaryBuffer = {
    value = 0x3ff0000000000000,
    d = 0x3ff0000000000000
  },
  m_oldTestBuffer = {
    value = 0x408e000000000000,
    d = 0x0
  },
  m_legacyNeedsModeset = false,
  m_cursor = {
    pos = {
      xp = 0,
      yp = 1072693248
    },
    hotspot = {
      xp = 0,
      yp = 1083047936
    },
    buffer = {
      value = 0x403d000000000000,
      d = 0x408e080000000000
    },
    dirtyBo = false,
    dirtyPos = false
  },
  m_allObjects = {
    d = 0x0
  },
  m_formats = {
    d = 0x403d000000000000
  },
  m_lastFlags = 0
}
Comment 14 Zamundaaa 2021-09-27 06:42:33 UTC
Sounds a lot like https://bugs.kde.org/show_bug.cgi?id=442677
Comment 15 Ash Blake 2021-09-27 10:33:17 UTC
(In reply to Zamundaaa from comment #14)
> Sounds a lot like https://bugs.kde.org/show_bug.cgi?id=442677

It really does, but I already have the commit that fixed that bug in my KWin build. 
Seems like there's some other problem that causes the same crash on VT switches, and there's also this weird crash in KWin::DrmObject::getProp that happens sometimes too. If I notice crashes in some other places, I'll upload those backtraces too. 

The getProp crash case is particularly weird. At a quick glance it seems that the crtc in a pipeline could not suddenly end up null under normal circumstances, as there doesn't seem to be a method that changes a DrmPipeline's m_crtc after initialization. Maybe the memory for it was freed and used by something else, but something still used the pointer to the deleted pipeline? I guess a situation like this could cause all kinds of crashes in various places.

I'll try setting up breakpoints on destructors of various drm-related objects and keeping track of the objects' addresses to compare them after a crash happens to check if that is the case.
Comment 16 Ash Blake 2021-09-27 10:39:13 UTC
This crash is also quite unpredictable: sometimes I can switch many times between two sessions with no crash, and sometimes it will crash on the first try. Usually, once the crash has occurred in one of the sessions, it keeps recurring whenever I switch away from it and back.
Comment 17 Ash Blake 2021-09-27 13:50:33 UTC
Created attachment 141950 [details]
Debugging session with both good and bad VT switches

This is an annotated log from the debugging session with backtraces of each pipeline destruction, including the addresses of said pipelines. 

For convenience, you can also view it with basic formatting here: https://gist.github.com/telepathine/01bd060e5df3ece55f6b46bb63a78078

It features both the successful case and the failed one. They differ quite notably in the pipeline destruction department: in the failed case one pipeline gets deleted three times, and then that address is used again because, for some reason, some DrmOutput still holds it. This leads to a segfault originating from KWin::DrmPipeline::setSyncMode later on.
Comment 18 Ash Blake 2021-09-27 15:08:41 UTC
(In reply to Ash Blake from comment #17)

Never mind, I totally forgot that a new allocation can land at the same address after something there has been deleted, so these multiple deletions may be normal.
I'll redo it, also tracking construction this time.
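
A minimal sketch of why tracking construction alongside destruction helps here: if only destructor calls are logged, several objects that lived at the same address one after another look exactly like one object deleted several times. `LifetimeTracker` and its methods are hypothetical names made up for this illustration; they are not part of KWin or gdb, and the real investigation was done with breakpoints rather than with code like this.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical helper for illustration only; not part of KWin.
// It records constructor and destructor events keyed by object address.
class LifetimeTracker
{
public:
    // Record a constructor call for the object at `address`.
    void constructed(std::uintptr_t address)
    {
        // An allocator never hands out an address that is still in use.
        assert(!isAlive(address));
        State &state = m_states[address];
        state.alive = true;
        ++state.generation;
    }

    // Record a destructor call. Returns false when no live object exists at
    // `address`, which is what a genuine double deletion would look like.
    bool destroyed(std::uintptr_t address)
    {
        auto it = m_states.find(address);
        if (it == m_states.end() || !it->second.alive) {
            return false;
        }
        it->second.alive = false;
        return true;
    }

    // True while an object constructed at `address` has not been destroyed.
    bool isAlive(std::uintptr_t address) const
    {
        auto it = m_states.find(address);
        return it != m_states.end() && it->second.alive;
    }

    // Number of distinct objects that have lived at `address` so far.
    int generation(std::uintptr_t address) const
    {
        auto it = m_states.find(address);
        return it == m_states.end() ? 0 : it->second.generation;
    }

private:
    struct State {
        bool alive = false;
        int generation = 0;
    };
    std::map<std::uintptr_t, State> m_states;
};
```

With this bookkeeping, repeated create/delete pairs at one address show up as increasing generations, while a real double deletion makes `destroyed()` return false. A pointer that is still in use while `isAlive()` reports false for its address would be the dangling case.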
Comment 19 Ash Blake 2021-09-27 15:57:20 UTC
(In reply to Ash Blake from comment #18)
> and these multiple deletions may be normal

Unfortunately, something is still wrong, even though it is not a multiple deletion.

Right before the crash, a pipeline that was involved in it got created and then deleted exactly three times in a row, so this is the same situation as previously but it turns out the destruction behaviour is actually normal.

updateOutputs should not have received a deleted pipeline from findWorkingCombination though, so something is wrong here.


Construction:
$28 = (KWin::DrmPipeline * const) 0x56548a2aebd0
#0  KWin::DrmPipeline::DrmPipeline(KWin::DrmGpu*, KWin::DrmConnector*, KWin::DrmCrtc*, KWin::DrmPlane*) (this=this@entry=0x56548a2aebd0, gpu=0x565489679430, conn=0x565489e91be0, crtc=crtc@entry=0x5654896e4eb0, primaryPlane=primaryPlane@entry=0x5654896be1b0) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_pipeline.cpp:37
#1  0x00007f0549d5e49c in operator()(KWin::DrmCrtc*, KWin::DrmPlane*) const (__closure=__closure@entry=0x7ffe8d5e8660, crtc=0x5654896e4eb0, primaryPlane=0x5654896be1b0) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_gpu.cpp:364

Destruction:
$29 = (KWin::DrmPipeline * const) 0x56548a2aebd0
#0  KWin::DrmPipeline::~DrmPipeline() (this=0x56548a2aebd0, __in_chrg=<optimized out>) at /usr/include/c++/11.1.0/bits/atomic_base.h:479
#1  0x00007f0549d5e99e in operator()(KWin::DrmCrtc*, KWin::DrmPlane*) const (__closure=__closure@entry=0x7ffe8d5e8660, crtc=<optimized out>, primaryPlane=0x7ffe8d5e85a8) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_gpu.cpp:373



Relevant lines from the segfault backtrace, with yet another exact point of crash:
#0  QSharedPointer<KWin::DrmBuffer>::deref(QtSharedPointer::ExternalRefCountData*) (dd=0x565400000002) at /usr/include/qt/QtCore/qsharedpointer_impl.h:454
#1  QSharedPointer<KWin::DrmBuffer>::deref() (this=<synthetic pointer>) at /usr/include/qt/QtCore/qsharedpointer_impl.h:453
#2  QSharedPointer<KWin::DrmBuffer>::~QSharedPointer() (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/qt/QtCore/qsharedpointer_impl.h:310
#3  QSharedPointer<KWin::DrmBuffer>::operator=(QSharedPointer<KWin::DrmBuffer> const&) (other=<optimized out>, other=..., this=0x56548a2aebf8) at /usr/include/qt/QtCore/qsharedpointer_impl.h:333
#4  KWin::DrmPipeline::present(QSharedPointer<KWin::DrmBuffer> const&) (this=0x56548a2aebd0, buffer=...) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_pipeline.cpp:81
#5  0x00007f0549d55bb8 in KWin::DrmOutput::present(QSharedPointer<KWin::DrmBuffer> const&, QRegion) (this=this@entry=0x565489e97d50, buffer=..., damagedRegion=...) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_output.cpp:394
Comment 20 Bug Janitor Service 2021-09-27 16:23:41 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/1466
Comment 21 Zamundaaa 2021-09-27 16:28:52 UTC
I think I found the issue. If DrmGpu::findWorkingCombination doesn't find any functional combinations then the Pipelines in DrmOutput will be deleted but neither set to nullptr nor reverted back to what they were originally.
The patch should "fix" that but I'd still like to find the actual source of the problem. 

You likely have some lines with something like "Atomic test for CommitMode::Commit failed! Invalid Argument" and a bunch of numbers below it in your ~/.local/share/sddm/wayland-session.log when KWin crashes. Could you have a look at what the exact error messages are?
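
The failure mode described above, and the "keep the old configuration on failure" idea, can be sketched in a few lines. This is a simplified model under stated assumptions, not KWin's implementation: `Pipeline`, `Output`, and the two free functions are hypothetical stand-ins that only borrow names from the discussion, and the `atomicTestPasses` flag simulates whether any combination survives the atomic test.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not KWin's real classes.
struct Pipeline {
    int crtcId;
};

struct Output {
    // Owning pointer: the pipeline lives exactly as long as the output holds it.
    std::unique_ptr<Pipeline> pipeline;
};

// Simulated search for a working combination, one pipeline per output.
// Returns std::nullopt when no combination passes the (simulated) test.
std::optional<std::vector<std::unique_ptr<Pipeline>>>
findWorkingCombination(std::size_t outputCount, bool atomicTestPasses)
{
    if (!atomicTestPasses) {
        return std::nullopt;
    }
    std::vector<std::unique_ptr<Pipeline>> result;
    for (std::size_t i = 0; i < outputCount; ++i) {
        result.push_back(std::make_unique<Pipeline>(Pipeline{static_cast<int>(i) + 100}));
    }
    return result;
}

// Swap in new pipelines only if the search succeeded. On failure the old
// pipelines stay in place, so no output is left pointing at a deleted one.
bool updateOutputs(std::vector<Output> &outputs, bool atomicTestPasses)
{
    auto candidate = findWorkingCombination(outputs.size(), atomicTestPasses);
    if (!candidate) {
        return false; // keep the previous configuration
    }
    assert(candidate->size() == outputs.size());
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        outputs[i].pipeline = std::move((*candidate)[i]);
    }
    return true;
}
```

The design choice in this sketch is to build the candidate set completely before touching any output, so a failed search has nothing to undo. Calling `updateOutputs(outputs, false)` after a successful update leaves every `outputs[i].pipeline` pointing at the same live object as before.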
Comment 22 Ash Blake 2021-09-27 16:56:52 UTC
Created attachment 141955 [details]
KWin DRM log messages

(In reply to Zamundaaa from comment #21)
> You likely have some lines with something like "Atomic test for
> CommitMode::Commit failed! Invalid Argument" and a bunch of numbers below it
> in your ~/.local/share/sddm/wayland-session.log when KWin crashes. Could you
> have a look at what the exact error messages are?

For some reason they weren't in the log anymore, so I just ran in a TTY:
$ (QT_LOGGING_RULES="kwin_wayland_drm.*=true" kwin_wayland 2>&1) > kwin_wayland_drm.log

Are these fine or should I get logs from the full Plasma session?
Comment 23 Ash Blake 2021-09-27 17:03:26 UTC
Created attachment 141956 [details]
KWin DRM log from another machine (AMD GPU)
Comment 24 Aleix Pol 2021-09-27 18:42:38 UTC
Git commit a668c7018dc61b0e0b77e19657d735ec743b5676 by Aleix Pol Gonzalez, on behalf of Aleix Pol.
Committed on 27/09/2021 at 18:41.
Pushed by apol into branch 'master'.

Address regression in VT switching code
Related: bug 442852

M  +1    -1    lookandfeel/contents/components/SessionManagementScreen.qml
M  +1    -0    lookandfeel/contents/components/UserDelegate.qml
M  +1    -0    lookandfeel/contents/components/UserList.qml
M  +6    -1    lookandfeel/contents/lockscreen/LockScreenUi.qml

https://invent.kde.org/plasma/plasma-workspace/commit/a668c7018dc61b0e0b77e19657d735ec743b5676
Comment 25 Aleix Pol 2021-09-27 18:43:03 UTC
Git commit 3201576c3fc456f066ff4ead2acd2d64c14e2e9c by Aleix Pol Gonzalez, on behalf of Aleix Pol.
Committed on 27/09/2021 at 18:42.
Pushed by apol into branch 'Plasma/5.23'.

Address regression in VT switching code
Related: bug 442852


(cherry picked from commit a668c7018dc61b0e0b77e19657d735ec743b5676)

M  +1    -1    lookandfeel/contents/components/SessionManagementScreen.qml
M  +1    -0    lookandfeel/contents/components/UserDelegate.qml
M  +1    -0    lookandfeel/contents/components/UserList.qml
M  +6    -1    lookandfeel/contents/lockscreen/LockScreenUi.qml

https://invent.kde.org/plasma/plasma-workspace/commit/3201576c3fc456f066ff4ead2acd2d64c14e2e9c
Comment 26 Ash Blake 2021-09-27 20:32:08 UTC
Created attachment 141964 [details]
KWin DRM log messages from full Plasma session

I got some of these errors in my wayland-session.log now.
They're different, all of them are 'permission denied'
Comment 27 Ash Blake 2021-09-28 13:51:04 UTC
(In reply to Zamundaaa from comment #21)
> The patch should "fix" that but I'd still like to find the actual source of
> the problem. 

The stability has definitely improved with that patch, but some crashes still happened, though far less often than before.

Now I also applied the patches from MR 1467 and I can't trigger a crash anymore, and I don't see "DrmGpu::findWorkingCombination failed to find any functional combinations!" anymore in the logs.

Looks like these two merge requests resolve this bug.
Comment 28 Nate Graham 2021-09-28 16:26:37 UTC
Git commit de674e087a1910f30dba9f2a3b184071ef86be1c by Nate Graham, on behalf of Xaver Hugl.
Committed on 28/09/2021 at 16:23.
Pushed by ngraham into branch 'master'.

platforms/drm: make failure of findWorkingCombination less severe

While findWorkingCombination should never fail, in the case it does
KWin should not crash. To achieve that simply restore the old config
in case of failure.

M  +15   -3    src/plugins/platforms/drm/drm_gpu.cpp

https://invent.kde.org/plasma/kwin/commit/de674e087a1910f30dba9f2a3b184071ef86be1c
Comment 29 Nate Graham 2021-09-28 16:47:23 UTC
Git commit f18bf757928ec41e0300d61d17a68c7d9033816e by Nate Graham, on behalf of Xaver Hugl.
Committed on 28/09/2021 at 16:46.
Pushed by ngraham into branch 'Plasma/5.23'.

platforms/drm: make failure of findWorkingCombination less severe

While findWorkingCombination should never fail, in the case it does
KWin should not crash. To achieve that simply restore the old config
in case of failure.
(cherry picked from commit de674e087a1910f30dba9f2a3b184071ef86be1c)

M  +15   -3    src/plugins/platforms/drm/drm_gpu.cpp

https://invent.kde.org/plasma/kwin/commit/f18bf757928ec41e0300d61d17a68c7d9033816e
Comment 30 Nate Graham 2021-09-28 17:33:02 UTC
Git commit eb1daa0aadcbae3f4be8ca7450f648040a52013c by Nate Graham, on behalf of Vlad Zahorodnii.
Committed on 28/09/2021 at 17:31.
Pushed by ngraham into branch 'master'.

platforms/drm: Avoid re-using blobs

Blobs are not reference counted if used by other drm master, if kwin
re-uses a deleted blob in an atomic commit, it will fail. For example,
on my computer, this happens when kwin starts after xorg.

Besides that, kwin may try to destroy blobs that it doesn't own, which
is not fatal but it's strange to do so.
Related: bug 442603

M  +12   -56   src/plugins/platforms/drm/drm_object.cpp
M  +1    -3    src/plugins/platforms/drm/drm_object.h
M  +8    -5    src/plugins/platforms/drm/drm_object_connector.cpp
M  +20   -17   src/plugins/platforms/drm/drm_object_plane.cpp

https://invent.kde.org/plasma/kwin/commit/eb1daa0aadcbae3f4be8ca7450f648040a52013c
Comment 31 Nate Graham 2021-09-28 17:39:46 UTC
This is fixed by the combination of those commits! Thanks Vlad and Xaver!
Comment 32 Zamundaaa 2021-09-28 18:07:45 UTC
Git commit 6e3c3936dc3924105c49f8e0b41bf789883d173b by Xaver Hugl, on behalf of Vlad Zahorodnii.
Committed on 28/09/2021 at 18:05.
Pushed by zamundaaa into branch 'Plasma/5.23'.

platforms/drm: Avoid re-using blobs

Blobs are not reference counted if used by other drm master, if kwin
re-uses a deleted blob in an atomic commit, it will fail. For example,
on my computer, this happens when kwin starts after xorg.

Besides that, kwin may try to destroy blobs that it doesn't own, which
is not fatal but it's strange to do so.
Related: bug 442603

M  +12   -56   src/plugins/platforms/drm/drm_object.cpp
M  +1    -3    src/plugins/platforms/drm/drm_object.h
M  +8    -5    src/plugins/platforms/drm/drm_object_connector.cpp
M  +20   -17   src/plugins/platforms/drm/drm_object_plane.cpp

https://invent.kde.org/plasma/kwin/commit/6e3c3936dc3924105c49f8e0b41bf789883d173b