SUMMARY
Switching users causes the screen to go black. Had to do a hard reboot.
STEPS TO REPRODUCE
1. Open Application Launcher
2. Select Switch User
OBSERVED RESULT
Screen goes black and freezes.
EXPECTED RESULT
Should be able to switch users normally.
SOFTWARE/OS VERSIONS
Operating System: KDE neon Unstable Edition
KDE Plasma Version: 5.22.80
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.3
Kernel Version: 5.8.0-59-generic (64-bit)
Graphics Platform: Wayland
Processors: 4 × Intel® Core™ i7-4510U CPU @ 2.00GHz
Memory: 15.5 GiB of RAM
Graphics Processor: Mesa DRI Intel® HD Graphics 4400
ADDITIONAL INFORMATION
Not sure if being Wayland has anything to do with it.
I *think* this may be a known issue on Wayland. Does it work if you switch users from X11?
Yeah, known issue. We can use this to track it. Added to https://community.kde.org/Plasma/Wayland_Showstoppers
>Does work if you switch users from X11?
Marking as needs info. Also, let's avoid tagging Wayland regressions as VHI; it's not our default setup.
(In reply to David Edmundson from comment #3) > >Does work if you switch users from X11? > > Marking as needs info. Yeah, user switching works for me on X11. On Wayland, I get to the login screen, but logging into the other user fails; after I enter the other user's password and click the login button I get kicked back to the lock screen of my existing session. When I unlock, no apps are running, as if KWin crashed in the background. However I do not see any coredumpctl logs about a KWin crash. I do have a ksmserver crash though. The only relevant part of the backtrace is this: #8 0x000000000040a621 in main (argc=<optimized out>, argv=0x7ffdc1355bb8) at /home/nate/kde/src/plasma-workspace/ksmserver/main.cpp:214 > Also, lets avoid tagging wayland regressions as VHI, it's not our default > setup. OK.
In fact ksmserver doesn't even want to launch at all, even manually:
~/kde/usr/bin/ksmserver
org.kde.kf5.ksmserver: Cannot connect to the X server
qt.qpa.xcb: could not connect to display :1
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, xcb.
Aborted (core dumped)
Actually maybe it logged into the other user after all, as `reboot` tells me another user is logged in:
$ reboot
User konqi is logged in on tty4.
Please retry operation after closing inhibitors and logging out other users.
Alternatively, ignore inhibitors and users with 'systemctl reboot -i'.
Maybe the bug is that it didn't succeed in logging *me* into that user?
(In reply to Nate Graham from comment #4)
> On Wayland, I get to the login screen, but logging into the other user
> fails; after I enter the other user's password and click the login button I
> get kicked back to the lock screen of my existing session.
This works for me, the new user's session starts every time. However, either the old user's session or the new user's session will crash when switching. I have kwin_wayland coredumps from both users and I'll upload the backtraces soon.
(In reply to Nate Graham from comment #6)
> Maybe the bug is that it didn't succeed in logging *me* into that user?
(In reply to Nate Graham from comment #5)
> ~/kde/usr/bin/ksmserver
It probably failed to log in as the other user because you tried to run the same development KDE session, and the other user wouldn't be able to read and execute anything in your home directory. startplasma-wayland won't run, so the new user will only be running the systemd user daemon and whatever stuff it started. Give konqi execute access to your home directory so he can cd into it, and make konqi's group the group owner of $HOME/kde so he can read and execute things in there:
$ setfacl -m u:konqi:x $HOME
$ chown -R $USER:konqi $HOME/kde
Created attachment 141922 [details] Backtrace (git master) The old user's and new user's backtraces look the same.
Created attachment 141923 [details] Screen recording showing the problem
A possibly relevant merge request was started @ https://invent.kde.org/plasma/plasma-workspace/-/merge_requests/1082
Created attachment 141939 [details] Another backtrace, crash in DrmPipeline::populateAtomicValues Another crash on VT switch. obj pointed to an unreadable location.
(In reply to Ash Blake from comment #11)
> Created attachment 141939 [details]
The contents of m_gpu look odd - cursor size of 262147x458757, insanely high file descriptor and some suspicious looking addresses. It seems like DrmGpu got destroyed, but it still got used somehow.
(gdb) p *m_gpu
$11 = {
  ...
  m_backend = 0x7000700060007,
  m_eglBackend = { wp = { d = 0x7000600070005, value = 0x6000700020007 } },
  m_devNode = { d = 0x3000100020003 },
  m_cursorSize = { wd = 262147, ht = 458757 },
  m_fd = 262145,
  m_deviceId = 1970354902204420,
  m_atomicModeSetting = 6,
  m_useEglStreams = false,
  m_gbmDevice = 0x7000100050007,
  m_eglDisplay = 0x700070003,
  m_presentationClock = 327687,
  m_socketNotifier = 0x7000700070007,
  m_addFB2ModifiersSupported = 5,
  m_planes = { d = 0x7000100030007 },
  m_crtcs = { d = 0x556bab5abab0 },
  m_connectors = { d = 0x556bab6581b0 },
  m_pipelines = { d = 0x556bab56b5b0 },
  m_drmOutputs = { d = 0x556babd27990 },
  m_outputs = { d = 0x556babe865c0 },
  m_leaseOutputs = { d = 0x7f08040086f0 },
  m_leaseDevice = 0x556bab56b330
}
And with the crash from the getProp call in KWin::DrmPipeline::setSyncMode, m_crtc has been a null pointer in at least two backtraces:
(gdb) p *m_pipeline
$2 = {
  m_output = 0x5592ffd79a3b,
  m_gpu = 0x5597a695a010,
  m_connector = 0x18,
  m_crtc = 0x0,
  m_primaryPlane = 0x0,
  m_primaryBuffer = { value = 0x3ff0000000000000, d = 0x3ff0000000000000 },
  m_oldTestBuffer = { value = 0x408e000000000000, d = 0x0 },
  m_legacyNeedsModeset = false,
  m_cursor = {
    pos = { xp = 0, yp = 1072693248 },
    hotspot = { xp = 0, yp = 1083047936 },
    buffer = { value = 0x403d000000000000, d = 0x408e080000000000 },
    dirtyBo = false,
    dirtyPos = false
  },
  m_allObjects = { d = 0x0 },
  m_formats = { d = 0x403d000000000000 },
  m_lastFlags = 0
}
Sounds a lot like https://bugs.kde.org/show_bug.cgi?id=442677
(In reply to Zamundaaa from comment #14) > Sounds a lot like https://bugs.kde.org/show_bug.cgi?id=442677 It really does, but I already have the commit that fixed that bug in my KWin build. Seems like there's some other problem that causes the same crash on VT switches, and there's also this weird crash in KWin::DrmObject::getProp that happens sometimes too. If I notice crashes in some other places, I'll upload those backtraces too. The getProp crash case is particularly weird. At a quick glance it seems that the crtc in a pipeline could not suddenly end up null under normal circumstances, as there doesn't seem to be a method that changes a DrmPipeline's m_crtc after initialization. Maybe the memory for it was freed and used by something else, but something still used the pointer to the deleted pipeline? I guess a situation like this could cause all kinds of crashes in various places. I'll try setting up breakpoints on destructors of various drm-related objects and keeping track of the objects' addresses to compare them after a crash happens to check if that is the case.
This crash is also quite unpredictable: sometimes I can switch many times between two sessions with no crash, and sometimes it crashes on the first try. Usually, once the crash has occurred in one of the sessions, it keeps recurring whenever I switch away from it and back.
Created attachment 141950 [details] Debugging session with both good and bad VT switches
This is an annotated log from the debugging session with backtraces of each pipeline destruction, including the addresses of said pipelines. For convenience, you can also view it with basic formatting here: https://gist.github.com/telepathine/01bd060e5df3ece55f6b46bb63a78078
It features both the successful case and the failed one, which differ notably in how pipelines are destroyed: in the failed case one pipeline gets deleted three times, and that address then ends up being used again because, for some reason, some DrmOutput still holds it. This leads to a segfault originating from KWin::DrmPipeline::setSyncMode later on.
(In reply to Ash Blake from comment #17)
Never mind, I totally forgot that a new allocation can land at the same address after something there is deleted, so these multiple deletions may be normal. I'll redo it, also tracking construction this time.
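The point about address reuse can be made concrete with a small sketch. It is not KWin code: `LifetimeLog` is a hypothetical helper that replays construction and destruction events (for example, transcribed from gdb breakpoints) keyed by address. With destructions alone, three deletions at one address look like a triple delete; once constructions are recorded as well, the same sequence is seen as three separate objects that happened to reuse the address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of KWin: replays constructor/destructor
// events keyed by object address to tell address reuse from double deletes.
class LifetimeLog {
public:
    // Record that an object was constructed at addr.
    void constructed(std::uint64_t addr) {
        if (m_live[addr]) {
            m_problems.push_back("constructed over a live object");
        }
        m_live[addr] = true;
        ++m_generation[addr];  // each construction is a new object
    }

    // Record that the object at addr was destroyed.
    void destroyed(std::uint64_t addr) {
        auto it = m_live.find(addr);
        if (it == m_live.end() || !it->second) {
            m_problems.push_back("destroyed without a live object");
        } else {
            it->second = false;
        }
    }

    // True if the most recent event at addr was a construction.
    bool isLive(std::uint64_t addr) const {
        auto it = m_live.find(addr);
        return it != m_live.end() && it->second;
    }

    // How many distinct objects have occupied addr so far.
    int generation(std::uint64_t addr) const {
        auto it = m_generation.find(addr);
        return it == m_generation.end() ? 0 : it->second;
    }

    const std::vector<std::string> &problems() const { return m_problems; }

private:
    std::map<std::uint64_t, bool> m_live;
    std::map<std::uint64_t, int> m_generation;
    std::vector<std::string> m_problems;
};
```

Feeding it three construct/destroy pairs at the same address reports no problems and a generation count of 3; a pointer that is still in use while `isLive()` is false for its address would be the real red flag.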
(In reply to Ash Blake from comment #18)
> and these multiple deletions may be normal
Unfortunately, there is something wrong anyways even though it is not multiple deletion. Right before the crash, a pipeline that was involved in it got created and then deleted exactly three times in a row, so this is the same situation as previously but it turns out the destruction behaviour is actually normal. updateOutputs should not have received a deleted pipeline from findWorkingCombination though, so something is wrong here.
Construction:
$28 = (KWin::DrmPipeline * const) 0x56548a2aebd0
#0 KWin::DrmPipeline::DrmPipeline(KWin::DrmGpu*, KWin::DrmConnector*, KWin::DrmCrtc*, KWin::DrmPlane*) (this=this@entry=0x56548a2aebd0, gpu=0x565489679430, conn=0x565489e91be0, crtc=crtc@entry=0x5654896e4eb0, primaryPlane=primaryPlane@entry=0x5654896be1b0) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_pipeline.cpp:37
#1 0x00007f0549d5e49c in operator()(KWin::DrmCrtc*, KWin::DrmPlane*) const (__closure=__closure@entry=0x7ffe8d5e8660, crtc=0x5654896e4eb0, primaryPlane=0x5654896be1b0) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_gpu.cpp:364
Destruction:
$29 = (KWin::DrmPipeline * const) 0x56548a2aebd0
#0 KWin::DrmPipeline::~DrmPipeline() (this=0x56548a2aebd0, __in_chrg=<optimized out>) at /usr/include/c++/11.1.0/bits/atomic_base.h:479
#1 0x00007f0549d5e99e in operator()(KWin::DrmCrtc*, KWin::DrmPlane*) const (__closure=__closure@entry=0x7ffe8d5e8660, crtc=<optimized out>, primaryPlane=0x7ffe8d5e85a8) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_gpu.cpp:373
Relevant lines from the segfault backtrace, with yet another exact point of crash:
#0 QSharedPointer<KWin::DrmBuffer>::deref(QtSharedPointer::ExternalRefCountData*) (dd=0x565400000002) at /usr/include/qt/QtCore/qsharedpointer_impl.h:454
#1 QSharedPointer<KWin::DrmBuffer>::deref() (this=<synthetic pointer>) at /usr/include/qt/QtCore/qsharedpointer_impl.h:453
#2 QSharedPointer<KWin::DrmBuffer>::~QSharedPointer() (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/qt/QtCore/qsharedpointer_impl.h:310
#3 QSharedPointer<KWin::DrmBuffer>::operator=(QSharedPointer<KWin::DrmBuffer> const&) (other=<optimized out>, other=..., this=0x56548a2aebf8) at /usr/include/qt/QtCore/qsharedpointer_impl.h:333
#4 KWin::DrmPipeline::present(QSharedPointer<KWin::DrmBuffer> const&) (this=0x56548a2aebd0, buffer=...) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_pipeline.cpp:81
#5 0x00007f0549d55bb8 in KWin::DrmOutput::present(QSharedPointer<KWin::DrmBuffer> const&, QRegion) (this=this@entry=0x565489e97d50, buffer=..., damagedRegion=...) at /home/ash/kde/src/kwin/src/plugins/platforms/drm/drm_output.cpp:394
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/1466
I think I found the issue. If DrmGpu::findWorkingCombination doesn't find any functional combination, the pipelines in DrmOutput get deleted, but the pointers to them are neither set to nullptr nor reverted to what they were originally. The patch should "fix" that, but I'd still like to find the actual source of the problem. You likely have some lines with something like "Atomic test for CommitMode::Commit failed! Invalid Argument" and a bunch of numbers below it in your ~/.local/share/sddm/wayland-session.log when KWin crashes. Could you have a look at what the exact error messages are?
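A minimal sketch of the failure mode and of the "keep the old config on failure" idea, under loudly stated assumptions: `Pipeline`, `Output`, `Gpu`, `PipelineList` and the `Finder` callback are simplified stand-ins invented for illustration and do not mirror the real KWin classes or the actual patch. The only point is the ordering: the old pipelines are released only after a working replacement exists, so the non-owning pointers held by outputs never dangle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-ins, NOT the real KWin types.
struct Pipeline {
    int crtc;
};

struct Output {
    Pipeline *pipeline = nullptr;  // non-owning pointer into the GPU's list
};

using PipelineList = std::vector<std::unique_ptr<Pipeline>>;

class Gpu {
public:
    // Stand-in for the combination search. In this sketch a result whose
    // size does not match the number of outputs means "nothing worked".
    using Finder = std::function<PipelineList()>;

    // Returns true if the outputs were switched to new pipelines.
    bool updateOutputs(std::vector<Output> &outputs, const Finder &find) {
        PipelineList candidate = find();
        if (candidate.size() != outputs.size()) {
            // Failure: leave m_pipelines untouched so every
            // Output::pipeline still points at a live object.
            return false;
        }
        for (std::size_t i = 0; i < outputs.size(); ++i) {
            outputs[i].pipeline = candidate[i].get();
        }
        // The old pipelines are destroyed only here, after every output
        // has already been repointed at its replacement.
        m_pipelines = std::move(candidate);
        return true;
    }

private:
    PipelineList m_pipelines;  // owns the pipelines currently in use
};
```

If the old list were destroyed before the search ran, a failed search would leave each `Output` holding a pointer to freed memory, which matches the garbage member values and the crash in `present()` shown in the backtraces above.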
Created attachment 141955 [details] KWin DRM log messages
(In reply to Zamundaaa from comment #21)
> You likely have some lines with something like "Atomic test for
> CommitMode::Commit failed! Invalid Argument" and a bunch of numbers below it
> in your ~/.local/share/sddm/wayland-session.log when KWin crashes. Could you
> have a look at what the exact error messages are?
For some reason they weren't in the log anymore, so I just ran this in a TTY:
$ (QT_LOGGING_RULES="kwin_wayland_drm.*=true" kwin_wayland 2>&1) > kwin_wayland_drm.log
Are these fine or should I get logs from the full Plasma session?
Created attachment 141956 [details] KWin DRM log from another machine (AMD GPU)
Git commit a668c7018dc61b0e0b77e19657d735ec743b5676 by Aleix Pol Gonzalez, on behalf of Aleix Pol. Committed on 27/09/2021 at 18:41. Pushed by apol into branch 'master'. Address regression in VT switching code Related: bug 442852 M +1 -1 lookandfeel/contents/components/SessionManagementScreen.qml M +1 -0 lookandfeel/contents/components/UserDelegate.qml M +1 -0 lookandfeel/contents/components/UserList.qml M +6 -1 lookandfeel/contents/lockscreen/LockScreenUi.qml https://invent.kde.org/plasma/plasma-workspace/commit/a668c7018dc61b0e0b77e19657d735ec743b5676
Git commit 3201576c3fc456f066ff4ead2acd2d64c14e2e9c by Aleix Pol Gonzalez, on behalf of Aleix Pol. Committed on 27/09/2021 at 18:42. Pushed by apol into branch 'Plasma/5.23'. Address regression in VT switching code Related: bug 442852 (cherry picked from commit a668c7018dc61b0e0b77e19657d735ec743b5676) M +1 -1 lookandfeel/contents/components/SessionManagementScreen.qml M +1 -0 lookandfeel/contents/components/UserDelegate.qml M +1 -0 lookandfeel/contents/components/UserList.qml M +6 -1 lookandfeel/contents/lockscreen/LockScreenUi.qml https://invent.kde.org/plasma/plasma-workspace/commit/3201576c3fc456f066ff4ead2acd2d64c14e2e9c
Created attachment 141964 [details] KWin DRM log messages from full Plasma session
I got some of these errors in my wayland-session.log now. They're different, though: all of them are 'permission denied'.
(In reply to Zamundaaa from comment #21) > The patch should "fix" that but I'd still like to find the actual source of > the problem. The stability has definitely improved with that patch, but some crashes still happened, way less often than before. Now I also applied the patches from MR 1467 and I can't trigger a crash anymore, and I don't see "DrmGpu::findWorkingCombination failed to find any functional combinations!" anymore in the logs. Looks like these two merge requests resolve this bug.
Git commit de674e087a1910f30dba9f2a3b184071ef86be1c by Nate Graham, on behalf of Xaver Hugl. Committed on 28/09/2021 at 16:23. Pushed by ngraham into branch 'master'. platforms/drm: make failure of findWorkingCombination less severe While findWorkingCombination should never fail, in the case it does KWin should not crash. To achieve that simply restore the old config in case of failure. M +15 -3 src/plugins/platforms/drm/drm_gpu.cpp https://invent.kde.org/plasma/kwin/commit/de674e087a1910f30dba9f2a3b184071ef86be1c
Git commit f18bf757928ec41e0300d61d17a68c7d9033816e by Nate Graham, on behalf of Xaver Hugl. Committed on 28/09/2021 at 16:46. Pushed by ngraham into branch 'Plasma/5.23'. platforms/drm: make failure of findWorkingCombination less severe While findWorkingCombination should never fail, in the case it does KWin should not crash. To achieve that simply restore the old config in case of failure. (cherry picked from commit de674e087a1910f30dba9f2a3b184071ef86be1c) M +15 -3 src/plugins/platforms/drm/drm_gpu.cpp https://invent.kde.org/plasma/kwin/commit/f18bf757928ec41e0300d61d17a68c7d9033816e
Git commit eb1daa0aadcbae3f4be8ca7450f648040a52013c by Nate Graham, on behalf of Vlad Zahorodnii. Committed on 28/09/2021 at 17:31. Pushed by ngraham into branch 'master'. platforms/drm: Avoid re-using blobs Blobs are not reference counted if used by other drm master, if kwin re-uses a deleted blob in an atomic commit, it will fail. For example, on my computer, this happens when kwin starts after xorg. Besides that, kwin may try to destroy blobs that it doesn't own, which is not fatal but it's strange to do so. Related: bug 442603 M +12 -56 src/plugins/platforms/drm/drm_object.cpp M +1 -3 src/plugins/platforms/drm/drm_object.h M +8 -5 src/plugins/platforms/drm/drm_object_connector.cpp M +20 -17 src/plugins/platforms/drm/drm_object_plane.cpp https://invent.kde.org/plasma/kwin/commit/eb1daa0aadcbae3f4be8ca7450f648040a52013c
This is fixed by the combination of those commits! Thanks Vlad and Xaver!
Git commit 6e3c3936dc3924105c49f8e0b41bf789883d173b by Xaver Hugl, on behalf of Vlad Zahorodnii. Committed on 28/09/2021 at 18:05. Pushed by zamundaaa into branch 'Plasma/5.23'. platforms/drm: Avoid re-using blobs Blobs are not reference counted if used by other drm master, if kwin re-uses a deleted blob in an atomic commit, it will fail. For example, on my computer, this happens when kwin starts after xorg. Besides that, kwin may try to destroy blobs that it doesn't own, which is not fatal but it's strange to do so. Related: bug 442603 M +12 -56 src/plugins/platforms/drm/drm_object.cpp M +1 -3 src/plugins/platforms/drm/drm_object.h M +8 -5 src/plugins/platforms/drm/drm_object_connector.cpp M +20 -17 src/plugins/platforms/drm/drm_object_plane.cpp https://invent.kde.org/plasma/kwin/commit/6e3c3936dc3924105c49f8e0b41bf789883d173b
The problem is intermittent on openSUSE Tumbleweed KDE Wayland.
Operating System: openSUSE Tumbleweed 20230619
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.107.0
Qt Version: 5.15.10
Kernel Version: 6.3.7-1-default (64-bit)
Graphics Platform: Wayland
Processors: 4 × Intel® Core™ i5-3570K CPU @ 3.40GHz
Memory: 15.5 GiB of RAM
Graphics Processor: PITCAIRN
1. Log into a Wayland user account
2. Switch Users
3. Select another User Account from the SDDM screen to log into
4. Switch Users
RESULT: *Sometimes* it works, *sometimes* you get stuck at the SDDM screen. In the latter case, I have used the CTRL+Backspace key combo to force close the session and re-open the SDDM screen again. In some cases a hard reset has been necessary.
This is a fairly old bug report and the code has changed a lot since it was reported. There's a very good chance the issue you're experiencing is caused by something else, even if the outward symptoms look and feel the same. Can you please submit a new bug report? Thank you!
(In reply to Nate Graham from comment #34) Was a report ever filed about that? I'm pretty sure I just experienced the same thing.