Bug 438789 - kwin_wayland crashed in KWin::DrmGpu::updateOutputs while un/re-plugging monitor
Summary: kwin_wayland crashed in KWin::DrmGpu::updateOutputs while un/re-plugging monitor
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: platform-drm
Version: 5.22.0
Platform: openSUSE Linux
Importance: NOR crash
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: wayland
Duplicates: 439208
Depends on:
Blocks:
 
Reported: 2021-06-17 05:58 UTC by Jiri Slaby
Modified: 2021-07-12 13:07 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In: 5.22.4


Attachments
~/.local/share/sddm/wayland-session.log (23.69 KB, text/plain)
2021-06-20 06:40 UTC, Jiri Slaby
~/.local/share/sddm/wayland-session.log (23.69 KB, application/x-xz)
2021-07-07 05:38 UTC, Jiri Slaby
wayland-session.log (245.84 KB, text/plain)
2021-07-08 13:26 UTC, Markus Knetschke

Description Jiri Slaby 2021-06-17 05:58:58 UTC
SUMMARY
When I unplugged and replugged an external monitor, the following crash occurred. I don't know whether the unplug or the replug caused it, as I did both within about a second.

> #0  qDeleteAll<KWin::DrmConnector* const*>(KWin::DrmConnector* const*, KWin::DrmConnector* const*) (end=0x558831b810d0, begin=0x558831b810c8) at /usr/include/qt5/QtCore/qalgorithms.h:320
> #1  qDeleteAll<QVector<KWin::DrmConnector*> >(QVector<KWin::DrmConnector*> const&) (c=<optimized out>, c=...) at /usr/include/qt5/QtCore/qalgorithms.h:328
> #2  KWin::DrmGpu::updateOutputs() [clone .isra.0] (this=<optimized out>) at /usr/src/debug/kwin5-5.22.0-2.1.x86_64/src/plugins/platforms/drm/drm_gpu.cpp:308
> #3  0x00007f3f56df8c2e in KWin::DrmBackend::updateOutputs() (this=0x5588308190a0) at /usr/src/debug/kwin5-5.22.0-2.1.x86_64/src/plugins/platforms/drm/drm_backend.cpp:328
> #4  KWin::DrmBackend::updateOutputs() (this=this@entry=0x5588308190a0) at /usr/src/debug/kwin5-5.22.0-2.1.x86_64/src/plugins/platforms/drm/drm_backend.cpp:320
> #5  0x00007f3f56df9a1b in KWin::DrmBackend::handleUdevEvent() (this=0x5588308190a0) at /usr/src/debug/kwin5-5.22.0-2.1.x86_64/src/plugins/platforms/drm/drm_backend.cpp:230
> #6  0x00007f3f5d3f40f3 in QtPrivate::QSlotObjectBase::call(QObject*, void**) (a=0x7fff5456c7b0, r=0x5588308190a0, this=0x55883083ca90) at ../../include/QtCore/../../src/corelib/kernel/qobjectdefs_impl.h:398
> #7  doActivate<false>(QObject*, int, void**) (sender=0x558830839290, signal_index=3, argv=0x7fff5456c7b0) at kernel/qobject.cpp:3886
> #8  0x00007f3f5d3ed5bf in QMetaObject::activate(QObject*, QMetaObject const*, int, void**) (sender=sender@entry=0x558830839290, m=m@entry=0x7f3f5d6a0aa0, local_signal_index=local_signal_index@entry=0, argv=argv@entry=0x7fff5456c7b0) at kernel/qobject.cpp:3946
> #9  0x00007f3f5d3f74bf in QSocketNotifier::activated(QSocketDescriptor, QSocketNotifier::Type, QSocketNotifier::QPrivateSignal) (this=this@entry=0x558830839290, _t1=..., _t2=<optimized out>, _t3=...) at .moc/moc_qsocketnotifier.cpp:178
> #10 0x00007f3f5d3f7cbb in QSocketNotifier::event(QEvent*) (this=0x558830839290, e=0x7fff5456c8d0) at kernel/qsocketnotifier.cpp:302
> #11 0x00007f3f5e2cda5f in QApplicationPrivate::notify_helper(QObject*, QEvent*) (this=<optimized out>, receiver=0x558830839290, e=0x7fff5456c8d0) at kernel/qapplication.cpp:3632
> #12 0x00007f3f5d3bdaaa in QCoreApplication::notifyInternal2(QObject*, QEvent*) (receiver=0x558830839290, event=0x7fff5456c8d0) at kernel/qcoreapplication.cpp:1063
> #13 0x00007f3f5d4124ab in QEventDispatcherUNIXPrivate::activateSocketNotifiers() (this=0x5588307c3a10) at kernel/qeventdispatcher_unix.cpp:304
> #14 0x00007f3f5d41290b in QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=<optimized out>, flags=...) at kernel/qeventdispatcher_unix.cpp:511
> #15 0x000055882f39339d in QUnixEventDispatcherQPA::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=<optimized out>, flags=...) at qunixeventdispatcher.cpp:63
> #16 0x00007f3f5d3bc4bb in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (this=this@entry=0x7fff5456ca60, flags=..., flags@entry=...) at ../../include/QtCore/../../src/corelib/global/qflags.h:69
> #17 0x00007f3f5d3c4790 in QCoreApplication::exec() () at ../../include/QtCore/../../src/corelib/global/qflags.h:121
> #18 0x000055882f33626a in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/kwin5-5.22.0-2.1.x86_64/src/main_wayland.cpp:727

> (gdb) fram 2
> #2  KWin::DrmGpu::updateOutputs() [clone .isra.0] (this=<optimized out>) at /usr/src/debug/kwin5-5.22.0-2.1.x86_64/src/plugins/platforms/drm/drm_gpu.cpp:308
> 308         qDeleteAll(oldConnectors);
> (gdb) l
> 303
> 304         for(DrmOutput *removedOutput : removedOutputs) {
> 305             removeOutput(removedOutput);
> 306         }
> 307
> 308         qDeleteAll(oldConnectors);
> 309         qDeleteAll(oldCrtcs);
> 310         return true;
> 311     }
> 312



STEPS TO REPRODUCE
1. Unplug thunderbolt (behind which a monitor is connected)
2. Replug thunderbolt
3. boom


SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20210614
KDE Plasma Version: 5.22.0
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2
Kernel Version: 5.12.10-3.g332b26c-default (64-bit)
Graphics Platform: Wayland
Processors: 4 × Intel® Core™ i7-6600U CPU @ 2.60GHz
Memory: 15.1 GiB of RAM
Graphics Processor: Mesa DRI Intel® HD Graphics 520

ADDITIONAL INFORMATION
Running in wayland.
Comment 1 Zamundaaa 2021-06-18 16:15:26 UTC
Can you reproduce it again and then attach the resulting ~/.local/share/sddm/wayland-session.log file?

Does this only happen when you do it fast, or is there no crash if you first unplug, wait for a bit and then replug?
Comment 2 Jiri Slaby 2021-06-20 06:40:34 UTC
Created attachment 139531 [details]
~/.local/share/sddm/wayland-session.log

(In reply to Zamundaaa from comment #1)
> Can you reproduce it again and then attach the resulting
> ~/.local/share/sddm/wayland-session.log file?
> 
> Does this only happen when you do it fast, or is there no crash if you first
> unplug, wait for a bit and then replug?

We had some power outages during the night and this happened again. Each lasted about 1 s, so rather fast.
Comment 3 Zamundaaa 2021-06-20 15:55:46 UTC
Hmm, there's a lot of (possibly unrelated) errors in that log before and after the crash which suggest there's either another bug in KWin or a bug in the graphics driver.

I assume the thunderbolt dock has its own power supply? Does it also work without it? And did you need to install a separate driver like DisplayLink to get it to work?

Could you also add
QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false"
to /etc/environment, reboot and provide the same file again in the following situations:
1. without any crash or unplug (to check for the other error)
2. when you plug it out, wait for 10 seconds and plug it in again
3. with the crash

That should tell us whether or not it's on unplug or hotplug, and how much time between the two events being processed by KWin passed.
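As a rough illustration of what the rule string above asks for: Qt's logging rules are separated by semicolons and evaluated in order, with a later rule overriding an earlier one for the same category. The sketch below is not Qt's parser; `Rule`, `parseRules`, `matches` and `debugEnabled` are hypothetical names, only a trailing `*` wildcard is handled, and debug output is assumed to be off by default.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One parsed rule, e.g. "kwin_*.debug=true" (hypothetical type, not a Qt class).
struct Rule {
    std::string pattern;  // category pattern, e.g. "kwin_*"
    std::string type;     // message type, e.g. "debug"
    bool enabled;
};

// Split "a.debug=true;b.debug=false" into rules. Simplified: no whitespace
// handling, and every rule must carry a ".type" suffix.
std::vector<Rule> parseRules(const std::string &text) {
    std::vector<Rule> rules;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(';', start);
        if (end == std::string::npos) end = text.size();
        const std::string item = text.substr(start, end - start);
        const std::size_t eq = item.find('=');
        const std::size_t dot = item.rfind('.', eq);
        if (eq != std::string::npos && dot != std::string::npos) {
            rules.push_back({item.substr(0, dot),
                             item.substr(dot + 1, eq - dot - 1),
                             item.substr(eq + 1) == "true"});
        }
        start = end + 1;
    }
    return rules;
}

// Trailing-'*' prefix match only; this sketch ignores other wildcard positions.
bool matches(const std::string &pattern, const std::string &category) {
    if (!pattern.empty() && pattern.back() == '*') {
        const std::size_t n = pattern.size() - 1;
        return category.compare(0, n, pattern, 0, n) == 0;
    }
    return pattern == category;
}

// Walk the rules in order so that a later matching rule wins.
bool debugEnabled(const std::vector<Rule> &rules, const std::string &category) {
    bool enabled = false;  // assumption: debug output is off unless a rule enables it
    for (const Rule &r : rules) {
        if (r.type == "debug" && matches(r.pattern, category)) enabled = r.enabled;
    }
    return enabled;
}
```

With the string from this comment, a category such as `kwin_wayland_drm` ends up enabled, while `kwin_libinput` is switched off again by the second rule.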
Comment 4 Jiri Slaby 2021-06-23 03:19:16 UTC
(In reply to Zamundaaa from comment #3)
> Hmm, there's a lot of (possibly unrelated) errors in that log before and
> after the crash which suggest there's either another bug in KWin or a bug in
> the graphics driver.

FWIW, it sometimes crashes (with the same stack trace) also when I only unplug the connector.

> I assume the thunderbolt dock has its own power supply?

Yes.

> Does it also work without it?

I don't know, the dock powers the laptop (over thunderbolt).

> And did you need to install a separate driver like DisplayLink
> to get it to work?

No, I don't even know about its existence.

> Could you also add
> QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false"
> to /etc/environment, reboot and provide the same file again in the following
> situations:
> 1. without any crash or unplug (to check for the other error)
> 2. when you plug it out, wait for 10 seconds and plug it in again
> 3. with the crash
> 
> That should tell us whether or not it's on unplug or hotplug, and how much
> time between the two events being processed by KWin passed.

I set the env var and will restart the session later, but I answered the question I think.
Comment 5 Zamundaaa 2021-06-26 18:00:17 UTC
I looked into it a bit more and there's a bunch of
> failed to open drm device at ""
in the wayland session log, which means that KWin is detecting a GPU hotplug (not unplug!) of a device with an empty path but failing to use the device (as intended, since the path is nonsense).
When it crashes, adding a hotplugged device seems to succeed, but updating the outputs on that device then fails.

As to what device gets added or how that would cause a crash: 5.22.1 contains more and improved debug logging for the drm stuff, specifically for hot(un)plugging GPUs, so if you use the env var with 5.22.1, that could yield more useful information in the wayland-session log.
Comment 6 David Edmundson 2021-06-27 22:27:09 UTC
*** Bug 439208 has been marked as a duplicate of this bug. ***
Comment 7 Jiri Slaby 2021-07-07 05:38:56 UTC
Created attachment 139913 [details]
~/.local/share/sddm/wayland-session.log

(In reply to Zamundaaa from comment #3)
> Could you also add
> QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false"
> to /etc/environment, reboot and provide the same file again in the following
> situations:
...
> 3. with the crash

I believe this is the log for "3.". It happened during unplug:
> kwin_wayland_drm: failed to open drm device at ""

And a lot of:

> kwin_wayland_drm: drmModeAddFB2 and drmModeAddFB both failed! Nepřípustný argument

$ errno -l|grep 'Nepřípustný argument'
EINVAL 22 Nepřípustný argument


But the latter messages appear long before the crash, too.
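For reference, the localized text in that message looks like the output of `strerror(errno)` appended to a fixed string, presumably rendered in the session's Czech locale. A minimal sketch of that pattern, assuming Linux errno numbering; `describeErrno` is a hypothetical helper, not a KWin function:

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: builds "<what> failed! <strerror text>".
// The strerror text depends on the process locale, which is why the
// same EINVAL can show up as "Invalid argument" or as a translation.
std::string describeErrno(const std::string &what, int err) {
    return what + " failed! " + std::strerror(err);
}
```

Calling `describeErrno("drmModeAddFB2 and drmModeAddFB both", EINVAL)` yields a line of the same shape as the one in the log; the numeric value 22 for EINVAL matches the `errno -l` output above.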
Comment 8 Zamundaaa 2021-07-07 12:52:39 UTC
It seems like the logging rules aren't applied, did you add them? You can check if it worked with
echo $QT_LOGGING_RULES
It should print the "kwin_*.debug=true;kwin_libinput.debug=false" line.
Comment 9 Jiri Slaby 2021-07-08 05:27:18 UTC
(In reply to Zamundaaa from comment #8)
> It seems like the logging rules aren't applied, did you add them?
Ah, I am stupid. I have not been using sddm for a couple of weeks, as it doesn't start any session, so the log was old. I am starting the session from the console with:
/usr/lib64/libexec/plasma-dbus-run-session-if-needed /usr/bin/startplasma-wayland

I assume the output is logged nowhere. Let me redirect the output to a file.

> You can check if it worked with
> echo $QT_LOGGING_RULES

That's correctly set:
$ echo $QT_LOGGING_RULES
kwin_*.debug=true;kwin_libinput.debug=false


BTW I also applied the patch from bug 439208#c3, but kwin still crashes with the very same backtrace.
Comment 10 Jiri Slaby 2021-07-08 05:38:58 UTC
(In reply to Jiri Slaby from comment #9)
> BTW I also applied the patch from bug 439208#c3, but kwin still crashes with
> the very same backtrace.

Just checked with the core file:
> (gdb) l DrmConnector::DrmConnector
> 25          if (m_conn) {
> 26              for (int i = 0; i < m_conn->count_encoders; ++i) {
> 27                  m_encoders << m_conn->encoders[i];
> 28              }
> 29          } else {
> 30              qCWarning(KWIN_DRM) << "drmModeGetConnector failed!" << strerror(errno);
> 31          }
> 32      }
> 33
> 34      DrmConnector::~DrmConnector() = default;
Comment 11 Markus Knetschke 2021-07-08 13:26:20 UTC
Created attachment 139952 [details]
wayland-session.log

Created with kwin 5.22.1 and QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false" on Gentoo. The crash happened while undocking.
Comment 12 Bug Janitor Service 2021-07-11 23:06:32 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/1164
Comment 13 Zamundaaa 2021-07-11 23:08:07 UTC
Thanks for all the information, was really helpful in pinning it down.
Comment 14 Jiri Slaby 2021-07-12 08:07:23 UTC
(In reply to Zamundaaa from comment #13)
> Thanks for all the information, was really helpful in pinning it down.

Thanks, patched, waiting for no crashes :).
Comment 15 Zamundaaa 2021-07-12 12:46:01 UTC
I'll merge it now. If it does happen again, please do reopen!
Comment 16 Zamundaaa 2021-07-12 12:46:09 UTC
Git commit 54ce400764184eee067dc4f3d8d81cee2ec25537 by Xaver Hugl.
Committed on 11/07/2021 at 22:56.
Pushed by zamundaaa into branch 'Plasma/5.22'.

platforms/drm: don't delete connectors in DrmGpu::removeOutput

In DrmGpu::updateOutputs the connector is in the oldConnectors vector,
in DrmGpu::~DrmGpu it's in m_connectors. In both cases that's causing a
double free.

M  +0    -2    src/plugins/platforms/drm/drm_gpu.cpp

https://invent.kde.org/plasma/kwin/commit/54ce400764184eee067dc4f3d8d81cee2ec25537
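The ownership problem described in the commit message can be modelled in a few lines. This is a simplified sketch, not KWin's code: `Connector`, `Output` and `Gpu` are hypothetical stand-ins for `DrmConnector`, `DrmOutput` and `DrmGpu`, plain `std::vector` replaces `QVector`, and a loop of `delete` calls plays the role of `qDeleteAll`. It shows the post-fix arrangement, where `removeOutput` leaves the connector alone and the list that owns it frees it exactly once.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DrmConnector; counts live instances so that a
// double delete would show up as a negative count.
struct Connector {
    static int liveCount;
    Connector() { ++liveCount; }
    ~Connector() { --liveCount; }
};
int Connector::liveCount = 0;

struct Output {
    Connector *connector;  // non-owning: the Gpu's connector list owns it
};

class Gpu {
public:
    ~Gpu() {
        for (Output *o : m_outputs) delete o;
        for (Connector *c : m_connectors) delete c;  // owner frees here
    }

    Output *addOutput() {
        auto *c = new Connector;
        m_connectors.push_back(c);
        auto *o = new Output{c};
        m_outputs.push_back(o);
        return o;
    }

    // Post-fix behaviour: drop the output only. Deleting output->connector
    // here as well would free it a second time when the owning list is
    // cleaned up (oldConnectors below, or m_connectors in the destructor).
    void removeOutput(Output *output) {
        m_outputs.erase(std::remove(m_outputs.begin(), m_outputs.end(), output),
                        m_outputs.end());
        delete output;
    }

    // Models a hotplug pass in which every connector has disappeared.
    void updateOutputs() {
        std::vector<Connector *> oldConnectors;
        oldConnectors.swap(m_connectors);
        const std::vector<Output *> removedOutputs = m_outputs;  // copy: removeOutput mutates m_outputs
        for (Output *o : removedOutputs) removeOutput(o);
        for (Connector *c : oldConnectors) delete c;  // stands in for qDeleteAll(oldConnectors)
    }

    std::size_t outputCount() const { return m_outputs.size(); }

private:
    std::vector<Connector *> m_connectors;
    std::vector<Output *> m_outputs;
};
```

If `removeOutput` also deleted the connector, the final loop in `updateOutputs` would delete the same pointers again, which corresponds to the crash location in the backtrace at the top of this report.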