Summary: | crash-at-exit in QtCurve::Style::disconnectDBus | ||
---|---|---|---|
Product: | [Frameworks and Libraries] QtCurve | Reporter: | RJVB <rjvbertin> |
Component: | qt5 | Assignee: | Yichao Yu <yyc1992> |
Status: | CLOSED FIXED | ||
Severity: | crash | CC: | beojan, demm, edisonalvaringo, egorov_egor, eugene.shalygin+bugzilla.kde, hein, jazzvoid, jodr666, kde, kv, luke-jr+kdebugs, simonandric5, tehnick-8, tuxolinux |
Priority: | NOR | ||
Version: | git | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | All | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
prevent DBus-disconnection crashing
prevent DBus-disconnection crashing prevent DBus-disconnection crashing prevent DBus-disconnection crashing prevent DBus-disconnection crashing |
Description
RJVB
2016-05-31 17:01:19 UTC
I asked on the Qt interest ML and got this answer from Thiago Macieira: > > just like the ctor does when setting up DBus. Is that correct practice or > > should one somehow cache the result of sessionBus() in order to call it > > only once? > > It's not correct practice. > > The problem is attempting to use QtDBus during global destruction. > Apparently QtDBus has already destroyed its internals. > > I can add a few protections for that. https://codereview.qt-project.org/161056 I guess that makes sense. OTOH, the commit that introduces this is due to the observation that the dbus callbacks are being called **after** qtcurve is `dlclose`d which segfault on the function call (after all the code isn't there anymore). I'm not really sure what's the right solutions here (or what changed on Qt 5.6 that's causing this). > not really sure what's the right solutions
As in not sure what action to take here. From the description in the qt review you linked, it seems that this can be fixed properly on Qt side.
Interesting, I've never seen those segfaults you mention unless they were those that got fixed when Qt stopped unloading/closing plugins at exit (that may have been after 5.5.1). I also never tested QtCurve under a full-blown Plasma session yet. I haven't tested the latest QtCurve under X11 yet either, so maybe the crash is due to some specific interplay between DBus/OS X and Qt/OS X. My crash went away when I cached DBusConnection::sessionDBus() (not required I know now) and skipped the KWin specific connections on OS X. Then I got a deadlock though, so now I'm just skipping the disconnect completely. I'll test Thiago's patch tomorrow. I don't thinkt he crash is OSX specific (fwiw I think this is a dup of https://bugs.kde.org/show_bug.cgi?id=362907). I'm guessing this can be due to other plugins here that uses dbus too which is why I smell a global destruction ordering issue and (as mentioned above) I'm not sure how to guess the right behavior. (i.e. if qdbus is destructed before us then I don't need to disconnect the connection since it's better been disconnected already). I'm interested to know if the Qt patch helps. It seems to fix things for me. The best thing you can do is test for yourself and give feedback on the Qt codereview ticket (esp. if it doesn't work for you). This will be all the more useful if you have a test case. I've just updated Qt 5.6.1 and crashes still presents. Application: kactivitymanagerd (kactivitymanagerd), signal: Segmentation fault Using host libthread_db library "/lib64/libthread_db.so.1". [Current thread is 1 (Thread 0x7ff44c8ec8c0 (LWP 30505))] Thread 2 (Thread 0x7ff43f9e1700 (LWP 30510)): #0 0x00007ff44a109a1d in poll () at /lib64/libc.so.6 #1 0x00007ff4498b7410 in () at /usr/lib64/libxcb.so.1 #2 0x00007ff4498b91a9 in xcb_wait_for_event () at /usr/lib64/libxcb.so.1 #3 0x00007ff4425c6579 in () at /usr/lib64/libQt5XcbQpa.so.5 #4 0x00007ff44a80d2d8 in () at /usr/lib64/libQt5Core.so.5 #5 0x00007ff448cf1474 in start_thread () at /lib64/libpthread.so.0 #6 0x00007ff44a1123ed in clone () at /lib64/libc.so.6 Thread 1 (Thread 0x7ff44c8ec8c0 (LWP 30505)): [KCrash Handler] #6 0x00007ff44a804ff4 in QMutex::lock() () at /usr/lib64/libQt5Core.so.5 #7 0x00007ff44b349c27 in () at /usr/lib64/libQt5DBus.so.5 #8 0x00007ff44b34acec in QDBusConnection::sessionBus() () at /usr/lib64/libQt5DBus.so.5 #9 0x00007ff432e96c4d in QtCurve::Style::disconnectDBus() () at /usr/lib64/qt5/plugins/styles/qtcurve.so #10 0x00007ff432ed2e26 in QtCurve::StylePlugin::~StylePlugin() () at /usr/lib64/qt5/plugins/styles/qtcurve.so #11 0x00007ff432ed2e59 in QtCurve::StylePlugin::~StylePlugin() () at /usr/lib64/qt5/plugins/styles/qtcurve.so #12 0x00007ff44a9d98a9 in () at /usr/lib64/libQt5Core.so.5 #13 0x00007ff44a9cf4f2 in () at /usr/lib64/libQt5Core.so.5 #14 0x00007ff44a9cf629 in () at /usr/lib64/libQt5Core.so.5 #15 0x00007ff44aa19927 in QObject::~QObject() () at /usr/lib64/libQt5Core.so.5 #16 0x00007ff44a9ce6be in QFactoryLoader::~QFactoryLoader() () at /usr/lib64/libQt5Core.so.5 #17 0x00007ff44b7944c9 in () at /usr/lib64/libQt5Widgets.so.5 #18 0x00007ff44a061278 in __run_exit_handlers () at /lib64/libc.so.6 #19 0x00007ff44a0612c5 in () at /lib64/libc.so.6 #20 0x0000000000413fd9 in () #21 0x00007ff44aa1224c in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib64/libQt5Core.so.5 #22 0x00007ff44b3a2642 in QDBusServiceWatcher::serviceRegistered(QString const&) () at /usr/lib64/libQt5DBus.so.5 #23 0x00007ff44b3a2dcf in () at /usr/lib64/libQt5DBus.so.5 #24 0x00007ff44b3a31a0 in QDBusServiceWatcher::qt_metacall(QMetaObject::Call, int, void**) () at /usr/lib64/libQt5DBus.so.5 #25 0x00007ff44b355028 in () at /usr/lib64/libQt5DBus.so.5 #26 0x00007ff44aa12cf9 in QObject::event(QEvent*) () at /usr/lib64/libQt5Core.so.5 #27 0x00007ff44b725aec in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5 #28 0x00007ff44b72af8f in QApplication::notify(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5 #29 0x00007ff44a9e4a20 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5 #30 0x00007ff44a9e699c in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /usr/lib64/libQt5Core.so.5 #31 0x00007ff44aa3a6c3 in () at /usr/lib64/libQt5Core.so.5 #32 0x00007ff4487c6e57 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0 #33 0x00007ff4487c70c0 in () at /usr/lib64/libglib-2.0.so.0 #34 0x00007ff4487c716c in g_main_context_iteration () at /usr/lib64/libglib-2.0.so.0 #35 0x00007ff44aa3aacf in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5 #36 0x00007ff44a9e276a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5 #37 0x00007ff44a9eaf6c in QCoreApplication::exec() () at /usr/lib64/libQt5Core.so.5 #38 0x0000000000411b2b in main () *** Bug 366815 has been marked as a duplicate of this bug. *** *** Bug 362907 has been marked as a duplicate of this bug. *** Is this the same bug for which Beojan sent a patch (created by a Debian package maintainer)? I think, the bug "added" in commit 3d8622c https://quickgit.kde.org/?p=qtcurve.git&a=commitdiff&h=3d8622c419a32033e36e940e8cb09b591ad93e29 Maybe it's not only cause. But version compiled before this commit - works without crash. That commit re-activated a DBus registration which *I think* serves to let QtCurve signal or react to configuration changes so that they're applied in all running applications. The patch by the Debian maintainer(s) does something else; it removes the cleanup mechanism. I haven't checked when that mechanism was introduced (or exactly why) but it does seem to introduce complexity that is probably supposed to be unnecessary. So removing should make it easier to understand what's going on, at least. NB: I've seen a number of crashes in other code due to some kind of race conditions that somehow result from trying to do proper cleanup, notably related to unloading plugins. In most cases the solution has been just to let the system clean up. I wouldn't be surprised if in the end it's better to adopt a similar attitude with DBus functionality too. > Is this the same bug for which Beojan sent a patch (created by a Debian package maintainer)? Yes, but the debian patch is wrong. > I think, the bug "added" in commit 3d8622c https://quickgit.kde.org/?p=qtcurve.git&a=commitdiff&h=3d8622c419a32033e36e940e8cb09b591ad93e29 (The "bug" was added**). It's not a qtcurve bug, it's a QtDBus one and should be fixed there. It just happens that in some configuration the QtDBus is unloaded before QtCurve and after in other cases. The commit should only be reverted if 1. Qt guarantees that any callback registered will not be called again automatically (i.e. without unregistering in the destructor) when the plugin is unloaded (noted that the destructor of the Style itself is never called AFAICT) and/or, 2. the plugin is never unloaded (i.e. never unmap'd) > The patch by the Debian maintainer(s) does something else; it removes the cleanup mechanism. I haven't checked when that mechanism was introduced (or exactly why) but it does seem to introduce complexity that is probably supposed to be unnecessary. The reason is that `Style` is never destructed so doing clean up in the destructor doesn't work. It can't be removed if this is not fixed in Qt, after which the Debian patch won't work anymore. Close as upstream bug for now. Feel free to request for reopening/opening a new one if the bug persists after the Qt bug is fixed (see PR linked above) or there's consensus from upstream how the global destruction order can be handled in another way (possibly by **not** doing destruction, including the two bullet points I mentioned above.) In what sense is the Debian patch wrong? If it helps avoiding the crash then I'd say it may be wrong as a fix but acceptable as a workaround (which is OK for a distribution-level patch). Either way, I forgot I'm using Thiago's patch in my Qt builds, so it's probably not surprising that I'm no longer seeing this kind of bug anymore. It's wrong mainly in the sense that causes segfault in situation that doesn't segfault, i.e. it causes segfault on my setup due to (I'm guessing) interaction with other plugins. It is possibly the best a distro can do (other than avoiding the buggy qt versions) to minimize the number of users that sees the segfault so I think it's okay for debian to use it. However the current master should be more likely the right long term solution since with the qt patch it is very likely still segfault in the case fixed by the commit on the qtcurve master. > (The "bug" was added**). It's not a qtcurve bug, it's a QtDBus one and
> should be fixed there. It just happens that in some configuration the QtDBus
> is unloaded before QtCurve and after in other cases. The commit should only
> be reverted if
Well, "bug", my apologies:)
What patches (Debian, Thiago's) are you talking about? And what version of Qt do use?
thnx
Not sure how to find the public link to the patch but it basically revert the commit you link above. Th Qt bug is in 5.6+ IIRC, this is the Qt version that starts to unload (unmap) the plugin file which cause segfault if the callbakcs are not removed. I use Qt 5.6.1 with the patch under review at https://codereview.qt-project.org/161056 (I'm setting this report to "closed" as it certainly isn't "resolved" IMHO) > I use Qt 5.6.1 with the patch under review at https://codereview.qt-project.org/161056 Forgot to say that it's great to know the patch is working for you. What's blocking the patch right now? > (I'm setting this report to "closed" as it certainly isn't "resolved" IMHO) Sorry. Too many states to remember ;-p Well, I guess it's hard to patch a bug that isn't easily reproducible. From what I understand Thiago couldn't reproduce the crash himself and has been waiting for more (detailed) backtraces. (In reply to RJVB from comment #20) > Well, I guess it's hard to patch a bug that isn't easily reproducible. From > what I understand Thiago couldn't reproduce the crash himself and has been > waiting for more (detailed) backtraces. I can reproduce it. Qtcurve style triggers this bug, just install qtcurve and try to reopen systemsettings. With oxygen or breeze still it won't crash just refresh systemsettings with qtcurve it crashes. I mean install qtcurve and try to open systemsettings through the menu or through konsole. With oxygen and breeze it doesn't crash with qtcurve systemsettings it crashes. twice *** Bug 374641 has been marked as a duplicate of this bug. *** *** Bug 378723 has been marked as a duplicate of this bug. *** *** Bug 377353 has been marked as a duplicate of this bug. *** *** Bug 379175 has been marked as a duplicate of this bug. *** In the kactivitymanager case, it's at startup, not exit... Can you show a backtrace of the "kactivitymanager case" with debug information installed so we see where the crash occurs exactly? I have a hunch we might have a patch cooking that should solve that particular case.
> I mean install qtcurve and try to open systemsettings through the menu or through konsole.
Do you mean launch systemsettings or `kcmshell5 style` and then select the QtCurve style? Does it crash immediately or does only when you apply the style?
In the meantime I think we should be realist and accept the fact that whatever bug there is in Qt it's not going to be fixed anytime soon. If we want people and distributions to consider QtCurve we'll have to implement a workaround. For me the Debian patch is as good an approach to that as any: it simplifies things (m_dbusConnected becomes a boolean and not a pointer to a cleanup handler which may remain NULL when registering fails; leave certain things to low-level dtors instead of doing them explicitly, ...). The alternative could be to cache `QDBusConnection::sessionBus()` and check whether it's connected; done right that should avoid calling a stale QDbusConnection instance through QDBusConnection::sessionBus()). Intuitively I'd say that should make sense; what's the point in (re)opening a session bus connection if you're about to disconnect from the D-Bus anyway and maybe were never connected to it? I can confirm that the Debian patch prevents the crash on Linux and Mac with all Qt versions I've tried; the KaOS maintainer also uses it in his packaging. I think the least we could do is maintain a branch with this patch in place. Which doesn't prevent us from filing a(nother) Qt bug report with a fresh backtrace, of course. But FWIW, from the QDbusConnection doc (5.8.0): "he sessionBus() and systemBus() functions return open connections to the session server daemon and the system server daemon, respectively. Those connections are opened when first used and are closed when the QCoreApplication destructor is run." That means that it's probably a good idea not to call those functions in code that is or can be run *after* the QCoreApplication dtor ... and that we should disconnect from the DBus before, like in a slot connected to QCoreApplication::QCoreApplication::aboutToQuit(). (In reply to RJVB from comment #29) > Can you show a backtrace of the "kactivitymanager case" with debug > information installed so we see where the crash occurs exactly? I have a > hunch we might have a patch cooking that should solve that particular case. https://bugs.kde.org/show_bug.cgi?id=379175#c0 Hmmm, I wonder if this isn't all related to this observation, after connecting Style::disconnectDBus to QCoreApplication::aboutToQuit : #> oxygen-demo5 void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa400c600) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true org.kde.kwindowsystem: Loaded plugin "/opt/local/share/qt5/plugins/kf5/org.kde.kwindowsystem.platforms/KF5WindowSystemX11Plugin.so" for platform "xcb" void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa402ea00) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa4036000) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa4036c00) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa383d000) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa3852a00) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true void QtCurve::Style::connectDBus() QtCurve::Style(0x7f7fa4091200) connected to QCoreApplication::aboutToQuit from QApplication(0x7fff5066dc98) : true void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa400c600, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa402ea00, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa4036000, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa4036c00, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa383d000, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa3852a00, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" void QtCurve::Style::disconnectDBus() QtCurve::Style(0x7f7fa4091200, name = "qtcurve") Disconnecting from "qt_default_session_bus" / ":1.1133" virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa402ea00, name = "qtcurve") virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa4036000, name = "qtcurve") virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa4036c00, name = "qtcurve") virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa3852a00, name = "qtcurve") virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa383d000, name = "qtcurve") virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa4091200, name = "qtcurve") virtual QtCurve::Style::~Style() QtCurve::Style(0x7f7fa400c600, name = "qtcurve") virtual QtCurve::StylePlugin::~StylePlugin() QtCurve::StylePlugin(0x7f7fa3625460) IOW, starting an application leads to a whole series of Style instances being created. It seems reasonable to supposed that only one of those will ultimately be used ... and that doing DBus disconnects in an order that isn't necessarily the same as (or inverse of) the connect order could lead to problems. Now I don't know to what extent it's a bad idea to do multiple connects to the same signals on a given dbus connection (note that they are all disconnected at the end) but apart from that I don't see anything wrong with the patch I'm going to upload in a moment. It makes things a lot simpler by doing away with the home-baked cleanup mechanism and leaving it to the Style class to connect and disconnect to the DBus at a moment that we should be certain that the DBus connection still exists and it will no longer be required. FWIW we could also listen to the QCoreApplication::destroyed signal and ignore DBus connections that we think are still open after that. One thing: can some of you please add a line qWarning() << Q_FUNC_INFO << this; to Style::connectDBus() and Style::~Style() (in qt5/qtcurve.cpp) and StylePlugin::~StylePlugin() (in qt5/qtcurve_plugin.cpp) and try to provoke the crash from the commandline? I'd be curious to know in what order the output appears. In my testing the StylePlugin() dtor is ALWAYS called after all Style instances have been deleted, so theoretically there should be nothing left cleanup (disconnect from)! FWIW, doing things just in the destructor doesn't work and that's exactly why I need to do this complicated mess. Apparently some application doesn't desctruct the Style before unloading the plugin...... Created attachment 105206 [details]
prevent DBus-disconnection crashing
The patch announced above.
In addition to the disconnect and m_dbusConnected simplification this also avoids connecting to signals sent out by "our own" application (= KWin).
(In reply to Yichao Yu from comment #34) > FWIW, doing things just in the destructor doesn't work What is it that doesn't work in that approach? > why I need to do this complicated mess. Apparently some application doesn't > desctruct the Style before unloading the plugin...... Possibly, but in that case one should apparently not try to clean up. Or maybe the plugin is loaded twice in some situations. In that case the crash we're seeing happens when the 2nd one is unloaded. I haven't checked but if your cleanup registry accepts multiple instances of that closure that would indeed explain the crash. Either way, the patch I uploaded does the DBus cleanup at the opportune moment, i.e. when the application is about to be taken down. That way you won't get accesses to stale (deleted) instances. The only possible improvement could be to turn m_dbusConnected into a counter keeping track of the number of connections. I'll have another look at how Breeze and Oxygen approach this, too. > What is it that doesn't work in that approach? What I've seen happens is 1. Plugin was unloaded 2. Callback triggers 3. Jumps to the plugin code section which is unmmapped. 4. Segfault > Possibly, but in that case one should apparently not try to clean up. The above sequence will still segfault so the clean up has to be done before the plugin is unloaded. > What is it that doesn't work in that approach? What I've seen happens is 1. Plugin was unloaded 2. Callback triggers 3. Jumps to the plugin code section which is unmmapped. 4. Segfault > Possibly, but in that case one should apparently not try to clean up. The above sequence will still segfault so the clean up has to be done before the plugin is unloaded. (Sorry, not sure why I commented twice, definitely didn't refresh page.... and not sure how to remove it.................) I maintain that the plugin shouldn't trigger callbacks if it cannot be sure that those callbacks are safe. Especially not if that cleanup is optional. It will be performed anyway, presumably when the QDBusConnection is destroyed. Both Breeze and Oxygen follow this principle, which for me is also one of the nice things of OOP. Normally I'm all for doing things cleanly (it still "bothers" me that I don't have to check the result of a "new" operation :)) but if things are cleaned up for me when their context goes out of scope, parent is destroyed, etc I'm not going to be begged to use that feature :) > I maintain that the plugin shouldn't trigger callbacks if it cannot be sure that those callbacks are safe.
Sure, the plugin is not the one triggerring the callback, the applications is, which makes the clean up not optional in some cases.
What callback are we talking about here? The function in which the crash occurs is called from the plugin dtor, not directly from the application. Logical, because the plugin is unloaded after the application, normally. > fWhat callback are we talking about here? The slots connected to the DBus signals. > The function in which the crash occurs is called from the plugin dtor, not directly from the application. Correct, because if this is not done, the application (or other plugins) can trigger DBus signals that segfaults. This is exactly the crash I mentioned very early on. https://bugs.kde.org/show_bug.cgi?id=363753#c2 Maybe that has to do with the kind of signals we're connecting to but I don't understand how this could happen. I don't see how QtCurve could be unloaded without calling the Style dtor; dlclose'ing it seems certainly something that shouldn't be happening in normal applications while they are still running. I've never seen that kind of crash happen with the attached patch but I would guess it must be something else that we should be able to address more properly than with the current cleanup callbacks. Clearly Breeze and Oxygen aren't affected so it must be possible. There's always the possibility of storing the plugin instance in each Style instance, and maintaining a list of Style instances in each plugin instance. If the Style dtor tells its plugin instance to remove it the plugin dtor can know if there are still "open" Style instances and delete those first. I think that's cleaner (and easier to debug!) than the current approach with a closure (lambda). Have you ever looked into this phenomenon that multiple Style instances are created and tried to figure out which one is actually used in the end? If that's always the last one we could disconnect all existing Style instances from DBus whenever a new Style instance is created. But we really should try without all that complexity first. There must be a reason Breeze and Oxygen don't need it. In the worst case we get (highly) sporadic crashes rather than regular/systematic crashes like we get now, which is a form of progress. Come to think of it: both styles only connect to DBus signals that are carry their own signature and which they send out themselves. Maybe that's where we need to rethink matters. QtCurve connects to more signals than both styles together. > Maybe that has to do with the kind of signals we're connecting to but I don't understand how this could happen. I don't see how QtCurve could be unloaded without calling the Style dtor; dlclose'ing it seems certainly something that shouldn't be happening in normal applications while they are still running. I just tested it and it seems to still happen on 17.04 It can be triggered by any programes I've tested that end up not showing a UI by itself `konsole --help` for example. > I've never seen that kind of crash happen with the attached patch but I would guess it must be something else that we should be able to address more properly than with the current cleanup callbacks. Clearly Breeze and Oxygen aren't affected so it must be possible. It'll be strictly worse for the case I described though might be better at working around the Qt bug. > There's always the possibility of storing the plugin instance in each Style instance, and maintaining a list of Style instances in each plugin instance. If the Style dtor tells its plugin instance to remove it the plugin dtor can know if there are still "open" Style instances and delete those first. I think that's cleaner (and easier to debug!) than the current approach with a closure (lambda). I believe that's effectly the same. The closure isn't what's causing the crash. Created attachment 105210 [details]
prevent DBus-disconnection crashing
Also disconnect from DBus in the Style dtor in case the style is deleted before the application
(In reply to Yichao Yu from comment #45) > I just tested it and it seems to still happen on 17.04 It can be triggered > by any programes I've tested that end up not showing a UI by itself `konsole > --help` for example. I cannot reproduce that - with my patch in place. If your reasoning is correct that patch should make me vulnerable to the crash. What I can confirm is that Style::connectDBus() is called (exactly once in this case), but not the Style dtor. That's a bit surprising to me, somewhere I was under the impression that class dtors were always called even if you don't delete them explicitly. There's no way to use something like a Q*Pointer class to get automatic deleting of those Style instances? But good, if you can reproduce it you can also try to figure out what other methods there are to avoid it. What Qt version are you using? > I believe that's effectly the same. The closure isn't what's causing the > crash. Effectively the same but with the closure we don't really know where the Style instance pointer comes from. That's what I hinted at earlier; if the cleanup closure is registered twice for the same instance it's not certain how valid the this pointer ("data") is when called the 2nd time. Seems farfetched, I know, but we're dealing with an issue here that we don't really understand, so the mechanism to prevent it should be as easy to understand and verify as possible. > I cannot reproduce that - with my patch in place. If your reasoning is correct that patch should make me vulnerable to the crash. Yes it'll make you vulnerable but it won't actually crash unless some unknown conditions are met. I suspect it's related to other plugins that are also using dbus but I'm not sure. > What I can confirm is that Style::connectDBus() is called (exactly once in this case), but not the Style dtor. That's a bit surprising to me, somewhere I was under the impression that class dtors were always called even if you don't delete them explicitly. Exactly. Good to know at least you can observe this. I can only say that I've seen this leads to crashes before and I don't really think this should happen. I've just checked that this doesn't crash for me anymore but I've also noticed that it is because the library is somehow not ummaped anymore, meaning if it does in some case the crash I saw before will come back again. Not sure why. (OTOH, another way to work around this is to dlopen the library itself so that the dlclose won't actually unmmap the library code. Not sure how that can be done though) > There's no way to use something like a Q*Pointer class to get automatic deleting of those Style instances? = = .... I actually have no idea who created them and how the memory is managed by either qt or the application > But good, if you can reproduce it you can also try to figure out what other methods there are to avoid it. The problem now is that I can't reproduce the crash at the moment anymore but almost everything that leads to it are still reproducible (i.e. the distructor is not called before the library is dlclosed) which means it's hard for me to tell if any alternative solution can work........ I'm checking why is dlclose not unmapping the library, though don't expect that to be very easy.... > What Qt version are you using? 5.8.0 ATM > I believe that's effectly the same. The closure isn't what's causing the > crash. > Effectively the same but with the closure we don't really know where the Style instance pointer comes from. That's what I hinted at earlier; if the cleanup closure is registered twice for the same instance it's not certain how valid the this pointer ("data") is when called the 2nd time. Seems farfetched, I know, but we're dealing with an issue here that we don't really understand, so the mechanism to prevent it should be as easy to understand and verify as possible. Sure, I'll be certainly fine with that. Though given many (all?) of the backtrace points to `QDBusConnection::sessionBus` I don't think that'll solve the problem... Created attachment 105212 [details]
prevent DBus-disconnection crashing
This version implements my idea of keeping track of the Style instances allocated by each plugin instance, and deleting any of those that remain when the plugin is deleted.
Yichao, please test if this solves your crash with `konsole --help` and similar. For me it makes no difference (but I've only tested on Mac so far).
(In reply to Yichao Yu from comment #48) > Yes it'll make you vulnerable but it won't actually crash unless some > unknown conditions are met. I suspect it's related to other plugins that are > also using dbus but I'm not sure. Usually I am the one running into seemingly inexplicable situations :) Yes, other plugins could be involved, but if I understand the DBus protocol well enough they'd have to send exactly the "right" signals (one of those QtCurve subscribes to) and exactly at the "right" moment. That seems a bit far-fetched. > Exactly. Good to know at least you can observe this. I can only say that > I've seen this leads to crashes before and I don't really think this should > happen. No it shouldn't but the overall complexity of KDE and Qt is becoming so big that it's not always possible to do certain things like proper clean up because of race conditions. That's probably why Qt no longer unloads (all) plugins explicitly. Except apparently the style plugin, or that too is handled by some automagic mechanism. > Not sure why. (OTOH, another way to work around this is to dlopen the > library itself so that the dlclose won't actually unmmap the library code. > Not sure how that can be done though) Presuming that each dlopen adds to a counter that prevents unmapping and each dlcose decreases that counter you could do it in the Style ctor and dtor. But are you sure the unmapping was performed by Qt? I've never looked into this aspect of the code but I would expect them at most to dlclose libraries and plugins. That should of course also invalidate the image of the library to some degree but not necessarily unload everything. Maybe your kernel is configured to do very strict cleanup, or RAM is so tight that it tends to unload resources that are no longer needed immediately? For the anecdote, I had a weird issue recently with QProcess where it would claim that commands failed to start while I could run them without any problem in a terminal. I finally traced that down to the fact I had deactivated "overcommit" (the Linux trick that pretends every memory allocation succeeds no matter how big it is). Asking around on the Qt development ML confirmed that there were known issues in Qt that could explain what I was seeing. But: it'd be a severe bug if Qt somehow unloaded or even dlclosed a plugin before having destroyed all class instances created through it. > = = .... I actually have no idea who created them and how the memory is > managed by either qt or the application Usually Qt itself, or the plasma-integration plugin, but some applications have a style selection feature (the oxygen-demo application currently in git/head, for instance). I've never seen an example showing that code must delete Style instances itself, so I think the application instance takes ownership. > hard for me to tell if any alternative solution can work........ I'm > checking why is dlclose not unmapping the library, though don't expect that > to be very easy.... No, as I said this is probably under very lowlevel system control; `man dlclose` just says that the library is "unloaded" when the library handle load count drops to 0. If that also means that the file is completely removed from memory will probably depend on circumstances, but should be a moot question. From `man dlclose`: > Instead, libraries should export routines using the __attribute__((constructor)) and > __attribute__((destructor)) function attributes. See the gcc info pages for information on these. Con‐ > structor routines are executed before dlopen() returns, and destructor routines are executed before > dlclose() returns. You could add some trace output to stderr (better not via Qt code ;)) in the library destructor and see if that confirms your hypothesis (let it print the number of plugin and style instances that are still open, for instance). Other things to do (as soon as you can reproduce the crash) would be to - test with other styles - if that stops the crash QtCurve must be doing something wrong - test without the plasma integration plugin > 5.8.0 ATM Same here. > Sure, I'll be certainly fine with that. Though given many (all?) of the > backtrace points to `QDBusConnection::sessionBus` I don't think that'll > solve the problem... I have never seen a backtrace of the kind of crash you are seeing, but I think that with my current implementation you should no longer be seeing it. As long as all plugin instances are deleted all Style instances will be deleted too, and they'll disconnect from DBus during that operation. In addition they'll disconnect when the application instance is taken down. This way nothing allocated/created through QtCurve should remain in memory that can wreak havoc when the library is unloaded (something your implementation couldn't guarantee). Not all of Qt is "fool proof", the code cannot possibly check for everything that goes wrong. In this case we are apparently using QDBusConnection::sessionBus() in a context where we shouldn't be calling it. It might be worthwhile to file a Qt bug report; that might be the best way to get an assessment whether or not we're doing something in a way we shouldn't be doing it. But I'd wait a bit; theoretically my current patch should lead to near-identical situations when delete'ing Style instances from the plugin dtor - and it'll be somewhat easier to describe. Created attachment 105217 [details]
prevent DBus-disconnection crashing
This version adds a library ctor and dtors that allow to trace dlopen and dlclose events when in debug mode.
Yes the latest version looks fine though I'd like to test it for some time. > but if I understand the DBus protocol well enough they'd have to send exactly the "right" signals (one of those QtCurve subscribes to) and exactly at the "right" moment. That seems a bit far-fetched. It's more complicated than that. IIRC the segfault happens when the qdbus connection walks some internal data structures for dispatch, which touches some unmapped memory (vtables and alike). > the overall complexity of KDE and Qt is becoming so big that it's not always possible to do certain things like proper clean up because of race conditions. I hope that's not the case but maybe it is.... > That's probably why Qt no longer unloads (all) plugins explicitly. Except apparently the style plugin, or that too is handled by some automagic mechanism. Hmmm. That's interesting because I've never seen anything being unloaded before 5.8 (or 5.7?). > Presuming that each dlopen adds to a counter that prevents unmapping and each dlcose decreases that counter you could do it in the Style ctor and dtor. Yes. If I can find the right function... > But are you sure the unmapping was performed by Qt? I've never looked into this aspect of the code but I would expect them at most to dlclose libraries and plugins. That should of course also invalidate the image of the library to some degree but not necessarily unload everything. What Qt does is of course `dlclose`, the unmmap I've seen happens in the dlclose call. > Maybe your kernel is configured to do very strict cleanup, or RAM is so tight that it tends to unload resources that are no longer needed immediately? I was hopping this to be a glibc thing but those low level detail is beyong me. > But: it'd be a severe bug if Qt somehow unloaded or even dlclosed a plugin before having destroyed all class instances created through it. It's clearly doing the dlclose before they destructed the Style* > I've never seen an example showing that code must delete Style instances itself, so I think the application instance takes ownership. It almost feels like that Qt should delete all QObjects before deleting the style plugin but that also seems like a terrible idea.... > If that also means that the file is completely removed from memory will probably depend on circumstances, but should be a moot question. Right, my confusing is just that why it sometimes (well, previously) unmmap and sometimes not.... >> __attribute__((destructor)) function attributes. See the gcc info pages for information on these. Con‐ This I'm acutally not worrying too much. I still have a faith that the plugin destructor will be closed before dlclose (confirmed in the debugger) so using it as the last resort should still be ok. > You could add some trace output to stderr (better not via Qt code ;)) in the library destructor and see if that confirms your hypothesis (let it print the number of plugin and style instances that are still open, for instance). I have confirmed in the debugger that the single dlopen on qtcurve have been dlclosed. I might get a glibc with debug simple and check internal logic later... > Other things to do (as soon as you can reproduce the crash) would be to Let's hope I don't and the patch just work......................... > I have never seen a backtrace of the kind of crash you are seeing, but I think that with my current implementation you should no longer be seeing it. I actually mean the stack trace in this bug report, not mine. For that I agree that adding a safe guard that's called earlier (i.e. abouttoquit) should work. (In reply to Yichao Yu from comment #52) > Yes the latest version looks fine though I'd like to test it for some time. By all means :) > It's more complicated than that. IIRC the segfault happens when the qdbus > connection walks some internal data structures for dispatch, which touches > some unmapped memory (vtables and alike). It would be useful to know which signal(s) is/are concerned? > Hmmm. That's interesting because I've never seen anything being unloaded > before 5.8 (or 5.7?). I think plugins were unloaded in 5.4 and earlier, and then in 5.5 they dropped unloading of at least certain plugins which started to cause trouble. That may even have been related to DBus, I can't remember and am not too keen on digging out the exchange with Thiago to see if my memory is correct :) > > But: it'd be a severe bug if Qt somehow unloaded or even dlclosed a plugin before having destroyed all class instances created through it. > > It's clearly doing the dlclose before they destructed the Style* Erm, yes, I forgot about that. This however should be just before the application exits, you've had a lot of chance that you have been able to get DBus signals to arrive in that interval :) > It almost feels like that Qt should delete all QObjects before deleting the > style plugin but that also seems like a terrible idea.... I notice that that Qt's Fusion style uses an undocumented QCommonStyle ctor, accepting an argument. I should have a look if QStylePlugin::Create() couldn't do `new Style(this)` and benefit from automatic Style instance deletion when the plugin is unloaded. > Right, my confusing is just that why it sometimes (well, previously) unmmap > and sometimes not.... What can also be the case is that enough other things have changed that whatever access is made to deallocated memory no longer causes a SEGV. You know, like a proper Heisenbug. Or you simply had some other kind of stability going on at the time. How reproducible was that bug over time, after a reboot etc? > This I'm acutally not worrying too much. I still have a faith that the > plugin destructor will be closed before dlclose (confirmed in the debugger) > so using it as the last resort should still be ok. Yes, of course. The library destructor idea is only for debugging to know exactly when unloading is going to take place (possible set a breakpoint) etc. I never planned to use it for deallocating memory; that seems like a bad idea. > > I have never seen a backtrace of the kind of crash you are seeing, but I think that with my current implementation you should no longer be seeing it. Sure, but the other backtrace is interesting too. :) Created attachment 105236 [details]
prevent DBus-disconnection crashing
cleaned-up version
Works for me so far. Haven't got a chance to figure out what's causing the (absence of) mmap yet... I assume you'll submit a pull request with this? Can do - if you mean a review request on Phabricator. The patch as is or do you already have some feedback that I can incorporate at once? BTW, the Qt guys consider that everyone should be building their own Qt with the patches below if their distribution doesn't incorporate them. Evidently that's not a reason NOT to apply our own patch. 5.6: https://codereview.qt-project.org/157488 & https://codereview.qt-project.org/161056 5.8: https://codereview.qt-project.org/180231 & https://codereview.qt-project.org/180232 With the patch from comment 54 applied, it crashes with Qt 5.9.0 beta 3 with the following stacktrace: #6 0x00007f42b7f24981 in QInternal::unregisterCallback(QInternal::Callback, bool (*)(void**)) () from /usr/lib64/libQt5Core.so.5 #7 0x00007f42a73bbdde in QtCurve::StylePlugin::~StylePlugin() () from /usr/lib64/qt5/plugins/styles/qtcurve.so #8 0x00007f42a73bbe89 in QtCurve::StylePlugin::~StylePlugin() () from /usr/lib64/qt5/plugins/styles/qtcurve.so #9 0x00007f42b80bef11 in QLibraryPrivate::unload(QLibraryPrivate::UnloadFlag) () from /usr/lib64/libQt5Core.so.5 #10 0x00007f42b80b5b32 in QFactoryLoaderPrivate::~QFactoryLoaderPrivate() () from /usr/lib64/libQt5Core.so.5 #11 0x00007f42b80b5c59 in QFactoryLoaderPrivate::~QFactoryLoaderPrivate() () from /usr/lib64/libQt5Core.so.5 #12 0x00007f42b80f57c7 in QObject::~QObject() () from /usr/lib64/libQt5Core.so.5 #13 0x00007f42b80b5001 in QFactoryLoader::~QFactoryLoader() () from /usr/lib64/libQt5Core.so.5 #14 0x00007f42b896d4b9 in (anonymous namespace)::Q_QGS_loader::innerFunction()::Holder::~Holder() () from /usr/lib64/libQt5Widgets.so.5 #15 0x00007f42b72249e0 in __run_exit_handlers () from /lib64/libc.so.6 #16 0x00007f42b7224a3a in exit () from /lib64/libc.so.6 #17 0x00007f42b7fe04b4 in QCommandLineParser::showHelp(int) () from /usr/lib64/libQt5Core.so.5 #18 0x000000000040f9b1 in main () Quick reaction: are you using the latest version in git - if not, could you please pull the latest version and rebuild with debug info so I can see how you get into `unregisterCallback()`? This is running `foo --help`? Latest version of QtCurve? Yes. Thread 1 "kdialog" received signal SIGSEGV, Segmentation fault. 0x00007ffff235c981 in QInternal::unregisterCallback(QInternal::Callback, bool (*)(void**)) () from /usr/lib64/libQt5Core.so.5 (gdb) bt #0 0x00007ffff235c981 in QInternal::unregisterCallback(QInternal::Callback, bool (*)(void**)) () from /usr/lib64/libQt5Core.so.5 #1 0x00007fffe17ce489 in QtCurve::StylePlugin::~StylePlugin (this=0x680500, __in_chrg=<optimized out>) at /usr/src/debug/x11-themes/qtcurve-9999/qtcurve-9999/qt5/style/qtcurve_plugin.cpp:153 #2 0x00007fffe17ce4e4 in QtCurve::StylePlugin::~StylePlugin (this=0x680500, __in_chrg=<optimized out>) at /usr/src/debug/x11-themes/qtcurve-9999/qtcurve-9999/qt5/style/qtcurve_plugin.cpp:159 #3 0x00007ffff24f6f11 in QLibraryPrivate::unload(QLibraryPrivate::UnloadFlag) () from /usr/lib64/libQt5Core.so.5 #4 0x00007ffff24edb32 in QFactoryLoaderPrivate::~QFactoryLoaderPrivate() () from /usr/lib64/libQt5Core.so.5 #5 0x00007ffff24edc59 in QFactoryLoaderPrivate::~QFactoryLoaderPrivate() () from /usr/lib64/libQt5Core.so.5 #6 0x00007ffff252d7c7 in QObject::~QObject() () from /usr/lib64/libQt5Core.so.5 #7 0x00007ffff24ed001 in QFactoryLoader::~QFactoryLoader() () from /usr/lib64/libQt5Core.so.5 #8 0x00007ffff2da54b9 in (anonymous namespace)::Q_QGS_loader::innerFunction()::Holder::~Holder() () from /usr/lib64/libQt5Widgets.so.5 #9 0x00007ffff165c9e0 in __run_exit_handlers () from /lib64/libc.so.6 #10 0x00007ffff165ca3a in exit () from /lib64/libc.so.6 #11 0x00007ffff24184b4 in QCommandLineParser::showHelp(int) () from /usr/lib64/libQt5Core.so.5 #12 0x000000000040f9b1 in main () This is from running `gdb kdialog` BTW, https://codereview.qt-project.org/180231 & https://codereview.qt-project.org/180232 are applied too. That looks like it should be unrelated at least to the purpose of my patch. There is a tiny possibility though that the event callback is called during destruction, and that it obtains a stale pointer to the object's style. Could you try moving the if with the unregisterCallback() call to immediately after the qtcInfo call and report back, please? Moved, but the stack trace is the same: #6 0x00007f9b4acea981 in QInternal::unregisterCallback(QInternal::Callback, bool (*)(void**)) () from /usr/lib64/libQt5Core.so.5 #7 0x00007f9b3a160364 in QtCurve::StylePlugin::~StylePlugin (this=0x21efe50, __in_chrg=<optimized out>) at /usr/src/debug/x11-themes/qtcurve-9999/qtcurve-9999/qt5/style/qtcurve_plugin.cpp:144 #8 0x00007f9b3a1604e4 in QtCurve::StylePlugin::~StylePlugin (this=0x21efe50, __in_chrg=<optimized out>) at /usr/src/debug/x11-themes/qtcurve-9999/qtcurve-9999/qt5/style/qtcurve_plugin.cpp:159 #9 0x00007f9b4ae84f11 in QLibraryPrivate::unload(QLibraryPrivate::UnloadFlag) () from /usr/lib64/libQt5Core.so.5 #10 0x00007f9b4ae7bb32 in QFactoryLoaderPrivate::~QFactoryLoaderPrivate() () from /usr/lib64/libQt5Core.so.5 #11 0x00007f9b4ae7bc59 in QFactoryLoaderPrivate::~QFactoryLoaderPrivate() () from /usr/lib64/libQt5Core.so.5 #12 0x00007f9b4aebb7c7 in QObject::~QObject() () from /usr/lib64/libQt5Core.so.5 #13 0x00007f9b4ae7b001 in QFactoryLoader::~QFactoryLoader() () from /usr/lib64/libQt5Core.so.5 #14 0x00007f9b4b7334b9 in (anonymous namespace)::Q_QGS_loader::innerFunction()::Holder::~Holder() () from /usr/lib64/libQt5Widgets.so.5 #15 0x00007f9b49fea9e0 in __run_exit_handlers () from /lib64/libc.so.6 #16 0x00007f9b49feaa3a in exit () from /lib64/libc.so.6 #17 0x00007f9b4ada64b4 in QCommandLineParser::showHelp(int) () from /usr/lib64/libQt5Core.so.5 #18 0x000000000040f9b1 in main () And you don't get that crash without the patch? See https://phabricator.kde.org/D5702 for a version which already moved the unregisterCallback() call. (In reply to RJVB from comment #63) > And you don't get that crash without the patch? With 5.9 it crashes under any conditions, albeit with different stack traces. OK, so that sounds like a regression in Qt, something you ought to report on bugreports.qt.io . (In reply to Eugene Shalygin from comment #66) > https://bugreports.qt.io/browse/QTBUG-60558 Eugene, I know you already applied Thiago's Qt patch for this issue, but would you be willing to check if the current version of my QtCurve patch (https://phabricator.kde.org/D5702) also prevents the unregisterCallback crash? It'd make David's recently added protection redundant but one can never be too paranoid ^^ *** Bug 378787 has been marked as a duplicate of this bug. *** (In reply to RJVB from comment #67) > Eugene, I know you already applied Thiago's Qt patch for this issue, but > would you be willing to check if the current version of my QtCurve patch > (https://phabricator.kde.org/D5702) also prevents the unregisterCallback > crash? > Current master does not crash without the patch. Thanks :) *** Bug 380510 has been marked as a duplicate of this bug. *** *** Bug 382182 has been marked as a duplicate of this bug. *** > *** Bug 382182 has been marked as a duplicate of this bug. ***
As I said on that report:
The problem is that this is a confirmed and *longstanding* issue in Qt. There has been movement on it recently and IIRC a patch was pushed through and in. Someone ought to test with Qt 5.9.1 to confirm.
Once we know what Qt version has the fix we really ought to commit the patch with our own fix modified to apply only to the unfixed versions. QtCurve ought to work at least with 5.6LTS too without patching required.
>Someone ought to test with Qt 5.9.1 to confirm.
I have 5.9 slightly after .1 and don't have a crash when following the instructions on 382182
|