Summary: | Ark crashes in KUiServerV2JobTracker::registerJob() when turning on second monitor | ||
---|---|---|---|
Product: | [Frameworks and Libraries] frameworks-kjobwidgets | Reporter: | hbr <nepnep> |
Component: | general | Assignee: | kdelibs bugs <kdelibs-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | crash | CC: | aacid, andrea, cullmann, dabiswas112, hsushipei1, jdanek, justin.zobel, kde, mpyne, nate, rthomsen6, toralf.foerster, torokati44 |
Priority: | HI | Keywords: | drkonqi, wayland |
Version: | 5.91.0 | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | https://invent.kde.org/frameworks/kjobwidgets/commit/5aeba3f01ef8cdf723813cacdd29945328288663 | Version Fixed In: | 5.97 |
Attachments: | kcrash-metadata kded5, kcrash-metadata plasmashell, kcrash-metadata org_kde_powerdevil | |
Description
hbr
2022-02-15 17:39:04 UTC
Turns out it is not just Ark crashing when I do this but the entire plasmashell. It is reproducible and happens every time. This did not happen with KDE Plasma 5.23.X on Wayland and does not happen with KDE Plasma 5.24.X on X11. However, I don't seem to be able to get stack traces, since drkonqi / the crash reporter also crashes shortly afterwards / gets closed when plasmashell is restarted. Is there any way I can trigger the stack trace collection manually after plasmashell has been restarted?

Created attachment 147055 [details]
kcrash-metadata kded5
Created attachment 147056 [details]
kcrash-metadata plasmashell
Created attachment 147057 [details]
kcrash-metadata org_kde_powerdevil
Tried again today with Plasma 5.24.2 and it still happens. However, drkonqi did not start, so I could not create a trace. I only got the ini files for kcrash from "~/.cache" (attached). No idea if they are of any use.

Updated to Plasma 5.24.2 -> rebooted -> logged into the Wayland session -> turned on the second monitor -> plasmashell crashed, and it took about 30 seconds until it was restarted -> drkonqi did not show up.

```
#7  0x00007f47359d47b5 in KUiServerV2JobTracker::registerJob (this=0x555e082d4200, job=<optimized out>) at /usr/src/debug/kjobwidgets-5.91.0/src/kuiserverv2jobtracker.cpp:187
```

Can you file a new bug report for the Plasma crash, and attach a backtrace of Plasma crashing? Thanks!

I have since switched some hardware and am now running the Fedora 36 pre-release. The crash still happens there, but in a much more graceful way (e.g. the restart of plasmashell is no longer delayed by 30 seconds or more; it happens pretty much instantly). I will check and report again once Fedora 36 is properly released, so this can be closed.

I unfortunately cannot reproduce the Ark crash when restarting Plasma while extracting a file. The crash could occur if Plasma restarts and our job tracker re-registers the job while the job is already gone, or something like that. But I failed to reproduce it by extracting a file or manually using kjobcreator.

*** Bug 452808 has been marked as a duplicate of this bug. ***

*** Bug 451474 has been marked as a duplicate of this bug. ***

*** Bug 451855 has been marked as a duplicate of this bug. ***

*** Bug 452167 has been marked as a duplicate of this bug. ***

*** Bug 453377 has been marked as a duplicate of this bug. ***

I have now collected all the other bugs with the same backtrace here.

*** Bug 456175 has been marked as a duplicate of this bug. ***

*** Bug 455696 has been marked as a duplicate of this bug. ***

*** Bug 455658 has been marked as a duplicate of this bug. ***

Seems this crash is not that uncommon :(

I suspect the issue is the use of a deleted KJob when re-registering a job view (kuiserverv2jobtracker.cpp:197):

```cpp
// Watch the server registering/unregistering and re-register the jobs as needed
if (!d->serverRegisteredConnection) {
    d->serverRegisteredConnection = connect(serverProxy(), &KSharedUiServerV2Proxy::serverRegistered, this, [this]() {
        const auto staleViews = d->jobViews;
        // elided
        for (auto it = staleViews.begin(), end = staleViews.end(); it != end; ++it) {
            KJob *job = it.key();
            const JobView &view = it.value();
            // elided
            registerJob(job);
```

In the section of registerJob() right after this, the code is careful to use a QPointer<KJob> to verify that the KJob being registered has not been deleted by the time the functor added to the event loop by QTimer runs (kuiserverv2jobtracker.cpp:248):

```cpp
QPointer<KJob> jobGuard = job;

QTimer *delayTimer = new QTimer();
delayTimer->setSingleShot(true);
connect(delayTimer, &QTimer::timeout, this, [this, job, jobGuard, desktopEntry] {
    // elided
});
```

This part of the code also references the same d->jobViews map that the re-registration code uses, so if a KJob* held in d->jobViews were enough to guarantee the KJob was still valid, the later delay timer could have relied on that too instead of taking the extra code for a guard. That is not airtight evidence, but I can't find anywhere else that guarantees a KJob ending also removes its JobView (the re-registration code as much as calls it a "stale" job view...), so I suspect this is the cause of the crash. I'll put together a patch that uses the same QPointer<KJob> jobGuard trick and open an MR.

Sounds like a reasonable explanation.

(In reply to Christoph Cullmann from comment #20)
> Sounds like a reasonable explanation.

Well, maybe.
I did find the code meant to update the jobView structure when the KJob finishes, but it is a) called indirectly (from KUiServerV2JobTracker::unregisterJob), and b) has a weird special case to handle terminated KJobs that don't have an existing view yet (presumably because the signal that would create the view arrives later in the event loop):

```cpp
// Remember that the job finished in the meantime and
// terminate the JobView once it arrives
d->scheduleUpdate(job, QStringLiteral("terminated"), true);

if (job->error()) {
    d->scheduleUpdate(job, QStringLiteral("errorCode"), static_cast<uint>(job->error()));
    d->scheduleUpdate(job, QStringLiteral("errorMessage"), job->errorText());
}
```

But scheduleUpdate() simply recreates the jobView mapping to the soon-to-be-dead KJob, and if the re-registration happens before the "terminate the JobView once it arrives" step runs, we're in for a bad time, because the re-registration code assumes every entry in the map of jobs to jobViews has a valid KJob. I'll take a look at how best to handle the weird lifetimes here and try to open the MR tonight.

Git commit b8752085d2a480dfc93d2d422c463a59d46af5ee by Michael Pyne.
Committed on 02/07/2022 at 22:13.
Pushed by mpyne into branch 'work-bug-450325-fix-crash'.

ui-server: Fix crash by only re-registering live KJobs.

This addresses a frequently-reported crash in the job tracker for KUiServerV2 that occurs when attempting to re-register new job views for active KJobs after a new UI server comes online.

Although I have not been able to reproduce the crash myself (by attempting to use both long-lived and short-lived file transfers from Dolphin and restarting plasmashell), inspection of the code shows that it is possible for there to be deleted KJobs pointing to JobView objects during some portions of the job tracker's lifetime.
The current code deals with this in situations including DBus calls to create a U/I view for a KJob (the KJob may terminate before the DBus reply is received) and even a short delay that can be optionally introduced (the KJob may terminate before the delay elapses). A QPointer<KJob> is used as a guard in these situations, but there is no similar guard for the re-registration code.

In this case we cannot use QPointer<KJob> to guard the job's lifetime, because the KJob must be alive when the QPointer<KJob> is created, and this crash occurs when the KJob is terminated. However, the KJob's destruction should lead to the unregisterJob() function being called, which handles removing the terminated KJob from the map of job views with only one exception, where instead the job view for the KJob has its "terminated" pending status set.

So the fix here checks for the "terminated" state in the same way as performed in requestView(), and if the KJob is terminated, handles requesting the job view to terminate the U/I and finally removing the terminated KJob from the map of job views. By doing this, we avoid passing a deleted KJob to the registerJob() function, which will attempt to dereference it and crash the application.

M  +16   -4    src/kuiserverv2jobtracker.cpp

https://invent.kde.org/frameworks/kjobwidgets/commit/b8752085d2a480dfc93d2d422c463a59d46af5ee

I thought that an MR needed to be merged to master for its BUG: keyword to close a bug; reopening until a fix is merged.

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kjobwidgets/-/merge_requests/22

I think the branch must be named work/... to avoid the hooks.

That's correct, yeah. It's also required for other people to be able to rebase your MRs.

Git commit 5aeba3f01ef8cdf723813cacdd29945328288663 by Michael Pyne.
Committed on 09/07/2022 at 18:42.
Pushed by mpyne into branch 'master'.

ui-server: Fix crash by only re-registering live KJobs.
This addresses a frequently-reported crash in the job tracker for KUiServerV2 that occurs when attempting to re-register new job views for active KJobs after a new UI server comes online.

Although I have not been able to reproduce the crash myself (by attempting to use both long-lived and short-lived file transfers from Dolphin and restarting plasmashell), inspection of the code shows that it is possible for there to be deleted KJobs pointing to JobView objects during some portions of the job tracker's lifetime.

The current code deals with this in situations including DBus calls to create a U/I view for a KJob (the KJob may terminate before the DBus reply is received) and even a short delay that can be optionally introduced (the KJob may terminate before the delay elapses). A QPointer<KJob> is used as a guard in these situations, but there is no similar guard for the re-registration code.

In this case we cannot use QPointer<KJob> to guard the job's lifetime, because the KJob must be alive when the QPointer<KJob> is created, and this crash occurs when the KJob is terminated. However, the KJob's destruction should lead to the unregisterJob() function being called, which handles removing the terminated KJob from the map of job views with only one exception, where instead the job view for the KJob has its "terminated" pending status set.

So the fix here checks for the "terminated" state in the same way as performed in requestView(), and if the KJob is terminated, handles requesting the job view to terminate the U/I and finally removing the terminated KJob from the map of job views. By doing this, we avoid passing a deleted KJob to the registerJob() function, which will attempt to dereference it and crash the application.

See also merge request !22

M  +16   -4    src/kuiserverv2jobtracker.cpp

https://invent.kde.org/frameworks/kjobwidgets/commit/5aeba3f01ef8cdf723813cacdd29945328288663