Bug 163171 - crash in KIO scheduler searchIdleList
Summary: crash in KIO scheduler searchIdleList
Status: RESOLVED FIXED
Alias: None
Product: kio
Classification: Unmaintained
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR crash
Target Milestone: ---
Assignee: kdelibs bugs
URL:
Keywords:
Duplicates: 164364 166435 167077 168827 169105 170115 170288 170864 170869 173468 190120 199183
Depends on:
Blocks:
 
Reported: 2008-06-03 23:06 UTC by Rosetzky Cedric
Modified: 2011-05-31 15:41 UTC
CC List: 16 users

See Also:
Latest Commit:
Version Fixed In: 4.5.0
Sentry Crash Report:


Attachments
Log of the crash with added debug info, showing a delete'd job being started. (79.29 KB, text/plain)
2008-10-19 22:13 UTC, Simon St James

Description Rosetzky Cedric 2008-06-03 23:06:39 UTC
Version:            (using Devel)
Installed from:    Compiled sources
Compiler:          GCC-4.2.4 
OS:                Linux

Dolphin crashed while I was in a pictures folder. It was a large folder (more than 2,000 pictures).

Backtrace:

Application: Dolphin (dolphin), signal SIGSEGV
Using host libthread_db library "/lib/i686/cmov/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 0xb5ed2720 (LWP 5241)]
[New Thread 0xb3faeb90 (LWP 5257)]
[KCrash handler]
#6  0xb7dce412 in QUrl::host () from /usr/lib/libQtCore.so.4
#7  0xb76a2b3d in searchIdleList (idleSlaves=@0x82171cc, url=@0x94cc030, 
    protocol=@0xbfa3bd3c, exact=@0xbfa3bdf3)
    at /home/loacoon/Documents/Downloads/SVN/KDE4/KDE/kdelibs/kio/kio/scheduler.cpp:608
#8  0xb76a330f in KIO::SchedulerPrivate::findIdleSlave (this=0x8217178, 
    job=0x941b2c0, exact=@0xbfa3bdf3)
    at /home/loacoon/Documents/Downloads/SVN/KDE4/KDE/kdelibs/kio/kio/scheduler.cpp:682
#9  0xb76a5560 in KIO::SchedulerPrivate::startJobDirect (this=0x8217178)
    at /home/loacoon/Documents/Downloads/SVN/KDE4/KDE/kdelibs/kio/kio/scheduler.cpp:587
#10 0xb76a5647 in KIO::SchedulerPrivate::startStep (this=0x8217178)
    at /home/loacoon/Documents/Downloads/SVN/KDE4/KDE/kdelibs/kio/kio/scheduler.cpp:432
#11 0xb76a5870 in KIO::Scheduler::qt_metacall (this=0x8294598, 
    _c=QMetaObject::InvokeMetaMethod, _id=6, _a=0xbfa3bee8)
    at /home/loacoon/Documents/Downloads/SVN/KDE4/build/kdelibs/kio/scheduler.moc:101
#12 0xb7e245d9 in QMetaObject::activate () from /usr/lib/libQtCore.so.4
#13 0xb7e24ca2 in QMetaObject::activate () from /usr/lib/libQtCore.so.4
#14 0xb7e61ea7 in QTimer::timeout () from /usr/lib/libQtCore.so.4
#15 0xb7e2b75e in QTimer::timerEvent () from /usr/lib/libQtCore.so.4
#16 0xb7e1fd7a in QObject::event () from /usr/lib/libQtCore.so.4
#17 0xb6892bac in QApplicationPrivate::notify_helper ()
   from /usr/lib/libQtGui.so.4
#18 0xb6897739 in QApplication::notify () from /usr/lib/libQtGui.so.4
#19 0xb794c7f7 in KApplication::notify (this=0xbfa3c630, receiver=0x821717c, 
    event=0xbfa3c3b8)
    at /home/loacoon/Documents/Downloads/SVN/KDE4/KDE/kdelibs/kdeui/kernel/kapplication.cpp:311
#20 0xb7e0f899 in QCoreApplication::notifyInternal ()
   from /usr/lib/libQtCore.so.4
#21 0xb7e3d281 in ?? () from /usr/lib/libQtCore.so.4
#22 0xb7e3ab20 in ?? () from /usr/lib/libQtCore.so.4
#23 0xb6214978 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#24 0xb6217bce in ?? () from /usr/lib/libglib-2.0.so.0
#25 0x080bca98 in ?? ()
#26 0x00000000 in ?? ()
#0  0xffffe410 in __kernel_vsyscall ()
Comment 1 Christophe Marin 2008-06-18 10:22:50 UTC
*** Bug 164364 has been marked as a duplicate of this bug. ***
Comment 2 Christophe Marin 2008-07-20 15:34:55 UTC
*** Bug 167077 has been marked as a duplicate of this bug. ***
Comment 3 Christophe Marin 2008-08-09 23:50:55 UTC
*** Bug 168827 has been marked as a duplicate of this bug. ***
Comment 4 Frank Reininghaus 2008-09-11 18:36:50 UTC
*** Bug 170864 has been marked as a duplicate of this bug. ***
Comment 5 Frank Reininghaus 2008-09-14 11:37:41 UTC
To help us reproduce the problem, could the bug reporters please provide the following information:

- which view mode (icons, details, column) did you use?
- has the preview been turned on?
- does the crash occur very rarely or is it always reproducible by going to a particular directory?

Thanks!
Comment 6 Frank Reininghaus 2008-09-14 19:15:01 UTC
*** Bug 169105 has been marked as a duplicate of this bug. ***
Comment 7 David Faure 2008-10-02 13:13:51 UTC
This is a bug in the KIO scheduler, so it is by nature difficult to reproduce exactly (it depends on the timing of kioslaves coming and going, etc.).

However, it is quite likely that this is the same crash as bug 165540, which I thought was fixed, but a comment says it's not.

It would be really helpful to know if anyone still gets this crash with the trunk version of KDE.
Comment 8 el_ca_pi_tan 2008-10-07 16:37:24 UTC
Yes, I still get this bug occasionally. View mode is "icons", preview has been turned on for pictures. Version (dolphin-kde4) is: 4:4.1.2-0ubuntu1~hardy1~ppa1 

Here's the recent bug report:

Application: Dolphin (dolphin), signal SIGSEGV
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread 0xb5ef5720 (LWP 7841)]
[KCrash handler]
#6  QUrl::host (this=0x60) at io/qurl.cpp:4253
#7  0xb7d93f81 in searchIdleList (idleSlaves=@0x828305c, url=@0x60, 
    protocol=@0xbfb9a9cc, exact=@0xbfb9aa6b)
    at /build/buildd/kde4libs-4.1.2/kio/kio/scheduler.cpp:609
#8  0xb7d9451e in KIO::SchedulerPrivate::findIdleSlave (this=0x8283008, 
    job=0x872fa38, exact=@0xbfb9aa6b)
    at /build/buildd/kde4libs-4.1.2/kio/kio/scheduler.cpp:683
#9  0xb7d96b5a in KIO::SchedulerPrivate::startJobDirect (this=0x8283008)
    at /build/buildd/kde4libs-4.1.2/kio/kio/scheduler.cpp:588
#10 0xb7d96c38 in KIO::SchedulerPrivate::startStep (this=0x8283008)
    at /build/buildd/kde4libs-4.1.2/kio/kio/scheduler.cpp:433
#11 0xb7d96df7 in KIO::Scheduler::qt_metacall (this=0x827cd70, 
    _c=QMetaObject::InvokeMetaMethod, _id=6, _a=0xbfb9ab48)
    at /build/buildd/kde4libs-4.1.2/obj-i486-linux-gnu/kio/scheduler.moc:101
#12 0xb765df79 in QMetaObject::activate (sender=0x828300c, 
    from_signal_index=4, to_signal_index=4, argv=0x0)
    at kernel/qobject.cpp:3016
#13 0xb765e642 in QMetaObject::activate (sender=0x828300c, m=0xb773dae4, 
    local_signal_index=0, argv=0x0) at kernel/qobject.cpp:3086
#14 0xb769b817 in QTimer::timeout (this=0x828300c)
    at .moc/release-shared/moc_qtimer.cpp:126
#15 0xb76650fe in QTimer::timerEvent (this=0x828300c, e=0xbfb9b038)
    at kernel/qtimer.cpp:263
#16 0xb76589fa in QObject::event (this=0x828300c, e=0xbfb9b038)
    at kernel/qobject.cpp:1105
#17 0xb6bc6f9c in QApplicationPrivate::notify_helper (this=0x80b8e80, 
    receiver=0x828300c, e=0xbfb9b038) at kernel/qapplication.cpp:3800
#18 0xb6bcbbf9 in QApplication::notify (this=0xbfb9b2ac, receiver=0x828300c, 
    e=0xbfb9b038) at kernel/qapplication.cpp:3392
#19 0xb7ade483 in KApplication::notify (this=0xbfb9b2ac, receiver=0x828300c, 
    event=0xbfb9b038)
    at /build/buildd/kde4libs-4.1.2/kdeui/kernel/kapplication.cpp:311
#20 0xb76490b9 in QCoreApplication::notifyInternal (this=0xbfb9b2ac, 
    receiver=0x828300c, event=0xbfb9b038) at kernel/qcoreapplication.cpp:591
#21 0xb7676c01 in QTimerInfoList::activateTimers (this=0x80bc1f4)
    at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:215
#22 0xb76744a0 in timerSourceDispatch (source=0x80bc1c0)
    at kernel/qeventdispatcher_glib.cpp:166
#23 0xb6293dd6 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#24 0xb6297193 in ?? () from /usr/lib/libglib-2.0.so.0
#25 0xb629774e in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#26 0xb76749f8 in QEventDispatcherGlib::processEvents (this=0x80b40d0, 
    flags=@0xbfb9b198) at kernel/qeventdispatcher_glib.cpp:325
#27 0xb6c5aa25 in QGuiEventDispatcherGlib::processEvents (this=0x80b40d0, 
    flags=@0xbfb9b1c8) at kernel/qguieventdispatcher_glib.cpp:204
#28 0xb764833d in QEventLoop::processEvents (this=0xbfb9b240, 
    flags=@0xbfb9b204) at kernel/qeventloop.cpp:149
#29 0xb76484cd in QEventLoop::exec (this=0xbfb9b240, flags=@0xbfb9b248)
    at kernel/qeventloop.cpp:200
#30 0xb764a74d in QCoreApplication::exec () at kernel/qcoreapplication.cpp:849
#31 0xb6bc6897 in QApplication::exec () at kernel/qapplication.cpp:3330
#32 0x08080a89 in ?? ()
#33 0xb673a450 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#34 0x080619d1 in _start ()
#0  0xb7f0b410 in __kernel_vsyscall ()
Comment 9 David Faure 2008-10-10 10:28:41 UTC
*** Bug 170115 has been marked as a duplicate of this bug. ***
Comment 10 David Faure 2008-10-10 10:32:37 UTC
*** Bug 166435 has been marked as a duplicate of this bug. ***
Comment 11 David Faure 2008-10-10 10:35:01 UTC
A rather difficult crash to debug, given that it comes from KIO scheduling so
it must be related to timing of slaves going in and out, etc. I haven't seen
precise instructions on how to trigger it yet (because I guess there isn't
really a way to do that...)

I'm guessing that "job" is already deleted on the line
return searchIdleList(idleSlaves, job->url(), jobData.protocol, exact);
but I have no idea why and from where.

It would be really helpful if you (anyone who has some chance of triggering
this crash) could run the application in valgrind and attach the log (of the
part related to the crash) here.
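
To make the hypothesis above concrete, here is a minimal sketch in standard C++ of a queue that detects a job destroyed before it is started instead of dereferencing it. This is not the KIO scheduler code: `SketchJob`, `SketchScheduler`, `enqueue`, `startPending` and `skipped` are hypothetical names made up for illustration, and `std::weak_ptr` stands in for whatever guarded-pointer mechanism the real code might use.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a KIO job; not the real KIO::SimpleJob.
struct SketchJob {
    std::string url;
};

// Hypothetical scheduler that keeps non-owning references to queued jobs.
class SketchScheduler {
public:
    void enqueue(const std::shared_ptr<SketchJob> &job) {
        assert(job && "enqueueing a null job is a caller bug");
        m_pending.push_back(job); // stored as weak_ptr, so no ownership is taken
    }

    // Starts every queued job that is still alive and returns the URLs started.
    // Entries whose job was destroyed in the meantime are counted and skipped
    // rather than dereferenced.
    std::vector<std::string> startPending() {
        std::vector<std::string> started;
        while (!m_pending.empty()) {
            std::shared_ptr<SketchJob> job = m_pending.front().lock();
            m_pending.pop_front();
            if (!job) {
                ++m_skipped; // the job was deleted between enqueue and start
                continue;
            }
            started.push_back(job->url);
        }
        return started;
    }

    int skipped() const { return m_skipped; }

private:
    std::deque<std::weak_ptr<SketchJob>> m_pending;
    int m_skipped = 0;
};
```

With a raw pointer in the queue, the skipped entry would instead be a dangling pointer, which matches the `url=@0x60` style of garbage seen in the backtraces. In Qt code the analogous tool would be `QPointer`, which becomes null when the `QObject` it tracks is destroyed.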
Comment 12 auxsvr 2008-10-19 22:06:58 UTC
I was able to reproduce this 100% reliably when downloading a file. Here's the output of valgrind:

==9572== Memcheck, a memory error detector.
==9572== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==9572== Using LibVEX rev 1804, a library for dynamic binary translation.
==9572== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==9572== Using valgrind-3.3.0, a dynamic binary instrumentation framework.
==9572== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==9572== For more details, rerun with: -v
==9572== 
==9572== My PID = 9572, parent PID = 5004.  Prog and args are:
==9572==    kget
==9572==    --nofork
==9572== 
==9572== Invalid read of size 4
==9572==    at 0x4F607E6: Soprano::FilterModel::addStatement(Soprano::Statement const&) (filtermodel.cpp:92)
==9572==    by 0x4F48208: Soprano::Model::addStatements(QList<Soprano::Statement> const&) (model.cpp:135)
==9572==    by 0x5014D1C: Nepomuk::ResourceFilterModel::addStatements(QList<Soprano::Statement> const&) (resourcefiltermodel.cpp:245)
==9572==    by 0x500BED3: Nepomuk::ResourceData::store() (resourcedata.cpp:285)
==9572==    by 0x500C4BA: Nepomuk::ResourceData::setProperty(QUrl const&, Nepomuk::Variant const&) (resourcedata.cpp:370)
==9572==    by 0x5027A81: Nepomuk::Resource::setProperty(QUrl const&, Nepomuk::Variant const&) (resource.cpp:227)
==9572==    by 0x43C2CA1: NepomukHandler::saveFileProperties(Nepomuk::Resource const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43C2C37: NepomukHandler::saveFileProperties() (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B1331: Transfer::setStatus(Job::Status, QString const&, QPixmap const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B24BE: Transfer::load(QDomElement const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B2CA0: Transfer::Transfer(TransferGroup*, TransferFactory*, Scheduler*, KUrl const&, KUrl const&, QDomElement const*) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x76F095A: (within /usr/lib/kde4/kget_kiofactory.so)
==9572==  Address 0x694e878 is 0 bytes inside a block of size 24 free'd
==9572==    at 0x402371A: operator delete(void*) (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==9572==    by 0x500F722: Nepomuk::MainModel::~MainModel() (nepomukmainmodel.cpp:173)
==9572==    by 0x500E667: Nepomuk::ResourceManager::init() (resourcemanager.cpp:88)
==9572==    by 0x500E75F: Nepomuk::ResourceManager::mainModel() (resourcemanager.cpp:227)
==9572==    by 0x500EA8E: Nepomuk::ResourceManager::generateUniqueUri() (resourcemanager.cpp:206)
==9572==    by 0x5014C2A: Nepomuk::ResourceFilterModel::addStatements(QList<Soprano::Statement> const&) (resourcefiltermodel.cpp:236)
==9572==    by 0x500BED3: Nepomuk::ResourceData::store() (resourcedata.cpp:285)
==9572==    by 0x500C4BA: Nepomuk::ResourceData::setProperty(QUrl const&, Nepomuk::Variant const&) (resourcedata.cpp:370)
==9572==    by 0x5027A81: Nepomuk::Resource::setProperty(QUrl const&, Nepomuk::Variant const&) (resource.cpp:227)
==9572==    by 0x43C2CA1: NepomukHandler::saveFileProperties(Nepomuk::Resource const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43C2C37: NepomukHandler::saveFileProperties() (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B1331: Transfer::setStatus(Job::Status, QString const&, QPixmap const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572== 
==9572== Jump to the invalid address stated on the next line
==9572==    at 0x0: ???
==9572==    by 0x4F48208: Soprano::Model::addStatements(QList<Soprano::Statement> const&) (model.cpp:135)
==9572==    by 0x5014D1C: Nepomuk::ResourceFilterModel::addStatements(QList<Soprano::Statement> const&) (resourcefiltermodel.cpp:245)
==9572==    by 0x500BED3: Nepomuk::ResourceData::store() (resourcedata.cpp:285)
==9572==    by 0x500C4BA: Nepomuk::ResourceData::setProperty(QUrl const&, Nepomuk::Variant const&) (resourcedata.cpp:370)
==9572==    by 0x5027A81: Nepomuk::Resource::setProperty(QUrl const&, Nepomuk::Variant const&) (resource.cpp:227)
==9572==    by 0x43C2CA1: NepomukHandler::saveFileProperties(Nepomuk::Resource const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43C2C37: NepomukHandler::saveFileProperties() (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B1331: Transfer::setStatus(Job::Status, QString const&, QPixmap const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B24BE: Transfer::load(QDomElement const&) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x43B2CA0: Transfer::Transfer(TransferGroup*, TransferFactory*, Scheduler*, KUrl const&, KUrl const&, QDomElement const*) (in /usr/lib/libkgetcore.so.4.1.0)
==9572==    by 0x76F095A: (within /usr/lib/kde4/kget_kiofactory.so)
==9572==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==9572== 
==9572== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 68 from 4)
==9572== malloc/free: in use at exit: 2,520,345 bytes in 35,620 blocks.
==9572== malloc/free: 186,546 allocs, 150,926 frees, 109,236,341 bytes allocated.
==9572== For counts of detected errors, rerun with: -v
==9572== searching for pointers to 35,620 not-freed blocks.
==9572== checked 23,063,100 bytes.
==9572== 
==9572== LEAK SUMMARY:
==9572==    definitely lost: 9,589 bytes in 365 blocks.
==9572==      possibly lost: 25,039 bytes in 957 blocks.
==9572==    still reachable: 2,485,717 bytes in 34,298 blocks.
==9572==         suppressed: 0 bytes in 0 blocks.
==9572== Rerun with --leak-check=full to see details of leaked memory.
Comment 13 Simon St James 2008-10-19 22:11:48 UTC
I've got a local case here (my somethingawful thread archives) where I can reliably generate this, so please ask me any questions you have (I'm SSJ_GZ on IRC).

David's guess appears to be correct - a job is deleted just before it's scheduled to be started (I'll attach a log showing this in a sec).  The item to look out for is "0x939efc8".  The log is a complete run leading up to the crash.  What it looks like is that a job is registered via doJob, then deleted before it is actually run via startJobDirect.  Since d->m_slave is 0 at the time of deletion, the job is not cancelJob'd and the subsequent attempted use of the deleted job causes the crash.

I hope this is of use to people; as mentioned, grab me on IRC if you want me to add some specific debug output :)  When I get time (hopefully tomorrow), I'll add some dummy stuff to SimpleJob so that a flag is set by doJob and  cleared by startJobDirect, then add a breakpoint in ~SimpleJob which is triggered if this flag is not clear at the time of destruction.
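
The debugging aid described above could look roughly like the following sketch, written in standard C++ rather than against the real `SimpleJob`. Everything here is an assumption for illustration: `TrackedJob`, `markScheduled` (standing in for the point where `doJob` registers the job), `markStarted` (standing in for `startJobDirect`) and the `onBadDestroy` hook are hypothetical names, and the hook is simply a convenient place to put a breakpoint.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for SimpleJob carrying the debugging flag.
class TrackedJob {
public:
    using Hook = std::function<void(const TrackedJob &)>;

    // onBadDestroy runs from the destructor if the job was scheduled but
    // never started; a debugger breakpoint can be set inside it.
    explicit TrackedJob(Hook onBadDestroy)
        : m_onBadDestroy(std::move(onBadDestroy)) {}

    TrackedJob(const TrackedJob &) = delete;
    TrackedJob &operator=(const TrackedJob &) = delete;

    ~TrackedJob() {
        if (m_scheduledNotStarted && m_onBadDestroy)
            m_onBadDestroy(*this);
    }

    // Would be called where the job is registered with the scheduler.
    void markScheduled() { m_scheduledNotStarted = true; }

    // Would be called where the scheduler actually starts the job.
    void markStarted() {
        assert(m_scheduledNotStarted && "started a job that was never scheduled");
        m_scheduledNotStarted = false;
    }

    bool scheduledNotStarted() const { return m_scheduledNotStarted; }

private:
    Hook m_onBadDestroy;
    bool m_scheduledNotStarted = false;
};
```

A job that goes through both calls is destroyed silently, while one that is destroyed between the two calls triggers the hook, which is the situation the attached log suggests for "0x939efc8".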
Comment 14 Simon St James 2008-10-19 22:13:48 UTC
Created attachment 28030
Log of the crash with added debug info, showing a delete'd job being started.
Comment 15 David Faure 2008-10-20 13:36:54 UTC
auxsvr: yours is an unrelated nepomuk/soprano crash, please file a separate bug report for it so that we can assign it to the nepomuk/soprano people.
Comment 16 auxsvr 2008-10-20 14:10:47 UTC
If I let it crash without valgrind, the backtrace is similar to that of the comments above (findIdleSlave etc.). Should I post it?
Comment 17 David Faure 2008-10-20 14:40:23 UTC
SVN commit 873972 by dfaure:

Fix crash when deleting a job before it starts.
Note that applications should use kill rather than delete anyway, on kio jobs
(deleting -after- it starts leads to a warning in the dtor, and, hmm, the code to kill the slave is even ifdefed out right now...).
CCBUG: 163171


 M  +1 -1      kio/job.cpp  
 M  +24 -0     tests/jobtest.cpp  
 M  +2 -0      tests/jobtest.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=873972
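
As an illustration of the idea in the commit message (a job deleted before it starts must not stay in the scheduler's queue), here is a minimal standard C++ sketch. It is not the actual diff of commit 873972, whose one-line change to kio/job.cpp is not shown in this report. `PendingQueue`, `QueuedJob`, `add`, `remove` and `startAll` are hypothetical names, and the sketch assumes the simplest possible policy: the destructor always removes the job from the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class QueuedJob;

// Hypothetical scheduler queue holding raw pointers to jobs waiting to start.
class PendingQueue {
public:
    void add(QueuedJob *job) {
        assert(job != nullptr);
        m_jobs.push_back(job);
    }
    // Removing a job that is not queued is a harmless no-op.
    void remove(QueuedJob *job) {
        m_jobs.erase(std::remove(m_jobs.begin(), m_jobs.end(), job), m_jobs.end());
    }
    std::size_t size() const { return m_jobs.size(); }
    std::vector<std::string> startAll();

private:
    std::vector<QueuedJob *> m_jobs;
};

// Hypothetical job that registers itself on construction and, crucially,
// unregisters itself on destruction even if it was never started.
class QueuedJob {
public:
    QueuedJob(PendingQueue &queue, std::string url)
        : m_queue(queue), m_url(std::move(url)) {
        m_queue.add(this);
    }
    ~QueuedJob() { m_queue.remove(this); }

    QueuedJob(const QueuedJob &) = delete;
    QueuedJob &operator=(const QueuedJob &) = delete;

    const std::string &url() const { return m_url; }

private:
    PendingQueue &m_queue;
    std::string m_url;
};

// "Starts" every queued job by reading its URL, then empties the queue.
inline std::vector<std::string> PendingQueue::startAll() {
    std::vector<std::string> started;
    for (QueuedJob *job : m_jobs)
        started.push_back(job->url());
    m_jobs.clear();
    return started;
}
```

Without the `remove` call in the destructor, `startAll` would read the URL of a deleted object, which is the kind of access that crashes in `searchIdleList` in the backtraces above.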
Comment 18 David Faure 2008-10-24 19:38:24 UTC
Simon St James said that my commit seemed to fix the crash, but he wasn't able to find where a job would be deleted without its kill() method being called first. However my commit only changes something in the case kill() is not called; the unit test shows that it was already working fine when kill() is called. So I'm still a bit unsure about this bug, and whether the commit really fixes it, and whether any code is deleting jobs without calling kill on them.
But since I don't know how to trigger the bug in the first place, I can't do more currently.
Comment 19 Frank Reininghaus 2008-11-17 21:56:35 UTC
*** Bug 170288 has been marked as a duplicate of this bug. ***
Comment 20 Lukas Appelhans 2008-11-17 22:33:14 UTC
*** Bug 170869 has been marked as a duplicate of this bug. ***
Comment 21 Simon St James 2008-12-27 11:10:49 UTC
Just as an addendum to what David said: I actually completely lost the ability to reproduce the bug, even after reverting David's patch. I originally had a test case that would trigger the bug quite reliably, but it stopped working after a while, probably because it involved loading files over a network, which is notoriously non-deterministic. So I think there is a high probability that this bug is indeed fixed.
Comment 22 Jonathan Thomas 2009-01-07 04:08:11 UTC
*** Bug 173468 has been marked as a duplicate of this bug. ***
Comment 23 Dario Andres 2009-04-03 01:27:26 UTC
Any news on this? Thanks
Comment 24 auxsvr 2009-04-03 14:10:20 UTC
I haven't seen this crash in months, it looks fixed to me.
Comment 25 auxsvr 2009-04-03 14:11:45 UTC
Please disregard the previous comment, wrong bug report.
Comment 26 Frank Reininghaus 2009-04-20 20:18:00 UTC
*** Bug 190120 has been marked as a duplicate of this bug. ***
Comment 27 Mikko C. 2009-07-07 17:36:47 UTC
*** Bug 199183 has been marked as a duplicate of this bug. ***
Comment 28 Mikko C. 2009-07-07 17:41:30 UTC
Backtrace from bug 199183 http://bugsfiles.kde.org/attachment.cgi?id=35124
Reporter is using kde 4.2.2.
Comment 29 Dawit Alemayehu 2011-05-29 22:59:24 UTC
Is this still an issue in KDE v4.6 or newer? The KIO::Scheduler itself was rewritten for KDE 4.5.
Comment 30 Rosetzky Cedric 2011-05-31 15:12:00 UTC
Wow, it's an old one. No, I don't remember having seen this issue for quite a while now ;).
Comment 31 Dawit Alemayehu 2011-05-31 15:41:24 UTC
(In reply to comment #30)
> Wow, it's an old one. No, I don't remember having seen this issue for quite a
> while now ;).

Great. Feel free to reopen the ticket if you encounter the issue again.