Bug 276056 - Indexing of files hangs randomly - nepomukindexer indefinitely for D-Bus response from Soprano
Summary: Indexing of files hangs randomly - nepomukindexer indefinitely for D-Bus resp...
Status: RESOLVED UNMAINTAINED
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: fileindexer (show other bugs)
Version: unspecified
Platform: Compiled Sources Linux
: NOR crash
Target Milestone: ---
Assignee: Sebastian Trueg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-06-19 15:52 UTC by Paweł Paprota
Modified: 2015-01-23 16:17 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
stack traces of Nepomuk storage service threads (64.00 KB, text/plain)
2011-07-04 17:03 UTC, Paweł Paprota
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paweł Paprota 2011-06-19 15:52:44 UTC
Version:           unspecified (using Devel) 
OS:                Linux

When indexing my home directory I'm able to reproduce this hang every time but it happens on different files so it's not caused by one specific file.

Here are the last lines from the log of nepomukserver:

[/home/ppawel/kdeinst/bin/nepomukservicestub] nepomukstrigiservice(31227)/nepomuk (strigi service) Nepomuk::Indexer::indexFile: Indexing of "/home/ppawel/Downloads/music/RR_-_JIMMYBASH.rar-dir/02 Ol' Man
 River.wma" finished with exit code 0
[/home/ppawel/kdeinst/bin/nepomukservicestub] nepomukstrigiservice(31227)/nepomuk (strigi service) Nepomuk::Indexer::indexFile: Running "/home/ppawel/kdeinst/bin/nepomukindexer" "/home/ppawel/Downloads/m
usic/RR_-_JIMMYBASH.rar-dir/06 Bashin'.wma"
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] nepomukindexer(28178)/nepomuk (library) {anonymous}::GlobalModelContainer::init: Connecting to local socket "/tmp/ksocket-ppawel/nepomuk-socket"
[/home/ppawel/kdeinst/bin/nepomukservicestub] virtual void Soprano::Server::LocalServer::incomingConnection(quintptr) 
void Soprano::Server::ServerCorePrivate::addConnection(Soprano::Server::ServerConnection*) New connection. New count: 8
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] Soprano::ODBC::Connection::Connection() Soprano::Server::ServerConnection(0x2ea1910)
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
[/home/ppawel/kdeinst/bin/nepomukservicestub] nepomukstrigiservice(31227)/nepomuk (strigi service) Nepomuk::IndexScheduler::setIndexingSpeed: 1 ReducedSpeed
[/home/ppawel/kdeinst/bin/nepomukservicestub] "/home/ppawel/kdeinst/bin/nepomukindexer(28178)" Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote 
application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."

Later continues to time out with this message and nothing happens - have to restart the nepomukserver process.

When this is happening, I attach gdb to the nepomukindexer process and here is the back trace:

Thread 1 (Thread 0x7f2c0eaf8760 (LWP 28178)):                                                                                                                                                              
#0  0x00007f2c09d1bb48 in poll () from /lib/libc.so.6                                                                                                                                                      
#1  0x00007f2bfc917c80 in ?? () from /usr/lib/libdbus-1.so.3                                                                                                                                               
#2  0x00007f2bfc916b6d in ?? () from /usr/lib/libdbus-1.so.3                                                                                                                                               
#3  0x00007f2bfc901a25 in ?? () from /usr/lib/libdbus-1.so.3                                                                                                                                               
#4  0x00007f2bfc902d89 in ?? () from /usr/lib/libdbus-1.so.3                                                                                                                                               
#5  0x00007f2bfc902344 in dbus_connection_send_with_reply_and_block () from /usr/lib/libdbus-1.so.3                                                                                                        
#6  0x00007f2c0ae1e5a4 in ?? () from /usr/lib/libQtDBus.so.4                                                                                                                                               
#7  0x00007f2c0ae0c23b in QDBusConnection::call(QDBusMessage const&, QDBus::CallMode, int) const () from /usr/lib/libQtDBus.so.4                                                                           
#8  0x00007f2c0ae2bb11 in QDBusAbstractInterface::callWithArgumentList(QDBus::CallMode, QString const&, QList<QVariant> const&) () from /usr/lib/libQtDBus.so.4                                            
#9  0x00007f2c073ae47a in Soprano::Client::DBusServerInterface::createModel(QString const&) () from /home/ppawel/kdeinst/lib/libsopranoclient.so.1                                                         
#10 0x00007f2c073ae9eb in Soprano::Client::DBusClient::createModel(QString const&, QList<Soprano::BackendSetting> const&) () from /home/ppawel/kdeinst/lib/libsopranoclient.so.1                           
#11 0x00007f2c0d704b5b in (anonymous namespace)::GlobalModelContainer::init(bool) () from /home/ppawel/kdeinst/lib/libnepomuk.so.4                                                                         
#12 0x00007f2c0d704e9f in (anonymous namespace)::GlobalModelContainer::model() () from /home/ppawel/kdeinst/lib/libnepomuk.so.4                                                                            
#13 0x00007f2c0d705514 in Nepomuk::MainModel::executeQuery(QString const&, Soprano::Query::QueryLanguage, QString const&) const () from /home/ppawel/kdeinst/lib/libnepomuk.so.4                           
#14 0x00007f2c0d407f11 in Soprano::FilterModel::executeQuery(QString const&, Soprano::Query::QueryLanguage, QString const&) const () from /home/ppawel/kdeinst/lib/libsoprano.so.4                         
#15 0x00007f2c0d71c1bf in Nepomuk::Types::EntityPrivate::load() () from /home/ppawel/kdeinst/lib/libnepomuk.so.4                                                                                           
#16 0x00007f2c0d71bf67 in Nepomuk::Types::EntityPrivate::init() () from /home/ppawel/kdeinst/lib/libnepomuk.so.4                                                                                           
#17 0x00007f2c0d726ea7 in Nepomuk::Types::Property::literalRangeType() () from /home/ppawel/kdeinst/lib/libnepomuk.so.4                                                                                    
#18 0x000000000040e2c8 in Nepomuk::StrigiIndexWriter::initWriterData(Strigi::FieldRegister const&) ()                                                                                                      
#19 0x00007f2c0e48ebd2 in Strigi::StreamAnalyzer::setIndexWriter(Strigi::IndexWriter&) () from /home/ppawel/kdeinst/lib/libstreamanalyzer.so.0                                                             
#20 0x000000000040a0b5 in Nepomuk::Indexer::Indexer(QObject*) ()                                                                                                                                           
#21 0x00000000004098ca in main ()                                                                                                                                                                          

Reproducible: Sometimes




Latest KDE (kdelibs, kde-runtime, strigi etc.) compiled from sources.
Comment 1 Paweł Paprota 2011-07-03 13:55:45 UTC
I managed to get more information out of Soprano's error logging subsystem by adding printing of D-Bus error type to Soprano::DBus::convertError function.

Now it says:

Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. (org.freedesktop.DBus.Error.NoReply)"

At the end is a result of QDBusError::errorString call. So the error is really that the method did not return at the server. I was afraid it was some strange networking problem or some such but now I will continue debugging the server side.
Comment 2 Paweł Paprota 2011-07-03 14:32:40 UTC
OK, so this definitely is a server side problem with the storage service. I added a lot of logging both on the client and the server side and when indexing works, the logging shows up where expected. When it hangs, the request doesn't even seem to reach the server side.

Also, after it hangs, it's not possible to interface with org.kde.NepomukStorage in any way.

Debugging continues...
Comment 3 Paweł Paprota 2011-07-03 15:50:53 UTC
I've just had a D-Bus debugging session and with the support of #dbus the conclusion is that we should first examine Nepomuk thread-safety. Here is the relevant IRC log:

ppawel> hey
<ppawel> I'm having a strange problem with dbus communication in qt app
<ppawel> sometimes (rarely, randomly but reproducibly) one of my dbus exposed services stops responding
<ppawel> that is, I get NoReply errors when trying to do anything with it.. what would be a proper way to start debugging this problem?
<thiago> attach a debugger to the application
<thiago> is the event loop still running?
<ppawel> yes
<ppawel> I don't understand what the problem is, it's as if dbus messages don't reach the application at all
<ppawel> right now I'm looking at some debug logging in dbus
<ppawel> and how to enable it
<thiago> you can run the app with QDBUS_DEBUG=1 as well and it will print all the incoming messages
<ppawel> hmm. wait
<thiago> you need to have a debug-mode QtDBus too
<ppawel> (thanks)
<ppawel> this doesn't seem right:
<ppawel> (gdb) bt                                                                                                                                                                                                   
<ppawel> #0  0x00007fc679755b83 in poll () from /lib/libc.so.6                                                                                                                                                      
<ppawel> #1  0x00007fc678bb97d4 in ?? () from /usr/lib/libglib-2.0.so.0                                                                                                                                             
<ppawel> #2  0x00007fc678bb9d09 in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0                                                                                                                       
<ppawel> #3  0x00007fc67c935876 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4                                                                       
<ppawel> #4  0x00007fc67a63e1be in ?? () from /usr/lib/libQtGui.so.4                                                                                                                                                
<ppawel> #5  0x00007fc67c909db2 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4                                                                                 
<ppawel> #6  0x00007fc67c909fb7 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4                                                                                          
<ppawel> #7  0x00007fc67c90e1ab in QCoreApplication::exec() () from /usr/lib/libQtCore.so.4
<ppawel> sorry for spam
<ppawel> this is my app
<thiago> it seems right to me
<ppawel> ok
<thiago> try the QDBUS_DEBUG method. If that doesn't work, you'll need to debug QtDBus and figure out why the socket isn't waking up
<ppawel> ah so I don't have to recompile qt to get QDBUS_DEBUG support?
<thiago> you might have to
<ppawel> phew
<ppawel> it works
<ppawel> without recompiling
<ppawel> thank you for the pointer!
<ppawel> waiting for the problem to reproduce itself now
<ppawel> eh, nothing too exciting :(
<ppawel> "sending message" from the client but no "got message" on the server
<thiago> it doesn't appear?
<ppawel> logging does not appear, nor the message itself
<ppawel> I'm not sure how to interpret that... can this be some violation of usage of dbus on my app's part? or is this bug in dbus
<thiago> we need to know where it got stuck
<ppawel> client or server?
<thiago> there are three programs involved here, so four interfaces
<thiago> original sending, dbus-daemon receiving it, dbus-daemon resending it, target receiving it
<thiago> we know the target didn't receive it
<thiago> so where did it fail?
<ppawel> how can I enable debug logging for dbus daemon?
<thiago> rebuild it
<thiago> but you can try first dbus-monitor
<thiago> if you see the message, then at least you know the bus received it
<ppawel> hmm but isn't dbus-monitor just another client and it won't be conclusive whether the daemon received it or not?
<ppawel> but ok I will re-run the app and capture dbus monitor
<thiago> if dbus-monitor shows it, then dbus-daemon received it
<ppawel> ok
<ppawel> so I see "method call" but not "method return" in dbus-monitor
<thiago> and you don't see the method call in the target's log, right?
<ppawel> no I don't
<thiago> so the bus received it
<thiago> I have no reason to think that it didn't send it
<thiago> so the question is: why isn't the target reading the message from the socket?
<thiago> can you strace the target program to see what it's doing?
<ppawel> above gdb trace is from the target
<thiago> I know
<thiago> but we need something different
<ppawel> I could use netstat to see recv queue in the target right?
<thiago> yes
<ppawel> wait but should I look for unix socket or tcp?
<thiago> unix
<ppawel> hmm for unix sockets it doesn't give queue sizes, only refcnt
<ppawel> I guess there is no buffering for unix sockets? I don't know, whatever
<thiago> there is a bit of buffering
<ppawel> I can't figure out what socket is the target using to listen on for dbus messages
<ppawel> neither netstat nor lsof show any relevant file paths
<thiago> ls /proc/PID/fd/, get the inode from there and then search in the output of netstat --unix
<ppawel> I do have a list of unix sockets opened by the target process but neither of them look like dbus sockets
<ppawel> http://pastebin.com/raw.php?i=ej5MFdiH
<thiago> are you the nepomukservice developer?
<ppawel> I'm just starting with the nepomuk project
<thiago> are you using the very latest source code from Git/SVN?
<ppawel> yes
<thiago> nepomuk had a problem that it left some objects on threads without event loops
<thiago> so I can't help you. Fix nepomuk first.
<thiago> it must not open more than one D-Bus connection (which it used to do) and it must stop leaking threads.
<ppawel> you might be right, there is a lot of multithreading in the target process, I better focus on debugging that first...
<ppawel> thank you for the support though - valuable hints
<ppawel> nondeterministic nature of this problem points to multithreading...
<thiago> yes
<thiago> I know it leaks objects and threads
<thiago> fix that first
<ppawel> besides I doubt that dbus has bugs of such severity
<ppawel> a lot of people are using it so it would get noticed
<ppawel> can I copy paste log from our chat to the bugzilla for future reference?
Comment 4 Paweł Paprota 2011-07-03 15:52:42 UTC
<thiago> yes

:)
Comment 5 Paweł Paprota 2011-07-04 17:03:11 UTC
Created attachment 61602 [details]
stack traces of Nepomuk storage service threads

Another person (mjansen on #nepomuk-kde) also reported this problem. Attached is his gdb output which clearly shows relation to the multithreading in the Nepomuk storage/Soprano.
Comment 6 Sebastian Trueg 2011-09-23 18:44:30 UTC
The hanging of the storage service is not really solved. But the spawning of thousands of nepomukindexer instances is: bug 281779.
Comment 7 Vishesh Handa 2015-01-23 16:17:41 UTC
Thank you for taking the time to file a bug report.

The Nepomuk project is no longer included in the KDE Software Compilation. With Plasma 5, we have replaced most of the underlying technology with Baloo and other components. Hopefully this will have addressed your concern.

We encourage you to try out Plasma 5 (+Baloo) and let us know if your problem persists.