Summary: | Indexing of files hangs randomly - nepomukindexer indefinitely for D-Bus response from Soprano | ||
---|---|---|---|
Product: | [Unmaintained] nepomuk | Reporter: | Paweł Paprota <ppawel> |
Component: | fileindexer | Assignee: | Sebastian Trueg <sebastian> |
Status: | RESOLVED UNMAINTAINED | ||
Severity: | crash | CC: | me, trueg |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | stack traces of Nepomuk storage service threads |
Description
Paweł Paprota
2011-06-19 15:52:44 UTC
I managed to get more information out of Soprano's error logging subsystem by adding printing of D-Bus error type to Soprano::DBus::convertError function. Now it says: Soprano: "org.freedesktop.DBus.Error.NoReply - Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. (org.freedesktop.DBus.Error.NoReply)" At the end is a result of QDBusError::errorString call. So the error is really that the method did not return at the server. I was afraid it was some strange networking problem or some such but now I will continue debugging the server side. OK, so this definitely is a server side problem with the storage service. I added a lot of logging both on the client and the server side and when indexing works, the logging shows up where expected. When it hangs, the request doesn't even seem to reach the server side. Also, after it hangs, it's not possible to interface with org.kde.NepomukStorage in any way. Debugging continues... I've just had a D-Bus debugging session and with the support of #dbus the conclusion is that we should first examine Nepomuk thread-safety. Here is the relevant IRC log: ppawel> hey <ppawel> I'm having a strange problem with dbus communication in qt app <ppawel> sometimes (rarely, randomly but reproducibly) one of my dbus exposed services stops responding <ppawel> that is, I get NoReply errors when trying to do anything with it.. what would be a proper way to start debugging this problem? <thiago> attach a debugger to the application <thiago> is the event loop still running? <ppawel> yes <ppawel> I don't understand what the problem is, it's as if dbus messages don't reach the application at all <ppawel> right now I'm looking at some debug logging in dbus <ppawel> and how to enable it <thiago> you can run the app with QDBUS_DEBUG=1 as well and it will print all the incoming messages <ppawel> hmm. wait <thiago> you need to have a debug-mode QtDBus too <ppawel> (thanks) <ppawel> this doesn't seem right: <ppawel> (gdb) bt <ppawel> #0 0x00007fc679755b83 in poll () from /lib/libc.so.6 <ppawel> #1 0x00007fc678bb97d4 in ?? () from /usr/lib/libglib-2.0.so.0 <ppawel> #2 0x00007fc678bb9d09 in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0 <ppawel> #3 0x00007fc67c935876 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4 <ppawel> #4 0x00007fc67a63e1be in ?? () from /usr/lib/libQtGui.so.4 <ppawel> #5 0x00007fc67c909db2 in QEventLoop::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4 <ppawel> #6 0x00007fc67c909fb7 in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQtCore.so.4 <ppawel> #7 0x00007fc67c90e1ab in QCoreApplication::exec() () from /usr/lib/libQtCore.so.4 <ppawel> sorry for spam <ppawel> this is my app <thiago> it seems right to me <ppawel> ok <thiago> try the QDBUS_DEBUG method. If that doesn't work, you'll need to debug QtDBus and figure out why the socket isn't waking up <ppawel> ah so I don't have to recompile qt to get QDBUS_DEBUG support? <thiago> you might have to <ppawel> phew <ppawel> it works <ppawel> without recompiling <ppawel> thank you for the pointer! <ppawel> waiting for the problem to reproduce itself now <ppawel> eh, nothing too exciting :( <ppawel> "sending message" from the client but no "got message" on the server <thiago> it doesn't appear? <ppawel> logging does not appear, nor the message itself <ppawel> I'm not sure how to interpret that... can this be some violation of usage of dbus on my app's part? or is this bug in dbus <thiago> we need to know where it got stuck <ppawel> client or server? <thiago> there are three programs involved here, so four interfaces <thiago> original sending, dbus-daemon receiving it, dbus-daemon resending it, target receiving it <thiago> we know the target didn't receive it <thiago> so where did it fail? <ppawel> how can I enable debug logging for dbus daemon? <thiago> rebuild it <thiago> but you can try first dbus-monitor <thiago> if you see the message, then at least you know the bus received it <ppawel> hmm but isn't dbus-monitor just another client and it won't be conclusive whether the daemon received it or not? <ppawel> but ok I will re-run the app and capture dbus monitor <thiago> if dbus-monitor shows it, then dbus-daemon received it <ppawel> ok <ppawel> so I see "method call" but not "method return" in dbus-monitor <thiago> and you don't see the method call in the target's log, right? <ppawel> no I don't <thiago> so the bus received it <thiago> I have no reason to think that it didn't send it <thiago> so the question is: why isn't the target reading the message from the socket? <thiago> can you strace the target program to see what it's doing? <ppawel> above gdb trace is from the target <thiago> I know <thiago> but we need something different <ppawel> I could use netstat to see recv queue in the target right? <thiago> yes <ppawel> wait but should I look for unix socket or tcp? <thiago> unix <ppawel> hmm for unix sockets it doesn't give queue sizes, only refcnt <ppawel> I guess there is no buffering for unix sockets? I don't know, whatever <thiago> there is a bit of buffering <ppawel> I can't figure out what socket is the target using to listen on for dbus messages <ppawel> neither netstat nor lsof show any relevant file paths <thiago> ls /proc/PID/fd/, get the inode from there and then search in the output of netstat --unix <ppawel> I do have a list of unix sockets opened by the target process but neither of them look like dbus sockets <ppawel> http://pastebin.com/raw.php?i=ej5MFdiH <thiago> are you the nepomukservice developer? <ppawel> I'm just starting with the nepomuk project <thiago> are you using the very latest source code from Git/SVN? <ppawel> yes <thiago> nepomuk had a problem that it left some objects on threads without event loops <thiago> so I can't help you. Fix nepomuk first. <thiago> it must not open more than one D-Bus connection (which it used to do) and it must stop leaking threads. <ppawel> you might be right, there is a lot of multithreading in the target process, I better focus on debugging that first... <ppawel> thank you for the support though - valuable hints <ppawel> nondeterministic nature of this problem points to multithreading... <thiago> yes <thiago> I know it leaks objects and threads <thiago> fix that first <ppawel> besides I doubt that dbus has bugs of such severity <ppawel> a lot of people are using it so it would get noticed <ppawel> can I copy paste log from our chat to the bugzilla for future reference? <thiago> yes :) Created attachment 61602 [details]
stack traces of Nepomuk storage service threads
Another person (mjansen on #nepomuk-kde) also reported this problem. Attached is his gdb output which clearly shows relation to the multithreading in the Nepomuk storage/Soprano.
The hanging of the storage service is not really solved. But the spawning of thousands of nepomukindexer instances is: bug 281779. Thank you for taking the time to file a bug report. The Nepomuk project is no longer included in the KDE Software Compilation. With Plasma 5, we have replaced most of the underlying technology with Baloo and other components. Hopefully this will have addressed your concern. We encourage you to try out Plasma 5 (+Baloo) and let us know if your problem persists. |