Version: (using KDE 4.4.1)
OS: Linux
Installed from: Debian testing/unstable Packages

I had a PDF file from which pdftotext apparently extracted malformed UTF-8 text, which causes an assert failure in strigi (LineEventAnalyzer::handleUtf8Data). Strigi, in turn, aborts with a SIGABRT. Nepomuk neither logs an error nor indicates in any other way that something went wrong, which makes it rather hard to realize that there is a problem in the first place and even harder to track down the file causing it.

Currently, Nepomuk just restarts strigi and starts over from the beginning. Instead, it would be much better to

- Notify the user that a file is causing problems.
- Exclude this file from indexing as long as it remains unchanged.
- Resume indexing after the offending file.

See also https://sourceforge.net/tracker/?func=detail&atid=856302&aid=2979889&group_id=171000 for a ticket asking strigi to be more helpful when encountering malformed UTF-8.
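The three suggestions above can be sketched roughly as follows. This is only an illustration in Python, not Nepomuk or strigi code: `index_files`, `file_signature`, and `strict_utf8_analyze` are hypothetical names, and the analyzer here raises an exception in-process, whereas the real failure is a SIGABRT in a separate process, so an actual supervisor would additionally need to know which file was being analyzed when the crash happened.

```python
import os


def file_signature(path):
    """Describe a file's current state by size and modification time."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def index_files(paths, analyze, skipped):
    """Index every path, skipping files that failed before and are unchanged.

    analyze: hypothetical callable standing in for the extractor; it is
             assumed to raise an exception on input it cannot handle.
    skipped: dict mapping path -> signature recorded when the file failed.
    Returns a list of (path, error message) pairs for new failures, which
    a caller could use to notify the user.
    """
    failures = []
    for path in paths:
        sig = file_signature(path)
        if skipped.get(path) == sig:
            # Known offender and still unchanged: exclude it.
            continue
        try:
            analyze(path)
            # Success: forget any older failure recorded for this path.
            skipped.pop(path, None)
        except Exception as exc:
            # Remember the offender, then resume with the next file.
            skipped[path] = sig
            failures.append((path, str(exc)))
    return failures


def strict_utf8_analyze(path):
    """Stand-in analyzer that rejects files that are not valid UTF-8."""
    with open(path, "rb") as handle:
        handle.read().decode("utf-8")  # raises UnicodeDecodeError if malformed
```

Keying the exclusion on size and modification time means a file gets another chance as soon as it is edited or replaced, which matches "as long as it remains unchanged". In practice the `skipped` dict would have to be persisted across restarts.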
This is similar to bug #232395, which I reported some days ago, except that in my case the Nepomuk strigi service complains loudly about too many crashes in ~/.xsession-errors. Did you grep your ~/.xsession-errors for the word "crash"? But you put emphasis on how Nepomuk handles those crashes. I already did so as well in bug #232398. I think your bug report contains two bug reports. I suggest you add here all the information on the UTF-8 related crashes you encounter, as I am not yet sure whether you are seeing a duplicate of bug #232395; your description sounds different. I also suggest you add your suggestions on how Nepomuk should handle those crashes to bug #232398, or report a new wish if your suggestions differ.
Martin, I don't agree with your assessment. Your problem and bug report are concerned with behavior that occurs in the storage backend Nepomuk uses. In contrast, this bug report is about a problem in the frontend used for extracting data from files.
OK, it seems you have a clearer understanding. Point taken. The bugs are linked; may Sebastian Trueg or some other strigi / Nepomuk developer finally decide on similarity.
I agree that this needs to be done since there are too many cases in which strigi crashes. Are there any takers for this bug?
*** Bug 232402 has been marked as a duplicate of this bug. ***
*** Bug 232395 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 232398 ***