Summary: | Indexer loops on MSOFFICE files | ||
---|---|---|---|
Product: | [Unmaintained] nepomuk | Reporter: | Hrvoje Senjan <hrvoje.senjan> |
Component: | fileindexer | Assignee: | Nepomuk Bugs Coordination <nepomuk-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | me, nepomuk-bugs, stephan.diestelhorst |
Priority: | NOR | ||
Version: | git master | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | http://commits.kde.org/nepomuk-core/fcb3df91a5824b25741042ef0837cf28665deeb3 | Version Fixed In: | |
Sentry Crash Report: | |||
Attachments: | Fixes pageCount and wordCount problem |
Description
Hrvoje Senjan
2013-05-28 11:39:52 UTC
Confirmed.

I know about this. I've brought up the issue on the nepomuk mailing list. We can either change the ontologies or fix the indexer.

I should also add some code to make sure that faulty indexers do not cause the indexer to loop forever.

(In reply to comment #1)
> Confirmed.
>
> I know about this. I've brought up the issue on the nepomuk mailing list. We
> can either change the ontologies or fix the indexer.

OK, i guess ontologies can be changed, as we already need new release for 4.11

> I should also add some code to make sure that faulty indexers do not cause
> the indexer to loop forever.

Idea(not sure if possible/how hard to implement): if indexer fails to index unchanged file for x times, stop trying to index it...

> > I should also add some code to make sure that faulty indexers do not cause
> > the indexer to loop forever.
> Idea(not sure if possible/how hard to implement): if indexer fails to index
> unchanged file for x times, stop trying to index it...

It's not too hard. I have been meaning to implement it for some time now. Maybe I'll do it this week.
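
The idea discussed above (give up on an unchanged file after x failed attempts) could be sketched roughly as follows. This is only an illustration in Python, not the Nepomuk implementation: `FailureTracker`, `index_all`, and the `max_failures` parameter are hypothetical names, and the file's modification time stands in for "unchanged".

```python
from dataclasses import dataclass, field


@dataclass
class FailureTracker:
    """Counts failed indexing attempts per (path, mtime) pair (hypothetical helper)."""
    max_failures: int = 3
    _failures: dict = field(default_factory=dict)

    def should_index(self, path: str, mtime: float) -> bool:
        # A changed mtime produces a new key, so a modified file gets a fresh budget.
        return self._failures.get((path, mtime), 0) < self.max_failures

    def record_failure(self, path: str, mtime: float) -> None:
        key = (path, mtime)
        self._failures[key] = self._failures.get(key, 0) + 1


def index_all(files, extract, tracker):
    """Run one indexing pass over (path, mtime) pairs; return the paths indexed."""
    indexed = []
    for path, mtime in files:
        if not tracker.should_index(path, mtime):
            continue  # this unchanged file already failed too often
        try:
            extract(path)
        except Exception:
            tracker.record_failure(path, mtime)
        else:
            indexed.append(path)
    return indexed
```

With `max_failures=2`, a file whose extractor always raises is attempted in two passes and skipped afterwards, until its modification time changes.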
(In reply to comment #3)
> > > I should also add some code to make sure that faulty indexers do not cause
> > > the indexer to loop forever.
> > Idea(not sure if possible/how hard to implement): if indexer fails to index
> > unchanged file for x times, stop trying to index it...
>
> It's not too hard. I have been meaning to implement it for some time now.
> Maybe I'll do it this week.

Great! :-)

Also, at least for docx, nfo#pageCount needs adjusting:

<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#pageCount> has a rdfs:domain of <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument>. <nepomuk:/res/9b4a50a1-7c0b-4fe9-9146-1abf4588c09c> only has the following types <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#InformationElement>, <http://www.w3.org/2000/01/rdf-schema#Resource>, <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>, <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#DataObject>, <http://www.w3.org/2000/01/rdf-schema#Resource>, <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Document>, <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#InformationElement>, <http://www.w3.org/2000/01/rdf-schema#Resource>

Created attachment 80119
Fixes pageCount and wordCount problem

Could you please test? I don't seem to have many msoffice documents.
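
For readers unfamiliar with the error quoted above, the rejected write can be pictured as a simple rdfs:domain check. The following is a simplified Python sketch, not the actual store code: `DOMAINS`, `REPORTED_TYPES`, and `domain_violations` are hypothetical names, and the check compares against a flat set of types without any subclass inference.

```python
NFO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# rdfs:domain of nfo#pageCount, as quoted in the error message.
DOMAINS = {NFO + "pageCount": NFO + "PaginatedTextDocument"}

# Types the error message lists for the resource (duplicates collapsed).
REPORTED_TYPES = {
    NIE + "InformationElement",
    RDFS + "Resource",
    NFO + "FileDataObject",
    NIE + "DataObject",
    NFO + "Document",
}


def domain_violations(types, properties):
    """Return the properties whose rdfs:domain is missing from the resource's types."""
    return [p for p in properties if p in DOMAINS and DOMAINS[p] not in types]


violations = domain_violations(REPORTED_TYPES, [NFO + "pageCount"])
```

Here `violations` contains nfo#pageCount because nfo#PaginatedTextDocument is not among the reported types. Either typing the resource as a paginated document or not emitting the property at all would make the list empty.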
Sure, i'll test it ;-)

Yup, works :-)
Just the page count seems to be always at 1

You'll need to file a separate bug for that and maybe attach a document I can use to test it out.

Committing the patch above.

Git commit fcb3df91a5824b25741042ef0837cf28665deeb3 by Vishesh Handa.
Committed on 28/05/2013 at 15:29.
Pushed by vhanda into branch 'master'.

Office2007Extractor: Only add pageCount and wordCount for documents

The ontologies do not support it for presentations and spreadsheets

M +17 -14 services/fileindexer/indexer/office2007extractor.cpp

http://commits.kde.org/nepomuk-core/fcb3df91a5824b25741042ef0837cf28665deeb3

This still seems to happen with the latest KDE 4.11, is the patch part of the release? I think the functionality of skipping a file when choking on it earlier should be implemented, too.

(In reply to comment #10)
> This still seems to happen with the latest KDE 4.11, is the patch part of
> the release? I think the functionality of skipping a file when choking on
> it earlier should be implemented, too.

Just inspected the broken files (one PPTX and one ODT). Both are corrupted files, properly detected by running the file command on them. I have tried to add their names to the ignore list in Nepomuksettings, but that does not prevent them from being indexed. Also renaming the files to foo.odt.BROKEN and then adding *.BROKEN to the ignore list does not help.

Stefan, could you please report a new bug and attach the documents to reproduce there?