Version: 4.8 (using KDE 4.8.0)
OS: Linux

I have many PDF files that do not appear in the Nepomuk search results. I ran many search queries using keywords I know these PDFs contain, and I also searched directly for their filenames from within Dolphin, KRunner and nepoogle.

I removed my Nepomuk database, tried to index these PDF files using the nepomukindexer command, and moved them around to different folders, but it is impossible to get them indexed. For some of these PDFs, nepomukindexer doesn't return anything (so I suppose there is no error), but for others various errors are returned. The nepomukindexer exit status is always 0 though... I guess that's probably a different bug by itself.

Reproducible: Always

Steps to Reproduce:
Try to index some PDF files using the nepomukindexer command. Then use nepoogle to see if they have entered the Nepomuk database.
./nepoogle url:"file name of pdf.pdf"

One PDF that surely cannot be indexed for me (but with no error returned by nepomukindexer) is the MLN Manual, which you can get from this SourceForge link: http://mln.sourceforge.net/doc/mln-manual.pdf

Actual Results:
PDF file is not indexed, thus no results are shown for queries related to that file.

Expected Results:
All PDF files should get indexed correctly by Nepomuk.

There is a whole thread in the KDE forums with lots of information and attempted solutions, none of them successful: http://forum.kde.org/viewtopic.php?f=154&t=99385
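For anyone who wants to repeat the first reproduction step over a whole folder, here is a rough Python sketch. It runs the indexer once per file and records the exit status together with anything printed, so that "silent" files and "error output but exit status 0" files can be told apart. The helper names (run_indexer, classify) are made up for illustration, and passing the file path as the only argument to nepomukindexer is an assumption based on the description above, not a documented interface.

```python
import subprocess
import sys


def run_indexer(paths, command=("nepomukindexer",)):
    """Run the indexer command once per file and collect what it reports.

    `command` defaults to a bare `nepomukindexer <file>` call; that
    invocation is an assumption, adjust it to match your installation.
    """
    results = []
    for path in paths:
        proc = subprocess.run(
            [*command, str(path)],
            capture_output=True,
            text=True,
        )
        results.append({
            "path": str(path),
            "exit_status": proc.returncode,
            # Keep both streams: the report does not say where errors go.
            "output": (proc.stdout + proc.stderr).strip(),
        })
    return results


def classify(result):
    """Label one result so suspicious cases stand out."""
    if result["exit_status"] != 0:
        return "non-zero exit status"
    if result["output"]:
        return "output despite exit status 0"
    return "silent, exit status 0"


if __name__ == "__main__":
    # Usage: python check_indexer.py file1.pdf file2.pdf ...
    for res in run_indexer(sys.argv[1:]):
        print(f'{res["path"]}: {classify(res)}')
        if res["output"]:
            print(res["output"])
```

This only checks what the indexer itself reports. Whether a file actually ended up in the database still has to be verified separately, for example with the nepoogle query from the steps above.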
Created attachment 69048
One more of the PDFs that fails to get indexed
I have the same problem. With xmlindexer I get a lot of information about the pdfs (metadata and content), but nepomukindexer returns without printing anything. Monitoring with sopranocmd --dbus org.kde.NepomukStorage --model main monitor shows nothing. The files in question show nothing when opened in nepomukshell and show no hash in dolphin.
I guess that bugs #285128 and #234069 could be clustered into this one. This is a problem with the Strigi analyzer. In the repo there are two branches with alternative analyzers: "newPdfAnalyzer" and "popplerPdfAnalyzer". Although incomplete, both of these alternatives produce better results than the default PDF analyzer. Could any of the developers involved please take a stab at making one of these alternatives the default?
In KDE 4.10, we have moved away from Strigi and are using our own indexer based on poppler. I'm not marking this bug as fixed yet, as the indexer has not been thoroughly tested and could still use some polish. I'll mark it as fixed once I have tested it adequately.
This new PDF analyzer works quite well :)