SUMMARY
Verifiably indexed DOCX files do not yield content search results.

STEPS TO REPRODUCE
1. Create a DOCX file in some location that is indexed by Baloo
2. Write some unique string in the DOCX file
3. Index the file: balooctl index /path/to/file.docx
4. Open KFind and in the content tab, search for the unique string

Observed on two different Arch Linux systems. There is also a forum topic about this from last year: https://forum.kde.org/viewtopic.php?f=154&t=161505

KDE Plasma Version: 5.19.5
KDE Frameworks Version: 5.74
Qt Version: 5.15.1
That's not really a bug, I think.

KFind doesn't use baloo's search index (it predates baloo by far).

AFAIK, it doesn't have special support for certain file formats either; it basically does the same as the "grep" tool, i.e. it searches for the text verbatim in the file. And since a DOCX file is a ZIP-compressed archive, no text can be found, of course.

So maybe this can be seen as an enhancement request to support content search via baloo. I have no idea whether that would fit into KFind's design, though.
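To illustrate the grep-style point, here is a minimal Python sketch, assuming a DOCX-like ZIP built in memory (it is not a complete Word document, and the helper names are hypothetical). A verbatim byte search over the file misses the string because word/document.xml is deflate-compressed inside the archive; decompressing that member first makes the string findable.

```python
import io
import zipfile

# Hypothetical unique string, mirroring the one used later in this report.
NEEDLE = "superduperuniquestring"


def make_docx_like(text: str) -> bytes:
    """Build a minimal ZIP whose word/document.xml holds `text`.

    Not a valid Word document (no [Content_Types].xml, no rels); it only
    mimics the container layout that matters here.
    """
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>"
        "</w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def grep_like(data: bytes, needle: str) -> bool:
    """Verbatim byte search over the raw file, roughly what grep does."""
    return needle.encode("utf-8") in data


def search_inside_zip(data: bytes, member: str, needle: str) -> bool:
    """Decompress one ZIP member first, then search its text."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return needle in zf.read(member).decode("utf-8")


docx = make_docx_like(NEEDLE)
print(grep_like(docx, NEEDLE))                                # False
print(search_inside_zip(docx, "word/document.xml", NEEDLE))  # True
```

The raw search only sees compressed bytes, which is why a tool without format-specific support cannot match text stored inside such a container.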
(In reply to Wolfgang Bauer from comment #1)
> That's not really a bug I think.
>
> KFind doesn't use baloo's search index (it predates baloo by far).
>
> AFAIK, it doesn't have special support for certain file formats either, it
> basically does the same as the "grep" tool, i.e. search for the text
> verbatim in the file.
> And as a DOCX file is compressed as ZIP, no text can be found of course.

Oh, that is surprising to hear. It does find text in ODF documents, which are compressed as ZIP as well.

Does Dolphin search use Baloo's index? It doesn't work either.
(In reply to Buovjaga from comment #2)
> Does Dolphin search use Baloo's index? It doesn't work either.

Yes, it uses Baloo: https://userbase.kde.org/Dolphin

I would rather change this to be about Baloo. Sorry for the noise.
On second thought, I am closing this. I opened this to help someone else, but it seems Dolphin's content search is only broken on my system. Apparently the only problem on the original reporter's system was KFind, which, as we have now learned, should not even work with zipped files (although for some reason it does work with ODT on the reporter's system).
Dolphin uses baloo, baloo uses kfilemetadata, and kfilemetadata supports ODF and DOCX files. Zipped files are supported when the ZIP container is part of the file format itself, as is the case for the OpenDocument and Microsoft Office formats. Other archives (plain zip or e.g. any tar.*) are not extracted.

As the generic structure of both formats is very similar (ZIP file + some XML), it is strange that one works and the other does not. Please provide one of the files that does not work, if possible.
An example DOCX file for reproducing the issue is required.
Created attachment 132276
Example DOCX file

Here it is. Any ideas on how I could check why it does not work on my system but works on the other person's system?
KFM has no problem with the file, and baloo on my system has no problem finding it.

1. Check if any data can be extracted from the file:
 a) Dolphin, information panel (F11) should show "words" and "pages"
 b) Dolphin -> Properties -> Details

2. Check if baloo has stored the file information:
 $> balooshow -x path/to/file
(In reply to Stefan Brüns from comment #8)
> KFM has no problem with the file, and baloo on my system has no problem
> finding it.
>
> 1. Check if any data can be extracted from the file:
> a) dolphin, information panel (F11) should show "words" and "pages"
> b) dolphin -> properties -> details
>
> 2. Check if baloo has stored the file information:
> $> balooshow -x path/to/file

Dolphin's information panel shows the word and page count properly.

balooshow gives this:

Internal Info
Terms: Mapplication Mdocument Mofficedocument Mopenxmlformats Mvnd Mwordprocessingml T5
File Name Terms: Fbalooindextest Fdocx
XAttr Terms:

Should 'superduperuniquestring' appear there?
That's just basic indexing information. It seems the content indexer never ran. What's the output of:
$> balooctl status <path/to/file>
(In reply to Stefan Brüns from comment #10)
> Thats just basic indexing information. Seems like the content indexer never
> ran. Whats the output of:
> $> balooctl status <path/to/file>

It was indexed. Now I tried it again in a directory with fewer files, and Dolphin was able to find it. Maybe it was just taking too long to run :(

Sorry for the noise again.