Hi, Baloo doesn't index some files by default, such as source code (cpp files...) and text files... I know that it is possible to remove the mimetype exclusion to make it work, but that is not the behavior someone expects from a normal search. There may be a reason for that behavior. I think everything should be indexed by default: maybe not the contents of the files, but at least the file names. Sincerely
The devil's in the details here. You probably do not want the filenames of all the millions of hidden files in the .git directories of your git repos indexed, for example. But I agree that we could perhaps index the filenames of all the filetypes excluded by mimetype, as they are typically excluded because their contents are useless to index, or so large that they blow up the index.
@Nate: https://phabricator.kde.org/D29207
.git would be excluded by default, as long as the user does not deliberately change the config.
How did I miss that!?
Git commit 24b1392e0094a954bb15c99d71cb0ccf527e88ea by Stefan Brüns.
Committed on 10/06/2020 at 23:16.
Pushed by bruns into branch 'master'.

[Indexers] Ignore name-based mimetype for initial indexing decisions

Summary:
The name based mime type is inaccurate, so it should not be used to decide if a file should be indexed. In case a specific extension should be skipped this can still be done accurately by the name based filters, e.g. instead of "image/png" "*.png" can be used, or the whole directory can be excluded.

This inaccuracy is also confusing for the user, as a file without extension will be added to the index, but adding an extension removes the file from the index. The file extension may also be ambiguous.

This also matches the current list of excluded mime types, which are source files for various languages. These blow up the full text index and thus should be excluded (by default), but just adding the file names increases the index size only marginally. The 'inability' to find files is a recurring user complaint.

Depends on D28932

Reviewers: #baloo, ngraham
Reviewed By: #baloo, ngraham
Subscribers: kde-frameworks-devel
Tags: #frameworks, #baloo
Differential Revision: https://phabricator.kde.org/D29207

M +2 -0 src/file/extractor/app.cpp
M +0 -3 src/file/firstrunindexer.cpp
M +0 -3 src/file/modifiedfileindexer.cpp
M +0 -3 src/file/newfileindexer.cpp
M +0 -3 src/file/unindexedfileiterator.cpp

https://invent.kde.org/frameworks/baloo/commit/24b1392e0094a954bb15c99d71cb0ccf527e88ea
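For readers following along, the two-stage decision described in the commit message could be sketched roughly as below. This is an illustration in Python, not Baloo's actual C++ code: the function names, the filter lists, and the excluded mimetypes are all hypothetical and chosen only to show the idea that file names are filtered by name patterns and folders, while mimetypes only decide whether the contents get full-text indexed.

```python
from fnmatch import fnmatch

# Hypothetical values for illustration only; these are not Baloo's real defaults.
EXCLUDE_NAME_FILTERS = ["*.png", "*~", "*.o"]
EXCLUDE_FOLDERS = [".git"]
EXCLUDE_CONTENT_MIMETYPES = {"text/x-c++src", "text/x-python"}


def should_index_name(path: str) -> bool:
    """Initial decision: uses only folder and name filters, never a mimetype."""
    parts = path.strip("/").split("/")
    # Skip anything that lives below an excluded folder such as .git.
    if any(folder in EXCLUDE_FOLDERS for folder in parts[:-1]):
        return False
    # Skip file names matching an excluded glob pattern such as "*.png".
    return not any(fnmatch(parts[-1], pattern) for pattern in EXCLUDE_NAME_FILTERS)


def should_index_content(path: str, content_mimetype: str) -> bool:
    """Later decision: full-text indexing is skipped for excluded mimetypes."""
    return should_index_name(path) and content_mimetype not in EXCLUDE_CONTENT_MIMETYPES
```

Under these assumptions, `src/main.cpp` would be findable by name even though its contents stay out of the full-text index, while anything below `.git` is skipped entirely.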
This is effectively all fixed by the above commit. Great job, Stefan!