Bug 445825 - Baloo indexer gives more priority to content rather than file name
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.88.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2021-11-20 16:44 UTC by saif1988
Modified: 2023-07-06 19:41 UTC


Attachments
Strange results (108.31 KB, image/png)
2021-11-20 16:44 UTC, saif1988

Description saif1988 2021-11-20 16:44:36 UTC
Created attachment 143777
Strange results

SUMMARY
As you can see in the picture, the results from Baloo look completely irrelevant at first glance. It seems to be weighting matches in file contents above matches in the file name. For further discussion and context, see the Reddit post in which this bug originated: https://www.reddit.com/r/kde/comments/qwxmje/how_can_i_do_so_kde_file_search_brings_the/

STEPS TO REPRODUCE
Reproducing it is tricky, as we would need specific files that contain the keywords we search for. In this case, when searching for these keywords, the files containing them appear first, and the files that actually have the keywords in the file name never show up, although listing those first is the assumed default. Check the image.

OBSERVED RESULT
In the picture, you can see that the keyword being searched for brings up very strange results.

Of course, disabling "Content Index" solves it, but that is not an answer or a solution. Content Index is important but should not come first, IMHO. Also, disabling Content Index in KDE does not work well: the user needs to disable it, purge the index, log out, and re-enable Baloo for the changes to take effect. But this would be a separate issue.


EXPECTED RESULT
The first results should always be files whose filename contains the searched keyword


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.23.3
KDE Plasma Version: 5.23.3
KDE Frameworks Version: 5.88.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
X11, EndeavourOS
Comment 1 tagwerk19 2021-11-20 19:18:56 UTC
The first thing to get out of the way is to check whether you are actually using baloo, as different distributions have different defaults.

Open Dolphin and the search dialog (Ctrl-F). You could get a dialog something like:
    https://bugsfiles.kde.org/attachment.cgi?id=137170
In this case, Dolphin will be doing the searching

If the dialog looks like:
    https://bugsfiles.kde.org/attachment.cgi?id=137169
Dolphin will ask baloo for the results (these attachments are from Bug 435119)

Note that both dialogs give you options to select "Filename" and "Content". If you want to be sure that you are getting just filename matches, select "Filename" (and refresh if you have doubts...)

You can query the baloo index from the command line with baloosearch - as is mentioned in your Reddit thread - to get filename matches, try:
    baloosearch filename:codex

If you try
    baloosearch codex
you'll get hits that include filenames and content.

With baloosearch, the results are listed according to modification time, most recent first.
In Dolphin, it depends on how you are sorting.
In KRunner, you get the most recent first...
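
To illustrate why a content match can end up above a filename match when results are ordered purely by modification time, here is a rough sketch. It does not call Baloo at all: grep and ls stand in for the index, and the function name, file names and timestamps are made up for the demonstration.

```shell
#!/bin/sh
# Emulation only: grep and ls stand in for the Baloo index; nothing here calls Baloo.
demo_dir=$(mktemp -d)

# One file matches "codex" by name, the other only by content.
echo "Hello Penguin" > "$demo_dir/test-codex.txt"
echo "Hello Codex"   > "$demo_dir/test-penguin.txt"

# Make the content match the more recently modified file (made-up timestamps).
touch -t 202111201000 "$demo_dir/test-codex.txt"
touch -t 202111201100 "$demo_dir/test-penguin.txt"

# Hypothetical helper: list files matching the term in name OR content,
# newest first. Relies on the demo file names containing no whitespace.
search_newest_first() {
    term=$1
    dir=$2
    for f in $(ls -t "$dir"); do
        case $(printf '%s' "$f" | tr 'A-Z' 'a-z') in
            *"$term"*) echo "$f"; continue ;;
        esac
        if grep -qi -- "$term" "$dir/$f"; then
            echo "$f"
        fi
    done
}

results=$(search_newest_first codex "$demo_dir")
echo "$results"
rm -rf "$demo_dir"
```

Here test-penguin.txt (content match, modified later) is listed before test-codex.txt (filename match), without any weighting of content over names.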
Comment 2 saif1988 2021-11-20 19:23:25 UTC
Thank you @tagwerk19
Yes, this is using baloo. In the reddit post, we followed various debugging steps in which we verified that Baloo was prioritizing results based on the contents of the files.

So the biggest question is: are we happy with KDE search (in launchers) returning the results I show in the screenshot?

Because if we think that the solution to this problem is to manually type "filename:codex" to bring up the result, then this bug report would be discarded as a design choice. I really hope it isn't, because to me returning such "poor" search results is clearly an unintended bug.

In my screenshot I am searching through a Full Screen Launcher, so Dolphin's search has nothing to do with this issue. As mentioned, my fix for the moment was to disable baloo's content indexing.
Comment 3 tagwerk19 2021-11-20 22:22:55 UTC
(In reply to saif1988 from comment #2)
> Yes, this is using baloo. 
The test I tried was - pausing a second or so between each to allow baloo to keep up:

    $ echo "Hello Penguin" > test-codex.txt
    $ echo "Hello Codex" > test-penguin.txt

    $ baloosearch penguin
    $ baloosearch codex

and then:

    $ echo "Hello Codex" > test-penguin.txt

    $ echo "Hello Penguin" > test-codex.txt

    $ baloosearch penguin
    $ baloosearch codex

I was getting the "last file written" listed first...

(done on Neon Testing)

> Because if we think that the solution to this problem is to manually typing
> "filename:codex" to bring the result, then this bug report would be
> discarded as a design choice. I really hope it doesn't, because to me it
> clearly is an unintended bug to return such "poor" search results.
I've just looked through "System Settings > Search > Krunner" to see if there was a configuration option to do "just" filename search. I don't see one and maybe that would be useful...

I thought I saw a comment somewhere about "mixed case" searches. There has been some confusion/uncertainty with krunner results; see Bug 388857 and scroll down to the 14th comment.
Comment 4 saif1988 2021-11-20 23:05:54 UTC
The Option "Also index file content" -- can be found in System Settings > (Workspace) Search > "File Search" - Disable the checkbox and baloo would skip file contents for indexing. It is not in KRunner.

Regarding your test, I am not sure it is representative. Baloo might weight word repetition and other criteria. It would be better to review it from a developer's point of view to see the actual/expected behavior.

Thanks
Comment 5 tagwerk19 2021-11-21 09:03:01 UTC
(In reply to saif1988 from comment #4)
> ... Better to review it from a
> development's point of view to see the actual/expected behavior ...
I think this is touched on in Bug 419819; have a look at
    https://bugs.kde.org/show_bug.cgi?id=419819#c2

For System Settings, I was hoping for a setting that would allow content indexing to be done while krunner queries only filenames.
Comment 6 Stefan Brüns 2023-03-18 16:02:07 UTC
Several incorrect assumptions:

1. The Baloo indexer does not prioritize content at all
2. KRunner prioritizes by modification time
3. You can limit your searches by prefixing the term with either "content:" or "filename:"
Comment 7 Stefan Brüns 2023-07-06 19:41:04 UTC
The search can be limited by prefixing the search with "filename:". If this is not sufficient, a wishlist bug should be filed for KRunner.
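
For command-line use, the two prefixes mentioned in this thread could be wrapped in small shell helpers. This is only a convenience sketch: the function names are hypothetical, and only baloosearch and the "filename:" / "content:" prefixes come from the comments above.

```shell
#!/bin/sh
# Hypothetical helper names; only `baloosearch` and the two prefixes
# ("filename:" and "content:") are taken from the thread.

# Search file names only.
search_filename() {
    baloosearch "filename:$1"
}

# Search indexed file content only.
search_content() {
    baloosearch "content:$1"
}
```

With these defined, `search_filename codex` runs the same query as `baloosearch filename:codex`.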