Bug 419819

Summary: Baloo ranks relevance of results undesirably
Product: [Frameworks and Libraries] frameworks-baloo Reporter: bughatter
Component: general Assignee: Stefan Brüns <stefan.bruens>
Status: RESOLVED WORKSFORME    
Severity: normal CC: nate
Priority: NOR    
Version First Reported In: 5.68.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=445825
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description bughatter 2020-04-07 21:10:15 UTC
SUMMARY
Baloo ranks the relevance of results undesirably. For example, searching for the entire exact title of a PDF book can produce many seemingly irrelevant results but not the intended one (at least not among the few most relevant results shown in the start menu).

STEPS TO REPRODUCE & OBSERVED RESULTS
I have PDFs of many textbooks on my filesystem. When I search for words from a title, e.g. "Advanced Engineering Electromagnetics", about 10 results appear, none of which is the intended textbook with the file name "C A Balanis - Advanced Engineering Electromagnetics (Second Edition)John Wiley and Sons (2012).pdf". The first result is actually "Advanced Window Behavior", which should not even be listed, given that the words "Engineering" and "Electromagnetics" are also part of the search.

If I can get a textbook I search for to appear, it is often the lowermost result, which would indicate that Baloo considers it the least relevant. Indeed, the textbooks that Baloo considers more relevant may contain the words "engineering", "electromagnetics", or "advanced". Nevertheless, in my opinion most of these results should be judged irrelevant to the search. In any case, the file with all of the searched words in its file name (not to mention in the same order and with the same casing) should appear as the first result.

This is just an example; many file searches behave like this. Including almost every word of the textbook title plus the authors' names does not necessarily make the desired result appear at all (let alone at the top), for instance when those names are mentioned in other files, even far from the other query words or referring to different people with the same name.

GNOME Tracker demonstrates the desired behavior for these kinds of searches. But I must say that the intra-file searching capabilities of Baloo are impressive. Also, Baloo appears to search the PDF metadata whereas GNOME Tracker does not.

EXPECTED RESULT
I propose that files whose names or PDF metadata match several query words be given higher relevance than files that merely contain all of the query words somewhere in their content.

Secondarily, I propose that successive word matches be given priority over dispersed word matches within a file (I do not know whether this is already the case or how it would be implemented). Successive matches within a file may approach the relevance level of dispersed matches within the file name or PDF metadata. The distinction between exact and dispersed matches may matter less when both are within the file name or PDF metadata.
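
As a rough illustration of the ranking proposed above (this is not how Baloo works, and nothing here is Baloo API), the idea can be sketched in Python. The weights, the function names, and the rule that a file missing any query word is dropped are all assumptions made for the sake of the example; `name` stands in for the file name plus any PDF metadata.

```python
import re

# Hypothetical weights, chosen only to illustrate the proposed ordering.
NAME_WEIGHT = 10.0          # per query word found in the file name / metadata
CONTENT_WEIGHT = 1.0        # per query word found only in the file content
NAME_PHRASE_BONUS = 2.0     # per word, when the query appears consecutively in the name
CONTENT_PHRASE_BONUS = 7.0  # per word, when it appears consecutively in the content


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(words, query_words):
    """True if query_words occur consecutively and in order within words."""
    n = len(query_words)
    return n > 0 and any(
        words[i:i + n] == query_words for i in range(len(words) - n + 1)
    )


def score(query, name, content=""):
    """Score one file; 0.0 means it should not be listed at all."""
    q = tokenize(query)
    name_words, content_words = tokenize(name), tokenize(content)
    name_set, content_set = set(name_words), set(content_words)
    # Assumption: every query word must match somewhere, otherwise drop the file.
    if not q or any(w not in name_set and w not in content_set for w in q):
        return 0.0
    total = sum(NAME_WEIGHT if w in name_set else CONTENT_WEIGHT for w in set(q))
    if contains_phrase(name_words, q):
        total += NAME_PHRASE_BONUS * len(q)
    elif contains_phrase(content_words, q):
        total += CONTENT_PHRASE_BONUS * len(q)
    return total


def rank(query, files):
    """files: iterable of (name, content) pairs. Returns names, best match first."""
    scored = [(score(query, name, content), name) for name, content in files]
    return [name for s, name in sorted(scored, key=lambda p: -p[0]) if s > 0]
```

With these made-up weights, the Balanis file name from the steps above scores highest for the query "Advanced Engineering Electromagnetics", a file that contains the three words consecutively in its content scores just below a file name that contains them dispersed, and "Advanced Window Behavior" is not listed because two of the query words are missing.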

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux
(available in About System)
KDE Plasma Version: 5.18.4
KDE Frameworks Version: 5.68.0
Qt Version: 5.14.2
Comment 1 Nate Graham 2020-04-08 04:55:37 UTC
What are you using to search? Dolphin? KRunner? `baloosearch`?
Comment 2 Stefan Brüns 2020-04-08 06:45:51 UTC
The only "ranking" currently done by krunner and baloosearch is sorting by modification time.

If you only want to search in titles, you have to prefix the search terms, in your case e.g. "baloosearch title:engineering title:electromagnetics". The same should work from krunner.

For other prefixes, have a look at the output of 'balooshow -x "C A Balanis - Advanced Engineering Electromagnetics (Second Edition)John Wiley and Sons (2012).pdf"'; the extracted properties are listed at the bottom, e.g. "subject:", "title:", "author:".
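
For anyone scripting this, here is a minimal sketch of how the prefixed query above could be assembled from a plain title string. The helper names are hypothetical, and lowercasing the words simply mirrors the example command; the sketch only builds the argument list and does not run `baloosearch`.

```python
import re
import shlex


def prefixed_terms(text, prefix="title"):
    """Turn free text into one 'prefix:word' term per word, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return [f"{prefix}:{w}" for w in words]


def baloosearch_command(text, prefix="title"):
    """Return the argument list for a prefixed query (not executed here)."""
    return ["baloosearch", *prefixed_terms(text, prefix)]


cmd = baloosearch_command("Engineering Electromagnetics")
print(shlex.join(cmd))  # baloosearch title:engineering title:electromagnetics
```

The resulting list could be passed to `subprocess.run` on a system where `baloosearch` is installed; other prefixes such as "author" or "subject" go in the `prefix` argument.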
Comment 3 Bug Janitor Service 2020-04-23 04:33:10 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 4 Bug Janitor Service 2020-05-08 04:33:16 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.
