Bug 462787 - Filesearch in KRunner fails to give relevant result
Status: RESOLVED DUPLICATE of bug 434589
Alias: None
Product: krunner
Classification: Plasma
Component: filesearch
Version: 5.26.4
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2022-12-08 15:16 UTC by Andrea Panontin
Modified: 2023-01-05 16:23 UTC


Attachments
File (clearly present on the filesystem) failing to be shown in a KRunner query (624.15 KB, image/png)
2022-12-08 15:16 UTC, Andrea Panontin
Same file easily found using search in Dolphin (233.20 KB, image/png)
2022-12-08 15:17 UTC, Andrea Panontin
Lots of irrelevant results in the query for "commutative algebra with a view" (up to case, an exact match of a filename) (435.78 KB, image/png)
2022-12-09 14:52 UTC, Andrea Panontin
Lots of irrelevant results in the query for "commutative algebra with a view" (up to case, an exact match of a filename) (456.78 KB, image/png)
2022-12-09 14:57 UTC, Andrea Panontin

Description Andrea Panontin 2022-12-08 15:16:39 UTC
Created attachment 154433 [details]
File (clearly present on the filesystem) failing to be shown in a KRunner query

I got into the habit of looking for files in the Start menu on Windows (due to the clunkiness of its file manager), and I found it rather efficient.
Since coming back to KDE I have tried to replicate this workflow, but I find it hard to do.

My workflow is: I am studying mathematics, so I often need to check certain results in my sources (i.e. PDF/DjVu files).
I have the files already on my computer and (as far as I am aware) indexed by Baloo.
I open either Kickoff or KRunner and type some keywords which should match the filename of the desired PDF,
hoping to see the PDF in the search results.
Most of the time, though, the desired PDF is not shown; instead I get random suggestions about system settings or other things which are clearly not related to algebraic geometry, commutative algebra, or any other technical term.
As an added note: usually (though not always) the filenames of my PDF files follow this pattern: "Surname, Name; Surname, Name - Title.pdf".
I sometimes look for a title and sometimes for a name (or both), depending on what comes to mind first.


STEPS TO REPRODUCE
1. Open KRunner or Kickoff
2. Type part of a pdf filename
3. Be disappointed by the lack of relevant results

OBSERVED RESULT
The relevant files are not shown, but random ones plus other results are.

EXPECTED RESULT
The file I am looking for is shown (ideally in first position, so that it can be opened by just pressing Enter, though I have no idea whether this choice would conflict with other workflows; on Windows, though, it worked like this and I liked the workflow).

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Archlinux
KDE Plasma Version: 5.26.4
KDE Frameworks Version: 5.100.0
Qt Version: 5.15.7

ADDITIONAL INFORMATION
I have collected just one screenshot, but I can take more in the future if you want more examples. This happens most of the time when I use KRunner to look for a file. Note, though, that when I make the same search in Dolphin, the file is usually shown at the top of the list, so it should not be an indexing problem.
Comment 1 Andrea Panontin 2022-12-08 15:17:57 UTC
Created attachment 154434 [details]
Same file easily found using search in Dolphin
Comment 2 Natalie Clarius 2022-12-08 23:59:04 UTC
Do you have the desktop search plugin enabled in the Plasma Search system settings? If you don't, you won't get any file results.

Do you have content indexing enabled in file search system settings? This can unfortunately produce lots of not very relevant results.
Comment 3 Alexander Lohnau 2022-12-09 06:58:09 UTC
Also, please check if the specific folder (or its parent folders) is indexed.

Calling "baloosearch <your search term>" on the command line might help triage the problem to see if it is in the runner or baloo.
Comment 4 Andrea Panontin 2022-12-09 14:50:24 UTC
I have the search plugin enabled for KRunner (the one for files). Indeed, I do get results most of the time; I chose a poor example for the screenshot, I am sorry.
The file I am looking for is in an indexed folder; indeed, with different queries I can find it.

I have tried some extra queries for this particular file, so here is the specific setup:
I am looking for the following file (which again, is indexed):
/home/andrea/Documents/Universität/Coursework/PhD/Basis/Commutative Algebra/Eisenbud, David - Commutative Algebra with a View Toward Algebraic Geometry.pdf

Now, if I run
baloosearch Eisenbud, David - Commutative Algebra
or
baloosearch "Eisenbud, David - Commutative Algebra"

I get no results

If, instead, I look for (again using the command baloosearch) any of the strings "Eisenbud", "Eisenbud, David", "Eisenbud Commutative" or any other subset of words of the title (with no punctuation), I find it.

Now my remarks specialize:
1. It appears that as soon as I type "-" in my search query I get no results from Baloo; I guess it's a bug in Baloo proper, more than in the plugins (even though if I make the same search from Dolphin I find the desired file).
2. In all of the previous examples I always find the document I am looking for, but it's drowned in results where (I imagine) the words I am looking for are just part of the content of the file (I imagine in the reference section of all of these files).
3. When I look for these results in KRunner, the files search plugin is more often than not low in the search results (and the file is so low in the long list of results that it is not even shown).

Now, here are my thoughts (but I imagine this topic might have been discussed already, so I expect there to be reasons behind the behaviours I am observing):
1. Well, I assume this is just something related to regular expressions. With Dolphin's search, though, I find results even if I write "-" in my search query. I think it would be more user-friendly to have the same behaviour in KRunner and Kickoff.
2. To be fair, it has not yet happened to me that I looked for a file by searching for its content. It is (at least for me, though I certainly cannot generalize this to a common behaviour) way more likely that I will search for a file by its filename. And in any case, I assume that if I am searching for the content of a file it's because I forgot its name, and I probably don't have files named that way on my filesystem. All of this to say that I would prioritize filename matches over content matches in baloosearch, and in particular in the KRunner plugin.
3. Especially if I have a match with a filename, I expect this result to be at the top of the list, not beneath a long list of results which appear to have nothing to do with my query, as is the case in the two new screenshots I attached. (Now, I realize that a solution to such a complaint is probably not easy, as it would require comparing priorities or relevance among different KRunner plugins, and apparently [sadly for me] photos and text documents have priority over PDF documents. Still, I think that a good filename match should be prioritized over the fact that in my master's thesis I write the words "commutative", "algebra", "with", etc. a bunch of times [here I am referring to the long list of .tex files that appear for the query "commutative algebra with a view", as in one of the new screenshots].)
Comment 5 Andrea Panontin 2022-12-09 14:52:40 UTC
Created attachment 154457 [details]
Lots of irrelevant results in the query for "commutative algebra with a view" (up to case, an exact match of a filename)
Comment 6 Andrea Panontin 2022-12-09 14:57:39 UTC
Created attachment 154458 [details]
Lots of irrelevant results in the query for "commutative algebra with a view" (up to case, an exact match of a filename)
Comment 7 tagwerk19 2022-12-09 16:27:57 UTC
(In reply to Andrea Panontin from comment #4)
> 1. It appears that as soon as I type "-" in my search query I get no results
> from Baloo; I guess it's a bug in Baloo proper, more than in the plugins (even
> though if I make the same search from Dolphin I find the desired file).
Have a look at Bug 388857 and Bug 434589, does seem to catch people searching for "quoted text - with hyphens" :-/
Comment 8 Andrea Panontin 2022-12-09 16:41:09 UTC
(In reply to tagwerk19 from comment #7)
> (In reply to Andrea Panontin from comment #4)
> > 1. It appears that as soon as I type "-" in my search query I get no results
> > from Baloo; I guess it's a bug in Baloo proper, more than in the plugins (even
> > though if I make the same search from Dolphin I find the desired file).
> Have a look at Bug 388857 and Bug 434589, does seem to catch people
> searching for "quoted text - with hyphens" :-/

You are definitely right, that part of the report is another instance of these bugs.
Now, more than a bug report, this is kind of a wishlist (I don't know whether this is intended behaviour or not).
Should I make any changes to the title or metadata to better reflect the current status?
Comment 9 tagwerk19 2022-12-09 16:51:33 UTC
Also catches searches for track names, as in "01 - Always In My Head".

See whether the kludge of searching for
    Eisenbud,_David_-_Commutative_Algebra 
works for you, as per:
    https://bugs.kde.org/show_bug.cgi?id=438850#c1

Replacing the spaces with underscores also means you're searching for a phrase rather than a set of words - you'd get hits for "Eisenbud, David" and not "David Eisenbud"
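
A minimal sketch of the kludge in Python, in case you want to script it. It only rewrites the query string; the helper name `underscore_query` is made up for illustration and is not part of Baloo, and you would still paste the result into baloosearch or KRunner yourself.

```python
def underscore_query(fragment: str) -> str:
    """Replace spaces with underscores so the words are searched as one phrase.

    Hypothetical helper for illustration only; it is not part of Baloo.
    """
    return fragment.strip().replace(" ", "_")


# Turn a fragment of the filename into the underscore form of the query.
print(underscore_query("Eisenbud, David - Commutative Algebra"))
# prints: Eisenbud,_David_-_Commutative_Algebra
```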
Comment 10 tagwerk19 2022-12-09 17:03:54 UTC
(In reply to Andrea Panontin from comment #8)
> Now, more than a bug report, this is kind of a wishlist (I don't know
> whether this is intended behaviour or not).
Difficult question :-)

My guess is it's more of an "unfortunate result" of the way things fit together. 

> Should I make any changes to the title or metadata to reflect more properly
> the current status?
I'll also reference Bug 407664. We probably should mark this as a duplicate (of one or the other of the earlier reports...)
Comment 11 Natalie Clarius 2022-12-15 14:36:04 UTC
If you never try to find files by content anyway, have you considered simply disabling content indexing? Likewise, if you never search for photos, videos and audio files by file name, you might consider excluding these file types from indexing. I have the same use case as you, and this significantly mitigated the issue for me.

The main problem is that Baloo doesn't order the found files by relevance at all; it simply sorts them by date. The KRunner plugin just takes over the ranking from Baloo, except split by file type. This means that, especially with content indexing enabled, lots of only remotely relevant results (such as the search term occurring somewhere in the references, as you suggest) may outrank even exact name matches, and if the KRunner result list is too long, some results may be dropped entirely. It doesn't discriminate by file type, though: photos aren't ranked higher than documents per se. The relevance of a category is determined by the relevance of its individual results, and this in turn is determined by the time stamp in the case of Baloo search.

IMO in the long run Baloo should get proper sorting, but maybe in the meantime we could add some basic relevance ranking logic in the runner plugin, such as giving lower relevance when the search term does not occur in the file name.
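
To make the idea concrete, here is a rough sketch in Python rather than in the runner's actual code. It assumes the runner receives a plain list of paths in the order Baloo returned them; the function names `filename_relevance` and `rank_results` and the scoring rule (fraction of query words found in the file name) are assumptions for illustration, not existing KRunner or Baloo API.

```python
import re
from pathlib import PurePosixPath


def filename_relevance(path: str, query: str) -> float:
    """Fraction of query words that occur as words in the file name (0.0 to 1.0).

    Hypothetical scoring rule, not taken from KRunner or Baloo.
    """
    name_words = set(re.findall(r"\w+", PurePosixPath(path).name.casefold()))
    terms = re.findall(r"\w+", query.casefold())
    if not terms:
        return 0.0
    matched = sum(1 for term in terms if term in name_words)
    return matched / len(terms)


def rank_results(paths: list, query: str) -> list:
    """Put file name matches first; ties keep the order the paths arrived in."""
    # sorted() is stable, also with reverse=True, so results with equal
    # relevance stay in the order given (assumed here to be Baloo's order).
    return sorted(paths, key=lambda p: filename_relevance(p, query), reverse=True)
```

With this rule, a file that only mentions the words in its content gets relevance 0.0 and sinks below any file whose name contains at least one of the query words, while the incoming order still decides between results with equal relevance.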
Comment 12 Natalie Clarius 2022-12-15 14:49:00 UTC
To make it clear, I agree that the sorting of files retrieved with Baloo is very bad, and that the proper solution is for developers to fix the sorting rather than for users to work around it by reducing their index. But until such an improvement exists, maybe these suggestions can make life easier for you.
Comment 13 Nate Graham 2023-01-04 20:23:52 UTC

*** This bug has been marked as a duplicate of bug 434589 ***
Comment 14 tagwerk19 2023-01-05 16:23:47 UTC
(In reply to Natalie Clarius from comment #12)
> ... sorting of files retrieved with Baloo is very bad ...
Baloo knows "what words" are used "where" in "which files". That's pretty low level and you'd need to build anything more sophisticated as a layer above this.

I can imagine you could do a series of searches, looking for the exact phrase first, and combine the results:

    A "filename" search, for the exact phrase
    A content search, for the exact phrase

    A "filename" search, for the collection of terms
    A content search, for the collection of terms

You could factor in ratings, maybe also a search for tags (xattr or embedded) or look at the full filename for matches in the parent foldername(s).

Problem is, I don't see that these would give more than marginal improvements...
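
For what it's worth, the series of searches above could be sketched like this in Python. This is only an illustration under loud assumptions: a plain dict mapping path to text content stands in for the index, no Baloo API is used, and the name `tiered_search` and the four tier numbers are made up here.

```python
from pathlib import PurePosixPath


def tiered_search(files: dict, query: str) -> list:
    """Return matching paths, best tier first.

    Tiers (hypothetical, following the four searches listed above):
      0: exact phrase in the file name
      1: exact phrase in the content
      2: all terms somewhere in the file name
      3: all terms somewhere in the content
    `files` maps path -> content and stands in for a real index.
    """
    phrase = " ".join(query.casefold().split())
    terms = phrase.split()
    if not terms:
        return []

    def tier(path, content):
        name = PurePosixPath(path).name.casefold()
        body = " ".join(content.casefold().split())
        if phrase in name:
            return 0
        if phrase in body:
            return 1
        if all(term in name for term in terms):
            return 2
        if all(term in body for term in terms):
            return 3
        return None  # no match in any tier

    hits = [(tier(path, content), path) for path, content in files.items()]
    hits = [(t, path) for t, path in hits if t is not None]
    hits.sort(key=lambda hit: hit[0])  # stable: ties keep insertion order
    return [path for _, path in hits]
```

Each file lands in the best tier it qualifies for, so the combined list has no duplicates. Ratings, tags or parent folder names could be added as further tiers in the same way.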