Bug 459586 - Application Launcher and KRunner should strip away stop words
Status: RESOLVED FIXED
Alias: None
Product: krunner
Classification: Plasma
Component: general
Version: 5.25.5
Platform: unspecified Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Plasma Bugs List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-24 05:17 UTC by Slavi
Modified: 2022-10-31 15:14 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Example of stop words (like "to") yielding a bunch of unhelpful keyword matches in Application Launcher (243.91 KB, image/png)
2022-09-24 05:17 UTC, Slavi

Description Slavi 2022-09-24 05:17:15 UTC
Created attachment 152380
Example of stop words (like "to") yielding a bunch of unhelpful keyword matches in Application Launcher

SUMMARY

If stop words (https://en.wikipedia.org/wiki/Stop_word) were filtered out of search queries, results would be much better.

When one is using the Application Launcher or KRunner to query for `1000 USD to EUR` (for example), the `to` part of this query matches various unrelated things like "Display Configuration", "Night Color", "Device Actions", "Autostart", "Windows Shares", etc.

While the ordering of the actual results differs (it's better in KRunner than in Application Launcher, courtesy of https://bugs.kde.org/show_bug.cgi?id=431204), both suffer from the same problem of matching things that they should not match at all.


STEPS TO REPRODUCE
1. Open Application Launcher or KRunner
2. Query for `1000 USD to EUR`

OBSERVED RESULT

There are various unnecessary results that appear to be matches to the `to` keyword.

EXPECTED RESULT

Stop words (like `to`, `the`, etc.) could be stripped away from keyword-based queries.
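
To make the request concrete, here is a minimal sketch of stop-word stripping in standard C++. It is not KRunner code: the function name `stripStopWords` and the tiny stop-word list are hypothetical, and a real list would have to be per-language. It also assumes the filtered tokens are only handed to keyword-based runners, since a unit converter would presumably still need the original query including `to`.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny English stop-word list (illustration only).
static const std::set<std::string> kStopWords = {"a", "an", "the", "to", "of", "in"};

// Lower-case an ASCII string so the stop-word lookup is case-insensitive.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Split the query on whitespace and drop every token that is a stop word.
// The surviving tokens keep their original spelling and order.
std::vector<std::string> stripStopWords(const std::string &query) {
    std::vector<std::string> kept;
    std::istringstream in(query);
    std::string token;
    while (in >> token) {
        if (kStopWords.count(toLower(token)) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

With this sketch, `stripStopWords("1000 USD to EUR")` leaves the tokens `1000`, `USD` and `EUR` for keyword matching.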

SOFTWARE/OS VERSIONS
KDE Plasma Version:  5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Comment 1 Natalie Clarius 2022-10-04 00:33:26 UTC
It looks like in the specific example you cited, in all of the not very relevant results "to" occurs inside one of the match words: "deskTOp", "moniTOr", "auTOmatic", etc. So filtering out stop words in the sense that an exact match with a common word like "to" or "the" shouldn't bring up the result wouldn't help here since 1. it is not a word match but a substring match, and always removing all results that contain "to" somewhere would be a bad idea, 2. the problem would occur similarly with common sequences of letters that don't correspond to any common words.

I do agree though that the results you're showing don't make a lot of sense for the query, so one possible solution would be to make the system settings runner fire only when the match is longer than, say, 4 letters.
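
The distinction between a word match and a substring match can be sketched as follows. This is an illustration in standard C++ under stated assumptions, not the runner's actual matching code; `containsMatch` and `wholeWordMatch` are hypothetical names, and the lower-casing is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case an ASCII string (real labels may need Unicode-aware folding).
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Case-insensitive substring ("containing") match: true if the query word
// occurs anywhere inside the label, including in the middle of a word.
bool containsMatch(const std::string &label, const std::string &word) {
    return lower(label).find(lower(word)) != std::string::npos;
}

// Case-insensitive whole-word match: the query word must equal one of the
// whitespace-separated words of the label.
bool wholeWordMatch(const std::string &label, const std::string &word) {
    std::istringstream in(lower(label));
    const std::string needle = lower(word);
    std::string w;
    while (in >> w) {
        if (w == needle) {
            return true;
        }
    }
    return false;
}
```

Here `containsMatch("Autostart", "to")` and `containsMatch("monitor", "to")` are both true while `wholeWordMatch` is false for the same inputs, so a filter that only suppresses exact stop-word matches would leave these results in place.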
Comment 2 Slavi 2022-10-04 05:10:33 UTC
(In reply to Natalie Clarius from comment #1)
> It looks like in the specific example you cited, in all of the not very
> relevant results "to" occurs inside one of the match words: "deskTOp",
> "moniTOr", "auTOmatic", etc. So filtering out stop words in the sense that
> an exact match with a common word like "to" or "the" shouldn't bring up the
> result wouldn't help here since 1. it is not a word match but a substring
> match, and always removing all results that contain "to" somewhere would be
> a bad idea, 2. the problem would occur similarly with common sequences of
> letters that don't correspond to any common words.
> 
> I do agree though that the results you're showing don't make a lot of sense
> for the query, so one possible solution would be to make the system settings
> runner fire only when the match is longer than, say, 4 letters.

I don't think people who search for "to" (only) will want to get a match for "moniTOr" or "auTOmatic". These substring match results also seem useless, don't they?

I do not expect that doing a (insert search engine name) search for "car" would give me results for "myoCARditis" - that's completely irrelevant. Prefix matches, maybe. Substring and fuzzy string matches seem pointless.
Comment 3 Natalie Clarius 2022-10-26 18:02:52 UTC
It may not seem very useful for languages like English where most self-contained chunks of meaning are separated by whitespace, but in German I'd definitely want "dekoration" to find "Fensterdekoration" (window decoration) for example, so I'm not sure it would be a good idea to remove in-/suffix matching entirely. Then again, we do have keywords for system settings that will be considered in the search. 
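
The trade-off between prefix-only and substring matching discussed in the last two comments can be sketched like this. It is a hedged illustration in standard C++, not the runner's code; `wordPrefixMatch` and `containsMatch` are hypothetical names and the case folding is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case an ASCII string (illustration only, not Unicode-aware).
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if any whitespace-separated word of the label starts with the query.
bool wordPrefixMatch(const std::string &label, const std::string &query) {
    std::istringstream in(lower(label));
    const std::string needle = lower(query);
    std::string w;
    while (in >> w) {
        // compare() returns 0 only if the first needle.size() characters of w
        // equal the needle; a word shorter than the needle never matches.
        if (w.compare(0, needle.size(), needle) == 0) {
            return true;
        }
    }
    return false;
}

// True if the query occurs anywhere in the label, including mid-word.
bool containsMatch(const std::string &label, const std::string &query) {
    return lower(label).find(lower(query)) != std::string::npos;
}
```

Prefix-only matching rejects "car" against "myocarditis", which is what comment 2 asks for, but it also rejects "dekoration" against "Fensterdekoration", which is the case comment 3 wants to keep; `containsMatch` accepts both.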

Coming back to your example though, in the actual KRunner (Alt+F2) popup and overview search, the results are correctly ordered, with the conversion as the top result. So this mostly comes down to the kickoff search having its own strange sorting logic: https://bugs.kde.org/show_bug.cgi?id=431204.
Comment 4 Natalie Clarius 2022-10-28 13:30:06 UTC
I addressed this now in https://invent.kde.org/plasma/systemsettings/-/merge_requests/165
Comment 5 Nate Graham 2022-10-31 15:14:13 UTC
Git commit 3bcf867b39b10a98e1207314003d37ba48698ab7 by Nate Graham, on behalf of Natalie Clarius.
Committed on 31/10/2022 at 15:09.
Pushed by ngraham into branch 'master'.

runner: check for containing matches only when query word is more than 3 characters long

M  +1    -1    runner/systemsettingsrunner.cpp

https://invent.kde.org/plasma/systemsettings/commit/3bcf867b39b10a98e1207314003d37ba48698ab7
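
The rule named in the commit message (containing matches only for query words longer than 3 characters) can be sketched as below. This is not the actual one-line change in runner/systemsettingsrunner.cpp, which is not shown in this report; it is a standard C++ illustration. The name `matchesKeyword` is hypothetical, and always allowing prefix matches regardless of length is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case an ASCII string (illustration only, not Unicode-aware).
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Decide whether one query word matches one keyword.
// Note: size() counts bytes, which equals characters only for ASCII input.
bool matchesKeyword(const std::string &keyword, const std::string &queryWord) {
    const std::string k = lower(keyword);
    const std::string q = lower(queryWord);
    if (q.empty()) {
        return false;
    }
    // Assumption of this sketch: a prefix match is accepted at any length.
    if (k.compare(0, q.size(), q) == 0) {
        return true;
    }
    // Containing (mid-word) matches only for words longer than 3 characters.
    return q.size() > 3 && k.find(q) != std::string::npos;
}
```

Under this rule `to` no longer matches "monitor" or "desktop", while a longer query word such as "dekoration" still matches "Fensterdekoration".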