Bug 455865 - Quick Open: search for contiguous string
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: general
Version: 22.04.1
Platform: openSUSE Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-23 20:33 UTC by Grósz Dániel
Modified: 2022-06-28 18:37 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Current Quick Open search (249.63 KB, image/png)
2022-06-24 02:36 UTC, Yerrey Dev
After proposed change (253.78 KB, image/png)
2022-06-24 02:36 UTC, Yerrey Dev
Screenshots of behavior in 22.04.1, master (da4b519d2) and VS Codium (366.60 KB, image/png)
2022-06-26 23:32 UTC, Grósz Dániel

Description Grósz Dániel 2022-06-23 20:33:25 UTC
Please make it easier to search for a string as a contiguous substring of a file name in the Quick Open tool.

Currently, the tool displays any file whose name contains the entered characters in the same order, not necessarily as a contiguous substring. This is sometimes useful, but it makes it difficult to search for a short word in a big project, because it sometimes also finds dozens of unrelated files whose names happen to contain the word's letters.

You could add an option to the context menu, but I think it would be even better to just list files that contain the text as a substring first (if there are any), and then the files that only contain the characters non-contiguously.
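
The ordering proposed above could be sketched roughly as follows. This is a minimal standalone C++ illustration, not Kate's code: it uses std::string instead of QString, handles ASCII case folding only, and the helper names (toLowerAscii, isSubstringMatch, isSubsequenceMatch, rankTwoTier) are made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-case an ASCII string so that comparisons are case-insensitive.
std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if `pattern` occurs in `name` as one contiguous run of characters.
bool isSubstringMatch(const std::string &name, const std::string &pattern)
{
    return toLowerAscii(name).find(toLowerAscii(pattern)) != std::string::npos;
}

// True if the characters of `pattern` occur in `name` in order, gaps allowed.
bool isSubsequenceMatch(const std::string &name, const std::string &pattern)
{
    const std::string n = toLowerAscii(name);
    const std::string p = toLowerAscii(pattern);
    std::size_t pi = 0;
    for (std::size_t ni = 0; ni < n.size() && pi < p.size(); ++ni) {
        if (n[ni] == p[pi]) {
            ++pi;
        }
    }
    return pi == p.size();
}

// Hypothetical helper: keep only matching names and list contiguous matches first.
// std::stable_partition preserves the incoming order inside each of the two tiers.
std::vector<std::string> rankTwoTier(const std::vector<std::string> &names, const std::string &pattern)
{
    std::vector<std::string> matches;
    for (const std::string &name : names) {
        if (isSubsequenceMatch(name, pattern)) {
            matches.push_back(name);
        }
    }
    std::stable_partition(matches.begin(), matches.end(), [&](const std::string &name) {
        return isSubstringMatch(name, pattern);
    });
    return matches;
}
```

For example, given the names setup_tools_core.cpp, datastore.h, main.cpp and AppStore.cpp and the pattern "store", this sketch lists datastore.h and AppStore.cpp ahead of setup_tools_core.cpp and drops main.cpp.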

SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20220622
KDE Plasma Version: 5.25.0
KDE Frameworks Version: 5.95.0
Qt Version: 5.15.2
Comment 1 Yerrey Dev 2022-06-24 02:35:01 UTC
Even adding something basic into kate/apps/lib/quickopen/katequickopen.cpp filterAcceptsRow() such as:

```
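// Boost names that contain the pattern as a contiguous substring.
if (name.contains(fileNameMatchPattern, Qt::CaseInsensitive)) {
    score += 100;
    // Boost files that are already open once more.
    score += sm->isOpened(sourceRow) * 100;
}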
```

Would bump exact matches above the fuzzy searches, and further the ones that are open above those. Even with a flat 100 score increase, in some specific situations the fuzzy match has higher score than the exact match, so the number could be even higher. This would only fix the Quick Open search, so I'm not sure if there is a way to fix this more generally or even in the fuzzy_match itself. Pictures for reference of this change attached below.
Comment 2 Yerrey Dev 2022-06-24 02:36:02 UTC
Created attachment 150109
Current Quick Open search
Comment 3 Yerrey Dev 2022-06-24 02:36:42 UTC
Created attachment 150110
After proposed change
Comment 4 Waqar Ahmed 2022-06-24 07:03:36 UTC
> Even adding something basic into kate/apps/lib/quickopen/katequickopen.cpp filterAcceptsRow() such as:
> ```
> if (name.contains(fileNameMatchPattern, Qt::CaseInsensitive)) {
>             score += 100;
>             score += (sm->isOpened(sourceRow))*100;
> }
> ```
> Would bump exact matches above the fuzzy searches, and further the ones that are open above those.

This is not acceptable because it will massively slow down things.

> Currently, the tool displays any file that contains the characters entered in the same order, not necessarily as a contiguous substring. This is sometimes useful, but it makes it difficult to search for a short word in a big project, because it sometimes also find dozens of files unrelated to the word whose name happens to contain its letters.

Contiguous matches are ranked higher if the match is at the beginning of the string. But if the match isn't at the beginning, it may not get ranked as high, because there might be abbreviation matches that get ranked higher. For short strings, abbreviation matches are more likely to happen, so contiguous matches get pushed down in the list unless they are at the beginning.

So, in short, it is kind of working as expected. However, any concrete suggestions for improvements are always welcome. It would also be nice if you could compare the results of Sublime/VScode on the same project.
Comment 5 Waqar Ahmed 2022-06-24 11:36:10 UTC
Git commit 7cd1ef981fbf465f68e2d238d4142917868cb9e9 by Waqar Ahmed.
Committed on 24/06/2022 at 11:35.
Pushed by waqar into branch 'master'.

Fix cancelling out of unmatchedLettersPenalty

This needed to be a multiplication, not an addition as addition
unconditionally adds points to the score messing up the results.

M  +1    -1    apps/lib/quickopen/katequickopen.cpp

https://invent.kde.org/utilities/kate/commit/7cd1ef981fbf465f68e2d238d4142917868cb9e9
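
The commit message does not show the changed line, so the following only illustrates why the two operators behave so differently; it is not the actual diff. It assumes a penalty of -1 per unmatched letter, and the function names are hypothetical.

```cpp
// Assumed value for illustration: each unmatched letter costs one point.
constexpr int unmatchedLetterPenalty = -1;

// Addition: the penalty is merely offset by the letter count, so the result
// is positive for any count above 1 and effectively adds points to the score.
constexpr int penaltyByAddition(int unmatchedLetters)
{
    return unmatchedLetterPenalty + unmatchedLetters;
}

// Multiplication: the penalty scales with the letter count and stays negative.
constexpr int penaltyByMultiplication(int unmatchedLetters)
{
    return unmatchedLetterPenalty * unmatchedLetters;
}
```

Under these assumptions, 10 unmatched letters give +9 with addition but -10 with multiplication, so a term built with the wrong operator has the opposite sign from the one intended.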
Comment 6 Waqar Ahmed 2022-06-24 11:37:11 UTC
Please try again with latest master if possible, the linked commit fixes a bug that should improve this.
Comment 7 Yerrey Dev 2022-06-26 21:03:21 UTC
(In reply to Waqar Ahmed from comment #6)
> Please try again with latest master if possible, the linked commit fixes a
> bug that should improve this.

There is no difference. As I understand it, that change only increases the score for exact matches in the root, which wasn't the issue here. I tried benchmarking the code I posted previously, with some improvements, on a project with 240,000 files, and you're right: there is a very small but noticeable delay between typing something in the search and it registering. Maybe someone smarter than me knows how to get the same effect with a negligible performance hit, as I feel that exact matches to your search should always be prioritized above fuzzy results.
Comment 8 Waqar Ahmed 2022-06-26 21:12:50 UTC
> There is no difference, as I understand

Did you actually try, or did you just see the change in the code and assume the result? And btw, no, you misunderstood the change.

If you actually did try, can you give me a sample test where an exact match fails? Note that the search in the linked image, i.e. "liface", is producing correct results as per the algorithm. An abbreviation match is preferred in that case to the match at the end of the filename.
Comment 9 Yerrey Dev 2022-06-26 21:21:17 UTC
(In reply to Waqar Ahmed from comment #8)
> Did you actually try or did you just see the change in the code and assume
> the result?
> If you actually did try, can you give me a sample test where an exact match
> fails? Note that the search in the linked image, i.e. "liface", is
> producing correct results as per the algorithm. Abbreviation match is
> preferred in that case to the match at the end of the filename.

Yes, I tested it and when I said "exact match" what I mean is a continuous string. The "liface" example is the exact one I tried, as in my opinion a continuous string should always be shown above non-continuous strings. If you have typed something into the search, I would expect it to recommend me results that are a match to what I'm searching instead of 1 matching letter here, 2 here etc. I don't know if my expectations are totally out of the ordinary.
Comment 10 Christoph Cullmann 2022-06-26 21:29:20 UTC
For me the master branch seems to behave well in this regard.

E.g. if I search for some XXXX, it seems, at least for me, that I first get matches containing the full XXXX (perhaps case-insensitively) and only the fuzzy matches below that.

But perhaps I did something wrong (or special).
Comment 11 Grósz Dániel 2022-06-26 23:32:03 UTC
Created attachment 150177
Screenshots of behavior in 22.04.1, master (da4b519d2) and VS Codium

I tried it with the latest master (da4b519d2, after 7cd1ef981fbf465f68e2d238d4142917868cb9e9). It changes the results, but it's only a marginal improvement.

In the attached screenshot montage, I'm searching for the word "store" in a project with 1484 files. In 22.04.1, the first substring match isn't even visible without scrolling. In da4b519d2, the first substring match is the 16th result, and it's still preceded by many results where "store" can't be reasonably considered an abbreviation of the file name. What most results preceding the substring matches have in common is that the *first* letter of the search string is the first letter of the file name.

In contrast, VS Codium (a build of the VS Code sources) behaves like I'd expect.
Comment 12 Christoph Cullmann 2022-06-28 18:37:25 UTC
Git commit ef62ffdda41904ccc0edcecf9deb322f8b298bfa by Christoph Cullmann, on behalf of Waqar Ahmed.
Committed on 28/06/2022 at 18:31.
Pushed by cullmann into branch 'master'.

Score fully in sequence matches higher if pattern len >= 4

If pattern is >= 4 and an exact match happens, score it higher.

Doing it for smaller strings might lead to worse results as the shorter
a string is, the more likely it is that it will be part of some other
string.

M  +7    -0    apps/lib/kfts_fuzzy_match.h

https://invent.kde.org/utilities/kate/commit/ef62ffdda41904ccc0edcecf9deb322f8b298bfa
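
The rule described in this commit message could look roughly like the sketch below. It is not the code from kfts_fuzzy_match.h: the function names, the representation of a match as a list of matched positions, and the bonus value of 100 are all assumptions made for illustration. Only the length threshold of 4 comes from the commit message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bonus value; the real constant is not stated in this thread.
constexpr int kFullSequenceBonus = 100;
// Threshold taken from the commit message: pattern length >= 4.
constexpr std::size_t kMinPatternLength = 4;

// True if every matched position directly follows the previous one,
// i.e. the pattern was matched as one contiguous run.
bool isFullyInSequence(const std::vector<std::size_t> &matchPositions)
{
    for (std::size_t i = 1; i < matchPositions.size(); ++i) {
        if (matchPositions[i] != matchPositions[i - 1] + 1) {
            return false;
        }
    }
    return !matchPositions.empty();
}

// Adds the bonus only when the pattern is long enough and matched contiguously.
// There is one matched position per pattern character, so size() is the pattern length.
int applySequenceBonus(int baseScore, const std::vector<std::size_t> &matchPositions)
{
    if (matchPositions.size() >= kMinPatternLength && isFullyInSequence(matchPositions)) {
        return baseScore + kFullSequenceBonus;
    }
    return baseScore;
}
```

In this sketch, a five-letter pattern such as "store" matched at positions 4 to 8 receives the bonus, while a three-letter pattern matched contiguously, or a five-letter pattern matched with a gap, keeps its base score.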