Bug 324617

Summary: KDE text search (e.g. filtering lists, krunner) to enhance diacritics matching
Product: [Plasma] krunner
Reporter: Radek Koníček <kozyla>
Component: filesearch
Assignee: baloo-bugs-null
Status: CONFIRMED ---
Severity: wishlist
CC: alexander.lohnau, bugseforuns, eddymcv, gaantonio, lukas, niccolo.venerandi, plasma-bugs, vivians88, volkangezer
Priority: NOR
Keywords: usability
Version: 5.19.2
Target Milestone: ---
Platform: Arch Linux
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Radek Koníček 2013-09-07 13:06:30 UTC
I suggest that search dialogs such as krunner, and filter fields across KDE in general, support enhanced diacritics matching, so that non-diacritic input matches all diacritic variants of the text.

Reproducible: Always

Steps to Reproduce:
1. Run KDE in a language whose alphabet contains diacritics (English does not! :|), such as a Slavic language.
2. Open any text filter dialog, such as System Settings or krunner.
3. Look for an entry containing the string "systém" or "system".
4. Type "system".

Actual Results:  
Only the "system" string matches.

Expected Results:  
"system", "systém" and other variants (ě, ë) all match.

This feature exists in Windows 7.
Comment 1 Lukas Kucharczyk 2020-06-09 11:04:34 UTC
This issue still exists, and it is confusing because no other platform I've used differentiates between something like "system" and "systém". Is there any reason why it shouldn't be as easy to solve as stripping the diacritics from words by converting them to ASCII?
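
For illustration only, the stripping idea raised in this comment could be sketched as follows. This is not KRunner or Baloo code: the function names strip_diacritics and folded_contains are hypothetical, and the sketch assumes that decomposing the text (Unicode NFD) and dropping the combining marks is an acceptable stand-in for "converting to ASCII".

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a precomposed letter such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left so the result is in NFC form.
    return unicodedata.normalize("NFC", stripped)


def folded_contains(query: str, candidate: str) -> bool:
    """Case-insensitive substring match that ignores diacritics."""
    return strip_diacritics(query).casefold() in strip_diacritics(candidate).casefold()


if __name__ == "__main__":
    print(strip_diacritics("systém"))
    print(folded_contains("pisma", "Písma"))
```

Note that this is not a true ASCII conversion: letters without a canonical decomposition, such as "ł", pass through unchanged, so a real implementation would need to decide how to treat them.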
Comment 2 Lukas Kucharczyk 2020-07-04 07:53:33 UTC
Sorry for my previous comment, it sounded too confrontational.
Comment 3 Lukas Kucharczyk 2020-07-04 08:09:55 UTC
Tested it just now and both "system" and "systém" work. However, searching for "pisma" ("Písma" means "fonts" in Czech) finds nothing, while searching for "font" does find "Písma". So it seems that diacritics are accounted for in some cases, while in other cases the keywords are not found.
Comment 4 Alexander Lohnau 2020-07-10 20:25:32 UTC
*** Bug 316077 has been marked as a duplicate of this bug. ***
Comment 5 Alexander Lohnau 2020-07-11 15:26:58 UTC
*** Bug 414689 has been marked as a duplicate of this bug. ***
Comment 6 Alexander Lohnau 2020-07-12 06:27:02 UTC
*** Bug 328763 has been marked as a duplicate of this bug. ***
Comment 7 Alexander Lohnau 2020-07-12 06:28:15 UTC
A similar patch has been made to baloo quite some time ago:
https://invent.kde.org/frameworks/baloo/commit/59318e9694c0847bcaa5e71a4fbadde877e7a33e
Comment 8 Alexander Lohnau 2020-08-31 15:57:22 UTC
*** Bug 426017 has been marked as a duplicate of this bug. ***
Comment 9 Alexander Lohnau 2020-10-31 09:16:08 UTC
I am not sure how this should be implemented. Should the diacritics be removed, like in the baloo patch, or should we make sure that we check both the stripped and the normal variant for matches?

Maybe a user whose language actually uses diacritics can comment :)
Comment 10 veggero 2020-10-31 09:52:39 UTC
Hi! I think it is necessary to also search a non-stripped version. This is because the meaning can change, leading to different results, especially in file searches. As an example, Italian "e" translates to "and", while "è" translates to "is".
Comment 11 Lukas Kucharczyk 2020-10-31 10:15:55 UTC
I agree with the above comment. Both need to be searched.
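
As a rough sketch of the "search both variants" approach discussed in the last three comments: the code below checks the unmodified query first and falls back to a diacritic-stripped comparison, listing exact matches ahead of stripped-only ones. The names fold and search are hypothetical, the ordering is just one possible design, and letting a query that contains diacritics also match unaccented text is an assumption that the thread does not settle.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks and case-fold (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def search(query: str, candidates: list[str]) -> list[str]:
    """Return exact matches first, then matches found only after stripping."""
    exact, folded_only = [], []
    query_exact = unicodedata.normalize("NFC", query).casefold()
    query_folded = fold(query)
    for candidate in candidates:
        candidate_exact = unicodedata.normalize("NFC", candidate).casefold()
        if query_exact in candidate_exact:
            # Diacritics agree, so "e" and "è" are kept apart here.
            exact.append(candidate)
        elif query_folded in fold(candidate):
            # Matches only once diacritics are ignored on both sides.
            folded_only.append(candidate)
    return exact + folded_only


if __name__ == "__main__":
    print(search("system", ["Systém", "System Settings", "Písma"]))
```

With this ordering, a query of "e" still finds "è" but lists the exact "e" matches first, which keeps the Italian distinction visible without hiding the stripped results.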