Bug 436791 - baloo treats similar letters as different - follows Unicode standard
Summary: baloo treats similar letters as different - follows Unicode standard
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.80.0
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
 
Reported: 2021-05-08 19:36 UTC by Amanda99
Modified: 2021-05-08 19:48 UTC



Description Amanda99 2021-05-08 19:36:38 UTC
SUMMARY

Hi!
Thanks for making baloo; the solid indexing features are one of the things that make Linux better suited for me.

I do, however, have some issues with the baloo indexer.

Sadly, it sometimes treats similar letters (national ones, such as Polish) as different. While they are different, it is not uncommon to avoid them in filenames to save yourself some problems (admittedly, this is more of an old habit, from when code pages were a massive PITA). Anyway, the letters 'l' and 'ł' ('l' with a stroke) are not considered similar enough (even though 'l' is sometimes used when typing 'ł' is inconvenient), yet the letters 'e' and 'ę' ('e' with a tail) are considered similar (i.e. the search results for words with 'ę', like "się", also include the phrase 'sie').

Operating System: Kubuntu 21.04
KDE Plasma Version: 5.21.4
KDE Frameworks Version: 5.80.0
Qt Version: 5.15.2
Kernel Version: 5.11.0-16-generic
OS Type: 64-bit
Graphics Platform: X11
Processors: 4 × AMD PRO A12-9800B R7, 12 COMPUTE CORES 4C+8G
Memory: 14.6 GiB of RAM
Graphics Processor: AMD Radeon R7 Graphics

KDE installed from the official repository
Comment 1 Stefan Brüns 2021-05-08 19:48:03 UTC
Baloo relies on decomposition according to the Unicode standard. E.g. the letter 'ä' has an equivalent decomposition 'a + diaeresis' (diaeresis: "dots"), and likewise 'ę' decomposes to 'e + ogonek', which is why it matches 'e'. 'ł' has no equivalent decomposition.
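
The decomposition behaviour described here can be checked outside Baloo. The following is only an illustrative sketch using Python's standard `unicodedata` module, not Baloo's actual code. The helper name `fold_marks` is made up for this example, and it is an assumption that stripping combining marks after decomposition approximates what the indexer does.

```python
import unicodedata


def fold_marks(text: str) -> str:
    """Decompose text canonically (NFD) and drop the combining marks.

    Hypothetical helper for illustration; not taken from Baloo.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks such as
    # the diaeresis (U+0308) or the ogonek (U+0328).
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Show the decomposition mapping Unicode defines for each letter.
for ch in "äęł":
    mapping = unicodedata.decomposition(ch) or "no decomposition"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")

print(fold_marks("się"))   # 'ę' folds to 'e'
print(fold_marks("łódź"))  # 'ó' and 'ź' fold, 'ł' is left unchanged
```

Under these assumptions "się" folds to "sie", while "łódź" becomes "łodz": the 'ł' survives because Unicode defines no decomposition for it.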

You can see all the equivalents either in the Unicode standard, or with KCharSelect.

If you think this is wrong, please report it to the Unicode Consortium. Baloo is not able to, and thus won't, maintain a list of exceptions to the ever-evolving Unicode standard.