SUMMARY
Hi! Thanks for making Baloo; the solid indexing features are one of the things that make Linux better suited for me. I do, however, have an issue with the Baloo indexer: it treats similar national letters (Polish ones, for example) as different, but only sometimes. While they are different letters, it is not uncommon to avoid them in filenames to save yourself some problems (admittedly, this is more of an old habit, from when code pages were a massive PITA).

Specifically, the letters 'l' and 'ł' ('l' with a stroke) are not considered similar enough (even though 'l' is sometimes used where typing 'ł' is inconvenient), yet the letters 'e' and 'ę' ('e' with a tail) are considered similar (i.e. the search results for words with 'ę', like "się", also include the phrase 'sie').

Operating System: Kubuntu 21.04
KDE Plasma Version: 5.21.4
KDE Frameworks Version: 5.80.0
Qt Version: 5.15.2
Kernel Version: 5.11.0-16-generic
OS Type: 64-bit
Graphics Platform: X11
Processors: 4 × AMD PRO A12-9800B R7, 12 COMPUTE CORES 4C+8G
Memory: 14.6 GiB of RAM
Graphics Processor: AMD Radeon R7 Graphics

KDE installed from the official repository.
Baloo relies on decomposition according to the Unicode standard. E.g. the letter 'ä' has an equivalent decomposition, 'a' + diaeresis (diaeresis: the "dots"). Likewise, 'ę' decomposes into 'e' + ogonek, which is why it matches 'e'. 'ł' has no equivalent decomposition, so it is never reduced to 'l'. You can see all the equivalents either in the Unicode standard or with KCharSelect. If you think this is wrong, please report it to the Unicode Consortium. Baloo is not able to, and thus won't, maintain a list of exceptions to the ever-evolving Unicode standard.
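
To check the decompositions without KCharSelect, here is a minimal Python sketch using the standard `unicodedata` module. It only illustrates the Unicode rule described above; it is not Baloo's code (Baloo is written in C++), and `fold_marks` is a hypothetical helper name chosen for this example.

```python
import unicodedata


def fold_marks(text: str) -> str:
    """Decompose text (NFD) and drop the combining marks.

    Hypothetical helper: an approximation of "match the base letter",
    not Baloo's actual implementation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for letter in ("ä", "ę", "ł"):
    # decomposition() returns an empty string when Unicode defines no mapping
    mapping = unicodedata.decomposition(letter) or "(none)"
    print(f"U+{ord(letter):04X} decomposition: {mapping}; folded: {fold_marks(letter)}")
```

With this helper, "się" folds to "sie", while 'ł' is left unchanged because Unicode defines no decomposition for it.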