Bug 143530 - canonically equivalent strings are not considered as such
Summary: canonically equivalent strings are not considered as such
Status: RESOLVED INTENTIONAL
Alias: None
Product: kde
Classification: I don't know
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: kdelibs bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-28 01:45 UTC by Denis Jacquerye
Modified: 2010-10-29 12:59 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Denis Jacquerye 2007-03-28 01:45:35 UTC
Version:            (using KDE Devel)
Installed from:    Compiled sources

KDE applications are not doing "the right thing" regarding Unicode strings.

Unicode defines canonically equivalent sequences of characters.
For example these are equivalent:
ẹ́ <U+0065 LATIN SMALL LETTER E + U+0323 COMBINING DOT BELOW + U+0301
COMBINING ACUTE ACCENT>
ẹ́ <U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT + U+0323
COMBINING DOT BELOW>
ẹ́ <U+1EB9 LATIN SMALL LETTER E WITH DOT BELOW + U+0301 COMBINING ACUTE
ACCENT>
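
To make the equivalence concrete, here is a minimal sketch in Python, using the standard `unicodedata` module as a stand-in for `QString::normalized`. The variable names and the `canonically_equal` helper are made up for illustration; they are not part of Qt or KDE.

```python
import unicodedata

# The three spellings listed above, written with explicit escapes.
dot_then_acute = "\u0065\u0323\u0301"  # e + dot below + acute
acute_then_dot = "\u0065\u0301\u0323"  # e + acute + dot below
precomposed = "\u1EB9\u0301"           # e with dot below + acute


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A plain code point comparison treats the spellings as different strings.
print(dot_then_acute == acute_then_dot)                    # False
# After normalization they compare equal.
print(canonically_equal(dot_then_acute, acute_then_dot))   # True
print(canonically_equal(acute_then_dot, precomposed))      # True
```

Normalizing both sides to NFC instead of NFD would give the same answers; what matters is that both operands use the same form.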

Applications should normalize strings before comparing them. This needs to be done when sorting, searching, and opening, creating or saving files.

For example, if a file contains one sequence and a query uses a canonically equivalent sequence, they should match.
Likewise, two canonically equivalent strings should be sorted as the same string.
Creating files, directories or other meaningful entities should behave as if canonically equivalent strings were the same, i.e. warn the user of a name conflict and offer to overwrite. Also, a path should access the canonically equivalent existing one (unless both exist).
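
The lookup rule for names could be sketched roughly as follows, again in Python with `unicodedata` rather than Qt. `find_equivalent_name` is a hypothetical helper, and the list of existing names is assumed to come from a directory listing; no real KDE or Qt API is implied.

```python
import unicodedata
from typing import Iterable, Optional


def find_equivalent_name(existing: Iterable[str], requested: str) -> Optional[str]:
    """Return the existing name that the requested name should resolve to.

    An exact match wins, which covers the case where both spellings exist.
    Otherwise the first canonically equivalent name is returned, or None
    if there is no conflict at all.
    """
    names = list(existing)
    if requested in names:
        return requested
    key = unicodedata.normalize("NFC", requested)
    for name in names:
        if unicodedata.normalize("NFC", name) == key:
            return name
    return None


# The directory holds the decomposed spelling; the user types the other one.
listing = ["notes.txt", "\u0065\u0301\u0323.txt"]
match = find_equivalent_name(listing, "\u1EB9\u0301.txt")
print(match is not None)  # True: warn about the conflict instead of creating a twin
```

A caller that gets a non-None result when creating a file would show the usual overwrite prompt; when opening a file, it would open the returned name.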

QString::normalized(QString::NormalizationForm_D) or QString::normalized(QString::NormalizationForm_C) should be applied to both strings before comparing them.

It might be wise to have a policy of normalizing all created filenames to NFC, for better compatibility with legacy systems.

For sorting, QString::NormalizationForm_KC should probably be used.
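
As a rough illustration of sorting on a compatibility-normalized key, here is a Python sketch using `unicodedata`, with NFKC playing the role of `QString::NormalizationForm_KC`. `nfkc_key` is a made-up name, and the ordering is by plain code point, not locale-aware collation, which a real application would still need on top of this.

```python
import unicodedata


def nfkc_key(s: str) -> str:
    """Hypothetical sort key: the NFKC form of the string."""
    return unicodedata.normalize("NFKC", s)


# "\uFB01le" starts with the single-character "fi" ligature (U+FB01).
words = ["zebra", "\uFB01le", "apple"]

# Without normalization the ligature sorts after every ASCII letter.
print(sorted(words) == ["apple", "zebra", "\uFB01le"])                 # True
# With the NFKC key it sorts where "file" would.
print(sorted(words, key=nfkc_key) == ["apple", "\uFB01le", "zebra"])   # True
```

Canonically equivalent strings also get identical keys under NFKC, so they sort as the same string.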

Please read http://www.w3.org/TR/charmod-norm/#sec-NormalizationMotivation for motivation and http://www.unicode.org/reports/tr15/ for more info.
Comment 1 Stephan Kulow 2007-03-28 09:37:49 UTC
I'm afraid the bug report is too vague. You will have to file a report (or better send in a patch) to every place where it's missing. 

And note that this feature is new to Qt4.
Comment 2 Denis Jacquerye 2007-03-28 09:45:12 UTC
Could this be a metabug to keep track of application specific instances of the bug?

I'm sorry, but I'm not a KDE developer, so don't expect me to submit patches to all KDE applications that break Unicode canonical equivalence.
Comment 3 Stephan Kulow 2007-03-28 09:47:50 UTC
There are no metabugs.
Comment 4 Denis Jacquerye 2007-03-28 10:00:14 UTC
created Bug 143539
Comment 5 Denis Jacquerye 2007-03-28 10:00:57 UTC
Bug 143364 already existed for Kate search breaking Unicode canonical equivalence.
Comment 6 Denis Jacquerye 2007-03-28 10:06:57 UTC
Opened Bug 143540 Konqueror breaking Unicode in searches
Comment 7 Denis Jacquerye 2007-03-28 10:12:15 UTC
opened Bug 143541 Konqueror breaking Unicode when creating/copying file/directories
Comment 8 Denis Jacquerye 2007-03-28 10:30:48 UTC
Opened Bug 143542 Kword Search: canonically equivalent strings do not match
Comment 9 Denis Jacquerye 2007-03-28 10:37:35 UTC
Opened Bug 143543: Kspread Search: canonically equivalent strings do not match
Comment 10 David Faure 2010-10-29 12:59:36 UTC
The bit about normalizing filenames should be submitted to Qt; currently it only normalizes on Mac OS X.