Summary: | Lack of unicode normalisation in search | ||
---|---|---|---|
Product: | [Applications] kate | Reporter: | Médéric Boquien <mboquien> |
Component: | search | Assignee: | KWrite Developers <kwrite-bugs-null> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | christoph, fonts-bugs, kwrite-bugs-null, moyogo |
Priority: | NOR | Keywords: | triaged |
Version: | SVN | ||
Target Milestone: | --- | ||
Platform: | openSUSE | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | Testcase |
Description
Médéric Boquien
2007-03-22 22:26:00 UTC
Created attachment 20074 [details]
Testcase
This file (encoded in UTF-8) contains the word école written twice. On the
first line, é is the composed glyph, while on the second line it is the
precomposed glyph. Searching using the composed glyph finds only the second
line, while it should find both lines.
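For readers who want to reproduce the mismatch outside Kate, here is a minimal sketch in Python rather than Kate's C++. It assumes the testcase holds one decomposed and one precomposed spelling; the order of the lines and the helper names `naive_find` and `normalized_find` are illustrative assumptions, not taken from the attachment or from Kate.

```python
import unicodedata

# Precomposed spelling: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9cole"
# Decomposed spelling: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301cole"

# Two lines that render identically (line order is an assumption).
document = [decomposed, precomposed]


def naive_find(lines, needle):
    """Return indices of lines containing needle, comparing code points as-is."""
    return [i for i, line in enumerate(lines) if needle in line]


def normalized_find(lines, needle, form="NFC"):
    """Same search, but with needle and each line normalized to the same form."""
    needle = unicodedata.normalize(form, needle)
    return [
        i
        for i, line in enumerate(lines)
        if needle in unicodedata.normalize(form, line)
    ]


print(naive_find(document, precomposed))       # matches one line only
print(normalized_find(document, precomposed))  # matches both lines
```

The raw comparison sees two different code point sequences, so only the line that uses the same spelling as the search string matches. Once both sides are brought to the same normalisation form, the two spellings compare equal.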
The problem is still present in KDE SVN trunk (r813532).

Hello. Apparently the problem is in katetextline.cpp. Indeed, in the KateTextLine::searchText method, the text and the search string are compared without normalising them first. In this case, if the two strings are canonically equivalent but use different normalisation forms (NFC vs NFD, for instance), they won't match. The first step to solve the bug would be to substitute m_text with m_text.normalized(QString::NormalizationForm_C) and text with text.normalized(QString::NormalizationForm_C), for instance, so that they have the same normalisation. However, this fix is apparently not enough. I think the problem comes from the index, which is not the same depending on the normalisation. One possibility would perhaps be to compare the length of the search string before and after normalisation and adapt the index accordingly afterwards. Thanks.

I am NOT the assignee for, nor have I any interest in, the following bugs: 143364, 223151, 200577, 210685, 142832, 172473, 188115, 220614, 196085, and anything whatsoever to do with Kate. I edited my email preferences at https://bugs.kde.org/userprefs.cgi?tab=email to receive no email, but it keeps rolling in. Apparently someone is using my email. I would like to stop receiving what to me is spam. Thanks for any assistance. -David Bush

Actually, this needs to be fixed in Qt. We need QString functions that allow us to search for stuff normalized; conversion on our own will only lead to a lot of corner case problems. Could you report this to qt-project.org? Surely other people have the same issues.

(In reply to comment #5)
> Actually, this needs to be fixed in Qt. We need QString functions that allow
> us to search for stuff normalized, conversion on our own will only lead to a
> lot of corner case problems. Could you report this to qt-project.org? Sure
> other people have the same issues.

Do you mean QString::normalized() http://qt-project.org/doc/qt-4.8/qstring.html#normalized ?
No, I mean functions to search/replace normalized. We can't apply a regex or string search normalized with the current Qt API. We may normalize the string we search for, but we would then need to normalize the string we search in, too, which will change the columns of matches relative to the original string. There is no sane way to implement this with the current Qt API.

Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days, the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please set the bug status as REPORTED so that the KDE team knows that the bug is ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!

Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 30 days. The bug is now closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!
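The column problem raised in the thread (normalising the searched text shifts match positions away from the original columns) can be illustrated with a rough sketch. This is not how Kate or Qt handle it; it is a hypothetical Python approach that normalises the text one cluster at a time (a starter plus any following combining marks) and records which original span each normalised character came from. The function names are invented for illustration. It deliberately ignores cases where cluster-wise normalisation differs from whole-string normalisation, such as Hangul jamo composition, which is the kind of corner case the developer comment warns about.

```python
import unicodedata


def normalize_with_map(text, form="NFC"):
    """Normalize text cluster by cluster and return (normalized, starts, ends).

    starts[i] and ends[i] give the original [start, end) span of the cluster
    that produced normalized character i.
    """
    out, starts, ends = [], [], []
    i = 0
    while i < len(text):
        j = i + 1
        # Extend the cluster over following combining marks
        # (characters with a non-zero canonical combining class).
        while j < len(text) and unicodedata.combining(text[j]) != 0:
            j += 1
        for ch in unicodedata.normalize(form, text[i:j]):
            out.append(ch)
            starts.append(i)
            ends.append(j)
        i = j
    return "".join(out), starts, ends


def find_normalized(text, needle, form="NFC"):
    """Return the (start, end) span of the first match in the ORIGINAL text,
    or None if there is no match. Offsets count code points."""
    norm, starts, ends = normalize_with_map(text, form)
    needle = unicodedata.normalize(form, needle)
    if not needle:
        return None
    pos = norm.find(needle)
    if pos < 0:
        return None
    return starts[pos], ends[pos + len(needle) - 1]


line = "une e\u0301cole"                    # decomposed é in the document
span = find_normalized(line, "\u00e9cole")  # precomposed é in the query
print(span, repr(line[span[0]:span[1]]))
```

The returned span covers the decomposed sequence in the original line, so a highlight or replace would act on the right columns even though the comparison happened on normalised text. Regex search, case folding and starter-starter composition would all need further care.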