Bug 143364 - Lack of unicode normalisation in search
Summary: Lack of unicode normalisation in search
Status: RESOLVED WORKSFORME
Alias: None
Product: kate
Classification: Applications
Component: search
Version: SVN
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords: triaged
Depends on:
Blocks:
 
Reported: 2007-03-22 22:25 UTC by Médéric Boquien
Modified: 2018-10-27 02:44 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Testcase (15 bytes, text/plain)
2007-03-22 22:48 UTC, Médéric Boquien

Description Médéric Boquien 2007-03-22 22:26:00 UTC
Version:            (using KDE 3.5.5)
Installed from:    SuSE RPMs

The search command in KWrite doesn't do Unicode normalisation. For instance, if one searches for a precomposed glyph, it won't be found if the document stores that glyph as a combining sequence. As an example, e with acute has two representations: the precomposed glyph (U+00E9), or e (U+0065) followed by the combining acute accent (U+0301).

The search should be independent of the normalisation and should find a letter whatever its representation in the document.
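For illustration, the two representations can be compared with Python's standard `unicodedata` module. This is only a sketch of the Unicode behaviour described above, not KWrite code; the variable names are made up for the example.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point (U+00E9)
combining = "e\u0301"    # e (U+0065) followed by combining acute (U+0301)

# A plain comparison treats the two spellings as different strings.
same_raw = precomposed == combining

# NFD decomposes the precomposed form; NFC composes the combining sequence.
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)
nfc_of_combining = unicodedata.normalize("NFC", combining)

print(same_raw)                           # False
print(nfd_of_precomposed == combining)    # True
print(nfc_of_combining == precomposed)    # True
```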
Comment 1 Médéric Boquien 2007-03-22 22:48:03 UTC
Created attachment 20074
Testcase

This file (encoded in UTF-8) contains the word école written twice. On the
first line, é is the composed glyph (e followed by a combining acute) while on
the second line it is the precomposed glyph. Searching using the precomposed
glyph only finds the second line while it should find both lines.
Comment 2 Médéric Boquien 2008-05-28 05:46:44 UTC
The problem is still present in KDE SVN trunk (r813532).
Comment 3 Médéric Boquien 2008-07-18 02:07:23 UTC
Hello.

Apparently the problem is in katetextline.cpp. In the KateTextLine::searchText method, the text and the search string are compared without normalising them first. If the two strings are canonically equivalent but in different normalisation forms (NFC vs NFD for instance), they won't match. The first step to solve the bug would be to substitute m_text with m_text.normalized(QString::NormalizationForm_C) and text with text.normalized(QString::NormalizationForm_C), for instance, so that both have the same normalisation. However, this fix is apparently not enough. I think the problem lies with the index, which is not the same depending on the normalisation. One possibility would perhaps be to compare the length of the search string before and after normalisation and adapt the index accordingly afterwards.

Thanks.
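The index problem mentioned in this comment can be shown in a few lines of Python. This is a sketch using `unicodedata` rather than QString, and the sample line is invented: the offset found in the normalised line no longer points at the right place in the original line.

```python
import unicodedata

# Invented sample line; both é are stored as e + combining acute.
line = "e\u0301cole et e\u0301cole"
needle = "\u00e9cole"

nfc_line = unicodedata.normalize("NFC", line)
nfc_needle = unicodedata.normalize("NFC", needle)

# Offsets of the two matches in the *normalised* line.
first = nfc_line.find(nfc_needle)
second = nfc_line.find(nfc_needle, first + 1)

# Where the second word really starts in the *original* line.
real_second = line.find("e\u0301cole", 1)

# Applying the normalised offset and length to the original line
# selects the wrong characters.
wrong_selection = line[second:second + len(nfc_needle)]

print(first, second, real_second)   # 0 9 10
print(repr(wrong_selection))
```

Each composed character before the match shifts the offset by one, and the match itself is one character longer in the original, so both the start and the length would need adjusting.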
Comment 4 David Bush 2010-02-20 18:59:11 UTC
I am NOT the assignee for, nor have I any interest in, the following bugs: 143364, 223151, 200577, 210685, 142832, 172473, 188115, 220614, 196085, and anything whatsoever to do with Kate. I edited my email preferences at https://bugs.kde.org/userprefs.cgi?tab=email to receive no email, but it keeps rolling in. Apparently someone is using my email. I would like to stop receiving what to me is spam. Thanks for any assistance. -David Bush
Comment 5 Christoph Cullmann 2012-11-01 16:08:05 UTC
Actually, this needs to be fixed in Qt. We need QString functions that allow us to search for stuff normalized, conversion on our own will only lead to a lot of corner case problems. Could you report this to qt-project.org? Sure other people have the same issues.
Comment 6 Denis Jacquerye 2012-11-01 16:54:14 UTC
(In reply to comment #5)
> Actually, this needs to be fixed in Qt. We need QString functions that allow
> us to search for stuff normalized, conversion on our own will only lead to a
> lot of corner case problems. Could you report this to qt-project.org? Sure
> other people have the same issues.

Do you mean QString::normalized() http://qt-project.org/doc/qt-4.8/qstring.html#normalized ?
Comment 7 Christoph Cullmann 2012-11-01 17:50:47 UTC
No, I mean functions to search/replace with normalization. With the current Qt API we can't apply a regex or string search in a normalized way. We may normalize the string we search for, but we would then need to normalize the string we search in, too, which will change the columns of matches relative to the original string. There is no sane way to implement this with the current Qt API.
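To make the column problem concrete, here is a minimal Python sketch of one way to keep a map from normalised offsets back to original columns. It is not Qt code and not a proposed patch; `normalize_with_map` and `find_normalized` are hypothetical names. It normalises one base character plus its trailing combining marks at a time, which is an assumption that does not hold for every script (Hangul jamo, for example, compose across such boundaries), so it also illustrates the kind of corner cases mentioned above.

```python
import unicodedata


def normalize_with_map(text, form="NFC"):
    """Normalise cluster by cluster and remember where each output
    character came from: (normalised, starts, ends)."""
    out, starts, ends = [], [], []
    i, n = 0, len(text)
    while i < n:
        j = i + 1
        # Extend the cluster over following combining marks (ccc != 0).
        while j < n and unicodedata.combining(text[j]) != 0:
            j += 1
        chunk = unicodedata.normalize(form, text[i:j])
        out.append(chunk)
        starts.extend([i] * len(chunk))
        ends.extend([j] * len(chunk))
        i = j
    return "".join(out), starts, ends


def find_normalized(haystack, needle, form="NFC"):
    """Return (start, end) columns in the original haystack, or None."""
    if not needle:
        raise ValueError("needle must not be empty")
    norm_hay, starts, ends = normalize_with_map(haystack, form)
    norm_needle = unicodedata.normalize(form, needle)
    pos = norm_hay.find(norm_needle)
    if pos < 0:
        return None
    last = pos + len(norm_needle) - 1
    return starts[pos], ends[last]


span = find_normalized("une e\u0301cole", "\u00e9cole")
print(span)   # (4, 10): columns in the original, decomposed line
```

Regex search would need the same mapping applied to every capture group, which is part of why doing this outside the string class gets awkward.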
Comment 8 Andrew Crouthamel 2018-09-23 02:44:43 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days, the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please set the bug status as REPORTED so that the KDE team knows that the bug is ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 9 Andrew Crouthamel 2018-10-27 02:44:09 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 30 days. The bug is now closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!