SUMMARY
Regular expressions in Kate search are not optimized for non-Western languages (or, indeed, for any language that uses non-ASCII letters). For example, the /\w/ regex matches only ASCII word characters (letters, digits, and underscore), not all Unicode letter characters as it does in most modern programming languages and editors. /\d/ likewise covers only ASCII digits, not digits in other scripts. This makes it exceedingly difficult to write a regex for languages with non-Latin scripts, or even for languages like French that use a good number of non-ASCII characters. POSIX character classes are implemented but also support only ASCII characters. It would be great if Unicode classes/properties were supported.

STEPS TO REPRODUCE
1. Write any text that contains non-ASCII letters, e.g., "café القهوة".
2. Try to match words using /\w+/ (obviously a very simplistic example, but imagine writing any regex without being able to use \w, \d, or [a-zA-Z]).

OBSERVED RESULT
Only the letters `caf` are matched.

EXPECTED RESULT
`café` and `القهوة` should be matched.

SOFTWARE/OS VERSIONS
Windows:
macOS:
Linux/KDE Plasma: (available in About System)
KDE Plasma Version:
KDE Frameworks Version:
Qt Version:

ADDITIONAL INFORMATION
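To illustrate the observed vs. expected behavior, here is a minimal sketch using Python's `re` module. This is only an analogy: Kate's search is built on Qt's `QRegularExpression` (PCRE-based), not Python. In Python, the `re.ASCII` flag restricts `\w` and `\d` to ASCII, which reproduces the observed result; without it, both classes are Unicode-aware, which matches the expected result. Treating `re.ASCII` as the counterpart of leaving out Qt's `UseUnicodePropertiesOption` is an assumption for illustration, and the helper names `words` and `digits` are hypothetical.

```python
import re

# Sample text from the report; "\u00e9" is the precomposed "é".
TEXT = "caf\u00e9 القهوة"


def words(text, ascii_only):
    """Return all runs of word characters.

    ascii_only=True mimics the ASCII-only behavior reported in Kate.
    """
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\w+", text, flags)


def digits(text, ascii_only):
    """Return every single digit character found in text."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(r"\d", text, flags)


if __name__ == "__main__":
    print(words(TEXT, ascii_only=True))   # ['caf']
    print(words(TEXT, ascii_only=False))  # ['café', 'القهوة']
    # U+0663 is ARABIC-INDIC DIGIT THREE.
    print(digits("\u0663", ascii_only=True))   # []
    print(digits("\u0663", ascii_only=False))  # ['٣']
```

The same single flag toggles both `\w` and `\d`, which is why one missing option can break both classes at once.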
Yeah, it seems we missed setting a needed flag during the QRegExp => QRegularExpression port. https://invent.kde.org/frameworks/ktexteditor/-/issues/10
Thanks for the quick action!
Git commit 675eaa6eebdbdf5437b7d150ae907283cb6ccb81 by Kåre Särs.
Committed on 07/03/2021 at 09:38.
Pushed by cullmann into branch 'master'.

S&R: Add UseUnicodePropertiesOption to regexps

To make regular expressions work properly with Unicode add UseUnicodePropertiesOption option (Search & Replace plugin)
Related: bug 433673

M  +4    -2    addons/search/plugin_search.cpp

https://invent.kde.org/utilities/kate/commit/675eaa6eebdbdf5437b7d150ae907283cb6ccb81
Fixes in KTextEditor are there now, too: https://invent.kde.org/frameworks/ktexteditor/commit/fb35c7fd42ec6576121f3dc8cb59896133c4e433 Thanks for your report! We really just missed that; we now have some unit tests for it, too.
Impressive speed and teamwork on this issue - on a Sunday! Thank you very much to all involved! Will these changes be reflected in the nightly installer for Windows? Or does it take a while before they are included?
The next Frameworks release, 5.81, will include the fixes to the KTextEditor part, which means the Binary Factory will have them in a little over a month. The application fixes should be visible directly in the nightly builds there.