| Summary: | "Whole words" search works in a wrong way | | |
|---|---|---|---|
| Product: | [Applications] kate | Reporter: | Victor Porton <porton> |
| Component: | search | Assignee: | KWrite Developers <kwrite-bugs-null> |
| Status: | RESOLVED NOT A BUG | | |
| Severity: | normal | CC: | a.samirh78, christof.groschke, christoph, Darren.Lissimore, m.dlubakowski, simonandric5, sunwebrw |
| Priority: | NOR | Keywords: | junior-jobs |
| Version: | 16.08 | | |
| Target Milestone: | --- | | |
| Platform: | Other | | |
| OS: | All | | |
| Latest Commit: | | Version Fixed In: | |
| Attachments: | gitdiff | | |
Description
Victor Porton
2017-03-07 17:00:24 UTC
It's Kate 17.8.1 already.

If you put $ at the start or at the end of a word, the "Whole words" type of search won't find that word.

Confirmed.
- distro: KDE-Neon
- frameworks: 5.39.0
- kate 17.08.1
- Qt 5.9.1

This seems to be an artifact of how the whole-words matching is done.

kateplaintextsearch.cpp: lines 53-59

    ...
    // abuse regex for whole word plaintext search
    if (m_wholeWords) {
        // escape dot and friends
        const QString workPattern = QStringLiteral("\\b%1\\b").arg(QRegExp::escape(text));
        return KateRegExpSearch(m_document, m_caseSensitivity).search(workPattern, inputRange, backwards).at(0);
    }
    ...

If you are searching for "$Pay", the workPattern becomes "\\b\\$Pay\\b" (written as a C++ string literal; the regex itself is \b\$Pay\b).

Following the call through KateRegExpSearch::search (kateregexpsearch.cpp#193), you find that it boils down to a KateRegExp object (a light wrapper for QRegExp). The escaped search string does reach the QRegExp at kateregexpsearch.cpp#435, where the underlying QRegExp::indexIn() method is called.

You can boil the search code down to this little test-app:

    #include <QDebug>
    #include <QString>
    #include <QRegExp>

    int main(void)
    {
        QString needle = "$uri";
        QString haystack = "if (syswrite(OFH, $uri , $read) != $read) {";
        int index;

        qDebug() << " Needle = " << needle << endl;
        qDebug() << " Haystack = " << haystack << endl;

        // Same construction as kateplaintextsearch.cpp. The backslashes must be
        // doubled: "\b" in a C++ literal is a backspace, not a word boundary.
        QString test = QStringLiteral("\\b%1\\b").arg(QRegExp::escape(needle));
        QRegExp testqre;
        testqre.setPattern(test);
        qDebug() << " testqre.isValid() " << testqre.isValid() << endl;

        // indexIn() returns -1 when there is no match.
        index = testqre.indexIn(haystack);
        qDebug() << " QRegExp - index: " << index << endl;
        return 0;
    }

The results of which are:

    Needle = "$uri"
    Haystack = "if (syswrite(OFH, $uri , $read) != $read) {"
    testqre.isValid() true
    QRegExp - index: -1

Qt has since moved from QRegExp to QRegularExpression due to significant issues with QRegExp's engine. Unfortunately, the KTextEditor search is still using QRegExp. If someone can tweak the test-app above to actually find the needle, then a quick patch for the search system may be possible. The true long-term solution would be to migrate the search to QRegularExpression.

Hopefully converting to QRegularExpression fixes this issue. The actual conversion should be relatively easy; adding to junior tasks.

I would like to fix this one, to get a start :)

Created attachment 108932
gitdiff
Hello KDE developers,
I have a little question related to migrating from QRegExp to QRegularExpression in ktexteditor. I have changed only one occurrence for now; please see the attached diff. I hope this works as a quick solution to the problem.
I would like to change every occurrence in ktexteditor. To do that I would need to change KateRegExp, for example, and by doing that I would touch a lot more of the code.
My question now is: what's the best way to get started here? Change one occurrence at a time, or do them all at once?
I appreciate the help :)
I suggest adding your patch to https://phabricator.kde.org/differential/diff/create/ and asking the Kate developers for feedback. I think a full port would be nice; if somebody has time, patches are welcome.

I am afraid that porting to QRegularExpression won't fix this issue. There is an inherent problem with the "\b\$" expression: if you test it with QRegularExpression, it still doesn't match. What is more, it doesn't match in any other language that I have tested (Java, JavaScript, Python).

IIUC, with either QRegExp or QRegularExpression (or PCRE, for that matter), \w doesn't match $. From the Perl docs [1]:
<quote> Similarly, the word boundary anchor \b matches wherever a character matching \w is next to a character that doesn't, but it doesn't eat up any characters itself </quote>
So '\$\bPay\b' would work, but that's not how the "whole words" search mode works in ktexteditor.

[1] https://perldoc.perl.org/5.30.0/perlretut.html
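
To make the \b point above concrete, here is a minimal sketch that reproduces the behaviour without Qt. It is an assumption-laden stand-in, not ktexteditor code: it uses std::regex (ECMAScript grammar) in place of QRegExp, and the helper names escapeRegex, isWordChar, indexIn, naiveWholeWord and edgeAwareWholeWord are all made up for illustration. naiveWholeWord mirrors the "\\b%1\\b" construction quoted from kateplaintextsearch.cpp; edgeAwareWholeWord is only one hypothetical way a whole-words pattern could avoid the problem, by adding \b only on sides where the needle itself starts or ends with a word character.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Escape regex metacharacters so the needle is matched literally.
// Hypothetical stand-in for QRegExp::escape(), not the Qt implementation.
std::string escapeRegex(const std::string &text)
{
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : text) {
        if (meta.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// Rough ASCII equivalent of what \w matches: letters, digits, underscore.
bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Index of the first match, or -1 if there is none
// (the same convention QRegExp::indexIn() uses).
int indexIn(const std::string &pattern, const std::string &haystack)
{
    std::smatch m;
    if (!std::regex_search(haystack, m, std::regex(pattern)))
        return -1;
    return static_cast<int>(m.position(0));
}

// The construction described in the thread: always wrap the needle in \b.
// For "$uri" this yields \b\$uri\b, which needs a word character directly
// before the '$' and therefore cannot match " $uri".
std::string naiveWholeWord(const std::string &needle)
{
    return "\\b" + escapeRegex(needle) + "\\b";
}

// Hypothetical alternative (an assumption, not what ktexteditor does):
// add \b only on the sides where the needle has a word character.
std::string edgeAwareWholeWord(const std::string &needle)
{
    assert(!needle.empty());
    std::string pattern;
    if (isWordChar(needle.front()))
        pattern += "\\b";
    pattern += escapeRegex(needle);
    if (isWordChar(needle.back()))
        pattern += "\\b";
    return pattern;
}
```

With the haystack from the test-app, indexIn(naiveWholeWord("$uri"), haystack) returns -1, while both the hand-written pattern \$\buri\b and edgeAwareWholeWord("$uri") find the needle. The edge-aware variant has a trade-off: for "$uri" it would also match inside "a$uri", since nothing constrains what precedes the '$'.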