| Summary: | "Whole words" search works in a wrong way | ||
|---|---|---|---|
| Product: | [Applications] kate | Reporter: | Victor Porton <porton> |
| Component: | search | Assignee: | KWrite Developers <kwrite-bugs-null> |
| Status: | RESOLVED NOT A BUG | ||
| Severity: | normal | CC: | a.samirh78, christof.groschke, christoph, Darren.Lissimore, m.dlubakowski, simonandric5, sunwebrw |
| Priority: | NOR | Keywords: | junior-jobs |
| Version First Reported In: | 16.08 | ||
| Target Milestone: | --- | ||
| Platform: | Other | ||
| OS: | All | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: | gitdiff | ||
|
Description
Victor Porton
2017-03-07 17:00:24 UTC
It's Kate 17.8.1 already.
If you put $ at the start or at the end of a word, the "Whole words" type of search won't find that word.
Confirmed:
- distro: KDE-Neon
- frameworks: 5.39.0
- kate 17.08.1
- Qt 5.9.1
Seems to be an artifact of how the whole words matching is done.
kateplaintextsearch.cpp: lines 53-59
...
// abuse regex for whole word plaintext search
if (m_wholeWords) {
    // escape dot and friends
    const QString workPattern = QStringLiteral("\\b%1\\b").arg(QRegExp::escape(text));
    return KateRegExpSearch(m_document, m_caseSensitivity).search(workPattern, inputRange, backwards).at(0);
}
...
If you are searching for "$Pay" - the workPattern becomes "\\b\\$Pay\\b"
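The pattern construction described above can be sketched outside of Qt. This is only an illustration: `escapeRegex` and `wholeWordPattern` are hypothetical names, and `escapeRegex` is a simplified stand-in for `QRegExp::escape()`, not its actual implementation.

```cpp
#include <string>

// Hypothetical stand-in for QRegExp::escape(): put a backslash in front of
// every regex metacharacter so the text is matched literally.
std::string escapeRegex(const std::string &text)
{
    static const std::string meta = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : text) {
        if (meta.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// Mirrors the "\\b%1\\b" construction from the excerpt: the escaped text is
// wrapped in word-boundary anchors.
std::string wholeWordPattern(const std::string &text)
{
    return "\\b" + escapeRegex(text) + "\\b";
}
```

Calling `wholeWordPattern("$Pay")` yields the regex `\b\$Pay\b`, which is the pattern the search engine then has to match.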
Following the call down through KateRegExpSearch::search (kateregexpsearch.cpp#193), you find that it boils down to a KateRegExp object (a light wrapper for QRegExp).
The escaped search string does reach the QRegExp at kateregexpsearch.cpp#435,
where the underlying QRegExp::indexIn() method is called.
You can boil the search code down to this little test app:
#include <QDebug>
#include <QString>
#include <QRegExp>

int main(void)
{
    QString needle = "$uri";
    QString haystack = "if (syswrite(OFH, $uri , $read) != $read) {";
    int index;

    qDebug() << " Needle = " << needle << endl;
    qDebug() << " Haystack = " << haystack << endl;

    // "\\b" is the regex word-boundary anchor; a single "\b" in a C++ string
    // literal would be a backspace character instead.
    QString test = QStringLiteral("\\b%1\\b").arg(QRegExp::escape(needle));
    QRegExp testqre;
    testqre.setPattern(test);
    qDebug() << " testqre.isValid() " << testqre.isValid() << endl;

    index = testqre.indexIn(haystack);
    qDebug() << " QRegExp - index: " << index << endl;
    return 0;
}
The results of which are:
Needle = "$uri"
Haystack = "if (syswrite(OFH, $uri , $read) != $read) {"
testqre.isValid() true
QRegExp - index: -1
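The same outcome can be reproduced without Qt. The sketch below swaps in `std::regex` (ECMAScript grammar) for QRegExp, which is an assumption made for portability; `firstMatchIndex` is a hypothetical helper that copies the "index or -1" convention of `QRegExp::indexIn()`.

```cpp
#include <regex>
#include <string>

// Returns the offset of the first match of `pattern` in `haystack`,
// or -1 if there is no match (same convention as QRegExp::indexIn()).
int firstMatchIndex(const std::string &pattern, const std::string &haystack)
{
    const std::regex re(pattern); // ECMAScript grammar by default
    std::smatch m;
    if (!std::regex_search(haystack, m, re))
        return -1;
    return static_cast<int>(m.position(0));
}
```

With the haystack from the test app, the pattern `\b\$uri\b` finds nothing, while dropping the leading `\b` (pattern `\$uri\b`) does find the needle. That suggests the leading word-boundary anchor, not the escaping, is what prevents the match.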
Now - Qt has moved from QRegExp to QRegularExpression due to significant issues with QRegExp's engine.
Unfortunately the KTextEditor search is still using QRegExp.
If someone can tweak the test-app above to actually find the needle ...
then a quick patch for the search system may be possible.
The true long-term solution would be to migrate the search to QRegularExpression ...
Hopefully converting to QRegularExpression fixes this issue.
The actual conversion should be relatively easy; adding to junior tasks.
I would like to fix this one, to get a start :)
Created attachment 108932: gitdiff
Hello KDE developers,
I have a little question related to migrating from QRegExp to QRegularExpression in ktexteditor. I have changed only one occurrence for now; please see the attached diff. I hope this works as a quick solution to the problem.
I would like to change every occurrence in ktexteditor. To do that I need to change KateRegExp, for example. By doing that I would touch a lot more of the code.
My question now is, what's the best way to get started here ... Change one occurrence at a time, or do them all at once ...
I appreciate the help :)
I suggest adding your patch to https://phabricator.kde.org/differential/diff/create/ and asking for feedback from the Kate developers.
I think a full port would be nice; if somebody has time, patches are welcome.
I am afraid that porting to QRegularExpression won't fix this issue. There is an inherent problem with the "\b\$" expression: if you test it with QRegularExpression, it still doesn't match. Moreover, it doesn't match in any other language that I have tested (Java, JavaScript, Python).
IIUC, with either QRegExp or QRegularExpression (or PCRE, for that matter), \w doesn't match $; Perl docs [1]: <quote> Similarly, the word boundary anchor \b matches wherever a character matching \w is next to a character that doesn't, but it doesn't eat up any characters itself </quote> so '\$\bPay\b' would work, but that's not how the "whole words" search mode works in ktexteditor.
[1] https://perldoc.perl.org/5.30.0/perlretut.html |
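The rule from the Perl documentation quoted above can be written out by hand. This is a minimal sketch assuming ASCII text; `isWordChar` and `isWordBoundary` are hypothetical helpers, not part of ktexteditor or Qt.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that \w matches in the ASCII case:
// letters, digits and the underscore.
bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// True if \b would match at offset `pos`, i.e. exactly one of the two
// neighbouring characters is a word character. Positions outside the
// string count as non-word.
bool isWordBoundary(const std::string &text, std::size_t pos)
{
    const bool left = pos > 0 && pos <= text.size() && isWordChar(text[pos - 1]);
    const bool right = pos < text.size() && isWordChar(text[pos]);
    return left != right;
}
```

For the text `pay $Pay now`, the position between the space and `$` has a non-word character on both sides, so it is not a boundary and `\b\$Pay` cannot match there. The position between `$` and `P` is a boundary, which is why a pattern with `\b` placed after the `$` would match.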