Bug 377342 - "Whole words" search works in a wrong way
Summary: "Whole words" search works in a wrong way
Status: RESOLVED NOT A BUG
Alias: None
Product: kate
Classification: Applications
Component: search (show other bugs)
Version: 16.08
Platform: Other All
: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords: junior-jobs
Depends on:
Blocks:
 
Reported: 2017-03-07 17:00 UTC by Victor Porton
Modified: 2019-08-08 22:38 UTC (History)
7 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
gitdiff (1.04 KB, patch)
2017-11-17 22:42 UTC, Christof Groschke
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Victor Porton 2017-03-07 17:00:24 UTC
When I do "Whole words" search for $Pay variable in a Perl file, it finds nothing (even though the Perl file contains such variable).

Workaround: Use plain text or regex search.
Comment 1 sunwebrw 2017-09-28 13:19:17 UTC
Its Kate 17.8.1 already
If you put $ at the start or at the end of the word it won't find the word with "Whole words" type of search.
Comment 2 Darren Lissimore 2017-10-08 06:15:30 UTC
Confirmed -
- disto: KDE-Neon 
- frameworks: 5.39.0
- kate  17.08.1 
- Qt 5.9.1

Seems to be an artifact of how the whole words matching is done.

kateplaintextsearch.cpp: lines 53-59
...
    // abuse regex for whole word plaintext search
    if (m_wholeWords) {
        // escape dot and friends
        const QString workPattern = QStringLiteral("\\b%1\\b").arg(QRegExp::escape(text));

        return KateRegExpSearch(m_document, m_caseSensitivity).search(workPattern, inputRange, backwards).at(0);
    }
...

If you are searching for "$Pay" - the workPattern becomes "\\b\\$Pay\\b"
Filtering down through - KateRegExpSearch::search: kateregexpsearch.cpp#193 
- you find that it boils down to a KateRegExp object (light wrapper for QRegExp). 
Now the escaped search string does get down to the QRegExp at kateregexpsearch.cpp#435 
where the underlaying QRegExp::indexIn() method is called.

You can boil the search code down to this little test-app; 

#include <QDebug>
#include <QString>
#include <QRegExp>

int main(void)
{
    QString needle = "$uri";
    QString haystack = "if (syswrite(OFH, $uri , $read) != $read) {";
    int index;

    qDebug() << " Needle =   " << needle << endl;
    qDebug() << " Haystack = " << haystack << endl;

    QString test = QStringLiteral("\b%1\b").arg(QRegExp::escape(needle));

    QRegExp testqre;
    testqre.setPattern(test);
    qDebug() << " testqre.isValid() " << testqre.isValid() << endl;
    index = testqre.indexIn(haystack);
    qDebug() << " QRegExp - index: " << index << endl;

    return 0;
}

The results of which are: 

 Needle =    "$uri" 

 Haystack =  "if (syswrite(OFH, $uri , $read) != $read) {" 

 testqre.isValid()  true 

 QRegExp - index:  -1 


Now - Qt has moved from QRegExp to QRegularExpression due to significant issues with QRegExp's engine.
Unfortunately the KTextEditor search is still using QRegExp. 
If someone can tweak the test-app above to actually find the needle ... 
then a quick patch for the search system may be possible. 
The true long-term solution would be to migrate the search to QRegularExpression ...
Comment 3 Christoph Feck 2017-10-17 20:34:39 UTC
Hopefully converting to QRegularExpression fixes this issue. The actual conversion should be relatively easy; adding to junior tasks.
Comment 4 Christof Groschke 2017-10-28 12:48:19 UTC
I would like to fix this one, to get a start :)
Comment 5 Christof Groschke 2017-11-17 22:42:53 UTC
Created attachment 108932 [details]
gitdiff

Hello KDE developers,

I have a little question related to migrating from QRegExp to QRegularExpression in ktexteditor. I first changed only on occurrence for now, pls see the attached diff. I hope this works as a quick solution to the problem.

I would like to change every occurrence in ktexteditor. Todo that I need to change KateRegExp for example. By doing that I would touch a lot more of the code.

My question now is, what's the best way to get started here ... Change one occurrence at a time, or do them all at once ... 

I appreciate the help :)
Comment 6 Christoph Feck 2017-11-29 17:23:06 UTC
I suggest to add your patch to https://phabricator.kde.org/differential/diff/create/ and ask for feedback from Kate developers.
Comment 7 Christoph Cullmann 2019-07-13 22:59:45 UTC
I think a full port would be nice, if somebody has time, patches are welcome.
Comment 8 Marcin Dłubakowski 2019-07-16 11:22:31 UTC
I am afraid that porting to QRegularExpression won't fix this issue. There is some inherent problem with "\b\$" expression, if you test it with QRegularExpression, it still doesn't match. Even more, it doesn't match in any other language that I have tested (Java, JavaScript, Python).
Comment 9 Ahmad Samir 2019-08-08 22:38:12 UTC
IIUC, actually with either qregexp or qregularexpression (or pcre for that matter) \w doesn't match $; perl docs[1]:
<quote>
Similarly, the word boundary anchor \b matches wherever a character matching \w is next to a character that doesn't, but it doesn't eat up any characters itself
</quote>

so '$\bPay\b' would work, but that's not how the "whole words" search mode works in  ktexteditor.

[1]https://perldoc.perl.org/5.30.0/perlretut.html