Bug 455995 - Regex replace: evaluate assertions before replacing
Status: REPORTED
Alias: None
Product: kate
Classification: Applications
Component: search
Version: Git
Platform: openSUSE Linux
Importance: NOR minor
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-27 01:02 UTC by Grósz Dániel
Modified: 2022-06-27 01:02 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Description Grósz Dániel 2022-06-27 01:02:27 UTC
SUMMARY
When using regular expressions in the Edit / Replace... feature of Kate or KWrite, the Replace All feature behaves in an unexpected way when using assertions at the beginning of the regex.

The most typical example is replacing "^ " with "": I'd expect this to remove (at most) one space from the beginning of every line, but it actually removes all initial spaces from each line.

Other examples: If I replace "(?<= )." with "", I'd expect it to remove the first character after each space, but it actually removes everything after the first space in each line. Likewise, if I replace "\b\w" with "", I'd expect it to remove the first character of each word, but it actually removes each word character.
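
For comparison, here is a minimal sketch of the expected results for the three patterns above, using Python's `re.sub` as a stand-in for a "find all matches in the original text, then replace" engine. The input strings are illustrative and not taken from Kate; plain spaces are used, and `re.MULTILINE` is assumed so that `^` matches at the start of every line.

```python
import re

# Illustrative inputs (assumptions, not from Kate itself).
indented = "a\n b\n  c\n   d\n    e"

# "^ " -> "": removes at most one leading space per line.
strip_one_space = re.sub(r"^ ", "", indented, flags=re.MULTILINE)

# "(?<= )." -> "": removes only the first character after each space.
after_space = re.sub(r"(?<= ).", "", "ab cd ef")

# "\b\w" -> "": removes only the first character of each word.
word_starts = re.sub(r"\b\w", "", "foo bar")

print(repr(strip_one_space))  # 'a\nb\n c\n  d\n   e'
print(repr(after_space))      # 'ab d f'
print(repr(word_starts))      # 'oo ar'
```

In each case the assertion is evaluated against the original text, so a character that only comes to sit at a line start, after a space, or at a word boundary because of an earlier deletion is not matched again.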

My guess as to why this happens is that in each replacement step, Kate first performs a replacement, then moves the cursor to the end of the replacement text (which is empty in our examples), and then performs the next search beginning from there, in the already modified text.

Instead, when using Replace All, it should first find all instances to replace, and then perform all the replacements (or perhaps do something more efficient, but equivalent in effect). This is what other regex replacement engines seem to do, such as those of sed and JavaScript (at least in effect; I don't know how they are implemented).
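
The difference between the two strategies can be sketched in Python. This is only a model of the guess described above, not Kate's actual code: the function names `replace_all_incremental` and `replace_all_two_phase` are hypothetical, the replacement is treated as literal text (no backreferences), and plain spaces stand in for the ␣ characters used in this report.

```python
import re

def replace_all_incremental(pattern, repl, text, flags=0):
    """Model of the suspected behavior: replace one match, move the cursor
    to the end of the replacement, then search the modified text again."""
    regex = re.compile(pattern, flags)
    pos = 0
    while True:
        m = regex.search(text, pos)
        if m is None:
            return text
        text = text[:m.start()] + repl + text[m.end():]
        pos = m.start() + len(repl)
        if m.start() == m.end():
            pos += 1  # step past empty matches so the loop terminates

def replace_all_two_phase(pattern, repl, text, flags=0):
    """Proposed behavior: find every match in the original text first,
    then apply the replacements."""
    regex = re.compile(pattern, flags)
    matches = list(regex.finditer(text))
    # Replace from the end so that earlier match offsets stay valid.
    for m in reversed(matches):
        text = text[:m.start()] + repl + text[m.end():]
    return text

sample = "a\n b\n  c\n   d\n    e"
print(repr(replace_all_incremental(r"^ ", "", sample, re.MULTILINE)))
# 'a\nb\nc\nd\ne'  (all leading spaces removed, as in the observed result)
print(repr(replace_all_two_phase(r"^ ", "", sample, re.MULTILINE)))
# 'a\nb\n c\n  d\n   e'  (one space removed per line, as expected)
```

With an empty replacement the cursor in the incremental version stays at the start of the line, so `^` matches again on the next search; the two-phase version never re-evaluates the assertion against modified text.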

When using the Replace button, rather than Replace All, it should probably take into account the result of previous replacements, but not the last replacement, when finding the next occurrence of the search string.

The issue doesn't occur with Kate's Search & Replace plugin, since it finds all occurrences first.

STEPS TO REPRODUCE
1. Create a file with this content:
a
␣b
␣␣c
␣␣␣d
␣␣␣␣e
2. Edit / Replace...
3. Mode: Regular expression
4. Find: ^␣
5. Leave Replace: empty
6. Replace All

OBSERVED RESULT
a
b
c
d
e

EXPECTED RESULT
a
b
␣c
␣␣d
␣␣␣e

SOFTWARE/OS VERSIONS
Kate 21.04.x, 22.04.2, git master (da4b519d2), KWrite 22.04.2
Operating System: openSUSE Tumbleweed 20220625
KDE Plasma Version: 5.25.1
KDE Frameworks Version: 5.95.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
Bug 142598 (reported on 2007-03-06, closed as fixed on 2007-08-31) probably had the same cause. I don't know if it was ever actually fixed, but if so, it broke again at some point. After I reopened that bug in June 2021, I was told it wasn't reproducible, and to open a new report for a new bug.