Bug 455995

Summary: Regex replace: evaluate assertions before replacing
Product: [Applications] kate
Reporter: Grósz Dániel <groszdanielpub>
Component: search
Assignee: KWrite Developers <kwrite-bugs-null>
Status: REPORTED ---    
Severity: minor    
Priority: NOR    
Version: Git   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   

Description Grósz Dániel 2022-06-27 01:02:27 UTC
SUMMARY
When using regular expressions in the Edit / Replace... feature of Kate or KWrite, Replace All behaves unexpectedly if the regex begins with an assertion.

The most typical example is replacing "^ " with "": I'd expect this to remove (at most) one space from the beginning of every line, but it actually removes all initial spaces from each line.

Other examples: If I replace "(?<= )." with "", I'd expect it to remove the first character after each space, but it actually removes everything after the first space in each line. Likewise, if I replace "\b\w" with "", I'd expect it to remove the first character of each word, but it actually removes each word character.

My guess as to why this happens is that in each replacement step, Kate first performs a replacement, then moves the cursor to the end of the replacement text (which is empty in our examples), and then performs the next search beginning from there.
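
To illustrate this guess, here is a minimal Python sketch of a replace loop that searches again in the already-modified text after each replacement. It is only a model of the suspected behavior, not Kate's actual code; the function name replace_all_iterative is hypothetical, and Python's re module stands in for Kate's regex engine.

```python
import re

def replace_all_iterative(text: str, pattern: str, replacement: str) -> str:
    """Hypothetical model: replace one match, then continue searching the
    modified text from the end of the inserted replacement."""
    regex = re.compile(pattern, re.MULTILINE)
    pos = 0
    while pos <= len(text):
        match = regex.search(text, pos)
        if match is None:
            break
        # Apply the replacement immediately, changing the text being searched.
        text = text[:match.start()] + replacement + text[match.end():]
        # Move the "cursor" to the end of the replacement text.
        pos = match.start() + len(replacement)
        if match.start() == match.end():
            pos += 1  # step past empty matches so the loop terminates
    return text

sample = "a\n b\n  c\n   d\n    e"
print(replace_all_iterative(sample, r"^ ", ""))          # all leading spaces removed
print(replace_all_iterative("a bcd", r"(?<= ).", ""))    # everything after the space removed
print(replace_all_iterative("foo bar", r"\b\w", ""))     # every word character removed
```

Because each search runs on the modified text, the character that follows a removed match becomes the new start of the line (or the new character after the space, or the new start of the word), so the assertion succeeds again.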

Instead, when using Replace All, it should first find all instances to replace, and then perform all the replacements (or perhaps do something more efficient, but equivalent in effect). This is what other regex replacement engines seem to do, such as those of sed and JavaScript (at least in effect; I don't know how they are implemented).
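
A minimal Python sketch of the find-first, replace-afterwards approach described above. The function name replace_all_two_phase is hypothetical, the replacement is treated as literal text (no backreferences), and Python's re module is only a stand-in for Kate's regex engine.

```python
import re

def replace_all_two_phase(text: str, pattern: str, replacement: str) -> str:
    """Find every match in the ORIGINAL text first, then apply all
    replacements, so assertions never see already-modified text."""
    regex = re.compile(pattern, re.MULTILINE)
    matches = list(regex.finditer(text))
    # Replace from the last match to the first so that earlier match
    # offsets stay valid while the text changes length.
    for match in reversed(matches):
        text = text[:match.start()] + replacement + text[match.end():]
    return text

sample = "a\n b\n  c\n   d\n    e"
print(replace_all_two_phase(sample, r"^ ", ""))          # one leading space removed per line
print(replace_all_two_phase("a bcd", r"(?<= ).", ""))    # only "b" removed
print(replace_all_two_phase("foo bar", r"\b\w", ""))     # first letter of each word removed
```

For a literal replacement string this gives the same result as Python's re.sub with the re.MULTILINE flag, which matches the sed and JavaScript behavior mentioned above.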

When using the Replace button rather than Replace All, the search for the next occurrence should probably take into account the results of earlier replacements, but should not be affected by the replacement that was just made.

The issue doesn't occur with Kate's Search & Replace plugin, since it finds all occurrences first.

STEPS TO REPRODUCE
1. Create a file with this content:
a
␣b
␣␣c
␣␣␣d
␣␣␣␣e
2. Edit / Replace...
3. Mode: Regular expression
4. Find: ^␣
5. Leave Replace: empty
6. Replace All

OBSERVED RESULT
a
b
c
d
e

EXPECTED RESULT
a
b
␣c
␣␣d
␣␣␣e

SOFTWARE/OS VERSIONS
Kate 21.04.x, 22.04.2, git master (da4b519d2), KWrite 22.04.2
Operating System: openSUSE Tumbleweed 20220625
KDE Plasma Version: 5.25.1
KDE Frameworks Version: 5.95.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
Bug 142598 (reported on 2007-03-06, closed as fixed on 2007-08-31) probably had the same cause. I don't know if it was ever actually fixed, but if so, it broke again at some point. After I reopened that bug in June 2021, I was told it wasn't reproducible, and to open a new report for a new bug.