Bug 365124 - Regex capture groups for text fails beyond \9
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: general
Version: 3.14.3
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Burkhard Lück
Reported: 2016-07-05 20:22 UTC by Kyle Neal
Modified: 2016-12-04 17:25 UTC

Description Kyle Neal 2016-07-05 20:22:35 UTC
Regex capture groups for text fail beyond \9, but appear to work for numbers. In other words, \10 is interpreted as capture group \1 with a 0 appended, whereas for numbers it seems to be correctly interpreted as capture group \10.

To demonstrate, here are some examples.

(with numbers, works correctly)

text:
1 2 3 4 5 6 7 8 9 10 11 12 13

regex:
^([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*)$

replace with:
\1::\2::\3::\4::\5::\6::\7::\8::\9::\10::\11::\12::\13

result:
1::2::3::4::5::6::7::8::9::10::11::12::13

(with text, does not work)

text:
one two three four five six seven eight nine ten eleven twelve thirteen

regex:
^(.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*)$

replace with:
\1::\2::\3::\4::\5::\6::\7::\8::\9::\10::\11::\12::\13

result:
one::two::three::four::five::six::seven::eight::nine::one0::one1::one2::one3



For the text example, you get the same result even when the regex is:
^([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+)$

Reproducible: Always

Steps to Reproduce:
1. Enter text
2. Enter regex
3. Enter replacement
4. Click 'Replace All'

Actual Results:  
one::two::three::four::five::six::seven::eight::nine::one0::one1::one2::one3

Expected Results:  
one::two::three::four::five::six::seven::eight::nine::ten::eleven::twelve::thirteen
Comment 1 Kåre Särs 2016-07-06 06:06:15 UTC
I can confirm that this is the case both for the built-in search and the plugin :(
Comment 2 Christoph Cullmann 2016-09-08 17:14:03 UTC
Git commit c06dc8d8d63f2c852a4b0adcd012bd2fa2b37e69 by Christoph Cullmann.
Committed on 08/09/2016 at 17:11.
Pushed by cullmann into branch 'master'.

Bug 365124 - Regex capture groups for text fails beyond \9

first part of fix: fix the search in files plugin

M  +7    -3    addons/search/replace_matches.cpp

http://commits.kde.org/kate/c06dc8d8d63f2c852a4b0adcd012bd2fa2b37e69
Comment 3 Christoph Cullmann 2016-09-08 17:39:11 UTC
Git commit a3c1bab8c301ae4af84a57b7d6bc2753bec40e7d by Christoph Cullmann.
Committed on 08/09/2016 at 17:38.
Pushed by cullmann into branch 'master'.

Bug 365124 - Regex capture groups for text fails beyond \9

CHANGELOG: Support regular expressions replaces with captures > \9, e.g. \111

M  +1    -1    autotests/src/searchbar_test.cpp
M  +16   -3    src/search/kateregexpsearch.cpp

http://commits.kde.org/ktexteditor/a3c1bab8c301ae4af84a57b7d6bc2753bec40e7d
Comment 4 Christoph Cullmann 2016-09-08 20:24:48 UTC
See:

Git commit 650f1a3a854fa9a27b9ffab563306327f8aa5c1a by Christoph Cullmann.
Committed on 08/09/2016 at 20:17.
Pushed by cullmann into branch 'master'.

support multi char captures only in {xxx} to avoid regressions

M  +1    -1    autotests/src/searchbar_test.cpp
M  +14   -2    src/search/kateregexpsearch.cpp

http://commits.kde.org/ktexteditor/650f1a3a854fa9a27b9ffab563306327f8aa5c1a

diff --git a/autotests/src/searchbar_test.cpp b/autotests/src/searchbar_test.cpp
index c5b8e69..e3c568e 100644
--- a/autotests/src/searchbar_test.cpp
+++ b/autotests/src/searchbar_test.cpp
@@ -632,7 +632,7 @@ void SearchBarTest::testReplaceManyCapturesBug365124()
 
     bar.setSearchPattern("^(.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*)$");
     bar.setSearchMode(KateSearchBar::MODE_REGEX);
-    bar.setReplacementPattern("\\1::\\2::\\3::\\4::\\5::\\6::\\7::\\8::\\9::\\10::\\11::\\12::\\13");
+    bar.setReplacementPattern("\\{1}::\\2::\\3::\\4::\\5::\\6::\\7::\\8::\\9::\\{10}::\\{11}::\\{12}::\\{13}");
 
     bar.replaceAll();
 
diff --git a/src/search/kateregexpsearch.cpp b/src/search/kateregexpsearch.cpp
index 1ba7abd..2eea756 100644
--- a/src/search/kateregexpsearch.cpp
+++ b/src/search/kateregexpsearch.cpp
@@ -555,6 +555,7 @@ QVector<KTextEditor::Range> KateRegExpSearch::search(
                 }
                 break;
 
+            // single letter captures
             case L'1':
             case L'2':
             case L'3':
@@ -564,8 +565,15 @@ QVector<KTextEditor::Range> KateRegExpSearch::search(
             case L'7':
             case L'8':
             case L'9': {
-                // allow 1212124.... captures, see bug 365124 + testReplaceManyCapturesBug365124
-                int capture = 9 - (L'9' - text[input + 1].unicode());
+                out << ReplacementStream::cap(9 - (L'9' - text[input + 1].unicode()));
+                input += 2;
+                break;
+            }
+
+            // multi letter captures
+            case L'{': {
+                // allow {1212124}.... captures, see bug 365124 + testReplaceManyCapturesBug365124
+                int capture = 0;
                 int captureSize = 2;
                 while ((input + captureSize) < inputLen) {
                     const ushort nextDigit = text[input + captureSize].unicode();
@@ -574,6 +582,10 @@ QVector<KTextEditor::Range> KateRegExpSearch::search(
                         ++captureSize;
                         continue;
                     }
+                    if (nextDigit == L'}') {
+                        ++captureSize;
+                        break;
+                    }
                     break;
                 }
                 out << ReplacementStream::cap(capture);
Comment 5 Christoph Cullmann 2016-09-08 20:31:02 UTC
To avoid ambiguities such as "how do I reference \1 followed by a literal 2 without it being read as \12", we now use \{12} as the syntax for multi-digit capture references (which did not work at all before).

For example:

\{12}1 => use capture 12, followed by a literal 1
\123 => use capture 1, the rest (23) is normal text
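
The two forms above can be illustrated with a minimal standalone sketch. This is not the KTextEditor code from the commits; `expandReplacement` and its `captures` vector (index 0 for the whole match, index n for group n) are hypothetical names chosen for the example, and it handles only \N and \{N...} references, ignoring every other escape the real replacement parser supports.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration, not Kate's API).
// Expands "\N" (exactly one digit) and "\{N...}" (any number of digits)
// in a replacement pattern; everything else is copied through unchanged.
std::string expandReplacement(const std::string &pattern,
                              const std::vector<std::string> &captures)
{
    // Out-of-range capture numbers expand to an empty string in this sketch.
    auto capture = [&](std::size_t n) {
        return n < captures.size() ? captures[n] : std::string();
    };

    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] != '\\' || i + 1 >= pattern.size()) {
            out += pattern[i++];
            continue;
        }
        const unsigned char next = static_cast<unsigned char>(pattern[i + 1]);
        if (std::isdigit(next)) {
            // Single-digit form: consume exactly one digit, so "\123" is
            // capture 1 followed by the literal text "23".
            out += capture(next - '0');
            i += 2;
        } else if (next == '{') {
            // Braced form: read digits up to the closing brace.
            std::size_t j = i + 2;
            std::size_t n = 0;
            while (j < pattern.size()
                   && std::isdigit(static_cast<unsigned char>(pattern[j]))) {
                n = n * 10 + static_cast<std::size_t>(pattern[j] - '0');
                ++j;
            }
            if (j > i + 2 && j < pattern.size() && pattern[j] == '}') {
                out += capture(n);
                i = j + 1;
            } else {
                // Malformed reference: keep the backslash as literal text.
                out += pattern[i++];
            }
        } else {
            // Any other escape is left untouched in this sketch.
            out += pattern[i++];
        }
    }
    return out;
}
```

With the thirteen words from the description as captures 1 to 13, this sketch turns `\{12}1` into `twelve1` and `\123` into `one23`.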
Comment 6 Christoph Cullmann 2016-09-08 20:49:42 UTC
Could we add hints to \1 and \{13212} style capture references to the docs?
Comment 7 Burkhard Lück 2016-12-04 17:25:00 UTC
Done by Yuri Chornoivan with
https://commits.kde.org/kate/2cc3de482b791aee99e651d613d8cd4924b17f23