Regex capture groups for text fail beyond \9, but appear to work for numbers. In other words, \10 is interpreted as capture group \1 with a 0 appended, whereas with numeric input the output looks as if capture group \10 had been used.

To demonstrate, here are some examples.

(with numbers, result looks correct)
text: 1 2 3 4 5 6 7 8 9 10 11 12 13
regex: ^([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*) ([0-9]*)$
replace with: \1::\2::\3::\4::\5::\6::\7::\8::\9::\10::\11::\12::\13
result: 1::2::3::4::5::6::7::8::9::10::11::12::13

(with text, does not work)
text: one two three four five six seven eight nine ten eleven twelve thirteen
regex: ^(.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*)$
replace with: \1::\2::\3::\4::\5::\6::\7::\8::\9::\10::\11::\12::\13
result: one::two::three::four::five::six::seven::eight::nine::one0::one1::one2::one3

For the text example, you get the same result even when the regex is:
^([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+) ([^:]+)$

Reproducible: Always

Steps to Reproduce:
1. Enter text
2. Enter regex
3. Enter replacement
4. Click 'Replace All'

Actual Results:
one::two::three::four::five::six::seven::eight::nine::one0::one1::one2::one3

Expected Results:
one::two::three::four::five::six::seven::eight::nine::ten::eleven::twelve::thirteen
I can confirm that this is the case both for the built-in search and the plugin :(
Git commit c06dc8d8d63f2c852a4b0adcd012bd2fa2b37e69 by Christoph Cullmann.
Committed on 08/09/2016 at 17:11.
Pushed by cullmann into branch 'master'.

Bug 365124 - Regex capture groups for text fails beyond \9 (edit)

first part of fix: fix the search in files plugin

M +7 -3 addons/search/replace_matches.cpp

http://commits.kde.org/kate/c06dc8d8d63f2c852a4b0adcd012bd2fa2b37e69
Git commit a3c1bab8c301ae4af84a57b7d6bc2753bec40e7d by Christoph Cullmann.
Committed on 08/09/2016 at 17:38.
Pushed by cullmann into branch 'master'.

Bug 365124 - Regex capture groups for text fails beyond \9 (edit)

CHANGELOG: Support regular expressions replaces with captures > \9, e.g. \111

M +1 -1 autotests/src/searchbar_test.cpp
M +16 -3 src/search/kateregexpsearch.cpp

http://commits.kde.org/ktexteditor/a3c1bab8c301ae4af84a57b7d6bc2753bec40e7d
See:

Git commit 650f1a3a854fa9a27b9ffab563306327f8aa5c1a by Christoph Cullmann.
Committed on 08/09/2016 at 20:17.
Pushed by cullmann into branch 'master'.

support multi char captures only in {xxx} to avoid regressions

M +1 -1 autotests/src/searchbar_test.cpp
M +14 -2 src/search/kateregexpsearch.cpp

http://commits.kde.org/ktexteditor/650f1a3a854fa9a27b9ffab563306327f8aa5c1a

diff --git a/autotests/src/searchbar_test.cpp b/autotests/src/searchbar_test.cpp
index c5b8e69..e3c568e 100644
--- a/autotests/src/searchbar_test.cpp
+++ b/autotests/src/searchbar_test.cpp
@@ -632,7 +632,7 @@ void SearchBarTest::testReplaceManyCapturesBug365124()
     bar.setSearchPattern("^(.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*) (.*)$");
     bar.setSearchMode(KateSearchBar::MODE_REGEX);
-    bar.setReplacementPattern("\\1::\\2::\\3::\\4::\\5::\\6::\\7::\\8::\\9::\\10::\\11::\\12::\\13");
+    bar.setReplacementPattern("\\{1}::\\2::\\3::\\4::\\5::\\6::\\7::\\8::\\9::\\{10}::\\{11}::\\{12}::\\{13}");
     bar.replaceAll();
diff --git a/src/search/kateregexpsearch.cpp b/src/search/kateregexpsearch.cpp
index 1ba7abd..2eea756 100644
--- a/src/search/kateregexpsearch.cpp
+++ b/src/search/kateregexpsearch.cpp
@@ -555,6 +555,7 @@ QVector<KTextEditor::Range> KateRegExpSearch::search(
                 }
                 break;
+            // single letter captures
             case L'1':
             case L'2':
             case L'3':
@@ -564,8 +565,15 @@ QVector<KTextEditor::Range> KateRegExpSearch::search(
             case L'7':
             case L'8':
             case L'9': {
-                // allow 1212124.... captures, see bug 365124 + testReplaceManyCapturesBug365124
-                int capture = 9 - (L'9' - text[input + 1].unicode());
+                out << ReplacementStream::cap(9 - (L'9' - text[input + 1].unicode()));
+                input += 2;
+                break;
+            }
+
+            // multi letter captures
+            case L'{': {
+                // allow {1212124}.... captures, see bug 365124 + testReplaceManyCapturesBug365124
+                int capture = 0;
                 int captureSize = 2;
                 while ((input + captureSize) < inputLen) {
                     const ushort nextDigit = text[input + captureSize].unicode();
@@ -574,6 +582,10 @@ QVector<KTextEditor::Range> KateRegExpSearch::search(
                         ++captureSize;
                         continue;
                     }
+                    if (nextDigit == L'}') {
+                        ++captureSize;
+                        break;
+                    }
                     break;
                 }
                 out << ReplacementStream::cap(capture);
To avoid ambiguities such as "how do I write \1 followed by a literal 2 without it being read as \12", we now use \{12} as the syntax for multi-digit captures (which did not work at all before), e.g.

\{12}1 => use capture 12, followed by a literal 1
\123 => use capture 1, the rest is normal text
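For anyone who wants to see the rule in isolation, here is a minimal standalone sketch of the replacement syntax described above. It is not the KTextEditor code: expandReplacement and its captures vector (index 0 = whole match, index n = group n) are hypothetical names chosen for illustration, it uses std::string instead of QString, missing groups expand to an empty string by assumption, and all other escapes are passed through untouched.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of KTextEditor: expands capture references
// in `pattern` using `captures`, where captures[0] is the whole match and
// captures[n] is group n.
std::string expandReplacement(const std::string &pattern,
                              const std::vector<std::string> &captures)
{
    // Look up a group; missing groups expand to "" (an assumption).
    auto group = [&captures](std::size_t n) {
        return n < captures.size() ? captures[n] : std::string();
    };

    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        // Anything that is not a backslash escape is copied through.
        if (pattern[i] != '\\' || i + 1 >= pattern.size()) {
            out += pattern[i++];
            continue;
        }
        const char next = pattern[i + 1];
        if (next >= '1' && next <= '9') {
            // Single-digit reference: consume exactly one digit.
            out += group(static_cast<std::size_t>(next - '0'));
            i += 2;
        } else if (next == '{') {
            // Braced reference: read digits up to an optional closing '}'.
            std::size_t n = 0;
            std::size_t j = i + 2;
            while (j < pattern.size() && pattern[j] >= '0' && pattern[j] <= '9') {
                n = 10 * n + static_cast<std::size_t>(pattern[j] - '0');
                ++j;
            }
            if (j < pattern.size() && pattern[j] == '}') {
                ++j;
            }
            out += group(n);
            i = j;
        } else {
            // Other escapes are out of scope for this sketch; keep them verbatim.
            out += pattern[i];
            out += next;
            i += 2;
        }
    }
    return out;
}
```

With the thirteen words from the report as groups, \10 expands to one0 and \{10} expands to ten. The sketch also suggests why the numeric example in the report looked fine: group 1 holds "1", so \10 yields "1" followed by a literal 0, which happens to read as 10.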
Could we add hints about \1 and \{13212} style capture references to the docs?
done by Yuri Chornoivan with https://commits.kde.org/kate/2cc3de482b791aee99e651d613d8cd4924b17f23