Bug 318232 - Regular expression search cannot match newline and end-of-line markers at the same time
Summary: Regular expression search cannot match newline and end-of-line markers at the...
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: search
Version: 16.04.1
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-12 10:15 UTC by Todd
Modified: 2021-05-25 11:01 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Todd 2013-04-12 10:15:23 UTC
In the kate regular expression search, you can search for newlines with "\n", and the beginning and end of lines with "^" and "$", respectively.

However, with the current version (which ships with KDE SC 4.10.2), you cannot do both at the same time. That is, a regular expression that contains "\n" together with "^" or "$" does not find the expected matches.

Reproducible: Always

Steps to Reproduce:
1. Open kate
2. Put the following text in:

test123
test
test
123test

3. Open find and replace.
4. Switch to regular expression mode
5. Search for "\ntest\n" (no quotes)
6. Search for "^test$" (no quotes)
7. Search for "\ntest$" (no quotes)
8. Search for "^test\n" (no quotes)
Actual Results:  
5 and 6 match lines 2 and 3
7 and 8 don't match anything

Expected Results:  
5, 6, 7, and 8 all match lines 2 and 3
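For comparison, here is a minimal sketch using Python's `re` module (not Kate's engine) with `re.MULTILINE`, under which `^` and `$` match at every line boundary even when the pattern also contains `\n`. It runs the patterns from steps 5 to 8 against the sample text, assumed here to have no trailing newline, and records each match as a (start, end) character offset.

```python
import re

# Sample text from step 2 (assumed to have no trailing newline).
DOCUMENT = "test123\ntest\ntest\n123test"

# Patterns from steps 5-8; raw strings pass "\n" to the regex engine unchanged.
PATTERNS = {
    5: r"\ntest\n",
    6: r"^test$",
    7: r"\ntest$",
    8: r"^test\n",
}

results = {}
for step, pattern in PATTERNS.items():
    # re.MULTILINE lets ^ and $ match at the start/end of every line,
    # not only at the start/end of the whole string.
    results[step] = [m.span() for m in re.finditer(pattern, DOCUMENT, re.MULTILINE)]
    matched = [DOCUMENT[start:end] for start, end in results[step]]
    print(f"step {step}: {pattern!r} -> {results[step]} {matched!r}")
```

Under these semantics steps 6, 7 and 8 each find two matches, one for line 2 and one for line 3 (steps 7 and 8 also include the adjacent newline), while step 5 finds only one: its trailing `\n` is consumed and cannot also start the next match, which is the point comment 8 makes.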
Comment 1 Kåre Särs 2013-04-12 13:22:44 UTC
Hi,

This is the situation:
In searches without "\n", Kate searches one line at a time (each line is one QString). As soon as the search contains even one "\n", the whole document is copied into one large QString and searched as a whole.

QRegExp treats '^' as the beginning of the string and '$' as the end of the string.

Try:
9. Search for "^.*.*$" (no quotes)
10. Search for "^.*\n.*$" (no quotes)

Results:
9 matches all lines separately
10 matches everything in one match
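The two search modes described above can be approximated in Python to reproduce the reported results. This is a sketch under stated assumptions, not Kate's code: `kate_like_search` is a hypothetical name, Python's `re` stands in for QRegExp, and `re.DOTALL` is passed because, according to the QRegExp documentation, `.` there also matches a newline. One minor difference: without `re.MULTILINE`, Python's `$` also matches just before a final trailing newline, which the sample text does not have.

```python
import re

DOCUMENT = "test123\ntest\ntest\n123test"


def kate_like_search(pattern: str, text: str) -> list[tuple[int, int]]:
    """Approximate the two modes from comment 1 (hypothetical helper).

    If the pattern text contains the escape "\\n", the whole document is
    searched as one string, so ^ and $ anchor only at its very start/end.
    Otherwise each line is searched on its own, so ^ and $ anchor per line.
    """
    # DOTALL mimics QRegExp, where "." also matches a newline.
    if r"\n" in pattern:
        return [m.span() for m in re.finditer(pattern, text, re.DOTALL)]
    spans = []
    offset = 0
    for line in text.split("\n"):
        for m in re.finditer(pattern, line, re.DOTALL):
            # Translate line-relative offsets back to document offsets.
            spans.append((offset + m.start(), offset + m.end()))
        offset += len(line) + 1  # +1 for the newline removed by split()
    return spans


results = {
    step: kate_like_search(pattern, DOCUMENT)
    for step, pattern in [
        (5, r"\ntest\n"),
        (6, r"^test$"),
        (7, r"\ntest$"),
        (8, r"^test\n"),
        (9, r"^.*.*$"),
        (10, r"^.*\n.*$"),
    ]
}
for step, spans in results.items():
    print(step, spans)
```

This reproduces the original report (steps 5 and 6 match, steps 7 and 8 find nothing) as well as the results for steps 9 and 10: one match per line for step 9 and a single match spanning the whole document for step 10.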

Does anybody have a good solution for this?
Comment 2 Todd 2013-04-12 13:58:55 UTC
What if, under the hood, you replace each ^ or $ with a \n, feed that to QRegExp, then remove the \n from the resulting match?
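A minimal sketch of this proposal, in Python rather than Kate's C++. Assumptions: the hypothetical helper `search_newline_anchored` only rewrites a `^` at the very start and an unescaped `$` at the very end of the pattern (a complete version would also have to skip escaped characters and `[^...]` classes), and the document is padded with one newline at each end so that the first and last lines also have a newline to match.

```python
import re


def search_newline_anchored(document: str, pattern: str) -> list[tuple[int, int]]:
    """Hypothetical helper for the idea in comment 2.

    A leading '^' and a trailing (unescaped) '$' are replaced by '\\n',
    the rewritten pattern is searched, and the added newlines are trimmed
    from each match. Anchors anywhere else in the pattern are left alone.
    """
    starts = pattern.startswith("^")
    ends = pattern.endswith("$") and not pattern.endswith("\\$")
    begin = 1 if starts else 0
    stop = len(pattern) - 1 if ends else len(pattern)
    core = pattern[begin:stop]
    head = "\n" if starts else ""
    tail = "\n" if ends else ""

    # Pad so the first and last lines also have a neighbouring newline.
    padded = "\n" + document + "\n"
    spans = []
    for m in re.finditer(head + core + tail, padded):
        # Drop the newlines that stand in for ^ and $ ...
        start, end = m.start() + len(head), m.end() - len(tail)
        # ... then undo the one-character padding offset, clamped to the document.
        spans.append((max(start - 1, 0), min(end - 1, len(document))))
    return spans


DOCUMENT = "test123\ntest\ntest\n123test"
print(search_newline_anchored(DOCUMENT, r"^test$"))
print(search_newline_anchored(DOCUMENT, r"^test\n"))
```

On the sample text from the report, `^test$` is found on line 2 only: the newline standing in for `$` after line 2 is consumed, so it is no longer available as the newline standing in for `^` before line 3. Rewriting the anchors as a lookbehind `(?<=\n)` and a lookahead `(?=\n)` instead of literal newlines would avoid that consumption.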
Comment 3 Martin Walch 2014-09-09 19:46:09 UTC
I just encountered something very similar in Kate 3.14.0 that shows another aspect of the problem. Test case (\n == newline):

\nk\n

So it is a file with a blank first line, followed by a line containing only "k", and finally an empty third line. Searching for "^k", "k$", "^k$", or "k\n" yields the desired result of exactly one match for each pattern. However, the result for "^k\n" depends on the cursor position: if the cursor is at the beginning of the second line, the pattern matches; if it is anywhere else, it does not.
Comment 4 Buovjaga 2016-06-19 13:26:36 UTC
Update:

(In reply to Todd from comment #0)
> 5. Search for "\ntest\n" (no quotes)

Matches the end of line 1 and the whole of line 2, including its line ending.

> 6. Search for "^test$" (no quotes)

Matches lines 2 and 3, without their line endings.

> 7. Search for "\ntest$" (no quotes)

Matches the end of line 1 and lines 2 and 3. End of line 3 not included.

> 8. Search for "^test\n" (no quotes)

Doesn't match anything.

Arch Linux 64-bit
Kate 16.04.2
KDE Frameworks 5.22.0
Qt 5.6.1
xcb wm
Comment 5 Doncho N. Gunchev 2020-04-12 19:50:29 UTC
'\n': I just updated to kate-19.12.1-1.fc31.x86_64 on Fedora 31, and it works again. Whatever the previous version was, it did not work.

'$' matches the end of each line.

'^' matches the start of each line, except when the pattern also contains '\n': then it matches only at the start of the text/file.
Comment 6 Justin Zobel 2020-10-30 05:49:48 UTC
(In reply to Doncho N. Gunchev from comment #5)
> '\n': Just updated to kate-19.12.1-1.fc31.x86_64, Fedora 31 and it works
> again. Whatever the previous version was it did not work.
> 
> '$' matches the end of each line.
> 
> '^' matches the start of each line, except in the case where you also use
> '\n' when it matches only the start of the text/file.

Doncho, can you please confirm whether this issue is now resolved and all the regular expressions from the original report work as expected?
Comment 7 Doncho N. Gunchev 2020-11-04 14:46:17 UTC
Nope, the behavior is still strange / inconsistent. For example

'^.*\.$'  - matches all lines ending with dot.
'^.*\.\n' - matches only the first line of the file if it ends with a dot. - BAD

'^test$'  - matches all lines consisting of "test"
'^test\n' - matches only if the first line of the file is "test" - BAD

'\ntest\n' - matches lines consisting of "test" but skips the next one, since the current match consumes the '\n' after "test" and the next line therefore cannot match. Also, a help message explaining what is supported (for example positive/negative lookahead/lookbehind and capture groups) would be very helpful IMHO.

In short, having '\n' together with '$' or '^' in a single regex makes it work for the first line only (for '$' this is a guess, since I can't make '\n.*$' match anything).
Comment 8 Doncho N. Gunchev 2020-11-04 15:03:35 UTC
Tested with kate-20.08.1-1.fc33.x86_64, Fedora RPMs.

BTW, I disagree with the original report that "\ntest\n" should match two consecutive lines containing "test": after the first match there is no '\n' left to start the next match, so an empty line between them would be needed. '\n' consumes one character; '^' and '$' do not.
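The difference can be illustrated with Python's `re` module (an illustration only, not Kate's engine): a trailing `\n` consumes the newline, whereas a lookahead `(?=\n)` or a multi-line `$` only tests for it, so consecutive lines can all match. The sample text is the one from the original report.

```python
import re

DOCUMENT = "test123\ntest\ntest\n123test"

# '\n' at both ends: the trailing newline is consumed, so line 3 cannot match.
consuming = re.findall(r"\ntest\n", DOCUMENT)

# Lookahead: the trailing newline is checked but not consumed.
lookahead = re.findall(r"\ntest(?=\n)", DOCUMENT)

# Zero-width anchors under re.MULTILINE consume nothing at either end.
anchored = re.findall(r"^test$", DOCUMENT, re.MULTILINE)

print(len(consuming), len(lookahead), len(anchored))
```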
Comment 9 Kåre Särs 2020-11-05 11:16:37 UTC
This is unfortunately a known issue that is not fixed :(

It would be nice to get a patch to address the issue if possible ;)
Comment 10 Kåre Särs 2021-05-25 11:01:29 UTC
Git commit dd06221f675eb420281df173a6f429a3b5236a41 by Kåre Särs.
Committed on 25/05/2021 at 07:01.
Pushed by cullmann into branch 'master'.

S&R: Fix matching ^ and $ in multi-line expressions

M  +16   -6    addons/search/plugin_search.cpp

https://invent.kde.org/utilities/kate/commit/dd06221f675eb420281df173a6f429a3b5236a41