Bug 418636

Summary: Search/Replace exceeds physical ram on 5mb file
Product: [Applications] kate Reporter: Henry Pfeil <hpfeil>
Component: search Assignee: KWrite Developers <kwrite-bugs-null>
Status: RESOLVED NOT A BUG    
Severity: major CC: kare.sars
Priority: NOR    
Version First Reported In: 19.12.2   
Target Milestone: ---   
Platform: PCLinuxOS   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Henry Pfeil 2020-03-08 18:38:20 UTC
SUMMARY
The Stop button is unresponsive during search/replace on an 80K-line text file. KInfoCenter showed the swap partition in use. Replace reports 27 million matches found, where there should be only about 5k. KSysGuard showed 18 threads; I watched RAM use increase from 10.1 GB to 10.7 GB before I killed it.

STEPS TO REPRODUCE
1. Capture ldd output into text file.
2. Search for the regular expression " \(0x"
3. 

OBSERVED RESULT
0 of 27,095,586 processed; there should be only 5,078 matches. The system briefly froze as the previously unused swap partition came into use. Physical RAM allocation went from 4 GB out of 14.7 GB to over 10 GB, leaving only 200 KB unallocated. Kate became unresponsive. I had to kill it as more processes went to swap and Kate's allocation kept increasing.

EXPECTED RESULT
Replace " (0x" with a newline, so the resulting lines can be deleted with sed.

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 64-bit
(available in About System)
KDE Plasma Version: 5.18.0
KDE Frameworks Version:  5.67.0
Qt Version: 5.13.2

ADDITIONAL INFORMATION
Slackware post 14.2 -current
Processors: 8 × AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
Memory: 14.7 GiB of RAM
Comment 1 Henry Pfeil 2020-03-09 15:09:36 UTC
There appear to be more regular expressions that induce this runaway memory allocation as the number of matches increases. The status reads "Processed 0 of x matches", where x increases beyond the number of lines in the file and eventually beyond the number of characters in the file.
This began when, with my feeble knowledge of regexes, I tried to match a space followed by, for example, (0x00007ffc8f6a4000), without resorting to sed '/ \(0x[0-9]|[a-f]+\)/d'. I thought that if I just replaced the " (0x" with a linefeed, I could run sed '/00/d' and be done with it. Then I noticed that searching for "\/usr\/lib64" also invoked the runaway, only slower, so I could stop the search after Kate's memory footprint rose to 2 GB. Could there be a loop in Kate's regex code that does not reach an end condition?
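For reference, the address suffix described above can be matched with a single character class. The following is a minimal Python sketch, not Kate's search code: it uses Python's re syntax, the sample lines are made up for illustration, and strip_addresses is a hypothetical helper name. Note that sed's default (basic) regex syntax treats \( as a group opener rather than a literal parenthesis, so the pattern would need adapting there.

```python
import re

# Hypothetical sample of captured ldd output; real addresses will differ.
SAMPLE = """\
/usr/bin/foo:
\tlibc.so.6 => /lib64/libc.so.6 (0x00007ffc8f6a4000)
\tlibbar.so.1 => not found
"""

# A space, a literal "(0x", one or more lowercase hex digits, a literal ")".
ADDRESS = re.compile(r" \(0x[0-9a-f]+\)")


def strip_addresses(text: str) -> str:
    """Remove the trailing load address from every line that has one."""
    return ADDRESS.sub("", text)


cleaned = strip_addresses(SAMPLE)
print(cleaned)
```

Lines without an address, such as the "not found" line, pass through unchanged.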
Comment 2 Henry Pfeil 2020-03-09 17:47:38 UTC
I noticed that the regular expression does not anchor the other expressions in the search criterion. The character class [a-f]+ matches every uppercase and lowercase abcdef in the file. The expression [0-9]+ finds zero matches. The expression [0-9a-f]+ matches every lowercase a..f preceded by a space. I'm going back to "O'Reilly Mastering Regular Expressions". My regex is wrong. Sorry for reporting my incorrect expressions as a bug. I thought [0-9] would match any number. Sorry about that.
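One likely contributor to the confusion is alternation precedence: in Perl-style syntax, | binds more loosely than anything else, so the pattern from Comment 1 reads as two independent alternatives instead of "digits or letters inside the parentheses". A small Python sketch of the difference (Python's re rather than Kate's engine; the sample line is hypothetical):

```python
import re

line = "libc.so.6 => /lib64/libc.so.6 (0x00007ffc8f6a4000)"

# Alternation binds loosest: this reads as  " \(0x[0-9]"  OR  "[a-f]+\)".
loose = re.compile(r" \(0x[0-9]|[a-f]+\)")

# One character class keeps digits and letters in a single run.
tight = re.compile(r" \(0x[0-9a-f]+\)")

loose_matches = loose.findall(line)
tight_matches = tight.findall(line)

print(loose_matches)  # [' (0x0']
print(tight_matches)  # [' (0x00007ffc8f6a4000)']
```

The loose pattern stops after the first digit, because the left alternative is complete at that point and the right alternative never finds letters directly followed by a closing parenthesis.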
Comment 3 Henry Pfeil 2020-03-09 18:33:44 UTC
I failed to set the case-sensitive button; the upper-case letters were a red herring. Unable to find files with missing library dependencies, I'm trying to match everything except a filename (a line containing a colon) and the "not found" string. This is not a Kate problem. Please mark this as solved and move on. I got rid of the kernel addresses with the expression \(..................\). I apologise for troubling you with this matter.
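The underlying goal, listing files with missing library dependencies, can also be approached outside the editor. The sketch below is one possible approach under stated assumptions: the capture has a header line ending in a colon for each file, followed by indented dependency lines, and missing libraries appear as "name => not found". The sample data and the missing_libraries helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical capture of ldd run over several files.
CAPTURE = """\
/usr/bin/foo:
\tlibc.so.6 => /lib64/libc.so.6 (0x00007ffc8f6a4000)
\tlibbar.so.1 => not found
/usr/bin/baz:
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f1234567000)
"""


def missing_libraries(text: str) -> dict:
    """Map each file to the libraries reported as "not found"."""
    missing = defaultdict(list)
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Header line: remember which file the next lines belong to.
            current = stripped[:-1]
        elif "not found" in stripped and current is not None:
            # Keep only the library name to the left of " => ".
            missing[current].append(stripped.split(" => ")[0])
    return dict(missing)


report = missing_libraries(CAPTURE)
print(report)
```

Files with no missing libraries do not appear in the result at all, which keeps the report short for large captures.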
Comment 4 Kåre Särs 2020-03-09 19:06:45 UTC
Thanks for the report

I actually found a bug when testing this bug report. Replacing the " \(x0" with "(" never finishes....
Comment 5 Henry Pfeil 2020-03-09 23:52:04 UTC
From https://doc.qt.io/qt-5/qregexp.html:

Note: In Qt 5, the new QRegularExpression class provides a Perl compatible implementation of regular expressions and is recommended in place of QRegExp.

How do you turn off search-while-typing? In a large file, such as slackware64/MANIFEST, a search for "braindump" halts at "brain" as it hunts for matches such as terminfo/s/superbrain or musicbrainz. I prefer to enter the complete search pattern, rather than wait for a long time while search hunts for useless substrings.
Comment 6 Kåre Särs 2020-03-10 16:35:00 UTC
Hmm, that is very strange. Search-as-you-type searches one line at a time, and if the search exceeds 100 ms, it stops and displays a warning that the search was interrupted. I had absolutely no problems with the 54 MB MANIFEST file I found on a Slackware mirror.
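The behaviour described here, scanning one line at a time and giving up once a time budget is spent, can be sketched as follows. This is an illustration only, not Kate's implementation: the function name, return shape, and sample lines are assumptions, and only the 100 ms figure comes from the comment above.

```python
import re
import time


def search_with_budget(lines, pattern, budget_seconds=0.1):
    """Scan lines one at a time; stop early once the time budget is spent.

    Returns (matches, interrupted), where matches is a list of
    (line_number, column) pairs found before the deadline.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + budget_seconds
    matches = []
    for number, line in enumerate(lines):
        if time.monotonic() > deadline:
            # Budget exceeded: return the partial result and flag it.
            return matches, True
        for m in regex.finditer(line):
            matches.append((number, m.start()))
    return matches, False


lines = [
    "usr/share/terminfo/s/superbrain",
    "usr/lib64/libmusicbrainz.so",
    "usr/bin/braindump",
]
found, interrupted = search_with_budget(lines, "braindump")
print(found, interrupted)
```

Checking the clock between lines rather than inside the regex engine keeps the loop simple, at the cost of not interrupting a single pathological line.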