Bug 145686 - JJ: Large Files open slowly due to futile Syntax-Highlighting
Summary: JJ: Large Files open slowly due to futile Syntax-Highlighting
Status: RESOLVED INTENTIONAL
Alias: None
Product: kate
Classification: Applications
Component: syntax
Version: unspecified
Platform: unspecified Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Duplicates: 252348
Depends on:
Blocks:
 
Reported: 2007-05-19 23:12 UTC by leonidas666
Modified: 2014-09-08 14:51 UTC
CC List: 8 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
New crash information added by DrKonqi (8.26 KB, text/plain)
2011-07-30 08:11 UTC, paolog

Description leonidas666 2007-05-19 23:12:54 UTC
Version:           2.5.5 (using KDE 3.5.5, Kubuntu (edgy) 4:3.5.5-0ubuntu3.4)
Compiler:          Target: i486-linux-gnu
OS:                Linux (i686) release 2.6.17-11-generic

I have some large (70 MB) Prolog files with the suffix .pl, just like Perl files. The problem is that Kate tries to apply Perl syntax highlighting to my Prolog files, which takes quite some time: startup takes longer, and once the file is loaded, long scrolls (e.g. jumping to the end of the file) also take _very_ long. The speed when opening the file without syntax highlighting is acceptable. I don't know whether it opens so slowly because the Perl highlighter gets confused by the Prolog code or simply because the file is so big. I'm not complaining that Kate tries to do the syntax highlighting (after all, it's a .pl file, so how should Kate know that it is a Prolog file?); my issue is with how the highlighting is implemented. Ideally, the highlighting would be done in a background thread so that the text is displayed right away and I can start editing. This would be a best-effort approach: the user gets as much syntax highlighting as the machine can manage (though I guess this is quite complicated to implement).
Alternatively, Kate could measure the time the syntax highlighting takes and, if it takes too long, open the file without syntax highlighting. Of course this should be user-configurable: time to give up, never give up, or prompt the user when it takes too long.
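
The time-limit idea could be sketched roughly as follows. This is only an illustration in Python, not Kate code: `highlight_with_budget`, the `highlight_line` callback, and the `budget_seconds` and `clock` parameters are all hypothetical names, and a real editor would have to deal with edits and per-line state as well.

```python
import time
from typing import Callable, List, Optional


def highlight_with_budget(
    lines: List[str],
    highlight_line: Callable[[str], str],
    budget_seconds: Optional[float] = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Highlight lines until the time budget runs out, then pass the rest through.

    budget_seconds=None corresponds to the "never give up" setting.
    """
    start = clock()
    result: List[str] = []
    for index, line in enumerate(lines):
        if budget_seconds is not None and clock() - start > budget_seconds:
            # Budget exhausted: keep the remaining lines as plain text.
            result.extend(lines[index:])
            break
        result.append(highlight_line(line))
    return result
```

A "prompt the user" setting would replace the `break` branch with a question to the user instead of silently falling back to plain text.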
Comment 1 Jaan Vajakas 2009-03-29 11:11:10 UTC
There is much room for optimization of the memory and CPU usage of Kate and KWrite with large files.

An example to follow might be the SciTe text editor: I can open a 280 MB XML file with 9.4 million lines in SciTe on a Windows 2000 machine with 1 GB RAM (I have not tried SciTe on Linux yet). SciTe shows some syntax highlighting and folding and, although quite sluggish with such large files, still lets me scroll and edit. It seems that SciTe initially reads the whole file but parses only the topmost rows, which makes it open as fast as possible. When scrolling down, SciTe seems to parse only as much as needed. After scrolling to the bottom, scrolling up and down is very fast, as the whole file has been parsed. At that point, SciTe has used 80 seconds of CPU time and 680 MB of memory for this file.
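
The parse-as-needed behaviour described above can be sketched like this. It is a guess at the general technique, not SciTe's actual implementation; `LazyHighlighter`, `step`, and the integer "state" are hypothetical stand-ins for a real highlighter and its per-line context.

```python
from typing import Callable, List, Tuple

# step(state_before_line, line) -> (state_after_line, rendered_line)
Step = Callable[[int, str], Tuple[int, str]]


class LazyHighlighter:
    """Highlights lines only up to the last line that has been requested."""

    def __init__(self, lines: List[str], step: Step, initial_state: int = 0):
        self.lines = lines
        self.step = step
        self.initial_state = initial_state
        self.end_states: List[int] = []  # end_states[i] = state after line i
        self.rendered: List[str] = []    # rendered[i] = highlighted line i

    def ensure_parsed(self, last_line: int) -> None:
        """Parse forward from the last parsed line up to last_line (inclusive)."""
        last_line = min(last_line, len(self.lines) - 1)
        while len(self.rendered) <= last_line:
            i = len(self.rendered)
            state = self.end_states[-1] if self.end_states else self.initial_state
            new_state, text = self.step(state, self.lines[i])
            self.end_states.append(new_state)
            self.rendered.append(text)

    def view(self, first_line: int, count: int) -> List[str]:
        """Return the highlighted lines for a viewport, parsing only what is missing."""
        self.ensure_parsed(first_line + count - 1)
        return self.rendered[first_line:first_line + count]
```

Opening a file then costs only one viewport of parsing; jumping to the end still parses everything in between, which matches the slow first scroll and the fast later scrolls observed above.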

With Kate or KWrite one cannot even dream of opening such large files: even for plain text files KWrite seems to use several times more RAM than the file size, especially for files with short lines; for XML files the memory overhead is higher still, and opening the files takes a very long time.

I do not know how SciTe does it, but as for memory usage, Kate could avoid storing syntax highlighting and display information for every line, and instead store it only for lines near the cursor and for, say, one position per kilobyte of the rest of the file. For a file with 40-byte lines, this would cut the memory overhead by a factor of about 25! I think an ideal memory overhead might be about 20% of the file size, as such a small overhead would be virtually unnoticeable (SciTe, with its (680-280)/280 = about 140% overhead, is of course not ideal in this sense; would a 20% overhead be large enough to get good performance when scrolling the file?).
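
To make the "one entry per kilobyte" idea concrete: with 40-byte lines, 1024 / 40 ≈ 25.6 lines share one stored state, which is where the factor of about 25 comes from. A minimal sketch, assuming the highlighter state can be recomputed line by line from the nearest stored checkpoint; `CheckpointedStates` and `next_state` are hypothetical names, and bytes are approximated by characters.

```python
from typing import Callable, Dict, List

# next_state(state_before_line, line) -> state_after_line
NextState = Callable[[int, str], int]


class CheckpointedStates:
    """Stores the highlighter state roughly once per checkpoint_bytes of text."""

    def __init__(self, lines: List[str], next_state: NextState,
                 checkpoint_bytes: int = 1024, initial_state: int = 0):
        self.lines = lines
        self.next_state = next_state
        # checkpoints[i] = state *before* line i
        self.checkpoints: Dict[int, int] = {0: initial_state}
        state = initial_state
        since_last = 0
        for i, line in enumerate(lines):
            if since_last >= checkpoint_bytes:
                self.checkpoints[i] = state
                since_last = 0
            state = next_state(state, line)
            since_last += len(line) + 1  # +1 for the newline

    def state_before(self, line_no: int) -> int:
        """Recompute the state before line_no from the nearest earlier checkpoint."""
        start = line_no
        while start not in self.checkpoints:
            start -= 1
        state = self.checkpoints[start]
        for i in range(start, line_no):
            state = self.next_state(state, self.lines[i])
        return state
```

The trade-off is that looking up a line's state costs up to one checkpoint interval of re-highlighting, which is the open question raised above about scrolling performance.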

Acting more user-friendly during time-consuming operations would be very nice too (by the way, lack of user control during time-consuming operations is a weak point of most GUIs, including SciTe). But in addition to that, in order to be of any use for very large files, KWrite and Kate should be seriously optimized.

My votes for making KWrite/Kate able to open large XML files with reasonable CPU and memory usage.
Comment 2 Milian Wolff 2010-02-15 01:42:43 UTC
SVN commit 1090237 by mwolff:

optimize: apply RegExp optimization I know from GeSHi:
          match the whole line and cache the result, that way we can save many regexp
          calls for lines, esp. for regexps that don't match at all.
          profiling showed that it got up to 10x faster for a big mysql dump

CCBUG: 145686
CCBUG: 225228

 M  +16 -0     katehighlight.cpp  
 M  +28 -5     katehighlighthelpers.cpp  
 M  +11 -0     katehighlighthelpers.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1090237
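
For readers unfamiliar with the optimization named in the commit message, here is a rough Python sketch of the general idea. It is not the code from the commit; `CachedLineRegex` and `match_at` are hypothetical names. A highlighter keeps asking "does this rule match exactly at this offset?", so one forward search per line can answer many of those questions, and a rule that never matches in the line costs a single regexp call.

```python
import re
from typing import Optional


class CachedLineRegex:
    """Answers "does the pattern match exactly at offset?" with few regex calls."""

    def __init__(self, pattern: str):
        self.regex = re.compile(pattern)
        self.calls = 0  # number of real regex searches, for illustration
        self._line: Optional[str] = None
        self._searched_from = 0
        self._next: Optional[re.Match] = None  # first match at or after _searched_from

    def match_at(self, line: str, offset: int) -> Optional[re.Match]:
        # The cached search still answers this query if it covers the same line,
        # started at or before offset, and its match (if any) is not behind us.
        cache_valid = (
            line == self._line
            and offset >= self._searched_from
            and (self._next is None or offset <= self._next.start())
        )
        if not cache_valid:
            self._line = line
            self._searched_from = offset
            self._next = self.regex.search(line, offset)
            self.calls += 1
        if self._next is not None and self._next.start() == offset:
            return self._next
        return None
```

Usage: calling `match_at(line, i)` for increasing `i` triggers a new search only after the highlighter has moved past the previously found match.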
Comment 3 Milian Wolff 2010-02-15 02:11:21 UTC
Could anyone of you provide me with such a big file for profiling and (hopefully) improving kate?

Thanks
Comment 4 FiNeX 2010-08-20 20:06:56 UTC
@Milian: you can create a big plain text file with a simple bash script like


#!/bin/bash
# Append 50000 numbered lines to FILE, opening the file only once.
for (( x=1; x<=50000; x++ ))
do
  echo "Line number $x"
done >> FILE


:-)
Comment 5 paolog 2011-07-30 08:11:55 UTC
Created attachment 62328 [details]
New crash information added by DrKonqi

Opening a 10 MB XML file consisting of only 5 lines
ran for > 10 h without giving control back to the user;
I lost changes in other open files :-(
Suggestions, in order of increasing difficulty:
1) Have a mechanism to estimate the effort for syntax highlighting and turn it off automatically when certain conditions are met or, even better, ask the user whether highlighting should be attempted.
2) Introduce a rescue mechanism that allows selectively killing one of the Kate MDI tabs; cf. the one-process-per-tab approach in the Chrome and Firefox browsers.
3) Apply syntax highlighting only to the displayed chunks of the file (for version 5.9?).
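
Suggestion 1 could start from a very cheap check like the one below. This is a sketch only: `should_highlight` is a hypothetical helper and both thresholds are arbitrary placeholders, not values taken from Kate. The 10 MB / 5 lines file from this comment would be rejected by the line-length limit.

```python
from typing import Iterable


def should_highlight(
    lines: Iterable[str],
    max_total_chars: int = 50_000_000,  # placeholder limit, not a Kate default
    max_line_chars: int = 10_000,       # placeholder limit, not a Kate default
) -> bool:
    """Return False as soon as the text looks too expensive to highlight."""
    total = 0
    for line in lines:
        if len(line) > max_line_chars:
            return False  # a single huge line is the pathological case
        total += len(line)
        if total > max_total_chars:
            return False
    return True
```

When the check fails, the editor could either disable highlighting silently or ask the user, as proposed in the list.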
Comment 6 Christoph Cullmann 2012-11-07 23:21:42 UTC
*** Bug 252348 has been marked as a duplicate of this bug. ***
Comment 7 Dominik Haumann 2013-08-20 09:36:55 UTC
The reason Bash is slow in comment #4 is that the Bash highlighting uses RegExpr rules almost exclusively. The same is true for XML.

For instance, a rule like
    <RegExpr attribute="Doctype" context="Doctype" String="&lt;!DOCTYPE\s+" beginRegion="doctype" />
can be turned into
    <WordDetect attribute="Doctype" context="Doctype" String="&lt;!DOCTYPE" beginRegion="doctype" />

Generally, there is much room for improvement by using DetectSpaces, WordDetect and the like. Any takers?

Luckily, in Qt5 the regular expressions will be much faster.
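
To illustrate why the RegExpr-to-WordDetect rewrite above is cheaper, here is a rough Python sketch of the two kinds of check. It is not Kate code: `regex_rule` and `word_detect_rule` are hypothetical names, and the boundary test is a simplifying assumption rather than Kate's actual delimiter logic.

```python
import re

DOCTYPE_REGEX = re.compile(r"<!DOCTYPE\s+")


def regex_rule(line: str, offset: int) -> int:
    """Regex-based rule: return the end offset of a match at offset, or -1."""
    m = DOCTYPE_REGEX.match(line, offset)
    return m.end() if m else -1


def word_detect_rule(line: str, offset: int, word: str = "<!DOCTYPE") -> int:
    """Literal comparison plus a boundary check; no regex engine involved."""
    if not line.startswith(word, offset):
        return -1
    end = offset + len(word)
    # Assumed boundary rule: end of line or a whitespace character must follow.
    if end < len(line) and not line[end].isspace():
        return -1
    return end
```

Note that the two rules are not strictly equivalent: the regex version also consumes the trailing whitespace, while the literal version stops right after the word and accepts it at the end of a line.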
Comment 8 jimmy 2014-05-03 07:03:33 UTC
spreads from an app of printer tools in ubuntu.. dec 2013 redirects opens printers, cell phones
Comment 9 Christoph Cullmann 2014-09-08 14:51:10 UTC
We know that we can improve highlighting speed; this bug doesn't give any new insight into how ;)
Therefore, close.