Bug 161213

Summary: Extreme memory usage when searching for text in large PDF
Product: [Applications] okular
Reporter: Dustin Vaselaar <dustin.vaselaar>
Component: general
Assignee: Okular developers <okular-devel>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 0.6.3   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Dustin Vaselaar 2008-04-24 00:15:31 UTC
Version:           0.6.3 (using KDE 4.0.3)
Installed from:    Ubuntu Packages
OS:                Linux

Hello,
When searching for uncommon text using the "Find" function in large PDF files such as:
http://sagemath.org/doc/paper-letter/ref.pdf
I experience extreme memory usage.

For example, when searching for the word "abracadabra", virtual and resident memory increase from approximately 100 MB and 32 MB, respectively, to more than 550 MB and 450 MB. I stopped the test at that point; otherwise my computer would have become unresponsive.
Comment 1 Pino Toscano 2008-05-01 21:03:33 UTC
SVN commit 803048 by pino:

Internally replace a TextEntity with a "lighter version" that stores the raw UTF-16 data of the text.
This way, we can save about 4 ints for each text entity; this is not much for small documents,
but for big documents with lots of text (e.g., the PDF specs) we can save a lot (more than 50 MB!).

CCBUG: 161213


 M  +84 -29    textpage.cpp  
 M  +8 -8      textpage_p.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=803048
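
For readers who want a feel for the idea in the commit message above, here is a minimal sketch of a "lighter" text entity that owns a bare UTF-16 buffer plus a length instead of a full string object. It is not the code from textpage.cpp: the names CompactTextEntity and Rect are hypothetical, and standard C++ types stand in for the Qt ones so that the example is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for the normalized rectangle of a text entity.
struct Rect {
    double left, top, right, bottom;
};

// Hypothetical compact entity: stores only the raw UTF-16 code units and
// their count, avoiding the bookkeeping of a full string object per entity.
class CompactTextEntity {
public:
    CompactTextEntity(const std::u16string &text, const Rect &area)
        : m_area(area),
          m_length(text.size()),
          m_data(new char16_t[text.size()])
    {
        // Copy the code units into the privately owned buffer.
        std::copy(text.begin(), text.end(), m_data.get());
    }

    // Rebuild a full string only when a caller actually needs one.
    std::u16string text() const { return std::u16string(m_data.get(), m_length); }

    // Access a single UTF-16 code unit without building a string.
    char16_t at(std::size_t i) const
    {
        assert(i < m_length);
        return m_data[i];
    }

    std::size_t length() const { return m_length; }
    const Rect &area() const { return m_area; }

private:
    Rect m_area;
    std::size_t m_length;
    std::unique_ptr<char16_t[]> m_data;
};
```

The trade-off in a design like this is that a full string has to be rebuilt on demand, in exchange for a smaller footprint per entity. How much is saved depends on the string class being replaced.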
Comment 2 Albert Astals Cid 2008-05-04 17:11:03 UTC
SVN commit 803949 by aacid:

limit the number of text pages we keep in memory so that searching does not bring your system to its knees

BUG: 161213


 M  +46 -0     core/document.cpp  
 M  +5 -0      core/document_p.h  
 M  +18 -2     core/generator.cpp  
 M  +5 -0      core/generator.h  
 M  +13 -3     generators/poppler/generator_pdf.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=803949
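
The approach described in this commit message, capping how many text pages stay in memory, can be illustrated with a small bounded cache. This is only a sketch under stated assumptions, not the code from core/document.cpp: the class name BoundedTextPageCache is hypothetical, pages are represented as plain UTF-16 strings, and the first-in, first-out eviction order is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache that keeps the text of at most maxPages pages.
class BoundedTextPageCache {
public:
    explicit BoundedTextPageCache(std::size_t maxPages) : m_maxPages(maxPages)
    {
        assert(maxPages > 0);
    }

    // Store the text of a page, evicting the oldest stored page if the
    // cache is already full.
    void store(int pageNumber, std::u16string text)
    {
        auto it = m_pages.find(pageNumber);
        if (it != m_pages.end()) {
            // Page already cached: replace its text, keep its queue position.
            it->second = std::move(text);
            return;
        }
        if (m_order.size() == m_maxPages) {
            m_pages.erase(m_order.front());
            m_order.pop_front();
        }
        m_order.push_back(pageNumber);
        m_pages.emplace(pageNumber, std::move(text));
    }

    // Return the cached text, or nullptr if the page was never stored or
    // has been evicted (the caller would then regenerate it).
    const std::u16string *find(int pageNumber) const
    {
        auto it = m_pages.find(pageNumber);
        return it == m_pages.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return m_pages.size(); }

private:
    std::size_t m_maxPages;
    std::deque<int> m_order;                          // insertion order of cached pages
    std::unordered_map<int, std::u16string> m_pages;  // page number -> text
};
```

With a cap of 2, storing pages 0, 1 and 2 in that order leaves only pages 1 and 2 cached. A search that walks every page of a large document therefore holds a bounded number of text pages at once rather than all of them.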