Bug 161213

Summary:	Extreme memory usage when searching for text in large PDF
Product:	[Applications] okular
Reporter:	Dustin Vaselaar <dustin.vaselaar>
Component:	general
Assignee:	Okular developers <okular-devel>
Status:	RESOLVED FIXED
Severity:	normal
Priority:	NOR
Version:	0.6.3
Target Milestone:	---
Platform:	Ubuntu
OS:	Linux

Description Dustin Vaselaar 2008-04-24 00:15:31 UTC

Version:           0.6.3 (using KDE 4.0.3)
Installed from:    Ubuntu Packages
OS:                Linux

Hello,
When searching for uncommon text using the "Find" function in large PDF files such as:
http://sagemath.org/doc/paper-letter/ref.pdf
I experience extreme memory usage.

For example, when searching for the word "abracadabra", virtual and resident memory increase from approximately 100 MB and 32 MB, respectively, to more than 550 MB and 450 MB. (I stopped the test at that point; otherwise my computer would have become unresponsive.)

Comment 1 Pino Toscano 2008-05-01 21:03:33 UTC

SVN commit 803048 by pino:

Internally replace a TextEntity with a "lighter version" that stores the raw UTF-16 data of the text.
This way, we can save about 4 ints for each text entity; this is not much for small documents,
but for big documents with lots of text (e.g., the PDF specs) we can save a lot (more than 50 MB!).

CCBUG: 161213


 M  +84 -29    textpage.cpp  
 M  +8 -8      textpage_p.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=803048
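
To illustrate the idea in the commit message above, here is a minimal sketch of a "lighter" text entity that owns a bare UTF-16 buffer and a length rather than a full string object. This is not the code from textpage.cpp: the names LightTextEntity and Rect are hypothetical, and standard C++ types are used in place of Qt ones so that the example stands alone.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical rectangle type; stands in for a normalized page rectangle.
struct Rect {
    double left, top, right, bottom;
};

// Hypothetical "light" text entity (an assumption, not Okular's class):
// it keeps only a raw UTF-16 buffer and its length, avoiding the extra
// bookkeeping fields a full string object carries per instance.
class LightTextEntity {
public:
    LightTextEntity(const std::u16string &text, const Rect &area)
        : m_length(static_cast<std::uint32_t>(text.size())),
          m_data(new char16_t[text.size()]),
          m_area(area)
    {
        // Copy the code units; the buffer is not null-terminated.
        std::memcpy(m_data.get(), text.data(), text.size() * sizeof(char16_t));
    }

    // Rebuild a full string only when a caller actually needs one.
    std::u16string text() const { return std::u16string(m_data.get(), m_length); }

    const Rect &area() const { return m_area; }
    std::uint32_t length() const { return m_length; }

private:
    std::uint32_t m_length;              // number of UTF-16 code units
    std::unique_ptr<char16_t[]> m_data;  // raw UTF-16 data, owned
    Rect m_area;                         // where the text sits on the page
};
```

The trade-off in a design like this is that a full string has to be rebuilt whenever one is requested, in exchange for a smaller footprint per entity. How much is actually saved depends on the string implementation being replaced.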

Comment 2 Albert Astals Cid 2008-05-04 17:11:03 UTC

SVN commit 803949 by aacid:

Limit the number of text pages we keep in memory so that searching does not bring your system to its knees.

BUG: 161213


 M  +46 -0     core/document.cpp  
 M  +5 -0      core/document_p.h  
 M  +18 -2     core/generator.cpp  
 M  +5 -0      core/generator.h  
 M  +13 -3     generators/poppler/generator_pdf.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=803949
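
The approach described in the commit message above, capping how many text pages stay in memory, can be sketched as a small bounded cache that drops the oldest page once a limit is reached. This is only an illustration under stated assumptions: the class name TextPageCache, the first-in-first-out eviction order, and the limit value are all hypothetical and are not taken from core/document.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded cache (an assumption, not Okular's implementation):
// keeps the extracted text of at most `limit` pages.
class TextPageCache {
public:
    explicit TextPageCache(std::size_t limit) : m_limit(limit) {}

    // Store the text of a page, evicting the oldest stored page if needed.
    void insert(int page, std::string text)
    {
        if (m_limit == 0)
            return;  // caching disabled
        auto it = m_pages.find(page);
        if (it != m_pages.end()) {
            it->second = std::move(text);  // refresh content, keep position
            return;
        }
        if (m_order.size() >= m_limit) {
            m_pages.erase(m_order.front());  // free the oldest page's text
            m_order.pop_front();
        }
        m_order.push_back(page);
        m_pages.emplace(page, std::move(text));
    }

    bool contains(int page) const { return m_pages.count(page) != 0; }
    std::size_t size() const { return m_pages.size(); }

private:
    std::size_t m_limit;
    std::deque<int> m_order;                       // insertion order, oldest first
    std::unordered_map<int, std::string> m_pages;  // page number -> extracted text
};
```

With a cache like this, a search that walks every page of a large document holds at most `limit` pages of text at once, so memory use stays roughly flat instead of growing with the page count. The cost is that text for an evicted page must be extracted again if that page is revisited.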