163297 – Faster repeat searching of large documents

Summary: Faster repeat searching of large documents

Status:	CONFIRMED

Alias:	None

Product:	okular
Classification:	Applications
Component:	general
Version:	unspecified
Platform:	unspecified Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Okular developers

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-06-05 20:17 UTC by Robert Knight
Modified:	2020-05-08 19:08 UTC
CC List:	5 users

See Also:
Latest Commit:
Version Fixed In:

Description Robert Knight 2008-06-05 20:17:10 UTC

Version:            (using Devel)

Search performance in Okular seems to have improved somewhat recently, thank you for that! Searching large PDFs (several hundred pages or more, e.g. standards documents) can still take quite a long time though, regardless of whether searches have been performed on the document before.

Perhaps Okular could build up an index (or use some other technique) to speed up multiple searches on the same document?

Comment 1 Albert Astals Cid 2008-06-06 17:27:14 UTC

Actually, it's slower now than two months ago. Before, we cached the text of all pages after the first page; now we only cache N pages, because the previous setup meant exhausting system memory quite easily. If you want faster searches, set the memory options to aggressive so that more pages are kept cached in memory.
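To illustrate why a cache limited to N pages may not help repeat searches, here is a minimal sketch of a bounded least-recently-used page text cache. It is not Okular's code: the class `PageTextCache`, the `extract` callable and the eviction policy are all assumptions made for illustration. When a front-to-back search visits more pages than the cache holds, every page has been evicted again by the time the next search reaches it, so the slow extraction step runs for every page on every search.

```python
from collections import OrderedDict


class PageTextCache:
    """Bounded LRU cache of extracted page text (hypothetical, not Okular's code)."""

    def __init__(self, extract, max_pages):
        self.extract = extract      # callable: page number -> text (the slow step)
        self.max_pages = max_pages  # the "N pages" limit
        self.pages = OrderedDict()
        self.extractions = 0        # counts how often the slow step ran

    def text(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)    # mark as most recently used
            return self.pages[page]
        self.extractions += 1
        value = self.extract(page)
        self.pages[page] = value
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)  # evict the least recently used page
        return value


def search(cache, page_count, needle):
    """Return the pages whose text contains needle, scanning front to back."""
    return [p for p in range(page_count) if needle in cache.text(p)]


def extractions_for_two_searches(page_count, max_pages):
    """Run the same search twice and report how many extractions were needed."""
    cache = PageTextCache(lambda p: f"text of page {p}", max_pages)
    search(cache, page_count, "page 7")
    search(cache, page_count, "page 7")
    return cache.extractions
```

Calling `extractions_for_two_searches(10, 4)` extracts every page twice, while `extractions_for_two_searches(10, 10)` extracts each page only once, which is the effect a larger memory setting would have under these assumptions.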

Comment 2 Benjamin Lutz 2008-07-31 11:21:19 UTC

I find search performance unsatisfactory too, especially compared with KPDF. Take this rather large document for example: http://www.adobe.com/products/postscript/pdfs/PLRM.pdf . On KPDF, the first search (in the "thumbnails" pane) takes around 10 seconds on my machine, successive searches are nearly instantaneous. In Okular however, every search takes around 24 seconds, no matter whether it's the first one or not.

The memory usage policy in both programs is "aggressive". Actual memory usage after 5 searches in the above document is 171MB for KPDF and 65MB for Okular. I think Okular is just being too modest here: if I tell it that it's OK to gobble up memory in order to provide faster responses, then it should just go ahead and do that.

In other words: I'd really like to get KPDF's instant searches back.

Comment 3 Philipp A. 2020-05-08 19:08:09 UTC

Hello to 12 years ago!

I wrote a quick script to check how fast searching is when building an index first from nothing, and found that it’s nearly as slow as an *early* search in Okular.

At some point, Okular seems to have built a text search index, but it takes much longer than the 21 seconds my script uses. I had Okular open for at least some minutes before the search got fast (maybe even longer, since I spent half an hour writing the script).

Maybe the index only gets built once the visual page cache is full or so? Tell me if I’m wrong, but I think we could improve search by building the index ASAP.

https://gist.github.com/flying-sheep/27f99747f85abb20bab7dc732abe3f6a

    $ ./pdf_search.py '/home/phil/Dropbox/RPG/DSA/DSA 5/VR7 - Aventurische Magie III (2018).pdf' Geode
    Index time : 0:00:21.465361
    Search time: 0:00:00.001145
    [31, 32, 33, 34, 35, 36, 37, 38, 126, 166]
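For readers who do not want to open the gist, the general "index once, then look up" idea can be sketched as below. This is not the script from the gist: the function names, the word-level tokenisation and the in-memory sample pages are assumptions, and a real tool would first extract each page's text from the PDF, which is presumably where most of the index time goes. Page numbers here are 0-based list positions.

```python
import re
from collections import defaultdict


def build_index(page_texts):
    """Map each lowercased word to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for page_number, text in enumerate(page_texts):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(pages) for word, pages in index.items()}


def search(index, word):
    """Look up a single word; this is a dictionary access, not a page scan."""
    return index.get(word.lower(), [])


# Hypothetical page texts standing in for a PDF's extracted text layer.
pages = ["Earth and fire", "The Geode casts a spell", "Nothing here", "geode lore"]
index = build_index(pages)
hits = search(index, "Geode")  # pages 1 and 3 in this sample
```

Building the index touches every page once; each later lookup is a single dictionary access, which would be consistent with the millisecond search time reported above. A word-level index like this only answers whole-word queries, so substring or phrase search would need a different structure.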