Looking at the implementation of TextPagePrivate::findTextInternalForward and TextPagePrivate::findTextInternalBackward in textpage.cpp, it seems that no backtracking is done, which can sometimes make the search fail. E. g. if the user searches for "ab" and the document consists of the string "aabaab" then Okular detects that the first "a" in "aabaab" matches the first letter in "ab", but the first two letters "aa" in "aabaab" do not match "ab", and thus discards the first letters "aa" and starts searching for "ab" in the last four letters "baab" (which it also fails, for the same reason). But actually it should backtrack to the second letter in the document and start searching for "ab" in "baab". Or one could avoid backtracking the entity iterator by using the Knuth-Morris-Pratt algorithm (http://www.ics.uci.edu/~eppstein/161/960227.html), which would also give better performance for searching long strings in long documents (but maybe not for short search strings).
Steps to Reproduce:
1. Open the attached file backtracking.pdf in Okular.
2. Press F3.
3. Type fast "ab" (do not pause after "a", otherwise Okular will first search for "a").
Okular finds no match (the search text box turns reddish).
Okular should highlight the first "aab".
Created attachment 81600 [details]
backtracking.pdf referred to in Steps to Reproduce
Sorry for a typo in the description. It should be
But actually it should backtrack to the second letter in the document and start searching for "ab" in "abaab".
Apologies again, it seems I'm getting tired. The Expected Results should read
Okular should highlight the first "ab" in "aabaab".
You seem to know what you're talking about, any chance you can produce a patch?
Yes, I will try. The Knuth-Morris-Pratt approach seems more interesting, so I think I will try it out and see how much code it takes and how fast it performs. That patch would then solve Bug #323262 as well.
Awesome, please upload your patch to http://reviewboard.kde.org/ when ready to review.
Git commit dff8bf1b365f28c6a7b7d97e8ca2a0e579738619 by Albert Astals Cid, on behalf of Jaan Vajakas.
Committed on 18/10/2013 at 14:28.
Pushed by aacid into branch 'master'.
Improve searching code
Also simplified code a bit by removing unnecessary calls to toLower in TextPagePrivate::findTextInternalForward and TextPagePrivate::findTextInternalBackward I also fixed a small bug: the letter capital I with dot above (U+0130) did not match itself in case-insensitive mode on Qt 4.8.4 (U+0130 still does not match lowercase i (U+0069), which can be considered another bug, that I didn't fix (although this behavior conforms to the Unicode case folding rules)).
(I did not implement the Knuth-Morris-Pratt algorithm that I promised in a comment of Bug 323263 because on second thought I find that the win, if any, would probably be negligible except for some very special documents and special query strings.)
Related: bug 323262
M +155 -136 core/textpage.cpp
M +11 -5 core/textpage_p.h
M +307 -1 tests/searchtest.cpp
M +1 -0 ui/videowidget.cpp