Bug 323263

Summary: Search fails if backtracking needed
Product: [Applications] okular Reporter: Jaan Vajakas <jaanvajakas>
Component: generalAssignee: Okular developers <okular-devel>
Severity: normal CC: aacid
Priority: NOR    
Version: 0.17.60   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed In: 4.12.0
Attachments: backtracking.pdf referred to in Steps to Reproduce

Description Jaan Vajakas 2013-08-07 18:40:57 UTC
Looking at the implementation of TextPagePrivate::findTextInternalForward and TextPagePrivate::findTextInternalBackward in textpage.cpp, it seems that no backtracking is done, which can sometimes make the search fail. E. g. if the user searches for "ab" and the document consists of the string "aabaab" then Okular detects that the first "a" in "aabaab" matches the first letter in "ab", but the first two letters "aa" in "aabaab" do not match "ab", and thus discards the first letters "aa" and starts searching for "ab" in the last four letters "baab" (which it also fails, for the same reason). But actually it should backtrack to the second letter in the document and start searching for "ab" in "baab". Or one could avoid backtracking the entity iterator by using the Knuth-Morris-Pratt algorithm (http://www.ics.uci.edu/~eppstein/161/960227.html), which would also give better performance for searching long strings in long documents (but maybe not for short search strings).

Reproducible: Always

Steps to Reproduce:
1. Open the attached file backtracking.pdf in Okular.
2. Press F3.
3. Type fast "ab" (do not pause after "a", otherwise Okular will first search for "a").

Actual Results:  
Okular finds no match (the search text box turns reddish).

Expected Results:  
Okular should highlight the first "aab".
Comment 1 Jaan Vajakas 2013-08-07 18:41:40 UTC
Created attachment 81600 [details]
backtracking.pdf referred to in Steps to Reproduce
Comment 2 Jaan Vajakas 2013-08-07 18:43:32 UTC
Sorry for a typo in the description. It should be

But actually it should backtrack to the second letter in the document and start searching for "ab" in "abaab".
Comment 3 Jaan Vajakas 2013-08-07 18:49:16 UTC
Apologies again, it seems I'm getting tired. The Expected Results should read

Okular should highlight the first "ab" in "aabaab".
Comment 4 Albert Astals Cid 2013-08-07 18:55:23 UTC
You seem to know what you're talking about, any chance you can produce a patch?
Comment 5 Jaan Vajakas 2013-08-07 19:55:29 UTC
Yes, I will try. The Knuth-Morris-Pratt approach seems more interesting, so I think I will try it out and see how much code it takes and how fast it performs. That patch would then solve Bug #323262 as well.
Comment 6 Albert Astals Cid 2013-08-07 20:14:54 UTC
Awesome, please upload your patch to http://reviewboard.kde.org/ when ready to review.
Comment 7 Albert Astals Cid 2013-10-18 14:44:19 UTC
Git commit dff8bf1b365f28c6a7b7d97e8ca2a0e579738619 by Albert Astals Cid, on behalf of Jaan Vajakas.
Committed on 18/10/2013 at 14:28.
Pushed by aacid into branch 'master'.

Improve searching code

Also simplified code a bit by removing unnecessary calls to toLower in TextPagePrivate::findTextInternalForward and TextPagePrivate::findTextInternalBackward I also fixed a small bug: the letter capital I with dot above (U+0130) did not match itself in case-insensitive mode on Qt 4.8.4 (U+0130 still does not match lowercase i (U+0069), which can be considered another bug, that I didn't fix (although this behavior conforms to the Unicode case folding rules)).

(I did not implement the Knuth-Morris-Pratt algorithm that I promised in a comment of Bug 323263 because on second thought I find that the win, if any, would probably be negligible except for some very special documents and special query strings.)
Related: bug 323262
REVIEW: 112135

M  +155  -136  core/textpage.cpp
M  +11   -5    core/textpage_p.h
M  +307  -1    tests/searchtest.cpp
M  +1    -0    ui/videowidget.cpp