Version: (using KDE 4.1.4)
Installed from: Ubuntu Packages
Here are two words: ﬁre, fire.
They are identical, except that the first one contains a ligature bringing the f and the i together. The second one has no ligature. The first is the standard in proper typesetting, and it is the default output of LaTeX.
If I search for "fire" in Okular, the word will not be found, because Okular doesn't understand the ligature. By way of comparison, Adobe Reader for Linux does understand the ligature, and finds the word.
This can lead to great frustration.
I imagine that this applies to all documents in Okular, rather than being specific to the PDF backend. There are a handful of other common ligatures that this applies to (see http://en.wikipedia.org/wiki/Typographic_ligature).
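For anyone who wants to see why a plain substring search misses the word, here is a minimal Python sketch. It assumes the text extracted from the document contains the single ligature code point U+FB01 rather than the two letters "f" and "i"; whether that holds for a given PDF depends on how its fonts map glyphs to Unicode. The variable names are only illustrative.

```python
import unicodedata

# "fire" typeset with the fi ligature: U+FB01 followed by "re"
with_ligature = "\ufb01re"
plain = "fire"

# The ligature is one code point, so the two strings differ in length
print(len(with_ligature), len(plain))

# A naive substring search therefore fails
print(plain in with_ligature)

# Unicode name of the first character of the ligature version
print(unicodedata.name(with_ligature[0]))
```

The two strings look the same on screen but compare as different text, which is consistent with the behaviour described above.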
I have the same problem with Version 0.8 on KDE 4.2.
In case anyone needs an example-PDF:
Try to search for "config" in
I can confirm this bug as well.
If you search for 'config' in the PDF mentioned in Jaruh's link, the first result returned is on page 7. However, the first result should be at the bottom of page 3, where 'configurations' is written.
I made sure the 'use case sensitive' and 'from current page' options were NOT enabled.
I can conﬁrm this bug.
KDE Version 0.8.2 (KDE 4.2.2 (KDE 4.2.2), Kubuntu packages)
Application Universal document viewer
Operating System Linux (x86_64) release 188.8.131.52j2
This always happens with pdf files produced by pdflatex as it makes use of ligatures.
*** Bug 213086 has been marked as a duplicate of this bug. ***
A user [flying sheep] on Launchpad has written a patch in Python to fix this, which can be found at https://bugs.launchpad.net/okular/+bug/411538/comments/4
Created attachment 40447 [details]
patch to search for ligatures [written in python]
written by flying sheep [launchpad], https://bugs.launchpad.net/okular/+bug/411538/comments/4
Just for the record, if anyone thinks that patch is useful, it is not.
Also, for the record, Adobe Reader 9.3 is not able to find the word "configurations" in the document from comment #1.
I too can confirm this bug for Okular 0.9.5, Kubuntu 9.10 and KDE 4.3.5. Also, the copy function should separate ligatures, like Evince does.
Confirming this bug still exists in KDE 4.4.1.
Also, this was previously reported for kpdf as Bug 103621, so more information can be found there.
*** Bug 230274 has been marked as a duplicate of this bug. ***
*** This bug has been confirmed by popular vote. ***
reply to comment 7:
i’m sorry my “patch” isn’t useful, but at least it would be a way to quickly circumvent the problem until a better solution is found. and “program x does it equally wrong” is no excuse if we can do it better.
*** Bug 258515 has been marked as a duplicate of this bug. ***
SVN commit 1225994 by aacid:
"Normalize" strings so searching for ligatures like "fi" works
Patch by Christopher Reichert
M +11 -3 textpage.cpp
WebSVN link: http://websvn.kde.org/?view=rev&revision=1225994
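To illustrate the general idea of normalizing strings before searching, here is a rough Python sketch. It is not the code from the commit (that change is in textpage.cpp, and the diff is not reproduced here). It assumes that "normalize" means Unicode compatibility normalization (NFKC), which expands ligature code points such as U+FB01 into separate letters. The function names normalize_for_search and find_all are made up for this example.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    # NFKC compatibility normalization expands ligature code points
    # (for example U+FB01 -> "fi", U+FB00 -> "ff") into plain letters.
    return unicodedata.normalize("NFKC", text)


def find_all(page_text: str, query: str) -> list[int]:
    # Normalize both sides and fold case, giving a case-insensitive search.
    haystack = normalize_for_search(page_text).casefold()
    needle = normalize_for_search(query).casefold()
    if not needle:
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


# "configurations" typeset with the fi ligature
print(find_all("con\ufb01gurations", "config"))
# "Kaffee" typeset with the ff ligature
print(find_all("Ka\ufb00ee", "Kaffee"))
```

Note that the returned offsets refer to the normalized text, not the original. A viewer that highlights matches would still need to map them back to the glyphs on the page, which this sketch does not attempt.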
Created attachment 60622 [details]
Try to search for 'Kaffee' - the ff ligature is the problem