Bug 181828 - Okular does not find words with ligatures
Summary: Okular does not find words with ligatures
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR normal with 50 votes
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Duplicates: 213086 230274 258515
Depends on:
Blocks:
 
Reported: 2009-01-25 01:22 UTC by David Dempster
Modified: 2012-07-19 16:34 UTC
CC List: 12 users

See Also:
Latest Commit:
Version Fixed In: 4.7.0


Attachments
patch to search for ligatures [written in python] (426 bytes, patch)
2010-02-02 01:02 UTC, W. Skora
Try to search for 'Kaffee' - the ff ligature is the problem (3.11 KB, application/pdf)
2011-06-04 15:52 UTC, Thomas Domenig

Description David Dempster 2009-01-25 01:22:47 UTC
Version:            (using KDE 4.1.4)
OS:                Linux
Installed from:    Ubuntu Packages

Here are two words: ﬁre, fire.

They are identical, except that the first one contains a ligature bringing the f and the i together.  The second one has no ligature.  The first is the standard in proper typesetting, and it is the default output of LaTeX.

If I search for "fire" in Okular, the word will not be found, because Okular doesn't understand the ligature.  By way of comparison, Adobe Reader for Linux does understand the ligature, and finds the word.

This can lead to great frustration.

I imagine that this applies to all documents in Okular, rather than being specific to the PDF backend.  There are a handful of other common ligatures that this applies to (see http://en.wikipedia.org/wiki/Typographic_ligature).
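
A minimal illustration of why a plain substring search misses the word, assuming the document text stores the single ligature code point U+FB01 (LATIN SMALL LIGATURE FI) rather than the two letters "f" and "i". The names below are hypothetical and are not taken from Okular's code:

```python
# Text as it might be extracted from a LaTeX-produced document:
# U+FB01 (the "fi" ligature) followed by "re".
ligature_word = "\ufb01re"

# What the user types into the search box: four separate letters.
plain_word = "fire"


def naive_find(haystack: str, needle: str) -> bool:
    """Plain code-point-by-code-point substring search (hypothetical helper)."""
    return needle in haystack


# The ligature word has only three code points, none of which is "f" or "i",
# so the typed query cannot match it.
print(len(ligature_word))                     # number of code points in the ligature word
print(naive_find(ligature_word, plain_word))  # whether the naive search finds it
```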
Comment 1 jarauh 2009-02-25 10:33:43 UTC
I have the same problem with Version 0.8 on KDE 4.2.

In case anyone needs an example-PDF:
Try to search for "config" in
http://www.nd.edu/~sommese/bertini/BertiniUsersManual.pdf
Comment 2 W. Skora 2009-08-24 17:58:33 UTC
I can confirm this bug as well.

okular 0.8.2.
ubuntu 9.04 
kde 4.2.2

If you search for 'config' in the PDF mentioned in jarauh's link, the first result returned is on page 7. However, the first result should be at the bottom of page 3, where 'configurations' is written.

I made sure the 'use case sensitive' and 'from current page' options were NOT enabled.
Comment 3 Jens Lang 2009-08-25 10:54:18 UTC
I can confirm this bug.

KDE Version  0.8.2 (KDE 4.2.2 (KDE 4.2.2), Kubuntu packages)
Application  Universal document viewer
Operating System  Linux (x86_64) release 2.6.28.9j2
Compiler  cc

This always happens with pdf files produced by pdflatex as it makes use of ligatures.
Comment 4 Pino Toscano 2009-11-04 14:39:15 UTC
*** Bug 213086 has been marked as a duplicate of this bug. ***
Comment 5 W. Skora 2010-02-02 01:00:37 UTC
A user [flying sheep] on launchpad has written a patch to fix this, in python, which can be found at https://bugs.launchpad.net/okular/+bug/411538/comments/4
Comment 6 W. Skora 2010-02-02 01:02:20 UTC
Created attachment 40447
patch to search for ligatures [written in python]

written by flying sheep [launchpad], https://bugs.launchpad.net/okular/+bug/411538/comments/4
Comment 7 Albert Astals Cid 2010-02-20 18:50:04 UTC
Just for the record, if anyone thinks that patch is useful, it is not.

Also, for the record, Adobe Reader 9.3 is not able to find the word "configurations" in the document from comment #1
Comment 8 gelefisk 2010-02-25 19:56:26 UTC
I too can confirm this bug for okular 0.9.5, kubuntu 9.10 and kde 4.3.5. Also, the copy function should separate ligatures, like Evince does.
Comment 9 Tristan Miller 2010-03-05 15:57:36 UTC
Confirming this bug still exists in KDE 4.4.1.

Also, this was previously reported for kpdf as Bug 103621, so more information can be found there.
Comment 10 Albert Astals Cid 2010-03-11 21:34:11 UTC
*** Bug 230274 has been marked as a duplicate of this bug. ***
Comment 11 Glad Deschrijver 2010-08-11 18:22:00 UTC
*** This bug has been confirmed by popular vote. ***
Comment 12 Philipp A. 2010-11-10 10:39:43 UTC
reply to comment 7:
i’m sorry my “patch” isn’t useful, but at least it would be a way to quickly circumvent the problem until a better solution is found. and “program x does it equally wrong” is no excuse if we can do it better.
Comment 13 Pino Toscano 2010-12-01 21:12:56 UTC
*** Bug 258515 has been marked as a duplicate of this bug. ***
Comment 14 Albert Astals Cid 2011-03-25 21:05:58 UTC
SVN commit 1225994 by aacid:

"Normalize" strings so searching for ligatures like "fi" works
Patch by Christopher Reichert
BUGS: 181828


 M  +11 -3     textpage.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1225994
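
The idea named in the commit message can be sketched as follows. This is an illustration in Python, not the code from textpage.cpp; it assumes that Unicode compatibility normalization (NFKC) is the kind of normalization meant, and the function names are hypothetical:

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Expand compatibility characters such as ligatures (assumption: NFKC).

    Under NFKC, U+FB01 becomes "fi" and U+FB00 becomes "ff".
    """
    return unicodedata.normalize("NFKC", text)


def find_normalized(haystack: str, needle: str) -> bool:
    """Substring search performed on the normalized forms of both strings."""
    return normalize_for_search(needle) in normalize_for_search(haystack)


# Page text containing ligature code points, as a PDF text layer might.
page_text = "con\ufb01gurations and Ka\ufb00ee"

print(find_normalized(page_text, "config"))
print(find_normalized(page_text, "Kaffee"))
```

Normalizing changes the length of the text (one ligature code point becomes two or three letters), so a viewer that highlights matches would also need to map positions in the normalized string back to the original characters on the page.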
Comment 15 Thomas Domenig 2011-06-04 15:52:07 UTC
Created attachment 60622
Try to search for 'Kaffee' - the ff ligature is the problem