Bug 190433 - okular does not find automatically hyphenated words in pdf
Summary: okular does not find automatically hyphenated words in pdf
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 0.8.2
Platform: unspecified Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Duplicates: 148458 228245 253371
Depends on: 161324
Blocks:
Reported: 2009-04-23 12:02 UTC by Oliver Putz
Modified: 2012-02-02 18:34 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In: 4.9.0
Sentry Crash Report:


Attachments
pdf file to show bug 190433 (14.50 KB, application/octet-stream)
2009-04-23 12:05 UTC, Oliver Putz
Test case for the bug. (346.72 KB, application/octet-stream)
2009-04-30 18:44 UTC, Diogo

Description Oliver Putz 2009-04-23 12:02:49 UTC
Version:           0.8.2 (using 4.2.2 (KDE 4.2.2), Gentoo)
Compiler:          x86_64-pc-linux-gnu-gcc
OS:                Linux (x86_64) release 2.6.28-gentoo-r2

Steps to reproduce:

1) Create a PDF document with LaTeX that has a long word at the end of a line, so that the word is automatically split according to the hyphenation rules of the language
2) Search for the split word in the PDF
3) Observe that the hyphenated occurrence is not found

I'll also attach a sample document. In this document, searching for "Gefahrenanalysematrix" returns only one hit, although the word occurs twice: once split across the first and second lines, and once unsplit in the third line.

I use poppler-0.10.5
Comment 1 Oliver Putz 2009-04-23 12:05:14 UTC
Created attachment 33035
pdf file to show bug 190433

Search for "Gefahrenanalysematrix" in this document: only one hit is found, because of the hyphenation.
Comment 2 Diogo 2009-04-30 18:44:10 UTC
Created attachment 33245
Test case for the bug.

To reproduce the bug, just search for the word "SIMULAÇÃO".
As can easily be seen, the word is present in the main title on the first page of the document.
Comment 3 Pino Toscano 2009-05-02 15:15:04 UTC
Strictly speaking, there are no "words" in a PDF document, just characters at certain positions.
This is the same issue as #161324 (which this bug depends on), i.e. doing actual text recognition.
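For the hyphenation case alone, the reconstruction step can be roughly approximated by joining the extracted text lines and dropping any hyphen that ends a line between two letters before searching. The following Python sketch is a hypothetical illustration only: the helper names `dehyphenate` and `count_hits` are invented here, it assumes the page text has already been extracted as a list of lines, and it is not the approach used by Okular or poppler.

```python
import re


def dehyphenate(lines):
    """Join extracted text lines, merging words split by a line-end hyphen.

    Assumption (hypothetical heuristic): a '-' at the end of a line, with a
    word character on both sides of the break, is an automatic hyphenation
    point, so the hyphen and the line break are both removed.
    """
    text = "\n".join(lines)
    # Drop "letter-<newline>letter" breaks, keeping the letters on each side.
    text = re.sub(r"(?<=\w)-\n(?=\w)", "", text)
    # Any remaining line breaks become ordinary spaces between words.
    return text.replace("\n", " ")


def count_hits(lines, word):
    """Count literal occurrences of `word` in the dehyphenated text."""
    return len(re.findall(re.escape(word), dehyphenate(lines)))


if __name__ == "__main__":
    # Hypothetical lines modeled on the attached example document.
    page = [
        "Die Gefahren-",
        "analysematrix wird hier",
        "erklärt. Die Gefahrenanalysematrix ist gut.",
    ]
    print(count_hits(page, "Gefahrenanalysematrix"))
```

The heuristic is deliberately naive: it would also merge a genuine hyphen that happens to fall at a line end (for example "E-" followed by "Mail"), so a real implementation would need more context, such as layout information or a hyphenation dictionary.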
Comment 4 Pino Toscano 2010-04-07 22:18:01 UTC
*** Bug 148458 has been marked as a duplicate of this bug. ***
Comment 5 Pino Toscano 2010-05-23 01:05:08 UTC
*** Bug 228245 has been marked as a duplicate of this bug. ***
Comment 6 Pino Toscano 2010-10-06 10:53:46 UTC
*** Bug 253371 has been marked as a duplicate of this bug. ***
Comment 7 Albert Astals Cid 2012-02-02 18:34:49 UTC
This will be fixed in Okular as of KDE 4.9.0, thanks to the work of Mahfuzur Rahman Mamun.