190433 – okular does not find automatically hyphenated words in pdf

Summary: okular does not find automatically hyphenated words in pdf

Status:	RESOLVED FIXED

Alias:	None

Product:	okular
Classification:	Applications
Component:	general (other bugs)
Version First Reported In:	0.8.2
Platform:	unspecified Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Okular developers

Duplicates (3):	148458 228245 253371
Depends on:	161324

Reported:	2009-04-23 12:02 UTC by Oliver Putz
Modified:	2012-02-02 18:34 UTC
CC List:	4 users

Version Fixed/Implemented In:	4.9.0

Attachments
pdf file to show bug 190433 (14.50 KB, application/octet-stream) 2009-04-23 12:05 UTC, Oliver Putz
Test case for the bug. (346.72 KB, application/octet-stream) 2009-04-30 18:44 UTC, Diogo

Description Oliver Putz 2009-04-23 12:02:49 UTC

Version:           0.8.2 (using 4.2.2 (KDE 4.2.2), Gentoo)
Compiler:          x86_64-pc-linux-gnu-gcc
OS:                Linux (x86_64) release 2.6.28-gentoo-r2

Steps to reproduce:

1) Create a PDF document with LaTeX that has a long word at the end of a line, so that the word is automatically split according to the hyphenation rules of the language
2) Search for the split word in the pdf
3) See that the hyphenated version is not found

I'll also attach a sample document. In this document, searching for "Gefahrenanalysematrix" returns only one hit, although the word occurs twice: once split across the first and second lines by hyphenation, and once unsplit in line three.
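One way a viewer could find such split words is to join the extracted text lines before searching, dropping a line-final hyphen that directly follows a letter. The following is a minimal, hypothetical Python sketch of that idea, not Okular's or poppler's code; the function names and the sample lines are invented for illustration and are not the contents of the attached PDF.

```python
def dehyphenate(lines):
    """Join extracted text lines into one string.

    Assumption (hypothetical heuristic): a '-' at the end of a line that
    directly follows a letter is a hyphenation break, so it is dropped and
    the next line is glued on without a space.
    """
    text = ""
    for raw in lines:
        line = raw.strip()
        if len(text) >= 2 and text.endswith("-") and text[-2].isalpha():
            text = text[:-1] + line      # undo the hyphenation break
        elif text:
            text += " " + line           # ordinary line break becomes a space
        else:
            text = line
    return text


def naive_count(lines, word):
    # Searches each line separately, mimicking the reported behaviour.
    return sum(line.count(word) for line in lines)


def dehyphenated_count(lines, word):
    # Searches the joined text, so a word split across lines is found too.
    return dehyphenate(lines).count(word)


sample = [
    "Die Gefahrenanalyse-",              # word split by automatic hyphenation
    "matrix wird erstellt.",
    "Eine Gefahrenanalysematrix hilft.",
]
print(naive_count(sample, "Gefahrenanalysematrix"))         # 1
print(dehyphenated_count(sample, "Gefahrenanalysematrix"))  # 2
```

This heuristic also merges genuine hyphenated compounds that happen to break at their hyphen (e.g. "E-" / "Mail"), and a real viewer would additionally have to map each match back to the glyph positions on both lines in order to highlight it.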

I am using poppler-0.10.5.

Comment 1 Oliver Putz 2009-04-23 12:05:14 UTC

Created attachment 33035 [details]
pdf file to show bug 190433

Search for Gefahrenanalysematrix in this document: only one hit is found because of the hyphenation.

Comment 2 Diogo 2009-04-30 18:44:10 UTC

Created attachment 33245 [details]
Test case for the bug.

To reproduce the bug, just try to find the word "SIMULAÇÃO".
As can easily be seen, the word is present in the main title on the first page of the document.

Comment 3 Pino Toscano 2009-05-02 15:15:04 UTC

If we really want to be technically correct, there are no "words" in a PDF document, just characters at certain positions.
This is the same issue as #161324 (which this bug depends on), i.e. doing actual text recognition.
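To illustrate the point that a PDF only stores characters at positions, here is a minimal, hypothetical Python sketch of rebuilding words from positioned glyphs: glyphs with nearly equal baselines are grouped into lines, and a horizontal gap wider than a fraction of the average glyph width starts a new word. It is not the algorithm used by Okular or poppler; `Glyph`, `group_words`, `layout`, and the thresholds `line_tol` and `gap_factor` are all assumptions made for this example, and the y coordinate is assumed to grow downwards.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float       # left edge of the glyph box
    y: float       # baseline; assumed to grow downwards (screen coordinates)
    width: float


def group_words(glyphs, line_tol=2.0, gap_factor=0.3):
    """Hypothetical heuristic: rebuild words from positioned glyphs.

    Glyphs whose baselines differ by at most line_tol go on the same line;
    within a line, a gap wider than gap_factor times the average glyph
    width starts a new word.
    """
    lines = []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    words = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        avg_width = sum(g.width for g in line) / len(line)
        current = line[0].char
        for prev, g in zip(line, line[1:]):
            gap = g.x - (prev.x + prev.width)
            if gap > gap_factor * avg_width:
                words.append(current)    # gap is large enough: word ends here
                current = g.char
            else:
                current += g.char
        words.append(current)
    return words


def layout(text, y, width=5.0):
    # Hypothetical helper: monospaced layout where spaces only advance x and
    # emit no glyph, mimicking PDFs that encode word spacing as positions.
    glyphs, x = [], 0.0
    for ch in text:
        if ch != " ":
            glyphs.append(Glyph(ch, x, y, width))
        x += width
    return glyphs


page = layout("Die Gefahrenanalyse-", 10.0) + layout("matrix", 22.0)
print(group_words(page))  # ['Die', 'Gefahrenanalyse-', 'matrix']
```

Even with words reconstructed, "Gefahrenanalyse-" and "matrix" remain separate tokens on different lines, so a search still has to reason about line structure to join them, which is consistent with this bug depending on #161324.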

Comment 4 Pino Toscano 2010-04-07 22:18:01 UTC

*** Bug 148458 has been marked as a duplicate of this bug. ***

Comment 5 Pino Toscano 2010-05-23 01:05:08 UTC

*** Bug 228245 has been marked as a duplicate of this bug. ***

Comment 6 Pino Toscano 2010-10-06 10:53:46 UTC

*** Bug 253371 has been marked as a duplicate of this bug. ***

Comment 7 Albert Astals Cid 2012-02-02 18:34:49 UTC

Will be fixed in Okular from KDE 4.9.0, thanks to the work of Mahfuzur Rahman Mamun.