Bug 217914

Summary:	mixed languages: cannot copy text (garbage instead of proper letters)
Product:	[Applications] okular	Reporter:	Maciej Pilichowski <bluedzins>
Component:	PDF backend	Assignee:	Okular developers <okular-devel>
Status:	RESOLVED NOT A BUG
Severity:	normal
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	openSUSE
OS:	Unspecified

Description Maciej Pilichowski 2009-12-08 20:34:54 UTC

Version:            (using KDE 4.3.3)
Installed from:    SuSE RPMs

Let's say you have a PDF with mixed languages (for example, Polish and Russian). Here is one you can download for testing:
http://www.jezykiobce.net/jezykiobce_pdf/rosyjski_kurs_podst.pdf

When you copy the Polish part or the Russian part (or both), you end up with garbage like:
êàíèêóëû

Only basic Latin letters plus the letter ó are copied correctly. It is not a matter of the font used, because with the same font I can use the intended letters:
ęóąśł
or
яерсидоф

Comment 1 Pino Toscano 2009-12-08 20:50:03 UTC

Given that the problem is reproducible (in the very same way) with:

- acroread 9.2
- okular 0.9.2 + poppler 0.12.2
- evince 2.28.1 + poppler 0.12.2

I'm rather inclined to conclude the document might be badly encoded.
(Note: what you see in a PDF is not what you copy as text.)
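
As a rough illustration of what "badly encoded" could mean here, the sketch below assumes (this is a guess, not something confirmed for this file) that the PDF maps its glyphs to single-byte codes from a Windows code page and that viewers read those codes as Latin-1 when extracting text. The helper name `reinterpret` is hypothetical and is not part of Okular or poppler.

```python
# Hypothetical helper: undo a wrong single-byte decoding.
# Assumption: the copied text is really cp1251 (Cyrillic) or cp1250
# (Central European) bytes that were decoded as Latin-1.
def reinterpret(text: str, wrong: str = "latin-1", right: str = "cp1251") -> str:
    """Re-encode with the codec that was wrongly applied, then decode with the assumed real one."""
    return text.encode(wrong).decode(right)


if __name__ == "__main__":
    # The garbage string from the report, read as cp1251 instead of Latin-1.
    print(reinterpret("êàíèêóëû"))  # каникулы

    # ó has the same byte value (0xF3) in cp1250 and Latin-1, which would
    # explain why it is the only non-ASCII letter that survives copying.
    print("ó".encode("cp1250") == "ó".encode("latin-1"))  # True
```

Under that assumption the copied string from the description decodes to a real Russian word, which fits the idea that the fault lies in the document's text mapping and not in the viewer.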

Comment 2 Maciej Pilichowski 2009-12-08 22:37:32 UTC

Pino, thank you for the explanation. I assumed that, by definition, a PDF has to be properly encoded (the text, I mean).