217914 – mixed languages: cannot copy text (garbage instead of proper letters)

Status:	RESOLVED NOT A BUG

Alias:	None

Product:	okular
Classification:	Applications
Component:	PDF backend (other bugs)
Version First Reported In:	unspecified
Platform:	openSUSE Unspecified

Importance:	NOR normal
Target Milestone:	---
Assignee:	Okular developers

Reported:	2009-12-08 20:34 UTC by Maciej Pilichowski
Modified:	2009-12-08 22:37 UTC (History)
CC List:	0 users


Description Maciej Pilichowski 2009-12-08 20:34:54 UTC

Version:            (using KDE 4.3.3)
Installed from:    SuSE RPMs

Let's say you have a PDF with mixed languages (for example, Polish and Russian). Here is one you can download for testing:
http://www.jezykiobce.net/jezykiobce_pdf/rosyjski_kurs_podst.pdf

When you copy the Polish part or the Russian part (or both), you end up with garbage, like:
êàíèêóëû

Only basic Latin letters plus the letter ó are copied correctly. It is not a matter of the font used, because with the same font I can type the intended letters:
ęóąśł
or
яерсидоф

Comment 1 Pino Toscano 2009-12-08 20:50:03 UTC

Given that the problem is reproducible (in the very same way) with:

- acroread 9.2
- okular 0.9.2 + poppler 0.12.2
- evince 2.28.1 + poppler 0.12.2

I'm rather inclined to conclude the document might be badly encoded.
(Note: what you see in a PDF is not what you copy as text.)
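
A minimal sketch of what such a bad encoding could look like, assuming (this is not confirmed in the report) that the PDF stores the Russian glyphs under Windows-1251 byte codes without a mapping to Unicode, so that viewers copy each code out as if it were Latin-1. The helper name `undo_mojibake` is hypothetical and only serves the illustration:

```python
def undo_mojibake(garbled: str,
                  intended: str = "cp1251",
                  misread_as: str = "latin-1") -> str:
    """Reverse a single-byte codec mix-up.

    Re-encode the garbled text with the codec it was wrongly read as,
    then decode the resulting bytes with the codec that was intended.
    Both codec defaults are assumptions about this particular document.
    """
    return garbled.encode(misread_as).decode(intended)


# The sample from the description, as copied out of the viewer.
copied = "êàíèêóëû"
print(undo_mojibake(copied))  # каникулы
```

Under the same assumption, the Polish part would be Windows-1250 read as Latin-1. That would fit the observation that ó is the one accented letter that survives: it sits at the same byte value (0xF3) in both code pages, whereas ę, for instance, comes out as ê.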

Comment 2 Maciej Pilichowski 2009-12-08 22:37:32 UTC

Pino, thank you for the explanation. I assumed that, by definition, a PDF has to be properly encoded (the text, I mean).