Bug 217914 - mixed languages: cannot copy text (garbage instead of proper letters)
Summary: mixed languages: cannot copy text (garbage instead of proper letters)
Status: RESOLVED NOT A BUG
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: unspecified
Platform: openSUSE Unspecified
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
Reported: 2009-12-08 20:34 UTC by Maciej Pilichowski
Modified: 2009-12-08 22:37 UTC

Description Maciej Pilichowski 2009-12-08 20:34:54 UTC
Version:            (using KDE 4.3.3)
Installed from:    SuSE RPMs

Suppose you have a PDF with mixed languages (for example, Polish and Russian). You can download one for testing here:
http://www.jezykiobce.net/jezykiobce_pdf/rosyjski_kurs_podst.pdf

When you copy the Polish part or the Russian part (or both), you end up with garbage, such as:
êàíèêóëû

Only basic Latin letters and the letter ó are copied correctly. This is not a matter of the font used, because with the same font I can use the intended letters:
ęóąśł
or
яерсидоф
Comment 1 Pino Toscano 2009-12-08 20:50:03 UTC
Given that the problem is reproducible (in exactly the same way) with:

- acroread 9.2
- okular 0.9.2 + poppler 0.12.2
- evince 2.28.1 + poppler 0.12.2

I'm rather inclined to conclude the document might be badly encoded.
(Note: what you see in a PDF is not what you copy as text.)
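
To illustrate the note above: the sample garbage in the description is consistent with Cyrillic text whose single-byte Windows-1251 codes are being interpreted as Windows-1252 characters when copied. The following is a minimal Python sketch under that assumption only. The helper name `recover_cyrillic` is hypothetical, and the sketch says nothing certain about how the linked PDF actually stores its text.

```python
# Hypothetical helper, assuming the copied garbage is Windows-1251 bytes
# that were displayed as Windows-1252 characters.
def recover_cyrillic(garbled: str) -> str:
    # Map each garbled character back to the single byte it has in cp1252.
    raw = garbled.encode("cp1252")
    # Reinterpret those same bytes as Windows-1251 (Cyrillic).
    return raw.decode("cp1251")


# The sample string from the description above.
print(recover_cyrillic("êàíèêóëû"))  # prints: каникулы
```

The letter ó (byte 0xF3) sits at the same position in both Latin-1 and Latin-2, which would fit the observation that it is the only non-ASCII letter copied correctly. That is an inference from the sample, not a confirmed diagnosis of the file.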
Comment 2 Maciej Pilichowski 2009-12-08 22:37:32 UTC
Pino, thank you for the explanation. I assumed that, by definition, a PDF has to be properly encoded (the text, I mean).