Bug 194299

Summary: correctly rendered text is not copied to clipboard correctly when it contains diacritics
Product: [Applications] okular
Component: general
Reporter: Radu Benea <radub82>
Assignee: Okular developers <okular-devel>
Status: RESOLVED NOT A BUG    
Severity: normal    
Priority: NOR    
Version: 0.8.3   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Latest Commit:
Version Fixed In:
Attachments: document demonstrating the problem

Description Radu Benea 2009-05-27 13:12:55 UTC
Version:           0.8.3 (using 4.2.3 (KDE 4.2.3), Gentoo)
Compiler:          x86_64-pc-linux-gnu-gcc
OS:                Linux (x86_64) release 2.6.28-gentoo-r5

Text containing diacritics, such as the Romanian characters ă, â, î, ș, ț and the French characters é, à, ù, û, ç, is rendered correctly in PDF files, but the text selection tool does not copy it correctly: only the hats, accents, or cedillas are copied.
Comment 1 Pino Toscano 2009-05-27 13:18:29 UTC
Can you please attach a sample document showing the issue?
Comment 2 Radu Benea 2009-05-27 13:41:57 UTC
Created attachment 34044 [details]
document demonstrating the problem

Attached a file demonstrating the problem. I don't know how to add French to this, but I put Romanian and Hungarian in it, which should be enough to prove my point.
Comment 3 Pino Toscano 2009-05-27 14:03:36 UTC
Actually, I get the very same problems with the following PDF viewers:
- Okular + Poppler 0.10.6
- Okular + Poppler HEAD
- Evince + Poppler 0.10.6
- Acrobat Reader 9.1.1
- XPDF 3.02

It looks to me like the TeX system you're using (or how you are using it) generates wrongly encoded PDF documents.
Comment 4 Radu Benea 2009-05-27 15:41:46 UTC
You seem to be right; I have found a few well-written documents myself. Thanks for the information.

Apparently it's not just me: plenty of PDF documents on the internet have the same problem, for example http://tel.archives-ouvertes.fr/docs/00/25/01/37/PDF/these_final.pdf (this last one was not written by me). Strangely, I have no problem copying Chinese text.

The TeX document started with:
\documentclass[a4paper,10pt]{report}

\usepackage[romanian]{babel}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}

The rest was written plainly in UTF-8 and saved with UTF-8 encoding. pdflatex didn't give any warnings, except that the Romanian language support in babel was missing hyphenation patterns.
Comment 5 Pino Toscano 2009-05-29 16:19:06 UTC
OK, closing this, as it is a problem (= wrong encoding) in the generated documents.
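
A possible workaround on the LaTeX side, offered as a sketch under assumptions rather than something confirmed in this report: with pdflatex's default OT1 font encoding, accented letters are typically assembled from a base glyph plus a separate accent glyph, which tends to break copy and paste in the way described above. Loading the cmap package and switching to the T1 encoding (with a Type 1 font family such as lmodern) usually gives precomposed glyphs and a usable text layer. The three added packages are not part of the preamble quoted in comment 4, and the comma-below letters ș and ț may still need extra handling, so they are left out of the test line.

```latex
\documentclass[a4paper,10pt]{report}

% Assumption: these three lines are additions, not taken from the original report.
\usepackage{cmap}        % embeds character maps so glyphs map back to Unicode; load before fontenc
\usepackage[T1]{fontenc} % T1 provides precomposed accented glyphs (OT1 builds them from pieces)
\usepackage{lmodern}     % Type 1 fonts available in the T1 encoding

% Unchanged from the preamble quoted in comment 4.
\usepackage[romanian]{babel}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}

\begin{document}
ă â î é à ù û ç
\end{document}
```

After compiling with pdflatex, selecting the line in a PDF viewer and pasting it into a text editor is a quick way to check whether the letters now survive the copy.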