Bug 194299

Summary:	correctly rendered text is not copied to clipboard correctly when it contains diacritics
Product:	[Applications] okular	Reporter:	Radu Benea <radub82>
Component:	general	Assignee:	Okular developers <okular-devel>
Status:	RESOLVED NOT A BUG
Severity:	normal
Priority:	NOR
Version First Reported In:	0.8.3
Target Milestone:	---
Platform:	unspecified
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	document demonstrating the problem

Description Radu Benea 2009-05-27 13:12:55 UTC

Version:           0.8.3 (using 4.2.3 (KDE 4.2.3), Gentoo)
Compiler:          x86_64-pc-linux-gnu-gcc
OS:                Linux (x86_64) release 2.6.28-gentoo-r5

Text containing diacritics, such as the Romanian characters ă, â, î, ș, ț and the French characters é, à, ù, û, ç, is rendered correctly in PDF files, but the text selection tool does not copy it correctly: only the hats, accents or cedillas are copied.

Comment 1 Pino Toscano 2009-05-27 13:18:29 UTC

Can you please attach a sample document showing the issue?

Comment 2 Radu Benea 2009-05-27 13:41:57 UTC

Created attachment 34044 [details]
document demonstrating the problem

Attached a file demonstrating the problem. I don't know how to add French to it, but I included Romanian and Hungarian text, which should be enough to prove my point.

Comment 3 Pino Toscano 2009-05-27 14:03:36 UTC

Actually, I get the very same problems with the following PDF viewers:
- Okular + Poppler 0.10.6
- Okular + Poppler HEAD
- Evince + Poppler 0.10.6
- Acrobat Reader 9.1.1
- XPDF 3.02

It looks to me like the TeX system you're using (or how you are using it) generates wrongly encoded PDF documents.

Comment 4 Radu Benea 2009-05-27 15:41:46 UTC

You seem to be right; I have found a few well-written documents myself. Thanks for the information.

Apparently it's not just me: plenty of PDF documents on the internet have the same problem, for example http://tel.archives-ouvertes.fr/docs/00/25/01/37/PDF/these_final.pdf, which was not written by me. Strangely, I have no problem copying Chinese text.

The TeX document started with:
\documentclass[a4paper,10pt]{report}

\usepackage[romanian]{babel}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}

The rest was written plainly in UTF-8 and the file was saved with UTF-8 encoding. pdflatex didn't even give a warning, except that the Romanian language support in babel was missing hyphenation patterns.
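
A possible workaround on the LaTeX side, given as an untested sketch and not a confirmed fix for this report. The preamble above leaves the font encoding at LaTeX's default (OT1), where accented letters are typically built from a base letter plus a separately placed accent glyph. That may be why viewers extract the pieces separately. The variant below assumes the cmap and lmodern packages are installed; neither is mentioned in this report.

```latex
\documentclass[a4paper,10pt]{report}

% Assumption: cmap is available. It makes pdflatex embed ToUnicode maps,
% which viewers use to map glyphs back to characters when copying.
% Its documentation asks for it to be loaded before fontenc.
\usepackage{cmap}

% T1 font encoding contains precomposed accented letters, so a character
% such as é or ă is a single glyph instead of letter + accent.
\usepackage[T1]{fontenc}

% Assumption: Latin Modern as an outline font available in T1 encoding.
\usepackage{lmodern}

% Unchanged from the original preamble.
\usepackage[romanian]{babel}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}

\begin{document}
% Small copy-and-paste test: select this line in the viewer and paste it.
ă â î é à ù û ç
\end{document}
```

The test line leaves out ș and ț on purpose: T1 does not contain the comma-below forms, so they may need separate handling and are not covered by this sketch.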

Comment 5 Pino Toscano 2009-05-29 16:19:06 UTC

OK, closing this, as it is a problem (= wrong encoding) in the generated documents.