Summary: | Okular: Incomplete export of PDF Czech content to plain text | ||
---|---|---|---|
Product: | [Applications] okular | Reporter: | sobik2 <sobik2> |
Component: | general | Assignee: | Okular developers <okular-devel> |
Status: | RESOLVED NOT A BUG | ||
Severity: | normal | ||
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Ubuntu | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Attachments: |
source latex file
created pdf document with 'pdflatex document.tex'
plain text exported from pdf |
Description
sobik2
2010-02-25 15:54:52 UTC
I installed LaTeX with 'sudo apt-get install texlive texlive-lang-czechslovak', using Kubuntu 9.10 and KDE 4.4.0.

Please attach the PDF file.

Created attachment 41118 [details]
source latex file
Created attachment 41119 [details]
created pdf document with 'pdflatex document.tex'
Created attachment 41120 [details]
plain text exported from pdf
The PDF is not formed correctly enough to make text extraction possible. Try the text extraction with Adobe Reader and you'll see that it fails there too. You might want to contact the pdflatex people about it.

I tried it in Adobe Acrobat 9, and it really does seem to be an invalid PDF, because when I:
1) create an ODT document in OpenOffice 3.1,
2) export it as PDF,
3) open this PDF in Okular -> export as plain text,
the content is spelled correctly.

The question is how a user can know that their PDF is not valid (I didn't find any Linux tool for this).

SUGGEST
=======
So I suggest implementing some validity check that shows a warning before a non-valid PDF is exported to plain text, so that users don't need to report it as a bug (exactly as I did).

There is no way to differentiate a broken PDF from a non-broken one other than reading the extracted text, so what you ask is impossible.
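
Since reading the extracted text is the only real test, a user can at least automate a rough first look at the exported file. The following is a minimal sketch, not anything Okular does: it assumes that a failed extraction shows up as replacement characters, private-use code points, or stray control characters, which has not been confirmed for the PDF attached here. The function name `suspicious_ratio` is hypothetical. It cannot catch text that was extracted as valid but wrong letters, so a low score does not prove the PDF is fine.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like
    extraction debris (a heuristic, not a PDF validity check)."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0

    def is_suspicious(ch: str) -> bool:
        # U+FFFD is the Unicode replacement character.
        # Co = private use, Cc = control, Cn = unassigned code point.
        return ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cc", "Cn")

    bad = sum(1 for ch in visible if is_suspicious(ch))
    return bad / len(visible)
```

To try it, read the plain-text export (for example a hypothetical `export.txt`) as UTF-8 with `errors="replace"` and pass the contents to `suspicious_ratio`; a value well above zero suggests the text is worth inspecting by eye.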