Bug 158517 - wrong orientation of the extracted text
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 0.6
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Duplicates: 160274 161016
Depends on:
Blocks:
 
Reported: 2008-02-27 19:57 UTC by Salvo "LtWorf" Tomaselli
Modified: 2008-04-20 15:31 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Salvo "LtWorf" Tomaselli 2008-02-27 19:57:02 UTC
Version:           0.6 (using KDE 4.0.1)
Installed from:    Unlisted Binary Package
OS:                Linux

I've been trying to copy some text from a PDF file and, as far as I know, there are no DRM restrictions on it.

When I paste the text, the result contains some words from the selected area (but not the whole selected text), and also words from unselected areas.

I tried the same with KPDF; the result wasn't perfect, but it contained all the words I wanted and no words from other areas.

The PDF is available here: http://www.dmi.unict.it/~nicolosi/LezioniPS/Lezione14.pdf

Q'plà
Comment 1 Tobias Koenig 2008-02-28 00:05:46 UTC
On Wed, Feb 27, 2008 at 06:57:03PM -0000, Salvo Tomaselli wrote:
Hej,

> When I paste the text, the result contains some words from the selected
> area (but not the whole selected text), and also words from unselected areas.

@Pino or @Albert: I can reproduce it here. Somehow the rotation
calculation is messed up. If you rotate the document counterclockwise
and select the bottom area of the page, you get the headline of the
text... really strange.

Is it a poppler bug?

Ciao,
Tobias
Comment 2 Pino Toscano 2008-02-28 10:57:14 UTC
Not sure, but while the pages have a 270-degree rotation, somehow the text does not.
I have a local fix that makes the orientation test case (from Poppler) work fine, but with this document the result gets even worse than it is now...
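
To illustrate the kind of mismatch described above, here is a minimal sketch. It is not the actual Okular or Poppler code. It assumes hypothetical `NormPoint` and `NormRect` types holding coordinates normalized to [0, 1] with the y axis pointing down, and it assumes the rotation is a clockwise multiple of 90 degrees, as the PDF page rotation entry is defined. If the page is drawn with such a rotation but the character boxes are left unrotated, a selection rectangle is matched against boxes in the wrong place.

```cpp
#include <algorithm>

// Hypothetical types for this sketch only (not Okular's or Poppler's classes).
// Coordinates are normalized to [0, 1]; y grows downwards.
struct NormPoint { double x, y; };
struct NormRect { double left, top, right, bottom; };

// Map a point from the unrotated page into the page rotated clockwise
// by `rotation` degrees (expected to be a multiple of 90).
NormPoint rotatePoint(NormPoint p, int rotation)
{
    switch (((rotation % 360) + 360) % 360) {
    case 90:  return {1.0 - p.y, p.x};
    case 180: return {1.0 - p.x, 1.0 - p.y};
    case 270: return {p.y, 1.0 - p.x};
    default:  return p; // 0 degrees, or an unsupported angle: leave unchanged
    }
}

// Rotate a rectangle by rotating two opposite corners and
// re-ordering the result so that left <= right and top <= bottom.
NormRect rotateRect(const NormRect &r, int rotation)
{
    const NormPoint a = rotatePoint({r.left, r.top}, rotation);
    const NormPoint b = rotatePoint({r.right, r.bottom}, rotation);
    return {std::min(a.x, b.x), std::min(a.y, b.y),
            std::max(a.x, b.x), std::max(a.y, b.y)};
}
```

With these assumptions, a headline box at the top of the unrotated page, `{0.25, 0.0, 0.75, 0.125}`, ends up along the left edge of a page rotated by 270 degrees, `{0.0, 0.25, 0.125, 0.75}`. Selecting the top of the rotated page would therefore not hit the headline unless its box is rotated as well.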
Comment 3 Benjamin Wohlwend 2008-03-06 08:47:07 UTC
Here are another two documents which show this problem:

http://www.ifi.uzh.ch/dbtg/uploads/media/01-Einfuehrung_03.pdf
http://www.ifi.uzh.ch/dbtg/uploads/media/02-XML_03.pdf

Additionally, annotation tools which rely on text selection do not work correctly.

Evince 2.21.91 is able to select text correctly in these documents, so perhaps Poppler is not the root of the problem.
Comment 4 Pino Toscano 2008-03-12 09:50:53 UTC
SVN commit 784735 by pino:

Apply Albert's patch to use the new functions in Poppler-Qt4 for getting the bounding box of the characters in the correct way (almost).
(Unfortunately, this requires the master version from Poppler's Git repository, to be released hopefully today as 0.8RC1.)

BUG: 158517


 M  +51 -1     generator_pdf.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=784735
Comment 5 Pino Toscano 2008-03-12 09:50:54 UTC
SVN commit 784736 by pino:

Backport: apply Albert's patch to use the new functions in Poppler-Qt4 for getting the bounding box of the characters in the correct way (almost).
(Unfortunately, this requires the master version from Poppler's Git repository, to be released hopefully today as 0.8RC1.)

CCBUG: 158517


 M  +51 -1     generator_pdf.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=784736
Comment 6 Pino Toscano 2008-04-19 15:51:06 UTC
Giving the bug a better title so it can be found more easily.
Comment 7 Pino Toscano 2008-04-19 15:52:51 UTC
*** Bug 161016 has been marked as a duplicate of this bug. ***
Comment 8 Pino Toscano 2008-04-20 15:31:22 UTC
*** Bug 160274 has been marked as a duplicate of this bug. ***