212458 – wrong algorithm used by the copy-to-clipboard selection tool

Status:	RESOLVED DUPLICATE of bug 168953

Alias:	None

Product:	okular
Classification:	Applications
Component:	PDF backend (other bugs)
Version First Reported In:	0.9.1
Platform:	Gentoo Packages Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Okular developers

Reported:	2009-10-31 01:01 UTC by Fabio Rossi
Modified:	2009-10-31 10:52 UTC
CC List:	0 users

Attachments
test.pdf (21.41 KB, application/pdf) 2009-10-31 01:02 UTC, Fabio Rossi
test.odt (8.70 KB, application/vnd.oasis.opendocument.text) 2009-10-31 01:03 UTC, Fabio Rossi

Description Fabio Rossi 2009-10-31 01:01:41 UTC

Version:           0.9.1 (using KDE 4.3.1)
OS:                Linux
Installed from:    Gentoo Packages

I created a sample PDF document (attached) that causes problems with the copy-to-clipboard selection tool of okular. The copied text is in the wrong order, so the original meaning is lost.

For instance, from the attached document I get the following text:

1. First chapter
2. Second chapter
a) Section number one
b) Section number two
c) and three
i. ok
ii. two
d) and four
3. Last chapter
1
10
11
18
22
23
30
40
100

which is of course in the wrong order.

Comment 1 Fabio Rossi 2009-10-31 01:02:30 UTC

Created attachment 37980 [details]
test.pdf

the problematic document

Comment 2 Fabio Rossi 2009-10-31 01:03:25 UTC

Created attachment 37981 [details]
test.odt

the source file used to create test.pdf

Comment 3 Pino Toscano 2009-10-31 01:15:27 UTC

FYI, the order of the copied text is the order the text characters are stored in the PDF document.
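
To illustrate the difference between "stored" order and "visual" order, here is a minimal sketch. It is not Okular's or Poppler's code: the `Fragment` type, the coordinates, and the pairing of entries with numbers are all made up for illustration and are not taken from test.pdf.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment; not a Poppler or Okular type."""
    text: str
    x: float  # left edge (assumed units: points)
    y: float  # distance from the top of the page (assumption)


def stored_order(fragments):
    # Text is emitted exactly in the order the fragments appear in the file.
    return [f.text for f in fragments]


def visual_order(fragments, line_tolerance=2.0):
    # Group fragments whose y values are close into one line,
    # then read every line from left to right.
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x))
            for line in lines]


# Made-up layout: the left column is stored first, then the right column.
fragments = [
    Fragment("1. First chapter", 50, 100),
    Fragment("2. Second chapter", 50, 120),
    Fragment("1", 400, 100),
    Fragment("10", 400, 120),
]

print(stored_order(fragments))
print(visual_order(fragments))
```

With this made-up layout, `stored_order` returns the left column followed by the right one, while `visual_order` returns one string per visual line.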

Comment 4 Fabio Rossi 2009-10-31 10:04:04 UTC

I think that the user expects the data in the clipboard with the "visual" order (as with kpdf).

Is there a chance that the data ordering will be changed? Maybe there could be a configuration option for choosing the behaviour of the operation.

Comment 5 Pino Toscano 2009-10-31 10:19:00 UTC

> I think that the user expects the data in the clipboard with the "visual" order
> (as with kpdf).

The generic "block" selection (the only one available in KPDF) is still there. 

> Is there a chance that the data ordering will be changed? Maybe there could be
> a configuration option for choosing the behaviour of the operation.

What would adding a "configuration option" solve?
This case is very similar to bug #161324 (recognising logical text columns), i.e. the text information in the document is not in the "logical" order for reading, so there must be text analysis to solve this "problem". (Also note that your testcase is a perfect example of what would eventually be considered a two-column text flow.)
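
As a rough idea of what such text analysis could look like, here is a minimal sketch that splits fragments into columns wherever two consecutive left edges are far apart, then reads each column from top to bottom. It is only an assumption about one possible heuristic, not what Okular, KPDF or Poppler actually do; the tuple layout, the `min_gap` threshold and the sample coordinates are hypothetical.

```python
def split_columns(fragments, min_gap=100.0):
    """Split (text, x, y) tuples into columns, ordered left to right.

    A column boundary is assumed wherever two consecutive distinct left
    edges are at least `min_gap` apart (hypothetical threshold).
    """
    xs = sorted({x for _, x, _ in fragments})
    boundaries = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if b - a >= min_gap]
    columns = [[] for _ in range(len(boundaries) + 1)]
    for text, x, y in fragments:
        # The column index is the number of boundaries left of the fragment.
        index = sum(x > b for b in boundaries)
        columns[index].append((text, x, y))
    # Inside a column, read from top to bottom (y grows downwards here).
    return [sorted(column, key=lambda f: f[2]) for column in columns]


def column_order(fragments, min_gap=100.0):
    return [text
            for column in split_columns(fragments, min_gap)
            for text, _, _ in column]


# Made-up fragments, deliberately stored line by line.
sample = [
    ("left A", 50, 100),
    ("right A", 400, 100),
    ("left B", 70, 120),   # slightly indented, still the left column
    ("right B", 400, 120),
]

print(column_order(sample))
```

A real document would need a sturdier heuristic (varying indentation, full-width headings, more than two columns); this only shows the general shape of the problem.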

Comment 6 Fabio Rossi 2009-10-31 10:43:18 UTC

(In reply to comment #5)
> The generic "block" selection (the only one available in KPDF) is still there. 

But it doesn't seem to be enabled in the GUI. The "selection tool" in okular behaves differently from the "select tool" in kpdf.
 
> i.e. the text information in the document is not in the "logical" order for
> reading, so there must be text analysis to solve this "problem". (Also note
> that your testcase is a perfect example of what would eventually be
> considered a two-column text flow.)

Does kpdf do some logical analysis on the text?

Comment 7 Pino Toscano 2009-10-31 10:52:12 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > The generic "block" selection (the only one available in KPDF) is still there. 
> 
> But it doesn't seem to be enabled in the GUI. The "selection tool" in okular
> behaves differently from the "select tool" in kpdf.

Now I see. Note, though, that the PDF libraries used by KPDF and Okular differ to some extent; Okular uses the Poppler library.

> > i.e. the text information in the document is not in the "logical" order for
> > reading, so there must be text analysis to solve this "problem". (Also note
> > that your testcase is a perfect example of what would eventually be
> > considered a two-column text flow.)
> 
> Does kpdf do some logical analysis on the text?

No.

*** This bug has been marked as a duplicate of bug 168953 ***