Bug 212458

Summary:	wrong algorithm used by the copy-to-clipboard selection tool
Product:	[Applications] okular
Reporter:	Fabio Rossi <rossi.f>
Component:	PDF backend
Assignee:	Okular developers <okular-devel>
Status:	RESOLVED DUPLICATE
Severity:	normal
Priority:	NOR
Version First Reported In:	0.9.1
Target Milestone:	---
Platform:	Gentoo Packages
OS:	Linux
Attachments:	test.pdf test.odt

Description Fabio Rossi 2009-10-31 01:01:41 UTC

Version:           0.9.1 (using KDE 4.3.1)
OS:                Linux
Installed from:    Gentoo Packages

I created a sample PDF document (attached) that shows a problem with Okular's copy-to-clipboard selection tool: the copied text comes out in the wrong order, so the original meaning is lost.

For instance, from the attached document I get the following text:

1. First chapter
2. Second chapter
a) Section number one
b) Section number two
c) and three
i. ok
ii. two
d) and four
3. Last chapter
1
10
11
18
22
23
30
40
100

which is of course in the wrong order.

Comment 1 Fabio Rossi 2009-10-31 01:02:30 UTC

Created attachment 37980 [details]
test.pdf

the problematic document

Comment 2 Fabio Rossi 2009-10-31 01:03:25 UTC

Created attachment 37981 [details]
test.odt

the source file used to create test.pdf

Comment 3 Pino Toscano 2009-10-31 01:15:27 UTC

FYI, the order of the copied text is the order the text characters are stored in the PDF document.
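
To illustrate the difference between storage order and the row-by-row "visual" order, here is a minimal Python sketch. It is not Okular's or Poppler's code: the `Fragment` type, the `visual_order` helper, the coordinates, and the pairing of list entries with numbers are all assumptions made up for the example.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A piece of text with a position on the page (hypothetical type)."""
    text: str
    x: float  # left edge, in points
    y: float  # distance from the top of the page, in points


def visual_order(fragments, line_tolerance=2.0):
    """Return fragments row by row: top to bottom, then left to right."""
    rows = []  # each row holds fragments whose y values are close together
    for frag in sorted(fragments, key=lambda f: f.y):
        if rows and abs(rows[-1][0].y - frag.y) <= line_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [f for row in rows for f in sorted(row, key=lambda f: f.x)]


# Assumed storage order: the whole left column first, then the numbers.
stored = [
    Fragment("1. First chapter", 50, 100),
    Fragment("2. Second chapter", 50, 120),
    Fragment("3. Last chapter", 50, 140),
    Fragment("1", 400, 100),
    Fragment("10", 400, 120),
    Fragment("100", 400, 140),
]

stored_texts = [f.text for f in stored]
visual_texts = [f.text for f in visual_order(stored)]
```

With these made-up coordinates, `stored_texts` lists the left column followed by the numbers, while `visual_texts` interleaves each entry with the number on the same row.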

Comment 4 Fabio Rossi 2009-10-31 10:04:04 UTC

I think that the user expects the data in the clipboard with the "visual" order (as with kpdf).

Is there a chance that the data ordering will be changed? Maybe there could be a configuration option for choosing the behaviour of the operation.

Comment 5 Pino Toscano 2009-10-31 10:19:00 UTC

> I think that the user expects the data in the clipboard with the "visual" order
> (as with kpdf).

The generic "block" selection (the only one available in KPDF) is still there. 

> Is there a chance that the data ordering will be changed? Maybe there could be
> a configuration option for choosing the behaviour of the operation.

What would adding a "configuration option" solve?
This case is very similar to bug #161324 (recognising logical text columns), i.e. the text information in the document is not in the "logical" order for reading, so text analysis is needed to solve this "problem". (Also note that your testcase is a perfect example of what would eventually be considered a two-column text flow.)
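
As a rough idea of what such text analysis could look like, here is a minimal Python sketch of a naive column split. It is only an illustration under stated assumptions, not the approach taken in Okular or Poppler: the `Box` type, the `split_columns` and `column_order` helpers, the `min_gap` threshold, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A text box with horizontal extent and vertical position (hypothetical)."""
    text: str
    x0: float  # left edge, in points
    x1: float  # right edge, in points
    y: float   # distance from the top of the page, in points


def split_columns(boxes, min_gap=30.0):
    """Group boxes into columns separated by empty strips at least min_gap wide."""
    columns = []
    right_edge = None  # rightmost edge seen so far in the current column
    for box in sorted(boxes, key=lambda b: b.x0):
        if right_edge is None or box.x0 - right_edge >= min_gap:
            columns.append([])  # wide horizontal gap: start a new column
            right_edge = box.x1
        else:
            right_edge = max(right_edge, box.x1)
        columns[-1].append(box)
    return columns


def column_order(boxes, min_gap=30.0):
    """Read each detected column top to bottom, columns left to right."""
    return [b for col in split_columns(boxes, min_gap)
            for b in sorted(col, key=lambda b: b.y)]


# Input given row by row; the coordinates are made up for the example.
page = [
    Box("1. First chapter", 50, 140, 100), Box("1", 400, 406, 100),
    Box("2. Second chapter", 50, 150, 120), Box("10", 400, 412, 120),
    Box("3. Last chapter", 50, 135, 140), Box("100", 400, 418, 140),
]

ordered_texts = [b.text for b in column_order(page)]
```

Under these assumptions the page is read as two columns, so `ordered_texts` contains the three chapter entries followed by the three numbers, which is the kind of two-column flow mentioned above.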

Comment 6 Fabio Rossi 2009-10-31 10:43:18 UTC

(In reply to comment #5)
> The generic "block" selection (the only one available in KPDF) is still there. 

But it doesn't seem to be enabled in the GUI. The "selection tool" in okular behaves differently from the "select tool" in kpdf.
 
> i.e. the text information in the document is not in the "logical" order for
> reading, so text analysis is needed to solve this "problem". (Also note
> that your testcase is a perfect example of what would eventually be
> considered a two-column text flow.)

Does kpdf do some logical analysis on the text?

Comment 7 Pino Toscano 2009-10-31 10:52:12 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > The generic "block" selection (the only one available in KPDF) is still there. 
> 
> But it doesn't seem to be enabled in the GUI. The "selection tool" in okular
> behaves differently from the "select tool" in kpdf.

Now I see. Note, though, that the PDF libraries used by KPDF and Okular differ to some extent; Okular uses the Poppler library.

> > i.e. the text information in the document is not in the "logical" order for
> > reading, so text analysis is needed to solve this "problem". (Also note
> > that your testcase is a perfect example of what would eventually be
> > considered a two-column text flow.)
> 
> Does kpdf do some logical analysis on the text?

No.

*** This bug has been marked as a duplicate of bug 168953 ***