Bug 212458

Summary: wrong algorithm used by the copy-to-clipboard selection tool
Product: [Applications] okular Reporter: Fabio Rossi <rossi.f>
Component: PDF backend Assignee: Okular developers <okular-devel>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: NOR    
Version: 0.9.1   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: test.pdf
test.odt

Description Fabio Rossi 2009-10-31 01:01:41 UTC
Version:           0.9.1 (using KDE 4.3.1)
OS:                Linux
Installed from:    Gentoo Packages

I created a sample PDF document (attached) that exposes a problem with Okular's copy-to-clipboard selection tool: the copied text is in the wrong order, so the original meaning is lost.

For instance, from the attached document I get the following text:

1. First chapter
2. Second chapter
a) Section number one
b) Section number two
c) and three
i. ok
ii. two
d) and four
3. Last chapter
1
10
11
18
22
23
30
40
100

which is of course in the wrong order.
Comment 1 Fabio Rossi 2009-10-31 01:02:30 UTC
Created attachment 37980 [details]
test.pdf

the problematic document
Comment 2 Fabio Rossi 2009-10-31 01:03:25 UTC
Created attachment 37981 [details]
test.odt

the source file used to create test.pdf
Comment 3 Pino Toscano 2009-10-31 01:15:27 UTC
FYI, the order of the copied text is the order the text characters are stored in the PDF document.
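
As an illustration only (this is not Okular or Poppler code), the following Python sketch contrasts the two orders. The `Word` type, the coordinates, and the `line_tolerance` value are assumptions made up for the example; the texts are modelled on the reported output. It returns the words either as stored or grouped into visual lines read left to right:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge in page units (hypothetical values)
    y: float  # distance from the top of the page (hypothetical values)


# Assumed storage order: the whole left column first, then the page numbers.
stored = [
    Word("1. First chapter", 50, 100),
    Word("2. Second chapter", 50, 120),
    Word("3. Last chapter", 50, 140),
    Word("1", 500, 100),
    Word("10", 500, 120),
    Word("100", 500, 140),
]


def storage_order(words):
    """Return the texts exactly in the order they are stored."""
    return [w.text for w in words]


def visual_order(words, line_tolerance=5.0):
    """Group words into lines by y, then read each line left to right."""
    lines = []
    for w in sorted(words, key=lambda w: w.y):
        # A word joins the current line if its y is close to the line's first word.
        if lines and abs(lines[-1][0].y - w.y) <= line_tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]
```

With this made-up data, `storage_order(stored)` lists all chapter titles before any page number, while `visual_order(stored)` pairs each title with the number on the same line.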
Comment 4 Fabio Rossi 2009-10-31 10:04:04 UTC
I think that the user expects the data in the clipboard with the "visual" order (as with kpdf).

Is there a chance that the data ordering will be changed? Maybe there could be a configuration option for choosing the behaviour of the operation.
Comment 5 Pino Toscano 2009-10-31 10:19:00 UTC
> I think that the user expects the data in the clipboard with the "visual" order
> (as with kpdf).

The generic "block" selection (the only one available in KPDF) is still there. 

> Is there a chance that the data ordering will be changed? Maybe there could be
> a configuration option for choosing the behaviour of the operation.

What would adding a "configuration option" solve?
This is a case very similar to bug #161324 (recognising logical text columns), i.e. the text information in the document is not in the "logical" order for reading, so there must be text analysis for solving this "problem". (Also note that your testcase is a perfect example of what would be eventually considered as a two-column text flow.)
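
A rough sketch of what such column analysis could look like, in Python. This is a hedged illustration, not the algorithm used by Okular or discussed in bug #161324; the `(x0, x1, y, text)` box tuples, the `min_gap` threshold, and the function names are all assumptions. It splits the boxes into columns wherever a wide horizontal gap appears, then reads each column top to bottom:

```python
def split_columns(boxes, min_gap=40.0):
    """Split (x0, x1, y, text) boxes into columns at horizontal gaps wider than min_gap."""
    columns = []
    right_edge = None  # rightmost x1 seen in the current column
    for box in sorted(boxes, key=lambda b: b[0]):
        x0, x1, _, _ = box
        if right_edge is None or x0 - right_edge > min_gap:
            columns.append([])  # gap is wide enough: start a new column
            right_edge = x1
        else:
            right_edge = max(right_edge, x1)
        columns[-1].append(box)
    return columns


def column_reading_order(boxes, min_gap=40.0):
    """Read the columns left to right, each one top to bottom."""
    texts = []
    for column in split_columns(boxes, min_gap):
        texts.extend(text for _, _, _, text in sorted(column, key=lambda b: b[2]))
    return texts


# Hypothetical boxes modelled on the attached testcase: titles left, page numbers right.
toc_boxes = [
    (50, 200, 100, "1. First chapter"),
    (50, 210, 120, "2. Second chapter"),
    (500, 510, 100, "1"),
    (500, 520, 120, "10"),
]
```

On a layout like `toc_boxes`, this heuristic sees two columns and returns every title before any page number, so a column detector alone may not produce the row-by-row order the reporter expects.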
Comment 6 Fabio Rossi 2009-10-31 10:43:18 UTC
(In reply to comment #5)
> The generic "block" selection (the only one available in KPDF) is still there. 

But it doesn't seem enabled in the GUI. The "selection tool" in okular behaves differently from the "select tool" in kpdf.
 
> i.e. the text information in the document is not in the "logical" order for
> reading, so there must be text analysis for solving this "problem". (Also note
> that your testcase is a perfect example of what would be eventually
> considered as a two-column text flow.)

Does kpdf do some logical analysis on the text?
Comment 7 Pino Toscano 2009-10-31 10:52:12 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > The generic "block" selection (the only one available in KPDF) is still there. 
> 
> But it doesn't seem enabled in the GUI. The "selection tool" in okular behaves
> differently from the "select tool" in kpdf.

Now I see. Note, though, that the PDF libraries used by KPDF and Okular differ to some extent: Okular uses the Poppler library.

> > i.e. the text information in the document is not in the "logical" order for
> > reading, so there must be text analysis for solving this "problem". (Also note
> > that your testcase is a perfect example of what would be eventually
> > considered as a two-column text flow.)
> 
> Does kpdf do some logical analysis on the text?

No.

*** This bug has been marked as a duplicate of bug 168953 ***