Bug 279859 - table selection tool - new feature (patch enclosed)
Summary: table selection tool - new feature (patch enclosed)
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: general (show other bugs)
Version: 0.12.2
Platform: unspecified Linux
: NOR wishlist
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-11 04:51 UTC by sabik
Modified: 2012-09-07 13:23 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In: 4.8.0


Attachments
implementation of the feature (17.24 KB, patch)
2011-08-11 04:51 UTC, sabik
Details
improved implementation of the feature (20.80 KB, patch)
2011-08-18 05:38 UTC, sabik
Details
improved #2 implementation of the feature (21.19 KB, patch)
2011-08-29 12:11 UTC, sabik
Details
improved #3 implementation of the feature (35.15 KB, patch)
2011-09-23 05:07 UTC, sabik
Details

Note You need to log in before you can comment on or make changes to this bug.
Description sabik 2011-08-11 04:51:05 UTC
Created attachment 62739 [details]
implementation of the feature

Version:           0.12.2 (using KDE 4.6.2) 
OS:                Linux

In many situations, it is useful to be able to copy and paste tables from PDF files into a spreadsheet for further processing. At NICTA, we have developed this functionality for Okular and wish to contribute it to the project.

Usage: from the menu, select Tools | Table Selection Tool (or use the shortcut Ctrl-5). Use the mouse to select the whole table (in a manner similar to the existing Selection Tool), then click near the edges of the table to add and remove row and column dividers; the table is automatically copied to the clipboard after each change. When ready, paste into another document (spreadsheet, word processor, etc). Press C to clear the selection.

Notes:
* There seem to be several different places where credit is recorded (AUTHORS, aboutdata.h, docbook); I'm not sure whether, where and in what form it would be appropriate to insert credit for this feature.
* I think I've pretty much omitted any DRM checks.

Could I have a code review and/or instructions how to best go ahead with submitting this feature, please? (via patches or git or other mechanism?)

Reproducible: Always


Actual Results:  
Copying and pasting tables from PDF files into a spreadsheet for further processing is tedious and error-prone.

Expected Results:  
It is easy to reliably copy and paste tables from PDF files into a spreadsheet for further processing.
Comment 1 Albert Astals Cid 2011-08-11 08:48:58 UTC
Aboutdata is probably the easiest place, you can also add your name to the headers if you feel like it.

Ignoring the drm is a no go, please do what the rest of the code does.

If you prefer we have git.reviewboard.kde.org that sometimes helps a bit with patch reviewing.

Also i'd prefer if you tried to fix the problem you have with Esc.

Finally i'm not sure this is not a extra specialized feature and that it really makes sense to add to okular itself. Can you please describe the use case?
Comment 2 sabik 2011-08-11 11:38:08 UTC
Hello,

Albert Astals Cid:
> Aboutdata is probably the easiest place, you can also add
> your name to the headers if you feel like it.

Thanks.

> Ignoring the drm is a no go, please do what the rest of the code does.

OK, I'll do that. I guess that always was going to go in pretty soon, so 
better do it now.

> If you prefer we have git.reviewboard.kde.org that sometimes helps a bit
> with patch reviewing.

Thanks, I may post it there.

> Also i'd prefer if you tried to fix the problem you have with Esc.

Do you have any suggestions for that, please? As you can see, I've got 
Qt::Key_Escape and  Qt::Key_C doing the same functionality, but for some 
reason the Esc doesn't seem to get there.

> Finally i'm not sure this is not a extra specialized feature and that it
> really makes sense to add to okular itself. Can you please describe the
> use case?

I imagine it's a fairly common requirement...? Many documents contain tables, 
and being able to copy'n'paste them while preserving the row/column structure 
will probably be useful in many circumstances.

Bugs 168953 and 212458 may be somewhat related - they're asking for different 
features, but the table selection tool should be able to help at least for the 
copy'n'paste (not so much for the highlighting).

I'll put together additional rationale and post it on the bug.


Jiri
Comment 3 sabik 2011-08-12 09:22:59 UTC
Albert Astals Cid:
> Finally i'm not sure this is not a extra specialized feature and that it
> really makes sense to add to okular itself. Can you please describe the
> use case?

Many documents contain tables, and often it is the tables that present the key 
information, whether it be a table of scores or results, a budget or financial 
report, list of participants, prices, points and counter-points in prose, or 
other information. As the Wikipedia puts it, "[T]he use of tables is pervasive 
throughout all communication, research and data analysis."

With this pervasiveness comes a requirement to manipulate these tables. 

Currently, Okular can select consecutive text or a rectangle, but it has no 
capability for selecting a table as a table. Work-arounds include selecting 
the whole table as text and then relying on the text-parsing of the 
spreadsheet program to recover the structure; selecting the table cell by 
cell; or re-typing the information. The first fails unless each cell contains 
exactly one easily-parsed item (no blank cells, no cells with two words such 
as "not applicable"), the second and third become increasingly tedious and 
error-prone as the number of cells rises.

The proposed feature will allow users to select and copy'n'paste a table as a 
single operation, preserving the row/column structure regardless of blank 
cells, multiple words in a cell or even multiple lines or a whole paragraph 
within each cell.


The feature may also help with the use cases for bugs 168953 and 212458; 
they're asking for a different approach, but the table selection tool should 
be able to help at least for the copy'n'paste - the whole page can be selected 
as a table with a single row and two columns and pasted into another document 
in that format.

Jiri
Comment 4 sabik 2011-08-18 05:38:46 UTC
Created attachment 62924 [details]
improved implementation of the feature
Comment 5 sabik 2011-08-18 05:40:20 UTC
Albert Astals Cid:
> Aboutdata is probably the easiest place, you can also add
> your name to the headers if you feel like it.

Done - thank you.

> Ignoring the drm is a no go, please do what the rest of the code does.

Done, I think - I'm afraid I don't have any DRM'd document to test on.

> If you prefer we have git.reviewboard.kde.org that sometimes helps a bit
> with patch reviewing.

Done, https://git.reviewboard.kde.org/r/102358/

> Also i'd prefer if you tried to fix the problem you have with Esc.

Done. Esc was the shortcut for closing the FindBar, which was active 
regardless of whether or not the FindBar is visible. I've fixed FindBar to 
only have the shortcut when it's shown, not when it's hidden.


Jiri
Comment 6 sabik 2011-08-29 12:11:16 UTC
Created attachment 63211 [details]
improved #2 implementation of the feature

improved based on the review on https://git.reviewboard.kde.org/r/102358/ except the zoom behaviour which is still TODO
Comment 7 sabik 2011-09-23 05:07:18 UTC
Created attachment 63877 [details]
improved #3 implementation of the feature

improved based on the review on https://git.reviewboard.kde.org/r/102358/
part II — deal correctly with zooming (and also view mode changes)

Also took the opportunity to draw the dividers in variant of the selection colour (rather than in red) and on the correct layer (rather than on top).
Comment 8 sabik 2011-09-23 05:17:00 UTC
BTW, the rectExtractText function is now only used in one place, so my factoring it out may or may not still make sense...
Comment 9 Albert Astals Cid 2011-10-12 14:35:05 UTC
Git commit f81a49fafaa6bf0991b72280f8704fe254f2d909 by Albert Astals Cid, on behalf of Jiri Baum.
Committed on 12/10/2011 at 15:50.
Pushed by aacid into branch 'master'.

table selection tool

BUGS: 279859
REVIEW: 102358
FIXED-IN: 4.8.0

M  +1    -0    AUTHORS
M  +3    -1    aboutdata.h
M  +6    -5    part.cpp
M  +1    -0    part.h
M  +1    -0    part.rc
M  +405  -32   ui/pageview.cpp
M  +6    -1    ui/pageview.h

http://commits.kde.org/okular/f81a49fafaa6bf0991b72280f8704fe254f2d909
Comment 10 Fabio Correa 2012-09-07 13:23:42 UTC
Sabik, I write to commend you for this excellent feature; this really stands out for its creativity and ease of use. This is really helpful to fix the layout of copied documents.

Thanks for this fine contribution to Okular.

Sincerely,

user.