Created attachment 62739 [details] implementation of the feature Version: 0.12.2 (using KDE 4.6.2) OS: Linux In many situations, it is useful to be able to copy and paste tables from PDF files into a spreadsheet for further processing. At NICTA, we have developed this functionality for Okular and wish to contribute it to the project. Usage: from the menu, select Tools | Table Selection Tool (or use the shortcut Ctrl-5). Use the mouse to select the whole table (in a manner similar to the existing Selection Tool), then click near the edges of the table to add and remove row and column dividers; the table is automatically copied to the clipboard after each change. When ready, paste into another document (spreadsheet, word processor, etc). Press C to clear the selection. Notes: * There seem to be several different places where credit is recorded (AUTHORS, aboutdata.h, docbook); I'm not sure whether, where and in what form it would be appropriate to insert credit for this feature. * I think I've pretty much omitted any DRM checks. Could I have a code review and/or instructions how to best go ahead with submitting this feature, please? (via patches or git or other mechanism?) Reproducible: Always Actual Results: Copying and pasting tables from PDF files into a spreadsheet for further processing is tedious and error-prone. Expected Results: It is easy to reliably copy and paste tables from PDF files into a spreadsheet for further processing.
Aboutdata is probably the easiest place, you can also add your name to the headers if you feel like it. Ignoring the drm is a no go, please do what the rest of the code does. If you prefer we have git.reviewboard.kde.org that sometimes helps a bit with patch reviewing. Also i'd prefer if you tried to fix the problem you have with Esc. Finally i'm not sure this is not a extra specialized feature and that it really makes sense to add to okular itself. Can you please describe the use case?
Hello, Albert Astals Cid: > Aboutdata is probably the easiest place, you can also add > your name to the headers if you feel like it. Thanks. > Ignoring the drm is a no go, please do what the rest of the code does. OK, I'll do that. I guess that always was going to go in pretty soon, so better do it now. > If you prefer we have git.reviewboard.kde.org that sometimes helps a bit > with patch reviewing. Thanks, I may post it there. > Also i'd prefer if you tried to fix the problem you have with Esc. Do you have any suggestions for that, please? As you can see, I've got Qt::Key_Escape and Qt::Key_C doing the same functionality, but for some reason the Esc doesn't seem to get there. > Finally i'm not sure this is not a extra specialized feature and that it > really makes sense to add to okular itself. Can you please describe the > use case? I imagine it's a fairly common requirement...? Many documents contain tables, and being able to copy'n'paste them while preserving the row/column structure will probably be useful in many circumstances. Bugs 168953 and 212458 may be somewhat related - they're asking for different features, but the table selection tool should be able to help at least for the copy'n'paste (not so much for the highlighting). I'll put together additional rationale and post it on the bug. Jiri
Albert Astals Cid: > Finally i'm not sure this is not a extra specialized feature and that it > really makes sense to add to okular itself. Can you please describe the > use case? Many documents contain tables, and often it is the tables that present the key information, whether it be a table of scores or results, a budget or financial report, list of participants, prices, points and counter-points in prose, or other information. As the Wikipedia puts it, "[T]he use of tables is pervasive throughout all communication, research and data analysis." With this pervasiveness comes a requirement to manipulate these tables. Currently, Okular can select consecutive text or a rectangle, but it has no capability for selecting a table as a table. Work-arounds include selecting the whole table as text and then relying on the text-parsing of the spreadsheet program to recover the structure; selecting the table cell by cell; or re-typing the information. The first fails unless each cell contains exactly one easily-parsed item (no blank cells, no cells with two words such as "not applicable"), the second and third become increasingly tedious and error-prone as the number of cells rises. The proposed feature will allow users to select and copy'n'paste a table as a single operation, preserving the row/column structure regardless of blank cells, multiple words in a cell or even multiple lines or a whole paragraph within each cell. The feature may also help with the use cases for bugs 168953 and 212458; they're asking for a different approach, but the table selection tool should be able to help at least for the copy'n'paste - the whole page can be selected as a table with a single row and two columns and pasted into another document in that format. Jiri
Created attachment 62924 [details] improved implementation of the feature
Albert Astals Cid: > Aboutdata is probably the easiest place, you can also add > your name to the headers if you feel like it. Done - thank you. > Ignoring the drm is a no go, please do what the rest of the code does. Done, I think - I'm afraid I don't have any DRM'd document to test on. > If you prefer we have git.reviewboard.kde.org that sometimes helps a bit > with patch reviewing. Done, https://git.reviewboard.kde.org/r/102358/ > Also i'd prefer if you tried to fix the problem you have with Esc. Done. Esc was the shortcut for closing the FindBar, which was active regardless of whether or not the FindBar is visible. I've fixed FindBar to only have the shortcut when it's shown, not when it's hidden. Jiri
Created attachment 63211 [details] improved #2 implementation of the feature improved based on the review on https://git.reviewboard.kde.org/r/102358/ except the zoom behaviour which is still TODO
Created attachment 63877 [details] improved #3 implementation of the feature improved based on the review on https://git.reviewboard.kde.org/r/102358/ part II — deal correctly with zooming (and also view mode changes) Also took the opportunity to draw the dividers in variant of the selection colour (rather than in red) and on the correct layer (rather than on top).
BTW, the rectExtractText function is now only used in one place, so my factoring it out may or may not still make sense...
Git commit f81a49fafaa6bf0991b72280f8704fe254f2d909 by Albert Astals Cid, on behalf of Jiri Baum. Committed on 12/10/2011 at 15:50. Pushed by aacid into branch 'master'. table selection tool BUGS: 279859 REVIEW: 102358 FIXED-IN: 4.8.0 M +1 -0 AUTHORS M +3 -1 aboutdata.h M +6 -5 part.cpp M +1 -0 part.h M +1 -0 part.rc M +405 -32 ui/pageview.cpp M +6 -1 ui/pageview.h http://commits.kde.org/okular/f81a49fafaa6bf0991b72280f8704fe254f2d909
Sabik, I write to commend you for this excellent feature; this really stands out for its creativity and ease of use. This is really helpful to fix the layout of copied documents. Thanks for this fine contribution to Okular. Sincerely, user.