Bug 343355

Summary:	Allow highlighting by rectangle
Product:	[Applications] okular	Reporter:	ned <naught101>
Component:	general	Assignee:	Okular developers <okular-devel>
Status:	RESOLVED DUPLICATE
Severity:	wishlist	CC:	aacid
Priority:	NOR
Version First Reported In:	0.19.3
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description ned 2015-01-27 00:20:00 UTC

The highlighter review tool is really useful, because it allows you to extract quotes easily with other tools (e.g. Zotero/ZotFile).

The problem is that PDFs (especially OCR'd PDFs) are generally bad at identifying text flow. So if you have a PDF with two columns of text and you want to highlight a multi-line section from one column, the selection often spans both columns. This makes for a fairly useless annotation. You can highlight manually, line by line, but then you end up with lots of separate annotations that require a lot of editing.

What would be good is a box highlighter that acts the same as the highlighter but is drawn as a rectangle and highlights all of the words inside the rectangle.
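
To make the request concrete, here is a minimal sketch of the selection rule such a tool might use. It is not Okular code: the `Word` type, the `words_in_rect` function, and the coordinates are hypothetical, and it assumes screen-style coordinates (y grows downward) and that words on the same line share the same top edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical word record: text plus its bounding box on the page."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def words_in_rect(words, rect):
    """Return the text of all words whose centre lies inside rect.

    rect is (x0, y0, x1, y1). Words are joined in a simple reading order:
    top to bottom, then left to right.
    """
    rx0, ry0, rx1, ry1 = rect
    hits = []
    for w in words:
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
            hits.append(w)
    hits.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in hits)


# Two lines of a two-column page; the right column starts at x = 130.
page_words = [
    Word("alpha", 10, 10, 40, 20), Word("beta", 45, 10, 90, 20),
    Word("delta", 130, 10, 170, 20),
    Word("gamma", 10, 25, 50, 35),
    Word("epsilon", 130, 25, 180, 35),
]

# A rectangle drawn around the left column only.
left_column = words_in_rect(page_words, (0, 0, 100, 100))
```

Because the rectangle is tested against word positions rather than the document's text flow, the right-hand column never enters the result, which is the behaviour the request is after.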

Of course, that doesn't help when your quotes are split over two columns, but that's a fairly small edge-case, and probably can reasonably be ignored.

Reproducible: Always

Comment 1 Albert Astals Cid 2015-01-27 00:24:25 UTC

Why not just use a line with the width you want and some opacity?

Comment 2 ned 2015-01-27 04:17:28 UTC

Because that doesn't let you extract the highlighted text.

Current behaviour:

1. Open a PDF (from Zotero) in Okular.
2. Highlight text with the highlighter.
3. Save as, overwriting the original file.
4. In Zotero, with ZotFile installed, right-click the PDF > Manage attachments > Extract annotations. This creates a note in Zotero that includes the highlighted text and the page number. Note that this text is *not* from a comment added to the highlighter annotation in step 2; it is the original text from the PDF itself.
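
For readers wondering how step 4 can recover text that was never typed into the annotation: PDF highlight annotations carry a `QuadPoints` array (eight numbers per highlighted quadrilateral), and an extractor can collect the page words that fall under those quads. The sketch below illustrates that idea only. It is not ZotFile's actual implementation, the function names and word tuples are hypothetical, and it assumes PDF page coordinates (y grows upward).

```python
def quads_to_rects(quad_points):
    """Convert a flat QuadPoints list into bounding rectangles.

    Each quad is 8 numbers (4 corner points). Taking min/max makes the
    result independent of the order in which the corners were written.
    """
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    rects = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]
        ys = quad_points[i + 1:i + 8:2]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def text_under_highlight(words, quad_points):
    """Join the words whose centre lies under any quad of the highlight.

    words is a list of (text, x0, y0, x1, y1) tuples in page coordinates.
    """
    rects = quads_to_rects(quad_points)
    picked = []
    for text, x0, y0, x1, y1 in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if any(rx0 <= cx <= rx1 and ry0 <= cy <= ry1
               for rx0, ry0, rx1, ry1 in rects):
            picked.append((text, x0, y0))
    # y grows upward in PDF space, so higher y means an earlier line.
    picked.sort(key=lambda w: (-w[2], w[1]))
    return " ".join(w[0] for w in picked)


# Hypothetical page: two lines in the left column, one word in the right.
pdf_words = [
    ("Monthly", 50, 700, 100, 712), ("Weather", 105, 700, 160, 712),
    ("Review", 50, 685, 95, 697),
    ("Index", 300, 700, 340, 712),
]

# One quad per highlighted line, both confined to the left column.
highlight = [
    50, 712, 160, 712, 50, 700, 160, 700,
    50, 697, 95, 697, 50, 685, 95, 685,
]

extracted = text_under_highlight(pdf_words, highlight)
```

This is also why the other review tools cannot be used the same way: a line or rectangle annotation has no quads tied to the text, so there is nothing for an extractor to look up.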

You can't do this with any of the other review tools, because they are just aesthetic - they don't interact with the PDF text. You can manually add text as notes to each of them, but that is a different process, and also a lot more work in some cases.

Note that this example is zotero-specific, but there are other tools that let you extract PDF annotations.

Comment 3 Albert Astals Cid 2015-01-27 23:07:13 UTC

Well, you said it's an image; you cannot extract text from an image (even if the image is text), no?

Comment 4 ned 2015-01-28 02:00:34 UTC

I didn't say it's an image... I am talking about text. OCR'd PDFs usually contain the page image with the recognized text laid underneath it. This means that you can see the image, but you can also highlight the text (or rectangle-select it and copy it).

Comment 5 ned 2015-01-28 02:09:33 UTC

Try opening this PDF: http://docs.lib.noaa.gov/rescue/mwr/097/mwr-097-11-0739.pdf

In Okular, if you try to use the text-selection tool, it is not possible to select, for example, just the first column of the Contents on page 1. You can do it with the Rectangle selection tool. Similarly, you cannot use the highlighter review tool to highlight just the first column. But if you mark the column with the rectangle review tool instead, you can't extract the highlighted text as part of the annotation, as you can with the highlighter tool's annotations.

Perhaps there is a broader bug here. In Firefox, if you open that PDF, you *can* select just the second column of the Contents, although if you try to select the first column, it gets slightly confused and selects a little bit of column 2 too. I guess Firefox is using some kind of text-column detection? Perhaps Okular could implement something like that? If it worked well, it would solve this feature request for me.
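
As an illustration of what "text-column detection" could mean here, the sketch below splits a page into columns wherever an empty vertical strip is wider than a threshold. This is only a guess at the general technique, not the algorithm used by Firefox or Okular. The `split_columns` function, the `min_gap` parameter, and the sample words are hypothetical, and coordinates are screen-style (y grows downward).

```python
def split_columns(words, min_gap=20.0):
    """Group words into columns separated by wide empty vertical strips.

    words is a list of (text, x0, y0, x1, y1) tuples. Returns one string
    per column, left to right, each in top-to-bottom reading order.
    """
    if not words:
        return []

    # Merge the horizontal extents of all words into covered intervals.
    # A new interval starts only where the gap is at least min_gap wide.
    spans = sorted((x0, x1) for _, x0, _, x1, _ in words)
    merged = [list(spans[0])]
    for x0, x1 in spans[1:]:
        if x0 - merged[-1][1] < min_gap:
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])

    # Assign each word to the interval that contains its left edge.
    columns = [[] for _ in merged]
    for word in words:
        x0 = word[1]
        index = next(i for i, (c0, c1) in enumerate(merged) if c0 <= x0 <= c1)
        columns[index].append(word)

    result = []
    for column in columns:
        column.sort(key=lambda w: (w[2], w[1]))
        result.append(" ".join(w[0] for w in column))
    return result


# Hypothetical two-column table of contents.
toc_words = [
    ("A", 10, 10, 40, 20), ("B", 45, 10, 90, 20), ("C", 130, 10, 170, 20),
    ("D", 10, 25, 50, 35), ("E", 130, 25, 180, 35),
]

columns = split_columns(toc_words)
```

A projection like this fails as soon as a full-width heading or a stray OCR fragment bridges the gap between columns, which may be the kind of situation where real detectors pick up "a little bit of column 2".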

Comment 6 Albert Astals Cid 2015-02-03 21:15:21 UTC

We're using text-column detection; it just has some bugs. Honestly, I'd prefer that someone invest time in improving the text-column algorithm rather than implementing another tool whose only purpose is to work around that bug.

What do you think?

Comment 7 Christoph Feck 2015-02-14 22:21:38 UTC

If you can provide the information requested in comment #6, please add it.

Comment 8 ned 2015-02-15 22:55:25 UTC

Yes, I think better column detection would probably solve the use cases I'm thinking of.

Comment 9 Christoph Feck 2015-03-15 19:07:25 UTC

Albert, ... probably this is the bug you wanted to resolve as a duplicate of bug 340637?

Comment 10 Albert Astals Cid 2015-03-16 20:49:34 UTC


*** This bug has been marked as a duplicate of bug 340637 ***