Bug 335133 - regression: margins not trimmed on scanned documents, works on non-image files
Summary: regression: margins not trimmed on scanned documents, works on non-image files
Status: RESOLVED NOT A BUG
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 0.18.4
Platform: Debian unstable Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
 
Reported: 2014-05-21 11:42 UTC by Japs
Modified: 2014-06-01 21:59 UTC



Attachments
screenshot of articles not being trimmed. (702.12 KB, image/png)
2014-05-21 11:46 UTC, Japs
Screen capture of the rendering of the file (611.66 KB, image/png)
2014-05-24 11:38 UTC, Albert Astals Cid

Description Japs 2014-05-21 11:42:16 UTC
The trim margins function doesn't trim the margins of scanned documents (e.g. old scientific articles). It works normally on recent documents that were generated electronically rather than scanned.

The trim margins function used to work on scanned articles as well. It most likely stopped working after the following Debian upgrade (aptitude log):
aptitude.1:[UPGRADE] libokularcore3:amd64 4:4.11.5-1 -> 4:4.12.4-1
aptitude.1:[UPGRADE] okular:amd64 4:4.11.5-1 -> 4:4.12.4-1
aptitude.1:[UPGRADE] okular-extra-backends:amd64 4:4.11.5-1 -> 4:4.12.4-1



Reproducible: Always

Steps to Reproduce:
1. Open a PDF scan of an old article, for instance any Physical Review Letters paper from the 70s
2. Try to trim margins on it.
Actual Results:  
Margins are not trimmed

Expected Results:  
Margins are trimmed
Comment 1 Japs 2014-05-21 11:46:42 UTC
Created attachment 86744
screenshot of articles not being trimmed.

The articles displayed on the left and right sides are:
L: Phys. Rev. Lett. 23 1430 (a scanned document)
R: Phys. Rev. B 88 104509 (electronically generated)

In the top row, trim margins is off; in the bottom row, it is on.
In previous versions of Okular, the trim margins function worked on both.
Comment 2 Albert Astals Cid 2014-05-23 18:17:59 UTC
I very much doubt this is a regression; the trimming code hasn't changed for ages.

Anyway, if you don't attach a file that shows the problem, it's impossible for us to fix anything.
Comment 3 Japs 2014-05-23 21:54:40 UTC
(In reply to comment #2)
> Anyway if you don't attach a file with the problem it's impossible for us to
> fix anything

I didn't attach one because most scientific articles are paywalled.
This one has been released as open access and can be downloaded without a subscription:
http://journals.aps.org/pr/pdf/10.1103/PhysRev.108.1175
Comment 4 Albert Astals Cid 2014-05-24 11:36:51 UTC
The file has almost imperceptible whitish lines scattered all over the document (probably caused by scanning). Since we only crop out truly white backgrounds, we don't trim this document. As I said, this algorithm hasn't changed recently, so this can't be a regression.

If you'd like the trim algorithm to be smarter and ignore these lines (which is very hard), feel free to open a bug/wish about it.
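
To illustrate why a few faint pixels are enough to defeat the trim, here is a minimal sketch of a bounding-box search that treats only exactly white pixels as background. It is not Okular's code: the `Rgb`, `Image` and `Box` types and the `strictBoundingBox` function are hypothetical names invented for this example, and it uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image types for illustration; not Okular or Qt types.
struct Rgb { std::uint8_t r, g, b; };

struct Image {
    int width;
    int height;
    std::vector<Rgb> pixels; // row-major, width * height entries

    const Rgb &at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                      + static_cast<std::size_t>(x)];
    }
};

// Inclusive pixel coordinates of the content area; empty is true if the
// whole image is background.
struct Box {
    int left, top, right, bottom;
    bool empty;
};

// Strict rule: a pixel counts as background only if it is exactly white.
bool isPureWhite(const Rgb &p) {
    return p.r == 255 && p.g == 255 && p.b == 255;
}

// Smallest box that contains every pixel that is not exactly white.
Box strictBoundingBox(const Image &img) {
    Box box{img.width, img.height, -1, -1, true};
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (isPureWhite(img.at(x, y)))
                continue;
            box.empty = false;
            box.left = std::min(box.left, x);
            box.top = std::min(box.top, y);
            box.right = std::max(box.right, x);
            box.bottom = std::max(box.bottom, y);
        }
    }
    return box;
}
```

With this rule, a single off-white pixel such as (255, 255, 250) near a page corner stretches the box to that corner, so a scan with faint lines all over the page ends up with a box covering the whole page and nothing is trimmed.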
Comment 5 Albert Astals Cid 2014-05-24 11:38:53 UTC
Created attachment 86796 [details]
Screen capture of the rendering of the file

If you open it with KolourPaint and, say, convert it to monochrome, you'll see there's actually non-white stuff all over (or go to Image > More Effects and reduce the gamma completely).
Comment 6 Japs 2014-05-27 22:20:47 UTC
(In reply to comment #5)
> Created attachment 86796 [details]
> Screen capture of the rendering of the file
> 
> If you open it with kolourpaint and say, convert it to monochrome, you'll
> see there's actually non white stuff all over (or go to image more effects
> and reduce the gamma completely)

I see what you mean, especially using kmag as in your screenshot. Indeed, a faint, regular yellow lattice appears.
However, I'm fairly sure that these documents were handled correctly in the past. Do you think that changes in the PDF rendering backend could produce the yellow artifacts?
That could also explain why I cannot see any yellow lines when I open the file in GIMP at 600 dpi.

Also, I'm trying to mess around with Utils::imageBoundingBox(). As a first attempt, I'm replacing the check for a white border with a check for qGray values below a threshold.
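
A rough sketch of that threshold idea, under stated assumptions: this is not the actual `Utils::imageBoundingBox()` code. The `Rgb`, `Image` and `Box` types, the `grayValue` and `thresholdBoundingBox` functions, and the threshold of 240 used in the note below are all hypothetical. `grayValue` uses the same integer weighting as Qt's `qGray()`, namely (11r + 16g + 5b) / 32, so the example needs no Qt headers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image types for illustration; not Okular or Qt types.
struct Rgb { std::uint8_t r, g, b; };

struct Image {
    int width;
    int height;
    std::vector<Rgb> pixels; // row-major, width * height entries

    const Rgb &at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                      + static_cast<std::size_t>(x)];
    }
};

// Inclusive pixel coordinates of the content area; empty is true if no
// pixel is darker than the threshold.
struct Box {
    int left, top, right, bottom;
    bool empty;
};

// Same integer weighting as Qt's qGray(r, g, b).
int grayValue(const Rgb &p) {
    return (p.r * 11 + p.g * 16 + p.b * 5) / 32;
}

// Smallest box containing every pixel whose gray value is below threshold.
// Pixels at or above the threshold are treated as background.
Box thresholdBoundingBox(const Image &img, int threshold) {
    Box box{img.width, img.height, -1, -1, true};
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (grayValue(img.at(x, y)) >= threshold)
                continue;
            box.empty = false;
            box.left = std::min(box.left, x);
            box.top = std::min(box.top, y);
            box.right = std::max(box.right, x);
            box.bottom = std::max(box.bottom, y);
        }
    }
    return box;
}
```

For example, a faint yellow pixel (255, 255, 250) has a gray value of 254, so with an assumed threshold of 240 it counts as background while black text (gray value 0) still counts as content. The trade-off is that any genuinely light content, such as pale figures or watermarks, would be trimmed away too, so the threshold would need tuning on real scans.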
Comment 7 Albert Astals Cid 2014-06-01 21:59:12 UTC
I'm pretty sure that code hasn't changed. Since you seem to know your way around code, feel free to play a bit with the git history and see if you can find a version in which it did "work" with that file.