Bug 442280 - Okular does not take /UserUnit into account (page size incorrect for certain files)
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 21.08.1
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-10 19:52 UTC by geisserml
Modified: 2021-10-06 12:42 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
userunit_10.pdf (342.89 KB, application/pdf)
2021-09-10 19:52 UTC, geisserml
userunit_screenshot (61.13 KB, image/png)
2021-09-10 20:08 UTC, geisserml
Proportions pdf (338.86 KB, application/pdf)
2021-09-10 20:15 UTC, geisserml
adobe_reader (101.04 KB, image/png)
2021-09-10 20:41 UTC, geisserml
(unrelated) okular-mupdf-backend build error (6.41 KB, text/plain)
2021-10-06 11:30 UTC, geisserml

Description geisserml 2021-09-10 19:52:45 UTC
Created attachment 141453
userunit_10.pdf

SUMMARY
[PDF background]
In the PDF format, coordinates are given in PDF points, where by default 1 point is equivalent to 1/72 of an inch (1in -> 2.54cm). However, PDFs can define custom units on a per-page basis, using the /UserUnit key.
/UserUnit is a float or decimal that scales the default conversion fraction of 1/72, so for a /UserUnit of 10, 1pt would mean 10/72in.
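The conversion can be written out as a small Python helper. This is only a sketch for illustration (the name `points_to_mm` is hypothetical), using the 1/72in default and 25.4mm per inch; a missing /UserUnit is treated as 1.0, which is the spec's default.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72


def points_to_mm(points: float, user_unit: float = 1.0) -> float:
    """Convert a length in PDF units to millimetres.

    One unit is user_unit/72 inch; user_unit=1.0 corresponds to a page
    without a /UserUnit entry.
    """
    inches = points * user_unit / POINTS_PER_INCH
    return inches * MM_PER_INCH


print(round(points_to_mm(1785.6), 2))      # -> 629.92 (default unit)
print(round(points_to_mm(1785.6, 10), 2))  # -> 6299.2 (/UserUnit 10)
```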

[What Okular does]
It seems that Okular (like many other open-source PDF software) does not take /UserUnit into account for the displayed page size.
The attached test document `userunit_10.pdf` defines a /UserUnit of 10.
The document's /MediaBox looks like this:
```python3
[ Decimal('0.0'), Decimal('0.0'), Decimal('1785.6'), Decimal('1785.6') ]
```
With the default conversion (1pt = 1/72in), this gives 630x630mm, which is what Okular displays. However, this is incorrect: the real size is 6300x6300mm, 10 times larger!
(Notably, Adobe Illustrator and possibly other PDF software use /UserUnit to circumvent the maximum page size of 14400pt (200in) imposed by Adobe Reader and some other PDF renderers.)
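The arithmetic behind this workaround, as a small Python sketch: the number stored in the file stays below the 14400pt limit while the effective page size exceeds it. The helper names are hypothetical, and `min_user_unit` only illustrates how a writer could pick a unit; it is not a description of what Illustrator actually does.

```python
PAGE_LIMIT_PT = 14400  # 14400 / 72 = 200 inches


def effective_points(stored: float, user_unit: float = 1.0) -> float:
    """Length in default 1/72in points after applying /UserUnit."""
    return stored * user_unit


def min_user_unit(effective_pt: float, limit: float = PAGE_LIMIT_PT) -> float:
    """Smallest /UserUnit that keeps the stored value within the limit."""
    return max(1.0, effective_pt / limit)


stored = 1785.6  # value written into the /MediaBox of userunit_10.pdf
print(stored <= PAGE_LIMIT_PT)                       # True: the stored number is small
print(effective_points(stored, 10) > PAGE_LIMIT_PT)  # True: the real page is not
print(min_user_unit(17856))                          # about 1.24
```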

STEPS TO REPRODUCE
1. Open the attached file in Okular
2. Go to File -> Properties
3. See the displayed page size
4. Inspect the document with the pikepdf Python library, or any other PDF library of your choice
5. Print the /MediaBox and /UserUnit of page 0

OBSERVED RESULT
Displayed page size is too small by factor 10.

EXPECTED RESULT
Displayed page size should always reflect the real page size and take /UserUnit into account.

Operating System: KDE neon 5.22
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.85.0
Qt Version: 5.15.3
Kernel Version: 5.11.0-34-generic (64-bit)
Graphics Platform: Wayland
Comment 1 geisserml 2021-09-10 20:00:00 UTC
Python shell code to reproduce (replace TestFiles.userunit_10 with the path string where you saved the file, and skip the first import which depends on custom test infrastructure of the lib I am developing):

```python3
Python 3.8.10 (default, Jun  2 2021, 10:49:15) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tests_pdfnodegraph.testfiles import TestFiles
>>> import pikepdf
>>> pdf = pikepdf.Pdf.open(TestFiles.userunit_10)
>>> page = pdf.pages[0]
>>> page.MediaBox
pikepdf.Array([ Decimal('0.0'), Decimal('0.0'), Decimal('1785.6'), Decimal('1785.6') ])
>>> page.UserUnit
Decimal('10.0')
>>> 1785.6 * 1/72 * 25.4
629.9199999999998
>>> 1785.6*10 * 1/72 * 25.4
6299.2
```
Comment 2 geisserml 2021-09-10 20:08:02 UTC
Created attachment 141454
userunit_screenshot
Comment 3 geisserml 2021-09-10 20:08:56 UTC
To clarify, I think it is not only the displayed size number that is incorrect, but also the space reserved for rendering the actual page:
The screenshot I just added illustrates it better: the first page is from the userunit_10 file. The other two pages are ANSI A and A4 size, both very roughly 200mm wide. Put three of the smaller pages next to each other and they approximately match the width of the larger page, although in fact the larger page should be much wider: roughly thirty times the width of a smaller page!
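The "three times versus thirty times" estimate follows directly from the numbers in the description. A quick check in Python, comparing against the A4 width of 210mm (the constant names are just for illustration):

```python
MM_PER_INCH = 25.4
A4_WIDTH_MM = 210.0      # ANSI A (Letter) is 215.9 mm wide
STORED_WIDTH = 1785.6    # /MediaBox width of userunit_10.pdf
USER_UNIT = 10

width_ignoring_unit = STORED_WIDTH / 72 * MM_PER_INCH          # what Okular shows
width_with_unit = STORED_WIDTH * USER_UNIT / 72 * MM_PER_INCH  # real width

print(round(width_ignoring_unit / A4_WIDTH_MM, 1))  # -> 3.0
print(round(width_with_unit / A4_WIDTH_MM, 1))      # -> 30.0
```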
Comment 4 geisserml 2021-09-10 20:10:13 UTC
> the space reserved for rendering the actual page
or, better put: the proportions of the pages relative to each other
Comment 5 geisserml 2021-09-10 20:15:45 UTC
Created attachment 141455
Proportions pdf

So you can confirm that /UserUnit is set on the first page of the document in the screenshot, but not on the other pages:

```python3
>>> from tests_pdfnodegraph.pathtools import TestOutput
>>> pdf = pikepdf.open(join(TestOutput,'out_14.pdf'))
>>> page = pdf.pages[0]
>>> page.UserUnit
Decimal('10.0')
>>> page_2 = pdf.pages[1]
>>> page_2.UserUnit
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/manuel/.local/lib/python3.8/site-packages/pikepdf/_methods.py", line 1143, in __getattr__
    return getattr(self.obj, name)
AttributeError: /UserUnit
>>> 
```
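The AttributeError above only means that the key is absent, in which case the PDF spec defines /UserUnit as 1.0. A sketch of a lookup that applies that default; `user_unit_of` is a hypothetical helper that works on any object with a dict-style `get`. With pikepdf, one could probably pass `page.obj`, which as far as I know supports `get` with a default, but that is an assumption about the pikepdf version in use.

```python
def user_unit_of(page_dict) -> float:
    """Return the page's /UserUnit, or the spec default 1.0 if absent."""
    return float(page_dict.get("/UserUnit", 1.0))


# Plain dicts standing in for the page dictionaries of the proportions PDF.
pages = [
    {"/MediaBox": [0, 0, 1785.6, 1785.6], "/UserUnit": 10},
    {"/MediaBox": [0, 0, 595, 842]},  # A4 page, no /UserUnit
]
for i, page in enumerate(pages):
    print(i, user_unit_of(page))
```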
Comment 6 geisserml 2021-09-10 20:36:19 UTC
List of other affected PDF software:
* Chromium integrated PDF viewer (uses PDFium)
* Firefox integrated PDF viewer (uses pdf.js)
* Inkscape PDF importer (uses Poppler)
* Scribus PDF importer
* PDFStitcher (uses pikepdf)
* PDF Arranger (uses pikepdf)
* even the proprietary Master PDF Editor 4 and 5

Probably more ...
Comment 7 geisserml 2021-09-10 20:41:58 UTC
Created attachment 141456
adobe_reader

... only Adobe Reader gets the proportions right
Comment 8 Albert Astals Cid 2021-09-10 21:31:47 UTC
Do *not* add me to bugs.

I don't understand what makes you think that is normal behaviour, but it's not; you're only making me ignore you.
Comment 9 geisserml 2021-09-10 21:40:33 UTC
Sorry. I just thought you were the maintainer of Okular and wondered why you weren't in the CC list, but apparently there is a reason for that. Really, sorry.
Comment 10 Oliver Sander 2021-09-11 04:27:28 UTC
Can you reproduce the problem using one of the poppler command line tools like pdfinfo or pdftocairo?  It may be a poppler bug.
Comment 11 geisserml 2021-09-11 10:00:40 UTC
pdfinfo from poppler-utils does not show regular units like centimetres or inches; it reports sizes in PDF points. It is a low-level tool that does not perform unit conversion on its own. However, it does not display the /UserUnit value either, so it is arguably incomplete in that it withholds information.

So to judge who is at fault, it would help to know how Okular obtains the displayed page size. Does it inspect CropBox/MediaBox and convert to units itself, or does it retrieve finished unit values from Poppler? In the first case, the bug would be in Okular; in the second, in Poppler.
Comment 12 geisserml 2021-09-11 10:07:11 UTC
I've searched the code a bit, and I think at least the rendering-proportions issue is Okular's fault: https://github.com/KDE/okular/blob/3a513f34b8bbba87bd96718dc96089e079578d55/generators/poppler/generator_pdf.cpp#L721
Comment 14 David Hurka 2021-09-11 10:10:30 UTC
(In reply to Oliver Sander from comment #10)
> Can you reproduce the problem using one of the poppler command line tools
> like pdfinfo or pdftocairo?  It may be a poppler bug.

`pdfinfo userunit_10.pdf` reports `Page size: 1785.6 x 1785.6 pts`

(In reply to Manuel Geißer from comment #6)
> List of other affected PDF software:
> * Chromium integrated PDF viewer (uses PDFium)
> * Firefox integrated PDF viewer (uses pdf.js)
> * Inkscape PDF importer (uses Poppler)
> * Scribus PDF importer
> * PDFStitcher (uses pikepdf)
> * PDF Arranger (uses pikepdf)
> * even the proprietary Master PDF Editor 4 and 5
> 
> Probably more ...
I think you should report at PDFium, pdf.js, Poppler, and pikepdf.

Poppler is here: https://gitlab.freedesktop.org/poppler/poppler/issues
It is the library used by Okular.

There is also a muPDF backend for Okular. Did you try that? `mutool info userunit_10.pdf` reports `[ 0 0 17856 17856 ]`.
Comment 15 geisserml 2021-09-11 10:35:31 UTC
From the referenced code we can see that Okular uses the Poppler::Page::pageSizeF() function to obtain the page size:
https://poppler.freedesktop.org/api/qt5/classPoppler_1_1Page.html#a598c287971839a113552176fc387ab30
This function is based on CropBox and returns points.

What about the following solution:
- the pageSize() and pageSizeF() functions should be changed to take /UserUnit into account, as the docs suggest the returned value is always given in 1/72in units
- Additionally, there should be some way to obtain the /UserUnit value through Poppler. I couldn't find any such option in the documentation, though I only skimmed it.
Comment 16 geisserml 2021-09-11 10:42:18 UTC
> I think you should report at PDFium, pdf.js, Poppler, and pikepdf.
Be careful: there are considerable differences between these libraries. I don't really know about pdf.js and PDFium, but pikepdf is quite low-level and does not provide a function to obtain the page size on its own; this needs to be done downstream using CropBox/MediaBox, UserUnit, and Rotate.
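A minimal sketch of that downstream calculation, assuming the boxes have already been read as four numbers each. `page_size_pt` is a hypothetical name, not part of pikepdf, and the sketch ignores the spec's rule that the CropBox is clipped to the MediaBox.

```python
from typing import Optional, Sequence, Tuple

Box = Sequence[float]  # [x0, y0, x1, y1] in PDF units


def page_size_pt(
    mediabox: Box,
    cropbox: Optional[Box] = None,
    user_unit: float = 1.0,
    rotate: int = 0,
) -> Tuple[float, float]:
    """Displayed page size in default 1/72in points.

    Uses the CropBox if given, otherwise the MediaBox (the CropBox
    defaults to the MediaBox), scales both sides by /UserUnit, and swaps
    width and height for /Rotate values of 90 or 270 degrees.
    """
    box = cropbox if cropbox is not None else mediabox
    x0, y0, x1, y1 = (float(v) for v in box)
    width = abs(x1 - x0) * user_unit
    height = abs(y1 - y0) * user_unit
    if rotate % 180 == 90:  # also handles negative values such as -90
        width, height = height, width
    return width, height


print(page_size_pt([0, 0, 1785.6, 1785.6], user_unit=10))
print(page_size_pt([0, 0, 612, 792], rotate=90))  # -> (792.0, 612.0)
```

Multiplying the first result by 25.4/72 gives roughly 6299.2mm per side, matching the expected size from the description.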

> There is also a muPDF backend for Okular. Did you try that? `mutool info userunit_10.pdf` reports `[ 0 0 17856 17856 ]`.
Yes, I am aware that MuPDF directly takes /UserUnit into account. I noticed this during the tests for my lib (which also has a (Py)MuPDF rendering backend).
How do I obtain the MuPDF backend for Okular, though? Is it possible that KDE Neon does not provide it? (I already have the okular-extra-backends package installed...)
Comment 17 geisserml 2021-09-11 10:44:30 UTC
> I think you should report at PDFium, pdf.js, Poppler, and pikepdf.
I think it might be better if the Okular developers would report to Poppler, since I never used the Poppler library interface myself and thus don't have the required background.
Comment 18 geisserml 2021-09-13 11:54:25 UTC
> I think it might be better if the Okular developers would report to Poppler, 
> since I never used the Poppler library interface myself and thus don't have the 
> required background.
I have now filed an issue with Poppler anyway, since nobody else seemed to feel responsible for doing so. The report essentially just references this thread, as it should contain all relevant information.
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1139

@OkularDevelopers: Please verify/confirm whether changing pageSize() and pageSizeF() would really be sufficient to fix the UserUnit issue.
Comment 19 geisserml 2021-09-16 17:17:37 UTC
> There is also a muPDF backend for Okular. Did you try that?
The Ubuntu Focal mupdf package currently fails to open the file (https://bugs.launchpad.net/ubuntu/+source/mupdf/+bug/1943366). This is likely fixed in newer versions of MuPDF, or it affects only the MuPDF GUI, though.
Comment 20 geisserml 2021-10-05 13:49:53 UTC
> There is also a muPDF backend for Okular. Did you try that?
Is this still current at all? I checked out Okular from https://invent.kde.org/graphics/okular.git and built it with CMake, but couldn't find any trace of a MuPDF backend.

`ls generators/` only shows
```
chm  CMakeLists.txt  comicbook  djvu  dvi  epub  fax  fictionbook  kimgio  markdown  mobipocket  plucker  poppler  spectre  tiff  txt  xps
```
Comment 21 geisserml 2021-10-05 13:50:44 UTC
`grep -r mupdf` on the Okular source tree doesn't find anything, either
Comment 22 David Hurka 2021-10-05 20:44:27 UTC
No, the muPDF backend is not part of Okular; it is an independent project. Just search the internet for okular-backend-mupdf or okular-mupdf-backend.
Comment 23 geisserml 2021-10-06 11:14:49 UTC
I guess you are referring to https://invent.kde.org/sandsmark/okular-mupdf-backend ? The thing is, there are multiple unofficial Okular MuPDF generators around...
Moreover, why is this not officially part of Okular and not packaged in Debian, Ubuntu, and KDE Neon?
Comment 24 geisserml 2021-10-06 11:28:42 UTC
So I installed the dependencies and tried to build okular-mupdf-backend (from git master), but it fails with a "Variable not declared in this scope" error. Also, there have been no commits to the repository for a year. Is this backend still functional?
Comment 25 geisserml 2021-10-06 11:30:17 UTC
Created attachment 142205
(unrelated) okular-mupdf-backend build error
Comment 26 Oliver Sander 2021-10-06 12:41:06 UTC
Can you guys please move the mupdf discussion elsewhere?  While it is certainly interesting, it is only tangentially related to this bug.
Comment 27 geisserml 2021-10-06 12:42:18 UTC
Sure.