Bug 471792 - Annotations greatly increase PDF file size and massively slow down Okular
Summary: Annotations greatly increase PDF file size and massively slow down Okular
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 22.12.3
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
Reported: 2023-06-30 11:22 UTC by Joe Breuer
Modified: 2023-07-04 06:35 UTC

Description Joe Breuer 2023-06-30 11:22:39 UTC
SUMMARY

Adding annotations to a run-of-the-mill PDF file (out of pdfLaTeX, I presume) hugely increases the file size with each additional annotation.

STEPS TO REPRODUCE
1. Open a PDF file
2. Add a couple of annotations (I mostly used Inline Notes, but also some underlines and Pop-up Notes)
3. Save as PDF

OBSERVED RESULT

Adding very few test annotations to my 6.4 MB sample PDF and saving as PDF yields a 7.4 MB file.

Removing those and adding a couple of 'real' annotations, the saved PDF weighs in at 20.7 MB.

Changing font size of two Inline Note annotations after they have been created yields a huge PDF of 143.4 MB.

With the increase in file size also comes an increase in CPU usage, to the point that Okular slows down and becomes practically unusable (especially and immediately noticeable in the "resized font" case).

EXPECTED RESULT

Annotations should only slightly increase file size and not strongly affect interactive performance.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: gentoo Linux 2.13, Kernel 6.1.31-gentoo (64-bit)
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.106.0
Qt Version: 5.15.9

ADDITIONAL INFORMATION

Since attachments cannot exceed 4000kB, and all my samples are larger than that, I've put together my sample files here:
https://github.com/jmbreuer/heap/tree/main/kde-okular-bug-20230630

I've also seen bug #470128, and also noticed that changing the font size of an Inline Note annotation after adding it would hugely increase the file size (changing two notes, from those 6.4/7.4 MB to 143.4 MB).
Comment 1 Sune Vuorela 2023-06-30 11:38:21 UTC
What version of the underlying Poppler library are you using?

Some versions embedded the same fonts multiple times, and your "big" document contains ~140 copies of Noto fonts.

This should be fixed in version 23.06 of the Poppler library.

With Poppler 23.06, adding annotations should grow the document by approximately 600 kB. With Poppler 23.07, it should grow the document by approximately 300 kB, assuming Noto Sans (the Okular default) is used.
Comment 2 Joe Breuer 2023-06-30 11:57:20 UTC
(In reply to Sune Vuorela from comment #1)
> What version of the underlying Poppler library are you using ?

23.05

> Some versions did embed the same fonts multiple times, and your "big"
> document contains ~140 copies of noto fonts

Ah, thank you! That makes (altogether too much) sense.

> It should be fixed in version 23.06 of the poppler library.
> 
> With poppler 23.06, adding annotations should grow the document with
> approximately 600k. With poppler 23.07, it should grow the document with
> approximately 300 k, assuming noto sans (okular default) is used.

I'll have a look into that very shortly and let you know - 23.06 looks to be available in gentoo 'testing'.
Trying a 23.07 bump is not immediately accessible to me, since there's apparently no release or even git tag yet. Not sure that I want to run git master on a production system.
Comment 3 Joe Breuer 2023-07-02 10:22:08 UTC
(In reply to Sune Vuorela from comment #1)
> Some versions did embed the same fonts multiple times, and your "big"
> document contains ~140 copies of noto fonts

What method/tool do you use to obtain this information?

> It should be fixed in version 23.06 of the poppler library.

Simply upgrading my Poppler library to 23.06 still gives me PDFs that increase significantly in size with each additional annotation, though by far not as much as with 23.05.

I've added a corresponding sample: https://github.com/jmbreuer/heap/tree/main/kde-okular-bug-20230630

In "About Backend", my Okular 22.12.3 now displays "Using Poppler 23.06.0, Built against Poppler 23.01.0".
Building this version of Okular against Poppler 23.06.0 yields a compile error; I'll see about trying out a newer Okular version.

> With poppler 23.06, adding annotations should grow the document with
> approximately 600k. With poppler 23.07, it should grow the document with
> approximately 300 k, assuming noto sans (okular default) is used.

Is that to be expected "per each annotation", or once per adding annotations at all, or per annotation type, ...?
Comment 4 Sune Vuorela 2023-07-03 07:09:14 UTC
(In reply to Joe Breuer from comment #3)
> (In reply to Sune Vuorela from comment #1)
> > Some versions did embed the same fonts multiple times, and your "big"
> > document contains ~140 copies of noto fonts
> 
> What method/tool do you use to obtain this information?

The Poppler source package (often shipped in a binary package such as poppler-utils or poppler-tools) contains the pdffonts tool, which lists all of the fonts in the PDF file.
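
For anyone wanting to spot duplicated fonts quickly, here is a minimal Python sketch that counts how often each font name shows up in pdffonts output. It assumes pdffonts is on the PATH, that its table starts with two header lines, and that the font name is the first column and contains no spaces. The function names are made up for illustration, and this is not necessarily how the ~140 figure above was obtained.

```python
import re
import subprocess
from collections import Counter

# Six-letter subset tag (e.g. "ABCDEF+") that PDF writers prepend to subset font names.
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def count_font_names(pdffonts_output: str) -> Counter:
    """Count how often each font name appears in pdffonts' table output."""
    counts = Counter()
    # Assumption: the first two lines are the column header and the dashed separator.
    for line in pdffonts_output.splitlines()[2:]:
        if not line.strip():
            continue
        # Assumption: the font name is the first column and contains no spaces.
        name = line.split()[0]
        counts[SUBSET_PREFIX.sub("", name)] += 1
    return counts


def fonts_in_pdf(pdf_path: str) -> Counter:
    """Run pdffonts on a file and count the fonts it lists."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return count_font_names(result.stdout)
```

Calling `fonts_in_pdf("annotated.pdf").most_common(5)` (the file name is hypothetical) would list the most frequently repeated font names first; a count far above 1 for the same name hints at the duplicate embedding described earlier in this thread.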

> I've added a corresponding sample:
> https://github.com/jmbreuer/heap/tree/main/kde-okular-bug-20230630

That file embeds Noto Sans and Noto Sans Regular. It seems to have grown by 1.5 MB, which is not far off for two fonts.

> > With poppler 23.06, adding annotations should grow the document with
> > approximately 600k. With poppler 23.07, it should grow the document with
> > approximately 300 k, assuming noto sans (okular default) is used.
> 
> Is that to be expected "per each annotation", or once per adding annotations
> at all, or per annotation type, ...?

Technically, it is per distinct font, assuming every font is as big as Noto Sans and compresses with zlib about as well as Noto Sans does.
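
As a rough back-of-the-envelope sketch of that rule, the expected growth is the number of distinct embedded fonts times the approximate per-font figure from comment 1. The function name and the lookup table are illustrative only, and the numbers hold only under the Noto Sans assumption stated above.

```python
# Approximate growth per distinct embedded font, in kB, as quoted in comment 1.
# Assumption: every font is about as large and as compressible as Noto Sans.
GROWTH_PER_FONT_KB = {"23.06": 600, "23.07": 300}


def estimated_growth_kb(distinct_fonts: int, poppler_release: str) -> int:
    """Estimate how much a PDF grows once annotations embed the given fonts."""
    return distinct_fonts * GROWTH_PER_FONT_KB[poppler_release]
```

For the sample with two fonts under Poppler 23.06, `estimated_growth_kb(2, "23.06")` gives 1200 kB, which is in the same range as the roughly 1.5 MB observed.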
Comment 5 Joe Breuer 2023-07-03 08:59:49 UTC
(In reply to Sune Vuorela from comment #4)
> (In reply to Joe Breuer from comment #3)
> > What method/tool do you use to obtain this information?
> 
> poppler source package (Often in a binary package like poppler-utils or
> poppler-tools) contains pdffonts tool that list all of the fonts in the pdf
> file.

Ah, that's great to know!

> > > With poppler 23.06, adding annotations should grow the document with
> > > approximately 600k. With poppler 23.07, it should grow the document with
> > > approximately 300 k, assuming noto sans (okular default) is used.
> > 
> > Is that to be expected "per each annotation", or once per adding annotations
> > at all, or per annotation type, ...?
> 
> Technically, it is per different font, assuming every font is as big as Noto
> Sans and compresses with zlib similar to Noto Sans.

Ah thanks, I get it and know where to look now!

I've added another example using Okular 23.02.4 (built against and running with Poppler 23.06.0), and that keeps the file size stable even when adding multiple annotations. It is somewhat smaller than the sample from Okular 22.12.3 (running with Poppler 23.06.0, built against Poppler 23.01.0); the difference seems to be down to Okular 23 embedding only Noto Sans, rather than both Noto Sans and Noto Sans Regular.

So - for my purposes - Poppler 23.06.0 and possibly Okular 23.02.4 are the required combination for a PDF annotation workflow without... "file size and performance surprises."
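
To make that check repeatable, a small Python sketch can compare a Poppler version string (for example the "Using Poppler ..." value from Okular's "About Backend" dialog) against 23.06, the release comment 1 names as containing the fix. The helper names are hypothetical, and the sketch assumes purely numeric, dot-separated version strings.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "23.06.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_font_embedding_fix(poppler_version: str) -> bool:
    """Return True if the version is at least 23.06 (per comment 1 in this thread)."""
    return parse_version(poppler_version) >= (23, 6)
```

Note that this looks only at the Poppler version in use at runtime; as the samples above suggest, the Okular version can still affect which fonts get embedded.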
Comment 6 Sune Vuorela 2023-07-04 06:35:02 UTC
(In reply to Joe Breuer from comment #2)
> Trying a 23.07 bump is not immediately accessible to me, since there's
> apparently no release or even git tag yet. Not sure that I want to run git
> master on a production system.

Just FYI: Poppler 23.07 is now released.