Bug 470128 - Saving document with resized text creates huge memory footprint
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
Reported: 2023-05-22 13:57 UTC by dorla.hutch
Modified: 2023-07-15 17:37 UTC
Description dorla.hutch 2023-05-22 13:57:10 UTC
This issue has been haunting me for a long time and I'd like to share it with you.

When I browse a PDF and add an annotation to slides or a form, the selected font size is sometimes too small or too large. When I resize the text (the same applies to annotated text inside a box) and save, saving not only takes a long time, but the PDF also grows enormously: more than 100MiB are added during saving, going from 2MiB to 103MiB! This is like saving a high-resolution image for every character in the alphabet. This wasted space is added to each affected document.

This does not happen if the font size is not changed.

This causes significant problems:

- It cannot be undone!! Saving after removing the change that inflated the file will not help. The only way to regain the previous size is to scrap the document with all annotations and redownload the original one, creating new annotations.
- file becomes too large for emails, uploading or submission with size restrictions
- it wastes a lot of memory for no good reason
- it's the size of a large or long compressed video

If the issue cannot be fixed, there are some interim, if weaker, workarounds:

- export + import of annotations between documents (it should work at least for identical documents)
- if the problem occurs, warn users and let them cancel the saving process
- allow changes made by saving to be undone completely
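
The "warn before saving" idea above could be sketched roughly as follows. This is Python for illustration only, not Okular code: the function name `is_suspicious_growth` and both thresholds are assumptions, chosen so that the 2MiB to 103MiB case from this report would trigger a warning.

```python
MIB = 1024 * 1024


def is_suspicious_growth(old_size: int, new_size: int,
                         max_ratio: float = 5.0,
                         min_added: int = 10 * MIB) -> bool:
    """Return True when a save adds at least min_added bytes
    and the file becomes more than max_ratio times larger.

    Both thresholds are illustrative assumptions, not values from Okular.
    """
    if old_size <= 0:
        return False  # no meaningful previous size to compare against
    added = new_size - old_size
    return added >= min_added and new_size / old_size > max_ratio


# The case from this report: 2 MiB before saving, 103 MiB after.
print(is_suspicious_growth(2 * MIB, 103 * MIB))
# An ordinary save that adds 1 MiB of annotations.
print(is_suspicious_growth(2 * MIB, 3 * MIB))
```

Requiring both an absolute and a relative threshold keeps small files (where embedding one font is a large relative jump) from triggering the warning too eagerly.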

Version: Flathub, installed via Software Center on Ubuntu 20.04.3 LTS.
Comment 1 Sune Vuorela 2023-06-13 14:02:20 UTC
Can you provide a sample document? Several bugs regarding this have been fixed in the latest poppler release.
Comment 2 dorla.hutch 2023-06-13 19:59:51 UTC
(In reply to Sune Vuorela from comment #1)
> Can you provide a sample document? Several bugs regarding this have been
> fixed in the latest poppler release

I have created a PDF with only a single word of text as content: "test". Then I added an inline note with one test sentence. I scaled it from font size 10 to 18 and saved. From initially 23KiB, it went to 20MiB (at least not 100MiB) and it also takes longer to save. I assume that the additional overhead grows with the size of the document.

I am trying it with this file sharing host (deleted after 7 days):
https://tmpsend.com/V6BNmWWQ
Comment 3 dorla.hutch 2023-06-13 20:02:10 UTC
(In reply to dorla.hutch from comment #2)
> From initially 23KiB, it went to 20MiB (at least
> not 100MiB) …

After adding the comment box before scaling, it's already 918KiB to be honest but after scaling, it becomes 20MiB.
Comment 4 Sune Vuorela 2023-06-14 09:51:53 UTC
(In reply to dorla.hutch from comment #3)
> (In reply to dorla.hutch from comment #2)
> > From initially 23KiB, it went to 20MiB (at least
> > not 100MiB) …
> 
> After adding the comment box before scaling, it's already 918KiB to be
> honest but after scaling, it becomes 20MiB.

Adding ~900 KB is kind of expected, given that Noto Sans Regular is the default font and gets embedded, unless you use the newest version of poppler (the PDF library underneath Okular), which was released a couple of weeks ago, maybe with some of my patches on top.

But for some reason, though I didn't manage to reproduce it myself, your file also contains a 20 MB Noto Sans CJK HK font file, so unless you explicitly selected that font, I'm a bit puzzled. (CJK contains Chinese, Japanese, Korean and maybe other East Asian glyphs.)

What are your exact steps to resize the text?
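
One rough way to look for such an oversized embedded object is to scan the raw file for stream objects and sort them by size. The sketch below is a heuristic, not a PDF parser: it assumes the objects are stored uncompressed at the top level of the file (not packed into object streams), it can be fooled by binary data that happens to contain the same keywords, and the names `largest_streams`, `OBJ_HEADER` and `FONT_REF` are made up for this example.

```python
import re

# Matches an indirect object header such as "12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# Matches a font-program reference such as "/FontFile2 12 0 R".
FONT_REF = re.compile(rb"/FontFile[23]?\s+(\d+)\s+\d+\s+R")


def largest_streams(data: bytes, top: int = 5):
    """Return (object number, approximate stream bytes, is_font_program)
    tuples for the largest stream objects, largest first."""
    # Object numbers that some font descriptor points at as a font program.
    font_objects = {int(m.group(1)) for m in FONT_REF.finditer(data)}
    found = []
    for m in OBJ_HEADER.finditer(data):
        end = data.find(b"endobj", m.end())
        if end == -1:
            continue
        start = data.find(b"stream", m.end(), end)
        stop = data.rfind(b"endstream", m.end(), end)
        if start == -1 or stop <= start:
            continue  # this object carries no stream
        number = int(m.group(1))
        # Approximate: includes the line breaks around the stream data.
        size = stop - (start + len(b"stream"))
        found.append((number, size, number in font_objects))
    return sorted(found, key=lambda item: item[1], reverse=True)[:top]


# Usage (the file name is hypothetical):
# with open("annotated.pdf", "rb") as fh:
#     for number, size, is_font in largest_streams(fh.read()):
#         print(number, size, "font program" if is_font else "")
```

If the largest entry is flagged as a font program and is roughly 20 MB, that would be consistent with a full CJK font having been embedded.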
Comment 5 Joe Breuer 2023-06-30 11:24:20 UTC
My bug #471792 has one sample which massively demonstrates this - file size from 6.4 MB (original) to 143.4 MB (two Inline Notes with resized fonts).
Comment 6 dorla.hutch 2023-07-07 20:42:49 UTC
(In reply to Sune Vuorela from comment #4)
> (In reply to dorla.hutch from comment #3)
> > (In reply to dorla.hutch from comment #2)
> > > From initially 23KiB, it went to 20MiB (at least
> > > not 100MiB) …
> > 
> > After adding the comment box before scaling, it's already 918KiB to be
> > honest but after scaling, it becomes 20MiB.
> 
> adding ~900kb is kind of expected [...]
> 
> But for some reason, though I didn't manage to reproduce it myself, your file
> also contains a 20mb noto sans CJK HK file, so unless you explicitly have
> specified it, [...]
> 
> What are your exact steps to resize the text?

Sorry for the late response. I resize the text in the only way I know. It affects multiple annotations, not just inline notes but also text. I open the text properties dialog (in Browse mode -> right click -> Properties) and choose another font size, nothing else. The problem is that, no matter what I do, the file will not shrink back to its previous size, even if I undo the resizing (choosing the normal font size) and save again. I would need to redownload the original, especially if the annotated PDF is supposed to be uploaded somewhere or attached to an email. Redownloading makes me lose all the annotations I had.

Now I tested it again and the problem does not occur 😃! When resizing a text box with the Cantarell font from 11 to 20, it saves quickly and there is no unusual footprint.

I think the problem is that Okular on Ubuntu 20.04 only lists CJK versions of the Noto Sans fonts, so if Noto Sans is used, the file grows huge when font properties are changed. It probably changes the font to Noto Sans CJK xx.
Comment 7 dorla.hutch 2023-07-15 17:37:17 UTC
Update: the problem popped up again, but it seems to be random. This time the file went from 2MiB to 102MiB. I redownloaded the PDF and transferred all annotations manually to the new copy, which is now 100MiB smaller. Unfortunately, this is stressful when I am trying to study for an oral exam in two days and have only finished half of the materials.

I did not resize anything this time. I probably pressed an unknown key combination by accident.

The `pdffonts` tool does not show any suspicious fonts, and its output is the same for the 102MiB and the 2MiB versions of the same document:
```
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
AAAAAC+ArialMT                       TrueType          WinAnsi          yes yes yes      9  0
AAAAAE+Calibri                       TrueType          WinAnsi          yes yes yes     17  0
AAAAAG+ArialMT                       TrueType          WinAnsi          yes yes yes     24  0
AAAAAI+ArialMT                       TrueType          WinAnsi          yes yes yes     41  0
AAAAAK+Wingdings-Regular             TrueType          WinAnsi          yes yes yes     64  0
AAAAAM+Calibri-Italic                TrueType          WinAnsi          yes yes yes     91  0
AAAAAO+Calibri                       TrueType          WinAnsi          yes yes yes    109  0
```
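
Comparing two such listings by eye is error-prone, so a small script can do it. The sketch below assumes the column layout shown above, where the dashed line marks the column boundaries; a font name longer than its column would break the slicing. `parse_pdffonts` and `font_differences` are hypothetical helper names, and the text passed in is expected to be the captured output of `pdffonts` for each file.

```python
import re


def parse_pdffonts(text: str) -> list[dict[str, str]]:
    """Split pdffonts output into one dict per font,
    using the dashed separator line to find the column positions."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, separator, rows = lines[0], lines[1], lines[2:]
    spans = [m.span() for m in re.finditer(r"-+", separator)]
    columns = [header[a:b].strip() for a, b in spans]
    return [
        {col: row[a:b].strip() for col, (a, b) in zip(columns, spans)}
        for row in rows
    ]


def font_differences(before: str, after: str) -> set[tuple[str, str, str]]:
    """Return (name, type, emb) entries present in only one listing."""
    def keys(text: str) -> set[tuple[str, str, str]]:
        return {(f["name"], f["type"], f["emb"]) for f in parse_pdffonts(text)}
    return keys(before) ^ keys(after)
```

An empty result from `font_differences` means both files report the same fonts, in which case the extra 100MiB would have to be looked for elsewhere in the file.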