Bug 459209 - Using tab character in form increases file size by several MB
Summary: Using tab character in form increases file size by several MB
Status: REPORTED
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 22.04.3
Platform: Debian testing Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-16 12:06 UTC by Florine W. Dekker
Modified: 2022-09-18 22:25 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Form that grows when you use a tab symbol in an input (89.13 KB, application/pdf)
2022-09-16 12:06 UTC, Florine W. Dekker

Description Florine W. Dekker 2022-09-16 12:06:16 UTC
Created attachment 152096
Form that grows when you use a tab symbol in an input

SUMMARY
If a user types a tab character into a form field, the PDF grows by several megabytes when saved, even if the tab character is removed immediately afterwards. As a result, I cannot comfortably use Okular to fill in forms without worrying about absent-mindedly entering and removing a tab character.

STEPS TO REPRODUCE
1. Open the PDF that is attached to this issue.
2. Select "Show Forms".
3. Select the input box to the right of "In welk land heeft u kosten gemaakt?".
4. Press the tab key, or paste the tab symbol (`	`).
   Optionally, press backspace to remove the tab symbol again.
   (Using Ctrl+Z to remove the character "undoes" the bug correctly.)
5. Deselect "Show Forms".
6. Save the PDF file (using Ctrl+S).
7. Check the file size.
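Step 7 can be made repeatable with a small script. This is a minimal sketch, assuming Python 3 and that an untouched copy of the attachment is kept alongside the saved file; the function name `size_growth` and the file paths are hypothetical.

```python
import os


def size_growth(original_path: str, saved_path: str) -> int:
    """Return how many bytes the saved PDF grew compared to the untouched original.

    Hypothetical helper: both paths are supplied by the caller, e.g. a copy of
    the attachment made before step 1 and the file saved in step 6.
    A negative result means the saved file is smaller.
    """
    return os.path.getsize(saved_path) - os.path.getsize(original_path)
```

For example, `size_growth("form-original.pdf", "form.pdf")` returns the number of bytes added by the save.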

OBSERVED RESULT
The file size has increased by several megabytes, even if the tab character was removed with Backspace before saving.

EXPECTED RESULT
The file size does not increase (significantly) after saving.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Debian testing
KDE Plasma Version: 5.25.4
KDE Frameworks Version: 5.97.0
Qt Version: 5.15.4

ADDITIONAL INFORMATION
I wasn't sure which component to select for this issue. My apologies if it is incorrect.
Comment 1 Florine W. Dekker 2022-09-16 12:14:45 UTC
Correction: Ctrl+Z does *not* undo the bug correctly. See following procedure:

1. Open the PDF that is attached to this issue.
2. Select "Show Forms".
3. Select the input box to the right of "In welk land heeft u kosten gemaakt?".
4. Press the tab key, or paste the tab symbol (`	`).
5. Press Ctrl+Z.
6. Verify that the file does not have unsaved changes (i.e. there is no asterisk after the filename in the window title).
7. Enter some text (e.g. "Lorem ipsum") in the input box to the right of "In welk land heeft u kosten gemaakt?".
8. Deselect "Show Forms".
9. Save the PDF file (using Ctrl+S).
10. Check the file size.

The file size grows from 91,270 bytes to 5,190,386 bytes.

As a sanity check, compare what happens when steps 4 and 5 of this procedure are skipped: the file size grows from 91,270 bytes to only 96,006 bytes.
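In bytes, the difference between the two runs is easy to quantify: the tab-then-undo run adds 5,099,116 bytes (about 5.1 MB), while the run without the tab adds only 4,736 bytes (about 4.7 KB). The helper below is just an illustration of that arithmetic; its name is hypothetical.

```python
def growth(before: int, after: int) -> tuple[int, float]:
    """Return (absolute growth in bytes, growth as a fraction of `before`)."""
    delta = after - before
    return delta, delta / before


# Sizes reported in this comment, in bytes.
ORIGINAL = 91_270
WITH_TAB_THEN_UNDO = 5_190_386
WITHOUT_TAB = 96_006

for label, saved in [("tab + Ctrl+Z", WITH_TAB_THEN_UNDO), ("no tab", WITHOUT_TAB)]:
    delta, rel = growth(ORIGINAL, saved)
    print(f"{label}: +{delta:,} bytes ({rel:.1%})")
```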
Comment 2 Albert Astals Cid 2022-09-18 19:21:20 UTC
I know this is unfortunate, but it is how it is: you're adding text to the document, we need a font that supports that character, so we add that font to the file.

Could it be done better? Yes. But we sadly don't have the people to code that better way at this time.

FWIW this is mostly something that needs improvement in poppler, not in Okular itself (though some changes may be needed in Okular too).
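One rough way to check this explanation is to count how many embedded font programs a saved file references. This is a heuristic sketch, not a PDF parser: it scans the raw bytes for the font descriptor keys `/FontFile`, `/FontFile2` and `/FontFile3`, so it will miss descriptors stored inside compressed object streams and could, in principle, match stray bytes inside binary data. The function names are hypothetical.

```python
import re
from pathlib import Path

# Font descriptor keys that point at an embedded font program:
# /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (other formats, e.g. CFF).
# The negative lookahead rejects longer names such as /FontFileX.
FONT_FILE_KEY = re.compile(rb"/FontFile[23]?(?![0-9A-Za-z])")


def count_font_file_refs(pdf_bytes: bytes) -> int:
    """Count raw occurrences of /FontFile, /FontFile2 and /FontFile3 keys."""
    return len(FONT_FILE_KEY.findall(pdf_bytes))


def count_font_file_refs_in(path: str) -> int:
    """Same as count_font_file_refs, reading the bytes from a file on disk."""
    return count_font_file_refs(Path(path).read_bytes())
```

Comparing the count for the original attachment with the count for a saved copy should show whether a new font program was written; a count that keeps rising with every save would be consistent with the behaviour reported later in this thread.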
Comment 3 Florine W. Dekker 2022-09-18 20:48:11 UTC
(In reply to Albert Astals Cid from comment #2)
> I know this is unfortunate, but it is how it is: you're adding text to the
> document, we need a font that supports that character, so we add that font
> to the file.
> 
> Could it be done better? Yes. But we sadly don't have the people to code
> that better way at this time.
> 
> FWIW this is mostly something that needs improvement in poppler, not in
> Okular itself (though some changes may be needed in Okular too).

Interesting, and indeed unfortunate.

However, in that case I would expect the growth to occur only once. In some additional experiments, though, I see that once the tab character has been added, the file grows by a megabyte or so each time the form is edited and saved. In the past, I've had a PDF grow to a whopping 500 MB because I compulsively save the form after every sentence I write.

Here's how to reproduce the ever-growing PDF:
1. Open the PDF that is attached to this issue.
2. Select "Show Forms".
3. Add a tab character in the input box to the right of "In welk land heeft u kosten gemaakt?".
4. Add a tab character in the input box to the right of "Wat is er gebeurd en welke medische zorg kreeg u?".
5. Save the PDF file (using Ctrl+S), and check the file size. It should have grown by a few MB.
6. Select the input box to the right of "In welk land heeft u kosten gemaakt?".
7. Add "aaa" after the tab character.
8. Save the PDF file (using Ctrl+S), and check the file size. It should have grown by a megabyte or so.
9. Repeat steps 7 and 8 indefinitely.

Interestingly, this bug requires the tab to have been added to two input boxes. If it is added to only one, the file doesn't keep growing in size.
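If each Ctrl+S is written as a PDF incremental update (an assumption here; a full rewrite of the file would not leave this pattern), every save appends a new section ending in a `%%EOF` marker, so the length of each appended section shows how much every individual save added. This is a heuristic sketch with a hypothetical function name; note that linearized files also contain an extra `%%EOF` near the start.

```python
EOF_MARKER = b"%%EOF"


def revision_sizes(pdf_bytes: bytes) -> list[int]:
    """Split the file after each %%EOF marker and return each section's length in bytes.

    Heuristic: any bytes after the last marker are ignored.
    """
    sizes = []
    start = 0
    pos = pdf_bytes.find(EOF_MARKER)
    while pos != -1:
        end = pos + len(EOF_MARKER)
        # Count the end-of-line bytes that follow the marker as part of this section.
        while end < len(pdf_bytes) and pdf_bytes[end] in b"\r\n":
            end += 1
        sizes.append(end - start)
        start = end
        pos = pdf_bytes.find(EOF_MARKER, start)
    return sizes
```

For example, `revision_sizes(open("form.pdf", "rb").read())` on a file saved several times (hypothetical file name) would list the original document first, followed by one entry per save; entries of roughly a megabyte each would match steps 7 to 9 above.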
Comment 4 Albert Astals Cid 2022-09-18 22:25:07 UTC
Hmm, that is indeed a bug (a poppler bug): we seem to be saving the font again each time, when it should only be added once. I'll have a look.
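"It should only be added once" amounts to checking for an already-embedded font before embedding another copy. The following is a toy model of that idea, not poppler's code or API: `ToyFormDocument`, `dr_fonts` and `font_for_field` are hypothetical, and `dr_fonts` merely stands in for the form's default-resources font dictionary (`/DR /Font` in the AcroForm dictionary).

```python
from dataclasses import dataclass, field


@dataclass
class ToyFormDocument:
    """Hypothetical model of a PDF form; not poppler's real data structures."""

    # Appended objects; index i holds object number i + 1.
    objects: list[bytes] = field(default_factory=list)
    # Stand-in for /DR /Font: maps a font name to the object number
    # that holds its embedded font program.
    dr_fonts: dict[str, int] = field(default_factory=dict)

    def font_for_field(self, font_name: str, font_program: bytes) -> int:
        """Return an object number for the font, embedding it only if missing."""
        if font_name in self.dr_fonts:
            # Already embedded by an earlier edit or save: reuse it.
            return self.dr_fonts[font_name]
        self.objects.append(font_program)
        obj_num = len(self.objects)  # 1-based, like PDF object numbers
        self.dr_fonts[font_name] = obj_num
        return obj_num
```

Keying the lookup on state stored in the document itself, rather than on a per-session cache, is what would let the check hold across repeated saves.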