Bug 496369 - Saving PDF form data multiple times increases PDF size (possible leak / not removing previous form data)
Status: RESOLVED NOT A BUG
Alias: None
Product: okular
Classification: Applications
Component: PDF backend
Version: 24.08.3
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
Reported: 2024-11-17 10:16 UTC by Andrew Rembrandt
Modified: 2024-11-20 22:40 UTC

Attachments
Blank form with no saves (23.92 KB, application/pdf)
2024-11-17 10:16 UTC, Andrew Rembrandt
Form after 1st save (41.91 KB, application/pdf)
2024-11-17 10:16 UTC, Andrew Rembrandt
Form after 50 saves (in Okular) (732.57 KB, application/pdf)
2024-11-17 10:17 UTC, Andrew Rembrandt

Description Andrew Rembrandt 2024-11-17 10:16:09 UTC
Created attachment 175875
Blank form with no saves

SUMMARY
I have a PDF with 2 large text box forms (it contains a large amount of form text: an addendum for a legal contract).
It increased in size from 400kB (including embedded form fonts) to 18MB after approximately 20-30 edits of form data and 20-30 saves.
At first, I thought this might be a bug with embedded fonts being duplicated, but after checking, there were only 5 or so fonts in the PDF, not >20.

Unfortunately, I can't publicly share that PDF (happy to privately if you wish).
However, I created a test PDF with which I reproduced the problem after 50 saves of form data in Okular.

STEPS TO REPRODUCE
1. Record PDF file size
2. Click Show Forms
3. Paste a ‘large’ amount of text into the form, e.g. from: https://www.lipsum.com/feed/html
4. Save
5. Record PDF file size (~44KB)
6. Make a minor modification (e.g. remove a character / add a new line or space)
7. Save
8. Repeat the previous 2 steps 50 times
9. Record PDF file size (~730KB)


OBSERVED RESULT
File size increases with each save, until it reaches about 730KB after 50 saves.
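
As a rough illustration of the growth rate, the sizes recorded in the steps above (~44KB after the first save, ~730KB after 50 further saves) can be turned into an average per-save increase. This is only a back-of-the-envelope sketch; the helper name `average_growth_per_save` is made up for this example, and the real growth per save depends on how much data each edit touches.

```python
def average_growth_per_save(first_kb: float, last_kb: float, saves: int) -> float:
    """Return the mean size increase per save, in the same unit as the inputs."""
    if saves <= 0:
        raise ValueError("saves must be a positive number")
    return (last_kb - first_kb) / saves


if __name__ == "__main__":
    # Approximate figures from the steps to reproduce.
    growth = average_growth_per_save(first_kb=44, last_kb=730, saves=50)
    print(f"average growth: {growth:.2f} KB per save")
```

With these approximate figures, each save adds on the order of 14KB, which is in the same range as the size of the pasted form text itself.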

EXPECTED RESULT
File size remains approximately 44KB, similar in size to the first save.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch Linux kernel 6.11.8 x64, last full system update 2024-11-15
KDE Plasma Version: 6.2.3
KDE Frameworks Version: 6.8.0
Qt Version: 6.8.0
Poppler: 24.11.0-2

ADDITIONAL INFORMATION
Many thanks for the amazing project work all - the KDE Project is amazing.
Comment 1 Andrew Rembrandt 2024-11-17 10:16:44 UTC
Created attachment 175876
Form after 1st save
Comment 2 Andrew Rembrandt 2024-11-17 10:17:13 UTC
Created attachment 175877
Form after 50 saves (in Okular)
Comment 3 Andrew Rembrandt 2024-11-17 10:22:51 UTC
P.S. The pdfcpu CLI utility (https://github.com/pdfcpu/pdfcpu) can reset the form data in the PDF, after which the file size returns to 45KB:
> pdfcpu form reset 'Sample Form Test - 50 or so saves.pdf' 'Sample Form Test - Reset 50 saves Form Data.pdf'
Comment 4 Albert Astals Cid 2024-11-20 22:40:46 UTC
This is almost by definition, given how PDF files work: we're not rewriting the file, only appending to it, so yeah, each time you change the thing and save it, the file grows.

Rewriting it is too dangerous, so this is not going to change.
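
The append-only behaviour described above can be observed from outside Okular. In an incrementally updated PDF, each save normally appends a new section that ends with its own `%%EOF` marker, so counting those markers gives a rough estimate of how many revisions a file contains and how large each one is. The following is a minimal heuristic sketch, not anything Okular or Poppler ships: the function name `revision_sizes` is hypothetical, and the count can be off if the byte sequence `%%EOF` happens to occur inside an embedded stream.

```python
import sys


def revision_sizes(data: bytes) -> list[int]:
    """Split raw PDF bytes at each %%EOF marker and return the section lengths.

    Heuristic: one section per %%EOF marker, i.e. the original file plus
    one section per appended (incremental) save.
    """
    marker = b"%%EOF"
    sizes = []
    start = 0
    pos = data.find(marker)
    while pos != -1:
        end = pos + len(marker)
        sizes.append(end - start)  # bytes from previous marker up to this one
        start = end
        pos = data.find(marker, end)
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python revision_sizes.py some-form.pdf
    with open(sys.argv[1], "rb") as fh:
        sizes = revision_sizes(fh.read())
    print(f"{len(sizes)} section(s) ending in %%EOF")
    for index, size in enumerate(sizes):
        print(f"  section {index}: {size} bytes")
```

If the explanation above applies, running this against the three attachments should show the section count rising with the number of saves, with each appended section holding a fresh copy of the changed form data.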