Bug 470174 - 40 Characters Annotation consuming 20MB
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: general
Version: 23.04.1
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-23 17:11 UTC by der.wolfgang.amadeus.mozart
Modified: 2023-06-25 22:57 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
File with one comment with text "a" - original size ~80kb now 2MB! (1.99 MB, application/pdf)
2023-05-23 17:11 UTC, der.wolfgang.amadeus.mozart
original file (86.25 KB, application/pdf)
2023-05-23 17:14 UTC, der.wolfgang.amadeus.mozart
Annotated File from March 2023 with "normal" storage consumption (617.20 KB, application/pdf)
2023-06-09 15:45 UTC, der.wolfgang.amadeus.mozart

Description der.wolfgang.amadeus.mozart 2023-05-23 17:11:10 UTC
Created attachment 159208 [details]
File with one comment with text  "a" - original size ~80kb now 2MB!

I annotated a file with a few characters. Initially the file was ~80 KB; with the annotations it is ~20 MB, and Okular shows excessive CPU consumption (100% of one core, so presumably a single thread that behaves strangely). The CPU consumption starts when scrolling. With 12 comments, the file lagged on scroll. When a comment is deleted, the file does not shrink back to its original size.

I tried it with some other files: about 4 MB for one comment with a few characters like "aspofksdfjdsafjfsadpfg"!

EXPECTED RESULT
Each text-only comment should consume less than 10 KB and use about 0.01% of CPU. The file should return to its original size after the comments are deleted.

Comment 1 der.wolfgang.amadeus.mozart 2023-05-23 17:14:08 UTC
Created attachment 159209 [details]
original file

original file for comparison
Comment 2 Sune Vuorela 2023-06-01 13:13:02 UTC
There are at least 3 different fixes needed here.
Poppler (the PDF library used by Okular) does not currently compress streams (font data) on saving. There is a very recent open merge request to address that.
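
To illustrate what compressing a stream on save means in PDF terms, here is a minimal Python sketch. It is not Poppler's code and makes no claim about how the merge request is implemented; the pdf_stream helper and the sample bytes are hypothetical. It only shows the general mechanism: the stream payload is deflated with zlib and the stream dictionary declares /Filter /FlateDecode so that readers know to inflate it.

```python
import zlib


def pdf_stream(data: bytes, compress: bool) -> bytes:
    """Serialize bytes as a PDF stream body, optionally Flate-compressed."""
    if compress:
        payload = zlib.compress(data, 9)
        dictionary = b"<< /Length %d /Filter /FlateDecode >>" % len(payload)
    else:
        payload = data
        dictionary = b"<< /Length %d >>" % len(payload)
    return dictionary + b"\nstream\n" + payload + b"\nendstream"


# Hypothetical stand-in for font data; real font files compress less well
# than this highly repetitive sample.
sample = b"glyph-outline-data " * 5000

uncompressed = pdf_stream(sample, compress=False)
compressed = pdf_stream(sample, compress=True)
print(len(uncompressed), len(compressed))
```

How much a real font shrinks depends on the font; the repetitive sample above overstates the gain.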

There are also less space-consuming ways of storing font widths in PDF. There is also a very recent open merge request against Poppler to address that.
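
One way font widths can be stored more compactly, sketched in Python. This is an assumption about the general idea, not a description of the merge request: for CID fonts, PDF allows a default width (/DW) plus a /W array that lists only the exceptions, grouped into runs of consecutive character IDs. The compact_widths function and the example widths are hypothetical.

```python
from collections import Counter


def compact_widths(widths: dict[int, int]) -> tuple[int, str]:
    """Return (default width, W-array text) for a CID -> width mapping."""
    if not widths:
        return 1000, "[]"  # 1000 is the PDF default for /DW
    # The most common width becomes the default and is not listed at all.
    default = Counter(widths.values()).most_common(1)[0][0]
    entries = []
    run_start, run = 0, []
    for cid in sorted(widths):
        width = widths[cid]
        if width == default:
            continue
        if run and cid == run_start + len(run):
            run.append(width)  # extends the current run of consecutive CIDs
        else:
            if run:
                entries.append(f"{run_start} [{' '.join(map(str, run))}]")
            run_start, run = cid, [width]
    if run:
        entries.append(f"{run_start} [{' '.join(map(str, run))}]")
    return default, "[" + " ".join(entries) + "]"


# Hypothetical widths, not taken from any real font.
example = {1: 500, 2: 500, 3: 600, 4: 700, 10: 500, 11: 250}
default_width, w_array = compact_widths(example)
print(default_width, w_array)
```

For a font where most glyphs share one width, nearly all entries disappear into the default.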

The PDF standard also supports storing only a font subset, so that all of the above would apply only to the glyphs actually needed rather than to the full font. That is probably a major task, and no work is currently happening on it.

Using a font other than Noto could also make the file grow less. Changing the default would be up to the usability people, but as a user you may want to change it yourself.
Comment 3 der.wolfgang.amadeus.mozart 2023-06-09 15:45:55 UTC
Created attachment 159566 [details]
Annotated File from March 2023 with "normal" storage consumption

I found a file with a relatively large number of annotations, created with Okular (I don't know the version) in March. The storage consumption is "normal": a few KB. I attached the file; I hope it is helpful.
Comment 4 der.wolfgang.amadeus.mozart 2023-06-09 15:47:36 UTC
Comment on attachment 159566 [details]
Annotated File from March 2023 with "normal" storage consumption

Comment 5 Sune Vuorela 2023-06-12 07:18:51 UTC
(In reply to der.wolfgang.amadeus.mozart from comment #4)

This is kind of interesting. The main reason the files grow is font embedding. In the file with "normal" storage consumption, font embedding fails and it just adds "Invalid_font" instead of the requested font.

In your "grows to 2mb" file, the font is wrongly embedded twice and not at all optimized for storage.

With only one copy of the Noto font (the Okular default), not optimized for storage, approximately 900 KB is the expected growth.

With Poppler 23.06, adding an annotation is expected to grow the document by 500-700 KB, and when my upcoming changes hopefully land in Poppler 23.07, annotations are expected to grow the document by 250-300 KB.
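
As a rough cross-check of the double-embedding explanation, the attachment sizes listed on this bug (86.25 KB original, 1.99 MB annotated) can be compared against the per-copy estimate. The sketch below assumes 1 MB = 1024 KB and that all of the growth comes from two identical font copies; both are simplifying assumptions, not measurements.

```python
# Attachment sizes as listed on this bug (assumption: 1 MB = 1024 KB).
original_kb = 86.25
annotated_kb = 1.99 * 1024

growth_kb = annotated_kb - original_kb
# Assumption: all growth comes from two identical embedded font copies.
per_copy_kb = growth_kb / 2

print(f"growth: {growth_kb:.0f} KB, per font copy: {per_copy_kb:.0f} KB")
```

Under those assumptions the per-copy figure lands a little under 1000 KB, the same order of magnitude as the roughly 900 KB expected for a single unoptimized copy.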
Comment 6 der.wolfgang.amadeus.mozart 2023-06-24 18:59:13 UTC
(In reply to Sune Vuorela from comment #5)
> (In reply to der.wolfgang.amadeus.mozart from comment #4)
> > Comment on attachment 159566 [details]
> > Annotated File from March 2023 with "normal" storage consumption
> > 
> > I found a file with a relatively big number of annotations created with
> > okular (i don't know the version) in march. The storage consumption is
> > "normal" - a few KB. I attached the file, I hope it is helpful.
> 
> This is kind of interesting. The main reason the files grows are embdednig
> fonts.  In the "normal" storage consumption, font embedding fails and it
> just adds "Invalid_font" instead of the requested font.
> 
> In your "grows to 2mb", the font is wrongly embedded twice and not at all
> optimized for storage. 
> 
> With only one copy of noto font (The Okular default), not optimized for
> storage, approximately 900 kb is the expected growth.
> 
> With poppler 23.06, adding a annotation is expected to grow the document
> with 5-700 kbytes, and when my upcoming changes hopefully lands in poppler
> 23.07, annotations are expected to grow the document with 250-300 kbytes.

I've tried some annotations and it seems to work now. As you say, the consumption is now about 250-300 KB.

Thanks a lot for your work and the support - I love Okular, Plasma and use a lot of the KDE Apps.