Bug 479204 - Extremely slow compression - single-threaded compression appears to be the limiting factor.
Status: CONFIRMED
Alias: None
Product: kbackup
Classification: Applications
Component: general
Version: 23.08.4
Platform: Other Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Martin Koller
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-30 22:59 UTC by pallaswept
Modified: 2024-01-01 09:40 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Description pallaswept 2023-12-30 22:59:39 UTC
SUMMARY
Compression is painfully slow.

I am currently backing up a profile that includes some rather large files - VM images. One is 16.8GB, another is 84.1GB. Kbackup is compressing them, as per the profile settings. So far it has finished the 16.8GB file and is at 23% of the compression of the 84GB file, so I'm about 35GB in, and the elapsed time is 1:50:00 (1 hour 50 minutes). Edit: while writing this, the backup completed at 2:22:48, with a total of about 101GB of files compressed down to an impressive 6.7GB.

If I perform a full-disk backup (which I do occasionally) using Clonezilla, an entire ~800GB of data can be archived and compressed (to a similar ratio) in around 30-35 minutes. Accordingly, I would expect Kbackup to be able to make a backup 1/8th of the size in roughly 1/8th of the time, which would be about 5 minutes. At 142 minutes, it took roughly 30 times longer than one would reasonably expect.

I can see in a process monitor that the archiving is using 100% of a single one of the 24 cores available to it, which lines up with the rough math above. Kbackup's archiving simply isn't using the resources available to it.


STEPS TO REPRODUCE
1. Back up a large file or files
2. Grow grey hairs while waiting
3. .... Don't profit? 
:)

OBSERVED RESULT
Extremely long backup times

EXPECTED RESULT
Extremely short backup times

SOFTWARE/OS VERSIONS

openSUSE Tumbleweed
KDE Plasma Version: 5.27.10
KDE Frameworks Version: 5.113.0
Qt Version: 5.15.11

ADDITIONAL INFORMATION
Enough complaining; I generally *really* like Kbackup, so I'd like to try to help with a solution.

Perhaps supporting a new compression algorithm/format that uses multithreading natively would be an easy solution to this issue?

Or, perhaps better still, could the existing compression formats, which produce an exceptional compression ratio, be configured to use a multithreaded implementation?
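
For what it's worth, liblzma (the library behind the xz format) already exposes a multi-threaded encoder, lzma_stream_encoder_mt(), so in principle the same .xz output could be produced using all cores. The following is only a rough sketch of what that API looks like - it is not kbackup or KDE Frameworks code, and error handling is kept minimal:

// Rough illustration of liblzma's multi-threaded .xz encoder.
// Not kbackup code; it only shows the API a multi-threaded path could call.
#include <lzma.h>
#include <cstdint>
#include <cstdio>
#include <vector>

bool xzCompressMt(std::FILE *in, std::FILE *out, uint32_t preset = 6)
{
    lzma_mt mt{};                       // zero-initialise all options
    mt.threads = lzma_cputhreads();     // number of CPU threads, 0 if unknown
    if (mt.threads == 0)
        mt.threads = 1;
    mt.preset = preset;                 // same presets as the xz CLI (0-9)
    mt.check = LZMA_CHECK_CRC64;

    lzma_stream strm = LZMA_STREAM_INIT;
    if (lzma_stream_encoder_mt(&strm, &mt) != LZMA_OK)
        return false;

    std::vector<uint8_t> inBuf(1 << 20), outBuf(1 << 20);
    lzma_action action = LZMA_RUN;
    strm.next_out = outBuf.data();
    strm.avail_out = outBuf.size();

    for (;;) {
        if (strm.avail_in == 0 && action == LZMA_RUN) {
            strm.next_in = inBuf.data();
            strm.avail_in = std::fread(inBuf.data(), 1, inBuf.size(), in);
            if (strm.avail_in == 0)
                action = LZMA_FINISH;   // no more input, tell the encoder to flush
        }

        const lzma_ret ret = lzma_code(&strm, action);

        if (strm.avail_out == 0 || ret == LZMA_STREAM_END) {
            std::fwrite(outBuf.data(), 1, outBuf.size() - strm.avail_out, out);
            strm.next_out = outBuf.data();
            strm.avail_out = outBuf.size();
        }

        if (ret == LZMA_STREAM_END)
            break;
        if (ret != LZMA_OK) {
            lzma_end(&strm);
            return false;
        }
    }

    lzma_end(&strm);
    return true;
}

(The xz command line exposes the same encoder via "xz -T0" / "--threads=0".)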

Another alternative: even if the existing compression algorithm were kept, compressing multiple files simultaneously (in parallel rather than serially) would make a great deal of difference. Given that the compression is so slow and single-threaded, disk throughput should not become a bottleneck here.
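
As a very rough illustration of that last idea only - this is not kbackup's actual code, and it ignores how the per-file results would be stitched back into a single archive - compressing several files at once with the existing single-threaded KCompressionDevice could look something like this:

// Thought experiment: compress several files concurrently, each on its own
// worker thread with its own (single-threaded) KCompressionDevice instance.
#include <KCompressionDevice>
#include <QCoreApplication>
#include <QFile>
#include <QStringList>
#include <QtConcurrent>

static bool compressOne(const QString &inPath)
{
    QFile in(inPath);
    if (!in.open(QIODevice::ReadOnly))
        return false;

    // Writes "<inPath>.xz" next to the input; compression runs on this thread.
    KCompressionDevice out(inPath + QStringLiteral(".xz"), KCompressionDevice::Xz);
    if (!out.open(QIODevice::WriteOnly))
        return false;

    while (!in.atEnd()) {
        const QByteArray chunk = in.read(1 << 20);  // 1 MiB at a time
        if (out.write(chunk) != chunk.size())
            return false;
    }
    out.close();
    return true;
}

int main(int argc, char **argv)
{
    QCoreApplication app(argc, argv);

    QStringList files;
    for (int i = 1; i < argc; ++i)
        files << QString::fromLocal8Bit(argv[i]);

    // One compression job per file, spread over the global thread pool.
    const QList<bool> results = QtConcurrent::blockingMapped(files, &compressOne);
    return results.contains(false) ? 1 : 0;
}

Of course this only scales with the number of files; it would not help at all with one single huge file, which is where a multi-threaded encoder like the one sketched above would still be needed.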

I really like Kbackup; it's just this one thing that makes it kind of painful. If there's anything I could do to help, please let me know. I used to work in software development, and while I'm too inexperienced with KDE to provide a solution single-handedly, I might be able to assist someone else if you would like. I am retired and disabled, so I have some time on my hands to assist.

PS: On looking into this, I have discovered that the 84GB file, which I just spent two hours waiting to compress, is not even included in the backup. The 16GB file is, the 1GB file is, but the 84GB file is simply missing. I will file a separate bug for this, as it is a serious problem.
Comment 1 Martin Koller 2023-12-31 16:16:48 UTC
Yes, using just one thread for these huge files is the bottleneck.
In my local test, backing up a 65GB file took more than 5 hours. As a cross-check, the "xz" tool alone would also take that long (when not explicitly told to use multiple threads).

The current implementation uses the KDE class KCompressionDevice, which does not seem to leverage multiple threads, so this needs to be implemented either in kbackup or in some way in the KDE classes used.
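
For context, the compressed-archive path through KArchive typically looks something like the following (ordinary KTar usage, not a verbatim copy of the kbackup code); everything, including the compression done by the underlying KCompressionDevice, runs synchronously on the one calling thread:

// Typical KArchive usage for a compressed tar; the compression performed by
// the underlying KCompressionDevice happens synchronously on this thread.
#include <KTar>
#include <QFileInfo>
#include <QIODevice>

bool archiveOneFile(const QString &archivePath, const QString &sourceFile)
{
    // The mimetype selects the compression filter (here a .tar.xz archive).
    KTar tar(archivePath, QStringLiteral("application/x-xz"));
    if (!tar.open(QIODevice::WriteOnly))
        return false;

    // Reads the file and pushes it through the single-threaded compressor.
    const bool ok = tar.addLocalFile(sourceFile, QFileInfo(sourceFile).fileName());
    tar.close();
    return ok;
}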

I suggest you don't use compression when backing up huge files.
Comment 2 pallaswept 2024-01-01 09:40:13 UTC
(In reply to Martin Koller from comment #1)
> Yes, using just one thread for these huge files is the bottleneck.
> In my local test, backing up a 65GB file took more than 5 hours. As a
> cross-check, the "xz" tool alone would also take that long (when not
> explicitly told to use multiple threads).
> 
> The current implementation uses the KDE class KCompressionDevice, which does
> not seem to leverage multiple threads, so this needs to be implemented either
> in kbackup or in some way in the KDE classes used.
> 
> I suggest you don't use compression when backing up huge files.

Hi Martin,

Thanks so much for looking into this for me! I hope you are having a very Happy New Year celebration at the moment :)

Thank you also for putting in such an effort; spending so much time compressing that file is very kind of you.

Yes, for the time being I think I will disable compression for this profile with extremely large files, and afterwards manually compress the resulting file using some other tool that takes a multi-threaded approach. Thanks for that advice!

In the longer term, should I log a separate case against KCompressionDevice, or should I leave this case open to track it, or perhaps something else? Let me know what you would like me to do going forward.