Bug 451816 - improper usage of ZSTD_compressStream2 API
Summary: improper usage of ZSTD_compressStream2 API
Status: RESOLVED FIXED
Alias: None
Product: frameworks-karchive
Classification: Frameworks and Libraries
Component: general (show other bugs)
Version: unspecified
Platform: Other Other
: NOR normal
Target Milestone: ---
Assignee: David Faure
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-23 09:31 UTC by toolinfo
Modified: 2022-05-31 08:53 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description toolinfo 2022-03-23 09:31:11 UTC
When used through KCompressionDevice, compression using Zstd format is not efficient.

Root cause is the following line in API KZstdFilter::Result KZstdFilter::compress(bool finish)  in file kzstdfilter.cpp

Line 126:
    const size_t result = ZSTD_compressStream2(d->cStream, &d->outBuffer, &d->inBuffer, finish ? ZSTD_e_end : ZSTD_e_flush);

It shall be changed into :

    const size_t result = ZSTD_compressStream2(d->cStream, &d->outBuffer, &d->inBuffer, finish ? ZSTD_e_end : ZSTD_e_continue);

To get compression working correctly.
Comment 1 Albert Astals Cid 2022-05-30 12:30:56 UTC
Can you explain why?

Give some test? some reasoning?
Comment 2 toolinfo 2022-05-30 12:46:44 UTC
Hi,
From faceboot zstd Q&A:

https://github.com/facebook/zstd/issues/1360

Comment is: 

using ZSTD_e_flush will force the compressor to emit a block immediately. In general, this requirement is for communication scenarios, when some data must be sent "now".

Because it increases the nb of blocks produced, this technique can be associated with some lower compression ratio. Therefore, whenever there is no need to send data immediately, it's better (for compression ratio) to let the data accumulate in the internal buffers, and let the compressor decide when it's best to output a new block.

As stated above:

Using flush in a device is not a good idea. It result in a flush for each pieces of data that is send to the device using << operators, which breaks compression. That is why i suggest to use ZSTD_e_continue instead.

I am using it regularily for my project and without the changes, the API is just not working at all (Not compression anything).

Regards
Comment 3 toolinfo 2022-05-30 12:50:25 UTC
Usage model is:
@code
    KCompressionDevice compressor(&compressedModule, false, KCompressionDevice::Zstd);
    if (compressor.open(QIODevice::WriteOnly)) {
      QDataStream stream(&compressor);
      stream << [My class: makes many calls to << operator for each component containers part,...] ;
      compressor.close();
      compressedModule.close();
    }
@endcode
Comment 4 Bug Janitor Service 2022-05-30 21:54:29 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/karchive/-/merge_requests/38
Comment 5 Albert Astals Cid 2022-05-31 08:53:00 UTC
Git commit 028802998b11163dd761b579000f5cb89fab18d3 by Albert Astals Cid.
Committed on 31/05/2022 at 08:47.
Pushed by aacid into branch 'master'.

Fix zstd KCompressionDevice not compressing as much as it could

M  +37   -4    autotests/kfiltertest.cpp
M  +1    -1    autotests/kfiltertest.h
M  +1    -1    src/kzstdfilter.cpp

https://invent.kde.org/frameworks/karchive/commit/028802998b11163dd761b579000f5cb89fab18d3