| Summary: | Accelerating writing metadata back to image files | | |
|---|---|---|---|
| Product: | [Applications] digikam | Reporter: | Gerhard Kulzer <gerhardk> |
| Component: | Metadata-Engine | Assignee: | Digikam Developers <digikam-bugs-null> |
| Status: | RESOLVED UPSTREAM | | |
| Severity: | wishlist | CC: | ahuggel, althio.forum, aspotashev, axel.krebs, caulier.gilles, toddrme2178 |
| Priority: | NOR | | |
| Version: | 2.1.1 | | |
| Target Milestone: | --- | | |
| Platform: | Compiled Sources | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed In: | 7.5.0 |
| Sentry Crash Report: | | | |
| Bug Depends on: | 188925 | | |
| Bug Blocks: | | | |
Description
Gerhard Kulzer
2011-09-29 07:15:57 UTC
This is fully relevant to Exiv2; I don't know the details of metadata writing. Andreas Huggel from Exiv2 is in CC for more details.

Gilles Caulier

The Exiv2 write logic is optimized based on the image format and the kind of changes made to the metadata. There are two classes of image formats: TIFF-like images, where the metadata is not confined to a specific portion of the file but potentially spread over the entire image (image == metadata), and images which keep the metadata in a specific portion of the file (e.g., JPEG, PNG). The changes fall into "intrusive" and "non-intrusive" ones. If any metadata tags are added or deleted, or an existing metadata field is extended, the change is intrusive and requires Exiv2 to re-serialize the entire metadata structure. If an existing field is changed and its size is not extended (it may shrink), then Exiv2 makes the change in place, without rewriting the entire metadata. This has the considerable advantage that the TIFF structure stays intact, even if Exiv2 can't parse it. A typical example of a non-intrusive change is changing the Exif date/time of an image.

Writing works as follows:

| | intrusive | non-intrusive |
|---|---|---|
| TIFF-like | copy | mmap |
| Metadata block | copy | copy* |

"copy" means the file is re-written and re-named (its size changes); "mmap" means the file is changed in place (the file size remains the same). (*) In this case, the metadata structure is changed in place, but the file is copied and, in the process, the new metadata block is inserted.

The only further optimization I can see is that, in the case of images with a metadata block and non-intrusive changes, it would be possible to change the entire file in place rather than only the metadata block. For additional considerations (memory related), see http://dev.exiv2.org/issues/617

How does rsync work? Does it really operate on portions of a file (not only on modified files + compression)?

-ahu.
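As a minimal sketch of the non-intrusive case described above, here is how a date/time tag can be overwritten through the public Exiv2 C++ API. The tag and timestamp are illustrative; whether the write actually ends up non-intrusive depends on the tag already existing in the file with the same size.

```cpp
#include <exiv2/exiv2.hpp>
#include <iostream>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <image>\n";
        return 1;
    }
    try {
        // Open the image and parse its current metadata.
        auto image = Exiv2::ImageFactory::open(argv[1]);
        image->readMetadata();

        // An Exif "YYYY:MM:DD HH:MM:SS" timestamp is always 19 characters,
        // so overwriting an existing date/time tag does not grow the field:
        // per the explanation above, Exiv2 can then patch it in place
        // instead of re-serializing the whole metadata structure.
        image->exifData()["Exif.Photo.DateTimeOriginal"] = "2011:09:29 07:15:57";

        image->writeMetadata();
    } catch (const Exiv2::Error& e) {
        std::cerr << "Exiv2 error: " << e.what() << '\n';
        return 1;
    }
    return 0;
}
```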
First, thank you very much, Andreas, for this detailed explanation; it's worth keeping on record. Concerning the rsync mechanism, I found this description on the Wikipedia page for rsync:

"The rsync utility uses an algorithm invented by the Australian computer programmer Andrew Tridgell for efficiently transmitting a structure (such as a file) across a communications link when the receiving computer already has a similar, but not identical, version of the same structure. The recipient splits its copy of the file into fixed-size non-overlapping chunks and computes two checksums for each chunk: the MD4 hash, and a weaker 'rolling checksum'. (Version 30 of the protocol, released with rsync version 3.0.0, now uses MD5 hashes rather than MD4.) It sends these checksums to the sender. The sender computes the rolling checksum for every chunk of size S in its own version of the file, even overlapping chunks. This can be calculated efficiently because of a special property of the rolling checksum: if the rolling checksum of bytes n through n + S − 1 is R, the rolling checksum of bytes n + 1 through n + S can be computed from R, byte n, and byte n + S without having to examine the intervening bytes. Thus, if one had already calculated the rolling checksum of bytes 1–25, one could calculate the rolling checksum of bytes 2–26 solely from the previous checksum, and from bytes 1 and 26. The rolling checksum used in rsync is based on Mark Adler's adler-32 checksum, which is used in zlib, and is itself based on Fletcher's checksum.

The sender then compares its rolling checksums with the set sent by the recipient to determine if any matches exist. If they do, it verifies the match by computing the hash for the matching block and by comparing it with the hash for that block sent by the recipient. The sender then sends the recipient those parts of its file that did not match the recipient's blocks, along with information on where to merge these blocks into the recipient's version. This makes the copies identical."

There is a longish but nice interview with Andrew Tridgell, the creator of rsync, here: http://oceanpark.com/webmuseum/rsync.html

So it works on blocks, which seem to be chunks of 500-1000 bytes (as I read in various sources). Anyway, judging from the logs I get from rsyncing, the change size is usually less than 1% of an image, and that may of course span several blocks. (A small sketch of the rolling-checksum update appears at the end of this report.)

Note: this bug depends on bug #188925 for a few points...

Gilles Caulier

Note: writing metadata from the Maintenance tool now uses parallelized threads if you have a multi-core CPU. This will speed up the process of writing metadata to files a little. But the main bottleneck here, if I'm not mistaken, is still in the Exiv2 shared library...

Gilles Caulier

*** Bug 252494 has been marked as a duplicate of this bug. ***

This bug is definitively an UPSTREAM entry which must be reported to the Exiv2 bugzilla, as the low-level writing of metadata to files is handled in the background by Exiv2.

Gilles Caulier

*** Bug 269467 has been marked as a duplicate of this bug. ***
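For reference, here is a minimal sketch of the O(1) rolling-checksum update the quoted passage describes. The constants are illustrative, not rsync's exact ones (rsync additionally reduces both sums modulo 2^16); only the rolling property itself is the point.

```cpp
#include <cstdint>
#include <cstddef>
#include <iostream>
#include <vector>

// Weak rolling checksum in the spirit of rsync's Adler-32-style checksum:
// a = sum of the bytes in the window, b = position-weighted sum.
struct RollingChecksum {
    uint32_t a = 0, b = 0;
    size_t len = 0;

    // Compute the checksum of an initial window [data, data + n).
    void init(const uint8_t* data, size_t n) {
        a = b = 0;
        len = n;
        for (size_t i = 0; i < n; ++i) {
            a += data[i];
            b += static_cast<uint32_t>(n - i) * data[i];
        }
    }

    // Slide the window one byte: drop `out` (the oldest byte) and append
    // `in` (the new byte). O(1) - no re-scan of the intervening bytes.
    void roll(uint8_t out, uint8_t in) {
        a += in - out;  // unsigned wraparound acts as arithmetic mod 2^32
        b += a - static_cast<uint32_t>(len) * out;
    }

    // Pack both sums into one 32-bit digest (low half: a, high half: b).
    uint32_t digest() const { return (a & 0xffff) | (b << 16); }
};

int main() {
    std::vector<uint8_t> data = {10, 20, 30, 40, 50, 60};
    const size_t S = 4;  // window (block) size

    RollingChecksum rc;
    rc.init(data.data(), S);  // checksum of bytes 0..3

    for (size_t n = 0; n + S < data.size(); ++n) {
        rc.roll(data[n], data[n + S]);  // now covers bytes n+1 .. n+S

        RollingChecksum check;  // recompute from scratch to verify
        check.init(data.data() + n + 1, S);
        std::cout << "window " << n + 1 << ": rolled=" << rc.digest()
                  << " scratch=" << check.digest() << '\n';
    }
}
```

The rolled and from-scratch digests agree at every position, which is exactly what lets the sender checksum every overlapping S-byte window of a file in a single linear pass.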