Summary: | Smart detection whether file has already been downloaded | ||
---|---|---|---|
Product: | [Applications] digikam | Reporter: | Cristian Klein <cristiklein> |
Component: | Import-Gphoto2 | Assignee: | Digikam Developers <digikam-bugs-null> |
Status: | REPORTED --- | ||
Severity: | wishlist | CC: | caulier.gilles, kde_org, konrad.kostecki, nicofo, tpr |
Priority: | NOR | ||
Version: | 7.3.0 | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | All | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Cristian Klein
2013-04-27 23:33:13 UTC
The problem you encounter sooner or later (with gphoto cameras sooner than with UMS cameras) is that the time you need to compute the hash, by accessing the Exif data, will be disproportional to the gained functionality. Regarding the use of make, model and name, let's have a look at the DownloadHistory database header file:

    /**
     * Queries the status of a download item that is uniquely described by the four parameters.
     * The identifier is recommended to be an MD5 hash of properties describing the camera,
     * if available, and the directory path (though you are free to use all four parameters as you want)
     */
    static Status status(const QString& identifier, const QString& name, qlonglong fileSize, const QDateTime& date);

For me all points are very minor problems, yes we could make wild guesses that pictures on the camera were already downloaded based on some parameters, yet file name is not useful as there can be renames, file size is not useful as metadata can have been edited, date alone is by far too weak.

Hi Marcel,

Let me address your comments inline.

On 2013-04-28 18:46, Marcel Wiesweg wrote:
> The problem you encounter sooner or later (with gphoto cameras sooner than with
> UMS cameras) is that the time you need to compute the hash, by accessing the
> Exif data, will be disproportional to the gained functionality.

I'm not sure I agree with this. When importing photos through UMS, the user is presented with a preview of each photo, so very likely the EXIF tag is already read in by digiKam. Even if the EXIF tag is for some reason not read by digiKam (e.g., using seek), the kernel will cache whole disk blocks (usually 4KB in size), so reading the EXIF tag would have a minimal performance impact. I have already presented several use-cases where smart "already-downloaded" detection would help, so I don't find the cost disproportional.

I'm not sure what the performance impact would be for gphoto cameras. Isn't the EXIF metadata read in anyway as part of image preview?

> Regarding the
> use of make, model and name, let's have a look at the DownloadHistory database
> header file:
> /**
> * Queries the status of a download item that is uniquely described by the
> four parameters.
> * The identifier is recommended to be an MD5 hash of properties describing
> the camera,
> * if available, and the directory path (though you are free to use all
> four parameters as you want)
> */
> static Status status(const QString& identifier, const QString& name,
> qlonglong fileSize, const QDateTime& date);

For UMS, "identifier" depends on the media ID and not on the photo metadata. Therefore, if I receive the same photo through two sources, DownloadHistory will incorrectly mark the photo as not previously downloaded. For me, this is cumbersome.

> For me all points are very minor problems, yes we could make wild guesses that
> pictures on the camera were already downloaded based on some parameters, yet
> file name is not useful as there can be renames, file size is not useful as
> metadata can have been edited, date alone is by far too weak.

I agree that for legacy cameras, this might be difficult. However, like I wrote, newer cameras include a "unique photo ID" (something like a UUID) in the EXIF tags of each photo. Users might already have access to such cameras (I do), why not take advantage of it?

I think EXIF is already read for all the photos, at least partially at some point, so this could be possible. If there's wide support for this, we could use that as a hash and fall back to our current calculation. Nevertheless I wasn't able to find any photos from my collection having anything this unique, do you have some samples?

*** Bug 412999 has been marked as a duplicate of this bug. ***

digiKam 7.0.0 stable release is now published:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need a fresh feedback on this file using this version.

Best Regards

Gilles Caulier

*** Bug 435680 has been marked as a duplicate of this bug. ***

(In reply to caulier.gilles from comment #5)
> We need a fresh feedback on this file using this version.

I'd vote for this as still very nice to have. The problem described by Cristian in 2013 is still valid. I'll also allow myself to copy my input from the duplicated ticket:

> There is already a setting in camera/import behaviour section to skip/replace/create_copy files in case they already exist in target location. Can we do this more reliable to classify file as already existing basing not only on its filename but also other attributes like at least file size?

> I can imagine situation that you reset digikam, clean home directory or reinstall operating system and with fresh digikam instance you perform import of SD card which contain files already downloaded. Fingerprint history is empty, but files are already in target location. By default digikam import just creates duplicates with a different name. In case, just before downloading, it recognizes that filename and sizes are identical it could suggest skipping those files if there would be an option for that.

I think it would be handy. I'm not saying it should be a default setting, definitely not, but in some cases it could be very helpful.

My take on this: the `import images` dialog indicates that `This item has never been downloaded` while the specified image file is known (e.g. in an album). I have noticed that the `DownloadHistory` table only contains a fraction (approx. 900) of the number of actual images (approx. 120K). This album was added as a collection from removable media.

The download option is therefore confusing, as after processing all duplicate images are still indicated as `.. never been downloaded`. As the feedback is also vague (the progress window is quickly removed), it is unclear what has actually been done.

I would expect all those images to be known as `downloaded` at the start, but I would certainly expect this to be indicated _after_ processing. The files _are_ identical in name, size and date, so I see no reason why they could not be added to the `DownloadHistory`. Because of this, the `Download new` option will always try to download everything when duplicates are offered, which can be time consuming and wasteful.

v8.5.0 on macOS 15.2
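
To make the proposal in this thread concrete, here is a minimal sketch of the "unique ID first, current attributes as fallback" lookup discussed above. It is not digiKam code: `ImportCandidate`, `detectionKeys` and `SeenFiles` are hypothetical names, the in-memory set merely stands in for the `DownloadHistory` table, and it assumes the camera's "unique photo ID" has already been extracted from the EXIF data (for example from a tag such as ImageUniqueID) by some other component.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical description of one file offered by a camera or card reader.
struct ImportCandidate {
    std::string name;                     // file name as reported by the device
    std::int64_t fileSize;                // size in bytes
    std::string date;                     // timestamp as text, e.g. ISO 8601
    std::optional<std::string> uniqueId;  // per-photo ID from EXIF, if the camera wrote one
};

// Build the lookup keys for a candidate. The unique-ID key (if any) comes
// first; the name/size/date key is always last and mirrors the attributes
// the existing status() call already receives.
std::vector<std::string> detectionKeys(const ImportCandidate& c) {
    std::vector<std::string> keys;
    if (c.uniqueId && !c.uniqueId->empty()) {
        keys.push_back("uid:" + *c.uniqueId);
    }
    keys.push_back("nsd:" + c.name + '|' + std::to_string(c.fileSize) + '|' + c.date);
    return keys;
}

// In-memory stand-in for a download history (an assumption for this sketch).
class SeenFiles {
public:
    // Record every key, so later lookups work with or without a unique ID.
    void markDownloaded(const ImportCandidate& c) {
        for (const std::string& key : detectionKeys(c)) {
            keys_.insert(key);
        }
    }

    // Look up by the preferred key only: the unique ID when the file has one,
    // otherwise name/size/date. This keeps a rename or metadata edit from
    // hiding a photo that carries an ID.
    bool alreadyDownloaded(const ImportCandidate& c) const {
        return keys_.count(detectionKeys(c).front()) > 0;
    }

private:
    std::unordered_set<std::string> keys_;
};
```

With this layout, a photo that carries an ID is still recognised after being renamed or after its size changes, while files without an ID behave as they do today. Files already present in an album could be registered through the same `markDownloaded` path, which is what the last comment asks for.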