Summary: | Uniquely identifying each image in a collection of images | ||
---|---|---|---|
Product: | [Applications] digikam | Reporter: | Duncan Hill <kdebugs> |
Component: | Database-Schema | Assignee: | Digikam Developers <digikam-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | wishlist | CC: | caulier.gilles, kde |
Priority: | NOR | ||
Version: | 0.9.0 | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | | Version Fixed In: | 2.5.0 |
Sentry Crash Report: |
Description
Duncan Hill
2006-04-17 15:14:57 UTC
ad 3: see also wish #110066

*** Bug 110066 has been marked as a duplicate of this bug. ***

I think it won't be very easy to manage an MD5 sum, because if metadata is saved inside a picture (IPTC/EXIF), the file's MD5 sum will change each time you modify a comment, tag or rating... The best would be to compute the hash only on the pixel data, but I don't know whether that is easy, or even possible.

Fabien,

Well, if metadata is changed in the image file, we just need to update the MD5 stored in the database at the same time.

Gilles

I had to de-dupe my 70 GB archive of photos the other day, so I've got the technique for finding duplicates sorted. Now to find the time to code it! I started looking at using the KDE MD5 framework to compute the MD5 in the io-slave that loads the files into the database, but found that was the wrong spot. I doubt I'll get to it over Christmas, but who knows what can happen while I'm at work between Christmas and New Year's :)

Yes, but I was thinking about "Trivial duplicate finding" and "Parent-child relationships"...

Yes, finding duplicate pictures is not really integrated into digiKam; it is done by the FindDuplicate kipi-plugin. One possibility for finding duplicate images is the "Haar" algorithm, which is based on wavelet theory. It is more powerful than MD5, since it can also match visually similar images rather than only byte-identical files, and it is used in the imgSeek program: http://imgseek.cvs.sourceforge.net/imgseek/imgseek/imgSeekLib/haar.cpp?revision=1.10&view=markup

You can find a paper about the Haar approach here: http://scien.stanford.edu/class/ee368/projects2001/dropbox/project01/method_retrieval.html
http://en.wikipedia.org/wiki/Haar_wavelet

In the past, before the FindDuplicate plugin was created (it is based on another algorithm, used in the ShowImg core program), I proposed integrating the "Haar" algorithm into the digiKam core and storing the resulting matrix for each image in the database, but Renchi was opposed because it was outside digiKam's scope. I think it is different now. What do you think?

Gilles

> I think it is different now. What do you think?
Haar - I am for it (as a user ;). I'm just not sure whether digiKam is the best place for this; maybe extragear/libs? You planned to export exiv2 support there, didn't you?
m.
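The Haar approach mentioned above can be sketched roughly as follows. This is a minimal illustration in the spirit of wavelet-based image querying, not imgSeek's actual haar.cpp: it assumes the image has already been decoded and downscaled to a small grayscale matrix whose sides are powers of two, uses unnormalized averaging and differencing, and keeps only the positions and signs of the k largest coefficients as the signature. The function names and the default k=20 are hypothetical.

```python
def haar_1d(values):
    """Full 1D Haar decomposition using unnormalized averages and differences.

    Each pass replaces the first n entries with n/2 pairwise averages followed
    by n/2 pairwise half-differences, then halves n. len(values) must be a
    power of two.
    """
    data = list(values)
    n = len(data)
    while n > 1:
        half = n // 2
        averages = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
        details = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
        data[:n] = averages + details
        n = half
    return data


def haar_2d(matrix):
    """Standard 2D decomposition: transform every row, then every column."""
    rows = [haar_1d(row) for row in matrix]
    columns = [haar_1d(column) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def signature(matrix, k=20):
    """Positions and signs of the k largest-magnitude nonzero coefficients.

    The (0, 0) coefficient (overall average brightness) is left out, so the
    signature ignores a uniform brightness shift.
    """
    coeffs = haar_2d(matrix)
    ranked = sorted(
        (
            ((r, c), value)
            for r, row in enumerate(coeffs)
            for c, value in enumerate(row)
            if (r, c) != (0, 0) and value != 0
        ),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return {(pos, 1 if value > 0 else -1) for pos, value in ranked[:k]}


def similarity(sig_a, sig_b):
    """Number of shared (position, sign) pairs; higher means more alike."""
    return len(sig_a & sig_b)
```

Because the overall average is excluded, a uniformly brightened copy of a picture gets the same signature, a match that a byte-level MD5 could never find. Storing such a signature per image is roughly what "storing the resulting matrix in the database" would amount to.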
Amarok has collected a lot of experience with uniquely identifying files: http://amarok.kde.org/wiki/Advanced_Tag_Features_(ATF) is the basic technology, and http://amarok.kde.org/wiki/Dynamic_Collection adds support for removable media (CD, USB stick, NFS).

I didn't really get what type of result you would get and store in the database, but it could be a base for nice features, I guess. So, I'm up for it too :)

*** This bug has been confirmed by popular vote. ***

I'm new to digiKam, so I didn't know it saves only the filenames in the database, but today I found out by accident :( I renamed some folders and files with krename, because the rename dialog from the kipi-plugins isn't that powerful. After renaming the files I started digiKam, and all my tags were wrong! For example:
A = "2007_jan_birthday_001.jpg"
B = "2007_jan_birthday_101.jpg"
A was renamed to B. After that, B had all the tags of the file previously named B, and A had no tags at all... It would be very helpful to save MD5 sums in the database so this would not happen. I used KPhotoAlbum before and was used to moving my files around or renaming them with external programs, but now that isn't possible anymore.

On Tuesday 05 June 2007, Andi Clemens wrote:
[bugs.kde.org quoted mail]

Notwithstanding your wish for a unique ID for images: if you check 'save tags to IPTC' in the digiKam setup, your tags will be saved in the files and re-imported when the files are moved outside digiKam.

Gerhard

I would love to have this feature, not Haar-based but based on pixel data. My dream would also be for this unique ID to be stored as metadata and converted to a machine tag when I upload to Flickr; this would solve my problem of having many duplicate images on Flickr. Also, the pixel-data ID could be optimized for JPEG by using the compressed image data!

Regards,
mark

Marcel,

What about this report in light of the new Advanced Search Tool and the new database schema?

Gilles Caulier

Marcel,

Did you see my comment #15?
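The idea of hashing only the image data, raised earlier for metadata edits and here for JPEG files, could look roughly like this. This is a hypothetical sketch (jpeg_image_data_md5 is not a digiKam function): it walks the JPEG marker segments, skips the APPn segments (where EXIF, XMP and IPTC are stored) and comment segments, and hashes everything else, including the compressed scan data. It does not decode pixels, so a lossless rotation or a re-encode would still change the hash.

```python
import hashlib
import struct

SOI = b"\xff\xd8"  # start-of-image marker every JPEG begins with


def jpeg_image_data_md5(data: bytes) -> str:
    """MD5 over a JPEG's non-metadata segments (sketch, assumes a baseline layout).

    Skips APP0-APP15 (0xFFE0-0xFFEF: EXIF and XMP in APP1, IPTC in APP13) and
    COM (0xFFFE) segments; hashes every other header segment plus everything
    from the start-of-scan marker (SOS, 0xFFDA) to the end of the data.
    """
    if not data.startswith(SOI):
        raise ValueError("not a JPEG file")
    md5 = hashlib.md5()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte padding before a marker
            pos += 1
            continue
        if marker == 0xDA:  # SOS: the compressed image data follows
            md5.update(data[pos:])
            return md5.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        if not (0xE0 <= marker <= 0xEF or marker == 0xFE):
            md5.update(segment)
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

With this kind of hash, writing tags to IPTC or EXIF (as suggested above) would leave the stored value unchanged, whereas a whole-file MD5 would have to be recomputed after every metadata write.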
Gilles Caulier

Yes, we have a checksum, based on the file data, not on the image contents. Files that are exact copies are very easy to find with this.

OK, so I am closing this report as fixed now.

Gilles Caulier
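A file-data checksum like the one described in the closing comment finds exact copies by grouping files on their digest. The sketch below is an illustration under assumptions, not digiKam's implementation: file_md5, group_duplicates and find_exact_duplicates are hypothetical names, and MD5 is used only because it is the hash discussed in this thread.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file's bytes, read in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def group_duplicates(paths, digest=file_md5):
    """Group paths whose digests are equal; only groups of 2+ are returned."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[digest(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def find_exact_duplicates(root):
    """Walk `root` recursively and return groups of byte-identical files."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    ]
    return group_duplicates(paths)
```

For example, `find_exact_duplicates("/path/to/photos")` returns a list of sorted path groups. Because the checksum depends only on the bytes, a file renamed with an external tool keeps its checksum, which is what would let tags follow it; writing tags into the file, however, changes the checksum, so the stored value must be updated at the same time. Grouping by file size first would avoid hashing files that cannot be duplicates.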