Bug 210353 - digikam duplicates icons for TIFF files
Summary: digikam duplicates icons for TIFF files
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Albums-IconView
Version: 1.0.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-12 19:18 UTC by Paweł Rumian
Modified: 2017-07-29 05:27 UTC

See Also:
Latest Commit:
Version Fixed In: 1.7.0


Description Paweł Rumian 2009-10-12 19:18:12 UTC
Version:           1.0.0_beta5 (using KDE 4.3.2)
Compiler:          gcc-4.3.2 gcc (Gentoo 4.3.2-r3 p1.6, pie-10.1.5) 4.3.2
OS:                Linux
Installed from:    Gentoo Packages

Icons for TIFF files are duplicated and assigned to wrong files.
An example can be seen on the following screenshot:
http://img383.imageshack.us/img383/2323/digikamicons.jpg

The two repeating icons belong to photos from a totally different directory, so neither of the icons seen on the screenshot represents the underlying image.

The names of the TIFF images follow a constant pattern:
Negative0-02-01(1).tif
Negative0-02-01(1).tif
...
Negative6-38-37(1).tif

The repeating icons have names Negative5-31-30(1).tif and Negative6-38-37(1).tif, they come from the directory 
ALBUMS/2006/06 - Parafiada/

As far as I can see the JPG images are not affected.
Comment 1 Paweł Rumian 2009-10-12 22:52:29 UTC
Of course I meant thumbnails, not icons, sorry if that was misleading in any way (but I hope that the screenshot is self-explanatory)...
Comment 2 Marcel Wiesweg 2009-10-17 17:16:57 UTC
What happens if you change the modification date of one of the affected images:
touch Negative....tif
?
Comment 3 Paweł Rumian 2009-10-20 00:09:04 UTC
OK, I've got some results, but no resolution yet.

After doing simple
$ find . -name *.tif -exec touch {} \;
the thumbnails changed, but are still duplicated.

Adding 'sleep 1' between touches hasn't changed much.
I will try to examine it further...
Comment 4 Paweł Rumian 2009-10-20 00:32:08 UTC
It seems that digikam takes the most recent .tif thumbnail and assigns it to many other images, but not always.

Sometimes a few thumbnails are unaffected, but only until the next .tif file has its ctime changed.

Nevertheless, I can't yet see how the modification of ctime affects the generated (and duplicated) thumbnails.
Comment 5 Marcel Wiesweg 2009-11-07 16:05:00 UTC
Remote debugging is difficult.
Can you send me sample pictures? I need at least two for this problem. More would be better, but I don't know how large the files are. If necessary you can send them by private mail.
Comment 6 Paweł Rumian 2009-11-11 16:06:10 UTC
Private mail has been sent - I hope you'll be able to reproduce the problem...
Comment 7 Paweł Rumian 2009-12-03 19:35:21 UTC
Were you able to reproduce the bug?
Comment 8 Marcel Wiesweg 2009-12-04 18:39:09 UTC
Yes, I can reproduce.
The problem is that all images
- have exactly the same file size
- contain bit by bit the same metadata
- have the same creation date (none in metadata)
- have the same first 8k of data, bit by bit.

That is enough to make digikam believe it's all the same file...
Not sure about a good solution. When creating a list of criteria as above, someone will come who has created files that slip through the loopholes.
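
To illustrate why the coincidences listed above are enough, here is a minimal Python sketch, not digikam's actual code. It assumes the identifier is an MD5 over the metadata, the file size and the first 8192 bytes. The name `prefix_identifier`, the synthetic 12 kB header and the fake pixel data are all made up for the example.

```python
import hashlib

PREFIX_BYTES = 8192  # the "first 8k" mentioned above


def prefix_identifier(data: bytes, metadata: bytes = b"") -> str:
    """Hypothetical stand-in: hash metadata, file size and the first 8k only."""
    digest = hashlib.md5()  # MD5 is an assumption, the thread does not name the hash
    digest.update(metadata)
    digest.update(str(len(data)).encode("ascii"))
    digest.update(data[:PREFIX_BYTES])
    return digest.hexdigest()


# Two synthetic "scans": identical 12 kB header, same size, different pixels.
header = bytes(12 * 1024)
scan_a = header + b"\x10" * 50_000
scan_b = header + b"\xf0" * 50_000

id_a = prefix_identifier(scan_a)
id_b = prefix_identifier(scan_b)

# The contents differ, yet the identifiers collide, so both files would
# be treated as the same image and share one thumbnail.
print(scan_a != scan_b, id_a == id_b)
```

Any scheme that only looks at a fixed prefix has this weakness whenever the differing bytes start after the prefix.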
Comment 9 Johannes Wienke 2009-12-04 18:42:31 UTC
Why are these criteria needed? Isn't a file uniquely identified by its path on the disk?
Comment 10 Marcel Wiesweg 2009-12-05 12:39:40 UTC
You can move, copy or rename files anytime, you can have backup collections and thus multiple times the same picture in your collection. You can even completely screw up your collection settings, just add a new collection and no tag is lost. It's pretty reassuring.

For normal photos taken with a digital camera, the criteria are always sufficient. The problem here is that, with identical metadata and identical file size (completely uncompressed?? not even lossless compression?), we are hitting a corner case. The additional problem is that the first 8kb obviously do not contain pixel data.
Comment 11 Paweł Rumian 2009-12-08 01:39:42 UTC
It is not a problem with photos from an ordinary camera, indeed.

But I have hit it several times when batch-scanning photos, and in these cases the severity of this bug is high - digikam becomes quite unusable, because one cannot see and identify photos before opening them...

Maybe we should consider identifying the photos by some kind of content-dependent criterion? Like md5sum or something similar?
Comment 12 Marcel Wiesweg 2009-12-09 18:41:43 UTC
It's a content-based hash, but not over the whole file, only over parts, more precisely, the first 8kb. It's assumed that within the first 8k image data is contained. That also fails for your pictures. So your peculiarities here include:
- apparently no compression, normally lossless compression already results in differing file sizes
- no metadata
- identical file content in first 8kb.

A possible solution is to extend the 8kb, or take other small data parts from the middle and end of the file. I must think about the implications of changing the hash creation.
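
The second idea above (small data parts from the start, middle and end of the file) could look roughly like the following Python sketch. It is only an illustration under assumptions: `sampled_hash`, the 8192-byte window and the use of MD5 are hypothetical choices, and the test data is synthetic.

```python
import hashlib


def sampled_hash(data: bytes, chunk: int = 8192) -> str:
    """Hypothetical sketch: hash three windows (start, middle, end) of the file."""
    digest = hashlib.md5()  # MD5 is an assumption, not taken from the thread
    size = len(data)
    middle = max(0, size // 2 - chunk // 2)
    end = max(0, size - chunk)
    for start in (0, middle, end):
        digest.update(data[start:start + chunk])
    return digest.hexdigest()


# Same synthetic layout as the problem files: identical header, different pixels.
header = bytes(12 * 1024)
scan_a = header + b"\x10" * 50_000
scan_b = header + b"\xf0" * 50_000

# A third file that differs from scan_a in a single byte outside all windows.
scan_c = bytearray(scan_a)
scan_c[20_000] ^= 0xFF
scan_c = bytes(scan_c)

differs_ab = sampled_hash(scan_a) != sampled_hash(scan_b)
collides_ac = sampled_hash(scan_a) == sampled_hash(scan_c)
print(differs_ab, collides_ac)
```

The middle and end windows fall into pixel data here, so the two scans get different hashes. The third file shows the remaining loophole: a change that lies entirely outside the sampled windows still goes unnoticed.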
Comment 13 Marcel Wiesweg 2010-08-28 17:11:32 UTC
You are not forgotten.

Exiv2 author Andreas Huggel has analyzed the files and indeed, the first 8kb are identical: There is a list of image strip pointers (5600 bytes) and strip sizes (same count, always same value). This takes up the first 12kb.

So the suggestion is: increase the value in dimgloader.cpp from 8192 to 102400 (100 kB) as a workaround.
The problem is that there are now a lot of databases around with hashes, so this algorithm cannot be changed just anytime. If we do that, it has to be well prepared, or optional.
Comment 14 Marcel Wiesweg 2010-12-10 13:20:00 UTC
SVN commit 1205197 by mwiesweg:

Implement uniqueHash V2.
The hash has now a very simple specification: First 100 kB, last 100 kB.
All problematic cases known to me are solved.

1) Any new database created with 2.0 will use the new hash.
   That means you cannot use it with 1.x.
2) Any upgraded database from 1.x will keep the old hash.
   That means you can use it in parallel with 1.x.
3) There is a button to carry out an explicit update on the 
   Database setup page, for those that want the new hash
   for an updated database
   When upgrading, the thumbnail database will be updated in parallel,
   so nothing is lost.
4) The HistoryImageId in the history XML in metadata will only use the V2
   hash, because it is effectively not possible to specify the generation of
   the old hash, while the V2 hash is easily specified.
   If you have an updated DB with V1 hash, the history image id may not always
   contain the hash. (it is optional)

BUG: 210353


 M  +38 -1     digikam/scancontroller.cpp  
 M  +8 -0      digikam/scancontroller.h  
 M  +32 -1     libs/database/albumdb.cpp  
 M  +10 -0     libs/database/albumdb.h  
 M  +52 -14    libs/database/collectionscanner.cpp  
 M  +2 -0      libs/database/collectionscanner.h  
 M  +3 -0      libs/database/imageinfo.cpp  
 M  +12 -1     libs/database/imagescanner.cpp  
 M  +6 -0      libs/database/imagescanner.h  
 M  +92 -23    libs/database/schemaupdater.cpp  
 M  +8 -2      libs/database/schemaupdater.h  
 M  +5 -0      libs/database/thumbnaildatabaseaccess.cpp  
 M  +1 -0      libs/database/thumbnaildatabaseaccess.h  
 M  +7 -0      libs/database/thumbnaildb.cpp  
 M  +2 -0      libs/database/thumbnaildb.h  
 M  +28 -0     libs/dimg/dimg.cpp  
 M  +15 -0     libs/dimg/dimg.h  
 M  +46 -2     libs/dimg/loaders/dimgloader.cpp  
 M  +1 -0      libs/dimg/loaders/dimgloader.h  
 M  +1 -1      libs/widgets/common/databasewidget.h  
 M  +75 -5     utilities/setup/setupdatabase.cpp  
 M  +5 -0      utilities/setup/setupdatabase.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1205197
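
The V2 specification from the commit message above (first 100 kB, last 100 kB) can be sketched in Python as follows. This is not the code from the commit. The function name `unique_hash_v2_sketch`, the use of MD5, and the handling of files shorter than 100 kB (the two windows simply overlap) are assumptions made for illustration, and the test files are synthetic bytes rather than real TIFFs.

```python
import hashlib
import os
import tempfile

CHUNK = 100 * 1024  # 102400 bytes, the value suggested in comment 13


def unique_hash_v2_sketch(path: str) -> str:
    """Hypothetical sketch: hash the first and the last 100 kB of a file."""
    digest = hashlib.md5()  # assumption: the commit message does not name the hash
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        digest.update(f.read(CHUNK))   # first 100 kB
        f.seek(max(0, size - CHUNK))   # windows overlap for small files
        digest.update(f.read(CHUNK))   # last 100 kB
    return digest.hexdigest()


def write_temp(data: bytes) -> str:
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".tif")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Identical 12 kB header as in the problem files, different pixel data.
header = bytes(12 * 1024)
path_a = write_temp(header + b"\x10" * 300_000)
path_b = write_temp(header + b"\xf0" * 300_000)
path_copy = write_temp(header + b"\x10" * 300_000)

hash_a = unique_hash_v2_sketch(path_a)
hash_b = unique_hash_v2_sketch(path_b)
hash_copy = unique_hash_v2_sketch(path_copy)

for p in (path_a, path_b, path_copy):
    os.remove(p)

print(hash_a != hash_b, hash_a == hash_copy)
```

Because the hash depends only on content, a copied or renamed file keeps its identifier, which is the property described in comment 10, while the larger windows reach past the 12 kB of strip tables into pixel data.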
Comment 15 Marcel Wiesweg 2011-01-03 10:33:41 UTC
*** Bug 259880 has been marked as a duplicate of this bug. ***
Comment 16 Marcel Wiesweg 2011-01-07 23:39:33 UTC
*** Bug 262452 has been marked as a duplicate of this bug. ***