Bug 323718 - THUMBDB : rebuild all thumbnails does not get rid of all thumbnails first
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Database-Thumbs
Version: 3.2.0
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-19 12:51 UTC by Gerard Dirkse
Modified: 2017-02-08 14:17 UTC

See Also:
Latest Commit:
Version Fixed In: 5.5.0


Description Gerard Dirkse 2013-08-19 12:51:48 UTC
Noticed that I had more than 4 times as many thumbnails as I have images.  Looking in Filepaths and CustomIdentifiers I see references to old file names, so I decided to do a FULL thumbnail rebuild, in the expectation that it would delete everything in the thumbs database and then start rebuilding it.  After the rebuild there are still more than 4 times as many thumbnails as images, and still references to old filenames.

Reproducible: Always

Steps to Reproduce:
1. Have images on an NFS mounted share.
2. Change the mount point and update AlbumRoots in the digikam DB.
3. Start up Digikam.
4. There will now be thumbnails which still have the old mount point in their filepath, and there will be lots more thumbnails than there are images.

Actual Results:  
Rebuild all thumbnails has NO effect.  Old rubbish remains in thumbnail DB

Expected Results:  
Clean thumbnail DB with as many thumbnails as I have images.
Comment 1 caulier.gilles 2013-08-19 12:55:28 UTC
Your DB file has probably been corrupted. I have never reproduced this problem here. I am currently working on the Maintenance tool to support multicore CPUs, and I run a lot of tests on huge image collections...

Gilles Caulier
Comment 2 Gerard Dirkse 2013-08-19 12:59:26 UTC
1) So why do I have 4 times as many thumbnails as I have images ? (may have been caused by my renaming actions)
2) Why does rebuild ALL thumbnails not first delete COMPLETE content of tables in thumbs ? (that would solve it).
Comment 3 caulier.gilles 2013-08-19 13:07:41 UTC
1) So why do I have 4 times as many thumbnails as I have images ? (may have been caused by my renaming actions)

Or it's the versioning feature. If you edit and save as a new version, a new file is created and shown as the current version of the item in the icon-view. All previous versions are hidden from the icon-view, unless you turn off the corresponding option in the Setup dialog.

For each version file, one thumbnail is created... 

2) Why does rebuild ALL thumbnails not first delete COMPLETE content of tables in thumbs ? (that would solve it).

It must. item deletion in DB is performed item by item in fact.
Comment 4 Gerard Dirkse 2013-08-19 13:20:35 UTC
I use versioning very rarely; 9 times out of 10 I choose to overwrite the existing version, so that does not explain why there are more than 4 times as many thumbnails as there are images.

Browsing the thumbs DB (using phpMyAdmin, the DB is in MySQL) before and after the rebuild action, in the tables Filepaths and CustomIdentifiers (fields path and identifier respectively) I see references to files that don't exist any more, either as a result of moving the NFS mount point, or from files that have been renamed using the digikam image rename option.

I would have expected rebuild ALL thumbnails to start with a 'DELETE FROM ..' on each and every table in the thumbs DB and then repopulate them as a result of the rebuild. That would leave a clean thumbs DB after a rebuild ALL.
Comment 5 Gerard Dirkse 2013-08-19 13:23:10 UTC
You say 'It must. item deletion in DB is performed item by item in fact.' That is then the bug, because in all the cases I mention, i.e. where an image with that name no longer exists, the entry will not get deleted.
Comment 6 caulier.gilles 2013-08-19 13:29:06 UTC
Marcel, 

Is there a way to clean up the thumbs DB before rebuilding all thumbnails ?

Currently, i use this method : 

https://projects.kde.org/projects/extragear/graphics/digikam/repository/revisions/master/entry/utilities/maintenance/thumbstask.cpp#L80

Gilles Caulier
Comment 7 caulier.gilles 2013-12-06 22:50:05 UTC
Marcel,

Did you see my previous comment ?

Gilles Caulier
Comment 8 Gerard Dirkse 2013-12-07 06:51:41 UTC
In the meantime I developed some php programs to go through the database of images and thumbnails, and eliminated almost 75% of the number and size of my thumbnails.  The greatest gain was achieved by using the Uniquehashes table to eliminate all entries and associated thumbnails from this table where there was no uniquehash/filesize combination in the images table.  I haven't figured out yet where all these obsolete entries came from.
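
The PHP programs are not attached to this report. As a rough illustration of the clean-up described in this comment, here is a minimal Python/SQLite sketch. The table and column names (Images, Thumbnails, UniqueHashes, uniqueHash, fileSize, thumbId) are a simplified, assumed schema, and both databases are merged into one in-memory database purely for the demonstration; a real installation keeps the image and thumbnail tables in separate databases.

```python
import sqlite3


def build_demo():
    """Create an in-memory DB with a simplified, assumed schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Images (id INTEGER PRIMARY KEY, uniqueHash TEXT, fileSize INTEGER);
        CREATE TABLE Thumbnails (id INTEGER PRIMARY KEY, data BLOB);
        CREATE TABLE UniqueHashes (uniqueHash TEXT, fileSize INTEGER, thumbId INTEGER);
        INSERT INTO Images (uniqueHash, fileSize) VALUES ('aaa', 100), ('bbb', 200);
        INSERT INTO Thumbnails (id, data)
            VALUES (1, x'01'), (2, x'02'), (3, x'03'), (4, x'04');
        INSERT INTO UniqueHashes (uniqueHash, fileSize, thumbId) VALUES
            ('aaa', 100, 1),   -- still matches an image
            ('bbb', 200, 2),   -- still matches an image
            ('old', 300, 3),   -- hash of a file that no longer exists
            ('aaa', 999, 4);   -- same hash, different size: also stale
    """)
    return conn


def remove_orphaned_hashes(conn):
    """Delete hash entries with no matching image, then unreferenced thumbnails.

    Returns the number of UniqueHashes rows removed.
    """
    cur = conn.cursor()
    # A hash entry is stale when no image has the same hash AND file size.
    cur.execute("""
        DELETE FROM UniqueHashes
        WHERE NOT EXISTS (
            SELECT 1 FROM Images
            WHERE Images.uniqueHash = UniqueHashes.uniqueHash
              AND Images.fileSize = UniqueHashes.fileSize)
    """)
    removed = cur.rowcount
    # In this simplified schema a thumbnail is only referenced by UniqueHashes.
    cur.execute("""
        DELETE FROM Thumbnails
        WHERE NOT EXISTS (
            SELECT 1 FROM UniqueHashes WHERE UniqueHashes.thumbId = Thumbnails.id)
    """)
    conn.commit()
    return removed


if __name__ == "__main__":
    demo = build_demo()
    print("stale hash entries removed:", remove_orphaned_hashes(demo))
```

In a real thumbs DB a thumbnail may also be referenced from the file path and custom identifier tables, so the second DELETE would have to take those into account. Back up the database before trying anything like this.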
Comment 9 Marcel Wiesweg 2013-12-07 15:02:08 UTC
Gilles, would you simply like to clean out all thumbnails? In SQL, that's simply "DELETE FROM Thumbnails" to delete all thumbnail data. The trigger should clean the rest of the tables. 
If we want a sort of garbage collector, it would need to be something along the lines of what Gerard has developed, checking that a uniqueHash/filesize identifier from the albumDB still exists in the main database.
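
To illustrate the "delete everything and let the trigger clean up" idea, here is a minimal Python/SQLite sketch. The schema and the trigger below are assumptions written for this example, not copied from the digiKam sources; the real trigger definition may differ.

```python
import sqlite3

# Assumed, simplified thumbs schema with a clean-up trigger on Thumbnails.
SCHEMA = """
CREATE TABLE Thumbnails (id INTEGER PRIMARY KEY, data BLOB);
CREATE TABLE UniqueHashes (uniqueHash TEXT, fileSize INTEGER, thumbId INTEGER);
CREATE TABLE FilePaths (path TEXT, thumbId INTEGER);
CREATE TABLE CustomIdentifiers (identifier TEXT, thumbId INTEGER);
CREATE TRIGGER delete_thumbnails AFTER DELETE ON Thumbnails
BEGIN
    DELETE FROM UniqueHashes WHERE thumbId = OLD.id;
    DELETE FROM FilePaths WHERE thumbId = OLD.id;
    DELETE FROM CustomIdentifiers WHERE thumbId = OLD.id;
END;
"""

TABLES = ("Thumbnails", "UniqueHashes", "FilePaths", "CustomIdentifiers")


def build_thumbs_db():
    """Create an in-memory thumbs DB containing a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Thumbnails (id, data) VALUES (?, ?)",
                     [(1, b"t1"), (2, b"t2")])
    conn.executemany("INSERT INTO UniqueHashes VALUES (?, ?, ?)",
                     [("aaa", 100, 1), ("bbb", 200, 2)])
    conn.executemany("INSERT INTO FilePaths VALUES (?, ?)",
                     [("/mnt/old/a.jpg", 1), ("/mnt/new/a.jpg", 2)])
    conn.execute("INSERT INTO CustomIdentifiers VALUES (?, ?)", ("detail:x", 1))
    conn.commit()
    return conn


def table_counts(conn):
    """Return the row count of every table in the sketch schema."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in TABLES}


def wipe_all_thumbnails(conn):
    """Delete all thumbnail rows; the trigger removes the dependent rows."""
    conn.execute("DELETE FROM Thumbnails")
    conn.commit()


if __name__ == "__main__":
    db = build_thumbs_db()
    print("before:", table_counts(db))
    wipe_all_thumbnails(db)
    print("after: ", table_counts(db))
```

Note that this only removes rows that point at a thumbnail. Rows in the other tables whose thumbId matches no thumbnail at all would survive, which is one more argument for a real garbage collector.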
Comment 10 caulier.gilles 2013-12-07 15:12:04 UTC
Marcel

as you can see in code from thumbnailtask.cpp:line 84 :

d->catcher->thread()->deleteThumbnail(d->path);

We only delete the entry for the valid, existing file registered in the DB. It does not clear the other dummy entries.

A garbage collector could be a powerful tool to avoid rebuilding all items. But for each album to process, we could clean all items, including all garbage entries... this might be simpler to implement. There is currently no method implemented this way. Right ?

Note : the real question here is why garbage entries are present in the DB. When an item is removed or disappears, is its thumb in the DB not removed automatically ?

Gilles
Comment 11 Marcel Wiesweg 2013-12-08 18:54:57 UTC
Thumbnails are primarily loaded via hash/file size. So in principle, whenever file contents or file size change, there can be a leftover entry in the database. These can only be found via "go through thumbnail db -> check if it exists in main db".
Today, when digikam changes file content, the thumbnail reuse/replacement is often managed, but probably not from all places. It cannot be managed when an external tool does the change. So a classical case for a garbage collector.
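
The mechanism described here can be shown with a small Python sketch. Everything in it is a stand-in chosen for the example: the ThumbCache class, the use of SHA-256 over the whole content as "unique hash", and the dictionary playing the role of the thumbnail DB. It only demonstrates why a rebuild that deletes entries for currently existing files leaves the old entry behind after an external change.

```python
import hashlib


def identifier(content: bytes):
    """Return the (hash, size) key under which a thumbnail is stored."""
    return (hashlib.sha256(content).hexdigest(), len(content))


class ThumbCache:
    """Toy thumbnail store keyed by (hash, file size)."""

    def __init__(self):
        self.entries = {}

    def store(self, content: bytes):
        self.entries[identifier(content)] = b"thumb:" + content[:8]

    def delete_for(self, content: bytes):
        # Only the entry matching the *current* content can be found.
        self.entries.pop(identifier(content), None)


def rebuild(cache, files):
    """Item-by-item rebuild: delete and recreate the entry of each existing file."""
    for content in files.values():
        cache.delete_for(content)
        cache.store(content)


def stale_entries(cache, files):
    """Garbage-collector check: cache keys that no current file maps to."""
    live = {identifier(content) for content in files.values()}
    return set(cache.entries) - live


if __name__ == "__main__":
    cache = ThumbCache()
    files = {"a.jpg": b"original pixels"}
    rebuild(cache, files)
    # An external tool rewrites the file; the cache is not told about it.
    files["a.jpg"] = b"pixels changed by an external tool"
    rebuild(cache, files)
    print("entries in cache:", len(cache.entries))
    print("stale entries:   ", len(stale_entries(cache, files)))
```

The second rebuild cannot find the old entry because it looks it up by the new hash and size, so only a pass over the cache itself ("go through thumbnail db -> check if it exists in main db") can detect it.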
Comment 12 swatilodha27 2016-08-08 14:27:41 UTC
Is this report still valid with digiKam 5.1.0?
Please test and provide necessary updates.
Comment 13 caulier.gilles 2016-12-01 13:52:19 UTC
Can you reproduce the problem using the digiKam Linux AppImage bundle ? The latest bundle is available at this url:

https://drive.google.com/drive/folders/0BzeiVr-byqt5Y0tIRWVWelRJenM

Gilles Caulier
Comment 14 Mario Frank 2017-02-02 08:57:40 UTC
There is a patch that introduces garbage collection as maintenance stage before thumbnail rebuild here: https://bugs.kde.org/show_bug.cgi?id=374591 .
This could solve your problem. But be advised: The patch is still in testing phase. So, backup your databases before you test.
Comment 15 caulier.gilles 2017-02-04 14:08:39 UTC
The new 5.5.0 AppImage has been built with the database garbage collector patches.
The upload to GDrive is in progress. It will be online in a few minutes at the usual place :

https://drive.google.com/drive/folders/0BzeiVr-byqt5Y0tIRWVWelRJenM

The new database Garbage Collector options are shown here :

https://www.flickr.com/photos/digikam/32549923912/in/dateposted-public/
https://www.flickr.com/photos/digikam/32549923632/in/dateposted-public/

Gilles Caulier
Comment 16 Mario Frank 2017-02-08 14:17:51 UTC
Git commit a1f67531b3269941df5ff531baa3e487cf58f1fc by Mario Frank.
Committed on 08/02/2017 at 14:09.
Pushed by mfrank into branch 'master'.

Merged the garbage collection into master. The garbage collector is a maintenance stage that runs before the rebuild of thumbnails and must be triggered explicitly.
It removes stale image entries in core db and, if enabled, also stale thumbnails and face identities from the thumbnails and recognition DBs.
If configured so, the core DB part of the garbage collector removes stale image entries in core db during the start of digiKam.
Note that cleaning the databases does not necessarily make them smaller, as no auto-vacuum is done on the databases. The vacuuming process differs highly between the three supported database variants (SQLite, internal MySQL and external MySQL). Thus, there is currently no automatism.
Related: bug 374591
FIXED-IN: 5.5.0

M  +3    -1    NEWS

https://commits.kde.org/digikam/a1f67531b3269941df5ff531baa3e487cf58f1fc
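
The remark about vacuuming can be checked for the SQLite variant with a short Python sketch. It uses a throwaway database with a made-up Thumbnails table (an assumption for this example, not the real thumbs schema) and compares the file size before deleting, after deleting, and after a manual VACUUM. The MySQL variants need a different procedure, which is not covered here.

```python
import os
import sqlite3
import tempfile


def sizes_around_vacuum(db_path):
    """Fill a scratch DB, delete all rows, VACUUM, and report the file sizes.

    Returns (size_when_full, size_after_delete, size_after_vacuum) in bytes.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE Thumbnails (id INTEGER PRIMARY KEY, data BLOB)")
    # 500 rows of 4 KiB random data stand in for thumbnail blobs.
    conn.executemany("INSERT INTO Thumbnails (data) VALUES (?)",
                     [(os.urandom(4096),) for _ in range(500)])
    conn.commit()
    size_full = os.path.getsize(db_path)

    # Deleting rows only marks pages as free; the file is not truncated
    # unless auto-vacuum is enabled (it is off by default in SQLite).
    conn.execute("DELETE FROM Thumbnails")
    conn.commit()
    size_after_delete = os.path.getsize(db_path)

    # VACUUM rewrites the database file and releases the free pages.
    # It must run outside a transaction, hence the commit above.
    conn.execute("VACUUM")
    conn.close()
    size_after_vacuum = os.path.getsize(db_path)
    return size_full, size_after_delete, size_after_vacuum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        full, deleted, vacuumed = sizes_around_vacuum(os.path.join(tmp, "scratch.db"))
        print("full:", full, "after delete:", deleted, "after vacuum:", vacuumed)
```

Run this only against a scratch file as shown. Vacuuming a real thumbnails database should be done while digiKam is closed and after a backup.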