Bug 255478 - digikam database tables only grow in size and get never cleaned
Summary: digikam database tables only grow in size and get never cleaned
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Maintenance-Database
Version: 1.5.0
Platform: openSUSE Linux
Importance: NOR major
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-28 09:33 UTC by Roman Fietze
Modified: 2017-07-26 07:06 UTC

See Also:
Latest Commit:
Version Fixed In: 5.7.0


Description Roman Fietze 2010-10-28 09:33:15 UTC
Version:           1.5.0 (using KDE 4.5.2) 
OS:                Linux

The database tables for the image information always just grow, but never get cleaned when images are removed.

In the MySQL Images table, for example, the album field is just set to NULL.

When one adds and removes images a lot, this causes very big databases after a while.

Reproducible: Didn't try

Steps to Reproduce:
E.g. create a few hundred images, let digikam scan or find them. Then remove the images again.


Expected Results:  
Either there should be a function for database management that clears out unused entries, or digikam should do this right away when images no longer exist or are deleted inside digikam.

Tested it with both MySQL and SQLite.
Comment 1 caulier.gilles 2010-10-28 09:36:06 UTC
Thanks Roman...

Gilles Caulier
Comment 2 Johannes Wienke 2010-10-28 11:16:20 UTC
I think one idea not to delete this immediately is for the case that you restore a photo from the trash. But a manual action is a good idea. Also to remove generated fingerprints ;)
Comment 3 Roman Fietze 2010-10-28 12:58:23 UTC
(In reply to comment #2)

> I think one idea not to delete this immediately is for the case that you
> restore a photo from the trash.

This was my first thought, I assumed you implemented it this way just to allow this.

> But a manual action is a good idea. Also to
> remove generated fingerprints

Or use a timestamp and remove aged (adjustable?) entries automatically, similar to the expire feature e.g. for mail folders in KMail.
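
A rough sketch of such an expiry, using Python's sqlite3 on a simplified Images table. This is only an illustration under assumptions: the `removedAt` column, the `expire_removed` helper and the 30-day limit are hypothetical and not part of digiKam, and status 3 is assumed to mark removed entries.

```python
import sqlite3
from datetime import datetime, timedelta

REMOVED = 3  # assumption: status value that marks a removed image


def expire_removed(conn, max_age_days=30, now=None):
    """Delete removed entries older than max_age_days; return the row count."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM Images WHERE status = ? AND removedAt < ?",
        (REMOVED, cutoff),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Images (id INTEGER PRIMARY KEY, album INTEGER, "
    "name TEXT, status INTEGER, removedAt TEXT)"  # removedAt is hypothetical
)
now = datetime(2010, 10, 28, 12, 0, 0)
rows = [
    (1, 5, "kept.jpg", 1, None),  # still visible, never expires
    (2, None, "old.jpg", REMOVED, (now - timedelta(days=90)).isoformat()),
    (3, None, "recent.jpg", REMOVED, (now - timedelta(days=2)).isoformat()),
]
conn.executemany("INSERT INTO Images VALUES (?, ?, ?, ?, ?)", rows)

deleted = expire_removed(conn, max_age_days=30, now=now)
remaining = [r[0] for r in conn.execute("SELECT name FROM Images ORDER BY id")]
```

Only the entry removed 90 days ago is dropped; the recently removed one stays available for a restore from the trash.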
Comment 4 Martin Klapetek 2010-10-28 13:45:36 UTC
Yes, currently it stays there for the case of restoration/copying again. This speeds up the processing then. But I agree with Roman to add a time expire feature. That would be great.
Comment 5 Marcel Wiesweg 2010-10-28 15:46:36 UTC
The idea and the code have been there for two years - CollectionScanner::checkDeleteRemoved() etc.
I have the impression though that there is a bug and the cleanup is never triggered ;-)
Comment 6 Andi Clemens 2010-11-14 16:53:18 UTC
Marcel, the problem seems to be the following line:

http://lxr.kde.org/source/extragear/graphics/digikam/libs/database/collectionscanner.cpp#270

I guess this can not work. The passed in albumID list will not contain albums that have been deleted, right?
So we either need to change the query in void AlbumDB::deleteRemovedItems(QList<int> albumIds):

SqlQuery query = d->db->prepareQuery( QString("DELETE FROM Images WHERE status=? AND (album=? or album IS NULL);") );

or use the version without the parameter:
void AlbumDB::deleteRemovedItems()

This seems to work for me.

Andi
Comment 7 Andi Clemens 2010-11-14 17:14:01 UTC
Marcel,

I guess the 
AlbumDB::deleteRemovedItems(QList<int> albumIds)
method will never work here, because when an image is deleted, the album id is always NULL, so the query in this method can never clean up the database!

Andi
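
To illustrate the point made in comments 6 and 7, here is a minimal sketch with Python's sqlite3. It is not digiKam's code: the table is reduced to four columns, the helper names are made up, status 3 is assumed to mean "removed", and the per-album query is only modeled on the one discussed above.

```python
import sqlite3

REMOVED = 3  # assumption: status value that marks a removed image


def make_db():
    """Create a tiny stand-in for the Images table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Images (id INTEGER PRIMARY KEY, album INTEGER, "
        "name TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Images VALUES (?, ?, ?, ?)",
        [
            (1, 10, "visible.jpg", 1),
            (2, None, "gone_a.jpg", REMOVED),  # album is NULL after removal
            (3, None, "gone_b.jpg", REMOVED),
        ],
    )
    return conn


def delete_removed_by_album(conn, album_ids):
    """Per-album variant: 'album = ?' is never true for a NULL album."""
    deleted = 0
    for album_id in album_ids:
        cur = conn.execute(
            "DELETE FROM Images WHERE status = ? AND album = ?",
            (REMOVED, album_id),
        )
        deleted += cur.rowcount
    return deleted


def delete_removed(conn):
    """Variant without album ids: matches on status alone."""
    cur = conn.execute("DELETE FROM Images WHERE status = ?", (REMOVED,))
    return cur.rowcount


conn = make_db()
by_album = delete_removed_by_album(conn, [10, None])  # even passing NULL fails
without_ids = delete_removed(conn)
left = conn.execute("SELECT COUNT(*) FROM Images").fetchone()[0]
```

In SQL, comparing anything to NULL with `=` yields NULL rather than true, so the per-album variant deletes nothing here, while the variant without album ids removes both stale rows.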
Comment 8 Andi Clemens 2010-11-14 17:21:06 UTC
Marcel, 

there seems to be another problem: When I delete an album and recreate it (move it back from the trash or from the other location I moved it to), I can see two entries for my images:

167837||bla.jpg|3|1|2008-12-31T19:15:31|303900|5e5f380714166b45f9f8379a50433e5f
167902|5690|bla.jpg|1|1|2008-12-31T19:15:31|303900|5e5f380714166b45f9f8379a50433e5f


It seems like the entry has not been updated, although it should be, right? Because otherwise the same image has a different ID now, I guess this does matter, right?

This could explain my problems with thumbnails, sometimes a wrong thumbnail is displayed for restored images, and I can only fix it by removing the entry from the thumbnails database.


Andi
Comment 9 Andi Clemens 2010-11-14 17:22:29 UTC
Marcel,

I CC'd you, just in case you don't read this .... :-)

Andi
Comment 10 Roman Fietze 2010-11-14 17:43:36 UTC
(In reply to comment #8)

> This could explain my problems with thumbnails, sometimes a wrong thumbnail is
> displayed for restored images, ...

I can reproduce this one as well, but I didn't know those problems are probably related, so I wanted to do some more testing before writing a bug report.
Comment 11 Marcel Wiesweg 2010-11-15 10:01:57 UTC
> 167837||bla.jpg|3|1|2008-12-31T19:15:31|303900|5e5f380714166b45f9f8379a50433e5f 
> 167902|5690|bla.jpg|1|1|2008-12-31T19:15:31|303900|5e5f380714166b45f9f8379a50433e5f 
> 
> 
> It seems like the entry has not been updated, although it should be, right?
> Because otherwise the same image has a different ID now, I guess this does
> matter, right?

This is all right, the process is like this:
A new file is found, added to the database and the basic parameters (like the hash) scanned. Then, the scanner sees that there exists an old entry with the same hash/file size, and decides, instead of rescanning everything, just to copy the information from that old file.
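
The lookup described above could be sketched like this with Python's sqlite3. It is a simplified illustration, not the scanner's actual code: the column names are inferred from the two rows quoted in this comment, the `ImageInformation` table is cut down to a single `rating` column, and both helper functions are hypothetical.

```python
import sqlite3


def find_old_entry(conn, unique_hash, file_size, new_id):
    """Return the id of an older entry with the same hash and size, or None."""
    row = conn.execute(
        "SELECT id FROM Images WHERE uniqueHash = ? AND fileSize = ? "
        "AND id != ? ORDER BY id LIMIT 1",
        (unique_hash, file_size, new_id),
    ).fetchone()
    return row[0] if row else None


def copy_information(conn, old_id, new_id):
    """Copy the stored information of old_id to new_id instead of rescanning."""
    conn.execute(
        "INSERT OR REPLACE INTO ImageInformation (imageid, rating) "
        "SELECT ?, rating FROM ImageInformation WHERE imageid = ?",
        (new_id, old_id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Images (id INTEGER PRIMARY KEY, album INTEGER, name TEXT,
                         status INTEGER, category INTEGER,
                         modificationDate TEXT, fileSize INTEGER,
                         uniqueHash TEXT);
    CREATE TABLE ImageInformation (imageid INTEGER PRIMARY KEY, rating INTEGER);
    """
)
h = "5e5f380714166b45f9f8379a50433e5f"
conn.executemany(
    "INSERT INTO Images VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (167837, None, "bla.jpg", 3, 1, "2008-12-31T19:15:31", 303900, h),
        (167902, 5690, "bla.jpg", 1, 1, "2008-12-31T19:15:31", 303900, h),
    ],
)
conn.execute("INSERT INTO ImageInformation VALUES (167837, 4)")  # made-up rating

old_id = find_old_entry(conn, h, 303900, new_id=167902)
if old_id is not None:
    copy_information(conn, old_id, 167902)
new_rating = conn.execute(
    "SELECT rating FROM ImageInformation WHERE imageid = 167902"
).fetchone()[0]
```

Both rows stay in the table afterwards, which matches the two entries seen in the dump: the new id simply inherits the data of the old one.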

I'm not sure about the thumbnails problem, but it could be related to the fact that thumbnails are referenced by file path in addition to hash.
If there is a filename a.jpg with a hash h_a and the thumbnail shown is of file b.jpg with hash h_b, we'd need to find out
- which thumbnail images are referenced by h_a and h_b
- which thumbnail images are referenced by a.jpg and b.jpg
- is there a mismatch, like a.jpg, b.jpg and h_b point to the same thumbnail while h_a does not point to anything?
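
These three checks could be scripted roughly as follows with Python's sqlite3. The `FilePaths` and `UniqueHashes` tables and their columns are assumptions about a simplified thumbnails database, `thumb_ids` is a hypothetical helper, and the sample data reproduces only the mismatch case from the last bullet.

```python
import sqlite3


def thumb_ids(conn, path, unique_hash):
    """Return (thumbnail id found by path, thumbnail id found by hash)."""
    by_path = conn.execute(
        "SELECT thumbId FROM FilePaths WHERE path = ?", (path,)
    ).fetchone()
    by_hash = conn.execute(
        "SELECT thumbId FROM UniqueHashes WHERE uniqueHash = ?", (unique_hash,)
    ).fetchone()
    return (by_path[0] if by_path else None, by_hash[0] if by_hash else None)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FilePaths (path TEXT PRIMARY KEY, thumbId INTEGER);
    CREATE TABLE UniqueHashes (uniqueHash TEXT PRIMARY KEY, thumbId INTEGER);
    """
)
# a.jpg, b.jpg and h_b all point to thumbnail 7; h_a points to nothing.
conn.executemany("INSERT INTO FilePaths VALUES (?, ?)", [("a.jpg", 7), ("b.jpg", 7)])
conn.execute("INSERT INTO UniqueHashes VALUES ('h_b', 7)")

refs_a = thumb_ids(conn, "a.jpg", "h_a")
refs_b = thumb_ids(conn, "b.jpg", "h_b")
mismatch_a = refs_a[0] != refs_a[1]  # path and hash disagree for a.jpg
mismatch_b = refs_b[0] != refs_b[1]
```

With this data a.jpg resolves to b.jpg's thumbnail through its path while its own hash has no thumbnail at all, which is the kind of mismatch that would show a wrong thumbnail.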
Comment 12 Marcel Wiesweg 2010-11-15 10:33:01 UTC
SVN commit 1197260 by mwiesweg:

No sense in passing album ids to delete removed entries.
Andi, I think you tested this? Then we can close the bug.

CCBUG: 255478

 M  +3 -0      albumdb.cpp  
 M  +0 -6      albumdb.h  
 M  +1 -1      collectionscanner.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1197260
Comment 13 Andi Clemens 2010-11-15 10:35:06 UTC
Yes I tested it and it worked fine, I will close the bug now.
Comment 14 Andi Clemens 2010-11-15 11:09:49 UTC
A positive side effect (at least with SQLite) is that the AlbumUI is now much more responsive and thumbnails load much faster. No surprise, since my "images" table is only 12% of its original size now :-)
Maybe this fix will also solve some of the problems where people describe slowdowns in loading and using digiKam.
Comment 15 caulier.gilles 2017-07-26 07:06:14 UTC
Since digiKam 5.6.0, we have a database maintenance tool to check integrity and clean tables.