Bug 362023 - Extremely slow metadata writing via maintenance
Summary: Extremely slow metadata writing via maintenance
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Maintenance-Metadata
Version: 5.4.0
Platform: Debian unstable Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-21 06:30 UTC by Simon
Modified: 2022-05-21 13:07 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In: 7.7.0


Attachments
Command line output of writing metadata to files. (492.08 KB, text/plain)
2016-04-21 06:32 UTC, Simon
scancontroller.patch (904 bytes, patch)
2016-04-22 17:51 UTC, Maik Qualmann
Startup of digikam and start of writing metadata with scancontroller.patch (75.08 KB, text/plain)
2016-04-22 22:41 UTC, Simon
Command line output in the middle of writing metadata to files with scancontroller.patch (98.83 KB, text/plain)
2016-04-22 22:42 UTC, Simon
scancontroller2.patch (1.89 KB, patch)
2016-04-27 19:14 UTC, Maik Qualmann
Startup of digikam and start of writing metadata with scancontroller2.patch (175.61 KB, text/plain)
2016-04-28 17:50 UTC, Simon
Command line output in the middle of writing metadata to files with scancontroller2.patch (195.43 KB, text/plain)
2016-04-28 17:51 UTC, Simon

Description Simon 2016-04-21 06:30:55 UTC
I am writing tags that previously existed only in the (SQLite) database to image metadata via the maintenance tool. This takes an enormous amount of time: I am currently at 10% after 3 days of continuous running. The bottleneck is (obviously) disk I/O. Still, this should take much less time (exiftool took about 3 h to delete the old tags from the same collection). The collection contains 100,000 items, and probably about half of them are to be tagged.
What looks odd to me is that, throughout the process, digikam.general reports that QFileSystemWatcher detected a change in the folder that is currently being written to by the metadata write. These reports occur neither regularly nor at a fixed position within the log entries from the metadata writing, which suggests to me that the two things run separately.

As it is digiKam that is writing to these folders and not an external program, are these scans really necessary?
And if they are, can they be delayed until the maintenance tool has finished writing to a directory?

I do not see an option to attach something. I will add a partial log to this bug report as soon as I find out how (or use a pastebin).

Reproducible: Always
Comment 1 Simon 2016-04-21 06:32:13 UTC
Created attachment 98490 [details]
Command line output of writing metadata to files.
Comment 2 Maik Qualmann 2016-04-21 19:04:23 UTC
Git commit 773f9361e5df5904c938ad9ee4cbb19acd1aa1f6 by Maik Qualmann.
Committed on 21/04/2016 at 19:03.
Pushed by mqualmann into branch 'master'.

fix absolute file path without symbolic links

M  +2    -2    libs/dmetadata/metaengine.cpp

http://commits.kde.org/digikam/773f9361e5df5904c938ad9ee4cbb19acd1aa1f6
Comment 3 Maik Qualmann 2016-04-21 19:19:52 UTC
This commit fixes only the writing of metadata for images which are linked via a symbolic link.

Maik
Comment 4 Maik Qualmann 2016-04-22 17:51:34 UTC
Created attachment 98521 [details]
scancontroller.patch

Can you try this test patch and report how digiKam behaves now?

Maik
Comment 5 Simon 2016-04-22 22:40:08 UTC
Thanks for looking into this. I applied your patch. The result seems to be the same (maybe with somewhat less frequent rescans). I attached the initial part of the log after startup, with redundant output excluded (marked by [...]). The actual scanning starts at line 400. A second command line output is from later on during the scan.
Comment 6 Simon 2016-04-22 22:41:24 UTC
Created attachment 98527 [details]
Startup of digikam and start of writing metadata with scancontroller.patch
Comment 7 Simon 2016-04-22 22:42:20 UTC
Created attachment 98528 [details]
Command line output in the middle of writing metadata to files with scancontroller.patch
Comment 8 Simon 2016-04-24 18:16:29 UTC
The scan is now clearly slower than before the patch. It has been
running for almost two days and is only at 3%.
Comment 9 Maik Qualmann 2016-04-27 19:14:03 UTC
Created attachment 98651 [details]
scancontroller2.patch

It is slower now because images reached through symbolic links are now processed as well.
Please try this patch. It also adds a time measurement.

Maik
Comment 10 Simon 2016-04-28 10:31:32 UTC
Do I apply this patch on top of the current HEAD or on top of your 
previous patch? I guess the first, but just to be sure.

Comment 11 Simon 2016-04-28 17:50:59 UTC
Created attachment 98664 [details]
Startup of digikam and start of writing metadata with scancontroller2.patch
Comment 12 Simon 2016-04-28 17:51:59 UTC
Created attachment 98665 [details]
Command line output in the middle of writing metadata to files with scancontroller2.patch

I applied the patch and added the command line output in the same fashion as before.
Comment 13 Maik Qualmann 2016-05-02 19:14:42 UTC
These are long waiting times: up to 5 seconds until the scan of a single image is completed. Disabling the scan does not help; it would be rescheduled in any case, because the modification date or file size has changed and must be updated in the DB. Writing to the SQLite DB is what costs the time. Putting the SQLite DB on an SSD is strongly recommended. Here are a few measured values for writing one image's information to the DB, which includes reading the new information from the image (images on HDD, ext4):

HDD:
SQLite: 180-270ms
internal MySQL: 40-70ms

SSD:
SQLite: 30-60ms

Are the images on an NTFS partition? Is the SQLite DB there as well?

Maik
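
To put the per-image figures above in perspective, here is a rough back-of-the-envelope estimate for the collection described in the report (100,000 items, about half of them to be tagged). The count of 50,000 images and the assumption that images are processed strictly one after another are simplifications for illustration, not measurements from this thread.

```python
# Rough estimate only: assumes ~50,000 images, written strictly sequentially.
ITEMS = 50_000  # "about half" of the 100,000 items in the report (assumption)

# Per-image DB write times in milliseconds, taken from the figures quoted above.
TIMINGS_MS = {
    "SQLite on HDD": (180, 270),
    "internal MySQL on HDD": (40, 70),
    "SQLite on SSD": (30, 60),
    "worst case reported (5 s per image)": (5000, 5000),
}


def hours(ms_per_item: float, items: int = ITEMS) -> float:
    """Total wall-clock hours for `items` sequential writes."""
    return ms_per_item * items / 1000 / 3600


for name, (low, high) in TIMINGS_MS.items():
    print(f"{name}: {hours(low):.2f} to {hours(high):.2f} h")
```

Under these assumptions, the measured SQLite-on-HDD range works out to roughly 2.5 to 3.75 hours in total, while a sustained 5 seconds per image would need about 69 hours, which is close to three days.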
Comment 14 Simon 2016-05-04 14:46:58 UTC
Indeed, my setup is far from optimal for disk I/O. I have both the
database and the images on an NTFS partition of a hard disk in my laptop
(at least not the system HD). I thought that the database would be
automatically cached in RAM. I will look at it again some time.
Thanks again for your help.
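
On the caching point: reads can be served from RAM, but with SQLite's default settings each committed transaction is flushed to disk, so many small commits remain slow on a hard disk. The sketch below is a general illustration of that effect and not digiKam code; the table `image_info`, its columns, and the helper `write_rows` are hypothetical names chosen for the example. It compares one commit per row with a single transaction for all rows.

```python
import os
import sqlite3
import tempfile
import time


def write_rows(db_path, rows, batch):
    """Write rows to a hypothetical image_info table.

    batch=False commits after every row; batch=True uses one transaction.
    Returns (elapsed_seconds, row_count).
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_info "
        "(path TEXT PRIMARY KEY, mtime REAL, size INTEGER)"
    )
    conn.commit()
    sql = "INSERT OR REPLACE INTO image_info VALUES (?, ?, ?)"
    start = time.perf_counter()
    if batch:
        with conn:  # commits once when the block exits
            conn.executemany(sql, rows)
    else:
        for row in rows:
            conn.execute(sql, row)
            conn.commit()  # one disk flush per row
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM image_info").fetchone()[0]
    conn.close()
    return elapsed, count


# Small demo with made-up rows; timings depend entirely on the storage used.
with tempfile.TemporaryDirectory() as tmp:
    demo_rows = [(f"/photos/img_{i}.jpg", float(i), 1000 + i) for i in range(200)]
    per_row, _ = write_rows(os.path.join(tmp, "per_row.db"), demo_rows, batch=False)
    batched, _ = write_rows(os.path.join(tmp, "batched.db"), demo_rows, batch=True)
    print(f"per-row commits: {per_row:.3f} s, single transaction: {batched:.3f} s")
```

Whether digiKam's scanner could group its writes in this way is not discussed in this report; the example only shows why the storage device under the database matters so much for many small writes.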

Comment 15 Bizy 2016-07-09 00:09:40 UTC
Same here (Ubuntu 16.04).

It has already taken more than 4 hours to update (via 'Maintenance', tags database --> images) a folder with some 4000 images. Memory use is more than 4 GB... The progress window still indicates 0%...
I guess that's not how it's supposed to be...

Workaround: selecting all images and running the same command via 'Edit' takes 15 minutes...

PS: If you want me to do something, please be very specific... (most of the conversation above is beyond my comprehension...)
Comment 16 caulier.gilles 2016-11-25 14:38:09 UTC
What about this report when using the digiKam 5.4.0 pre-release AppImage bundle available at this URL:

https://drive.google.com/drive/folders/0BzeiVr-byqt5Y0tIRWVWelRJenM

Gilles Caulier
Comment 17 Simon 2016-11-25 23:30:52 UTC
Hi Gilles,

The problem is still the same. I reduced it for myself by using the internal
MySQL database and preloading most of the database into memory.
When testing with the AppImage, with SQLite on the system HD and the data on a
separate HD (no SSD in my laptop :) ), even the UI now becomes unresponsive
during writing. Maybe it would be more efficient to remember which files were
written and issue a rescan after the metadata writing is done?
Generally, the only issue I have with digiKam is its constant scanning
of stuff on the HD. Whenever I start it, it causes tons of read accesses (to
images, not the database), producing command line output like
"digikam.dimg***: JPEG file identified" and "digikam.metaengine:
Orientation => Exif.Image.Orientation => 1", as if these files were new
or modified (they are not).

Cheers,
Simon
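
The suggestion above, remembering which files were written and issuing a rescan once the metadata writing is done, could be sketched roughly as follows. This is a minimal illustration of the idea and not digiKam's scan controller: the class `DeferredRescan` and the `rescan_directory` callback are hypothetical names. The rescan still happens, as comment 13 says it must, but once per directory instead of once per change notification.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class DeferredRescan:
    """Collects written files and triggers one rescan per directory on flush."""

    def __init__(self, rescan_directory):
        # rescan_directory(directory, file_names) is a hypothetical callback
        # standing in for whatever actually updates the database.
        self._rescan_directory = rescan_directory
        self._pending = defaultdict(set)

    def note_written(self, file_path: str) -> None:
        """Record that metadata was written to file_path; do not scan yet."""
        path = PurePosixPath(file_path)
        self._pending[str(path.parent)].add(path.name)

    def flush(self) -> int:
        """Rescan each touched directory once; return the number of rescans."""
        rescans = 0
        for directory in sorted(self._pending):
            self._rescan_directory(directory, sorted(self._pending[directory]))
            rescans += 1
        self._pending.clear()
        return rescans


# Usage with made-up paths: three writes result in two directory rescans.
calls = []
tracker = DeferredRescan(lambda directory, files: calls.append((directory, files)))
for written in ["/photos/2016/a.jpg", "/photos/2016/b.jpg", "/photos/2015/c.jpg"]:
    tracker.note_written(written)
print(tracker.flush(), calls)
```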

Comment 18 Mario Frank 2017-02-22 15:10:48 UTC
Git commit 2f8ddd42ef62d7aea9e490cdb05ffcc644810c81 by Mario Frank.
Committed on 22/02/2017 at 15:05.
Pushed by mfrank into branch 'master'.

Merged the current state of the garbage collection branch which improves the database cleanup stage of the maintenance
and improves the reactiveness of the maintenance overall. We ported the way items are processed to a queue based method
that can use the CPUs more effectively and does not create thousands of threads.
Related: bug 283062, bug 216895, bug 374225, bug 351658, bug 329353
FIXED-IN: 5.5.0

M  +17   -12   NEWS

https://commits.kde.org/digikam/2f8ddd42ef62d7aea9e490cdb05ffcc644810c81
Comment 19 caulier.gilles 2020-08-02 13:20:09 UTC
digiKam 7.0.0 stable release is now published:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need fresh feedback on this report using this version.

Best Regards

Gilles Caulier
Comment 20 caulier.gilles 2022-01-10 15:51:32 UTC
Maik,

Why is this report still open despite comment #18?

Gilles
Comment 21 Maik Qualmann 2022-05-21 13:07:05 UTC
Hmm, it looks like closing the bug didn't work. I am closing it now.

Maik