Bug 362023

Summary: Extremely slow metadata writing via maintenance
Product: [Applications] digikam Reporter: Simon <freisim93>
Component: Maintenance-Metadata Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: caulier.gilles, koen.bizy, metzpinguin, sven.burmeister
Priority: NOR    
Version: 5.4.0   
Target Milestone: ---   
Platform: Debian unstable   
OS: Linux   
Latest Commit: Version Fixed In: 7.7.0
Sentry Crash Report:
Attachments: Command line output of writing metadata to files.
scancontroller.patch
Startup of digikam and start of writing metadata with scancontroller.patch
Command line output in the middle of writing metadata to files with scancontroller.patch
scancontroller2.patch
Startup of digikam and start of writing metadata with scancontroller2.patch
Command line output in the middle of writing metadata to files with scancontroller2.patch

Description Simon 2016-04-21 06:30:55 UTC
I am writing tags that previously existed only in the (SQLite) database to the image metadata via the maintenance tool. This takes an enormous amount of time: I am currently at 10% after 3 days of continuous running. The bottleneck is (obviously) disk I/O. Still, this should take much less time (exiftool took about 3 hours to delete old tags from the same collection). The collection contains 100'000 items and probably about half of them are to be tagged.
What looks odd to me is that throughout the process digikam.general reports that QFileSystemWatcher detected a change in the folder that is currently being written to by the metadata write. These reports occur neither regularly nor at a fixed position within the log entries from writing metadata, which suggests to me that the two things are separate.

As it is digiKam that is writing to these folders and not an external program, are these scans really necessary?
And if they are, can they be delayed until the maintenance tool has finished writing to a directory?
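
One way to picture this request is a change collector that holds back rescans while a batch of writes is in progress and then triggers a single rescan per directory. The Python sketch below is purely illustrative: `DeferredRescanner`, `notify_changed`, `batch_write` and the `rescan` callback are hypothetical names and say nothing about how digiKam's scan controller is actually implemented.

```python
import os
from contextlib import contextmanager


class DeferredRescanner:
    """Collects change notifications and rescans each directory once.

    Hypothetical helper for illustration only; not digiKam code.
    """

    def __init__(self, rescan):
        self._rescan = rescan      # callback taking a directory path
        self._suspended = False
        self._pending = set()      # directories changed while suspended

    def notify_changed(self, file_path):
        """Called for every file-change notification."""
        directory = os.path.dirname(file_path)
        if self._suspended:
            self._pending.add(directory)   # remember it, rescan later
        else:
            self._rescan(directory)        # normal behaviour: rescan now

    @contextmanager
    def batch_write(self):
        """Suspend rescans while a batch of writes is running."""
        self._suspended = True
        try:
            yield self
        finally:
            self._suspended = False
            pending, self._pending = self._pending, set()
            for directory in sorted(pending):
                self._rescan(directory)    # one rescan per directory


if __name__ == "__main__":
    scanned = []
    rescanner = DeferredRescanner(scanned.append)
    with rescanner.batch_write():
        for name in ("a.jpg", "b.jpg", "c.jpg"):
            rescanner.notify_changed(os.path.join("photos", "2016", name))
    print(scanned)
```

In this sketch three writes to the same folder lead to one rescan of that folder after the batch ends, instead of one rescan per notification.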

I do not see an option to attach something. I will add a partial log to this bug report as soon as I find out how (or use a pastebin).

Reproducible: Always
Comment 1 Simon 2016-04-21 06:32:13 UTC
Created attachment 98490 [details]
Command line output of writing metadata to files.
Comment 2 Maik Qualmann 2016-04-21 19:04:23 UTC
Git commit 773f9361e5df5904c938ad9ee4cbb19acd1aa1f6 by Maik Qualmann.
Committed on 21/04/2016 at 19:03.
Pushed by mqualmann into branch 'master'.

fix absolute file path without symbolic links

M  +2    -2    libs/dmetadata/metaengine.cpp

http://commits.kde.org/digikam/773f9361e5df5904c938ad9ee4cbb19acd1aa1f6
Comment 3 Maik Qualmann 2016-04-21 19:19:52 UTC
This commit fixes only the writing of metadata for images which are linked via a symbolic link.

Maik
Comment 4 Maik Qualmann 2016-04-22 17:51:34 UTC
Created attachment 98521 [details]
scancontroller.patch

Can you try this test patch? And report how digiKam now behaves.

Maik
Comment 5 Simon 2016-04-22 22:40:08 UTC
Thanks for looking into this. I applied your patch. The results seem to be the same (maybe somewhat less frequent rescans). I attached the initial part of the log after startup, with redundant output removed (marked by [...]). The actual scanning starts at line 400. A second command line output is from later on during the scan.
Comment 6 Simon 2016-04-22 22:41:24 UTC
Created attachment 98527 [details]
Startup of digikam and start of writing metadata with scancontroller.patch
Comment 7 Simon 2016-04-22 22:42:20 UTC
Created attachment 98528 [details]
Command line output in the middle of writing metadata to files with scancontroller.patch
Comment 8 Simon 2016-04-24 18:16:29 UTC
And the scan is now clearly going slower than before the patch. I have now been 
running it for almost two days and it's only at 3%.
Comment 9 Maik Qualmann 2016-04-27 19:14:03 UTC
Created attachment 98651 [details]
scancontroller2.patch

It is now slower because images behind symbolic links are now processed as well.
Please try this patch. It also adds a time measurement.

Maik
Comment 10 Simon 2016-04-28 10:31:32 UTC
Do I apply this patch on top of the current HEAD or on top of your 
previous patch? I guess the first, but just to be sure.

Comment 11 Simon 2016-04-28 17:50:59 UTC
Created attachment 98664 [details]
Startup of digikam and start of writing metadata with scancontroller2.patch
Comment 12 Simon 2016-04-28 17:51:59 UTC
Created attachment 98665 [details]
Command line output in the middle of writing metadata to files with scancontroller2.patch

I applied the patch and added the command line output in the same fashion as before.
Comment 13 Maik Qualmann 2016-05-02 19:14:42 UTC
These are long waiting times, up to 5 seconds until a scan is completed for one image. Disabling the scanning does not help; it would be rescheduled in any case. The modification date or file size has changed and needs to be updated in the DB. Writing to the SQLite DB is what takes the time. Putting the SQLite DB on an SSD is strongly recommended. Here are a few measured values for writing one image's information to the DB, including reading the new information from the image (images on HDD, ext4):

HDD:
SQLite: 180-270ms
internal MySQL: 40-70ms

SSD:
SQLite: 30-60ms
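
To put these per-image figures in perspective, they can be scaled up to a whole collection. The short Python sketch below does only that arithmetic. The count of 50,000 images is an assumption taken from the reporter's estimate that about half of the 100'000 items need tags, and the result covers only the database update step, not the metadata write itself.

```python
# Per-image DB update times from the measurements above, in ms (min, max).
MEASURED_MS = {
    "HDD / SQLite": (180, 270),
    "HDD / internal MySQL": (40, 70),
    "SSD / SQLite": (30, 60),
}

# Assumption: about half of the reporter's 100'000 items get written.
IMAGES = 50_000


def total_hours(ms_per_image, images=IMAGES):
    """Convert a per-image time in milliseconds to total hours."""
    return ms_per_image * images / 1000 / 3600


for setup, (low, high) in MEASURED_MS.items():
    print(f"{setup}: {total_hours(low):.1f} to {total_hours(high):.1f} h")
```

Under these assumptions the HDD/SQLite case works out to roughly 2.5 to 3.75 hours for the database updates alone.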

Are the images on an NTFS partition? Is the SQLite DB there as well?

Maik
Comment 14 Simon 2016-05-04 14:46:58 UTC
Indeed, my setup is far from optimal for disk I/O. I have both the 
database and the images on an NTFS partition of a hard disk in my laptop 
(at least it is not the system disk). I thought that the database would be 
automatically cached in RAM. I will look at it again some time.
Thanks again for your help.

Comment 15 Bizy 2016-07-09 00:09:40 UTC
Same here (Ubuntu 16.4).

It has already taken more than 4 hours to update (via 'Maintenance', tags database --> images) a folder with some 4000 images. Memory use is more than 4 GB... The progress window still indicates 0%...
Guess that's not how it's supposed to be...

Workaround: selecting all images and running the same command via 'Edit' takes 15 minutes...

PS:  If you want me to do something, please be very specific... (most of the conversation above is beyond my comprehension...)
Comment 16 caulier.gilles 2016-11-25 14:38:09 UTC
What about this report when using the digiKam AppImage bundle 5.4.0 pre-release available at this URL:

https://drive.google.com/drive/folders/0BzeiVr-byqt5Y0tIRWVWelRJenM

Gilles Caulier
Comment 17 Simon 2016-11-25 23:30:52 UTC
Hi Gilles,

This problem is still the same. I reduced it for myself by using the internal
MySQL database and preloading most of the database into memory.
When testing with the AppImage, with SQLite on the system HD and the data on
a separate HD (no SSD in my laptop :) ), even the UI now becomes unresponsive
during writing. Maybe it would be more efficient to remember which files were
written and issue a rescan after the writing of metadata is done?
Generally, the only issue I have with digiKam is its constant scanning
of stuff on the HD. Whenever I start it, it causes tons of read access (to
images, not the database), producing command line output like
"digikam.dimg***: JPEG file identified" and "digikam.metaengine:
Orientation => Exif.Image.Orientation => 1", as if these files were new
or modified (they are not).

Cheers,
Simon

Comment 18 Mario Frank 2017-02-22 15:10:48 UTC
Git commit 2f8ddd42ef62d7aea9e490cdb05ffcc644810c81 by Mario Frank.
Committed on 22/02/2017 at 15:05.
Pushed by mfrank into branch 'master'.

Merged the current state of the garbage collection branch which improves the database cleanup stage of the maintenance
and improves the reactiveness of the maintenance overall. We ported the way items are processed to a queue based method
that can use the CPUs more effectively and does not create thousands of threads.
Related: bug 283062, bug 216895, bug 374225, bug 351658, bug 329353
FIXED-IN: 5.5.0

M  +17   -12   NEWS

https://commits.kde.org/digikam/2f8ddd42ef62d7aea9e490cdb05ffcc644810c81
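
The queue-based processing mentioned in the commit message can be illustrated with a fixed pool of workers that pull items from a shared queue, instead of one thread being started per item. This is a generic Python sketch of that pattern, not the digiKam implementation; `process_item`, `run_with_worker_pool` and the worker count are placeholders chosen for the example.

```python
import queue
import threading

_STOP = object()   # sentinel telling a worker to exit


def process_item(item):
    """Placeholder for the per-item work (hypothetical)."""
    return item * 2


def run_with_worker_pool(items, process=process_item, workers=4):
    """Process items with a fixed number of threads fed from one queue."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            try:
                if item is _STOP:
                    return
                value = process(item)
                with results_lock:
                    results.append(value)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(_STOP)           # one sentinel per worker
    tasks.join()                   # wait until every queued entry is handled
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(sorted(run_with_worker_pool(range(10))))
```

The number of threads stays at `workers` no matter how many items are queued, which is the property the commit message highlights.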
Comment 19 caulier.gilles 2020-08-02 13:20:09 UTC
digiKam 7.0.0 stable release is now published:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need fresh feedback on this report using this version.

Best Regards

Gilles Caulier
Comment 20 caulier.gilles 2022-01-10 15:51:32 UTC
Maik,

Why is this report still open despite comment #18?

Gilles
Comment 21 Maik Qualmann 2022-05-21 13:07:05 UTC
Hmm, it looks like closing the bug didn't work. I am closing it now.

Maik
Comment 22 S. Burmeister 2024-11-18 20:48:52 UTC
Should I open a new bug if this is still valid for 8.5 on Windows, using MariaDB?
Comment 23 Maik Qualmann 2024-11-18 22:01:06 UTC
The speed of writing metadata depends on many factors.
The type of drive: a local drive (HDD or SSD) or even a network drive.
Is writing with ExifTool enabled? ExifTool is safer but also significantly slower.
The type of images: large TIFF files or RAW files?

When we write metadata, we also have to read the file again because important parameters have changed, file size, UUID of the file, etc., in order to keep the DB up to date.
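
The need to re-read a file after writing can be seen from the file system alone: a metadata write changes attributes such as the file size and modification time. The minimal Python sketch below assumes a hypothetical stored record of size and modification time per file. `FileRecord`, `snapshot` and `needs_rescan` are made-up names, and the real digiKam schema and checks are not shown in this report.

```python
import os
import tempfile
from typing import NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical subset of what a database might store per file."""
    size: int
    mtime_ns: int


def snapshot(path):
    """Read the current size and modification time of a file."""
    info = os.stat(path)
    return FileRecord(info.st_size, info.st_mtime_ns)


def needs_rescan(path, stored):
    """True if the file on disk no longer matches the stored record."""
    return snapshot(path) != stored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        image = os.path.join(folder, "image.jpg")
        with open(image, "wb") as handle:
            handle.write(b"pixels")
        stored = snapshot(image)
        with open(image, "ab") as handle:   # stand-in for a metadata write
            handle.write(b"+tags")
        print(needs_rescan(image, stored))
```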

In general, writing metadata is not particularly fast. I don't think we currently have a bug or can significantly increase the speed.

A word about MariaDB. MariaDB is not automatically faster, especially under Windows where we have to work with the DB over the network. A SQLite DB on a fast SSD can be significantly more performant.
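
To compare storage locations on a particular machine, the cost of individually committed SQLite writes can be measured directly. The sketch below uses Python's standard `sqlite3` module. The table layout, the file names and the `time_commits` helper are made up for the test and are unrelated to digiKam's schema or code.

```python
import os
import sqlite3
import tempfile
import time


def time_commits(db_path, rows=200, batched=False):
    """Insert rows into a test table and return the elapsed seconds.

    With batched=False every row is committed on its own, so each
    insert pays the full cost of a transaction.
    """
    connection = sqlite3.connect(db_path)
    try:
        connection.execute(
            "CREATE TABLE IF NOT EXISTS items (name TEXT, size INTEGER)"
        )
        connection.commit()
        start = time.perf_counter()
        for index in range(rows):
            connection.execute(
                "INSERT INTO items VALUES (?, ?)", (f"img{index}.jpg", index)
            )
            if not batched:
                connection.commit()
        connection.commit()
        return time.perf_counter() - start
    finally:
        connection.close()


if __name__ == "__main__":
    # Pass dir="..." to TemporaryDirectory to test a specific drive.
    with tempfile.TemporaryDirectory() as folder:
        for batched in (False, True):
            path = os.path.join(folder, f"test_{batched}.db")
            seconds = time_commits(path, batched=batched)
            print(f"batched={batched}: {seconds:.3f} s")
```

The timings depend entirely on the drive and file system, so no expected numbers are given here.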

Maik