Bug 218633 - "Write Metadata to Selected Images" Very Slow for Large Number of Images
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Metadata-Engine
Version: 1.2.0
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
Reported: 2009-12-14 12:46 UTC by DGardner
Modified: 2017-08-12 11:59 UTC
Version Fixed In: 1.3.0


Description DGardner 2009-12-14 12:46:41 UTC
Version:           1.0.0-rc1 (using KDE 4.3.4)
OS:                Linux
Installed from:    Ubuntu Packages

I had used 1.0.0-beta5 to set the ratings on thousands of images, but
when I checked the JPEG files, the metadata had not been updated in
most of them. In 1.0.0-rc1, if I change the metadata on a single image,
it seems to be written OK, so I don't know if this is a problem that
has been fixed.

Anyway, the issue now is that I have thousands of image files that are
missing their metadata. I got into this situation a year or two ago and
sync'd the image metadata to the database--unaware that the metadata was
missing--and lost thousands of tags. In order to fix the metadata, I'm
going through each album, checking that the metadata (from the database)
looks OK, selecting the images in the album, and then clicking the menu
item "Image" -> "Write Metadata to Selected Images".

This works fine; the metadata is updated correctly in the files. However,
if there are a large number of selected image files, the process becomes
extremely slow. There appears to be a non-linear relationship between the
number of files and the time it takes to write the metadata in RC1:

  39 files -> 2 seconds
  71 files -> 4 seconds
  87 files -> 9 seconds
  120 files -> 43 seconds
  145 files -> 38 seconds
  155 files -> 44 seconds
  276 files -> 132 seconds
  484 files -> 312 seconds
  1,200 files -> I went for lunch.
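
To make the non-linearity concrete, the timings above can be converted to seconds per file. A rough sketch in Python (the names `timings` and `seconds_per_file` are illustrative only, not anything from digiKam):

```python
# Timings reported above for 1.0.0-rc1: (number of files, seconds).
timings = [(39, 2), (71, 4), (87, 9), (120, 43), (145, 38),
           (155, 44), (276, 132), (484, 312)]


def seconds_per_file(files, seconds):
    """Average time spent per file for one run."""
    return seconds / files


for files, seconds in timings:
    print(f"{files:>4} files: {seconds_per_file(files, seconds):.2f} s/file")
```

If the cost were linear, the per-file figure would stay roughly constant. Here it rises from about 0.05 s/file for 39 files to about 0.64 s/file for 484 files.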

Throughout this time, digiKam uses about 70-90% CPU.

I have turned on the option to update the file modification time when
the metadata is written. This will make it easier for me to spot when
digiKam is not updating the files properly in the future. Turning off
this option makes a big difference. For example, I updated 120 files in
43 seconds and the same operation took 38 seconds when I simply repeated
it immediately, but when I turned off the file modification time update,
the third run took 15 seconds. Similarly, the album with 484 images took
312 seconds when file modification time stamps were updated and 87 seconds
when the time stamps were not updated.

Yesterday, when I was not really timing things, the end of the 1,200+
file update seemed to hang. However, when I went to the console and
looked at the timestamps on the files, I could see that about five files
would have their time stamps updated and then there would be a long wait
(maybe 30-60 seconds) with no apparent changes and then five more files
would be updated and then nothing for another while, and so on until it
eventually finished.

I can see about 30 images in the thumbnail view at one time. When I
select all images in an album of, say, 100 files and update the metadata,
the progress bar quickly hits about 20-30% and then pauses while all of
the thumbnail images disappear and, one-by-one, slowly appear again.
After that the progress bar slowly moves on to 100%.

Is digiKam re-scanning an album each time it detects a file time stamp
change, even when it is making those changes itself on another thread?
That would explain the kind of behaviour I see. Is this likely to have
a big effect on the performance of other things like batch processing?
Comment 1 caulier.gilles 2009-12-14 12:56:57 UTC
Yes, you are right. A process from KDELibs uses the KDirWatch API to check for changed items. I think the file date and time are handled by it.

I have no idea how to change this behavior at the moment. Anyway, your investigation is very interesting for the future. We will look into it...

Gilles Caulier
Comment 2 DGardner 2009-12-14 13:21:31 UTC
It just occurred to me that while I did not time the 1,200+ file
metadata update yesterday, my filesystem did.

There were 1,283 files and the difference between the earliest and
the latest modification time stamp is 30 min 19 sec. So you can
add this to my list of timings:

  1,283 files -> 1,819 seconds

These were from an older 2MP camera with an average files size of
about 800kB.

Would it be possible to stop digiKam from reacting to notifications
from KDirWatch while it has a background operation in progress?
Perhaps digiKam could simply store up these notifications and then
apply the appropriate ones when it has finished the first job. It
does not sound easy to figure out what "appropriate" would be, though.
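
The "store up and apply later" idea could look roughly like the sketch below. This is only an illustration in Python, not digiKam code: the class `DeferredNotifier` and its methods are hypothetical, and it assumes that replaying each changed path once at the end of the job is an acceptable definition of "appropriate".

```python
class DeferredNotifier:
    """Queues change notifications while a batch job runs, then replays
    each distinct path once. Hypothetical; not a digiKam or KDirWatch API."""

    def __init__(self, handler):
        self._handler = handler    # called with a path that needs a rescan
        self._batch_depth = 0      # greater than 0 while a batch job runs
        self._pending = {}         # dict used as an insertion-ordered set

    def begin_batch(self):
        self._batch_depth += 1

    def end_batch(self):
        self._batch_depth -= 1
        if self._batch_depth == 0:
            pending, self._pending = self._pending, {}
            for path in pending:
                self._handler(path)

    def notify(self, path):
        if self._batch_depth > 0:
            self._pending[path] = None   # coalesce repeated events per path
        else:
            self._handler(path)          # no batch job: react immediately


# Usage: three events for two files result in two rescans after the job.
rescanned = []
notifier = DeferredNotifier(rescanned.append)
notifier.begin_batch()
for changed in ["a.jpg", "b.jpg", "a.jpg"]:
    notifier.notify(changed)
notifier.end_batch()
print(rescanned)
```

The sketch coalesces duplicate events but does nothing to decide which events the batch job caused itself, which is the hard part.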

In the meantime, I'll disable the modification time stamp updates,
write the metadata to the files, run "touch" from the command line
and then turn on the modification time stamp updates again, as I
won't be doing any more large batches of images again for a while.
Comment 3 caulier.gilles 2009-12-14 13:34:53 UTC
Stopping KDirWatch events will have side effects, especially if you use the batch tool to sync metadata with the database.

This tool is a non-modal dialog. It runs in the background, and you can switch to the main GUI and continue to work. Now imagine that a folder cannot receive events from KDirWatch while you are working on it in the icon view...

Gilles Caulier
Comment 4 DGardner 2009-12-14 13:54:47 UTC
I'm imagining.... It would be much faster, wouldn't it? ;-)

I could always just hit the refresh key when I am interested in
an update.

I have found a slightly easier way around this. If I leave on the
modification time stamp updates (which is what I want), I can avoid
the long waits by selecting an album, clicking "Album" -> "Write
Metadata to Images" and then, once the job starts, selecting a
different album folder and waiting. I did that and it took 4.5 mins
to update 935 files. The file-watching and scanning only seems to
apply to the album currently being viewed, so viewing a different
album from the one where the metadata changes are under way seems to
avoid the issue.

I prefer to wait in the second album, because if I start a second
job to update the metadata, the progress of the first job cannot be
monitored any more. Once the second job completes, the progress bar
is cleared even though I can still hear the disk thrashing away on
the first job (and see the time stamps changing on the files from
the console). Maybe you could integrate the progress with the KDE
notifications widgets that can track several jobs at once (like in
the Dolphin file manager).
Comment 5 caulier.gilles 2009-12-14 13:58:22 UTC
>Maybe you could integrate the progress with the KDE
>notifications widgets that can track several jobs at once (like in
>the Dolphin file manager).

Already done for the batch tool that writes metadata, but not yet for the single-album action...

Gilles Caulier
Comment 6 Marcel Wiesweg 2009-12-17 20:58:31 UTC
Probable explanation for the fact that changing the album helps:
A change of modification date will also trigger recreation of the thumbnail.
Comment 7 Fathi Boudra 2010-02-28 09:59:53 UTC
Debian has a similar report (bug #571811):
Package: digikam
Version: 2:1.1.0-1

At the moment, digikam is very slow at writing metadata to images.

I select about 20 JPGs between 3 and 7 MB, modify the description and click "apply". The update of the internal DB is fast, but the subsequent write operation to the files is very, very slow, taking up to 5 minutes.

I'm not sure when this issue first occurred; a few weeks ago everything ran fine.

There's an open Bug at KDE: https://bugs.kde.org/show_bug.cgi?id=218633. I'm unsure if this is the same issue. I've deselected "update modification time" with no result.

iotop doesn't show anything unusual: 0 B/s write and read. top reports digikam at 100% CPU.

If you need any additional information, please let me know.
Comment 8 Michael Holtermann 2010-04-27 20:22:35 UTC
Hello,

This issue is still present in 1.2.0 (as used in Debian unstable and experimental).

Interestingly, the same thing happens in gwenview. Is it possible that libexiv causes these problems?

Setting and managing of metadata is a key feature of digikam. Is there any chance of getting it back?

Nearly distressed, but with kind regards
Michael
Comment 9 caulier.gilles 2010-04-30 13:16:12 UTC
Yes, this problem is entirely related to the Exiv2 library. For info, I have CC'd Andreas, who manages this project.

Q: Which Exiv2 version do you use exactly? Go to Help / Components Info for details.

Gilles Caulier
Comment 10 Andreas Huggel 2010-04-30 13:55:48 UTC
Exiv2 0.19 has a known performance issue with images from some Nikon SLRs. It is _very_ slow for these. Images from any other cameras are not affected.

Just from reading the descriptions above, comment #7 might be related to this problem, and it should be easy to determine whether that is really the case. If it is, upgrading to exiv2 from SVN trunk will help.

For any other (new) performance issue, if you think it is related to exiv2, please narrow the test down to exiv2 and report to the exiv2 bugtracker at dev.exiv2.org preferably.

A simple test could be to run a more or less complex update command over a number of images, e.g., something like this:

$ time exiv2 -M'set Exif.Image.ImageDescription Some description' *.jpg

and compare with the time digiKam takes for a similar operation and with the time a simple cp takes to just copy the files within the same directory. Ideally, they should all be pretty close with cp-time < exiv2-time < digiKam-time
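
The cp and exiv2 parts of this comparison could also be scripted. The following is a minimal sketch in Python under stated assumptions: `timed`, `copy_all` and `run_exiv2` are hypothetical helper names, the exiv2 arguments are taken from the command above, and exiv2 is assumed to be on the PATH. Note that `run_exiv2` modifies the files in place.

```python
import shutil
import subprocess
import time


def timed(action):
    """Run a zero-argument callable and return the wall-clock seconds it took."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def copy_all(files, suffix=".copy"):
    """Copy each file next to itself, as a baseline for pure I/O cost."""
    for path in files:
        shutil.copyfile(path, path + suffix)


def run_exiv2(files):
    """Run the exiv2 update from the command above on all files (in place)."""
    subprocess.run(
        ["exiv2", "-Mset Exif.Image.ImageDescription Some description", *files],
        check=True,
    )
```

Calling `timed(lambda: copy_all(files))` and `timed(lambda: run_exiv2(files))` on the same list of JPEG paths gives the first two figures; the digiKam time still has to be measured by hand.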

Andreas
Comment 11 Michael Holtermann 2010-05-01 13:02:25 UTC
@9 from Gilles:
I'm using Debian unstable's version 0.19.

@10 from Andreas:
I've checked with some Panasonic Lumix images, this works very well. I'm shooting with a Nikon D90 SLR, which fits into your description.
I'm going to open a bug at Debian's libexiv2-6 package for an update to the trunk version of libexiv. If this doesn't help, I'll file a bug at exiv2.org.

Comment #7 was opened by me at bugs.debian.org and forwarded by Fathi :-)

As suggested, some timings for manipulating 10 images:
Panasonic Lumix:
real    0m0.467s
user    0m0.128s
sys     0m0.204s
Nikon D90:
real    1m41.809s
user    1m35.098s
sys     0m0.516s

Many thanks!
Kind regards, Michael.
Comment 12 Mark Purcell 2010-05-01 14:05:37 UTC
This looks like a duplicate of Bug 224094

Forwarded upstream:
http://dev.exiv2.org/issues/show/677
Comment 13 Michael Holtermann 2010-06-06 10:25:55 UTC
exiv2 was recently updated to 0.20, which should not contain this bug any more.
Comment 14 caulier.gilles 2010-06-07 10:11:36 UTC
Yes, this problem should be fixed by using Exiv2 0.20.0.

I am closing it now. Re-open if necessary.

Gilles Caulier
Comment 15 caulier.gilles 2010-06-07 10:14:38 UTC
*** Bug 237104 has been marked as a duplicate of this bug. ***
Comment 16 DGardner 2011-01-28 12:25:49 UTC
I think my bug report was hijacked by comment #7. My original problem has nothing to do with Nikon images and Exiv2 problems; I shoot Canon. The issue is that enabling updates to the file modification times causes problems: DigiKam's own changes to images cause DigiKam to rescan the images and generate new thumbnails even though this is unnecessary.

Can we wind this bug back to comment #6? Is the problem as described originally and in comments #1-#6 fixed, or was this bug marked as "FIXED" because the unrelated issue with Nikon images and Exiv2 was fixed?