Bug 393205 - Advanced rename very slow
Summary: Advanced rename very slow
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: AdvancedRename-engine (show other bugs)
Version: 5.9.0
Platform: Arch Linux Linux
: NOR normal
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-16 21:39 UTC by Flaviu Tamas
Modified: 2022-01-31 22:03 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In: 6.0.0
Sentry Crash Report:


Attachments
exerpts from log when renaming images (10.27 KB, text/plain)
2018-04-17 17:10 UTC, Flaviu Tamas
Details
overall flame graph from startup to while renaming images (1.44 MB, text/html)
2018-04-17 17:11 UTC, Flaviu Tamas
Details
new flamegraph after the patch was applied (1.90 MB, image/svg+xml)
2018-05-28 23:59 UTC, Flaviu Tamas
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Flaviu Tamas 2018-04-16 21:39:09 UTC
I have a large collection (13000) of files that I'm trying to rename. However, it takes about 5 seconds per image to rename each file. This is unreasonable, since I am running this job on an SSD with a reasonably fast processor. iotop shows digikam at 0% IO, and digikam is using 2/4 of my cores almost completely.

When I run this rename under digikam, I get a whole bunch of the following for each file renamed:

statx(AT_FDCWD, "/home/user/tmp/recoveredpics/2010-02-04T17:05:44.JPG", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL, stx_attributes=0, stx_mode=S_IFREG|0777, stx_size=2186973, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3545, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3545, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3545, ...}) = 0

The rename "preview" is also slow, but it's acceptable, and I don't think there's room for (or point to) improvement there.
Comment 1 Maik Qualmann 2018-04-17 06:57:39 UTC
So 5 seconds is really very long. DigiKam not only rename the files but also has to delete the thumbnail and create a new one, etc. Here there are certainly still optimization possibilities, for example just only refresh the path in the thumbnail DB. Also, the current serial rename process to parallization is an option. Nevertheless, I have never observed 5 seconds, are the databases also on an SSD?

Maik
Comment 2 Flaviu Tamas 2018-04-17 17:09:10 UTC
> are the databases also on an SSD?

Yes

Actually, it appears that sometimes several files will be renamed very quickly. But then the next file will take a while.

Another thing: if I move just a few of my 13000 photos and videos into another directory and rename them there, the process finishes instantly. This makes me think that the problem is with one directory containing many files.

I've attached some exerts from the digikam log, as well as a flamegraph from running digikam under perf (this contains stuff from startup too).
Comment 3 Flaviu Tamas 2018-04-17 17:10:39 UTC
Created attachment 112079 [details]
exerpts from log when renaming images
Comment 4 Flaviu Tamas 2018-04-17 17:11:15 UTC
Created attachment 112080 [details]
overall flame graph from startup to while renaming images
Comment 5 Maik Qualmann 2018-04-18 06:09:24 UTC
Git commit fee46b9602a1dfe4f9ed003529dce84dc3646901 by Maik Qualmann.
Committed on 18/04/2018 at 06:08.
Pushed by mqualmann into branch 'master'.

not the thumbnail loading thread determines the clock when renaming

M  +8    -19   core/utilities/advancedrename/advancedrenameprocessdialog.cpp

https://commits.kde.org/digikam/fee46b9602a1dfe4f9ed003529dce84dc3646901
Comment 6 Maik Qualmann 2018-04-18 08:39:45 UTC
A question, the folder in which the files are renamed contain 13000 files? If so, the cause is the QFileSystemWatcher, which is supposed to only trigger external events. Perform a permanent rescan of the huge collection here.

Maik
Comment 7 Maik Qualmann 2018-04-18 10:34:01 UTC
Git commit 8dad5fb6ccbee42d75afa79b2c1e7b6a78e5cdd1 by Maik Qualmann.
Committed on 18/04/2018 at 10:32.
Pushed by mqualmann into branch 'master'.

use a very long timer for external events for the collection scan

M  +1    -1    core/libs/album/albumwatch.cpp
M  +24   -0    core/libs/database/utils/scancontroller.cpp
M  +10   -0    core/libs/database/utils/scancontroller.h

https://commits.kde.org/digikam/8dad5fb6ccbee42d75afa79b2c1e7b6a78e5cdd1
Comment 8 Maik Qualmann 2018-04-18 10:38:53 UTC
It would be nice if Gilles updated the AppImage for digiKam-6.0.0 you would do a test.

Maik
Comment 9 caulier.gilles 2018-04-18 10:49:42 UTC
Hi Maik,

I will update the bundles this evening...

Gilles
Comment 10 caulier.gilles 2018-04-18 18:56:02 UTC
new AppImages are uploaded...

https://files.kde.org/digikam/

Gilles
Comment 11 Flaviu Tamas 2018-05-19 18:15:55 UTC
I thought I'd replied to this, but I guess not:

>A question, the folder in which the files are renamed contain 13000 files?

That is correct.

I ran into some kind of issue with digikam's image recognition database earlier, so I haven't been able to test this patch.

Latest master (04bc525) doesn't have the database migration issue, but renaming files is still slow.

I tried changing 'd->externalTimer->setInterval(2000);' (see 8dad5fb) to '200000' to fit the spirit of "very long timer" a little better (since setInterval is in milliseconds), but that doesn't help.
Comment 12 Maik Qualmann 2018-05-19 19:23:03 UTC
A test here with external local MySQL DB on an SSD and 1500 images on a HDD gives the following result with a simple counter renaming:

krename: 3:00 minutes
digiKam: 3:30 minutes

You notice that the more files in the directory, the renaming becomes slower and slower. Considering that digiKam deletes the old thumbnail in addition to renaming, renames the file in the database, and creates a preview thumbnail, I think it can not get any faster. It almost seems to me that the more files in a directory, the slower the access to the file system.

Maik
Comment 13 Flaviu Tamas 2018-05-19 20:42:49 UTC
1500 photos are not enough to reproduce this bug. I did an experiment where I used krename and digikam to rename some files with the EXIF modify date and a counter. Here are the results with both the images and the database on an SSD:

#     | krename | digikam           |
1001  | 6s      | 34s               |
5001  | 64s     | 21% done at 6040s |

I don't believe this is a "digikam is 5x slower than krename" issue, it seems more like krename takes n time, and digikam takes n^2 time.
Comment 14 Maik Qualmann 2018-05-22 20:59:03 UTC
Git commit d765c17ce96706c5f6a1116ffe3d898100aa47b0 by Maik Qualmann.
Committed on 22/05/2018 at 20:58.
Pushed by mqualmann into branch 'master'.

use static function QFile::exist() is a little bit faster

M  +2    -3    core/libs/iojobs/iojob.cpp

https://commits.kde.org/digikam/d765c17ce96706c5f6a1116ffe3d898100aa47b0
Comment 15 Maik Qualmann 2018-05-27 16:01:29 UTC
The problem is detected, it is also related to Bug 391115. The icon view takes more and more time, the more elements are displayed. The QListView and QSortFilterProxyModel gets slower as you add or remove items. Moving (renaming) in the database initiates this process. At the moment I do not see that there is a mistake in the implementation or there is a possibility for us to accelerate QListView. I have a patch here that dramatically speeds up the renaming process. I rename the ImageInfo, also in the ImageInfo cache, in the database and in the thumbnail database. Therefore, the item in the QListView does not need to be removed and added with a new name. The patch has to be tested a bit...

Maik
Comment 16 Maik Qualmann 2018-05-28 20:29:16 UTC
Git commit 24ec6a3bdcf7f4b5ed50ecd38bd9bc7682649ded by Maik Qualmann.
Committed on 28/05/2018 at 20:26.
Pushed by mqualmann into branch 'master'.

speed up renaming the image and rename the thumbnail path
Related: bug 382474

M  +7    -0    core/libs/database/coredb/coredb.cpp
M  +6    -0    core/libs/database/coredb/coredb.h
M  +14   -0    core/libs/database/item/imageinfo.cpp
M  +6    -0    core/libs/database/item/imageinfo.h
M  +7    -0    core/libs/database/thumbsdb/thumbsdb.cpp
M  +2    -0    core/libs/database/thumbsdb/thumbsdb.h
M  +22   -14   core/libs/database/utils/dio.cpp

https://commits.kde.org/digikam/24ec6a3bdcf7f4b5ed50ecd38bd9bc7682649ded
Comment 17 Flaviu Tamas 2018-05-28 23:58:52 UTC
I get huge improvements with that new code, thank you! In the time it took me to write this comment, Digikam is at 25% renaming my 13000 image collection.

It could still be faster (I don't think I'm ever happy with the performance of anything), but I think it's a good time to call this bug fixed.

Just for fun, I've attached another flame graph. About 50% of the time is now being spent loading the thumbnail, another 10% in __tzset, and another 10% in software aes (I really need to get a cpu with aes-ni :)).
Comment 18 Flaviu Tamas 2018-05-28 23:59:40 UTC
Created attachment 112936 [details]
new flamegraph after the patch was applied
Comment 19 caulier.gilles 2018-05-29 04:12:03 UTC
Comment on attachment 112936 [details]
new flamegraph after the patch was applied

Which profiling tool did you use to capture cpu idle with digiKam ? 

Which modifications did you introduce in compilation workflow of digiKam to achieve this profiling ?

Gilles Caulier
Comment 20 Maik Qualmann 2018-05-29 06:56:26 UTC
I leave the bug still open, that rename at override does not update the Icon view correctly. I want to fix that first. Disabling the thumbnail in the dialog box would not make it any better. At the moment, renaming is a single core job, but whether it would really do anything to split it into multiple jobs is questionable. Here digiKam is faster than krename.

Maik
Comment 21 Maik Qualmann 2018-05-29 16:46:41 UTC
Git commit 35e125ac8445e9e77db75e85aa0d4e53ab7a0e36 by Maik Qualmann.
Committed on 29/05/2018 at 16:45.
Pushed by mqualmann into branch 'master'.

fix rename by overwrite
Related: bug 382474
FIXED-IN: 6.0.0

M  +3    -1    NEWS
M  +1    -0    core/libs/album/albummanager.cpp
M  +3    -3    core/libs/database/coredb/coredb.cpp
M  +1    -1    core/libs/database/coredb/coredb.h
M  +1    -1    core/libs/database/item/imageinfo.cpp
M  +1    -0    core/libs/database/models/imagelistmodel.cpp
M  +11   -1    core/libs/database/utils/dio.cpp
M  +1    -0    core/libs/models/imagealbummodel.cpp
M  +1    -0    core/utilities/queuemanager/views/queuelist.cpp

https://commits.kde.org/digikam/35e125ac8445e9e77db75e85aa0d4e53ab7a0e36
Comment 22 Flaviu Tamas 2018-05-29 22:10:59 UTC
(In reply to caulier.gilles from comment #19)
> Comment on attachment 112936 [details]
> new flamegraph after the patch was applied
> 
> Which profiling tool did you use to capture cpu idle with digiKam ? 

I used perf, but I wouldn't trust the cpu idle info to be accurate.

For details on how to use perf, there's Brendan Gregg's excellent guide at http://www.brendangregg.com/perf.html#OneLiners. The tool I used to generate the flame graph can be found at http://www.brendangregg.com/perf.html#FlameGraphs.

> Which modifications did you introduce in compilation workflow of digiKam to
> achieve this profiling ?

You must compile with debug information to get accurate stack traces, but that's it.

> Gilles Caulier
Comment 23 caulier.gilles 2018-05-31 18:28:36 UTC
Flaviu,

Thanks for the feedback. The links given are very instructive/useful.

Best

Gilles Caulier