Bug 414320 - Performance regression when scanning directories with many sub-directories
Summary: Performance regression when scanning directories with many sub-directories
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Database-Scan
Version: 6.4.0
Platform: macOS (DMG) macOS
Importance: NOR grave
Target Milestone: ---
Assignee: Digikam Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-20 10:18 UTC by Daniel Barea
Modified: 2020-04-20 11:22 UTC

See Also:
Latest Commit:
Version Fixed In: 7.0.0
Sentry Crash Report:


Description Daniel Barea 2019-11-20 10:18:50 UTC
SUMMARY
When scanning a collection which includes an album with many sub-directories, the scanner's performance is heavily impacted.

This regression was introduced in commit e00ade3c (first included in v6.4.0). It is caused by calling QDir::entryList too many times in the loop that iterates over the collection (collectionscanner_scan.cpp:489 and potentially also 552).

One possible fix is to cache the result of entryList together with the parent directory of the sub-album from the previous iteration. In the next iteration, if the parent directory of the current sub-album is the same as the saved one, the cached result is used instead of calling the function again. Since all sub-albums with the same root are iterated together, this implementation should significantly reduce the performance penalty.
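
The caching idea above can be sketched as follows. This is only a minimal illustration in standard C++, not digiKam code: the class name ParentListingCache and the injected lister callback are hypothetical, and the callback stands in for the expensive QDir::entryList call.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of digiKam): remembers the listing of the
// most recently visited parent directory, so that consecutive sub-albums
// sharing that parent do not trigger another directory read.
class ParentListingCache
{
public:

    using Lister = std::function<std::vector<std::string>(const std::string&)>;

    // 'lister' stands in for the expensive call (QDir::entryList in the report).
    explicit ParentListingCache(Lister lister)
        : m_lister(std::move(lister))
    {
        assert(m_lister && "lister must be callable");
    }

    // Returns the entries of 'parentDir'. The directory is re-read only when
    // the parent differs from the one passed in the previous call.
    const std::vector<std::string>& entries(const std::string& parentDir)
    {
        if (!m_valid || (parentDir != m_lastParent))
        {
            m_entries    = m_lister(parentDir);
            m_lastParent = parentDir;
            m_valid      = true;
        }

        return m_entries;
    }

private:

    Lister                   m_lister;
    std::string              m_lastParent;
    std::vector<std::string> m_entries;
    bool                     m_valid = false;
};
```

Because only the last parent is remembered, the benefit depends on sub-albums with the same parent being visited consecutively, as described above.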

STEPS TO REPRODUCE
1. Create a collection directory with many sub-directories (tested with 9K), each containing a number of pictures.
2. Add the created directory to digikam and trigger a media scan.

OBSERVED RESULT
The scan takes a very long time to complete (from hours to days) or stalls indefinitely.

EXPECTED RESULT
The scan completes in a few minutes.

SOFTWARE/OS VERSIONS
Windows: -
macOS: 10.14.6
Linux/KDE Plasma: -
(available in About System)
KDE Plasma Version: -
KDE Frameworks Version: 5.63.0
Qt Version: 5.13.2

ADDITIONAL INFORMATION
Tested with pre-built v6.4.0 on macOS and Linux, as well as self-compiled v7.0.0-git (5a1bc45b) on macOS.
Comment 1 caulier.gilles 2019-11-20 10:23:00 UTC
To be more precise, the commit is this one:

https://invent.kde.org/kde/digikam/commit/e00ade3c4bd32822db4531bf32906633c84e6c53

Gilles Caulier
Comment 2 Maik Qualmann 2019-11-20 10:50:49 UTC
Well, it's the only way to get the real file name of a folder under Windows. I am amazed that this code takes so long on macOS. I had carried out intensive tests, and there were hardly any measurable delays under Linux, because the folders should long since be in the cache, as we already determine the number of folders. OK, we can enable this code only for Windows.

Maik
Comment 3 Maik Qualmann 2019-11-20 11:31:24 UTC
Git commit 2d214984e574e63c082e5461cf2c8d72abe95bc1 by Maik Qualmann.
Committed on 20/11/2019 at 11:30.
Pushed by mqualmann into branch 'master'.

read whole dir only under Windows to get correct file name
FIXED-IN: 7.0.0

M  +2    -1    NEWS
M  +6    -4    core/libs/database/collection/collectionscanner_scan.cpp

https://invent.kde.org/kde/digikam/commit/2d214984e574e63c082e5461cf2c8d72abe95bc1
Comment 4 Maik Qualmann 2019-11-20 11:37:20 UTC
Since we run at least 3 times through network or local folders in the collection scanner, I've been planning to try a cache for a while now. We'll see how much memory a QHash consumes with 100K QFileInfo objects and whether it significantly improves performance.

Maik
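
A cache of the kind mentioned in comment 4 might look roughly like the sketch below. It is an assumption-laden illustration in standard C++, not the planned digiKam implementation: std::unordered_map stands in for QHash, the FileMeta struct is a hypothetical stand-in for QFileInfo, and the stat callback represents the actual filesystem query.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the subset of QFileInfo a scanner might need.
struct FileMeta
{
    std::uintmax_t size  = 0;
    bool           isDir = false;
};

// Hypothetical cache (not digiKam code): queries the filesystem once per
// path and serves all later passes from memory.
class FileMetaCache
{
public:

    using Stat = std::function<FileMeta(const std::string&)>;

    // 'stat' stands in for the real filesystem query.
    explicit FileMetaCache(Stat stat)
        : m_stat(std::move(stat))
    {
        assert(m_stat && "stat must be callable");
    }

    // Returns the cached metadata for 'path', querying it on first access.
    const FileMeta& get(const std::string& path)
    {
        auto it = m_cache.find(path);

        if (it == m_cache.end())
        {
            it = m_cache.emplace(path, m_stat(path)).first;
        }

        return it->second;
    }

    // Number of cached entries, useful for estimating memory consumption.
    std::size_t size() const
    {
        return m_cache.size();
    }

private:

    Stat                                      m_stat;
    std::unordered_map<std::string, FileMeta> m_cache;
};
```

With such a cache, three passes over the same paths would query each path only once; the open questions raised in the comment (memory use with 100K entries and the actual speed-up) would still have to be measured.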
Comment 5 Daniel Barea 2019-11-20 12:07:18 UTC
Hi Maik

Thanks for your fast support. The last commit fixes the issue in my setup.

Since the collection I am working with is accessed through a FUSE mount, it is possible that filesystem caches are not being used. So I believe the issue is not macOS/Linux specific and could also arise on Windows for certain volume types.

Daniel
Comment 6 Maik Qualmann 2020-04-20 11:22:18 UTC
*** Bug 420334 has been marked as a duplicate of this bug. ***