Bug 396086

Summary: Changing the sort order takes nearly 5 minutes in a folder with 50 images
Product: [Applications] digikam
Reporter: Rafael Linux User <rafael.linux.user>
Component: Albums-ItemsSort
Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED
Severity: normal
CC: caulier.gilles, jens-bugs.kde.org, metzpinguin, rafael.linux.user
Priority: NOR
Version: 5.9.0
Target Milestone: ---
Platform: openSUSE
OS: Linux
Latest Commit:
Version Fixed In: 7.2.0
Sentry Crash Report:
Attachments: Wasted area in red square
Bash script to create 390k hardlinks
New script (2020) to create 4 folders with 10k distinct images each one

Description Rafael Linux User 2018-07-02 10:53:43 UTC
We use digiKam with the internal MySQL server and a database of 100,000+ photos. When we are in a folder with only 50 photos and change the sort order to any kind (type, file size, file name), digiKam takes more than 4 minutes. What's happening? Is digiKam reordering the entire database?

Regards
Comment 1 Maik Qualmann 2018-07-02 11:30:04 UTC
No, it's not normal that sorting takes so long for 50 images in the album. In principle, it also has nothing to do with the database. Can you download and test the AppImage from www.digikam.org? The AppImage does not install anything on the system; it only needs execute permission.

Maik
Comment 2 Rafael Linux User 2018-07-02 22:02:43 UTC
I'll try it tomorrow. Should I expect any changes (corruption or similar) when it connects to the existing internal MySQL database?
Comment 3 Maik Qualmann 2018-07-03 06:07:28 UTC
Yes, digiKam 6.0.0 makes changes to the database, so it's better to make a backup of your internal MySQL folder.

Maik
Comment 4 Rafael Linux User 2018-07-03 08:54:53 UTC
I have a question about the backup... since it's an "internal MySQL", how do I back up my database? I was thinking of using "mysqldump", but I realized that I don't know the user/password... I didn't find anything about "internal MySQL" in the wiki.

Any help, please?
Comment 5 caulier.gilles 2018-07-03 09:13:55 UTC
The internal MySQL databases are all located in the same place: a hidden directory that you set up in the digiKam configuration dialog. Just copy all of its contents to a backup drive, and that's all.

Gilles Caulier
Comment 6 Rafael Linux User 2018-07-03 13:13:09 UTC
Thank you, Caulier, I found the folder.

I'm sorry, but the user was too busy to let me try the AppImage today because, for reasons I don't know, he lost the remaining files on the camera after he used "Delete selected" right after importing the selected files with digiKam Import. He had to recover them with PhotoRec.

I'll try sorting tomorrow.

Regards
Comment 7 Rafael Linux User 2018-07-04 13:15:09 UTC
Bad news. Version 6 didn't change the bad sorting behaviour at all. In the same folder (or any other) with fewer than 50 JPEG files of about 7 MiB each, any kind of reordering took more than 2m30s.

:(
Comment 8 caulier.gilles 2018-07-04 13:51:48 UTC
Wait a minute,

I'm currently rebuilding the 6.0.0 pre-release AppImage bundle. It will be ready in about one hour. Please try again with this version.


Gilles Caulier
Comment 9 Maik Qualmann 2018-07-04 16:48:25 UTC
I currently have no explanation for why it takes so long. For an album with 500 images the time is not measurable; the sorting is done immediately. A virtual album with 20,000 images in the view takes about 1-2 seconds to sort by date and about 5-6 seconds to sort by name. And there are significantly more powerful computers than my dual core...

Maik
Comment 10 Rafael Linux User 2018-07-05 10:14:50 UTC
Maybe it's related to the fact that it's a 370K-photo database. Anyway, the latest bundle doesn't change anything about this. It took nearly 4 minutes to reorder 13 photos!!!
This time I was watching the system load through "htop" to see if the problem was in MySQL... but it's not the culprit. It's digiKam that is eating 99% CPU (I forgot to say that digiKam is unusable while reordering). It's not a lack of CPU power, because it's an Intel Core i5-6500 with 8 GiB RAM.

Let me know if I can help you with more info.
Comment 11 Rafael Linux User 2018-07-05 10:40:53 UTC
I forgot to say that launching digiKam takes about 5 minutes too, and it's digiKam that is eating the CPU (at 100%).
Comment 12 Maik Qualmann 2018-07-05 10:41:46 UTC
It does not matter how many images are in the database once the 50-image album is displayed. Only the 50 images in the already loaded image model are sorted. The CPU usage of MySQL is more relevant. Is it possible that digiKam is still creating thumbnails in the background, or has not yet scanned all the images and albums? This can take some time, even hours, until digiKam is ready and fully available. Is a running progress bar displayed in the status bar?

Maik
Comment 13 Rafael Linux User 2018-07-05 12:57:13 UTC
> loaded image model sorted. The CPU usage of MySQL is more relevant. Is it
> possible that digiKam still creates thumbnails in the background? Or not yet
> captured all the images and albums. This can take some time, even hours ...
> until digiKam is ready and fully available. 
That is not the case. All the images (we are talking about 13 JPEG images) were processed by digiKam before I changed the sort order.

> Is a running progress bar displayed in the status bar?
No, no running progress bar was shown before reordering the elements.
Comment 14 Rafael Linux User 2018-07-06 12:57:03 UTC
Created attachment 113800 [details]
Wasted area in red square

Wasted area inside the red square.
Comment 15 Rafael Linux User 2018-07-06 12:57:26 UTC
Great!! Issue solved. Thank you.

I noticed (maybe you already know) that in this beta a wasted area appears while showing thumbnails (or not), as in the attached screenshot. Did you notice that?
Comment 16 caulier.gilles 2018-07-06 13:04:00 UTC
Yes, I saw it recently, with the AppImage only. I don't know why; perhaps it's a side effect of Qt 5.9.6 LTS, which is used to compile the whole AppImage contents.

In any case, this space is the same size as the thumb-bar, even though the thumb-bar is not shown in icon view mode. A weird bug.

Go to this area and press the right mouse button. An empty pop-up menu (with a checkbox inside) appears. Select it, and the empty space should be replaced by the thumb-bar. Incredible, no?

Gilles Caulier
Comment 17 Rafael Linux User 2018-07-06 13:19:17 UTC
I imagined something like that. I hope this issue will disappear in the final release ;)
Comment 18 Maik Qualmann 2018-07-06 17:22:22 UTC
Git commit 967a93ee109a9e16f2f565d7738e370fbd37ecc1 by Maik Qualmann.
Committed on 06/07/2018 at 17:21.
Pushed by mqualmann into branch 'master'.

this could fix the problem with the thumb-bar

M  +1    -1    core/app/views/stackedview.cpp

https://commits.kde.org/digikam/967a93ee109a9e16f2f565d7738e370fbd37ecc1
Comment 19 caulier.gilles 2018-07-06 18:30:01 UTC
Maik,

No, passing the parent to the thumb-bar dock is not enough.

Try the latest AppImage to see the effect.

Gilles
Comment 20 Maik Qualmann 2018-07-07 06:54:03 UTC
Interestingly, my own build with Qt 5.11 also shows this empty checkbox if you right-click on the narrow area where you can move the dock bar.

Maik
Comment 21 Maik Qualmann 2018-07-07 09:07:27 UTC
The checkbox is normal and comes from QDockWidget::toggleViewAction(). Maybe we should set a title so that the QAction has a name.

Maik
Comment 22 caulier.gilles 2018-07-07 12:03:37 UTC
So, you suspect that changes in Qt introduced this malfunction?

Remember that KF5 has been updated to the latest stable version in the bundle. Since all main views use the KMainWindow class as parent, perhaps something has changed in how layouts are managed?

Gilles
Comment 23 Maik Qualmann 2018-07-07 12:15:20 UTC
Git commit 0db966159ee98f7866b6a5ebbdabe2f3b059de75 by Maik Qualmann.
Committed on 07/07/2018 at 12:13.
Pushed by mqualmann into branch 'master'.

try to fix AppImage dock-bar problem

M  +1    -1    core/app/views/stackedview.cpp

https://commits.kde.org/digikam/0db966159ee98f7866b6a5ebbdabe2f3b059de75
Comment 24 Rafael Linux User 2018-07-07 12:20:02 UTC
Please don't forget the main issue (that's what really matters XD). I shouldn't have commented on the beta bug in the same thread, sorry. My fault. ;)
Comment 25 caulier.gilles 2018-07-07 16:01:27 UTC
Maik,

It's not fixed yet. The newly built AppImage bundle still has the problem with the thumb-bar.

Gilles
Comment 26 caulier.gilles 2018-07-07 16:05:34 UTC
Maik,

I renamed the ~/.config/digikamrc file to *.old, restarted digiKam from scratch, and the problem disappeared with the latest 6.0.0 AppImage bundle.

So, once again it looks like a problem with GUI state storage in the rc file or something like that.

gilles
Comment 27 Maik Qualmann 2018-08-05 16:35:38 UTC
Git commit a3104006d9047ccc338b7eb3d40a975f7059c0a9 by Maik Qualmann.
Committed on 05/08/2018 at 16:34.
Pushed by mqualmann into branch 'master'.

check if group info is already in the cache
Related: bug 397110

M  +20   -3    core/libs/database/item/imageinfo.cpp

https://commits.kde.org/digikam/a3104006d9047ccc338b7eb3d40a975f7059c0a9
Comment 28 Maik Qualmann 2018-08-05 16:50:39 UTC
Git commit 845a33a522e044949852589bf0c35cb577ae90da by Maik Qualmann.
Committed on 05/08/2018 at 16:48.
Pushed by mqualmann into branch 'master'.

check if tags info is already in the cache
Related: bug 397110

M  +18   -3    core/libs/database/item/imageinfo.cpp

https://commits.kde.org/digikam/845a33a522e044949852589bf0c35cb577ae90da
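The two commits above only list their file statistics. As a hedged illustration of the general pattern their messages name ("check if ... info is already in the cache"), here is a Python sketch of a lazy per-item cache: the database is queried only the first time an item's data is requested. `ItemInfoCache` and `fetch_tags` are hypothetical names, not digiKam's ImageInfo API.

```python
class ItemInfoCache:
    """Hypothetical per-item cache: query the database only on first access."""

    def __init__(self, fetch_tags):
        self._fetch_tags = fetch_tags  # callable(item_id) -> list of tag ids (stands in for a SQL query)
        self._tags = {}
        self.queries = 0  # counts how often the "database" was actually hit

    def tag_ids(self, item_id):
        if item_id not in self._tags:  # cache miss: fetch once and remember
            self.queries += 1
            self._tags[item_id] = self._fetch_tags(item_id)
        return self._tags[item_id]
```

Repeated lookups for the same item then cost one dictionary access instead of one query each; the same shape applies to group information.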
Comment 29 Jens 2018-08-20 12:24:54 UTC
FYI: The wasted space in the thumbnail view still exists in the current AppImage as of yesterday (2018-08-19).
Comment 30 Jens 2018-08-21 08:11:28 UTC
... and it vanishes in the thumbnail view when I resize the height of the preview area in single-photo view. Unfortunately, this setting is not kept... but it's a workaround, and it might help track this issue down.
Comment 31 Rafael Linux User 2018-08-21 08:51:38 UTC
And, as I said, the problem for which I opened this bug still exists. Does anyone have a database with 300,000+ photos to confirm this issue?
Comment 32 Jens 2018-08-21 11:05:31 UTC
No, but to reproduce it you could write a script that hardlinks a single photo 300,000 times, and then point a fresh digiKam installation (e.g. in a separate user account) to this folder structure.

If you have such a script, I will be happy to test it on my hardware.
Comment 33 Rafael Linux User 2018-08-21 13:32:17 UTC
I wrote the script. I'm not sure this "plain", homogeneous source of photos (all are identical, folder names are not complex, there are no nested folders... too simple) will do the trick, but we can try, and I appreciate your help.
First, you need a JPEG photo named "photo.jpg" in the same folder as the script. Then you can run the script (make it executable first). It will create 390,000 hardlinks: in each of 6 folders, your photo is copied once and 65,000 hardlinks to that copy are created, so each folder ends up with 65,000 hardlinks.

Now I'll try to attach the file. If that doesn't work, I'll paste the script here.
Comment 34 Rafael Linux User 2018-08-21 13:44:00 UTC
Created attachment 114531 [details]
Bash script to create 390k hardlinks

The script will create 6 folders, each with 65k hardlinks to a copy of photo.jpg placed in that folder.

Requirements: 
- A jpg file renamed to "photo.jpg"
- The script in the same folder
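The attached Bash script itself is not reproduced in this thread. A minimal Python sketch of the layout described above (one copy of photo.jpg per folder plus many hardlinks to that copy) could look like this. The link names are hypothetical, the FolderA, FolderB, ... naming follows the folder names mentioned in the thread, and the counts are parameters so the layout can be checked at a small scale before generating 390,000 links.

```python
import os
import shutil


def make_hardlink_tree(src: str, root: str, folders: int = 6, links: int = 65000) -> int:
    """Create `folders` directories under `root`, each holding one copy of `src`
    plus `links` hardlinks to that copy. Returns the number of hardlinks created.

    Hypothetical naming: FolderA, FolderB, ... with zero-padded link names,
    similar to the padding `seq -w` produces in a shell loop.
    """
    width = len(str(links))  # zero-pad numbers to a fixed width
    created = 0
    for i in range(folders):
        folder = os.path.join(root, "Folder" + chr(ord("A") + i))
        os.makedirs(folder, exist_ok=True)
        copy = os.path.join(folder, "photo.jpg")
        shutil.copyfile(src, copy)  # one real copy per folder
        for n in range(1, links + 1):
            # every link points to the same inode as this folder's copy
            os.link(copy, os.path.join(folder, f"photo_{n:0{width}d}.jpg"))
            created += 1
    return created
```

For example, `make_hardlink_tree("photo.jpg", "testcollection")` would reproduce the 6 × 65,000 layout; as noted later in the thread, identical files are a worst case for duplicate detection rather than a realistic collection.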
Comment 35 Jens 2018-08-22 22:36:18 UTC
Didn't work at first, but after changing {1..65000} to $(seq -w 1 65000), it worked fine.

The first startup, with the initial scan of folders, was done in ~10 seconds.
After startup, the image scan (with the progress bar at the bottom of the main digiKam window) progressed at ~100 images per second, as I could see in the console where I started the digiKam AppImage.

Scanning slowed down by the time it reached "FolderB": after roughly 65,000 images it took 0.5 s per image (!), so I won't finish scanning all 300,000 images today. Will keep you updated.
Comment 36 Maik Qualmann 2018-08-23 06:04:36 UTC
The problem with this test is that all images are identical. digiKam recognizes identical images in the database, which means that the number of duplicates keeps growing and the database returns a larger search result for each new image. In addition, for a realistic test, such a number of images should be spread over 500-1000 albums.

Maik
Comment 37 caulier.gilles 2018-08-23 06:43:13 UTC
The scan process in digiKam is divided into 2 parts, which are the bottlenecks:

1/ a recursive scan of the albums, without their contents. This populates a few database tables.
2/ a scan of the album contents listed in the DB. This includes all files with a supported MIME type (photo and video). Other database tables are populated.

Depending on the collection size, both can take a while, but 2/ always takes the most time.

1/ could certainly be parallelized, but in fact everything is serialized in the database. Even if we used multiple cores here to list directories, the gain would be minor.

2/ can be parallelized. The most important components used to populate the database with item properties are Exiv2 for photos and ffmpeg for videos.

Even though Exiv2 supports multicore, its memory handling is not optimal. If an improvement must be made, it's here. So it's an UPSTREAM problem, already reported to the Exiv2 team, but I have never seen improvements in this area.

In any case, if we parallelize metadata parsing, database serialization will limit the gain, unless items are registered in the database in chunks rather than one by one.

And even if we register items in chunks, I'm not sure the gain will be visible with SQLite. It will certainly be better with MySQL/MariaDB, especially with a remote server.
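To illustrate the chunking idea in the paragraph above, here is a minimal Python sketch using the standard sqlite3 module. It is not digiKam's scanner code; the `items` table, its columns and the chunk size are hypothetical. It contrasts one transaction per item with batches written by `executemany()` inside one transaction per chunk.

```python
import sqlite3


def register_one_by_one(conn, items):
    """Each item in its own transaction: one commit per row."""
    for path, size in items:
        with conn:  # the connection context manager commits on success
            conn.execute("INSERT INTO items(path, size) VALUES (?, ?)", (path, size))


def register_in_chunks(conn, items, chunk=1000):
    """Items grouped into transactions of up to `chunk` rows with executemany()."""
    items = list(items)
    for start in range(0, len(items), chunk):
        with conn:  # one commit per chunk instead of per row
            conn.executemany(
                "INSERT INTO items(path, size) VALUES (?, ?)",
                items[start:start + chunk],
            )
```

With a disk-backed SQLite database, each commit carries a fixed cost, which is what batching amortizes; how much that helps a real scan, where metadata parsing also costs time, is the open question raised above.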

Voilà, a few explanations of the digiKam DB scanner.

To sum up, as Maik said, one image linked many times is not a valid test. The only part that can be tested like this is 1/.

Gilles Caulier
Comment 38 caulier.gilles 2018-08-23 06:46:06 UTC
The 99% CPU usage is probably caused by the auto-completion of the album filter at the bottom of the tree view.

This problem has already been reported in Bugzilla, and Maik fixed some of the code, but internally Qt5 algorithms are used, and their complexity is far from perfect.

Perhaps this problem is fixed in a more recent Qt5 version. The AppImage bundle uses Qt 5.9.6 LTS, not the latest Qt 5.11.1.

Maik, if you have more details on this part...

Gilles Caulier
Comment 39 Rafael Linux User 2018-08-24 07:50:44 UTC
(In reply to Jens from comment #35)
> Didn't work at first, but after changing {1..65000} to $(seq -w 1 65000), it
> worked fine.
> 
> First startup with initial scanning of folders was done in ~10 seconds.
> Scanning of images - after startup - with progress bar at the bottom of the
> main Digikam window progressed with ~100 images per second, as I could see
> in the console where I started the Digikam appimge.
> 
> Scanning slowed down until at "FolderB", so after roughly 65000 images, it
> took 0.5s per image (!) - so I won't finish scanning all 300'000 images
> today. Will keep you updated.

For me (in bash on openSUSE) that loop syntax worked. What's your OS?

Anyway, I did the test (or rather, I'm still doing it), but after more than 28 h it hasn't finished. There is no delay if I close and relaunch digiKam, but so far it has (I guess) fully scanned only "FolderA" and "FolderB", half of "FolderC" (29,029 files) and 1,580 files of "FolderD". "FolderF" is not shown at all, even though it exists in Dolphin. But the worst part is that digiKam doesn't show any album contents (no thumbnails), even though it shows the image count next to each album. Meanwhile, digiKam keeps one CPU core at 100%.

What about your experience, Jens?
Comment 40 Jens 2018-08-24 19:53:01 UTC
I can't see the images in the folders (but they do exist in the filesystem), and the scanning process is still at 17%, at about 1 image per half second.

If the duplicate scanning takes so much time because the images are all the same, would it work better if I created a huge number of small random JPEG files that all have different content? Or does the duplicate scanner only take metadata into account?
Comment 41 Maik Qualmann 2018-08-24 20:02:16 UTC
Git commit f8d8dc6ebdbbb0f75561a6a4dc6a0a95d728ca42 by Maik Qualmann.
Committed on 24/08/2018 at 20:00.
Pushed by mqualmann into branch 'master'.

store the number of childs in the album
this commit reduces the start of digiKam here with 11000 albums from 1:30 minutes to 1:15 minutes
Related: bug 368468

M  +11   -0    core/libs/album/album.cpp
M  +8    -1    core/libs/album/album.h
M  +2    -2    core/libs/models/abstractalbummodel.cpp
M  +0    -14   core/libs/models/abstractalbummodelpriv.h

https://commits.kde.org/digikam/f8d8dc6ebdbbb0f75561a6a4dc6a0a95d728ca42
Comment 42 Maik Qualmann 2018-08-24 21:45:49 UTC
Git commit dcb01e39023564ea538eb06f2cf635a451f713e3 by Maik Qualmann.
Committed on 24/08/2018 at 21:44.
Pushed by mqualmann into branch 'master'.

implement a child album cache hash
this commit reduces the start of digiKam here with 11000 albums from 1:15 minutes to 0:35 minutes
Related: bug 368468

M  +16   -5    core/libs/album/album.cpp
M  +6    -1    core/libs/album/album.h
M  +2    -2    core/libs/models/abstractalbummodel.cpp
M  +0    -23   core/libs/models/abstractalbummodelpriv.h

https://commits.kde.org/digikam/dcb01e39023564ea538eb06f2cf635a451f713e3
Comment 43 Rafael Linux User 2018-08-24 23:10:18 UTC
(In reply to Jens from comment #40)
> I can't see the images in the folders (but they do exist in the filessytem)
> and the scanning process is still at 17% with about 1 image per half second.

Same here. The images are there, but thumbnails are not shown in any folder.

> If the duplicate scanning takes so much time because the images are all the
> same, will it work better if I create a huge amount of small random JPEG
> files that are all different in content? Or does the duplicate scanner only
> take metadata int account?

I can't answer that, but Gilles or Maik can. ;)
Comment 44 Maik Qualmann 2018-08-24 23:21:08 UTC
Git commit 6d16a4f96ac245ed11450326c128cf63ca5a1332 by Maik Qualmann.
Committed on 24/08/2018 at 23:17.
Pushed by mqualmann into branch 'master'.

implement a child album to row cache hash
this commit reduces the start of digiKam here with 11000 albums from 0:35 minutes to 0:20 minutes
the sorting of the albums and entries is now almost without delay.
Related: bug 368468

M  +14   -4    core/libs/album/album.cpp
M  +12   -5    core/libs/album/album.h
M  +1    -16   core/libs/models/abstractalbummodelpriv.h

https://commits.kde.org/digikam/6d16a4f96ac245ed11450326c128cf63ca5a1332
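The album commits above describe storing the child count and caching a child-to-row mapping. In Qt item models, a node's row is often found by searching its parent's child list, which costs O(n) per lookup and adds up with thousands of albums. A minimal Python sketch of the cached alternative follows; the `Album` class here is hypothetical and does not mirror digiKam's album.h.

```python
class Album:
    """Hypothetical album node that caches each child's row index."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._row_of = {}  # child -> row, avoids a linear children.index() scan

    def add_child(self, child):
        self._row_of[child] = len(self.children)  # new child goes to the last row
        self.children.append(child)

    def child_count(self):
        return len(self.children)

    def row_of(self, child):
        # O(1) dictionary lookup instead of an O(n) search of the child list
        return self._row_of[child]

    def remove_child(self, child):
        row = self._row_of.pop(child)
        del self.children[row]
        for r in range(row, len(self.children)):  # later rows shift up by one
            self._row_of[self.children[r]] = r
```

The design trade-off is that insertions and removals must keep the cache consistent, in exchange for constant-time row lookups on every model query.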
Comment 45 Maik Qualmann 2018-10-07 13:49:03 UTC
Git commit f27ab9c1051bd0a0bba6e79bc77899c74a7e6bf8 by Maik Qualmann.
Committed on 07/10/2018 at 13:47.
Pushed by mqualmann into branch 'master'.

add a global cache for grouped images
When we load the images into the Icon view,
we ask each time, whether there are grouped
images, with 30000 images in the view are that
also 30000 SQL query.
With this patch, the time to load a view with
many images is faster with MySQL 3x and with SQLite 2x.
Related: bug 391840, bug 398921, bug 397901

M  +24   -0    core/libs/database/coredb/coredb.cpp
M  +5    -0    core/libs/database/coredb/coredb.h
M  +1    -10   core/libs/database/item/imageinfo.cpp
M  +19   -2    core/libs/database/item/imageinfocache.cpp
M  +7    -0    core/libs/database/item/imageinfocache.h
M  +0    -3    core/libs/database/item/imageinfodata.h

https://commits.kde.org/digikam/f27ab9c1051bd0a0bba6e79bc77899c74a7e6bf8
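The commit message above notes that with 30,000 images in the view, asking each item whether it has grouped images means 30,000 SQL queries. A hedged Python sketch with sqlite3 shows the difference between that per-item pattern and a single query that fills an in-memory set. The `image_groups(leader, member)` table is hypothetical and is not digiKam's schema.

```python
import sqlite3


def grouped_flags_per_item(conn, item_ids):
    """One query per item: N items cost N round trips to the database."""
    result = {}
    for item_id in item_ids:
        row = conn.execute(
            "SELECT COUNT(*) FROM image_groups WHERE leader = ?", (item_id,)
        ).fetchone()
        result[item_id] = row[0] > 0
    return result


def grouped_flags_bulk(conn, item_ids):
    """One query builds a global set of group leaders; lookups are then in memory."""
    leaders = {row[0] for row in conn.execute("SELECT DISTINCT leader FROM image_groups")}
    return {item_id: item_id in leaders for item_id in item_ids}
```

Both functions return the same flags; the bulk version needs one query regardless of view size, but the cached set must be invalidated when grouping changes.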
Comment 46 caulier.gilles 2020-07-31 11:26:09 UTC
digiKam 7.0.0 stable release is now published:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need fresh feedback on this report using this version.

Best Regards

Gilles Caulier
Comment 47 Rafael Linux User 2020-08-02 08:47:04 UTC
Well, my setup has changed since I reported this bug:
- MySQL (internal) DB
- 65k pictures in a folder

Curiously, sorting by size is nearly instant, but sorting by name takes about 30 seconds. I understand this could be because one field is numeric and the other is a string, but if it is indexed, it should be instant too.
Comment 48 Maik Qualmann 2020-08-02 09:23:32 UTC
The sorting does not take place in the database but in the Qt item model. Comparing strings is slower. This is normal, especially if the differences only appear at the end, e.g. with long path names. Keep in mind that all 65,000 strings are compared several times until the correct order is established. On my already rather old computer, rearranging 60,000 items takes around 6 seconds. The QCollator class that performs the sorting can create a sort key beforehand; sorting is then as fast as sorting by date. I have already implemented this as a test. We only gain time if the user switches views with many items more often; the first time you open a large view, there is no advantage.
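The point about precomputed keys can be illustrated without Qt. In this Python sketch, `locale.strxfrm` stands in for QCollator: the first function transforms both strings inside every comparison, while the second computes each key once, which is the idea behind QCollator::sortKey(). The function names are hypothetical, and the call counters exist only to make the difference visible.

```python
import functools
import locale


def sort_with_compare(names, collate=locale.strxfrm):
    """Transform both strings on every comparison, like collating pairwise each time."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 2
        ka, kb = collate(a), collate(b)
        return (ka > kb) - (ka < kb)

    return sorted(names, key=functools.cmp_to_key(compare)), calls


def sort_with_keys(names, collate=locale.strxfrm):
    """Compute each sort key once (decorate-sort), in the spirit of a precomputed sort key."""
    calls = 0

    def key(s):
        nonlocal calls
        calls += 1
        return collate(s)

    return sorted(names, key=key), calls
```

A comparison sort performs on the order of n log n comparisons, so the first variant runs roughly 2 n log n collations, while the second runs exactly n; the key comparisons themselves are then cheap.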

In your bug description, did you write that a folder of 50 images takes 4 minutes?

Maik
Comment 49 Rafael Linux User 2020-08-02 11:11:31 UTC
(In reply to Maik Qualmann from comment #48)
> The sorting does not take place in the database, but in the Qt Item model. A
> string is slower when comparing. This is normal, especially if the
> differences only appear at the end, e.g. with long path names. Keep in mind
> that all 65,000 strings are compared several times until the correct order
> is established. For me, it takes around 6 seconds to rearrange for 60,000
> items, for an already much older computer. The QColator class that carries
> out the sorting offers the possibility to create a key beforehand, then the
> sorting is as fast as with the date. I have already implemented it as a
> test. We only gain time if the user would change the view with many items
> more often. The first time you open a large view, there are no advantages.

Well, the story is longer, but I'll summarize. The "real" user is the one who has 50K+ photos in thousands of folders and subfolders, and despite having a high-end computer, he was suffering from exactly this issue (even with the database stored on an SSD).

Since I no longer have access to his computer, I created a (new) script that creates a folder with 3 subfolders, each containing 65k JPEG images with distinct content and resolution (I'll share the script once I have added more subfolders to make the scenario more realistic). In this scenario, sorting an album's items by name takes nearly 30 seconds (on a 4-year-old PC). As I said, it should not take much longer than sorting by size.

> In your bug description, do you write that a folder of 50 images takes 4
> minutes?

Yes. On the real PC, when I reported the bug, that was the actual elapsed time.

Comment 50 Rafael Linux User 2020-08-06 12:25:18 UTC
Created attachment 130679 [details]
New script (2020) to create 4 folders with 10k distinct images each one

This script tries to create an album similar to real-life folders and images. It creates 1 folder with 4 subfolders (level 1), each with 4 subfolders (level 2). Inside each level-2 folder, 2,500 JPEG images with distinct resolution and content are created. I made it because it's very useful for checking issues like this one, related to sorting by file name.
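The attached script is not reproduced here. As a rough Python sketch of the same layout (1 root, 4 level-1 folders, 4 level-2 folders each, 2,500 files per leaf, so 40,000 files in total), the following writes small binary PGM images instead of JPEGs. That format swap, the folder and file names, and the size/pattern scheme are assumptions made to keep the example dependency-free.

```python
import os


def make_distinct_tree(root, fanout=4, per_leaf=2500):
    """Create root/L1_i/L2_j with `per_leaf` small, distinct grayscale images per leaf.

    Assumption: binary PGM (P5) files stand in for JPEGs. Each file gets a
    varied resolution, and its running index is encoded in the first 4 pixel
    bytes, so no two files have identical content.
    """
    count = 0
    for i in range(fanout):
        for j in range(fanout):
            leaf = os.path.join(root, f"L1_{i}", f"L2_{j}")
            os.makedirs(leaf, exist_ok=True)
            for n in range(per_leaf):
                width, height = 8 + n % 32, 8 + count % 16  # varied resolution, at least 8x8
                pixels = count.to_bytes(4, "big") + bytes(
                    (count + x) % 256 for x in range(width * height - 4)
                )
                with open(os.path.join(leaf, f"img_{n:05d}.pgm"), "wb") as f:
                    f.write(b"P5\n%d %d\n255\n" % (width, height) + pixels)
                count += 1
    return count
```

Calling it with small values such as `fanout=2, per_leaf=3` is a quick way to check the layout before generating the full 40,000 files.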
Comment 51 Maik Qualmann 2020-10-10 20:24:25 UTC
Git commit d63e171bec0910f036bb3c2b261ab3333af110ee by Maik Qualmann.
Committed on 10/10/2020 at 20:22.
Pushed by mqualmann into branch 'master'.

add experimental QCollatorSortKey cache for fast string sorting
Add quick cache comparison to item and album sorting.
Changing a view with about 30,000 items when sorting
by name or path previously took about 22 seconds.
Now about 2-3 seconds. We will observe how the
memory consumption develops.
Related: bug 368468

M  +4    -8    core/libs/database/models/itemsortsettings.h
M  +3    -6    core/libs/models/albumfiltermodel.cpp
M  +96   -12   core/libs/threadimageio/fileio/loadingcache.cpp
M  +8    -0    core/libs/threadimageio/fileio/loadingcache.h

https://invent.kde.org/graphics/digikam/commit/d63e171bec0910f036bb3c2b261ab3333af110ee
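The commit above describes an experimental cache of QCollatorSortKey objects whose memory use is to be observed. A Python sketch of such a cache, with `locale.strxfrm` again standing in for QCollator and a hypothetical `SortKeyCache` class, shows the trade-off: keys computed on the first sort are reused by later re-sorts of the same view, at the cost of keeping one key per distinct string in memory.

```python
import locale


class SortKeyCache:
    """Hypothetical cache of collation keys reused across re-sorts of the same view.

    Keys are kept until clear() is called, trading memory for speed.
    """

    def __init__(self, collate=locale.strxfrm):
        self._collate = collate
        self._keys = {}
        self.misses = 0  # number of keys actually computed

    def key(self, text):
        k = self._keys.get(text)
        if k is None:  # first time this string is seen: compute and store its key
            self.misses += 1
            k = self._keys[text] = self._collate(text)
        return k

    def clear(self):
        self._keys.clear()

    def __len__(self):
        return len(self._keys)
```

Passing `cache.key` as the `key=` argument of `sorted()` means that re-sorting the same names a second time computes no new keys.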
Comment 52 Maik Qualmann 2020-10-15 20:15:30 UTC
I'm closing the bug now. With the new item sorter cache, there are no problems sorting many items by file name or path.

Maik
Comment 53 Rafael Linux User 2020-10-19 17:12:51 UTC
Which digiKam version will include the patch?

Thank you