Bug 431951

Summary: UI lockup for upwards of 15 seconds while just scrolling album views and selecting images
Product: [Applications] digikam Reporter: griffiths_andy
Component: Database-SqliteAssignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: caulier.gilles, metzpinguin
Priority: NOR    
Version: 7.2.0   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit: Version Fixed In: 7.2.0
Sentry Crash Report:
Attachments: Backtraces from digikam lockups
Backtrace
Timestamped log showing UI lockup delay during possible mutex contention

Description griffiths_andy 2021-01-22 18:51:46 UTC
SUMMARY
Whenever I am doing any work on my database, occasionally the UI locks up for upwards of 15 seconds for no apparent reason. During the lockup I can't scroll anything in the album view (also applies in other views eg tags).

I have run digikam with the message debugging on, and the only thing I can correlate this with messages from digikam.general

digikam.general: Action Thread run  1  new jobs      # UI locks up now
digikam.general: One job is done
digikam.general: Cancel Main Thread                  # UI recovers

There's no indication of what that job is doing, but for it's duration I am locked out. I'm not doing anything special or out of the ordinary when this happens. I could be scrolling up and down in the album, or selecting images.

Is there any extra debug output I can enable? All other logging eg. thumbnails, detection, or SQL statments appear as normal.

I'm running a self compiled digikam build 7.2 from Arch AUR. The revision is https://invent.kde.org/graphics/digikam/commit/fd85e5002c0a0dc24a4583c67c3f157b55d35d29

I'm using the beta because I was keen to see if this issue had been resolved with newer revisions, but I find it's the same issue as I've had since 7.0.

This issue seems to have plagued me for quite some time in 7.0 onwards. I have a large database of images (300,000+) on MySQL backend, db is on SSD. Most of the images live on USB3 attached disk, but this problem affects images also in the internal SSD.

STEPS TO REPRODUCE
1. QT_LOGGING_RULES="digikam.*=true" digikam
2. Use digikam while observing the console
3. Note that lockup occurs on 'digikam.general: Action Thread run  1  new jobs' 
4. Note that lockup frees on 'digikam.general: Cancel Main Thread'

OBSERVED RESULT
UI locks up occasionally for a long time

EXPECTED RESULT
UI should not lockup

SOFTWARE/OS VERSIONS
Windows: 
macOS:
Linux/KDE Plasma: ArchLinux
(available in About System)
KDE Plasma Version: 5.20.5
KDE Frameworks Version: 5.78.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
Comment 1 griffiths_andy 2021-01-22 18:58:37 UTC
Forgot to mention. The digikam process hits 100% CPU while this is happening. The sub threads (QT processes, mysqld) appear to be unaffected.
Comment 2 caulier.gilles 2021-01-22 19:08:04 UTC
Can you reproduce this problem with 7.2.0 RC AppImage bundle for linux ?

https://files.kde.org/digikam/

Gilles Caulier
Comment 3 griffiths_andy 2021-01-22 19:13:45 UTC
I'll try, but I always have issues running the appimage, and have to mess around with eg.

QT_XCB_GL_INTEGRATION=none Downloads/digikam-7.2.0-beta1-x86-64.appimage

I only found that workaround after a long investigation.

I'll try again with the 7.2.0 RC
Comment 4 griffiths_andy 2021-01-22 19:27:28 UTC
QT_LOGGING_RULES="digikam.*=true" QT_XCB_GL_INTEGRATION=none Downloads/digiKam-7.2.0-rc-20210122T180600-x86-64.appimage

Still happens occasionally but the delay is much shorter, maybe only 2 or 3 seconds. It's a lot less frequent.

All the versions I have had a problem with in the past have either been Arch official package releases, or AUR compiles. So whatever is different is in the Arch libs I suppose.
Comment 5 Maik Qualmann 2021-01-22 19:59:18 UTC
The problem should only occur if the tags view in the right sidebar is open and many images are selected. In order to get the tag status (selected, partially selected, etc.) the tags and metadata of all images must be read. This is a sequential process for MySQL that is quite slow. I don't see any way to improve it at the moment. In frequent queries, MySQL is significantly slower as than SQLite on an SSD.

Maik
Comment 6 Maik Qualmann 2021-01-22 20:07:47 UTC
By the way, the text fields for title and description are the main problems. To get the status whether all have the same or different content, all must be read. If you have empty content here, it will take a particularly long time.

Maik
Comment 7 griffiths_andy 2021-01-22 20:27:29 UTC
(In reply to Maik Qualmann from comment #5)
> The problem should only occur if the tags view in the right sidebar is open
> and many images are selected. In order to get the tag status (selected,
> partially selected, etc.) the tags and metadata of all images must be read.
> This is a sequential process for MySQL that is quite slow. I don't see any
> way to improve it at the moment. In frequent queries, MySQL is significantly
> slower as than SQLite on an SSD.
> 
> Maik

That's not what I experience. It happens also when file properties is open, and when just scrolling through images, not necessarily selecting them, so there is no change in file properties (or tags if that view is open). I could understand that the query process for retrieving tags is time consuming, but that is not the only trigger.

Would you suggest I do a comparison with an SQLite database (if so I can do a conversion using the migration tool)? I'm on MySQL due to the recommendations re number of images.
Comment 8 griffiths_andy 2021-01-22 20:29:11 UTC
(In reply to Maik Qualmann from comment #6)
> By the way, the text fields for title and description are the main problems.
> To get the status whether all have the same or different content, all must
> be read. If you have empty content here, it will take a particularly long
> time.
> 
> Maik

Maik,
Are you commenting on the correct bug? Your reply seems to be unrelated to what I'm talking about.
Comment 9 Maik Qualmann 2021-01-22 21:37:49 UTC
Yes, I'm in the right thread ((:-))
Problems with scrolling are not known and I cannot reproduce them here with MySQL. Run digiKam in the GDB, stop it if the problem occurs, copy the backtrace and continue digiKam. Do this a few times. Then maybe we can assess what digiKam is doing.

Maik
Comment 10 griffiths_andy 2021-01-22 21:50:43 UTC
(In reply to Maik Qualmann from comment #9)
> Yes, I'm in the right thread ((:-))
> Problems with scrolling are not known and I cannot reproduce them here with
> MySQL. Run digiKam in the GDB, stop it if the problem occurs, copy the
> backtrace and continue digiKam. Do this a few times. Then maybe we can
> assess what digiKam is doing.
> 
> Maik

Ah Ok, just thought I'd check!

Ok, I'll look at producing the backtraces. I'll probably end up doing it a few times with different setups, all properly documented of course.

eg.
my build using MySQL
my build using SQLite
appimage using MySQL
appimage using SQLite

I'll probably update my build to the same rev as the 7.2 RC for consistency.

I'm currently converting a copy of my MySQL db to SQLite, as it's been a very long time since I've used digikam with SQLite and I'd like to compare performance anyway.

I'll report back...
Comment 11 griffiths_andy 2021-01-22 21:54:23 UTC
(In reply to griffiths_andy from comment #10)
> (In reply to Maik Qualmann from comment #9)
> > Yes, I'm in the right thread ((:-))
> > Problems with scrolling are not known and I cannot reproduce them here with
> > MySQL. Run digiKam in the GDB, stop it if the problem occurs, copy the
> > backtrace and continue digiKam. Do this a few times. Then maybe we can
> > assess what digiKam is doing.
> > 
> > Maik
> 
> Ah Ok, just thought I'd check!
> 
> Ok, I'll look at producing the backtraces. I'll probably end up doing it a
> few times with different setups, all properly documented of course.
> 
> eg.
> my build using MySQL
> my build using SQLite
> appimage using MySQL
> appimage using SQLite
> 
> I'll probably update my build to the same rev as the 7.2 RC for consistency.
> 
> I'm currently converting a copy of my MySQL db to SQLite, as it's been a
> very long time since I've used digikam with SQLite and I'd like to compare
> performance anyway.
> 
> I'll report back...

Hmm maybe not do the backtraces for the appimage..
Comment 12 Maik Qualmann 2021-01-22 22:02:15 UTC
The AppImage isn't that well suited for this. You cannot run it in an external GDB. You have to download the debug version of the AppImage and start it with the debug option because it has an internal GDB.

Maik
Comment 13 Maik Qualmann 2021-01-22 22:04:33 UTC
Another question, does the number of images in the icon view make any difference? How many images are we assuming in the view?

Maik
Comment 14 griffiths_andy 2021-01-22 22:10:41 UTC
The number of images varies with album. It could be 10 to 1000. I haven't noticed any correlation between the number vs. the likelihood or duration of the hang, but I'll look out for it.

Noted re. debug appimage. If I need to I'll try that one also.
Comment 15 caulier.gilles 2021-01-23 02:37:14 UTC
To Maik, comment #12:

debug appimage version do not have an internal GDB inside. the "debug" argument run the internal main executable with the GDB from the host system. See the main internal bash script used internally to run AppImage binary:

https://invent.kde.org/graphics/digikam/-/blob/master/project/bundles/appimage/data/AppRun#L127

The difference between the standard appimage bundle and the debug version is the debug symbols dropped or not from all binaries (.so, .a, executable files, etc.). This id done at last stage of bundle compilation where the bash script parse all binaries one by one to strip debug symbols.

https://invent.kde.org/graphics/digikam/-/blob/master/project/bundles/appimage/04-build-appimage.sh#L402

Typically both version of appimage bundles are the same excepted the strip debug stage, which reduce by 2 the file size...

Gilles
Comment 16 griffiths_andy 2021-01-23 12:27:27 UTC
Well, I can say with a small amount of confidence that I think my issue is related to my MySQL db.

I have completed the MySQL to SQLite conversion overnight, and I'm now trying it out in both my own compiled binary and in the appimage. Performance is now very good in both. There are no hangups while scrolling or selecting.

I have a total of 402,000 files in the db which gave a 10GB thumbnail table and 620MB ImageHaarMatrix table in the old MySQL db. Total image size is probably 4TB+. Obviously I haven't got a comparison yet in the SQLite db until the thumbnail db is populated, and I've yet to try face recognition again.

Would it be safe to say that my old db was severely underperforming and full of redundant data (despite regular db maintenance)? I've been using it since 6.4 IIRC.

If I get the time I'm going to try a MySQL to MySQL conversion and see whether there is any comparison in table sizes over time.

For the moment though, I'll continue with the new db and see whether running face recognition (and retagging ofc unfortunately as that was lost) has any detrimental effect. I don't think I need to do any tests for GDB backtraces.

Gilles & Maik, Thanks for your help and guidance so far!
Comment 17 griffiths_andy 2021-01-23 16:12:29 UTC
I'm getting lockups again, so I'll start doing the backtraces. So far it looks like Mutex locking in Digikam::CoreDbAccess::CoreDbAccess(), but I'll post proper logs later.
Comment 18 griffiths_andy 2021-01-23 16:36:05 UTC
Created attachment 135103 [details]
Backtraces from digikam lockups

Backtraces as requested.

A range of actions were the cause: scrolling, selecting, renaming, deleting.
Comment 19 griffiths_andy 2021-01-23 17:20:43 UTC
I noticed that my binary was built without symbols, so I'm going to recompile and retest.
Comment 20 Maik Qualmann 2021-01-23 17:41:08 UTC
Debug symbols are not necessary. I also know ao where the breakpoints are in the code. I have no idea why you are having the problem with MySQL. The CoreDbAccess() lock is wanted. The database is being accessed in another thread. So certain database accesses must take a long time before a response is received.

Maik
Comment 21 griffiths_andy 2021-01-23 17:52:34 UTC
I'm on SQLite now, as I mentioned above. Still experiencing the same issue occasionally.
Comment 22 Maik Qualmann 2021-01-23 21:52:57 UTC
Git commit e9b438fdac100a7f0ea85f8078aec4080b47468a by Maik Qualmann.
Committed on 23/01/2021 at 21:51.
Pushed by mqualmann into branch 'master'.

use item count from the item model
SELECT COUNT() can be very slow for many items in the
table. And we call it up very often when selecting items.

M  +1    -4    core/app/main/digikamapp.cpp
M  +1    -0    core/app/views/stack/itemiconview.h
M  +5    -0    core/app/views/stack/itemiconview_items.cpp
M  +0    -15   core/libs/database/coredb/coredb.cpp
M  +0    -6    core/libs/database/coredb/coredb.h
M  +5    -0    core/libs/database/models/itemmodel.cpp
M  +1    -0    core/libs/database/models/itemmodel.h

https://invent.kde.org/graphics/digikam/commit/e9b438fdac100a7f0ea85f8078aec4080b47468a
Comment 23 griffiths_andy 2021-01-23 23:12:18 UTC
Thanks. Compiling now, I'll report back...
Comment 24 griffiths_andy 2021-01-23 23:44:25 UTC
Created attachment 135120 [details]
Backtrace
Comment 25 griffiths_andy 2021-01-23 23:47:19 UTC
Maik,
It still does it. In the latest backtrace, I Ctrl-C as soon as I feel its locked up, and then 'bt' and leisurely read the output. Once I continue execution it still hangs for a number of seconds before the 'digikam.general: One job is done' message.
Comment 26 griffiths_andy 2021-02-07 23:26:00 UTC
I'm trying to accumulate a bit more information on this issue for you. Currently I'm running https://invent.kde.org/graphics/digikam/commit/4c6b84b15cd6111aa9d3dfb430635a754f40ccba directly compiled as per your standard instructions (not via Arch AUR). I'm using SQLite.

Given that the potential issue (still occurring) is happening around database access I wanted to pin down which SQL statements are active at the time I experience the hangs. I expected to see the SQL statements appearing in logs when QT_LOGGING_RULES="digikam.*=true", but they do not. Everything else is logged but not SQL.

eg. in core/libs/database/coredb/coredb.cpp

    QMap<QString, int>  map;

    QString queryString = QString::fromUtf8("SELECT COUNT(*), II.format "
                                            "FROM ImageInformation AS II "
                                            "INNER JOIN Images ON II.imageid=Images.id "
                                            " WHERE Images.status=1 ");

    if (category != DatabaseItem::UndefinedCategory)
    {
        queryString.append(QString::fromUtf8("AND Images.category=%1 ").arg(category));
    }

    queryString.append(QString::fromUtf8("GROUP BY II.format;"));
    qCDebug(DIGIKAM_DATABASE_LOG) << queryString;

etc.....

I expected with the logging level set as so, that all would be logged.
Comment 27 griffiths_andy 2021-02-07 23:35:29 UTC
To clarify.. I do see some digikam.database output when logging, just not the SQL.
Comment 28 griffiths_andy 2021-02-07 23:57:54 UTC
Hmm.. Have I just picked one of the very few places SQL is logged in that file? Is it not performed generally? I picked 

QMap<QString, int> CoreDB::getFormatStatistics(DatabaseItem::Category category) const

but I see that lots of other methods don't log.

Sorry for the misdirection if that's the case, but I think it would be a great help if SQL could be logged at the point the database call is actually performed.
Comment 29 Maik Qualmann 2021-02-08 08:54:55 UTC
If you can scroll the view, i.e. the items became visible. A large part of the database is already in the ItemInfo cache. So things like date, file size and much more and are not reloaded from the database again. There are very few things that are still loaded when scrolling when they are needed, e.g. the tags, but are then also in the cache. The thumbnails are also reloaded when they are needed, but you don't seem to have a thumbnail lock. But there are no database queries in digiKam that would take 15 seconds, even the complex SQL queries of an advanced search do not take a second. I suspect some other or external problem here.

Maik
Comment 30 griffiths_andy 2021-02-08 14:11:13 UTC
Agreed, the fundamental problem remains that it's having trouble for whatever reason waiting to lock a mutex before doing some operation and this locks up the whole UI. Logging the SQL was a roundabout way of trying to establish what its doing at that point. I don't have issues with any other application at the same time, there's plenty of memory, so no swap issues, no IO waits.

I did a test with QT_LOGGING_RULES="qt.*=true", and there were still plenty of Qt messages when it locked up while still moving the mouse around, *BUT* they appeared to be:

qt.qpa.events.reader: [heap]  202
qt.qpa.events.reader: [heap]  203
qt.qpa.events.reader: [heap]  204
qt.qpa.events.reader: [heap]  205
qt.qpa.events.reader: [heap]  206
qt.qpa.events.reader: [heap]  207
qt.qpa.events.reader: [heap]  208
qt.qpa.events.reader: [heap]  209
qt.qpa.events.reader: [heap]  210

etc.. which feels like the events were getting queued up. Could my issue be more Qt related?

I appreciate that you can't replicate this issue so I just need a bit of guidance so I can provide you with enough info to sort this out. I don't have any other hardware / OS combination on which to test so I have to work with what I've got.
Comment 31 griffiths_andy 2021-02-17 23:09:11 UTC
Created attachment 135802 [details]
Timestamped log showing UI lockup delay during possible mutex contention

More debugging info.

Build http://commits.kde.org/digikam/46fd680f2328c7ff3c55a0b538271c326c781063

Invoke digikam thru ts (timestamp input)

$ gdb digikam | ts | tee digikam_timestamped_log.txt

With this run I have attempted to timestamp the gdb output in order to accurately work out the delay during which the UI is unresponsive (accounting for the time in GDB examining the backtraces and threads.

In every case where I have performed this I have found that the UI is locked while the main thread and the "DBJobsThread" are contending for a mutex (to my eye...)

Overall there is a delay of 14 seconds, which occurs randomly, and usually at the most inconvenient point.

My comments in the attached file are prefixed **
Comment 32 Maik Qualmann 2021-02-18 06:50:10 UTC
We are at the mutex where the item view wants to read the information for the geo-icon in the thumbnail.
The mutex on which the thread is waiting has nothing to do with the CoreDbAccess mutex. I'll add a debug timer for how long each CoreDbAccess mutex is taking.

Maik
Comment 33 Maik Qualmann 2021-02-18 07:06:15 UTC
Git commit 9bed4e982ab4b2f294e5e5f9e106d55baa08a6f6 by Maik Qualmann.
Committed on 18/02/2021 at 07:05.
Pushed by mqualmann into branch 'master'.

add test debug for CoreDbAccess lock time

M  +10   -0    core/libs/database/coredb/coredbaccess.cpp

https://invent.kde.org/graphics/digikam/commit/9bed4e982ab4b2f294e5e5f9e106d55baa08a6f6
Comment 34 Maik Qualmann 2021-02-18 07:09:11 UTC
A message is output if the lock time is longer than 10 ms. First of all, we check whether a CoreDbAccess lock takes a long time.

Maik
Comment 35 Maik Qualmann 2021-02-18 07:13:41 UTC
Git commit c958341c0f442cf7bdc2e8adb35cc2f329bcd030 by Maik Qualmann.
Committed on 18/02/2021 at 07:13.
Pushed by mqualmann into branch 'master'.

restart timer after lock

M  +2    -2    core/libs/database/coredb/coredbaccess.cpp

https://invent.kde.org/graphics/digikam/commit/c958341c0f442cf7bdc2e8adb35cc2f329bcd030
Comment 36 caulier.gilles 2021-02-18 07:28:20 UTC
Andy,

The new AppImage Linux bundle including last changes from Maik will be available today late this evening at usual place:

https://files.kde.org/digikam/

Gilles Caulier
Comment 37 griffiths_andy 2021-02-18 10:41:47 UTC
Thanks...

I've compiled from source as usual, http://commits.kde.org/digikam/54974097ef336dc9ba1edaa90bbade42529ab084

Although I've run in GDB again I have not paused at the delay, as that affects the timer you've just coded in.

Instead I paused after the UI became responsive again:

digikam.general: Using  4  CPU core to run threads
[New Thread 0x7fff377fe640 (LWP 1026906)]
digikam.general: Action Thread run  1  new jobs
[New Thread 0x7fff37fff640 (LWP 1026907)]
digikam.coredb: CoreDbAccess lock time: 12870 ms
digikam.general: One job is done
[Thread 0x7fff377fe640 (LWP 1026906) exited]
digikam.general: Cancel Main Thread
[Thread 0x7fff37fff640 (LWP 1026907) exited]
^C
Thread 1 "digikam" received signal SIGINT, Interrupt.
0x00007ffff3ce946f in poll () from /usr/lib/libc.so.6
(gdb) Quit
(gdb) c
Continuing.
Comment 38 Maik Qualmann 2021-02-18 11:40:32 UTC
Ok, almost 13 seconds for a database query is a problem. However, I suspect the problem outside of digiKam. Let's see which query it concerns, we crash digikam at 10 seconds.

Maik
Comment 39 Maik Qualmann 2021-02-18 11:42:41 UTC
Git commit f9a9d695c5a499291581a59dc1163b7c752a0ea1 by Maik Qualmann.
Committed on 18/02/2021 at 11:41.
Pushed by mqualmann into branch 'master'.

crash at 10 seconds lock time

M  +2    -0    core/libs/database/coredb/coredbaccess.cpp

https://invent.kde.org/graphics/digikam/commit/f9a9d695c5a499291581a59dc1163b7c752a0ea1
Comment 40 griffiths_andy 2021-02-18 12:02:41 UTC
Heh.. this was straight away during program startup. Not even displaying main UI yet, just the splash.

digikam.general: Action Thread run  1  new jobs
[New Thread 0x7fff31ffb640 (LWP 1043703)]
digikam.coredb: CoreDbAccess lock time: 12022 ms
ASSERT: "d->timer.elapsed() < 10000" in file /home/andyg/projects/digikam/core/libs/database/coredb/coredbaccess.cpp, line 143
--Type <RET> for more, q to quit, c to continue without paging--

Thread 45 "Thread (pooled)" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff31ffb640 (LWP 1043703)]
0x00007ffff3c31615 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff3c31615 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff3c1a862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff41ef9ac in  () at /usr/lib/libQt5Core.so.5
#3  0x00007ffff41eed59 in qt_assert_x(char const*, char const*, char const*, int) () at /usr/lib/libQt5Core.so.5
#4  0x00007ffff70931a1 in Digikam::CoreDbAccess::~CoreDbAccess() (this=0x7fff31ffa748, __in_chrg=<optimized out>)
    at /home/andyg/projects/digikam/core/libs/database/coredb/coredbaccess.cpp:143
#5  0x00007ffff70adfbe in Digikam::DatesJob::run() (this=0x555556b2c610) at /home/andyg/projects/digikam/core/libs/database/dbjobs/dbjob.cpp:102
#6  0x00007ffff4230302 in  () at /usr/lib/libQt5Core.so.5
#7  0x00007ffff422cf0f in  () at /usr/lib/libQt5Core.so.5
#8  0x00007ffff3dc83e9 in start_thread () at /usr/lib/libpthread.so.0
#9  0x00007ffff3cf4293 in clone () at /usr/lib/libc.so.6
Comment 41 griffiths_andy 2021-02-18 12:12:17 UTC
A second try. Crashes at the same place. Digikam::DatesJob::run()

[New Thread 0x7fff31ffb640 (LWP 1045062)]
digikam.coredb: CoreDbAccess lock time: 11949 ms
ASSERT: "d->timer.elapsed() < 10000" in file /home/andyg/projects/digikam/core/libs/database/coredb/coredbaccess.cpp, line 143
--Type <RET> for more, q to quit, c to continue without paging--

Thread 45 "Thread (pooled)" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff31ffb640 (LWP 1045062)]
0x00007ffff3c31615 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff3c31615 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff3c1a862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff41ef9ac in  () at /usr/lib/libQt5Core.so.5
#3  0x00007ffff41eed59 in qt_assert_x(char const*, char const*, char const*, int) () at /usr/lib/libQt5Core.so.5
#4  0x00007ffff70931a1 in Digikam::CoreDbAccess::~CoreDbAccess() (this=0x7fff31ffa748, __in_chrg=<optimized out>)
    at /home/andyg/projects/digikam/core/libs/database/coredb/coredbaccess.cpp:143
#5  0x00007ffff70adfbe in Digikam::DatesJob::run() (this=0x555556c9b6f0) at /home/andyg/projects/digikam/core/libs/database/dbjobs/dbjob.cpp:102
#6  0x00007ffff4230302 in  () at /usr/lib/libQt5Core.so.5
#7  0x00007ffff422cf0f in  () at /usr/lib/libQt5Core.so.5
#8  0x00007ffff3dc83e9 in start_thread () at /usr/lib/libpthread.so.0
#9  0x00007ffff3cf4293 in clone () at /usr/lib/libc.so.6
Comment 42 Maik Qualmann 2021-02-18 13:20:49 UTC
We Are Here:

https://invent.kde.org/graphics/digikam/-/blob/master/core/libs/database/dbjobs/dbjob.cpp#L102

It is not normal for this query to take 10 seconds. maybe you could increase the 10000 times in the Q_Assert to 20000.

/home/andyg/projects/digikam/core/libs/database/coredb/coredbaccess.cpp, line 143

Maik
Comment 43 Maik Qualmann 2021-02-18 13:31:00 UTC
On the other hand, you could install a DB Browser for SQLite, open the digikam4.db file in the DB Browser and execute the following query:

SELECT creationDate FROM ImageInformation INNER JOIN Images ON Images.id=ImageInformation.imageid WHERE Images.status=1;

Maik
Comment 44 griffiths_andy 2021-02-18 14:04:05 UTC
Done, and it gets over that initial ASSERT crash. 

But now it doesn't ASSERT crash where we want it to, as the timer is still < 20000ms. So we need a way of not crashing where we expect the query to take a long time. 

I reduced it to 14000, to maybe find a point at which the initial query succeeds, but subsequent ones fail if >= 14000. However, even the UI hang timers are still < 14000, at roughly 12000ms. As the times look fairly consistent, maybe it's the same query?

So I went back to looking at the backtraces again by interrupting while there is the delay.

  204  Thread 0x7fff307f8640 (LWP 1073656) "Thread (pooled)" 0x00007ffff3ce425a in __xstat64 () from /usr/lib/libc.so.6
(gdb) thread 204
[Switching to thread 204 (Thread 0x7fff307f8640 (LWP 1073656))]
#0  0x00007ffff3ce425a in __xstat64 () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff3ce425a in __xstat64 () at /usr/lib/libc.so.6
#1  0x00007ffff3cb1e91 in __tzfile_read () at /usr/lib/libc.so.6
#2  0x00007ffff3cb1a3d in tzset_internal () at /usr/lib/libc.so.6
#3  0x00007ffff3cb1b4a in tzset () at /usr/lib/libc.so.6
#4  0x00007ffff3cb085d in timelocal () at /usr/lib/libc.so.6
#5  0x00007ffff4220ed0 in qMkTime(tm*) () at /usr/lib/libQt5Core.so.5
#6  0x00007ffff42f696e in  () at /usr/lib/libQt5Core.so.5
#7  0x00007ffff42f7008 in  () at /usr/lib/libQt5Core.so.5
#8  0x00007ffff42f8706 in QDateTime::toMSecsSinceEpoch() const () at /usr/lib/libQt5Core.so.5
#9  0x00007ffff42fbb11 in QDateTime::operator<(QDateTime const&) const () at /usr/lib/libQt5Core.so.5
#10 0x00007ffff706897d in qMapLessThanKey<QDateTime>(QDateTime const&, QDateTime const&) (key1=..., key2=...) at /usr/include/qt/QtCore/qmap.h:68
#11 0x00007ffff706d020 in QMapNode<QDateTime, int>::lowerBound(QDateTime const&) (this=0x7fff82fa70e0, akey=...)
    at /usr/include/qt/QtCore/qmap.h:152
#12 0x00007ffff70688d6 in QMapData<QDateTime, int>::findNode(QDateTime const&) const (this=0x7fff8000cb10, akey=...)
    at /usr/include/qt/QtCore/qmap.h:284
#13 0x00007ffff706472b in QMap<QDateTime, int>::find(QDateTime const&) (this=0x7fff307f7750, akey=...) at /usr/include/qt/QtCore/qmap.h:862
#14 0x00007ffff7056866 in Digikam::CoreDB::getAllCreationDatesAndNumberOfImages() const (this=0x555555ad7540)
    at /home/andyg/projects/digikam/core/libs/database/coredb/coredb.cpp:3268
#15 0x00007ffff70adfb2 in Digikam::DatesJob::run() (this=0x7fff80066fa0) at /home/andyg/projects/digikam/core/libs/database/dbjobs/dbjob.cpp:102
#16 0x00007ffff4230302 in  () at /usr/lib/libQt5Core.so.5
#17 0x00007ffff422cf0f in  () at /usr/lib/libQt5Core.so.5
#18 0x00007ffff3dc83e9 in start_thread () at /usr/lib/libpthread.so.0
#19 0x00007ffff3cf4293 in clone () at /usr/lib/libc.so.6
(gdb) 

Again we see Digikam::CoreDB::getAllCreationDatesAndNumberOfImages(), called from Digikam::DatesJob::run()

Maybe this query is taking too long whenever it is run, regardless of where it's called in the code.
Comment 45 griffiths_andy 2021-02-18 14:04:39 UTC
(In reply to Maik Qualmann from comment #43)
> On the other hand, you could install a DB Browser for SQLite, open the
> digikam4.db file in the DB Browser and execute the following query:
> 
> SELECT creationDate FROM ImageInformation INNER JOIN Images ON
> Images.id=ImageInformation.imageid WHERE Images.status=1;
> 
> Maik

Yes, that's my next move..
Comment 46 griffiths_andy 2021-02-18 14:15:03 UTC
andyg@arch ~/Pictures/digikam % sqlite3 digikam4.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> .timer on
sqlite> .mode csv
sqlite> .output datequery.csv
sqlite> SELECT creationDate FROM ImageInformation INNER JOIN Images ON Images.id=ImageInformation.imageid WHERE Images.status=1;
Run Time: real 0.174 user 0.136208 sys 0.037326
sqlite> .output stdout
sqlite> .q
andyg@arch ~/Pictures/digikam % wc -l datequery.csv
405890 datequery.csv


Query took hardly any time at all for 405890 results. Agreed, it shouldn't take 12 seconds in digikam.
Comment 47 griffiths_andy 2021-02-18 14:18:46 UTC
Maybe it's the post-processing in getAllCreationDatesAndNumberOfImages() to return datesStatMap that is taking the time?
Comment 48 Maik Qualmann 2021-02-18 15:08:52 UTC
The fact is, if you had such times for queries everywhere in digiKam, digiKam would be inoperable. So there must be something special about this query. There is an indication in the web of a QDateTime problem in current Qt versions that QDateTime is extremely slow. A large number of QDateTime objects are created in this function. I'm going to apply a new timer patch tonight.

Maik
Comment 49 griffiths_andy 2021-02-18 15:41:31 UTC
Thanks. I'll look forward to trying it.
Comment 50 griffiths_andy 2021-02-18 17:06:17 UTC
Just did my own rough & ready mod to log query and post-processing in core/libs/database/coredb/coredb.cpp:

QMap<QDateTime, int> CoreDB::getAllCreationDatesAndNumberOfImages() const
{
    qCDebug(DIGIKAM_COREDB_LOG) << "getAllCreationDatesAndNumberOfImages() query starts";
    QList<QVariant> values;

    d->db->execSql(QString::fromUtf8("SELECT creationDate FROM ImageInformation "
                                     "INNER JOIN Images ON Images.id=ImageInformation.imageid "
                                     " WHERE Images.status=1;"),
                   &values);

    qCDebug(DIGIKAM_COREDB_LOG) << "getAllCreationDatesAndNumberOfImages() query ends";
    QMap<QDateTime, int> datesStatMap;

    qCDebug(DIGIKAM_COREDB_LOG) << "getAllCreationDatesAndNumberOfImages() post-processing starts";
    foreach (const QVariant& value, values)
    {   
        if (!value.isNull())
        {   
            QDateTime dateTime = value.toDateTime();
            
            if (!dateTime.isValid())
            {
                continue;
            }

            QMap<QDateTime, int>::iterator it2 = datesStatMap.find(dateTime);

            if (it2 == datesStatMap.end())
            {
                datesStatMap.insert(dateTime, 1);
            }
            else
            {
                it2.value()++;
            }
        }
    }
    qCDebug(DIGIKAM_COREDB_LOG) << "getAllCreationDatesAndNumberOfImages() post-processing ends";
    return datesStatMap;
}


Then run thru ts to timestamp it:

andyg@arch ~/projects/digikam/build (git)-[master] % QT_LOGGING_RULES="digikam.core*=true" digikam 2>&1 | ts
Feb 18 17:03:46 QCommandLineParser: already having an option named "h"
Feb 18 17:03:46 QCommandLineParser: already having an option named "help-all"
Feb 18 17:03:46 QCommandLineParser: already having an option named "v"
Feb 18 17:03:46 digikam.coredb: Core database: running schema update
Feb 18 17:03:46 digikam.coredb: Core database: have a structure version  11
Feb 18 17:03:46 digikam.coredb: Core database: makeUpdates  11  to  11
Feb 18 17:03:50 [ALSOFT] (EE) Failed to set real-time priority for thread: Operation not permitted (1)
Feb 18 17:03:50 [ALSOFT] (EE) Failed to set real-time priority for thread: Operation not permitted (1)
Feb 18 17:03:50 kf.xmlgui: Unhandled container to remove :  Digikam::DigikamApp
Feb 18 17:03:50 digikam.coredb: CoreDbAccess lock time: 41 ms
Feb 18 17:03:53 digikam.coredb: CoreDbAccess lock time: 2119 ms
Feb 18 17:03:55 digikam.coredb: CoreDbAccess lock time: 676 ms
Feb 18 17:03:55 digikam.coredb: CoreDbAccess lock time: 66 ms
Feb 18 17:03:55 digikam.coredb: getAllCreationDatesAndNumberOfImages() query starts
Feb 18 17:03:55 digikam.coredb: getAllCreationDatesAndNumberOfImages() query ends
Feb 18 17:03:55 digikam.coredb: getAllCreationDatesAndNumberOfImages() post-processing starts
Feb 18 17:04:07 digikam.coredb: getAllCreationDatesAndNumberOfImages() post-processing ends
Feb 18 17:04:07 digikam.coredb: CoreDbAccess lock time: 12682 ms

Sub-second in the query, ~ 12 seconds in the QDateTime manipulation.
Comment 51 Maik Qualmann 2021-02-18 20:04:10 UTC
Ok, the problem was reported by KPhotoAlbum among others. Qt says it doesn't go faster, many things have to be calculated when comparing dates, daylight saving time, etc. Our problem is that we are using a QMap, so the QDateTime keys are compared again and again when inserted. A QHash is 10 times faster.

Maik
Comment 52 Maik Qualmann 2021-02-18 21:05:21 UTC
Git commit bec249c5dc8350b293c258fc767440139da3fcf0 by Maik Qualmann.
Committed on 18/02/2021 at 21:03.
Pushed by mqualmann into branch 'master'.

port QMap<QDateTime, int> to a QHash

M  +7    -7    core/app/date/timelinewidget.cpp
M  +1    -1    core/app/date/timelinewidget.h
M  +2    -2    core/app/views/sidebar/timelinesidebarwidget.cpp
M  +1    -1    core/libs/album/manager/albummanager.cpp
M  +2    -2    core/libs/album/manager/albummanager.h
M  +7    -7    core/libs/album/manager/albummanager_dalbum.cpp
M  +7    -6    core/libs/database/coredb/coredb.cpp
M  +2    -2    core/libs/database/coredb/coredb.h
M  +1    -1    core/libs/database/dbjobs/dbjob.cpp
M  +1    -1    core/libs/database/dbjobs/dbjob.h
M  +2    -2    core/libs/database/dbjobs/dbjobsthread.cpp
M  +1    -1    core/libs/database/dbjobs/dbjobsthread.h
M  +1    -1    core/libs/database/models/itemfiltersettings.h

https://invent.kde.org/graphics/digikam/commit/bec249c5dc8350b293c258fc767440139da3fcf0
Comment 53 Maik Qualmann 2021-02-18 21:11:39 UTC
Update to current git/master and please try to find out where it takes a long time in the item view when scrolling, or where it crashes after 10 seconds.

Maik
Comment 54 griffiths_andy 2021-02-18 21:40:40 UTC
Checking now..

BTW, I had to apply this change a while ago to get a working compile. Seems like a missing include.

diff --git a/core/tests/video/qtavcodecs.cpp b/core/tests/video/qtavcodecs.cpp
index 19b028de99..51d7ddcea0 100644
--- a/core/tests/video/qtavcodecs.cpp
+++ b/core/tests/video/qtavcodecs.cpp
@@ -22,6 +22,7 @@
  * ============================================================ */
 
 #include <QDebug>
+#include <QPainterPath>
 
 // QtAv includes
Comment 55 griffiths_andy 2021-02-18 21:50:35 UTC
Distinct improvement. I stashed and reapplied my local changes prior to compile. (Inc the patch in prev comment).

Feb 18 21:41:34 digikam.coredb: getAllCreationDatesAndNumberOfImages() query starts
Feb 18 21:41:34 digikam.coredb: getAllCreationDatesAndNumberOfImages() query ends
Feb 18 21:41:34 digikam.coredb: getAllCreationDatesAndNumberOfImages() post-processing starts
Feb 18 21:41:37 digikam.coredb: getAllCreationDatesAndNumberOfImages() post-processing ends
Feb 18 21:41:37 digikam.coredb: CoreDbAccess lock time: 3665 ms

I'll carry on testing and try to make it crash (response > 10000ms). Report back tomorrow.

Thanks very much. It seems like it might be a solution, although it still throws a 3sec delay occasionally.
Comment 56 Maik Qualmann 2021-02-19 17:59:24 UTC
Git commit c7bda4772d7731404ff0337000dabcbff253102b by Maik Qualmann.
Committed on 19/02/2021 at 17:58.
Pushed by mqualmann into branch 'master'.

calculate QDateTime outside of the DB lock
Related: bug 432257

M  +4    -36   core/libs/database/coredb/coredb.cpp
M  +2    -3    core/libs/database/coredb/coredb.h
M  +0    -12   core/libs/database/coredb/coredbaccess.cpp
M  +28   -2    core/libs/database/dbjobs/dbjob.cpp

https://invent.kde.org/graphics/digikam/commit/c7bda4772d7731404ff0337000dabcbff253102b
Comment 57 Maik Qualmann 2021-02-19 18:01:37 UTC
Please test if the problem with the git/master version is fixed.

Maik
Comment 58 griffiths_andy 2021-02-19 18:28:14 UTC
Compiled, and checking now.

I had to apply the patch I mentioned in comment 54. I also see I should have raised that as a separate report so I'll do so.

I'll report back later this evening with results, but tests of the QHash changes were going very well, even with the still present but much reduced delay. If the delay is gone by moving the loop outside the DB lock, then that will be very welcome.
Comment 59 griffiths_andy 2021-02-19 19:03:08 UTC
Hi Maik,

I'm happy. I have experienced no lockup delays while giving it a thorough going over... Seems that last change has really improved the performance due to moving the long running op outside the DB lock (I noticed you had removed the timer & assert).

Thank you for your efforts (and applying that other patch)! I think this can be closed at last.

Andy
Comment 60 Maik Qualmann 2021-02-19 19:06:07 UTC
Thank you for the feedback and for testing the changes.

Maik