Bug 372435 - Use multiple cpu cores for duplicate search [patch]
Status: RESOLVED FIXED
Alias: None
Product: digikam
Classification: Applications
Component: Searches-Similarity
Version: 5.3.0
Platform: Other Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Digikam Developers
 
Reported: 2016-11-13 15:41 UTC by Simon
Modified: 2021-03-30 11:57 UTC
Version Fixed In: 7.3.0


Description Simon 2016-11-13 15:41:58 UTC
When searching a large number of pictures for duplicates, the search takes a long time. One CPU core is at 100% the whole time, while the others are almost idle.
It would be great if all CPU cores were used to speed up the search.
Comment 1 Barbara Scheffner 2016-11-14 07:07:06 UTC
I support this request.
Comment 2 caulier.gilles 2016-11-14 11:36:44 UTC
Look here:

https://quickgit.kde.org/?p=digikam.git&a=blob&&f=utilities%2Fmaintenance%2Fduplicatesfinder.cpp

This is the maintenance tool used to process a duplicates search album, through the multithreaded DBJob interface.

Currently, multicore support is not enabled for this DBJob. Enabling it could be a simple solution, if the Haar algorithm from the database is sufficiently re-entrant. I must admit that I never tested this case, which is why multicore support is not enabled here.

Gilles Caulier
Comment 3 Christian David 2016-11-15 18:15:53 UTC
Git commit 64d8749496a04c3be88300f040ae1c1e14af8103 by Christian Dávid.
Committed on 15/11/2016 at 18:09.
Pushed by christiand into branch 'master'.

Fixed constructors in KBanking

Some class members were not initialized correctly what potentially
caused crashes.
FIXED-IN: 5.0

M  +11   -2    kmymoney/plugins/kbanking/mymoneybanking.cpp

http://commits.kde.org/kmymoney/64d8749496a04c3be88300f040ae1c1e14af8103
Comment 4 caulier.gilles 2016-11-15 18:17:38 UTC
Comment #3 comes from another world (:=)))
Comment 5 Christian David 2016-11-15 18:22:06 UTC
(In reply to caulier.gilles from comment #4)
> Comment #3 comes from another world (:=)))

I am sorry (you were fast reopening this bug)!
Comment 6 Ralf Habacker 2016-11-16 06:51:16 UTC
Git commit ee2e88ec3dc89d21ecfdadcb3131ea6bb4ad7c76 by Ralf Habacker.
Committed on 15/11/2016 at 22:54.
Pushed by habacker into branch '4.8'.

Fixed constructors in KBanking

Some class members were not initialized correctly what potentially
caused crashes.
FIXED-IN: 5.0
(cherry picked from commit 64d8749496a04c3be88300f040ae1c1e14af8103)

M  +11   -2    kmymoney/plugins/kbanking/mymoneybanking.cpp

http://commits.kde.org/kmymoney/ee2e88ec3dc89d21ecfdadcb3131ea6bb4ad7c76
Comment 7 Alister Troup 2017-06-28 12:51:17 UTC
This still appears to be an issue, at least on the Windows version 5.6.0 x64.

I'm running on a Xeon 1231 v3 (4 cores plus HT) with 16GB RAM, using a MariaDB server on localhost, with an SSD for the OS and database data.

Currently doing a "Find Duplicates" on a large collection 

Digikam.exe

Threads 254 
CPU 12 
Average CPU 12.42

Process Explorer 
msvcrt.dll!ftime64_s+0x180 thread is ~12.5 CPU

Stack capture

ntoskrnl.exe!memset+0x61a
ntoskrnl.exe!KeWaitForMultipleObjects+0xd52
ntoskrnl.exe!KeWaitForSingleObject+0x19f
ntoskrnl.exe!PoStartNextPowerIrp+0xbd0
ntoskrnl.exe!PoStartNextPowerIrp+0x186d
ntoskrnl.exe!PoStartNextPowerIrp+0x1ae7
libdigikamdatabase.dll!ZN7Digikam9HaarIface14calculateScoreERNS_4Haar13SignatureDataES3_RNS1_7WeightsEPPNS1_12SignatureMapE+0xd4
libdigikamdatabase.dll!ZN7Digikam9HaarIface14searchDatabaseEPNS_4Haar13SignatureDataENS0_10SketchTypeER5QListIiENS0_28DuplicatesSearchRestrictionsExi+0xb12
libdigikamdatabase.dll!ZN7Digikam9HaarIface24bestMatchesWithThresholdExPNS_4Haar13SignatureDataEddR5QListIiENS0_28DuplicatesSearchRestrictionsENS0_10SketchTypeE+0xd2
libdigikamdatabase.dll!ZN7Digikam9HaarIface32bestMatchesForImageWithThresholdExddR5QListIiENS0_28DuplicatesSearchRestrictionsENS0_10SketchTypeE+0x158
libdigikamdatabase.dll!ZN7Digikam9HaarIface14findDuplicatesERK4QSetIxEddNS0_28DuplicatesSearchRestrictionsEPNS_20HaarProgressObserverE+0x273
libdigikamdatabase.dll!ZN7Digikam9HaarIface29findDuplicatesInAlbumsAndTagsERK5QListIiES4_NS0_16AlbumTagRelationEddNS0_28DuplicatesSearchRestrictionsEPNS_20HaarProgressObserverE+0x351
libdigikamdatabase.dll!ZN7Digikam9HaarIface23rebuildDuplicatesAlbumsERK5QListIiES4_NS0_16AlbumTagRelationEddNS0_28DuplicatesSearchRestrictionsEPNS_20HaarProgressObserverE+0x67
libdigikamdatabase.dll!ZN7Digikam11SearchesJob3runEv+0x8f2
Qt5Core.dll!ZN11QThreadPool5clearEv+0xe5
Qt5Core.dll!ZN7QThread21setTerminationEnabledEb+0x14c
msvcrt.dll!srand+0x93
msvcrt.dll!ftime64_s+0x1dd
kernel32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x21


Memory is 696 MB private Bytes

I/O is 380B average peaks to 26KB
Comment 8 caulier.gilles 2020-08-02 12:55:34 UTC
digiKam 7.0.0 stable release is now published:

https://www.digikam.org/news/2020-07-19-7.0.0_release_announcement/

We need fresh feedback on this report using this version.

Best Regards

Gilles Caulier
Comment 9 Bruno Abinader 2021-03-28 21:16:13 UTC
Proposal in https://bugs.kde.org/show_bug.cgi?id=372435.
Comment 10 Bruno Abinader 2021-03-28 21:19:23 UTC
(In reply to Bruno Abinader from comment #9)
> Proposal in https://bugs.kde.org/show_bug.cgi?id=372435.

Sorry, wrong link: https://invent.kde.org/graphics/digikam/-/merge_requests/54
Comment 11 Maik Qualmann 2021-03-30 11:21:20 UTC
Git commit ba796dd744c9ca7e596fc97eb1dc45f22e4f30cd by Maik Qualmann, on behalf of Bruno de Oliveira Abinader.
Committed on 30/03/2021 at 10:00.
Pushed by mqualmann into branch 'master'.

Run HaarIface::findDuplicates lock-free in parallel

This works by splitting the duplicates finding logic in 4 major steps:

1. Resolve all image ids before starting the searches jobs during DuplicatesFinder::slotStart().
2. Create a shared HaarIface with signature cache in SearchesDBJobsThread to be used by all SearchesJob in parallel.
3. Break down the whole "images to scan" set into iterator ranges, and run these in parallel (lock-free).
4. Rebuild (or update) the search albums in the database.

Step 3) can be run lock-free in parallel with some adjustments e.g. because we're using constant iterator ranges, it is
not possible to remove unused images from the cache when running multithread. Also because we use ranges in step 3),
sometimes the same search album is generated multiple times in separate threads using different reference images; in
step 4) we ensure the aggregated results are filtered so there's only one search album with similar images per
duplicates found.

M  +26   -15   core/libs/database/dbjobs/dbjob.cpp
M  +10   -1    core/libs/database/dbjobs/dbjob.h
M  +74   -9    core/libs/database/dbjobs/dbjobsthread.cpp
M  +6    -2    core/libs/database/dbjobs/dbjobsthread.h
M  +18   -17   core/libs/database/haar/haariface.cpp
M  +8    -14   core/libs/database/haar/haariface.h

https://invent.kde.org/graphics/digikam/commit/ba796dd744c9ca7e596fc97eb1dc45f22e4f30cd
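
For readers following the commit message in comment 11, steps 3 and 4 can be illustrated with a small standalone sketch. This is not digiKam code: ImageId, Signature, SignatureCache, DuplicateGroup, searchRange and findDuplicatesParallel are hypothetical names invented here, the "signature" is a plain integer compared for equality rather than a Haar score checked against a threshold, and std::async stands in for the SearchesJob thread pool. It only shows the general idea: a cache that is read but never modified can be searched from several threads without locks, each thread handles one iterator range of reference images, and the aggregated results are filtered so that each duplicate group is reported once.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not digiKam types.
using ImageId        = std::int64_t;
using Signature      = int;  // toy signature: equal values count as "duplicates"
using SignatureCache = std::map<ImageId, Signature>;
using DuplicateGroup = std::set<ImageId>;

// Search one iterator range of reference images against the whole cache.
// The cache is only read here, so concurrent calls need no lock.
std::vector<DuplicateGroup> searchRange(const SignatureCache& cache,
                                        std::vector<ImageId>::const_iterator begin,
                                        std::vector<ImageId>::const_iterator end)
{
    std::vector<DuplicateGroup> groups;

    for (auto it = begin; it != end; ++it)
    {
        const Signature reference = cache.at(*it);
        DuplicateGroup group;

        for (const auto& entry : cache)
        {
            if (entry.second == reference)
            {
                group.insert(entry.first);
            }
        }

        if (group.size() > 1)  // an image alone is not a duplicate
        {
            groups.push_back(group);
        }
    }

    return groups;
}

// Split the ids into ranges, search them in parallel, then filter the
// aggregated results so each group of similar images appears only once.
std::set<DuplicateGroup> findDuplicatesParallel(const SignatureCache& cache,
                                                const std::vector<ImageId>& ids,
                                                std::size_t workers)
{
    workers                 = std::max<std::size_t>(1, workers);
    const std::size_t chunk = (ids.size() + workers - 1) / workers;
    std::vector<std::future<std::vector<DuplicateGroup>>> jobs;

    for (std::size_t start = 0; start < ids.size(); start += chunk)
    {
        const std::size_t stop = std::min(ids.size(), start + chunk);
        const auto first       = ids.cbegin() + static_cast<std::ptrdiff_t>(start);
        const auto last        = ids.cbegin() + static_cast<std::ptrdiff_t>(stop);

        jobs.push_back(std::async(std::launch::async, searchRange,
                                  std::cref(cache), first, last));
    }

    // Different ranges can find the same group from different reference
    // images; inserting into a set keeps one copy of each group.
    std::set<DuplicateGroup> unique;

    for (auto& job : jobs)
    {
        for (const auto& group : job.get())
        {
            unique.insert(group);
        }
    }

    return unique;
}
```

In this toy version the same group is always rebuilt identically from any of its members, so a std::set is enough to filter the aggregated results. With threshold-based similarity, groups found from different reference images may only overlap, so the real filtering in step 4 presumably has to do more than an exact comparison.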