I added a bunch of folders that contain ~200,000 video files and hit refresh to scan them into the database. Digikam crashes after about 2-5 seconds. This is repeatable. Digikam will not add 200,000 video files.
Is this reproducible with the 5.5.0 pre-release?
Yes, same thing happens with 5.5.0pre release.
Which solution can we apply to fix this entry:
1/ Disable autocompletion in the tree search field. Report this problem to the Qt team so that they open up the QCompleter API and we can use methods that are currently private.
2/ Reuse KCompletion by backporting its classes into digiKam core, with an API adjusted for digiKam.
I think you mean Bug 368468. This bug here has a different cause, possibly crash in Exiv2.
To Bug 36846:
The QCompleter is not the performance problem. This is fixed by a QTimer. The main problem is the ever slower adding of items to the QTreeView.
An edit function for the first minutes after the comment would not be bad...
We need a debugger backtrace to investigate in detail.
See this page for details :
In comment #4 you talk about the ever slower adding of items to the QTreeView.
Where exactly is the problem located? Did you profile execution time with Valgrind? Is it in the digiKam tree view item widget implementation? In the digiKam model populated from the DB? In the DB interface that fetches the data to host in the widget? In the Qt5 implementation?
At my office I wrote a fast shared memory mapping viewer in Qt5 using the QTreeView/item classes. I create the items in the tree view with no data, and I populate all items in a separate thread because it takes a lot of time.
At the end I call a tree view update in the main thread (X11 is not re-entrant). It's very fast. The number of items in the tree view is very large (more than 1000 entries).
Can we do the same in digiKam?
Still running the 5.5.0pre
Okay so I went to https://www.digikam.org/contrib and tried a few things with limited success. I will try more tomorrow.
First, the gdb in windows, not working well. I type in 'catch throw', and get back 'Catchpoint 1 (throw)', seems good. Then I type in 'run' and get back:
No executable specified, use `target exec'.
Not sure what to do here??
Second thing I tried is the third party debug tool from Sysinternals:
Looks like some bad stuff happening for about 10.2 seconds before it crashes:
00000009 1.02899146  digikam.general: Trying to load Embedded preview with libraw
00000010 1.02921200  digikam.rawengine: Failed to load embedded RAW preview
00000011 1.02923596  digikam.general: Trying to load half preview with libraw
00000012 1.02927971  digikam.general: Trying to load Embedded preview with Exiv2
00000013 1.04443121  digikam.dimg: "Removed file path and name" : QIMAGE file identified
00000014 1.04464126  digikam.dimg.qimage: Can not load " "Removed file path and name" " using DImg::QImageLoader!
00000015 1.04492271  digikam.general: mimetype = "" ext = "MOV"
00000016 1.04507148  digikam.general: Cannot create thumbnail for "Removed file path and name"
00000017 1.04512084  digikam.general: Thumbnail is null for "Removed file path and name"
I removed the file path and name for privacy reasons.
This repeats for various videos until the crash, and takes about 2/10ths of a second per loop? (Judging from the snippet I gave you.) Video file types are various: avi, flv, mov, mp4, and more; the example above is just mov.
This happens before the loops start when I hit refresh:
00000005 0.91890234  digikam.general: Using 8 CPU core to run threads
00000006 0.91933465  digikam.general: Action Thread run 1 new jobs
00000007 0.93396312  digikam.general: Cancel Main Thread
00000008 0.93400776  digikam.general: One job is done
I will try to get more info tomorrow.
Also two other questions. I turned off the album sync at startup because it was crashing. How do I start it manually? I thought that is what refresh does, but apparently refresh only updates the thumbnails.
Also, is it possible to do the FUZZY search on the thumbnails to find potential duplicates? This is my real intent. I want to cut those 200,000 videos down to 100,000.
If not, is this a future feature? Can it be one? High demand I think.
Spent some more time trying to figure out how to provide more data. While running the debugger I also found this line:
 digikam.metaengine: Exiv2 ( 3 ) : Xmp.video.Metadata dataLength was found to be larger than 5000 entries considered invalid; not read.
If there is anything else I can do to help debug this, let me know! Thank you.
The XMP warning is not the problem.
But it's known that Exiv2 has many problems with video files.
I recommend not trying to scan your huge collection in one go.
Start with a fresh database and add video files in chunks, step by step, until the crash appears. The goal is to isolate the file which triggers the crash.
After that, report the problem to the Exiv2 bugzilla with the identified video file for investigation. As the DK Windows installer includes the current Exiv2 source code, we can rebuild DK for Windows with the latest fix from Exiv2.
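The chunk-by-chunk isolation can also be done as a bisection, which needs far fewer import attempts than adding folders one at a time. The following is only a sketch: `isolate_bad_file` and the `crashes` callback are hypothetical names, not digiKam functions, and it assumes the crash is deterministic and triggered by individual files. In practice `crashes(chunk)` would mean "import this chunk into a fresh database and see whether digiKam dies".

```python
from typing import Callable, Optional, Sequence


def isolate_bad_file(
    files: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return one file that triggers the crash, or None if nothing crashes.

    `crashes` is a hypothetical callback: it gets a chunk of files and
    reports whether scanning that chunk reproduces the crash.
    """
    if not files or not crashes(files):
        return None

    lo, hi = 0, len(files)  # invariant: files[lo:hi] reproduces the crash
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(files[lo:mid]):
            hi = mid  # a culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return files[lo]
```

With 200,000 files this needs about 18 test imports instead of thousands, under the assumptions stated above.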
For your problem with GDB under Windows, if the command line version won't start digiKam (even though it works on my VM with Windows 7), you need to open a console and go to the directory where the gdb and digikam executables are installed (it's the same dir).
After that it's simple. Look at the generic page for details :
> Also, is it possible to do the FUZZY search on the thumbnails to find potential duplicates? This is my real intent. I want to cut those 200,000 videos down to 100,000.
> If not, is this a future feature? Can it be one? High demand I think.
The Fuzzy Search currently works only with still images.
A similar function for video would need an algorithm that creates a fingerprint of the first frame of the video, in order to compare it later with the DB.
This is how the fuzzy tool currently works: a simplified wavelet matrix is computed from the still image, and we compare the matrices with each other to find similarities.
For video we need a new matrix carrying the spatial information of the video. Not impossible, but complex to write and test.
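To illustrate the idea of a wavelet matrix used as a fingerprint, here is a minimal sketch in pure Python. It is not digiKam's implementation: the Haar decomposition, the `keep` parameter and the overlap-based `similarity` score are assumptions chosen for illustration, and the image is assumed to be a small square grayscale grid whose side is a power of two.

```python
def haar_1d(values):
    """Full 1-D Haar decomposition; len(values) must be a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = avg + diff  # averages first, then detail coefficients
        n = half
    return out


def haar_2d(image):
    """Transform every row, then every column of the row-transformed grid."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def signature(image, keep=8):
    """Position and sign of the `keep` strongest coefficients (DC excluded)."""
    coeffs = haar_2d(image)
    size = len(coeffs)
    flat = [
        (abs(coeffs[y][x]), y * size + x, coeffs[y][x] > 0)
        for y in range(size)
        for x in range(size)
        if (x, y) != (0, 0)
    ]
    flat.sort(key=lambda t: (-t[0], t[1]))
    return {(pos, sign) for mag, pos, sign in flat[:keep] if mag > 0}


def similarity(a, b):
    """Overlap of two signatures, between 0.0 and 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Because the average (DC) coefficient is left out, a uniformly brightened copy of an image gets the same signature, which is the kind of robustness a duplicate search needs.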
Are the thumbnails not readily available to do the fuzzy search on? I know they are not the biggest but I think they are big enough, or if there is a setting to render them a slightly higher resolution... That is how I imagined it would work anyways, since the thumbnails would already be generated, half the work is already done to fuzzy search videos...
(In reply to Poz from comment #13)
> Are the thumbnails not readily available to do the fuzzy search on? I know
> they are not the biggest but I think they are big enough, or if there is a
> setting to render them a slightly higher resolution... That is how I
> imagined it would work anyways, since the thumbnails would already be
> generated, half the work is already done to fuzzy search videos...
Sadly, it is not this easy.
The fuzzy search creates a signature from images. This does not hold for videos. Videos are quite more complex as the signature creation must be uniformly done for all videos. But if videos have black frames in the beginning, the search would lead to results which are, let's say, rubbish. The most stable way I see is to take the first frame from every video that is not plain, i.e. single-coloured. But this means we would have to generate images until we find the first appropriate frame. This would slow down the fingerprints generation significantly.
A stable implementation is not trivial here. I will think about a way more closely over the weekend.
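A rough sketch of the "first frame that is not plain" idea, to make the cost visible: frames have to be decoded one after another until one passes the test. Everything here is hypothetical (the `tolerance` threshold, the `max_frames` limit, and frames given as grids of grayscale values); it says nothing about how a decoder would actually deliver frames.

```python
def is_plain(frame, tolerance=8):
    """True if all pixels lie within `tolerance` grey levels of each other."""
    pixels = [p for row in frame for p in row]
    return max(pixels) - min(pixels) <= tolerance


def first_non_plain(frames, max_frames=250):
    """Return (index, frame) of the first non-plain frame, or None.

    `frames` can be any iterable, e.g. a generator decoding lazily, so
    only as many frames as needed are produced. `max_frames` bounds the
    work for videos that stay black for a long time.
    """
    for index, frame in enumerate(frames):
        if index >= max_frames:
            break
        if not is_plain(frame):
            return index, frame
    return None
```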
This will be a quite long text - sorry. But I want to make the problems as clear as possible.
I thought about the fuzzy search for videos a bit more during my train travel.
In fact, even the first non-plain frame is worthless. If a user really wants to use digiKam as catalog for videos (which is not the scope of digiKam in first place IMHO), he will potentially have videos that have the same beginning, i.e. intro but are different videos. Thus, also the first non-plain frame will potentially lead to rubbish. I remember that I found some tools to find video duplicates. The process they applied was to take the first n images of a video and compare it to all others. A quite bad process IMO as with m videos you generate n*m images and then have to make a comparison. This is awfully bad from the view of complexity theory. And in practice, this process is, as can be expected, awfully slow.
Nevertheless, the process is probably the best way to really recognise duplicate videos. So, a way could be to generate a fingerprint over the first or last n images (slows down fingerprint generation extremely). This still is not robust as many videos may have the same intro (at least the first m seconds, e.g. about m*25 frames). Usual intros take many seconds. So a *rather* stable approach would be to take 1000 frames. As you can imagine, this is a big amount of data to compute fingerprints for. Just imagine your 200,000 videos. Fingerprinting them would mean generating 200,000,000 images. Every image must be generated, which is not a constant-time process but at least linear time. So, even with 1000 videos, I would expect computation time to be measured in hours, not minutes.
Let's take a look from the other side, outros are far more distinct than intros. So, a lower number n can be taken, e.g. 100. This reduces the time quite a lot. But is probably still not satisfying.
If no or only short intros/outros are there, only a few images should be sufficient and the process could work quite well.
But we cannot estimate how the videos are structured. The FPS count may/will differ from video to video. So, working on frames explicitly may again lead to low-quality results. So, the best way would be to take the n first/last seconds, and then the complexity cannot really be estimated here.
Also, I think, users should decide themselves, how many seconds are taken (configuration) and if beginning or ending should be taken (configuration again).
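The numbers above can be checked with a quick calculation. The per-frame cost used in the example is purely an assumed figure to get an order of magnitude; the real cost depends on codec, resolution and hardware.

```python
def frames_for_duration(seconds, fps):
    """Number of frames covered by `seconds` of video at `fps`."""
    return round(seconds * fps)


def total_frames(video_count, frames_per_video):
    """Images that must be generated to fingerprint all videos."""
    return video_count * frames_per_video


def estimated_hours(frame_total, ms_per_frame):
    """Wall-clock estimate; `ms_per_frame` is an assumed decode+hash cost."""
    return frame_total * ms_per_frame / 1000 / 3600


# 40 seconds at 25 fps is the 1000 frames mentioned above.
print(frames_for_duration(40, 25))
# 200,000 videos at 1000 frames each.
print(total_frames(200_000, 1000))
# 1000 videos, assuming 10 ms per frame (hypothetical figure).
print(round(estimated_hours(total_frames(1000, 1000), 10), 1), "hours")
```

The same 1000 frames are about 33 seconds at 30 fps, which is one reason why a setting in seconds is easier to reason about than a setting in frames.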
So, *if* this feature should be implemented, I see the following options for users:
1) Take the first non-plain frame for fingerprinting (fast, probably not suitable for e.g. cinema movies)
2) Take the n first seconds for fingerprinting (probably awfully slow, may be suitable for e.g. cinema movies, overkill for self-produced movies)
3) Take the n last seconds for fingerprinting (probably slow, probably suitable for e.g. cinema movies, less overkill for self-produced movies)
In a more precise algorithmic way, we would need an adaptation of the fingerprints maintenance stage:
Option 1: take the first non-plain frame for video fingerprints
Option 2: take the Option(number n) Option(first,last) seconds for video fingerprinting.
Changing the current options *must* trigger deletion of the current video fingerprints, as otherwise different fingerprintings would coexist, which leads to wrong results - except if rebuilding all fingerprints is chosen.
Then, the fuzzy search could probably work without adaptations - but I am not completely sure if it would work out of the box.
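The invalidation rule could look roughly like this. It is only a sketch: `VideoFingerprintSettings` and `FingerprintStore` are hypothetical names, not existing digiKam classes, and the real fingerprints live in the database rather than in a dict.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VideoFingerprintSettings:
    mode: str              # "first_frame" (option 1) or "seconds" (option 2)
    seconds: int = 0       # n, only used when mode == "seconds"
    from_end: bool = False  # False: first n seconds, True: last n seconds


@dataclass
class FingerprintStore:
    settings: VideoFingerprintSettings
    video_fingerprints: dict = field(default_factory=dict)

    def apply_settings(self, new_settings):
        """Switch settings; drop video fingerprints if they no longer match.

        Returns True when existing fingerprints were invalidated, so the
        caller knows a rebuild of the video fingerprints is required.
        """
        if new_settings == self.settings:
            return False
        self.video_fingerprints.clear()  # never mix fingerprinting methods
        self.settings = new_settings
        return True
```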
At my office we capture infrared shot sequences of events in a Tokamak to catch physical malfunctions during experiments.
A video can last a little over 2 minutes in HD, no more. More than 20 experiments can be done in a day. All videos are stored losslessly in a database.
There is no camera movement. Only the plasma inside the machine changes the content. Depending on the experiment parameters, the video content will be different.
We have a process to recognize similar videos in the database. It is written in Matlab. As far as I know, the process cuts the first frames where there is nothing (black hole) until the light begins. After that, a wavelet fingerprint is computed from a flat image taken from some frames inside the video. The whole video is not analyzed, but the algorithm tries to detect the edges of change and adjusts the fingerprint by parsing a section of the movie. This is how the spatial (temporal) dimension is processed.
For each file, the fingerprint gives the average similarity of the video compared to the others. When physicists want to look into experiments, they just take a video made with given Tokamak settings and check whether another one is similar. The goal is to see if physical events are similar even if the parameters are different.
Of course, it's a special use case, as the videos are static shots with changing content, but I think the process is not too bad if we want to apply it to a small section of DSC movies.
Note: I only know the theory. The code is not available, of course.
Wow the discussion here is fantastic. Thank you for the time and thought!
So yes, the approach I suggested of just using the thumbnails is clearly not robust enough given the wide array of video content out there.
I think a lot of the problems come from very uniform videos, for example standard intros or outros. My case has very non-uniform videos (without any intro or outros) where I can run through Windows Explorer and find duplicates myself from simply looking at the thumbnails, so I know at least 20% are duplicates just from simple observation. The problem is that it is too much to go through that many files and click each one individually. I have used Digikam before on photos for duplicates and was amazed at how well it worked so naturally I thought, 'man, I wish I could get digikam to access these thumbnails for me, I could get rid of +95% of these duplicates in a day'. I know there could be false positives, but I could live with 1% or something like that. To further get rid of false positives there could be a video length option of +-X seconds (default at 2 or something).
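The video length option could work as a cheap filter applied to candidate pairs after the thumbnail comparison. A small sketch, with hypothetical function names and durations assumed to be already known in seconds:

```python
def durations_match(a_seconds, b_seconds, tolerance=2.0):
    """True if two video lengths differ by at most `tolerance` seconds."""
    return abs(a_seconds - b_seconds) <= tolerance


def filter_by_duration(pairs, durations, tolerance=2.0):
    """Keep only candidate duplicate pairs whose lengths roughly agree.

    `pairs` holds (file_a, file_b) tuples from a thumbnail-based search,
    `durations` maps each file name to its length in seconds.
    """
    return [
        (a, b)
        for a, b in pairs
        if durations_match(durations[a], durations[b], tolerance)
    ]
```

Two files that look alike in their thumbnails but differ in length by minutes would be dropped as likely false positives.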
I currently use http://www.alldup.de/alldup_help/alldup.php
The content method works very well, I would say less than 0.001% false positives. But it misses so very very much. It can take up to 48 hours to run, but builds a database so it only compares new files added into the search. I even use the file size method; for large files, this works very well. Smaller files (<10 MB?) tend to have more false positives. Unfortunately due to different compression and file types this does not catch them all either.
I think in the end, until computer hardware is faster, video duplicate searches will require a number of different methods and some user input. Until then that is what we have to work with/ around. I was just hoping for another way to slim down on this video database. Thumbnail seemed like low hanging fruit.
I disabled video metadata support in the Exiv2 shared library used with the Windows installer. The new version can be downloaded from the GDrive repository in a few minutes :
Can you reproduce the problem with this version?
Typically, the video file will be registered in the database, but video metadata will not be parsed to populate the database.
Thanks in advance for your feedback
I tried the version with disabled video metadata support in Exiv2 shared library that you just posted.
It allows me to import all of the video files! Success! However they all appear to be gray boxes with no thumbnails. Perhaps this is a separate issue?
I recommend stopping digiKam, dropping the thumbnail-digikam.db file, and restarting it.
Force a thumbnail rebuild with the F5 key when you are in an album with videos. Does this fix the problem?
I had Digi closed, deleted the thumbnail-digikam.db and started Digi. I hit F5 and it rebuilt the thumbnails in a few mins. Everything flashed and then still only gray video boxes.
I suspect a possibly missing ffmpeg codec for your videos.
Can you share some video samples through the cloud to reproduce the problem?
Can you install the debugview program and run digiKam, then press F5 in a video album? debugview will capture all debug statements from digiKam. See this page for details :
These are the codecs I have installed: https://www.codecguide.com/download_kl.htm
The mega version and updated.
I will look into sharing some video samples through the cloud to reproduce the problem. However I believe it is a sheer numbers problem as sometimes a few thumbnails load, even up to 20-30 videos show thumbnails. As soon as I scroll, they flip back to gray. Not a particular video in the group that is causing the issue.
Here is the output from debugview:
I edited path names and video locations for privacy reasons. Also this is a snippet of starting digiKam with the thumbnail database removed, and after hitting F5. For each video file it simply repeats, making a very large file where the only thing different is the file name.
Let me know if you have trouble viewing that pastebin and I will copy the text here.
No. The codecs that you have installed are not used by digiKam.
We compile ffmpeg codecs for the QtAV player used in digiKam.
This kind of error is explicit :
avcore\npsm\localprovider\baseprovider\lib\baseprovider.cpp(604)\NPSMDesktopProvider.dll!00007FFA59332140: (caller: 00007FFA593326E5) ReturnHr(497) tid(2de8) 80070490 Element not found.
A codec is missing for your video. avcore comes from the libav codecs in ffmpeg.
Which kind of video type do you use exactly?
The video file types are various, avi, flv, mov, mp4, wmv, and more... A big mix.
Sorry, I do have those errors as well, I thought they were part of a different problem I am having with Oculus Rift cameras because they occur at about the same time.
00000351 2.07417393  avcore\npsm\localprovider\baseprovider\lib\baseprovider.cpp(604)\NPSMDesktopProvider.dll!00007FFA59332140: (caller: 00007FFA593326E5) ReturnHr(495) tid(2de8) 80070490 Element not found.
00000352 2.07450104  shell\explorer\taskband2\taskband2.cpp(4148)\explorer.exe!00007FF6BC80792A: (caller: 00007FFA75E67DE3) ReturnHr(588) tid(2e28) 80004005 Unspecified error
00000353 5.56119251  shell\lib\bindctx.cpp(128)\explorerframe.dll!00007FFA3647F200: (caller: 00007FFA364A24EA) ReturnHr(21) tid(828) 80070057 The parameter is incorrect.
00000354 5.56124878  shell\lib\bindctx.cpp(128)\explorerframe.dll!00007FFA3647F200: (caller: 00007FFA364A24EA) ReturnHr(22) tid(bf0) 80070057 The parameter is incorrect.
How do I ensure I have the codecs installed correctly?
The codecs are in digiKam, included at compilation time.
There is no way for the moment to know which codecs are available. I know that only GPL2 licensed codecs are installed. No GPL3 and no patented codecs are compiled for legal reasons.
A new 5.6.0 pre-release bundle is available here :
Please check if this problem is still reproducible with this version.
Thanks in advance
digiKam 5.6.0 is now released and available as a bundle for Linux, MacOS and Windows.
Can you check if the problem still exists with this version?
Thanks in advance
New digiKam 5.7.0 pre-release bundles are built with the current implementation:
Is the problem still reproducible?
With 6.0.0, we now have a low-level FFmpeg metadata parser based on the libav C API for video file database registration.
The Exiv2 video support is not used anymore, as this code is buggy and nobody in Exiv2 seems motivated to finalize it.
The problem from the original post of this report should be fixed now, and video metadata support with ffmpeg should be enough to populate database entries.