Bug 475205 - [UX/Perf] : UI freezes when handling big projects / adding lots of clips in the resource folder
Status: REPORTED
Alias: None
Product: kdenlive
Classification: Applications
Component: User Interface
Version: 23.08.0
Platform: Arch Linux Linux
Importance: NOR major
Target Milestone: ---
Assignee: Jean-Baptiste Mardelle
URL:
Keywords: efficiency
Depends on:
Blocks:
 
Reported: 2023-10-04 08:52 UTC by Benoît Tarrade
Modified: 2023-10-20 04:10 UTC
1 user

See Also:
Latest Commit:
Version Fixed In:
snd.noise: performance+


Attachments
Kdenlive starts to freeze right away (loading a big project) (25.28 KB, image/png)
2023-10-04 08:52 UTC, Benoît Tarrade
Details

Description Benoît Tarrade 2023-10-04 08:52:30 UTC
Created attachment 162080 [details]
Kdenlive starts to freeze right away (loading a big project)

SUMMARY
Kdenlive's UI seems to be tightly coupled to the underlying processing, which can leave the UI event loop stuck. This happens either when:
* dropping a big folder into the resource explorer
* opening a saved project that points to a lot of resources.

In my case, I am working on a project with around 1500 items (videos and pictures), for a total of around 33 GB.
That's a lot of files, and Kdenlive (or MLT!) sometimes takes close to 10 minutes scanning all of them, making the UI totally unresponsive. This happens on every load, so caching does not seem to be sufficient. (By the way, I'm getting a lot of logs saying "GOT THUMBNAIL : -1 x -1"; I don't know exactly what that means, but it suggests the cache contains thumbnails at least.)

=> This looks like an I/O-bound issue: Kdenlive lacks asynchronicity when it comes to opening files and performing preprocessing tasks on them (parsing/loading content, computing thumbnails, etc.)

STEPS TO REPRODUCE
1. Open a new project
2. Drop a big folder with a few tens of clips into the resource explorer
3. Alternatively: save a big project and reopen it later. File indexing seems to take very long as well, and if the number of files is large it takes forever (especially if the data is stored on an aging HDD).

OBSERVED RESULT
The loading dialog appears (and is more or less frozen); most of the time the desktop manager steps in and asks whether you want to kill the process (because Qt needs to ping the desktop manager's watchdog periodically, otherwise the application is considered unresponsive).
Then you wait. Eventually Kdenlive finishes loading, or at least gets out of the work-intensive loop it was stuck in; the little "xxx tasks" counter suddenly increases and the UI comes back to life.

EXPECTED RESULT
The UI should remain responsive at all times, and the user should be able to cancel the loading process if needed.
Like "Whoops, I dropped the wrong folder! How do I stop it from loading this 4000-file rush folder??". For now you need to kill the process, or make one good cup of coffee :)

SOFTWARE/OS VERSIONS 
Linux/KDE Plasma:  Linux arch-linux 6.1.55-1-lts #1 SMP PREEMPT_DYNAMIC /// Gnome 44.5 / X11
Qt Version: Qt 5.15.x

ADDITIONAL INFORMATION

> /!\ I haven't taken the time to read the source code yet; maybe what I'm saying here is off (in which case I apologize), but I'm trying to lay out some ideas here.

My gut feeling is that we need to defer file loading as much as we can (pushing loading tasks into the background) and prioritize the loading order so that the end user can start working as soon as possible!
* A simple stat should be enough to check that a file is reachable and valid
* All I/O-bound operations need to be as asynchronous as possible (maybe using std::future for ease of use, or a background thread that waits on the I/O and moves on to the next item)
* Thumbnails should be computed only for the visible part of the viewport in the resource folder, and cached for later reuse. I believe the caching is already written, but thumbnail generation seems to take too much time (again, a gut feeling; I might well be wrong)
Comment 1 Benoît Tarrade 2023-10-04 11:58:44 UTC
After some digging, I can report a few things:

* Calculating hashes on every project load for every element (more precisely, documentProducers and documentChains) takes up some time.
* Updating the timeline with MLT takes a huge amount of time as well (I'm trying to profile that, but I'll probably end up using stopwatches because profiling is far too heavy).

I haven't gotten to the point of loading thumbnails and populating the resource explorer yet, but there is already a big target before that point.

For the hashes, I think we can safely try to:
* Compare the modification timestamp first
  * If the timestamp differs, compute the hash of the file
  * Alternatively, compute a hash of only the last N bytes of the file (we'd have to be very unlucky for two different files to share the same tail hash!). Because the dataset is much smaller, this should be noticeably faster.
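As a sketch of the tail-hash idea (illustrative only; FNV-1a stands in here for whatever hash function Kdenlive actually uses, and the function name is hypothetical):

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hash only the last `tailBytes` of a file instead of its whole content.
// Much cheaper on multi-GB clips: the read is bounded regardless of size.
std::uint64_t tailHash(const std::string &path, std::size_t tailBytes = 4096)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return 0; // unreadable file; callers should treat this as "missing"
    }
    const std::streamoff size = in.tellg();
    const std::streamoff start =
        size > static_cast<std::streamoff>(tailBytes) ? size - tailBytes : 0;
    in.seekg(start);
    std::vector<char> buf(static_cast<std::size_t>(size - start));
    in.read(buf.data(), buf.size());

    // FNV-1a, purely for illustration.
    std::uint64_t h = 1469598103934665603ull; // offset basis
    for (char c : buf) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull; // FNV prime
    }
    return h;
}
```

Hashing the file size together with the tail would make accidental collisions even less likely at no extra I/O cost.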

I understand that this adds to the XML tags in the Kdenlive document, but as it would probably be a new item (?) it shouldn't break serialization.
I believe we could also split the list of candidates to be checked (lines 182 to 193 of documentchecker.cpp, the linear for loop), break it into chunks and hand them to worker threads. I suppose there is no race condition around the hashing function (but I might be wrong; in that case parallelizing won't help much).
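The chunking idea could look roughly like the following generic helper (hypothetical names, and it assumes the per-item work touches no shared mutable state, which is exactly the open question above):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Split `items` into roughly equal chunks and process each chunk on its
// own worker thread. Only safe if `work` has no shared mutable state.
template <typename T>
void parallelFor(std::vector<T> &items,
                 const std::function<void(T &)> &work,
                 unsigned threadCount = std::thread::hardware_concurrency())
{
    if (threadCount == 0) threadCount = 1;
    const std::size_t chunk = (items.size() + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= items.size()) break;
        const std::size_t end = std::min(begin + chunk, items.size());
        // Each worker owns a disjoint [begin, end) range, so no locking
        // is needed on the vector itself.
        workers.emplace_back([&items, &work, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                work(items[i]);
            }
        });
    }
    for (auto &w : workers) {
        w.join();
    }
}
```

Since the candidates are independent files, hashing is a natural fit for this pattern, I/O bandwidth permitting.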

Furthermore, later in projectmanager.cpp, after all the hash calculations and a bunch of other work, the timeline is updated via the ProjectManager::updateTimeline(...) function.
Instantiation of the Mlt::Producer (std::unique_ptr) takes a lot of time as well.
I'll try to investigate that a bit.
Comment 2 Benoît Tarrade 2023-10-04 12:02:51 UTC
[Edit]: for now, this is the stack trace I get while profiling project loading with valgrind:

```
 Frame: Backtrace for Thread 1
   [ 0]  read (103077 x)
   [ 1]  QFSFileEngine::read(char*, long long) (101904 x)
   [ 2]  QFileDevice::readData(char*, long long) (99246 x)
   [ 3]  QIODevicePrivate::read(char*, long long, bool) (494613 x)
   [ 4]  QIODevice::read(char*, long long) (375127 x)
   [ 5]  qt_fill_input_buffer (373294 x)
   [ 6]  jpeg_fill_bit_buffer (147621986 x)
   [ 7]  decode_mcu (10730517 x)
   [ 8]  decompress_onepass (35681 x)
   [ 9]  process_data_context_main (568343 x)
   [10]  jpeg_read_scanlines (568343 x)
   [11]  read_jpeg_image(QImage*, QSize, QRect, QRect, int, void (*)(unsigned int*, unsigned char const*, int), jpeg_decompress_struct*, my_error_mgr*) (269 x)
   [12]  QJpegHandler::read(QImage*) (269 x)
   [13]  QImageReader::read(QImage*) (324 x)
   [14]  QImageReader::read() (269 x)
   [15]  refresh_qimage (269 x)
   [16]  producer_qimage_init (269 x)
   [17]  mlt_repository_create (665 x)
   [18]  mlt_factory_producer (335 x)
   [19]  create_producer.lto_priv.0 (335 x)
   [20]  producer_loader_init (330 x)
   [21]  mlt_repository_create (665 x)
   [22]  mlt_factory_producer (270 x)
   [23]  on_end_element (9016 x)
   [24]  xmlParseEndTag1.constprop.0 (56694 x)
   [25]  xmlParseElementEnd (56691 x)
   [26]  xmlParseContentInternal (4 x)
   [27]  xmlParseElement (4 x)
   [28]  xmlParseDocument (2 x)
   [29]  producer_xml_init (2 x)
   [30]  mlt_repository_create (7 x)
   [31]  mlt_factory_producer (2 x)
   [32]  Mlt::Producer::Producer(mlt_profile_s*, char const*, char const*) (2 x)
   [33]  ProjectManager::updateTimeline(bool, QString const&, QString const&, QDateTime const&, bool) [clone .constprop.0] (1 x)
   [34]  ProjectManager::doOpenFile(QUrl const&, KAutoSaveFile*, bool) (2 x)
   [35]  ProjectManager::openFile(QUrl const&) (1 x)
   [36]  ProjectManager::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) [clone .part.0] (2 x)
   [37]  ProjectManager::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (1 x)
```

It's painfully slow due to the Debug build and the profiling overhead; if I find the time, I'll try again with stopwatches for faster reporting.
Comment 3 farid 2023-10-18 12:04:01 UTC
Salut Benoît, there were recently some optimizations in Kdenlive and MLT to improve project loading. Could you try the latest AppImage, or the latest Kdenlive with MLT from the master branch?

Your profiling effort seems very helpful for finding other possible problems. Hopefully someone from the team can give you feedback soon if needed.
Comment 4 Benoît Tarrade 2023-10-19 02:43:46 UTC
Hi Farid, thank you for your reply.

I just tested now with both kdenlive and mlt on their master branches:
* kdenlive commit a89e93b672bf51f918945e1791bb8f24652bf871 -> Github Mirror
* mlt commit 04e78fc3ecfa75de4c8d7a5ebb84d2160d606d37

# Loading time observations
It takes more than 4 minutes (both Release builds) just to compute the hashes. This is far too long; in the meantime the whole application is frozen, and Gnome (in my case) keeps asking whether it should kill the rogue process, which is to be expected since Qt cannot process its event loop for some reason.

Once the hashes are computed and Kdenlive populates the resource folder, it proceeds to actually load the data into RAM using MLT.
This takes somewhere between 1 and 2 minutes on my machine; meanwhile the UI is responsive again (with the task count increasing fast in the top-left corner, then decreasing slowly as Kdenlive starts processing the data).

# Ram usage
A word about RAM usage (this would probably deserve another ticket; it is linked to this one but the cause is different).
I have 33 GB of data referenced in the project tree, and about 20 clips in the timeline, with an average size of 60 MB each.
Kdenlive (and thus MLT, in this case) keeps the whole timeline loaded in memory (around 10 GB); on my 16 GB machine, it's easy to see that I won't be able to add many more files to the timeline.

I tried to reproduce something similar in Blender, which works differently; here are the major differences:
* Blender has no project/source explorer, whereas Kdenlive does. A project explorer is very neat as it helps organize data, and that's a real plus for Kdenlive. However, on Blender's side there is absolutely no delay and no hash-calculation-induced delay: the experience is 100% transparent. IMHO, this is what Kdenlive should aim for.
* With 20 clips added to Blender's timeline, RAM usage is kept to a minimum and does not grow as more objects are added to the timeline. What makes RAM usage grow is the number of overlapping clips, because their data has to be processed together. Clip data is streamed on the fly! That makes for a video editor that can handle 100+ GB of clips while using only about a clip's worth of memory, sometimes less, depending on the clip size.

=> I am not advocating that one solution is conceptually better than the other; Kdenlive is crazy good software and is in the right spot to do the job. Blender has different goals and is built in a very singular way, and its video editing tool, despite being really powerful, has a peculiar UX compared to what we are accustomed to in the video editing industry. I think Kdenlive best fits this niche.
Comment 5 Benoît Tarrade 2023-10-20 03:52:28 UTC
Yo team!
I'm spamming again: I've done some testing and finally had the time to try the stopwatches.

Here are my findings so far around the `documentchecker.cpp` file, in the `hasError
Comment 6 Benoît Tarrade 2023-10-20 04:10:41 UTC
[And I cannot edit the previous comment ... Damn it!]

So I was saying : 
In `documentchecker.cpp::hasErrorInProject()` with my dataset :

```
// Note: I renamed some steps so that the function is easier to understand, you know, usual stuff :-)

"Starting project checking process - Playlist checks"
* 6 ms

"Validating document producers"
* 65 s

"Validating document chains"
* 22 s

"Checking luma files"
* 6.5 ms

"Checking for missing transitions"
* 6.4 ms

"Checking for missing filter assets"
* 6.8 ms

"Checking for missing effects"
* 6.4 ms
```

So, while the overall time here is around a minute and a half and could probably be shortened, that alone doesn't account for the full 4 minutes of project loading.
There may be other things happening here and there; I'll continue looking, but something is definitely going on here.

Also, I think you know very well how to do it, but I strongly advocate having this kind of instrumented code in the codebase:
```
#include <chrono>
#include <iostream>

#include <QFileInfo>

#define DECL_STOPWATCH(varname) \
    std::chrono::system_clock::time_point varname;

#define STOPWATCH_TIME_NOW(stopwatch) \
    stopwatch = std::chrono::system_clock::now();

#define LOG_DURATION(start, end)                                            \
    /* duration_cast avoids assuming the clock's native tick period */      \
    std::cout << "elapsed time: "                                           \
              << std::chrono::duration_cast<std::chrono::milliseconds>(     \
                     (end) - (start)).count() / 1000.0                      \
              << "s" << std::endl;

#define LOG(msg)                                                    \
    std::cout << QFileInfo(__FILE__).baseName().toStdString()       \
              << "::" << __func__ << "(): line "                    \
              << __LINE__ << ": " << msg << std::endl;


// And later in the code:
DECL_STOPWATCH(start);
DECL_STOPWATCH(end);

STOPWATCH_TIME_NOW(start);
LOG("Starting project checking process - Playlist checks");
// Some heavy process to be measured
// ...
// ...

STOPWATCH_TIME_NOW(end);
LOG_DURATION(start, end);
```
Note that this is a naive single-threaded implementation; a multithreaded one would be a tad trickier (the logging itself would need to be thread-safe), but the idea remains the same.

And this can easily be disabled in Release builds with a conditional macro like this:
```
#ifdef PERF_MONITORING
#define DECL_STOPWATCH(varname) \
    blahblahblah
// ... Other macros

#else
// No-op macros: they expand to nothing, so the variables are not even
// declared and this is transparent to the final code.
#define DECL_STOPWATCH(varname)
// ... Other macros

#endif // End of PERF_MONITORING macros.
```


# About logging 
[I have no lessons to give to anyone; I'm a software dev just like you, and these are a few personal thoughts about our common endeavour. I am not trying to rub anyone the wrong way ^^]

There are tools here and there in Kdenlive to perform some logging, although most of the logs you see in the console are actually emitted via qDebug.
I believe this is a hot topic in the Kdenlive dev community (I can't see why it wouldn't be!); it's probably due to a lack of resources and so many things to work on already, new features to push, etc.
However, I believe having a proper logging system (and a somewhat enforced one, be it through macros to improve performance, or switches to enable/disable logging if performance is really an issue) is key for Kdenlive's stability.
In my opinion, performance logging is really important, as are error cases.
Logs should also be filtered based on their criticality.
In our software industry, we always welcome detailed logs that actually tell you what's going on before you even open the codebase :)