Summary: | [UX/Perf]: UI freezes when handling big projects / adding lots of clips in the resource folder | |
---|---|---|---
Product: | [Applications] kdenlive | Reporter: | Benoît Tarrade <benoittarrade>
Component: | User Interface | Assignee: | Jean-Baptiste Mardelle <jb>
Status: | REPORTED | |
Severity: | major | CC: | snd.noise
Priority: | NOR | Keywords: | efficiency
Version: | 23.08.0 | Flags: | snd.noise: performance+
Target Milestone: | --- | |
Platform: | Arch Linux | |
OS: | Linux | |
Latest Commit: | | Version Fixed In: |
Attachments: | Kdenlive starts to freeze right away (loading a big project) | |
Description
Benoît Tarrade
2023-10-04 08:52:30 UTC
After some digging, I can tell a few things:

* Calculating hashes at every project load for every element (more exactly documentProducers and documentChains) takes up some time.
* Updating the timeline with MLT takes a huge amount of time as well (I'm trying to profile that, but I will probably end up using stopwatches because profiling is far too heavy).

I haven't gotten to the point of loading thumbnails and populating the resource explorer yet, but there is a big target before this.

For the hashes, I think we can safely try to:

* Compare the modification timestamp first.
* Only if the timestamp differs, compute the hash of the file.
* Alternatively, compute a hash of only the last N bytes of the file (we'd have to be very unlucky for two different files to share the same tail hash!). Since the dataset is much smaller, this should be considerably faster.

I understand that this means adding to the XML tags in the Kdenlive document, but as this would probably be a new item (?) it shouldn't break serialization.

I believe we can also split the list of candidates to be checked (lines 182 to 193 of documentchecker.cpp, with the linear for loop) into chunks and hand them to worker threads. I assume there is no race condition around the hashing function (but I might be wrong; in that case parallelizing this loop won't help much).

Furthermore, later in projectmanager.cpp, after all the hash calculations and a bunch of other work, we update the timeline using the ProjectManager::updateTimeline(...) function. Instantiation of the Mlt::Producer (std::unique_ptr) takes a lot of time as well. I'll try to investigate that a bit.
[Edit]: for now, this is the stack trace I have while profiling project loading with Valgrind:

```
Frame: Backtrace for Thread 1
[ 0] read (103077 x)
[ 1] QFSFileEngine::read(char*, long long) (101904 x)
[ 2] QFileDevice::readData(char*, long long) (99246 x)
[ 3] QIODevicePrivate::read(char*, long long, bool) (494613 x)
[ 4] QIODevice::read(char*, long long) (375127 x)
[ 5] qt_fill_input_buffer (373294 x)
[ 6] jpeg_fill_bit_buffer (147621986 x)
[ 7] decode_mcu (10730517 x)
[ 8] decompress_onepass (35681 x)
[ 9] process_data_context_main (568343 x)
[10] jpeg_read_scanlines (568343 x)
[11] read_jpeg_image(QImage*, QSize, QRect, QRect, int, void (*)(unsigned int*, unsigned char const*, int), jpeg_decompress_struct*, my_error_mgr*) (269 x)
[12] QJpegHandler::read(QImage*) (269 x)
[13] QImageReader::read(QImage*) (324 x)
[14] QImageReader::read() (269 x)
[15] refresh_qimage (269 x)
[16] producer_qimage_init (269 x)
[17] mlt_repository_create (665 x)
[18] mlt_factory_producer (335 x)
[19] create_producer.lto_priv.0 (335 x)
[20] producer_loader_init (330 x)
[21] mlt_repository_create (665 x)
[22] mlt_factory_producer (270 x)
[23] on_end_element (9016 x)
[24] xmlParseEndTag1.constprop.0 (56694 x)
[25] xmlParseElementEnd (56691 x)
[26] xmlParseContentInternal (4 x)
[27] xmlParseElement (4 x)
[28] xmlParseDocument (2 x)
[29] producer_xml_init (2 x)
[30] mlt_repository_create (7 x)
[31] mlt_factory_producer (2 x)
[32] Mlt::Producer::Producer(mlt_profile_s*, char const*, char const*) (2 x)
[33] ProjectManager::updateTimeline(bool, QString const&, QString const&, QDateTime const&, bool) [clone .constprop.0] (1 x)
[34] ProjectManager::doOpenFile(QUrl const&, KAutoSaveFile*, bool) (2 x)
[35] ProjectManager::openFile(QUrl const&) (1 x)
[36] ProjectManager::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) [clone .part.0] (2 x)
[37] ProjectManager::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (1 x)
```

It's painfully slow due to the Debug build and
underlying profiling; if I have the time, I'll try again with stopwatches for faster reporting.

Salut Benoît, there were recently some optimizations in Kdenlive and MLT to improve project loading. Could you try using the latest AppImage, or the latest Kdenlive with MLT from the master branch, to test? Your profiling effort seems very helpful for finding other possible problems. Hopefully someone from the team can give you feedback soon if needed.

Hi Farid, thank you for your reply. I just tested now with kdenlive on the master branch and mlt on the master branch as well:

* kdenlive commit a89e93b672bf51f918945e1791bb8f24652bf871 -> GitHub mirror
* mlt commit 04e78fc3ecfa75de4c8d7a5ebb84d2160d606d37

# Loading time observations

It takes more than 4 minutes (both Release builds) just to compute the hashes. This is far too long; in the meantime the whole application is frozen, and GNOME (in my case) keeps asking whether it should kill the unresponsive process. That is to be expected, as Qt can't poll the event loop for some reason.

Once the hashes are computed and Kdenlive has populated the resource folder, it proceeds to actually load the data into RAM using MLT. This takes somewhere between 1 and 2 minutes on my machine, and meanwhile the UI is responsive again (the task count in the top left corner grows quickly, then decreases slowly as Kdenlive processes the data).

# RAM usage

A word about RAM usage (this should probably go in another ticket; it is linked to this one but the cause is different). I have 33 GB of data referenced in the project tree, and about 20 clips in the timeline with an average size of 60 MB each. Kdenlive (thus MLT, in this case) keeps the whole timeline loaded in memory (around 10 GB), and since I have a 16 GB machine, it is easy to see that I won't be able to add many more files to the timeline.
I tried to reproduce something close to this in Blender, which works differently. Here are the major differences:

* No project/source explorer, whereas Kdenlive has one. Having a project explorer is very neat as it helps organize data, and that's a real plus for Kdenlive. However, there is absolutely NO delay and no hash-calculation-induced delay on Blender's side; the experience is 100% transparent. I believe this is what Kdenlive should aim for, IMHO.
* With 20 clips added to Blender's timeline, RAM usage is kept to a minimum and does not grow as more objects are added to the timeline. What makes RAM usage grow is the number of overlapping clips, because their data has to be processed together. Clip data is streamed on the fly! That makes for a video editor that can handle 100+ GB of clips while still only using roughly the size of the active clips in memory, sometimes less.

=> I am not advocating that one solution is conceptually better than the other. Kdenlive is a crazy good piece of software and it's in the right spot to do the job. Blender has different goals and is built in a very singular manner, and its video editing tool, despite being really powerful, has a peculiar UX compared to what we are accustomed to in the video editing industry. I think Kdenlive best fits this niche.

Yo team! I'm spamming again: I've done some testing and finally had the time to try the stopwatches. Here are my findings so far around the `documentchecker.cpp` file, in the `hasError [And I cannot edit the previous comment ... Damn it!]
So I was saying: in `documentchecker.cpp::hasErrorInProject()`, with my dataset:

```
// Note: I renamed some steps so that the function is easier to understand, you know, usual stuff :-)
Starting project checking process - Playlist checks : 6 ms
Validating document producers                       : 65 s
Validating document chains                          : 22 s
Checking luma files                                 : 6.5 ms
Checking for missing transitions                    : 6.4 ms
Checking for missing filter assets                  : 6.8 ms
Checking for missing effects                        : 6.4 ms
```

So while this overall time is around a minute and a half and could probably be shortened, it does not account on its own for the full 4 minutes of project loading. There might be other things happening here and there; I'll keep looking, but something is clearly going on here.

Also, I think you know very well how to do it, but I strongly advocate having this kind of instrumented code in the codebase:

```
#include <chrono>
#include <iostream>

#define DECL_STOPWATCH(varname) \
    std::chrono::system_clock::time_point varname;

#define STOPWATCH_TIME_NOW(stopwatch) \
    stopwatch = std::chrono::system_clock::now();

#define LOG_DURATION(start, end) \
    std::cout << "elapsed time: " \
              << std::chrono::duration_cast<std::chrono::seconds>(end - start).count() \
              << "s" << std::endl;

#define LOG(msg) \
    std::cout << QFileInfo(__FILE__).baseName().toStdString() \
              << "::" << __func__ << "(): line " \
              << __LINE__ << ": " << msg << std::endl;

// And later in the code:
DECL_STOPWATCH(start);
DECL_STOPWATCH(end);

STOPWATCH_TIME_NOW(start);
LOG("Starting project checking process - Playlist checks");
// Some heavy process to be measured
// ...
STOPWATCH_TIME_NOW(end);
LOG_DURATION(start, end);
```

Note that this is a naive single-threaded implementation; a multithreaded one would be a tad trickier, involving thread-safe logging in the loop, but the idea remains the same. It can also easily be disabled in Release builds with a conditional macro like this:

```
#ifdef PERF_MONITORING
#define DECL_STOPWATCH(varname) \
    blahblahblah
// ... other macros
#else
// Do-nothing macros!
#define DECL_STOPWATCH(varname)
// Variables are not even declared, so this is transparent to the final code
#endif // End of PERF_MONITORING macros.
```

# About logging

[I have no lessons to give to anyone; I'm a software dev just like you, and here are a few personal thoughts about our common endeavour. I am not trying to rub anyone the wrong way ^^]

There are tools here and there in Kdenlive to perform some logging, although most of the logs you see in the console actually go through qDebug. I believe this is a hot topic in the Kdenlive dev community (I can't see why it wouldn't be!); it's probably due to a lack of resources and so many things to work on already, pushing new features, etc. However, I believe a proper logging system (and a somewhat enforced one, be it through macros to keep performance up, or switches to enable/disable logging if performance is really an issue) is key for Kdenlive's stability. In my opinion, performance logging is really important, as are error cases. Logs should also be filtered based on their criticality, etc. In our software industry, we always welcome detailed logs that actually tell you what's going on before you even open the codebase :)