Bug 306290 - Dolphin is unusable on huge folders (200,000 files)
Summary: Dolphin is unusable on huge folders (200,000 files)
Status: RESOLVED WORKSFORME
Alias: None
Product: dolphin
Classification: Applications
Component: general
Version: 2.1
Platform: OpenSUSE Linux
Priority: NOR, Severity: normal, 20 votes
Target Milestone: ---
Assignee: Dolphin Bug Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-05 11:39 UTC by network723
Modified: 2013-06-27 09:27 UTC (History)
2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Multithreading Mergesort (6.26 KB, patch)
2012-10-22 13:07 UTC, Emmanuel Pescosta

Description network723 2012-09-05 11:39:29 UTC
Dolphin is unusable on folders containing huge numbers of files (about 200,000 in my case). When opening such a directory, Dolphin becomes very slow and starts using 100% of one CPU core.
Tested on KDE 4.7, 4.8 and 4.9, with previews disabled. Also tried disabling nepomuk, but there was no difference.

Reproducible: Always

Steps to Reproduce:
1. start dolphin
2. navigate to a folder containing 200,000 files
3. enjoy the slowness
Actual Results:  
The file manager becomes very slow. Unusably slow.
Comment 1 network723 2012-09-21 11:53:06 UTC
The same happens with Krusader.
Comment 2 Mark 2012-09-28 17:34:49 UTC
Confirmed in dolphin 2.1 (kde 4.9.1)

To reproduce, execute this very simple command:
NOTE: please do this in a separate folder!
touch a{1..100000}.txt

That generates 100,000 files, which certainly slows Dolphin down massively when you dare to open that folder.
Comment 3 Mark 2012-09-28 17:55:09 UTC
The CPU usage was ~16% in my case. That happened to be 100% for 1 core.
Comment 4 Mark 2012-09-29 00:23:41 UTC
While looking into this issue I found some interesting things.

The time is spent in three areas.

1. 1/3rd of the time is spent on sorting!
2. 1/3rd of the time is spent on drawing the layout for the massive number of files. This layout refreshes whenever you scroll up/down, which causes a really massive slowdown.
3. The last 1/3rd is spent in the rest of the application.

So it seems like the layout drawing needs some work to handle massive folders.
Comment 5 Frank Reininghaus 2012-09-29 11:08:36 UTC
Thanks for the bug report and for your analysis, Mark!

(In reply to comment #4)
> The time is spent in three areas.
> 
> 1. 1/3rd of the time is spent on sorting!
> 2. 1/3rd of the time is spent on drawing the layout for the massive amount
> of files. This one refreshes once you scroll up/down and that is causing a
> real massive slowdown.
> 3. the last 1/3rd is in the rest of the application.

How exactly did you get this data? Do you have more detailed runtime/CPU-cycle data? If the (apparently very long) time that Dolphin needs to open such a folder is equally distributed over three areas, it looks like there isn't really one particular bottleneck that we could work on.
Comment 6 network723 2012-09-29 13:32:57 UTC
I noticed that when I open a huge folder, Dolphin's memory usage rises quickly to ~500 megabytes and then increases very slowly. I'm using KDE on two 64-bit machines, with 16 and 8 gigabytes of RAM.
Comment 7 Mark 2012-09-29 18:16:09 UTC
(In reply to comment #5)
> Thanks for the bug report and for your analysis, Mark!
> 
> (In reply to comment #4)
> > The time is spent in three areas.
> > 
> > 1. 1/3rd of the time is spent on sorting!
> > 2. 1/3rd of the time is spent on drawing the layout for the massive amount
> > of files. This one refreshes once you scroll up/down and that is causing a
> > real massive slowdown.
> > 3. the last 1/3rd is in the rest of the application.
> 
> How exactly did you get these data? Do you have more detailed runtime/CPU
> cycles data? If the (apparently very long) time that Dolphin needs to open
> such a folder is equally distributed over three areas, it looks like there's
> not really one particular bottleneck that we could work on.

Well, the data is a bit... unreliable.

What I did was simply compile Dolphin in debug mode and then do a valgrind run on a folder with the massive number of files in it.

That results in the numbers I gave above. However, that isn't a fair profiling run, since the directory collection and sorting "seem" to be done only once. In that run they take 1/3rd of the time, but over time the share of the rest should increase while that one stays roughly the same.

In other words, the painting takes up most of the time overall, and that part should be investigated further. I plan to do that with a more specific profiling run started right after the sorting is done.

It is quite tricky to get good profiling numbers.

A note on the sorting: even though it's done only once (an assumption), it's still a very slow operation, and someone should take a look to see if it can be done in a more optimal way. I will open another bug for that.
Comment 8 Emmanuel Pescosta 2012-10-21 22:18:35 UTC
(In reply to comment #7)
> Note for the sorting. Even though it's done only once (assumption) it's
> still a very slow operation where someone should also take a look to see if
> it can be done in a more optimal way. I will open another bug for that.

Hello everyone. 
I have tried to speed up the sorting a little bit. I implemented different sorting algorithms, such as 3-way quicksort, combsort, insertion sort, ... and I have also tried to combine the current sorting algorithm, which is merge sort, with insertion sort and quicksort (if fewer than 10 items -> sort with insertion sort).

Furthermore, I have played with multithreading. (Thanks to QtConcurrent::run ;)

The result:
The current sorting algorithm (merge sort) has the best overall performance. With the multithreaded implementation I got a good performance improvement for huge folders.
Dolphin with single-threaded merge sort needs about 6700 ms to sort 500,000 files; the multithreaded merge sort needs only 4100 ms.

I will upload the patch tomorrow.
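For anyone curious what such a hybrid looks like, here is a minimal standalone sketch in plain C++ (this is an illustration, not the actual Dolphin/Qt code; the threshold of 10 mirrors the "fewer than 10 items" heuristic mentioned above and is otherwise an assumption):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Threshold below which insertion sort takes over; 10 mirrors the
// heuristic described above (an assumption in this sketch).
constexpr std::size_t kInsertionThreshold = 10;

// Simple insertion sort via rotate: stable and fast on tiny ranges.
template <typename It, typename Less>
void insertionSort(It first, It last, Less less) {
    for (It i = first; i != last; ++i) {
        std::rotate(std::upper_bound(first, i, *i, less), i, std::next(i));
    }
}

// Merge sort that falls back to insertion sort on small subranges.
template <typename It, typename Less>
void hybridMergeSort(It first, It last, Less less) {
    const std::size_t n = std::distance(first, last);
    if (n <= kInsertionThreshold) {
        insertionSort(first, last, less);
        return;
    }
    It mid = first + n / 2;
    hybridMergeSort(first, mid, less);
    hybridMergeSort(mid, last, less);
    std::inplace_merge(first, mid, last, less);  // stable merge of the halves
}
```

Since both the base case and the merge are stable, the hybrid stays stable overall, which matters for sorting views where equal keys should keep their relative order.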
Comment 9 Mark 2012-10-21 23:48:30 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Note for the sorting. Even though it's done only once (assumption) it's
> > still a very slow operation where someone should also take a look to see if
> > it can be done in a more optimal way. I will open another bug for that.
> 
> Hello everyone. 
> I have tried to speed up the sorting a little bit. I implemented different
> sorting algorithms, such as 3-way quick-sort, comb-sort, insertion-sort, ...
> and I have also tried to combine the actual sorting-algorithm, which is a
> merge-sort algorithm, with insertion-sort and quick-sort (if less than 10
> items -> sort with insertion-sort).
> 
> Furthermore I have played with multithreading. (Thanks to QtConcurrent::run
> ;)
> 
> The result:
> The current sorting algorithm (merge-sort) has the best overall performance.
> With the multi-threading implementation I got a good performance improvement
> for huge folders.
> Dolphin with single-threaded merge sort needs about 6700 ms to sort 500,000
> files; the multithreaded merge sort needs only 4100 ms.
> 
> I will upload the patch tomorrow.

Hi,

That's a nice improvement! Perhaps you might be interested in this report (which is closed): https://bugs.kde.org/show_bug.cgi?id=307585 I did manage to cut the time in half, but that was using ugly hacks, in a way completely unacceptable for merging into Dolphin. If you can manage to take that approach in a clean way, then you can already speed the sorting up by 2x without even using multithreading.

If you do add in multithreading, then I imagine it will be even faster! Your speedup does make me wonder why it's not faster, though. This job can be parallelized perfectly, so something must be preventing it from going faster than the ~1/3rd speedup you got.

A piece of advice: the bottleneck here is the KStringHandler::naturalCompare function used for sorting. If you manage to optimize that, then you optimize everything. One of the bottlenecks in that function is the QStringRef::localeAwareCompare call, though there is more that becomes a bottleneck when run millions of times.

Another way to really speed things up (which I also did) is splitting the folders out from the rest of the data. The compare function evaluates an if/else on every comparison to determine whether an item is a file or a folder. Just by splitting that out you already gain quite a big performance improvement, even if you don't have any folders: it saves millions and millions of checks. That obviously only works if the "folders before files" option is selected in Dolphin.

My personal goal is to get this sorting (for 200,000 files) done in under 1 second. It must be possible, because on the command line you can run "ls -v", which does the same sorting and does it very rapidly!

I actually think that we're all approaching this the wrong way. I don't know what the right approach would be, but I can't imagine that sorting 100,000 entries (or 200,000, or even 1 million) can't be done in under 1 second on today's computers. We must be missing something very obvious here.

QuickSort isn't the issue. The natural-compare code is.
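For readers unfamiliar with it, the following standalone sketch illustrates what a natural comparison does: digit runs compare numerically, so "file9" sorts before "file10". This is a simplified illustration, not the kdelibs KStringHandler::naturalCompare implementation; in particular, the locale-aware collation that makes the real function expensive is deliberately omitted:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simplified natural comparison: digit runs compare as numbers, everything
// else compares character by character. Ties between numerically equal runs
// with different leading zeros (e.g. "01" vs "1") are not broken here.
int naturalCompare(const std::string& a, const std::string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (std::isdigit((unsigned char)a[i]) && std::isdigit((unsigned char)b[j])) {
            // Extract both digit runs, strip leading zeros, compare numerically.
            std::size_t ia = i, jb = j;
            while (ia < a.size() && std::isdigit((unsigned char)a[ia])) ++ia;
            while (jb < b.size() && std::isdigit((unsigned char)b[jb])) ++jb;
            std::string da = a.substr(i, ia - i), db = b.substr(j, jb - j);
            da.erase(0, da.find_first_not_of('0'));
            db.erase(0, db.find_first_not_of('0'));
            if (da.size() != db.size()) return da.size() < db.size() ? -1 : 1;
            if (int c = da.compare(db)) return c < 0 ? -1 : 1;
            i = ia; j = jb;
        } else {
            if (a[i] != b[j]) return a[i] < b[j] ? -1 : 1;
            ++i; ++j;
        }
    }
    if (i == a.size() && j == b.size()) return 0;
    return i == a.size() ? -1 : 1;
}
```

Even in this stripped-down form, the per-character digit scanning hints at why calling such a function millions of times during a sort dominates the runtime.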
Comment 10 Mark 2012-10-22 00:18:51 UTC
I think Burstsort is the way to go. It is specifically designed for naturally sorting massive numbers of strings. Please do take a look at http://en.wikipedia.org/wiki/Burstsort and some of the PDF links in there. There is even a GPLed C++ implementation (not LGPL, though) linked there!
Comment 11 Frank Reininghaus 2012-10-22 07:45:56 UTC
(In reply to comment #8)
> I have tried to speed up the sorting a little bit. I implemented different
> sorting algorithms, such as 3-way quick-sort, comb-sort, insertion-sort, ...
> and I have also tried to combine the actual sorting-algorithm, which is a
> merge-sort algorithm, with insertion-sort and quick-sort (if less than 10
> items -> sort with insertion-sort).

Thanks for looking into this, Emmanuel. Maybe we should have posted a link to bug 307585 here; that might have saved you some time.

> Furthermore I have played with multithreading. (Thanks to QtConcurrent::run
> ;)

Parallelization indeed looks like a way to make mergesort faster without much effort (provided that an easy-to-use framework is used) and with a rather low risk of unwanted side effects. I'm curious to see your patch.

But please note that I consider opening folders with hundreds of thousands of files a corner case. If we can make sorting considerably faster in such cases without making the code much more complex, fair enough. But spending vast amounts of time to make sorting huge numbers of files only slightly faster, while at the same time adding more code paths, making maintenance more difficult and possibly introducing bugs, is not a good idea IMHO.

(In reply to comment #9)
> That's a nice improvement! Perhaps you might be interested in this report
> (which is closed): https://bugs.kde.org/show_bug.cgi?id=307585 I did manage
> to cut the time in half only that was using ugly hacks and in a completely
> unacceptable way to be merged in dolphin. If you can manage to take that
> approach only in a clean way then you can already speed the sorting up by 2x
> without even using multithreading.

Just to prevent anyone from taking this as encouragement to start work based on the patch in the other report: as I already said there, that approach is so far from being acceptable that I consider any further work on it a waste of time.

> A piece of advice. The bottleneck in here is KStringHandler::naturalCompare
> function used for sorting. If you manage to optimize that then you optimize
> everything. The bottleneck in that function (one of the) is the
> QStringRef::localeAwareCompare call in there though there is more that
> becomes a bottleneck when run for millions of times.

Speed improvements in KStringHandler::naturalCompare() would indeed be welcome, but that has to be done in kdelibs.

> Another way to really speed things up (which i also did) is splitting out
> the folders from the rest of the data. The compare function is calling an
> if/else in every QuickSort function call to determine something for a file
> or folder. Just by splitting that out you already gain quite a big
> performance improvement even if you don't have any folders. It just saves
> millions and millions of checks. That obviously only works if the folders
> before files option is selected in dolphin.

As I already said, I believe that the performance gain we could get from this is not worth adding extra code paths.

(In reply to comment #10)
> I think Burstsort is the way to go. It is specifically designed to naturally
> sort massive amounts of strings.

It should be obvious that we need a general-purpose algorithm, because we can also sort by date, size, etc. I'm obviously not going to accept a patch that uses different sort algorithms for different sort roles. Moreover, I haven't seen any evidence yet that cache misses are the problem (which is what Burstsort tries to avoid, if I understand correctly).
Comment 12 Emmanuel Pescosta 2012-10-22 13:06:06 UTC
(In reply to comment #11)
> > A piece of advice. The bottleneck in here is KStringHandler::naturalCompare
> > function used for sorting. If you manage to optimize that then you optimize
> > everything. The bottleneck in that function (one of the) is the
> > QStringRef::localeAwareCompare call in there though there is more that
> > becomes a bottleneck when run for millions of times.
> 
> Speed improvements in KStringHandler::naturalCompare() would indeed be
> welcome, but that has to be done in kdelibs.

Yes, I have already identified the KStringHandler::naturalCompare() bottleneck (thanks to callgrind). I am already working on that, but it's not an easy task to find the right way. (Caching and so on.)

(In reply to comment #10)
> I think Burstsort is the way to go. It is specifically designed to naturally
> sort massive amounts of strings.
How often does Burstsort call model->lessThan() compared with mergesort? We have to bring the number of model->lessThan() calls down to a minimum.
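One quick way to answer that question empirically is to wrap the comparator and count how often it is invoked during a sort. The helper below is hypothetical (not part of the Dolphin code) and uses std::stable_sort, which is merge-sort based like Dolphin's sorter, as the algorithm under measurement:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical instrumentation: counts every lessThan-style comparison a
// sorting algorithm performs, so candidate algorithms can be compared by
// comparison count rather than wall-clock time alone.
struct CountingLess {
    long long* calls;
    bool operator()(int a, int b) const {
        ++*calls;          // one more comparator invocation
        return a < b;
    }
};

// Sorts a copy of the input and returns how many comparisons were made.
long long countComparisons(std::vector<int> items) {
    long long calls = 0;
    std::stable_sort(items.begin(), items.end(), CountingLess{&calls});
    return calls;
}
```

The same wrapper, parameterized on the sort function, would let you put Burstsort and mergesort side by side on identical inputs.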
Comment 13 Emmanuel Pescosta 2012-10-22 13:07:26 UTC
Created attachment 74722 [details]
Multithreading Mergesort
Comment 14 Emmanuel Pescosta 2012-10-22 13:13:47 UTC
Another interesting sorting algorithm called "New Parallel Sorting Algorithm Based on Partitioning and Redistribution": http://scialert.net/fulltext/?doi=jas.2008.2341.2343

The great thing about this sorting algorithm is that there is no merging needed at the end -> saves a lot of model->lessThan() calls.
Comment 15 Mark 2012-10-22 15:25:31 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > > A piece of advice. The bottleneck in here is KStringHandler::naturalCompare
> > > function used for sorting. If you manage to optimize that then you optimize
> > > everything. The bottleneck in that function (one of the) is the
> > > QStringRef::localeAwareCompare call in there though there is more that
> > > becomes a bottleneck when run for millions of times.
> > 
> > Speed improvements in KStringHandler::naturalCompare() would indeed be
> > welcome, but that has to be done in kdelibs.
> 
> Yes I have already identified the KStringHandler::naturalCompare()
> bottleneck (Thanks to callgrind). I am already working on that, but it's not
> an easy task to find the right way. (Caching and so)
> 
> (In reply to comment #10)
> > I think Burstsort is the way to go. It is specifically designed to naturally
> > sort massive amounts of strings.
> How often does Burstsort call model->lessThan() when you compare it with
> Mergesort? We have to bring the amount of model->lessThan() calls down to a
> minimum.

To be honest... they are (or seem to be) already down to the minimum for mergesort (or quicksort). naturalCompare is the big showstopper here. If that is made 10x faster, then the entire sorting operation is 10x faster (in general, that is). Other areas can be made faster as well; that's what I did in my attempt, and that already cuts the sorting time in half. The remainder really is naturalCompare being slow.

Also, when you look at your CPU usage you see 100% for 1 core, so on a multicore system this should be able to gain a lot from multithreading (more than the 1/3rd). For single-core systems it's just what it is: slow.
Comment 16 Mark 2012-10-23 14:12:52 UTC
I invite everyone to look at this thread I started on the kfm-devel mailing list: http://lists.kde.org/?l=kfm-devel&m=135099596318315&w=2

I have quite a few ideas for improving it and am already working on some of them. My goal is to get 200,000 files sorted in under 1 second.

Please do continue discussing any further ideas in the mailing list.
Comment 17 Frank Reininghaus 2012-10-23 16:52:58 UTC
(In reply to comment #13)
> Created attachment 74722 [details]
> Multithreading Mergesort

Thanks for the patch! Looks nice; I didn't know that parallel sorting could be done in such a straightforward way :-)

Some ideas on how you could improve it, so that we can include it before the feature freeze sets in:

1. Hard-coding the number of threads to 2 or 4 and duplicating much of the code for both cases is not a good idea IMHO. Note that the average number of CPU cores is increasing; I don't want to add more code paths for 8 or 16 cores in a year or two ;-)

What do you think about the following idea:

a) Rename the old function KFileItemModelSortAlgorithm::sort(), which you call KFileItemModelSortAlgorithm::sortAlgorithm(), to KFileItemModelSortAlgorithm::sequentialSort().

b) Call the new parallel sorting function KFileItemModelSortAlgorithm::parallelSort(), rather than sort(), and add an additional int argument 'numberOfThreads'.

c) Create a new function KFileItemModelSortAlgorithm::sort(KFileItemModel* model, ...), which is the one called by KFileItemModel. This function is only a very small wrapper around parallelSort() and does the following:

const int numberOfThreads = QThread::idealThreadCount();
parallelSort(model, begin, end, numberOfThreads);

d) The implementation of parallelSort() looks similar to your function sort(), except that it divides numberOfThreads by 2 and calls itself twice using QtConcurrent::run() (actually, only the first call needs to be done via QtConcurrent; the second one can be done synchronously, since we shouldn't create more threads than we need).

At the beginning, parallelSort() checks whether numberOfThreads == 1 or the number of items is below a certain threshold (BTW, is there a reason why you chose 700?), and calls sequentialSort() in that case.
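The recursive scheme in (a)-(d) can be sketched in plain C++, with std::async standing in for QtConcurrent::run and std::stable_sort standing in for the sequential merge sort. The function names mirror the proposal above, but everything else here is illustrative, not the actual Dolphin code (the 700-item threshold echoes the attached patch):

```cpp
#include <algorithm>
#include <future>
#include <iterator>
#include <thread>

// Stand-in for KFileItemModelSortAlgorithm::sequentialSort() (step a).
template <typename It>
void sequentialSort(It first, It last) {
    std::stable_sort(first, last);
}

// Step d: halve the thread budget, sort the first half asynchronously and
// the second half on the current thread, then merge. Falls back to the
// sequential sort when only one thread is left or the range is small.
template <typename It>
void parallelSort(It first, It last, int numberOfThreads) {
    const auto n = std::distance(first, last);
    if (numberOfThreads <= 1 || n < 700) {
        sequentialSort(first, last);
        return;
    }
    It mid = first + n / 2;
    auto firstHalf = std::async(std::launch::async, [=] {
        parallelSort(first, mid, numberOfThreads / 2);
    });
    parallelSort(mid, last, numberOfThreads / 2);  // synchronous: no extra thread
    firstHalf.wait();
    std::inplace_merge(first, mid, last);
}

// Step c: the thin wrapper that picks the thread count.
template <typename It>
void sortItems(It first, It last) {
    int threads = static_cast<int>(std::thread::hardware_concurrency());
    parallelSort(first, last, threads > 0 ? threads : 1);
}
```

Because only one of the two recursive calls spawns a thread, a budget of N threads yields at most N concurrent workers, and the recursion depth naturally matches log2(N).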

Could you implement those parts that look reasonable to you and submit a review request? It's easier to review non-trivial patches on ReviewBoard.

Thanks again for your help!

(In reply to comment #14)
> Another interesting sorting algorithm called "New Parallel Sorting Algorithm
> Based on Partitioning and Redistribution":
> http://scialert.net/fulltext/?doi=jas.2008.2341.2343
> 
> The great thing about this sorting algorithm is, that there is no merging
> needed at the end -> Saves a lot of model->lessThan() calls.

Even if no merging is needed at the end, you have to do the partitioning in advance, and I don't see how that can be done with considerably fewer lessThan() calls than the merging :-/
Comment 18 Emmanuel Pescosta 2013-06-20 17:37:38 UTC
I think we can close this report now, because Dolphin is really fast in KDE 4.11 and will be even faster with every future release. ;)

A few words from my side:
Thanks to everybody who made this awesome and fast Dolphin 2.3 release happen! - Especially Frank, who helped me a lot with polishing my (big) patches and who did a very, very great job of maintaining and developing (speed-ups nearly everywhere, reduced RAM usage, reworked item-role updating, ...)!

I am really looking forward to the KDE 4.11.0 release :)
Comment 19 Mark 2013-06-20 17:57:22 UTC
(In reply to comment #18)
> I think we can close this report now, because Dolphin is really fast in KDE
> 4.11 and will be even faster with every future release. ;)
> 
> A few words from my side:
> Thanks to everybody who made this awesome and fast Dolphin 2.3 release
> happen! - Esp. Frank who helped me a lot with polishing my (big) patches and
> who did a very, very great maintaining and developing job (Speed-Ups nearly
> everywhere, reduced RAM usage, reworked item roles updating, ...)!
> 
> I am really looking forward to the KDE 4.11.0 release :)

Let me add something to that in this report.

This single bug report really got me interested in quite extreme performance optimizations throughout the KDE file listing and representation classes (KDirLister, KDirModel, KFileItem and a few others). It's something I'm still working on at this very moment, and it's beginning to look really nice!

This report alone (and the ideas that followed from it) really seems to have improved Dolphin massively in terms of speed.

And a massive thank you to Frank for making most of the improvements! I really like seeing those review requests, even though I rarely respond there :)

As for the bug, it's probably up to you and Frank to decide. I personally would wait until 4.11 is out.
Comment 20 Frank Reininghaus 2013-06-27 09:05:25 UTC
Thanks to everyone who helped to make Dolphin use less resources!

I'd prefer to close this report now - many things have been improved in the meantime. Of course, loading a huge folder may still take a few seconds, but at least it works quite nicely now after all items are shown. I just compared it with 4.10, where scrolling still feels very sluggish for quite some time after the initial loading.

Moreover, this report has been used to discuss quite a few different things now, and the longer a bug report gets, the harder it is to understand what is going on. If anyone finds a performance problem, please open a new report for it.

I'm not saying that everything is perfect. There might still be quite a few things that can be improved performance-wise in Dolphin, but I think that loading 200,000 items will never be really fast ;-)
Comment 21 Mark 2013-06-27 09:27:21 UTC
(In reply to comment #20)
> I'm not saying that everything is perfect. There might still be quite a few
> things that can be improved performance-wise in Dolphin, but I think that
> loading 200,000 items will never be really fast ;-)

I dare to prove you wrong there :) (which I will do in time)
But you, and most notably Emmanuel, have done an outstanding job of making this better. Really awesome!