Bug 301470

Summary: Threading very slow
Product: [Applications] kmail2 Reporter: Bernd Paysan <bernd.paysan>
Component: message listAssignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED UNMAINTAINED    
Severity: normal CC: Martin
Priority: NOR    
Version: 4.8.3   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Bernd Paysan 2012-06-08 19:53:48 UTC
I've a lot of E-mails.  I don't know how much you communicate, but it seems to be that the Kmail2 algorithm for threading the message list is not only O(n^(something>1)), but also other fundamental mistakes have been made.  I want to make a constructive comment on this:

Ok, first of all, threading is an O(n) algorithm.  You have n messages to walk through.  They are identified by the Message-Id header field.  You use this as hash index to store the message list in a hash table.  Now you thread the messages, which converts the list into a tree.  You use the References header field.  This field can have more than one reference, but the first hit is the "parent node".  So you insert the message as child into the first parent node you find, and be done with it.  After having walked through all messages (n items), you have a structured tree.  While inserting, you can go up each message thread to the top, and update a min/max date field (needed for sorting) there.

After you have created the tree, sort it by the criteria the user has selected.  These are things like "by first message in thread" or "by last message in thread", or "into buckets of days/weeks", which will tear the trees apart (as only the intra-day threading is taken into account).  This sorting is O(n log n), by nature of good sorting algorithms.

So, if you are done with sorting, *then* you start to actually create widgets for the messages.  No earlier.  And you do that *only* for the elements which are currently visible in the message list pane.  Not for all the other 10k messages, which are invisible.

At the moment, using Kmail2 is no fun.  A few years ago, Kmail used to be fun, because it was fast and reliable - and that was on hardware which was much slower than today's.  Handling mail directories with 10k entries was no problem.  The problems with threading started before it became Kmail2, but it got worse over time.  And now I'm fed up enough to report this as a bug.  It is "normal", because it does not actually crash, but it is no feature request.  This is a must for a user interface - not be sluggish and unresponsive.

To reproduce, subscribe to the Linux kernel mailing list or any other high-volume mailing list.  Wait a few days, until you hit a few thousand messages.  Then try to use Kmail2 on this mailing list.  If it takes more than a millisecond to thread and sort this amount of data on a contemporary processor, the program is broken.  Wait another few days to have the number of messages doubled.  If the time it takes know to thread and sort is significantly more than two times larger, the program is broken, as well.

A similar issue exists with Dolphin viewing folders with that many items.  I've reports of people who tried with a 30k entry folder, and it took a whole week for Dolphin until it showed the files.  Please, this is first semester computer science stuff - don't program O(n²) or worse, if there is an O(n) algorithm.
Comment 1 Martin Steigerwald 2015-09-11 19:32:01 UTC
Thank you Bernd for your report and sorry for taking such a long time to answer it.

It is about a version of KMail which uses Nepomuk and is unmaintained. Thus
closing. If you still see performance issues please open new reports. But
please follow the following guide lines to avoid unnecessary work for the
developers:

- Ideally test with KDEPIM and Akonadi 15.08. It contains some performance
improvements like the binary protocol.

- Otherwise at least use KDEPIM 4.14.10 and newest Akonadi 1.13 you can get as
it already contains some performance improvements.

- If you can wait, please retest with KDEPIM and Akonadi 15.12 once they become
available for you. Akonadi 15.12 will contain *massive* performance
improvements implemented by Dan due to new database indexes, optimized queries
and leveled file_db directory. All of these are in master already, so if you
dare use kdesrc-build to compile KF5, kdepim and kdepim-runtime. I am using
this currently and it basically moves the bottleneck to KMail (displaying message list
of huge folder). It is a *huge* improvement. And also Volker and Dan work to improve
message list display speed as well.

Thank you and greetings from KDE Randa Meetings,
Martin
Comment 2 Bernd Paysan 2015-09-11 21:30:40 UTC
Am Freitag, 11. September 2015, 19:32:01 schrieb Martin Steigerwald:
> - If you can wait, please retest with KDEPIM and Akonadi 15.12 once they
> become available for you. Akonadi 15.12 will contain *massive* performance
> improvements implemented by Dan due to new database indexes, optimized
> queries and leveled file_db directory. All of these are in master already,
> so if you dare use kdesrc-build to compile KF5, kdepim and kdepim-runtime.
> I am using this currently and it basically moves the bottleneck to KMail
> (displaying message list
> of huge folder). It is a *huge* improvement. And also Volker and Dan work to
> improve
> message list display speed as well.

I will retest with 15.12 when I find time to do that.
Comment 3 Martin Steigerwald 2015-09-11 21:36:26 UTC
Note that it will take till December for 15.12 to be released. Current git master, buildable with kdesrc-build has quite some improvements already. But of course you can also wait for 15.12 to appear in your distro.
Comment 4 Bernd Paysan 2015-09-11 23:19:41 UTC
Am Freitag, 11. September 2015, 21:36:26 schrieb Martin Steigerwald:
> https://bugs.kde.org/show_bug.cgi?id=301470
> 
> --- Comment #3 from Martin Steigerwald <Martin@Lichtvoll.de> ---
> Note that it will take till December for 15.12 to be released. Current git
> master, buildable with kdesrc-build has quite some improvements already. But
> of course you can also wait for 15.12 to appear in your distro.

Due to the slow speed, I've split up my folders; the largest one has 3500 
deeply threaded messages in it; at the moment (with a 3GHz Core i7, Kmail 
4.14.9), it takes 6 seconds to completely thread in interactive mode, and 
4.something in batch mode - you can expect that a slow machine with an Atom or 
so will take a minute or more for that task, which is certainly not tolerable.  
That's the only folder I've left which is somewhat slow; it used to be worse 
than that.  If you want to get some better feedback, I suggest you add a stop 
timer while threading and display the final timing after "Ready", bottom left.

I've some mailing lists which accumulate deeply threaded messages relatively 
quickly and will let them grow until I get 15.12.  If you want to get a folder 
full with lots of deeply threaded messages, just subscribe to the Linux kernel 
mailing list ;-).

I suspect the current threading code in 4.14.9 is at least O(n²) or worse, 
because all folders somewhat below 2000 messages are threaded fast enough.  If 
you need help with an O(n) algorithm for threading messages, I propose the 
following - requires only two linear walk through the list of messages:

First walk: Build a hash message id->message, so you can access each message 
by its id.  That walk you need to do only once when loading the messages from 
the backend.

Second walk: Go through the list of referenced message ids, and if you find a 
message, append it in the "nested messages" array/vector of that message 
object.  If there's no reference found, append it to the top-level 
array/vector.  Choose a data structure where append is an O(1) function.

Next: Sort the arrays from bottom (1st level messages) with the selected sort 
order, and walk the threading tree downwards to sort the sub-lists.  This is 
O(n log(n)).  To make it easier for displaying, in the return path of the tree 
walk, calculate the number of messages in each sub-thread, and annotate them 
with their absolute position in in the message list and their nesting depth 
(that information is easily available in a tree walk).

The last walk is then ready to display.

IMHO it should at most take some hundred microseconds to thread a few thousand 
messages that way, where the headers are all in memory, and selection of the 
right data structures.
Comment 5 Martin Steigerwald 2015-09-12 09:33:03 UTC
Bernd, thank you a lot for your comment and your attempt to help to move along with the threading code.

From what I gathered so far at the KDE Randa Meetings the threading code is basically very optimized. Current developers do not see many ways to speed it up. The reasons for the slow display of message lists of large folder are elsewhere. One reason was that KMail supports a different font for each tag. So the font handling took a lot of CPU cycles. And there are some other reasons as well, likely calculating sizes which are now estimated and some other parts. Volker actually profiled this code already.

Volker and Dan started to work on improvement KMail now that Akonadi from master shifted the bottleneck to KMail. And I already see some nice improvements. I can basically access a 43000 mail folder with threading in about 5 to 10 seconds and in another 10 seconds the threading basically completed.

I sure bet further improvements are possible and developers work on that. There are still some bottlenecks in the display and as displaying is incrementally updated it has an influence on how fast the threading is completed. Also there are ideas to only do it for the newest say 1000-2000 mails in each folder and include more mails on demand or in background. For that there are plans to basically move the threading inside Akonadi and even cache the results.

If you want to help to make it happen, I suggest you use kdesrc-build to build the latest code and ask in #kontact on Freenode or kdepim mailing list about it. I can help you with setting up your development environment in case you have questions or need hints to some documentation. For development questions you better contact Dan and Volker on mailing list or IRC.
Comment 6 Martin Steigerwald 2015-09-12 09:34:33 UTC
Also note: That there are ways to speed up threading by deselecting the grouping into days/months and so on. Also you can have more than one mail folder open in tabs and once the threading is completed you can just read mails in there nicely. This already works with KDEPIM 4.14 and Akonadi 1.13.