Bug 334280 - IMAP Synchronisation with GMAIL takes 13 minutes to complete !!
Summary: IMAP Synchronisation with GMAIL takes 13 minutes to complete !!
Status: RESOLVED FIXED
Alias: None
Product: Akonadi
Classification: Unclassified
Component: IMAP resource (show other bugs)
Version: GIT (master)
Platform: unspecified Linux
: NOR grave (vote)
Target Milestone: ---
Assignee: Christian Mollekopf
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-03 14:46 UTC by Raymond Wooninck
Modified: 2014-05-20 23:11 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Debug output of the imap resource (648.86 KB, text/x-log)
2014-05-03 14:48 UTC, Raymond Wooninck
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Raymond Wooninck 2014-05-03 14:46:58 UTC
The current state of the IMAP resource in git master delivers an unacceptable synchronization time for a Gmail resource. It takes around 13 minutes to do one synchronization. I have attached the debug log of the IMAP resource to this bug report, but what I have noticed is that during most of these 13 minutes there is no disk activity, no network connectivity and also the progress meter in kmail just stays at 0. The only thing happening during this period is that the debuglog shows the line BatchFetcher::fetchNextBatch: Fetching  <x>  to  <y>.  

Strangely enough every mail check these counters always start from 1. 

I already made sure that all Archived emails in gmail.com itself was deleted and I recreated the gmail imap resource, but this all didn't help. 

At this moment it seems that it is just the gmail connectivity that is affected.

Reproducible: Always
Comment 1 Raymond Wooninck 2014-05-03 14:48:36 UTC
Created attachment 86429 [details]
Debug output of the imap resource
Comment 2 Raymond Wooninck 2014-05-03 17:13:04 UTC
I reverted the Christian's commits from Friday afternoon and tried to see where that would bring the situation.  After the revert, Gmail synchronization is done in about 1 minute, which is acceptable. 

Comparing the logs then I can see the following 

Git master:  
Starting retrieval for  "INBOX"
Starting message retrieval. Elapsed(ms):  161
MessageCount:  626 Local message count:  626
UidNext:  59721 Local UidNext:  59721
HighestModSeq:  0 Local HighestModSeq:  0
Listing flags  1 59721
Starting flag retrieval. Elapsed(ms):  162
Fetching  1  to  100
....
Retrieval complete. Elapsed(ms):  27321

With the revert I see 
Starting retrieval for  "INBOX"
Starting message retrieval. Elapsed(ms):  167
Listing flags  1 59726
Starting flag retrieval. Elapsed(ms):  167
Retrieval complete. Elapsed(ms):  3470

So the inbox is with the latest git master retrieved in 27 seconds, while before the commits the inbox was retrieved in about 3,5 seconds.
Comment 3 Christian Mollekopf 2014-05-04 09:26:23 UTC
The problem with gmail is that it's CONDSTORE support is broken (I think it was something with the tag folders?), and we therefore have to fetch all flags on every sync. I think the resource simply ignored flag changes previously if there were no new messages. I can readd a similar workaround if CONDSTORE is not available (or broken as on GMAIL).

Sorry if I introduced performance problems, but I'm generally preferring correctness over performance, but we can re-add optimizations as necessary.
Comment 4 Raymond Wooninck 2014-05-04 12:46:14 UTC
(In reply to comment #3)
> The problem with gmail is that it's CONDSTORE support is broken (I think it
> was something with the tag folders?), and we therefore have to fetch all
> flags on every sync. I think the resource simply ignored flag changes

But shouldn't this generate network traffic ? Because the flags needs to be retrieved from the server, but during this period I do not see any traffic at all until it reaches the point where it can retrieve some new email. 

> Sorry if I introduced performance problems, but I'm generally preferring
> correctness over performance, but we can re-add optimizations as necessary.
Correctness is fine, but in this world Performance is more important to the users. And it seems that this correctness also creates the situation that the IMAP resource is no longer fast enough and starts blocking itself. (As reported in the other bug).  

I guess that the IMAP resource now has reached a point where its developers should decide what it actually supports. If the requirement is that it needs a correct CONDSTORE, then support for gmail should be dropped. Maybe moving the gmail support to a separate resource would be much better, so that this specific gmail resource can have all the things required to support a broken GMAIL.
Comment 5 Christian Mollekopf 2014-05-04 17:34:41 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > The problem with gmail is that it's CONDSTORE support is broken (I think it
> > was something with the tag folders?), and we therefore have to fetch all
> > flags on every sync. I think the resource simply ignored flag changes
> 
> But shouldn't this generate network traffic ? Because the flags needs to be
> retrieved from the server, but during this period I do not see any traffic
> at all until it reaches the point where it can retrieve some new email. 
> 

This should generate network traffic indeed.

> > Sorry if I introduced performance problems, but I'm generally preferring
> > correctness over performance, but we can re-add optimizations as necessary.
> Correctness is fine, but in this world Performance is more important to the
> users. And it seems that this correctness also creates the situation that
> the IMAP resource is no longer fast enough and starts blocking itself. (As
> reported in the other bug).  
> 

That's just a regular bug that needs to be fixed.

> I guess that the IMAP resource now has reached a point where its developers
> should decide what it actually supports. If the requirement is that it needs
> a correct CONDSTORE, then support for gmail should be dropped. Maybe moving
> the gmail support to a separate resource would be much better, so that this
> specific gmail resource can have all the things required to support a broken
> GMAIL.

The IMAP resource supports vanilla IMAP plus extensions as required/useful. If we have to employ certain workarounds in order to make the experience better with servers that don't implement all the necessary extensions, then I'm open to that as long as it doesn't become a maintenance burden.

Daniel is already working on a specialized IMAP resource that implements the GMAIL specific extensions, so that will be covered eventually.

But I'd still expect the plain IMAP resource to work reasonably well with GMAIL as long as they provide regular IMAP.
Comment 6 Raymond Wooninck 2014-05-04 19:06:29 UTC
(In reply to comment #5)
> This should generate network traffic indeed.

Strangely enough I didn't notice any network traffic during this period. As indicated network traffic is only visible around the end.  
> 
> That's just a regular bug that needs to be fixed.

Ok. Just strange that it seems only to hit the gmail IMAP. 

> Daniel is already working on a specialized IMAP resource that implements the
> GMAIL specific extensions, so that will be covered eventually.

That is great news :) I will let him know that he already found a test person for it :)
 
> But I'd still expect the plain IMAP resource to work reasonably well with
> GMAIL as long as they provide regular IMAP.

Ok.
Comment 7 Christian Mollekopf 2014-05-04 19:41:25 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > This should generate network traffic indeed.
> 
> Strangely enough I didn't notice any network traffic during this period. As
> indicated network traffic is only visible around the end. 

Please let me know if the situation is improved or still the same with the fix for the other bug.

Flag fetching definitely is slower because I'm currently fetching the flags just as messages in batches of 100 to avoid overloading akonadi. This obviously results in many roundtrips, and I suppose the batchsize could be greatly increased for flags (because a lot less data than full messages).

> > 
> > That's just a regular bug that needs to be fixed.
> 
> Ok. Just strange that it seems only to hit the gmail IMAP. 
> 

It was a timing issue, and it only appeared if the network connection is the bottleneck. Apparently gmail is slow ;-)

> > Daniel is already working on a specialized IMAP resource that implements the
> > GMAIL specific extensions, so that will be covered eventually.
> 
> That is great news :) I will let him know that he already found a test
> person for it :)
>  
> > But I'd still expect the plain IMAP resource to work reasonably well with
> > GMAIL as long as they provide regular IMAP.
> 
> Ok.
Comment 8 Raymond Wooninck 2014-05-04 20:09:44 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > This should generate network traffic indeed.
> > 
> > Strangely enough I didn't notice any network traffic during this period. As
> > indicated network traffic is only visible around the end. 
> 
> Please let me know if the situation is improved or still the same with the
> fix for the other bug.

With the fix for the other bug, the synchronization went back to about 5 minutes and now I see network traffic during the flag retrieval. 

What exactly does the number mean in this line : akonadi_imap_resource_16(30157)/kdepimlibs (kimap) BatchFetcher::fetchNextBatch: Fetching  257301  to  257400

Does that mean that I have 257400 emails in that folder ? If so, then something else is wrong. I know that I have a number of high volume email lists (e.g. kde-commit@kde.org), but I am deleting the emails from there every 3 days to reduce the total number of emails. I also cleaned up in gmail.com itself, so I shouldn't have those many emails.

> Flag fetching definitely is slower because I'm currently fetching the flags
> just as messages in batches of 100 to avoid overloading akonadi. This
> obviously results in many roundtrips, and I suppose the batchsize could be
> greatly increased for flags (because a lot less data than full messages).

Can I increase the number somewhere ?  Just to test if I can find a better number for my system and to retrieve things a little faster ?
Comment 9 Christian Mollekopf 2014-05-04 20:24:02 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #5)
> > > > This should generate network traffic indeed.
> > > 
> > > Strangely enough I didn't notice any network traffic during this period. As
> > > indicated network traffic is only visible around the end. 
> > 
> > Please let me know if the situation is improved or still the same with the
> > fix for the other bug.
> 
> With the fix for the other bug, the synchronization went back to about 5
> minutes and now I see network traffic during the flag retrieval. 
> 
> What exactly does the number mean in this line :
> akonadi_imap_resource_16(30157)/kdepimlibs (kimap)
> BatchFetcher::fetchNextBatch: Fetching  257301  to  257400
> 

It means UID's 257301 to 257400 are fetched which only correlates to the number of messages that ever have been in the folder (not to the amount you currently have).

> Does that mean that I have 257400 emails in that folder ? If so, then
> something else is wrong. I know that I have a number of high volume email
> lists (e.g. kde-commit@kde.org), but I am deleting the emails from there
> every 3 days to reduce the total number of emails. I also cleaned up in
> gmail.com itself, so I shouldn't have those many emails.
> 
> > Flag fetching definitely is slower because I'm currently fetching the flags
> > just as messages in batches of 100 to avoid overloading akonadi. This
> > obviously results in many roundtrips, and I suppose the batchsize could be
> > greatly increased for flags (because a lot less data than full messages).
> 
> Can I increase the number somewhere ?  Just to test if I can find a better
> number for my system and to retrieve things a little faster ?

In imapresource.cpp (setItemSyncBatchSize) we define the batch size. This directly correlates to the number of messages that are loaded into memory, so I wouldn't make this one too big.

In retrieveitemstask.cpp:668 you could increase the passed-in batch-size for flag fetching only.
I think it should work if you simply set a multiplier (e.g. 10*batchSize()).

It's not configurable in a configfile.
Comment 10 Raymond Wooninck 2014-05-05 08:36:53 UTC
(In reply to comment #9)
> It means UID's 257301 to 257400 are fetched which only correlates to the
> number of messages that ever have been in the folder (not to the amount you
> currently have).

So the number of flags that are going to be retrieved is only increasing with time ?  I would have expected that the IMAP resource would not start retrieving flags for messages that are no longer existing. Because at this moment I see that the fetching starts with number 1 to 100 and then moves up to 100 to 200, etc. So in the above example 257400 flags are trying to be fetched despite that I have only maybe 100 messages in that particular folder. Which is quite an overkill, in my opinion. 

I am currently trying to increase the batchsizes, but what I have noticed is that the main issue comes from those high-volume folders where the fetching starts from 1 and goes up to 257400 (regardless of how many emails are actually in that folder (local and server). So it seems that Akonadi is just keeping the latest message number and the IMAP resource will always start from 1. In case of an empty folder, this could mean that the IMAP resource is spending 1 or more minutes on trying to retrieve thousands of non-existing flags. I am sure that there must be a better way to retrieve only the flags for existing messages.
Comment 11 Christian Mollekopf 2014-05-05 21:11:01 UTC
Git commit 7f3e279cf79ef8e24eb64986d538e0075452b91c by Christian Mollekopf.
Committed on 05/05/2014 at 21:10.
Pushed by cmollekopf into branch 'master'.

IMAP-Resource: Check available UID's first & retrieve 50 time more flags.

This way we avoid the UID fragmentation problem (we try to fetch a lot of
messages that are not available on the server).
It's also a nice first step towards only syncing a certain time-frame.

Retrieving 50 times more flags than full messages should still result in
low memory consumption while greatly reducing roundtrips.

M  +68   -33   resources/imap/retrieveitemstask.cpp
M  +49   -25   resources/imap/tests/testretrieveitemstask.cpp

http://commits.kde.org/kdepim-runtime/7f3e279cf79ef8e24eb64986d538e0075452b91c
Comment 12 Raymond Wooninck 2014-05-06 08:45:05 UTC
Issue has been resolved.  Synchronizing my GMAIL account is now done is the same time as with KDE 4.13.0.
Comment 13 Christian Mollekopf 2014-05-20 23:11:53 UTC
Git commit 7802a783189adbe4e3e67cd9bb60867c82c36d69 by Christian Mollekopf.
Committed on 05/05/2014 at 21:10.
Pushed by cmollekopf into branch 'kolab/integration/4.13.0'.

IMAP-Resource: Check available UID's first & retrieve 50 time more flags.

This way we avoid the UID fragmentation problem (we try to fetch a lot of
messages that are not available on the server).
It's also a nice first step towards only syncing a certain time-frame.

Retrieving 50 times more flags than full messages should still result in
low memory consumption while greatly reducing roundtrips.

M  +68   -33   resources/imap/retrieveitemstask.cpp
M  +49   -25   resources/imap/tests/testretrieveitemstask.cpp

http://commits.kde.org/kdepim-runtime/7802a783189adbe4e3e67cd9bb60867c82c36d69