Bug 332196 - baloo does not index all mails
Summary: baloo does not index all mails
Status: RESOLVED FIXED
Alias: None
Product: Baloo
Classification: Frameworks and Libraries
Component: General
Version: unspecified
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Vishesh Handa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-15 21:26 UTC by Martin Koller
Modified: 2014-05-12 07:38 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In: 1.12.0


Description Martin Koller 2014-03-15 21:26:26 UTC
I installed 4.12.90 last night. I'm using KMail with a maildir folder structure that contains about 83000 mails.
My inbox currently contains 192 mails, but the folder properties show that only 23 of them are indexed. Clicking "Force reindexing" does not change anything.
I have 7 subfolders of the inbox; none of them has a single mail indexed.
In some other folders with a few thousand emails each, only ONE is indexed (as shown in the properties dialog).
The PC has been running for the last 8 hours.
Filtering in KMail's quick search has become useless.

Reproducible: Always
Comment 1 Vishesh Handa 2014-03-15 22:56:58 UTC
This is strange. Email indexing is very very fast these days.

Please do the following -
1. Try searching for emails via `baloosearch -e "some words"` and see if any email is found.
2. Check the number of emails indexed by running `delve .` in .kde4/share/apps/baloo/email/
3. Try enabling all debug messages via kdebugdialog, restart Akonadi and see if you can see some messages of emails being indexed. They are typically of the form "indexing <subject>". You can restart Akonadi via `akonadictl restart`.
4. Try resetting it by removing the baloorc config file and restarting Akonadi.

Maybe one of these will help us diagnose the issue better.
Comment 2 Martin Koller 2014-03-15 23:23:48 UTC
ad 1) Yes, it finds something (although if I search e.g. for my mail address "kollix" it just finds 10 mails, which definitely can't be all)

ad 2) (after finding delve to be a tool of xapian):
UUID = c1c253c2-8110-4158-9fbc-19c328e42262
number of documents = 1370
average document length = 1982.79
document length lower bound = 395
document length upper bound = 97233
highest document id ever used = 184606
has positional information = false
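
To put the `delve` numbers in perspective, here is a minimal Python sketch that parses the output above and compares the document count with the mailbox size. It assumes that one Xapian document corresponds to one indexed mail and uses the approximate figure of 83000 mails from the description; the helper name `parse_delve` is made up for this illustration.

```python
# Output of `delve .` exactly as pasted in this comment.
DELVE_OUTPUT = """\
UUID = c1c253c2-8110-4158-9fbc-19c328e42262
number of documents = 1370
average document length = 1982.79
document length lower bound = 395
document length upper bound = 97233
highest document id ever used = 184606
has positional information = false
"""


def parse_delve(text):
    """Turn 'key = value' lines into a dict of strings (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines that do not contain ' = '
            stats[key.strip()] = value.strip()
    return stats


stats = parse_delve(DELVE_OUTPUT)
indexed = int(stats["number of documents"])
total_mails = 83000  # "about 83000 mails" from the description; approximate
coverage = indexed / total_mails

print(f"{indexed} of ~{total_mails} mails indexed ({coverage:.1%})")
# prints: 1370 of ~83000 mails indexed (1.7%)
```

Under these assumptions, fewer than 2% of the mails have made it into the index.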

ad 3) I only see the "indexing" debug output when I click on a mail. Even clicking on "Force reindexing" does nothing.

ad 4) After removing baloorc and restarting Akonadi, I got some more "indexing" lines. I counted all lines that include "EmailIndexer::process: Indexing" and got only 1042.
The mails I see being indexed are from different folders, but there are not that many of them (I have a tree with more than 200 folders).
I don't know if it is related, but before the last output line from "EmailIndexer::process: Indexing" I see 10 lines like this:
akonadi_baloo_indexer(23689) ContactIndexer::index: Indexing "" ""
(What I find strange are the two empty strings)
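
The line count in "ad 4" can be reproduced with a short Python sketch like the one below. Only the marker substring and the ContactIndexer line are taken from this report; the two EmailIndexer sample lines and the function name `count_indexed` are made up for illustration, and real debug output may be formatted differently.

```python
# Substring quoted in this comment; everything around it in a real log may differ.
MARKER = "EmailIndexer::process: Indexing"


def count_indexed(lines, marker=MARKER):
    """Count debug lines that report an email being indexed."""
    return sum(1 for line in lines if marker in line)


# Made-up sample log: the subjects are placeholders, not real output.
sample = [
    'akonadi_baloo_indexer(23689) EmailIndexer::process: Indexing "first subject"',
    'akonadi_baloo_indexer(23689) ContactIndexer::index: Indexing "" ""',
    'akonadi_baloo_indexer(23689) EmailIndexer::process: Indexing "second subject"',
]

print(count_indexed(sample))  # prints: 2
```

With a captured log file, the same function can be applied to the open file object, for example `count_indexed(open("akonadi.log"))`, where the file name is again only a placeholder.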
Comment 3 Vishesh Handa 2014-03-16 09:44:15 UTC
We have been discussing this -

[00:46:29] <vHanda> dvratil_: please take a look at this if you get a chance - https://bugs.kde.org/show_bug.cgi?id=332196
[00:46:30] <bugbot> KDE bug 332196 in Baloo (General) "baloo does not index all mails" [Normal,Unconfirmed: ] 
[00:48:08] <dvratil_> hmm, first, we should either implement reindexCollection() in Baloo, or remove the button from KMail until it's done
[00:48:25] -*- vHanda votes for remove
[00:49:27] -*- dvratil_ agrees
[00:52:05] <dvratil_> the debug output from ContactIndexer should probably print email. Name and nickname are not usually filled, but email is almost always
[00:53:39] <dvratil_> the problem can only be a broken Akonadi fetch. If Baloo failed at indexing, there would be error messages from Xapian
[00:54:21] <vHanda> a broken Akonadi fetch seems unlikely considering that he can see all his emails
[00:54:24] <vHanda> the code is the same
[00:54:35] <dvratil_> fair point
[00:56:38] <vHanda> the only difference that I can see is the "fetchedSince" option
[00:57:23] <dvratil_> which is not set when you remove baloorc
[00:57:46] <vHanda> right :/
[01:04:36] <dvratil_> vHanda, I think it could be because of setCacheOnly()
[01:05:07] <dvratil_> Martin is using local maildir AFAIK, which automatically expires all item parts after 5 minutes
[01:05:45] <dvratil_> so in most cases, the fetch job won't contain any bodies
[01:14:27] <dvratil_> which represents a huge problem: we can't disable cacheOnly, because it would download all messages on non-disconnected IMAP accounts
[01:15:40] <dvratil_> but with cacheOnly enabled, we can't index stuff from resources with cache expiration timeout
Comment 4 Daniel Vrátil 2014-03-17 16:44:29 UTC
Git commit 88fa239674e6b40ffd4a21b8b87f459b4f55d8e5 by Dan Vrátil.
Committed on 17/03/2014 at 16:42.
Pushed by dvratil into branch '1.12'.

Allow FETCH of items from local resources to ignore CacheOnly parameter

Resources that use local storage and have X-Akonadi-Custom-HasLocalStorage=true
entry in their .desktop files are considered 'local'. These resources often have
cacheTimeout set to a very low value so that data are not held in Akonadi unnecessarily,
because loading them on-demand from the storage is very cheap and fast. However,
this causes a problem with Baloo, which cannot index the content of items that are
not cached in Akonadi.

For this reason we introduce this hack, which allows FetchHelper to ignore the CacheOnly
flag in case all the queried items belong to a local resource. As a result, when Baloo
asks for items from a collection, Akonadi will retrieve the payload of these items
from the resources first.

In the long term, we need to come up with a better solution, because ItemRetriever
does not really scale when it comes to retrieving many items.
FIXED-IN: 1.12.0

M  +47   -1    server/src/handler/fetchhelper.cpp
M  +1    -0    server/src/handler/fetchhelper.h

http://commits.kde.org/akonadi/88fa239674e6b40ffd4a21b8b87f459b4f55d8e5
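
The rule described in the commit message can be modelled roughly as follows. This is a Python sketch of the decision only, not the actual C++ code in fetchhelper.cpp; the names `Resource` and `effective_cache_only` are hypothetical, and treating an empty query as "not local" is an assumption made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for an Akonadi resource."""
    name: str
    has_local_storage: bool  # models X-Akonadi-Custom-HasLocalStorage=true


def effective_cache_only(requested_cache_only, resources):
    """Decide whether a fetch should still be served from the cache only.

    `resources` holds the owning resource of every queried item. The
    CacheOnly flag is ignored only if all of them are local resources.
    """
    if not requested_cache_only:
        return False
    resources = list(resources)
    # Assumption: an empty query keeps the requested CacheOnly behaviour.
    all_local = bool(resources) and all(r.has_local_storage for r in resources)
    return not all_local


maildir = Resource("maildir", has_local_storage=True)
imap = Resource("imap", has_local_storage=False)

print(effective_cache_only(True, [maildir]))        # prints: False
print(effective_cache_only(True, [maildir, imap]))  # prints: True
```

In this model a maildir-only fetch loads payloads from the resource, while any fetch that touches an IMAP item stays cache-only, which avoids downloading all messages from the server.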
Comment 5 Daniel Vrátil 2014-03-17 16:50:08 UTC
Git commit e4aab93dee22e53bdfd3156dbc7dc675bb86a2c6 by Dan Vrátil.
Committed on 17/03/2014 at 16:49.
Pushed by dvratil into branch 'KDE/4.13'.

Add X-Akonadi-Custom-HasLocalStorage to maildir and mixedmaildir resources

This flag tells the Akonadi server that it can retrieve item payloads from these
resources even when the fetch operation has the CacheOnly flag.

M  +1    -0    resources/maildir/maildirresource.desktop
M  +1    -0    resources/mixedmaildir/mixedmaildirresource.desktop

http://commits.kde.org/kdepim-runtime/e4aab93dee22e53bdfd3156dbc7dc675bb86a2c6
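
To check whether a given resource carries the flag, its .desktop file can be read with Python's `configparser`, roughly as sketched below. Only the `X-Akonadi-Custom-HasLocalStorage` key comes from the commits above; the sample file contents, the `[Desktop Entry]` section assumption, and the function name `has_local_storage` are illustrative.

```python
import configparser

# Hypothetical excerpt of a resource .desktop file; only the
# X-Akonadi-Custom-HasLocalStorage key is taken from the commit.
SAMPLE_DESKTOP = """\
[Desktop Entry]
Name=Maildir
X-Akonadi-Custom-HasLocalStorage=true
"""


def has_local_storage(desktop_text):
    """Return True if the .desktop text marks the resource as local."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive, as in .desktop files
    parser.read_string(desktop_text)
    return parser.getboolean(
        "Desktop Entry", "X-Akonadi-Custom-HasLocalStorage", fallback=False
    )


print(has_local_storage(SAMPLE_DESKTOP))                    # prints: True
print(has_local_storage("[Desktop Entry]\nName=Example\n"))  # prints: False
```

A resource without the key, such as IMAP in this thread, would be reported as not local by this check.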
Comment 6 fsfbugs 2014-04-27 22:40:45 UTC
I'm using the openSUSE's KDE:Current packages which include both of the commits above but Baloo is still not indexing all of my emails.

Unlike Martin Koller, I am using IMAP rather than maildir resources.
Comment 7 fsfbugs 2014-04-27 22:45:34 UTC
I can also confirm bug 333798 which is probably related.
Comment 8 Erasmo Caponio 2014-04-29 09:27:25 UTC
(In reply to comment #6)
> I'm using the openSUSE's KDE:Current packages which include both of the
> commits above but Baloo is still not indexing all of my emails.
> 
> Unlike Martin Koller, I am using IMAP rather than maildir resources.

I can confirm this. It seems to affect only IMAP resources.
Comment 9 Ivan Adzhubey 2014-05-04 17:21:53 UTC
I can confirm this issue too (Kubuntu 14.04 LTS, KDE 4.13).

I am using a local IMAP server (dovecot) as my mail aggregation/storage engine, with a getmails script set up for fetching messages from remote accounts. The quick search option in KMail finds at most 1-2 messages in a folder, out of thousands that match the search term(s), as confirmed by comparing with search results in the Thunderbird mail client. Strangely enough, full message search (Tools->Search messages...) works fine and seems to be able to hit all relevant messages.