Bug 324925 - Already-existing MAILDIRs not getting indexed with KDE 4.11
Summary: Already-existing MAILDIRs not getting indexed with KDE 4.11
Status: RESOLVED UNMAINTAINED
Alias: None
Product: Akonadi
Classification: Frameworks and Libraries
Component: Nepomuk Feeder Agents
Version: 4.11
Platform: Chakra Linux
Importance: NOR major
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords: regression, reproducible
Depends on:
Blocks:
 
Reported: 2013-09-15 10:33 UTC by whatifgodwasoneofus
Modified: 2015-03-17 12:40 UTC

See Also:
Latest Commit:
Version Fixed In:


Description whatifgodwasoneofus 2013-09-15 10:33:32 UTC
Hi!

My old e-mail messages in the maildir folders do not get indexed. New ones are indexed as they arrive.
I think akonadi_nepomuk_feeder does not send all the necessary "notifications" to nepomuk (sorry for the imprecise term).

If I remove and add back the maildir resource, the mails are still NOT indexed.
If I force a reindex via the "Folder Properties" window, the mails are still NOT indexed.
If I create a NEW maildir resource and copy the old messages to it, MOST of them get indexed, but some don't.
After two months of playing around with this, I realised that it is enough to TOUCH the files within the maildir folders to make akonadi_nepomuk_feeder send "notifications" to nepomuk, which then actually starts indexing. There is no need to create a new maildir resource and move the old messages to it.
BUT: still, not all the messages get indexed. I have 80,000 mails and only half of them get indexed.

In the terminal (and in ~/.xsession-errors) I see a lot of these messages:
akonadi_nepomuk_feeder(1104) ItemQueue::fetchJobResult: Not all items were fetched:  2 100

So I tried touching fewer mails at once, splitting the nested maildir folders into smaller subsets, and that works better. In the end, I strongly suspect that the touch workaround works, but that akonadi cannot handle the huge number of updates all at once.
It might also be related to the system limits, but I'm not sure how to change them.
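
The idea of touching fewer mails at once can be sketched as a small shell function. This is only an illustration of the workaround described above, not part of the original script: the name touch_in_batches, the batch size of 100 and the 30-second pause are assumptions, and a suitable pause will depend on how quickly the feeder works through each batch.

```shell
# Hypothetical helper: touch the files directly inside one maildir
# "cur" folder in small batches, pausing after each batch.
# Usage: touch_in_batches DIR [BATCH_SIZE] [PAUSE_SECONDS]
touch_in_batches() {
   local dir=$1 batch=${2:-100} pause=${3:-30}
   # -print0 / -0 keep unusual file names intact. xargs hands at most
   # $batch files to each "sh -c" call; $0 inside it is the pause length.
   find "$dir" -maxdepth 1 -type f -print0 |
      xargs -0 -n "$batch" sh -c '
         if [ "$#" -gt 0 ]; then
            touch -- "$@"
            sleep "$0"
         fi' "$pause"
}
```

For example, touch_in_batches ~/Mail/Friends/cur 50 60 would touch 50 files, wait a minute, and continue until the folder is done.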

I managed to find a solution by touching the files in one folder at a time, waiting for nepomuk to finish indexing, and then touching the ones in the next folder. This can be done with a script (included below) that queries the NepomukFeeder agent via qdbus.

Hope this helps!

NOTES:
1. You have to touch the FILES in the folders, not the folders themselves.
2. Several other people have been repeatedly reporting similar issues in the past, since 2010, but I think I've come closer to the problem with this version of KDE (4.11).


Reproducible: Always

Steps to Reproduce:
1. Add an already-existing maildir resource to akonadi

and/or:

1. right-click on the maildir resource in Kmail,
2. choose "Folder Properties"
3. go to the "Maintenance" tab
4. click "Force reindexing"

Actual Results:  
No e-mails in the maildir resource are indexed by nepomuk.

Expected Results:  
The e-mails in the maildir resource should be indexed by nepomuk.

* my ulimit configuration:
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 30
file size               (blocks, -f) unlimited
pending signals                 (-i) 96113
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 99
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 96113
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
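
The listing above shows a soft limit of 1024 open files. Whether that limit plays any role in this bug is not established here; the following is only a sketch of how the limit can be inspected and changed for the current shell. The function names are hypothetical. The change applies only to this shell and to processes started from it, so Akonadi would have to be restarted from the same shell to inherit it.

```shell
# Print the soft and hard limits on open files for this shell.
show_nofile_limits() {
   echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"
}

# Set the soft limit on open files for this shell (hypothetical helper).
# An unprivileged user cannot go above the hard limit, so clamp to it.
set_nofile_soft_limit() {
   local wanted=$1 hard
   hard=$(ulimit -Hn)
   if [ "$hard" != unlimited ] && [ "$wanted" -gt "$hard" ]; then
      echo "requested $wanted exceeds hard limit $hard, using $hard" >&2
      wanted=$hard
   fi
   ulimit -Sn "$wanted"
}
```

For example, set_nofile_soft_limit 4096 followed by show_nofile_limits shows the new value.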

* my kernel:
Linux luna 3.10.10-1-CHAKRA #1 SMP PREEMPT Thu Aug 29 21:57:49 UTC 2013 x86_64 GNU/Linux

* I have 12 GB RAM and a quad-core i7 @3 GHz.

* The script I used to force reindexing:
#!/bin/bash

LISTA=(
# full path of the "cur" folders in the several maildir resources
# one folder per line, within double quotes
# You can get a list of folders with something like:
# find ~/Mail -type d -name cur -exec echo \"{}\" \;
# NOTE: "~" is not expanded inside double quotes, so use $HOME here
"$HOME/Mail/Friends/cur"
"$HOME/Mail/.Friends.directory/closest ones/cur"
)

function wait_for_start_of_indexing()
{
   # Poll once a minute until the feeder reports that indexing has started
   ciccio=0
   while [[ "$(qdbus org.freedesktop.Akonadi.Agent.akonadi_nepomuk_feeder / org.freedesktop.Akonadi.NepomukFeeder.isIndexing)" == "false" ]]; do
      (( ciccio >= 2 )) && break # wait max 2 x 60 = 120 seconds = 2 minutes!
      sleep 60
      (( ciccio++ ))
   done
}

function wait_for_end_of_indexing()
{
   # Poll every 5 seconds until the feeder reports that indexing has stopped
   ciccio=0
   while [[ "$(qdbus org.freedesktop.Akonadi.Agent.akonadi_nepomuk_feeder / org.freedesktop.Akonadi.NepomukFeeder.isIndexing)" == "true" ]]; do
      (( ciccio >= 120 )) && break # wait max 120 x 5 = 600 seconds = 10 minutes!
      sleep 5
      (( ciccio++ ))
   done
}

# Touch all files in a maildir folder,
# wait for nepomuk to start indexing
# wait for nepomuk to stop indexing
# and then move to the next folder

for dir in "${LISTA[@]}"; do
   echo "$dir"
   # touch the files, not the folder; find avoids "argument list too long"
   find "$dir" -maxdepth 1 -type f -exec touch {} +
   wait_for_start_of_indexing
   wait_for_end_of_indexing
done

########### end of script ###########
Comment 1 Kishore 2013-11-06 15:31:18 UTC
I have a similar problem but with a disconnected IMAP resource. Is there a workaround that I can use? Is my problem related to this bug?
Comment 2 Jörg Schaible 2013-11-06 18:32:16 UTC
I can confirm this behavior for KMail 4.10.5 (see http://thread.gmane.org/gmane.comp.kde.users.pim/21868)
Comment 3 Patrick 2013-11-21 14:54:32 UTC
I'm seeing the same issue here on KDE 4.11.3, Arch Linux x86_64.
Comment 4 koyukuk 2013-12-05 14:29:31 UTC
Same here on debian testing, KDE 4.11.3.
When using nepomukpimindexerutility to index a mail I get the message: 

nepomukpimindexerutility(12977) FeederPluginloader::feederPluginsForMimeType: No feeder for type  "inode/directory"  found 
nepomukpimindexerutility(12977) ItemQueue::fetchJobResult: Not all items were fetched:  0 1

So maybe the pimindexer does not index Maildirs because it lacks a plugin and the fileindexer doesn't know it is mail and therefore can't tell akonadi/kmail?
Comment 5 luisfe 2013-12-07 10:22:09 UTC
I think that the problem is not the akonadi_nepomuk_feeder but Akonadi::ItemFetchJob that does not fetch items.

Say you have an akonadi item 123456 that you want to pass to Nepomuk; try

qdbus org.freedesktop.Akonadi.Agent.akonadi_nepomuk_feeder / org.freedesktop.Akonadi.NepomukFeeder.forceReindexItem 123456

You will get something like

akonadi_nepomuk_feeder(5709) ItemQueue::fetchJobResult: Not all items were fetched:  0 1

It happens with mails, contacts and everything.
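
To repeat this test for several items, the qdbus call above can be wrapped in a loop. This is a minimal sketch: reindex_items is a hypothetical helper name and the item IDs are whatever you pass in. As described above, the call may simply produce the same "Not all items were fetched" message for items that cannot be fetched, so this helps with diagnosis rather than being a fix.

```shell
# Hypothetical helper: call forceReindexItem for each Akonadi item ID given.
# Usage: reindex_items 123456 123457 ...
reindex_items() {
   local id
   for id in "$@"; do
      # accept plain numeric IDs only
      case $id in
         ''|*[!0-9]*) echo "skipping invalid id: $id" >&2; continue ;;
      esac
      qdbus org.freedesktop.Akonadi.Agent.akonadi_nepomuk_feeder / \
         org.freedesktop.Akonadi.NepomukFeeder.forceReindexItem "$id"
   done
}
```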
Comment 6 Daniel Vrátil 2013-12-09 11:32:28 UTC
The Akonadi::ItemFetchJob invoked from ItemQueue in Nepomuk Feeder fetches only cached items - if there's an item that does not have the payload cached in Akonadi yet, it will be skipped.

Please open Akonadi console and, in the "DB Console" tab, run the following query:

SELECT PartTable.*, PartTypeTable.* FROM PartTable LEFT JOIN PartTypeTable ON PartTable.partTypeId = PartTypeTable.id WHERE PartTable.PimItemId = 123456

Replace 123456 with the ID of an item you know does not work, and post the output here (or a screenshot, the table is hard to copy).

Then switch to the Raw Socket tab and send the following commands one by one:

0 LOGIN test 
1 CAPABILITY (NOPAYLOADPATH) 
2 UID FETCH 123456 (CACHEONLY EXTERNALPAYLOAD PLD:ENVELOPE PLD:HEAD PLD:RFC822 ALLATTRIBUTES) 

And paste the replies from the server here. Remember to strip sensitive information. You can strip the content of the reply, just make sure you keep strings like PLD:HEAD[1] {123} (they can appear in the middle of the reply, too)
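
To avoid typos when substituting the item ID, the query and the three commands above can be generated with a small shell function and then pasted into Akonadi console. This is only a convenience sketch: the name akonadi_debug_commands is hypothetical, and the function just prints text without talking to Akonadi.

```shell
# Hypothetical helper: print the DB Console query and the Raw Socket
# commands from this comment with the given item ID filled in.
akonadi_debug_commands() {
   local id=$1
   case $id in
      ''|*[!0-9]*) echo "usage: akonadi_debug_commands ITEM_ID" >&2; return 1 ;;
   esac
   echo "SELECT PartTable.*, PartTypeTable.* FROM PartTable LEFT JOIN PartTypeTable ON PartTable.partTypeId = PartTypeTable.id WHERE PartTable.PimItemId = $id"
   echo "0 LOGIN test"
   echo "1 CAPABILITY (NOPAYLOADPATH)"
   echo "2 UID FETCH $id (CACHEONLY EXTERNALPAYLOAD PLD:ENVELOPE PLD:HEAD PLD:RFC822 ALLATTRIBUTES)"
}
```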
Comment 7 luisfe 2013-12-09 17:06:17 UTC
OK,  using akonadi master, for the DB query I get

id	pimItemId	partTypeId	data	datasize	version	external	id	name	ns
57869	28999	9	("Thu, 29 Sep 2011 21:21:29 -0300" "Re: diciembre" …	362	1	0	9	ENVELOPE	PLD
57869	28999	10		0	1	0	10	HEAD	PLD
80978	28999	11		0	1	0	11	RFC822	PLD

So the headers and the full mail are not in the database. Now, querying on the raw socket, I got:

* OK Akonadi Almost IMAP Server [PROTOCOL 35] 
0 LOGIN test 
0 OK User logged in 
1 CAPABILITY (NOPAYLOADPATH) 
1 OK CAPABILITY completed 
2 UID FETCH 28999 (CACHEONLY EXTERNALPAYLOAD PLD:ENVELOPE PLD:HEAD PLD:RFC822 ALLATTRIBUTES) 
* 28999 FETCH (UID 28999 REV 4 MIMETYPE "message/rfc822" COLLECTIONID 86 PLD:ENVELOPE[1] {362} 
...here appears the envelope as in the database...

PLD:HEAD[1] {3579} 

...Many headers of the message...

PLD:RFC822[1] {5051} 

...Full message here...

2 OK UID FETCH completed 



NOW, after performing the raw queries, the item is found by nepomuk. If I perform a new database query I get

id	pimItemId	partTypeId	data	datasize	version	external	id	name	ns
57869	28999	9	("Thu, 29 Sep 2011 21:21:29 -0300" "Re: diciembre" …	362	1	0	9	ENVELOPE	PLD
57869	28999	10	"headers here"	3579	1	0	10	HEAD	PLD
80978	28999	11	80978_r0	5051	1	1	11	RFC822	PLD

And 80978_r0 is a copy of the mail in the akonadi/file_db_data cache folder. If I restart akonadi, the cached file disappears and the database query again gives the first answer.

id	pimItemId	partTypeId	data	datasize	version	external	id	name	ns
57869	28999	9	("Thu, 29 Sep 2011 21:21:29 -0300" "Re: diciembre" …	362	1	0	9	ENVELOPE	PLD
57869	28999	10		0	1	0	10	HEAD	PLD
80978	28999	11		0	1	0	11	RFC822	PLD

But now the difference is that nepomuk stores the information of the mail.
Comment 8 Vishesh Handa 2015-03-17 12:40:17 UTC
The Nepomuk project is no longer maintained in KDE since 4.13. For email indexing, Baloo provided an Akonadi resource to index emails, contacts and events. Tags are now maintained by Akonadi itself.