Bug 307807

Summary: Insane abuse of nepomuk causes power abuse and makes the system unusable
Product: [Frameworks and Libraries] Akonadi
Reporter: Anders Lund <anderslund>
Component: Nepomuk Feeder Agents
Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED FIXED
Severity: normal
CC: chrigi_1, hephooey_dev, me, vkrause
Priority: NOR
Version: 4.9
Target Milestone: ---
Platform: unspecified
OS: Linux
Latest Commit:
Version Fixed In: 4.11
Sentry Crash Report:

Description Anders Lund 2012-10-04 06:22:53 UTC
The insane usage of nepomuk by akonadi_nepomuk_feeder is causing a lot of problems.

* The system becomes unresponsive and sluggish.
* I have to stop akonadi to view a video.
* Power abuse. Bad for the environment, the economy and the general health of users.
* Noise caused by the fan running way too much, which causes stress.


Reproducible: Always
Comment 1 Christophe Marin 2012-10-04 10:52:52 UTC
Christian, is there a specific reason for not using nepomuk-core in kdepim-runtime 4.9 ?
Comment 2 Christian Mollekopf 2012-10-04 11:05:02 UTC
We are using nepomuk-core (Nepomuk2). The wrapper generator has not yet been ported to nepomuk-core AFAIK, therefore we're still using the pre-generated versions in dms-copy (but they're all using nepomuk-core).
Comment 3 Christophe Marin 2012-10-04 11:57:29 UTC
Then I'm lost:

/kde/src/pim/kdepim-runtime/agents (4.9) # wcgrep -i nepomuk2 |wc -l
0

/kde/src/pim/kdepim-runtime/agents (master) # wcgrep -i nepomuk2 |wc -l
535

We don't even look for nepomuk-core in kdepim-runtime 4.9 (commit 13e481b is only in master)
Comment 4 Anders Lund 2012-10-04 12:01:29 UTC
Of course it is very good to ensure that the right nepomuk code is used.

But it is equally important that ALL aspects of this area are optimized. For example, there is a patch waiting that should prevent reindexing when flags are changed.

On my system, my owncloud resources are often dropped, and judging from the virtuoso-t activity when they are rediscovered, they are also reindexed. Quite absurd for my calendars + contacts! Let me know if I can help investigate or if my log files can help!

What happens with new mail? Receiving a few mails costs minutes of virtuoso-t activity at > 30-50% CPU here, and that cannot be reasonable, even given virtuoso-t or nepomuk inefficiency: indexing many times that amount of images or documents is barely noticeable. So I can't help wondering if messages are indexed more than once, for example when moved by a filter. If that is the case, there is low-hanging fruit to pick!

Mail that is set as spam by my bogofilter should NEVER be indexed, so I can trust that mail in trash folders is never indexed, right? But that is not enough: filtering is not always trustworthy itself, spam messages are often left in the inbox, and it is still silly to index them. Is there a header I can add to prevent that from happening (one that the spam filtering wizard could also add to the filters it creates)?
Comment 5 Christian Mollekopf 2012-10-04 12:43:57 UTC
(In reply to comment #3)
> Then I'm lost:
> 
> /kde/src/pim/kdepim-runtime/agents (4.9) # wcgrep -i nepomuk2 |wc -l
> 0
> 
> /kde/src/pim/kdepim-runtime/agents (master) # wcgrep -i nepomuk2 |wc -l
> 535
> 
> We don't even look for nepomuk-core in kdepim-runtime 4.9 (commit 13e481b is
> only in master)

You are of course right, I was thinking of master, sorry.
If we can already depend on nepomuk-core in 4.9, there is no specific reason. I think nepomuk-core simply came a little late in the process, so I didn't port it.
Comment 6 Christian Mollekopf 2012-10-04 12:54:23 UTC
(In reply to comment #4)
> Of course it is very good to ensure that the right nepomuk code is used.
> 
> But it is equally important that ALL aspects of this area are optimized. For
> example, there is a patch waiting that should prevent reindexing when flags
> are changed.
> 
> On my system, my owncloud resources are often dropped, and judging from the
> virtuoso-t activity when they are rediscovered, they are also reindexed.
> Quite absurd for my calendars + contacts! Let me know if I can help
> investigate or if my log files can help!
> 

Not sure what you mean by the resources being "dropped", but if they go offline and come online again, and there is no new data, there should also be no indexing happening.

> What happens with new mail? Receiving a few mails costs minutes of
> virtuoso-t activity > 30-50% CPU here, and that can not be reasonable, even
> given virtuoso-t or nepomuk inefficiency - indexing many times the amount of
> images or documents is barely visible. So I can't help wondering if messages
> are indexed more than once, for example when moved by a filter. If that is
> the case, there is a low-hanging fruit to pick!
> 

As long as the ID of the akonadi-item doesn't change there shouldn't be any reindexing going on.

> Mail that is set as spam by my bogofilter should NEVER be indexed, so I can
> trust that mail in trash folders is never indexed, right? But that is not
> enough: filtering is not always trustworthy itself, spam messages are often
> left in the inbox, and it is still silly to index them. Is there a header I
> can add to prevent that from happening (one that the spam filtering wizard
> could also add to the filters it creates)?

The feeder looks for a $JUNK flag to filter spam; I'm not sure where exactly this flag is set.
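
For illustration, the flag check described above might look roughly like the following Python sketch. This is not the feeder's actual code: `should_index` and `JUNK_FLAG` are hypothetical names, and the exact spelling and case of the flag as stored by Akonadi is an assumption taken from this comment.

```python
# Hypothetical sketch, NOT the feeder's real code.
# Assumption: the flag is stored exactly as "$JUNK", as written in this comment.
JUNK_FLAG = "$JUNK"


def should_index(item_flags):
    """Return False for items that carry the junk flag, True otherwise."""
    return JUNK_FLAG not in set(item_flags)


flagged = should_index(["$JUNK", "\\SEEN"])  # item marked as junk
unflagged = should_index(["\\SEEN"])         # ordinary read mail
```

Under these assumptions, spam that the filter failed to flag would still be indexed, which matches the concern raised in comment #4.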
Comment 7 Anders Lund 2012-10-04 12:58:37 UTC
On Thursday, 4 October 2012 12:54:23, you wrote:
> > On my system, my owncloud resources are often dropped, and judging from
> > the virtuoso-t activity when they are rediscovered, they are also
> > reindexed again. Quite absurd, for my calendars + contacts! Let me know
> > if I can help investigate or if my log files can help!
> 
> Not sure what you mean by the resources are "dropped", but if they go
> offline and come online again, and there is no new data, there should also
> be no indexing happening.

When this happens, contacts are no longer recognized in groups in my "personal 
contacts" resource, which indicates to me, together with the 
intensive virtuoso-t activity, that something bad is going on. 

There is no new data, though. Data changes caused by me changing or adding 
contacts from another device do not cause problems.

Anders
Comment 8 Anders Lund 2012-10-04 13:01:31 UTC
On Thursday, 4 October 2012 12:54:23, you wrote:
> As long as the ID of the akonadi-item doesn't change there shouldn't be any
> reindexing going on.

Good to know
 
> > Mail that is set as spam by my bogofilter should NEVER be indexed, so I
> > can trust that mail in trash folders is never indexed, right? But that is
> > not enough: filtering is not always trustworthy itself, spam messages are
> > often left in the inbox, and it is still silly to index them. Is there a
> > header I can add to prevent that from happening (one that the spam
> > filtering wizard could also add to the filters it creates)?
> 
> The feeder looks for a $JUNK flag to filter spam, not sure where this flag
> is set exactly.

Is there any way I can verify that it exists?

Anders
Comment 9 Anders Lund 2012-10-06 16:50:30 UTC
More aspects of this problem area:

* Why does akonadi INSIST on feeding nepomuk while I use my system heavily, for example when compiling or processing images or video? I have gotten into the habit of stopping akonadi when I want to work! :0 Why not be on the good side? There are tools for it!

* Why does akonadi INSIST on abusing my CPU while on battery? Do as the indexing system does, and wait until the power cable is plugged in!
Comment 10 LuRan 2012-10-21 04:39:43 UTC
I decided to give akonadi/nepomuk another try recently and have a similar issue. I actually deleted the old nepomuk database to make sure everything started from scratch. I have 200k mails, and virtuoso has been using 100% CPU for 5 DAYS and is still not finished. The worst part is that most of the CPU time is spent waiting: even when no indexing is happening, virtuoso uses 100% CPU, while the feeder uses only about 1% (I have 4 cores and virtuoso only uses 100%, which may be related to the waiting). According to strace, almost all of the time is spent in futex:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.54    4.461622         257     17363       382 futex
  6.00    0.286012          44      6507           select
  0.40    0.018997        4749         4           fsync
  0.02    0.000993           0      5787           recvfrom
  0.02    0.000786           0      6678           lseek
  0.01    0.000612         306         2           ftruncate
  0.01    0.000490           0      5787           sendto
  0.01    0.000375           0      6684           write
  0.00    0.000000           0         4           read
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           close
  0.00    0.000000           0        48           stat
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           rt_sigprocmask
  0.00    0.000000           0         2           unlink
------ ----------- ----------- --------- --------- ----------------
100.00    4.769887                 48876       382 total
A comparison with the file indexing done by nepomuk+strigi seems to confirm this problem: in that case the CPU used by virtuoso is almost always below 10%.
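
For anyone who wants to compare such summaries across runs, the table above can be read mechanically. The following is a minimal Python sketch, assuming the input is text in the same layout as the `strace -c` summary quoted in this comment; `parse_strace_summary` and `SAMPLE` are hypothetical names, and the sample rows are copied from the table above.

```python
def parse_strace_summary(text):
    """Return {syscall: (percent_time, seconds, calls)} from a summary table."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors column) or 6; skip the total row.
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            percent = float(fields[0])
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            # Header and ruler lines do not start with numbers.
            continue
        rows[fields[-1]] = (percent, seconds, calls)
    return rows


SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.54    4.461622         257     17363       382 futex
  6.00    0.286012          44      6507           select
------ ----------- ----------- --------- --------- ----------------
100.00    4.769887                 48876       382 total
"""

summary = parse_strace_summary(SAMPLE)
# Syscall with the largest share of the measured time.
dominant = max(summary, key=lambda name: summary[name][0])
```

With the rows above, `dominant` is `futex`, consistent with the observation that nearly all of the measured time goes to waiting on locks rather than to I/O.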
Comment 11 Anders Lund 2012-10-24 18:15:35 UTC
On Thursday, 4 October 2012 12:54:23, you wrote:
> Not sure what you mean by the resources are "dropped", but if they go
> offline and come online again, and there is no new data, there should also
> be no indexing happening.

I have proof, in the form of screenshots of the akonadiconsole browser, that my 
contacts' akonadi IDs are changed. The remote IDs remain, but akonadi gives 
them all new IDs regularly. This is an owncloud/webdav resource.

Apart from CPU abuse, this also means that my groups are emptied, so the groups 
feature is unusable for me.
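
If, as stated in comment #6, reindexing is avoided only while the akonadi item ID stays the same, the renumbering described here would be enough to trigger a full reindex. A minimal Python sketch of that reasoning follows; `items_to_index` and the sample IDs and file names are hypothetical, and this is not the feeder's actual logic.

```python
def items_to_index(items, indexed_ids):
    """Return the items whose akonadi ID has not been indexed yet.

    `items` is a list of (akonadi_id, remote_id) pairs (illustrative layout).
    """
    return [item for item in items if item[0] not in indexed_ids]


# After a first sync, two contacts are indexed under akonadi IDs 1 and 2.
indexed = {1, 2}

# The same contacts reappear: identical remote IDs, but new akonadi IDs.
recreated = [(101, "a.vcf"), (102, "b.vcf")]

# Keyed on the akonadi ID, every recreated contact looks new.
pending = items_to_index(recreated, indexed)
```

In this sketch all recreated contacts end up in `pending` although their remote IDs are unchanged. The emptied groups would fit the same picture if group members are referenced by akonadi ID, which is an assumption here.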
Comment 12 Anders Lund 2012-11-10 07:55:21 UTC
More aspects of this horror, now running KDE 4.9.3:

* Often, starting kmail takes > 30 seconds, during which time the virtuoso-t is hammering my poor system.

* When I ask kmail for the "configure filters..." dialog, instead of showing it, it starts what feels like an infinite virtuoso-t madness session. Sometimes I am lucky and the dialog appears before I get tired and kill kmail (> 1 minute, I really try to be patient...)

Of course, none of this happens with nepomuk entirely disabled.
Comment 13 Vishesh Handa 2013-08-17 11:47:30 UTC
I'm marking this bug as FIXED as the nepomuk feeder has been substantially improved with 4.11. It's still not perfect and it does consume more CPU than I would like, but it no longer seems like a big inconvenience. 

Once you have tried 4.11, if you still feel that it is a problem, please feel free to reopen this bug. Either way, we will continue working on optimizing the indexing process. We are nowhere near done.