Summary: | Insane abuse of nepomuk causes power abuse and makes the system unusable | ||
---|---|---|---|
Product: | [Frameworks and Libraries] Akonadi | Reporter: | Anders Lund <anderslund> |
Component: | Nepomuk Feeder Agents | Assignee: | kdepim bugs <kdepim-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | chrigi_1, hephooey_dev, me, vkrause |
Priority: | NOR | ||
Version: | 4.9 | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | 4.11 | |
Sentry Crash Report: |
Description
Anders Lund
2012-10-04 06:22:53 UTC
Christian, is there a specific reason for not using nepomuk-core in kdepim-runtime 4.9?

We are using nepomuk-core (Nepomuk2). The wrapper generator has not yet been ported to nepomuk-core AFAIK, so we're still using the pre-generated versions in dms-copy (but they all use nepomuk-core).

Then I'm lost:

/kde/src/pim/kdepim-runtime/agents (4.9) # wcgrep -i nepomuk2 |wc -l
0

/kde/src/pim/kdepim-runtime/agents (master) # wcgrep -i nepomuk2 |wc -l
535

We don't even look for nepomuk-core in kdepim-runtime 4.9 (commit 13e481b is only in master).

Of course it is very good to ensure that the right nepomuk code is used. But it is equally important that ALL aspects of this area are optimized. For example, there is a patch waiting that should prevent reindexing when flags are changed.

On my system, my owncloud resources are often dropped, and judging from the virtuoso-t activity when they are rediscovered, they are also reindexed. Quite absurd, for my calendars + contacts! Let me know if I can help investigate or if my log files can help!

What happens with new mail? Receiving a few mails costs minutes of virtuoso-t activity at over 30-50% CPU here, and that cannot be reasonable, even given virtuoso-t or nepomuk inefficiency; indexing many times that amount of images or documents is barely visible. So I can't help wondering if messages are indexed more than once, for example when moved by a filter. If that is the case, there is low-hanging fruit to pick!

Mail that is marked as spam by my bogofilter should NEVER be indexed, so I can trust that mail in trash folders is never indexed, right? But that is not enough: as long as filtering is not always trustworthy itself, spam messages are often left in the inbox, and it is still silly to index them. Is there a header I can add to prevent that from happening? (And one that the spam filtering wizard could add to the filters it creates!)
(In reply to comment #3)
> Then I'm lost:
>
> /kde/src/pim/kdepim-runtime/agents (4.9) # wcgrep -i nepomuk2 |wc -l
> 0
>
> /kde/src/pim/kdepim-runtime/agents (master) # wcgrep -i nepomuk2 |wc -l
> 535
>
> We don't even look for nepomuk-core in kdepim-runtime 4.9 (commit 13e481b is
> only in master)

You are of course right, I was thinking of master, sorry. If we can already depend on nepomuk-core in 4.9, there is no specific reason. I think nepomuk-core simply came a little late in the process, so I didn't port it.

(In reply to comment #4)
> Of course it is very good to ensure that the right nepomuk code is used.
>
> But it is equally important that ALL aspects of this area are optimized. For
> example there is a patch waiting that should prevent reindexing when flags
> are changed.
>
> On my system, my owncloud resources are often dropped, and judging from the
> virtuoso-t activity when they are rediscovered, they are also reindexed
> again. Quite absurd, for my calendars + contacts! Let me know if I can help
> investigate or if my log files can help!

Not sure what you mean by the resources are "dropped", but if they go offline and come online again, and there is no new data, there should also be no indexing happening.

> What happens with new mail? Receiving a few mails costs minutes of
> virtuoso-t activity > 30-50% CPU here, and that can not be reasonable, even
> given virtuoso-t or nepomuk inefficiency - indexing many times the amount of
> images or documents is barely visible. So I can't help wondering if messages
> are indexed more than once, for example when moved by a filter. If that is
> the case, there is a low-hanging fruit to pick!

As long as the ID of the Akonadi item doesn't change, there shouldn't be any reindexing going on.

> Mail that is set as spam by my bogofilter should NEVER be indexed, so can
> trust that mail in trash folders are never indexed right? But that is not
> enough, as long as filtering is not always trustworthy itself, often spam
> messages are left in the inbox, and it is still silly to index it. Is there
> a header I can add to prevent that from happening? (and that the spam
> filtering wizard could add to the filters it creates!)

The feeder looks for a $JUNK flag to filter spam; I'm not sure where exactly this flag is set.

On Thursday, 4 October 2012 12:54:23, you wrote:
> > On my system, my owncloud resources are often dropped, and judging from
> > the virtuoso-t activity when they are rediscovered, they are also reindexed
> > again. Quite absurd, for my calendars + contacts! Let me know if I can
> > help investigate or if my log files can help!
>
> Not sure what you mean by the resources are "dropped", but if they go
> offline and come online again, and there is no new data, there should also
> be no indexing happening.
When this happens, contacts in my "personal contacts" resource are no longer recognized in groups, which indicates to me, together with the intensive virtuoso-t activity, that something bad is going on.
There is no new data, though; data changes caused by me changing or adding contacts from another device do not cause problems.
Anders
On Thursday, 4 October 2012 12:54:23, you wrote:
> As long as the ID of the akonadi-item doesn't change there shouldn't be any
> reindexing going on.

Good to know.

> > Mail that is set as spam by my bogofilter should NEVER be indexed, so can
> > trust that mail in trash folders are never indexed right? But that is not
> > enough, as long as filtering is not always trustworthy itself, often spam
> > messages are left in the inbox, and it is still silly to index it. Is
> > there a header I can add to prevent that from happening? (and that the
> > spam filtering wizard could add to the filters it creates!)
>
> The feeder looks for a $JUNK flag to filter spam, not sure where this flag
> is set exactly.

Is there any way I can verify that it exists?

Anders

More aspects of this area of problems:

* Why does Akonadi INSIST on feeding Nepomuk while I use my system heavily, e.g. compiling, processing images or video? I have gotten into the habit of stopping Akonadi when I want to work! :0 Why not be considerate? There are tools for it!
* Why does Akonadi INSIST on abusing my CPU while on battery? Do as the indexing system does, and wait until the power cable is plugged in!

I decided to give akonadi/nepomuk another try recently and have a similar issue. I actually deleted the old nepomuk database to make sure everything started from scratch. I have 200k mails, and virtuoso has been using 100% CPU for 5 DAYS and is still not finished. The worst part is that most of the CPU time is spent waiting: even when no indexing is happening, virtuoso is using 100% CPU, while the feeder only uses about 1% (I have 4 cores and virtuoso only uses 100%, maybe related to the waiting).
According to strace, almost all of the time is spent in futex:

% time | seconds | usecs/call | calls | errors | syscall
---|---|---|---|---|---
93.54 | 4.461622 | 257 | 17363 | 382 | futex
6.00 | 0.286012 | 44 | 6507 | | select
0.40 | 0.018997 | 4749 | 4 | | fsync
0.02 | 0.000993 | 0 | 5787 | | recvfrom
0.02 | 0.000786 | 0 | 6678 | | lseek
0.01 | 0.000612 | 306 | 2 | | ftruncate
0.01 | 0.000490 | 0 | 5787 | | sendto
0.01 | 0.000375 | 0 | 6684 | | write
0.00 | 0.000000 | 0 | 4 | | read
0.00 | 0.000000 | 0 | 2 | | open
0.00 | 0.000000 | 0 | 2 | | close
0.00 | 0.000000 | 0 | 48 | | stat
0.00 | 0.000000 | 0 | 2 | | fstat
0.00 | 0.000000 | 0 | 1 | | mmap
0.00 | 0.000000 | 0 | 1 | | munmap
0.00 | 0.000000 | 0 | 2 | | rt_sigprocmask
0.00 | 0.000000 | 0 | 2 | | unlink
100.00 | 4.769887 | | 48876 | 382 | total

Comparing with file indexing by nepomuk+strigi seems to confirm this problem: in that case, the CPU used by virtuoso is almost always below 10%.

On Thursday, 4 October 2012 12:54:23, you wrote:
> Not sure what you mean by the resources are "dropped", but if they go
> offline and come online again, and there is no new data, there should also
> be no indexing happening.
I have proof, in the form of screenshots of the akonadiconsole browser, that my contacts' Akonadi IDs are changed. The remote IDs remain, but Akonadi regularly gives them all new IDs. This is an owncloud/WebDAV resource.
Apart from the CPU abuse, this also means that my groups are emptied, so the groups feature is unusable for me.
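Christian's reply and Anders's screenshots together suggest why a "dropped" resource is so expensive. The Python sketch below is a toy model built on two assumptions taken from this thread: the feeder skips items carrying the $JUNK flag, and it decides whether to (re)index by the Akonadi item ID rather than by the remote ID. The `Item` and `Feeder` classes are hypothetical and are not the real feeder code; they only illustrate that reassigning local IDs makes unchanged contacts look new.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    akonadi_id: int                  # local ID assigned by Akonadi
    remote_id: str                   # ID on the server, e.g. a WebDAV path
    flags: frozenset = frozenset()   # e.g. {"$JUNK"}


@dataclass
class Feeder:
    indexed_ids: set = field(default_factory=set)
    index_count: int = 0             # number of (expensive) indexing runs

    def needs_indexing(self, item: Item) -> bool:
        # Hypothetical rule 1: spam-flagged items are never indexed.
        if "$JUNK" in item.flags:
            return False
        # Hypothetical rule 2: an item is indexed once per Akonadi ID;
        # the remote ID is not consulted at all.
        return item.akonadi_id not in self.indexed_ids

    def process(self, item: Item) -> None:
        if self.needs_indexing(item):
            self.index_count += 1    # stand-in for the real indexing work
            self.indexed_ids.add(item.akonadi_id)


feeder = Feeder()
contacts = [Item(1, "/dav/alice.vcf"), Item(2, "/dav/bob.vcf")]

for c in contacts:                   # first sync: both contacts indexed
    feeder.process(c)
for c in contacts:                   # offline/online, same IDs: no reindexing
    feeder.process(c)
feeder.process(Item(3, "/imap/spam", frozenset({"$JUNK"})))  # skipped
print(feeder.index_count)            # 2

# Resource rediscovered: same remote IDs, but new Akonadi IDs.
for new_id, c in enumerate(contacts, start=100):
    feeder.process(Item(new_id, c.remote_id))
print(feeder.index_count)            # 4: every contact indexed again
```

In this model, going offline and online with unchanged IDs costs nothing, and all of the repeated work comes from the ID reassignment. It says nothing about how costly each indexing run is inside virtuoso-t.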
More aspects of this horror, now running KDE 4.9.3:

* Often, starting kmail takes > 30 seconds, during which time virtuoso-t is hammering my poor system.
* When I ask kmail for the "Configure Filters..." dialog, instead of showing it, it starts what feels like an infinite virtuoso-t madness session. Sometimes I am lucky and the dialog appears before I get tired and kill kmail (> 1 minute; I really try to be patient...)

Of course, none of this happens with nepomuk entirely disabled.

I'm marking this bug as FIXED, as the nepomuk feeder has been substantially improved in 4.11. It's still not perfect, and it does consume more CPU than I would like, but it no longer seems like a big inconvenience. Once you have tried 4.11, if you still feel that it is a problem, please feel free to reopen this bug. Either way, we will continue working on optimizing the indexing process. We are nowhere near done.