Bug 182064 - folders indexed on every start that did not change
Summary: folders indexed on every start that did not change
Status: RESOLVED FIXED
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: general
Version: 4.1
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Sebastian Trueg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-27 08:08 UTC by S. Burmeister
Modified: 2011-01-06 15:42 UTC
CC List: 6 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description S. Burmeister 2009-01-27 08:08:37 UTC
Version:            (using KDE 4.1.96)
OS:                Linux
Installed from:    SuSE RPMs

Every time I start the computer, Nepomuk's Strigi service generates a lot of HDD I/O. For several minutes it causes ~800 kb/s of disk traffic working on files that have not changed in months, i.e. files that should already have been indexed in previous sessions.

This might be related to bug 180460
Comment 1 mutlu inek 2009-09-14 06:50:59 UTC
I also experience high I/O load after KDE startup. Sometimes Nepomuk (with Strigi enabled) crawls folders that had recent changes; very often, however, it crawls folders I have not touched in years, whose data should have been indexed long ago.

On my machine, this is not related to external storage as all data is on the local hard drive.
Comment 2 S. Burmeister 2009-10-16 09:01:30 UTC
Since Amarok seems to get this right, maybe they can give some hints on how they track every change without having to re-scan all folders.

The original statement by Sebastian was that there is no other way to track every kind of change but to re-scan all folders.
Comment 3 mutlu inek 2009-10-17 00:37:08 UTC
Well, I have come to believe that proper use of inotify actually solves this issue. I have written about my findings in another bug report for Nepomuk. See my post here: https://bugs.kde.org/show_bug.cgi?id=196402#c11

With regard to Amarok, it seems to me that they simply check the mtime of directories to see whether they contain files that need to be rescanned. See this blog post: http://blog.jefferai.org/2009/10/14/speed-never-gets-old-at-least-in-software-1129
Comment 4 Jeff Mitchell 2009-10-17 03:16:13 UTC
Yes -- we check directory mtimes, which generally works pretty well except for filesystems that don't have/properly update mtimes  :-|

We could hook into inotify, and we've explored that in the past, but it'd be a linux-specific thing (and it brings some other complexities into the works).

I can provide more details if anyone wishes. The change described in that blog post is that we now give the collection scanner the mtimes of the directories instead of just the directory list itself, which allows us to skip subfolders that haven't changed.
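A minimal Python sketch of the directory-mtime check described above, assuming a hypothetical cache that maps each directory path to the `st_mtime_ns` seen on the previous scan. It is not Amarok's actual scanner code, and `dirs_needing_rescan` is an invented name:

```python
import os
import tempfile


def dirs_needing_rescan(root, known_mtimes):
    """Compare each directory's mtime with the value recorded last time.

    known_mtimes: hypothetical persisted cache, dir path -> st_mtime_ns.
    Returns (changed_dirs, current_mtimes). A directory counts as changed
    if it is new or its mtime differs; only the files in those directories
    would then be re-read.
    """
    changed = []
    current = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime_ns
        current[dirpath] = mtime
        if known_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
    return changed, current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "music")
        os.mkdir(sub)
        # Pin both mtimes to an old value (2001) so the later change is
        # visible regardless of the filesystem's timestamp granularity.
        for d in (root, sub):
            os.utime(d, ns=(10**18, 10**18))

        first, cache = dirs_needing_rescan(root, {})
        print(len(first))  # 2: nothing is cached yet

        open(os.path.join(sub, "new.ogg"), "w").close()
        second, cache = dirs_needing_rescan(root, cache)
        print(second == [sub])  # True: only "music" gained an entry
```

Every directory is still stat()ed on each run; the saving is that files in unchanged directories are not re-read. A directory's mtime only moves when entries are added, removed or renamed inside it, so it says nothing about changes deeper in the subtree.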
Comment 5 Sebastian Trueg 2009-12-14 12:25:55 UTC
Checking the dir's mtime is not enough, since it does not change when an existing file in the dir is modified.
I see no other way than scanning all folders for changes since a lot could have changed while Nepomuk was not running.
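The limitation is easy to check. This Python sketch (the helper `dir_mtime` and the constant `OLD_NS` are illustrative names) pins a directory's mtime to an old value, then appends to an existing file and creates a new one; on a typical Linux filesystem only the second operation moves the directory's mtime:

```python
import os
import tempfile

OLD_NS = 10**18  # arbitrary past timestamp (2001) used to pin the mtime


def dir_mtime(path):
    """Return the directory's modification time in nanoseconds."""
    return os.stat(path).st_mtime_ns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        doc = os.path.join(d, "notes.txt")
        with open(doc, "w") as f:
            f.write("draft\n")
        os.utime(d, ns=(OLD_NS, OLD_NS))

        # Changing an existing file's contents updates the file's own
        # mtime, not the directory's list of entries.
        with open(doc, "a") as f:
            f.write("more text\n")
        print(dir_mtime(d) == OLD_NS)  # True

        # Adding an entry does update the directory's mtime.
        open(os.path.join(d, "other.txt"), "w").close()
        print(dir_mtime(d) == OLD_NS)  # False
```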
Comment 6 Jeff Mitchell 2009-12-14 15:42:00 UTC
Sorry, I thought this complaint was about Amarok. For Nepomuk, I agree, checking the dir's mtime isn't enough. It's a currently-acceptable (although less than ideal) situation for Amarok.
Comment 7 Sebastian Trueg 2010-01-27 13:54:10 UTC
Can I close this bug? After all there is no other way to make sure we get all new files than running through all folders and checking all files.
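A minimal sketch of such a full scan, assuming the stored index can be represented as a mapping from file path to `(st_mtime_ns, st_size)`; `snapshot` and `diff_index` are hypothetical helpers, not Nepomuk or Strigi APIs. Because every file is compared against the index, it also catches changes made while the indexer was not running:

```python
import os
import tempfile


def snapshot(root):
    """Map every file under root to (st_mtime_ns, st_size)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            result[path] = (st.st_mtime_ns, st.st_size)
    return result


def diff_index(old, new):
    """Return (added, modified, removed) path sets between two snapshots."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {p for p in new.keys() & old.keys() if new[p] != old[p]}
    return added, modified, removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "a.txt")
        b = os.path.join(root, "b.txt")
        for p in (a, b):
            with open(p, "w") as f:
                f.write("x")
        index = snapshot(root)  # stands in for the stored index

        # Simulate edits made while the indexer was not running.
        with open(a, "a") as f:
            f.write("y")  # size changes from 1 to 2 bytes
        os.remove(b)
        c = os.path.join(root, "c.txt")
        open(c, "w").close()

        added, modified, removed = diff_index(index, snapshot(root))
        print(added == {c}, modified == {a}, removed == {b})  # True True True
```

This trades startup I/O (at least one stat() per file) for completeness.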
Comment 8 Will Stephenson 2010-01-27 14:11:48 UTC
Sure, and a related suggestion: set the default folders to index to ~/Documents and below instead of ~.
Comment 9 Lubos Lunak 2010-01-27 14:56:37 UTC
Well, the problem does exist. And just because there isn't a good solution right now doesn't mean there can't be one. There is one SUSE kernel developer who has a kernel patch that would help with this problem, I just need to make him finally finish and submit it.
Comment 10 Sebastian Trueg 2010-01-27 16:40:21 UTC
I don't see how a kernel patch can help here. While Nepomuk is not running a lot could happen and we need to find these changes, too. The best inotify replacement won't help for that scenario.

As for ~/Documents: can't you do that via a global configuration file for SuSE?
Comment 11 Lubos Lunak 2010-01-27 17:18:11 UTC
I'm not talking about an inotify replacement, am I? Anyway, unless you know a kernel developer with some spare time, I'll get back here as soon as the feature is usable.
Comment 12 Sebastian Trueg 2010-01-27 17:30:31 UTC
@Lubos: I have no idea what you are talking about. You only wrote "a kernel patch that would help with this problem". So since I could not think of a kernel patch that would help with the initial indexing problem I thought you meant the monitoring of file operations.
Care to spare a few details?
Comment 13 Lubos Lunak 2010-01-27 18:58:34 UTC
The idea is basically a kind of recursive mtime. When something changes, the flag propagates all the way up. So when checking a directory tree, recurse only in parts where the flag is set. The idea was initially for kbuildsycoca but it should be usable e.g. for strigi too.
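The proposal in comment 13 was a kernel patch whose interface is not described here, so the following is only a user-space simulation of the idea in Python; `Node`, `mark_changed` and `scan` are hypothetical names. A change sets a flag on its directory and every ancestor, and a scan descends only into flagged subtrees, clearing the flags as it goes:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One directory in a hypothetical in-memory tree."""
    name: str
    # Excluded from repr/eq to avoid infinite parent <-> child recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "Node"] = field(default_factory=dict)
    dirty: bool = False  # the "recursive mtime" flag

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children[name] = child
        return child

    def path(self) -> str:
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"


def mark_changed(node: Optional[Node]) -> None:
    """Flag node and all ancestors; stop early at an already-flagged
    ancestor, since its own ancestors are flagged as well."""
    while node is not None and not node.dirty:
        node.dirty = True
        node = node.parent


def scan(node: Node, visited: List[str]) -> None:
    """Visit only flagged directories, clearing each flag on the way."""
    if not node.dirty:
        return
    node.dirty = False
    visited.append(node.path())
    for child in node.children.values():
        scan(child, visited)


if __name__ == "__main__":
    home = Node("home")
    docs = home.add("Documents")
    music = home.add("Music")
    music.add("2005")
    mark_changed(docs)  # e.g. a file in ~/Documents was edited
    visited: List[str] = []
    scan(home, visited)
    print(visited)  # ['home', 'home/Documents']
```

The untouched ~/Music subtree is never entered, which is the saving the proposal aims for compared with a full walk.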
Comment 14 mutlu inek 2010-04-07 16:41:54 UTC
I believe this bug report can be closed now thanks to Sebastian Trüg's reworking of the indexing infrastructure. See:

http://websvn.kde.org/?view=revision&revision=1104720
http://websvn.kde.org/?view=revision&revision=1104721
Comment 15 Sebastian Trueg 2010-04-07 16:49:21 UTC
I agree that this can be closed. Can someone please confirm?
Comment 16 Tassilo Horn 2010-04-07 17:43:50 UTC
(In reply to comment #15)
> I agree that this can be closed. Can someone please confirm?

I don't have KDE installed from trunk, so I cannot confirm whether it works until KDE 4.5 is out. I'd suggest changing the bug status to resolved and keeping it open until someone has confirmed it works.
Comment 17 Sebastian Trueg 2011-01-06 15:42:51 UTC
Closing, as it can no longer be reproduced since 4.5.