Bug 233471 - Baloo abuses Disk access for excessive periods of time after startup
Status: RESOLVED FIXED
Product: Baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: unspecified
Platform: Compiled Sources Linux
Importance: VHI critical
Target Milestone: ---
Assignee: Pinak Ahuja
Duplicates: 299491
Reported: 2010-04-06 11:42 UTC by Ben Cooksley
Modified: 2018-05-19 20:11 UTC

Description Ben Cooksley 2010-04-06 11:42:45 UTC
Version:            (using Devel)
Compiler:          g++ (SUSE Linux) 4.4.1 [gcc-4_4-branch revision 150839]  Linux grace 2.6.31.12-0.1-default #1 SMP 2010-01-27 08:20:11 +0100 i686 i686 i386 GNU/Linux 
OS:                Linux
Installed from:    Compiled sources

Following svn commit 1104720 by trueg, NepomukFileWatcher severely abuses the disk on startup, leaving low-memory systems (512 MB) unusable until it has finished installing its file watchers.

As a direct result, Akonadi cannot be used effectively, which makes the PIM suite useless: Nepomuk restarts the filewatcher if only the filewatcher is killed, so Nepomuk itself must be killed to make the system usable immediately.

Strigi indexing is disabled on my system.

The NepomukFileWatcher should only watch those directories which contain annotated files, not the whole of $HOME.
Comment 1 Ben Cooksley 2010-04-06 11:44:18 UTC
Increasing priority, adding release blocker keyword.

Current behaviour is completely unacceptable for users with more than a few files under $HOME (including hidden directories).
Comment 2 Sebastian Trueg 2010-04-07 16:20:00 UTC
I am open to any suggestions that improve the situation. I know that there is a lot of load in the beginning. That is due to the installation of inotify watches for all folders. The only thing I can think of at the moment is to slow down this process, which would produce less load but would mean that all folders are only watched after a certain period of time.
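
To put a number on that load: inotify watches are per directory and not recursive, so one watch is needed for every folder in the tree. The following is only a rough Python sketch, not the filewatcher's code; `count_watch_dirs` is a name made up for illustration. It counts how many watches a tree would need, with and without hidden directories.

```python
import os

def count_watch_dirs(root, include_hidden=True):
    """Count the directories under root (root included) that would each need a watch."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        if not include_hidden:
            # Prune hidden directories in place so os.walk does not descend into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        total += 1
    return total
```

Calling `count_watch_dirs(os.path.expanduser("~"))` gives the number of watches for a whole home directory; passing `include_hidden=False` shows how much of that comes from dot-directories.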
Comment 3 Ben Cooksley 2010-04-07 22:13:34 UTC
Wouldn't it be possible to watch only those directories which contain annotated files?
Comment 4 Sebastian Trueg 2010-04-08 12:50:54 UTC
If you do that and then move the file into another directory, Nepomuk has no way of noticing. With inotify you need to watch both the source and the target directory, so that is no solution.
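
The reason both directories are needed: inotify reports a rename as two separate events, IN_MOVED_FROM on the source directory and IN_MOVED_TO on the target directory, linked by a shared cookie. A small Python simulation of the pairing logic follows; the `Event` tuple is a simplified stand-in for the real inotify event struct, and the names are illustrative only.

```python
from collections import namedtuple

# Simplified stand-in for an inotify event (the real one carries wd, mask, cookie, name).
Event = namedtuple("Event", "kind cookie path")

def pair_moves(events):
    """Pair MOVED_FROM/MOVED_TO events by cookie; return (moves, unmatched_sources)."""
    pending = {}  # cookie -> source path still waiting for its MOVED_TO
    moves = []
    for ev in events:
        if ev.kind == "MOVED_FROM":
            pending[ev.cookie] = ev.path
        elif ev.kind == "MOVED_TO" and ev.cookie in pending:
            moves.append((pending.pop(ev.cookie), ev.path))
        # A MOVED_TO with an unknown cookie came from an unwatched source; ignored here.
    # Sources without a matching MOVED_TO: the target directory was not watched,
    # so the new location of the file is unknown.
    return moves, sorted(pending.values())
```

If the target directory has no watch, only the IN_MOVED_FROM half arrives and the file ends up in the second list, which is exactly the case where the metadata would lose track of it.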
Comment 5 Ben Cooksley 2010-04-08 12:55:04 UTC
Could another solution be to disable the creation of the watchers for those who have *no* annotations, tags, etc. on their files, so clients of Nepomuk such as Akonadi are unaffected by this behaviour?

Another solution would be to allow metadata annotations to be completely disabled for certain subdirectories, much like Strigi already allows.

Also, any reason why hidden directories are also affected by this?
Comment 6 Sebastian Trueg 2010-04-08 14:10:54 UTC
Hidden dirs are watched for the same reason: something might be moved into a hidden dir. Of course that could be made configurable, but I doubt that would change much for the average user (one without svn/git checkouts lying around ;).

Not watching anything as long as no Nepomuk data has been created could indeed be done, but it does not solve the problem for people using Nepomuk (which I hope will be most people soon).
Comment 7 Ben Cooksley 2010-04-08 22:56:38 UTC
Is there anything against being able to disable annotation support for certain directories (e.g. svn checkouts) through the UI? This is probably the best way to solve this, given the weaknesses in inotify.
Comment 8 Sebastian Trueg 2010-04-14 15:33:06 UTC
Actually this could be done. We could simply not monitor folders that do not contain metadata. Technically this is rather simple. But I fear that it will slow down the Nepomuk system even more. I would have to experiment.

As for the configuration in case the solution above is too slow: we could use the same folder configuration that is used for strigi to configure the folders that can hold annotations. The only problem here is that all applications would have to honor this config.
Comment 9 Ben Cooksley 2010-06-16 10:16:15 UTC
As an interim hack, I find that the following command placed in $KDEHOME/Autostart fixes this:

sleep 10s && qdbus org.kde.NepomukServer /servicemanager stopService nepomukfilewatch &

Which at the minimum allows Akonadi to be functional whilst still permitting the system to be used.

Recommending that the file watcher be disabled automatically by Nepomuk if no file based tags have been added to the database. Thoughts on this (temporary) solution Sebastian?
Comment 10 Vishesh Handa 2010-06-16 10:33:12 UTC
@Sebastian: We can't simply NOT monitor folders that don't contain any metadata because the user might move some files/folders into a non-monitored location and because of inotify's crappy interface we wouldn't get notified, and therefore the metadata would get deleted.
The only option is to watch everything OR get the inotify people to send events of where a file is moved to even if that location isn't watched.

@Ben:
That really wouldn't work. The filewatcher is present because files are moved around, and their metadata needs to be pointed to the right location (nie:url). It isn't just about tags.

But then I guess IF strigi is disabled and there are no tags or comments, yeah, we could switch off the filewatcher.
Comment 11 Ben Cooksley 2010-06-16 11:36:36 UTC
@Vishesh: The correct fix in this case is to get the kernel team to provide a more sane interface for file monitoring. Having to monitor everywhere *just in case* a file moves is absolutely insane.

Also, Nepomuk should possibly consider using file system extended attributes (xattr when mounting) to attach a UUID to files, so that it can match files to tags on supporting file systems (which covers moving a file anywhere on local storage on most systems).
Comment 12 Christoph Obexer 2011-06-04 10:42:32 UTC
ok now I'm shocked.
you should ask the amarok developers, they also keep metadata attached to the files even if you move them!

also this has nothing to do with RAM.

have you thought about how unusable that will make for example Plasma-Active?

this also seems to happen after a suspend/resume cycle.

why would i want to keep tags etc. for files i move outside of the locations i configured?
and how would a normal user ever move a file into a hidden folder? and why?
Comment 13 Dennis Schridde 2012-08-03 00:22:10 UTC
This issue persists for me. I have file indexing disabled and generally do not use Nepomuk. I would have it disabled entirely if the system did not complain so loudly about it (systray notifications, KMail composer message boxes, etc.). It would definitely be a plus if I could disable this file tracking, too.

Have you had a look into fanotify? An LKML post [1] claims that its author was looking into mv support in 2009.

The xattr idea is also nice, even if not reliable when the data is moved to a non-xattr filesystem. On the other hand, this would allow transporting Nepomuk data to other systems via an xattr filesystem, which appears to be a good thing.

The niceness of 19 which nepomukfilewatcher currently uses does not appear to help much against the issue.

[1] http://lwn.net/Articles/339253/
Comment 14 Vishesh Handa 2012-08-03 06:41:37 UTC
(In reply to comment #13)
> This issue persists for me. I have file indexing disabled and generally do
> not use Nepomuk. I would have it disabled entirely, if the system would not
> complain so loudly about it — i.e. systray notifications and KMail composer
> message boxes, etc. It would definitely be a plus if I could disable this
> file tracking, too.

I'll add the details on a wiki somewhere, and maybe even do a blog post. For now just add the following to your nepomukserverrc:

[Service-nepomukfilewatch]
autostart=false

> 
> Have you had a look into fanotify? An LKML post [1] claims that its author
> was looking into mv support in 2009.

Unfortunately, yes. I've contacted the relevant maintainer twice to ask whether there were any plans to implement move support; so far no reply. I have even looked at the source code, but I don't think I'm in a position to fix it myself.

> 
> The xattr idea is also nice, even if not reliable when the data is moved to
> a non-xattr filesystem. On the other hand this would allow to transport
> Nepomuk data to other systems via a xattr filesystem, which appears to be a
> good thing.

There are a lot of problems with this approach as well, but I have been considering it. It will require a number of internal changes. Let's see what I can do. Either way, this approach would always be a kind of safety net. We need some way of getting notified that a file has moved.

> 
> The niceness of 19 which nepomukfilewatcher currently uses does not appear
> to help much against the issue.
> 
> [1] http://lwn.net/Articles/339253/
Comment 15 scroogie 2012-08-07 08:13:42 UTC
I think using extended attributes (xattr) is the only way around this. I remembered this blog post as a reaction to a post of sebas: http://jamiemcc.livejournal.com/10814.html
where the author of inotify explains that this limitation was an intentional tradeoff, which makes any discussion about kernel changes moot in my opinion.
On the other hand, there was a large discussion in KDE land around this, and in the end, it was apparently declined, so there must have been reasons: 
http://chem-bla-ics.blogspot.de/2006/06/kde-desktop-search-kat-strigi-and.html
http://cniehaus.livejournal.com/23281.html

Perhaps the situation was different back then. xattrs are supported on ext2,ext3,ext4,btrfs,XFS,reiser, etc. Even on NTFS they can be implemented (and ntfs-3g supports it afaik).
Comment 16 Vishesh Handa 2012-08-07 08:41:16 UTC
(In reply to comment #15)
xattrs aren't the solution. They are more like a safety net. Here is why:

Say I have a file "A", which has some metadata associated with it. A couple of tags, and a rating. The file is moved from /home/user/DirA/A to /media/disk/DirB/A. Even if we use xattr to store all the metadata, when the file is moved the url stored in our database for the file is now invalid. It is only once we are informed about the url change that we can update our database.

One of the main flaws of inotify is its horrible move events which require the source and destination directory to be monitored. Say we use xattr, now we wouldn't have to watch for move events, BUT we would still have to watch for file creation events.

When the file "A" moves from /home/user/DirA/A to /media/disk/DirB/A, we will need to be informed via a creation event that a new file has appeared, and only then will we read its xattrs data to see if it was actually the file present in "/home/user/DirA/A".

Getting these file creation events requires adding watches in each directory. That is the issue that causes high disk usage at startup.
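
As a rough illustration of that flow (a Python sketch only, not Nepomuk code): on a creation event, read a UUID from the file's xattrs and, if it is already known, re-point the stored url. The attribute name `user.nepomuk.uuid` and both function names are hypothetical, and `os.getxattr` is only available on Linux.

```python
import os

def xattr_uuid(path, attr="user.nepomuk.uuid"):  # hypothetical attribute name
    """Return the UUID stored in the file's xattrs, or None if there is none."""
    try:
        return os.getxattr(path, attr).decode()
    except OSError:
        return None

def on_file_created(index, path, read_uuid=xattr_uuid):
    """Handle a creation event. index maps uuid -> url.

    Returns the old url if the created file is a known file that moved, else None.
    """
    uuid = read_uuid(path)
    if uuid is None or uuid not in index:
        return None        # genuinely new file, nothing to re-point
    old_url = index[uuid]
    index[uuid] = path     # metadata now points at the new location
    return old_url
```

Note that `on_file_created` still has to be triggered by something, which is the point made above: without creation events from a watch on the target directory, the xattr is never read.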
Comment 17 scroogie 2012-08-07 10:18:19 UTC
I see. And the creation event can't be caught by fanotify e.g.? So that perhaps combining xattr and fanotify would lead to a solution? (e.g. looking up the old path by xattr and the new path with the fd == move event?). I'm just trying to be creative here, sorry if I annoy you. :)
Comment 18 Vishesh Handa 2012-08-07 11:19:19 UTC
(In reply to comment #17)
> I see. And the creation event can't be caught by fanotify e.g.? So that
> perhaps combining xattr and fanotify would lead to a solution? (e.g. looking
> up the old path by xattr and the new path with the fd == move event?). I'm
> just trying to be creative here, sorry if I annoy you. :)

Nah. You're not annoying me. In fact, what you've suggested hadn't occurred to me. So, yay!

Now, just to be clear, this would work perfectly if fanotify actually supported proper creation events. At the time of writing it supports:

* FAN_ACCESS: every file access.
* FAN_MODIFY: file modifications.
* FAN_CLOSE: when files are closed.
* FAN_OPEN: open() calls.
* FAN_ACCESS_PERM: like FAN_ACCESS, except that the process trying to access the file is put on hold while the fanotify client decides whether to allow the operation. 

I suppose I could infer new file creations from these events, though I'm not sure. For every file open/close event I'll have to check my database to see if that file exists, i.e. one SQL call per event. That might be a big performance hit. Or I guess one should store them in the xattrs, and avoid looking up the database.

Or maybe I could check the mtime of the file whenever I receive a file close event. Hmm, this could work. Anyway, I'll try out some prototypes to see how it's going. I hear fanotify is terribly buggy [1], and doesn't have a maintainer right now, so I don't have very high hopes.

[1] https://lkml.org/lkml/2012/6/18/176
Comment 19 scroogie 2012-08-08 08:57:39 UTC
> > I see. And the creation event can't be caught by fanotify e.g.? So that
> > perhaps combining xattr and fanotify would lead to a solution? (e.g. looking
> > up the old path by xattr and the new path with the fd == move event?). I'm
> > just trying to be creative here, sorry if I annoy you. :)
> 
> Nah. You're not annoying me. In fact, what you've suggested hadn't occurred
> to me. So, yay!

Nice to hear. :)

> Now just to be clear this would work - perfectly, if fanotify actually
> supported proper creation events. At the time of writing it supports -
> 
> * FAN_ACCESS: every file access.
> * FAN_MODIFY: file modifications.
> * FAN_CLOSE: when files are closed.
> * FAN_OPEN: open() calls.
> * FAN_ACCESS_PERM: like FAN_ACCESS, except that the process trying to access
> the file is put on hold while the fanotify client decides whether to allow
> the operation. 

My fanotify.h includes:

/* the following events that user-space can register for */
#define FAN_ACCESS              0x00000001      /* File was accessed */
#define FAN_MODIFY              0x00000002      /* File was modified */
#define FAN_CLOSE_WRITE         0x00000008      /* Writtable file closed */
#define FAN_CLOSE_NOWRITE       0x00000010      /* Unwrittable file closed */
#define FAN_OPEN                0x00000020      /* File was opened */
#define FAN_Q_OVERFLOW          0x00004000      /* Event queued overflowed */
#define FAN_OPEN_PERM           0x00010000      /* File open in perm check */
#define FAN_ACCESS_PERM         0x00020000      /* File accessed in perm check */
#define FAN_ONDIR               0x40000000      /* event occurred against dir */
#define FAN_EVENT_ON_CHILD      0x08000000      /* interested in child events */

I need to check fanotify to understand it better, though. 
It seems it would need to watch for directory changes somehow to detect renames, but I can't see it doing that currently. It really is a bit confusing.
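
For reference, a small Python sketch that decodes an event mask using only the values from the header excerpt above (the table and function name are mine, for illustration). It makes the gap visible: none of the listed flags is a create, move or rename event.

```python
# Flag values copied from the fanotify.h excerpt above.
FAN_FLAGS = {
    0x00000001: "FAN_ACCESS",
    0x00000002: "FAN_MODIFY",
    0x00000008: "FAN_CLOSE_WRITE",
    0x00000010: "FAN_CLOSE_NOWRITE",
    0x00000020: "FAN_OPEN",
    0x00004000: "FAN_Q_OVERFLOW",
    0x00010000: "FAN_OPEN_PERM",
    0x00020000: "FAN_ACCESS_PERM",
    0x08000000: "FAN_EVENT_ON_CHILD",
    0x40000000: "FAN_ONDIR",
}

def decode_mask(mask):
    """Return the names of all known flags set in mask, lowest bit first."""
    return [name for bit, name in sorted(FAN_FLAGS.items()) if mask & bit]
```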
 
> I suppose I could infer new file creations from these events, though I'm not
> sure. For every file open/close event I'll have to check my database to see
> if that file exists -> One SQL call. Might be a big performance hit. Or I
> guess one should store them in the xattrs, and avoid looking up the database.

How is this currently done? My first thought here was that it shouldn't act immediately on every change, but rather queue them somehow, to be able to merge some operations later. Of course this is easier said than done, but looking up and changing database entries upon every filesystem event sounds expensive indeed.
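
A minimal sketch of that queueing idea, in Python and purely illustrative (the class and the event names "created", "modified" and "deleted" are made up, not Nepomuk's): events are collected per path and merged, so the database would be touched once per path on flush instead of once per filesystem event.

```python
class EventQueue:
    """Collect filesystem events and merge repeated events for the same path."""

    def __init__(self):
        self._pending = {}  # path -> merged event kind, in first-seen order

    def push(self, path, kind):
        previous = self._pending.get(path)
        if previous == "created" and kind == "deleted":
            # Created and removed again before the flush: nothing to record.
            del self._pending[path]
        elif previous == "created" and kind == "modified":
            pass  # still a new file as far as the database is concerned
        elif previous == "deleted" and kind == "created":
            self._pending[path] = "modified"  # file was replaced
        else:
            self._pending[path] = kind

    def flush(self):
        """Return the merged events as (path, kind) pairs and empty the queue."""
        batch, self._pending = list(self._pending.items()), {}
        return batch
```

A real implementation would also need a timer or size limit to decide when to flush; that part is left out here.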
 
> Or maybe I could check the mtime of the file whenever I receive a file close
> event. Hmm, this could work. Anyway, I'll try out some prototypes to see how
> it's going. I hear fanotify is terribly buggy [1], and doesn't have a
> maintainer right now, so I don't have very high hopes.
> 
> [1] https://lkml.org/lkml/2012/6/18/176

Oh, I didn't know that. That's sad. The first one seems to be extremely bad...
Comment 20 Vishesh Handa 2012-11-30 23:31:16 UTC
*** Bug 299491 has been marked as a duplicate of this bug. ***
Comment 21 Vishesh Handa 2014-11-20 11:45:04 UTC
Re-assigning this bug to Baloo as it too suffers from a similar problem, though to a lesser extent.
Comment 22 tieskey 2014-12-09 00:38:21 UTC
Hey, I hope this continues to be addressed with care; it is a real annoyance.

I just wanted to emphasize cobexer's comment. What is the rationale behind watching hidden folders? 99% of them are system or app config folders full of files that nobody would ever want to search for. Even if I made a hidden folder to... hide my files, I would NOT want them to be indexed, because I need them hidden, not popping up everywhere.
Of the 13 GB in my home, 8 GB are hidden files (caches, .thunderbird, .local, etc.), 29% of the files!! (and 40% is the Stanford POS tagger that I will be deleting after this week, so it is actually 50%)

Would it be possible to add regex filtering? It is not noob user friendly but extremely powerful for the rest of humanity :P
Comment 23 Vishesh Handa 2014-12-09 11:53:43 UTC
(In reply to tieskey from comment #22)
* Hidden folders are never indexed nor are inotify watches added for them
* We have wildcard filtering and exclude folders which are then ignored - no inotify watches.
* We removed the advanced interface for Baloo for configuring the filtering via GUI. Might not have been the smartest decision. We're discussing how to add it back. Till then there is an advanced kcm package.
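
The three rules above (skip hidden folders, skip excluded folders, skip wildcard matches) can be sketched roughly like this in Python. This is an illustration of the behaviour only, not Baloo's implementation; the function and parameter names are invented and do not correspond to Baloo's config keys.

```python
import fnmatch

def should_watch(path, exclude_folders=(), exclude_patterns=()):
    """Decide whether a directory gets an inotify watch (illustrative rules only)."""
    parts = [p for p in path.split("/") if p]
    # Rule 1: anything inside a hidden folder is never watched.
    if any(p.startswith(".") for p in parts):
        return False
    # Rule 2: explicitly excluded folders and everything below them are skipped.
    for folder in exclude_folders:
        folder = folder.rstrip("/")
        if path == folder or path.startswith(folder + "/"):
            return False
    # Rule 3: wildcard patterns are matched against each path component.
    return not any(fnmatch.fnmatch(p, pat) for p in parts for pat in exclude_patterns)
```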
Comment 24 tieskey 2014-12-09 19:00:07 UTC
(In reply to Vishesh Handa from comment #23)
Oh, that's good to know. Sorry for not checking it myself before posting. 
But now I don't understand why it takes almost 4 minutes to stop killing my disk (on a brand new i7 laptop with 8 GB RAM). Does having some indexed folders on an NTFS partition affect the performance?
Comment 25 Vishesh Handa 2014-12-10 11:55:29 UTC
(In reply to tieskey from comment #24)
> Oh, that's good to know. Sorry for not checking it myself before posting. 

No worries. It might actually make sense for me to file a new bug with all the info, and mark this bug as a duplicate of that. This bug has too much information for anyone to read through.

> But now I don't understand why it takes almost 4 minutes to stop killing my
> disk (on a brand new i7 laptop with 8Gb ram). Does having some indexed
> folders on a NTFS partition affect the performance?

I haven't personally tested NTFS, though another developer has complained about it. I'll do some tests and get back to you.
Comment 26 tieskey 2014-12-17 15:22:42 UTC
(In reply to Vishesh Handa from comment #25)
After messing with the config file, it turns out I was indexing my entire NTFS partition and not just the folders I wanted...
You really need to add those options back to the GUI!
Comment 27 Nate Graham 2018-05-19 20:11:47 UTC
A lot of work has been done recently to improve the various ways that this can happen. If anyone's still experiencing it with KDE Frameworks 5.46 or later, please file a new bug. Thanks!