Bug 308474 - keeps indexing network share contrary to configuration
Summary: keeps indexing network share contrary to configuration
Status: RESOLVED FIXED
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: filewatch
Version: 4.9
Platform: Gentoo Packages Linux
Importance: NOR major
Target Milestone: ---
Assignee: Nepomuk Bugs Coordination
URL:
Keywords:
Duplicates: 281450
Depends on:
Blocks:
 
Reported: 2012-10-16 08:00 UTC by Bjoern Olausson
Modified: 2013-03-22 16:24 UTC
CC List: 8 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
CPU/Network Graph (153.08 KB, image/png)
2012-12-14 16:36 UTC, regi.hops
Description Bjoern Olausson 2012-10-16 08:00:16 UTC
When accessing an SMB share (mounted via a VPN across the Internet), the nepomukservicestub filewatch service digs through the entire share (a few hundred TB), consuming all my upload bandwidth.

Nepomuk file indexer is disabled and Desktop Query is limited to a single empty folder in my home directory (just to make sure it does not index anything) and removable media should be ignored.

To me it looks like the filewatch part is ignoring my setting.

blub@larry $ lsof /mnt/SMB-Share/share/
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
nepomukse 4155 blub   17r   DIR   0,35        0 320970531 /mnt/SMB-Share/share/CRM
nepomukse 4155 blub   18r   DIR   0,35        0  69227841 /mnt/SMB-Share/share
nepomukse 4155 blub   19r   DIR   0,35        0  69228137 /mnt/SMB-Share/share/CRM/expenses
nepomukse 4155 blub   20r   DIR   0,35        0  69228154 /mnt/SMB-Share/share/CRM/expenses/adrian
nepomukse 4155 blub   25r   DIR   0,35        0  69228155 /mnt/SMB-Share/share/CRM/expenses/adrian/CASH
09:20:25 [~]
blub@larry $ ps -ef | grep nepomuk
blub      3908  3876  0 07:24 ?        00:00:10 /usr/bin/akonadi_nepomuk_feeder --identifier akonadi_nepomuk_feeder
blub      3954     1  0 07:24 ?        00:00:00 /usr/bin/nepomukserver
blub      3957  3954  5 07:24 ?        00:06:17 /usr/bin/nepomukservicestub nepomukstorage
blub      4152  3954  0 07:24 ?        00:00:00 /usr/bin/nepomukservicestub nepomukfileindexer
blub      4153  3954  0 07:24 ?        00:00:00 /usr/bin/nepomukservicestub nepomukqueryservice
blub      4154  3954  0 07:24 ?        00:00:00 /usr/bin/nepomukservicestub nepomukbackupsync
blub      4155  3954  0 07:24 ?        00:00:02 /usr/bin/nepomukservicestub nepomukfilewatch
blub      4244     1  0 07:24 ?        00:00:00 /usr/bin/nepomukcontroller -session 10e5617272000133423828300000072100027_1350295774_133445
blub     22097  7518  0 09:20 pts/1    00:00:00 grep --colour=auto nepomuk

Even when I select "suspend file indexing", nepomukfileindexer continues to dig through the share.

I recently deleted all related configs and started from scratch, but it didn't help, so I consider this a bug.

I even created a new user, started KDE, ran lsof and saw the same thing: nepomuk indexing the share... all with default settings.

Cheers,
Bjoern

Reproducible: Always

Steps to Reproduce:
1. Mount SMB share
2. Access SMB share with Dolphin
3. lsof /path/to/share/somefolder
4. Watch nepomuk indexing the share
Actual Results:  
Nepomuk indexes the share recursively eating all my bandwidth 

Expected Results:  
According to my configuration, nepomuk should not index anything, since file indexing is disabled.

In fact, I think file indexing should stay away from anything mounted over the network by default!
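
A minimal sketch of how "mounted over the network" could be detected, assuming the mount table is available in the /proc/mounts layout. The set of filesystem types and the sample host name "fileserver" are illustrative assumptions, not something taken from Nepomuk.

```python
# Filesystem types treated as "network" here. This set is an assumption
# made for the sketch, not a list used by Nepomuk.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"}


def network_mount_points(mounts_text):
    """Return mount points whose filesystem type is in NETWORK_FS_TYPES.

    mounts_text uses the /proc/mounts layout:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in NETWORK_FS_TYPES:
            result.append(mount_point)
    return result


def is_on_network_mount(path, mount_points):
    """True if path equals or lies below one of the given mount points."""
    for mount_point in mount_points:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            return True
    return False


# Hypothetical mount table; "fileserver" is a made-up host name.
sample = """\
/dev/sda2 / ext4 rw,relatime 0 0
//fileserver/share /mnt/SMB-Share cifs rw,relatime 0 0
"""
network_mounts = network_mount_points(sample)
```

On a real system the text would come from reading /proc/mounts. Mount points containing spaces are escaped there (as \040), which this sketch does not decode.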
Comment 1 Bjoern Olausson 2012-10-18 07:20:11 UTC
Well, I don't even have to access the share with Dolphin. I just have to mount it (via automount) and the nightmare starts.

Cheers,
Bjoern
Comment 2 Denys 2012-11-09 17:15:37 UTC
Same here.

I have posted this bug here some time ago:
http://sourceforge.net/apps/trac/oscaf/ticket/141
Unfortunately, that was not the correct place.

This bug, by the way, is not limited only to NAS mounts. It sometimes happens when I connect my iPod (indexing removable devices is off). As a result of this there is no way to unmount the device. I have to kill the nepomukserver.

The bug is terrible and tortures me on Ubuntu 12.04 and Fedora Core F17. So I have to keep nepomuk completely disabled.
Comment 3 Bjoern Olausson 2012-11-12 19:03:27 UTC
Okay, now this gets really annoying...
Even hours after the VPN is disconnected, nepomukservicestub still keeps the share busy, although it should stall at some point and realize that the share is dead.
Saving an attachment from an e-mail within Kontact took me >10 minutes, since I had to wait for a timeout for each action I did in the folder selection dialog.

Please fix it! Make nepomukservicestub aware of network-mounted shares, make it smarter, or at least make it stick to the configuration. Or, if even this is too complicated, let it die once it can't proceed with indexing because of a stalled share, or whatever, but please KILL it somehow! Let it die!

Kind regards,
Bjoern
Comment 4 regi.hops 2012-11-12 20:40:15 UTC
I noticed it too.
System:
openSUSE 12.2 x86_64
KDE 4.9.3
NFS shares

After login, nepomuk starts indexing the shares, although in the config only my local Documents folder is activated for indexing.
Comment 5 Bjoern Olausson 2012-12-06 08:42:56 UTC
Since nobody seems to care, I started to investigate... (although I do not have the time, this is getting on my nerves and has to be fixed somehow, even if I have to start hacking around in the code)

The problem seems to be "nepomukfilewatch" service. Once disabled, the described problem is gone.

To disable nepomukfilewatch execute:
qdbus org.kde.NepomukServer /servicemanager stopService nepomukfilewatch

Just add it to the KDE autostart and you are good to go.
I didn't find any drawback for my work-flow by disabling nepomukfilewatch.
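
For an autostart entry, the same call could be wrapped in a small script. This is only a sketch: the qdbus arguments are the ones from the command above, while the function names and the check for a missing qdbus binary are illustrative additions.

```python
import shutil
import subprocess


def build_stop_command(service="nepomukfilewatch"):
    """Build the qdbus call quoted above as an argument list."""
    return ["qdbus", "org.kde.NepomukServer", "/servicemanager",
            "stopService", service]


def stop_service(service="nepomukfilewatch"):
    """Ask the Nepomuk server to stop a service.

    Returns False if qdbus is not installed or the call exits non-zero.
    """
    if shutil.which("qdbus") is None:
        return False
    completed = subprocess.run(build_stop_command(service), check=False)
    return completed.returncode == 0
```

Calling stop_service() from a script placed in the KDE autostart folder would repeat the workaround at every login. The service may not be registered yet when the session starts, so a real script might need to retry.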

If I find some more time I'll see if I can add a simple hack to just ignore "network folders".

A second thing that bugs me:
Why the heck does Dolphin have to count the items inside a folder, with no way to disable this? This also eats a lot of time when working on remote folders *sigh*

Looks like the Devs are all sitting on a machine with a high speed SSD-RAID and nobody ever thinks about performance anymore...

Cheers,
Bjoern
Comment 6 Andreas Sturmlechner 2012-12-09 17:34:08 UTC
*** This bug has been confirmed by popular vote. ***
Comment 7 Andreas Sturmlechner 2012-12-09 17:43:29 UTC
I just had the same bug with a simple usb stick in KDE 4.9.90 - I couldn't unmount it because a nepomuk process was blocking it, and re-spawned a new one as soon as I killed it.

File Indexing is set to "Ignore all removable media" and generally only in use for one particular folder inside ~. Suspending file indexing didn't solve it.
Comment 8 Andreas Sturmlechner 2012-12-10 07:42:45 UTC
It seems the problem has worsened with 4.9.90 - I can't remember that happening in 4.9. But I haven't yet found a sure way to reproduce it. Right now it happened after copying one file to the usb stick. nepomuk file indexing is suspended because I'm running on battery, nevertheless it blocks me from unmounting the stick.
Comment 9 Vishesh Handa 2012-12-10 08:13:02 UTC
(In reply to comment #8)
> It seems the problem has worsened with 4.9.90 - I can't remember that
> happening in 4.9. But I haven't yet found a sure way to reproduce it. Right
> now it happened after copying one file to the usb stick. nepomuk file
> indexing is suspended because I'm running on battery, nevertheless it blocks
> me from unmounting the stick.

Could you define what you mean by 4.9.90? Do you mean the current master or Beta1? Cause I have fixed the umounting, but the fix will only be in RC1 - https://bugs.kde.org/show_bug.cgi?id=304943

@Everyone else: Please don't mix the two bugs up. One bug is about removable media becoming impossible to unmount due to the file watch service - that is bug 304943. This bug is about how the file watch service installs watches in all the directories of network shares/removable media, even though it was told not to index, and thereby consumes a lot of bandwidth.

This second bug is still there. I'm not sure how to fix it. People could tag/rate any of the files in the mounted directory. We need those watches to make sure the tags are not lost if those files are ever moved. Another case could be the moving of files to the network share. One would expect the metadata to still be there. Without the watches, the meta-data would disappear.
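
To illustrate where the load comes from: inotify watches are not recursive, so each directory needs its own watch, and finding the directories means walking the whole tree. The sketch below only counts the directories such a walk would visit. It is an illustration, not the file watch service's code, and the demo tree is made up.

```python
import os
import tempfile


def count_directories(root):
    """Count root plus every directory below it, without following symlinks.

    Each counted directory would need one inotify watch of its own.
    """
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        total += 1
    return total


# Small demo on a throwaway tree: root, root/a, root/a/b and root/c.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "a", "b"))
    os.makedirs(os.path.join(demo_root, "c"))
    demo_count = count_directories(demo_root)
```

On a network share every directory listing in that walk is a round trip to the server, which matches the traffic described in this report.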
Comment 10 Andreas Sturmlechner 2012-12-10 10:40:40 UTC
(In reply to comment #9)
> Could you define what you mean by 4.9.90? Do you mean the current master or
> Beta1? Cause I have fixed the umounting, but the fix will only be in RC1 -
> https://bugs.kde.org/show_bug.cgi?id=304943

That is beta2. Thx for the fix!
Comment 11 regi.hops 2012-12-10 12:56:36 UTC
(In reply to comment #9)
> This second bug is still there. I'm not sure how to fix it. People could
> tag/rate any of the files in the mounted directory. We need those watches to
> make sure the tags are not lost if those files are ever moved. Another case
> could be the moving of files to the network share. One would expect the
> metadata to still be there. Without the watches, the meta-data would
> disappear.

Wouldn't it be an option to let the user decide?

Split the functionality into indexing local files and indexing remote files.
A small hint that the meta-data will be lost if the file is on (or moved to) a non-indexed remote share would clarify the behavior.
Sitting on a LAN with GBit isn't such a big deal, but having shares from a few web servers mounted over a 6 MBit WAN is a problem.

Or

Even easier - with the current selection options it should work.
If I activate only a single directory for indexing (let's say "Documents"), then I would expect that only files in this directory are indexed. And if I move files from this directory to another location, I expect the meta-data to disappear.
If I move the file back to this directory I would expect that the meta-data appears again.
This could be based on another option where I can decide how long meta-data is kept after moving/deleting a file, until it is finally cleared from the meta-data storage.

At the moment the behavior is irritating.
Also your description implies that the whole local hard disk(s) must be indexed to keep track of moving local files.
And I wouldn't expect a task (under Linux) to do something that I never told it to do.
Comment 12 Vishesh Handa 2012-12-10 13:01:36 UTC
It's not just about indexed files, if the indexed information is lost, we can re-index the file. Not a problem. It's the tags and ratings that should not be lost. Tags and Rating have nothing to do with the indexed directories. Do you see the problem?
Comment 13 regi.hops 2012-12-10 14:14:33 UTC
I understand the problem - sorry for mixing-up indexing and meta-data.
And I really appreciate the effort and work that you are taking to keep the meta-data.
I can imagine that this isn't easy.

But... ;-)
Why not give the user more fine-grained control over the handling of meta-data like tags, ratings, keywords and so on? At least an option to exclude files and/or folders.
Together with an expiration option of the meta-data storage.
I don't say it's a perfect one - but it would be a configurable one.

Cheers
Regi
Comment 14 Bjoern Olausson 2012-12-10 20:40:24 UTC
I second this.

Add an option to generally "Ignore network shares" and an option to exclude folders.
In addition, respect the indexing options and limit all the metadata handling to files in the configured directory. I don't think users expect metadata to be available outside the specified folder. Finally, add a warning that metadata is lost if a file is moved anywhere outside the path specified for indexing.

The current behavior is a bummer for KDE in a corporate environment where you share data across networks...

Cheers,
Bjoern
Comment 15 Denys 2012-12-14 12:28:27 UTC
> This second bug is still there. I'm not sure how to fix it. People could
> tag/rate any of the files in the mounted directory. We need those watches to
> make sure the tags are not lost if those files are ever moved. Another case
> could be the moving of files to the network share. One would expect the
> metadata to still be there. Without the watches, the meta-data would
> disappear.

I do not see how it can possibly work even in an ideal situation.

Suppose that someone has some metadata on a file in a local directory and then moved the file to a NAS. Is the watch necessary to change association of this metadata from local file to file on a removable media? How would it help if the media is actually removed afterwards? If you mount it on another computer the metadata will not be available there. Even if you mount it on the same computer but using a different path (to the NAS) or after someone changed some sort of ID of the removable disk then you won't see that metadata again, will you? 

Isn't it more sensible to store the metadata itself, or some sort of "metadata ID", directly in the files using extended attributes? Then the metadata or its marker will travel with the file without any special work. I realise that some file systems do not support extended attributes, but no one grieves that elephants do not fly. Following this approach, the metadata would be safely processed within a proper Linux ecosystem (and maybe further) and would fail only in some weird cases where it does not matter much anyway.
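
A rough sketch of that idea, assuming Linux and a file system mounted with user extended attributes. The attribute name "user.xdg.tags" and the comma-separated encoding are assumptions for this example, not what Nepomuk does.

```python
import os
import tempfile

# Assumed attribute name and encoding (comma-separated UTF-8 text).
TAG_ATTR = "user.xdg.tags"


def write_tags(path, tags):
    """Store tags in an extended attribute. Returns False if unsupported."""
    if not hasattr(os, "setxattr"):
        return False  # platform without the Linux xattr calls
    try:
        os.setxattr(path, TAG_ATTR, ",".join(tags).encode("utf-8"))
        return True
    except OSError:
        return False  # e.g. file system without user xattrs, or no write permission


def read_tags(path):
    """Read tags back; returns an empty list if none are stored or readable."""
    if not hasattr(os, "getxattr"):
        return []
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError:
        return []
    return [tag for tag in raw.decode("utf-8").split(",") if tag]


# Demo on a temporary file; whether it succeeds depends on the file system.
with tempfile.NamedTemporaryFile() as handle:
    stored = write_tags(handle.name, ["invoice", "2012"])
    recovered = read_tags(handle.name)
```

Whether the attribute survives a copy or move depends on the tool and the target file system, so this shows only the storage side of the proposal.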

And frankly, while this bug is still there many people do not care how KDE's semantic desktop works. In fact, the first thing they do after installing KDE is disabling it completely.
Comment 16 Vishesh Handa 2012-12-14 13:58:19 UTC
(In reply to comment #15)
> > This second bug is still there. I'm not sure how to fix it. People could
> > tag/rate any of the files in the mounted directory. We need those watches to
> > make sure the tags are not lost if those files are ever moved. Another case
> > could be the moving of files to the network share. One would expect the
> > metadata to still be there. Without the watches, the meta-data would
> > disappear.
> 
> I do not see how it can possibly work even in an ideal situation.
> 
> Suppose that someone has some metadata on a file in a local directory and
> then moved the file to a NAS. Is the watch necessary to change association
> of this metadata from local file to file on a removable media? How would it
> help if the media is actually removed afterwards? If you mount it on another
> computer the metadata will not be available there. Even if you mount it on
> the same computer but using a different path (to the NAS) or after someone
> changed some sort of ID of the removable disk then you won't see that
> metadata again, will you? 

Different mounted path: yes. Though I have never tested it out.
If you change the id of the removable disk. Then no.

How about if the watches are not added for network shares? But are still added for other removable media? I could also provide options to enable a config file to disable it completely. I cannot add new strings, but I could document all these hidden options on the userbase?

I hope you understand why I'm slightly reluctant on removing this entirely.

> 
> Isn't it more sensible to store the metadata itself or some sort of
> "metadata ID" directly in the files using extended attributes? Then the
> metadata or its marker will travel with the file without any special work.
> And I realise that some file systems do not support extended attributes. But
> noone grieves that elephants do not fly. So following this approach the
> metadata would be safely processed within a proper Linux ecosystem (and
> maybe further) and won't work only in some weird cases where it does not
> matter much anyway.

That I agree should be done, but it would take considerable effort. It would amount to a new feature. That is not something I can do for 4.10. Let's focus on what I can do?

> 
> And frankly, while this bug is still there many people do not care how KDE's
> semantic desktop works. In fact, the first thing they do after installing
> KDE is disabling it completely.

Well, I hope we can improve on that.
Comment 17 regi.hops 2012-12-14 16:35:24 UTC
Hi,
to give you an impression of how intrusive this feature is at the moment on my system, I attached an image of the CPU/Network-Graph right after login into KDE.
System:
Phenom 2 Quad core, with 8GB Ram and 1GB LAN speed.
With a small NFS-Share of my Dev-Server mounted (approx 2.6 GB Size / 20,000 files).
BTW: Production-Server Shares I can't mount - If I do so, monitoring jumps in ;-)

Marker 1 - System start-up
Marker 2 - Tried to start dolphin, giving up because disk activity makes it too slow
Marker 3 - I stop all activities to speed up nepomuk (25% CPU usage means 1 Core with 100%)
Marker 4 - Finished, I can start to work

Between Marker 1 and Marker 4 it took nearly 10 Minutes.

Please consider a solution. As Denys said, most people I know disable it completely, even though it's a great feature.

Cheers
Regi
Comment 18 regi.hops 2012-12-14 16:36:21 UTC
Created attachment 75833 [details]
CPU/Network Graph
Comment 19 Denys 2012-12-15 16:13:11 UTC
> Different mounted path: yes. Though I have never tested it out.
> If you change the id of the removable disk. Then no.
> 
> How about if the watches are not added for network shares? But are still
> added for other removable media? I could also provide options to enable a
> config file to disable it completely. I cannot add new strings, but I could
> document all these hidden options on the userbase?

This would certainly be a good thing.

> I hope you understand why I'm slightly reluctant on removing this entirely.

Of course.

> That I agree should be done, but it would take considerable effort. It would
> amount to a new feature. That is not something I can do for 4.10. Let's focus
> on what I can do?

Sure.

I thought a bit more on this and it seems that while using extended attributes would be more reliable, it also has its limitations. While metadata would follow the files, users will probably need to have write permissions to update it.
Comment 20 Vishesh Handa 2012-12-31 09:42:56 UTC
Would it be possible for someone to test the patch out?

https://git.reviewboard.kde.org/r/108047/

I want to make sure it works, and I don't have any network shares. (Nor do I want to set one up)
Comment 21 Bjoern Olausson 2012-12-31 10:23:29 UTC
I am more than happy to test it once I am at home. (3-4 days)

Thanks a lot,
Bjoern
Comment 22 Vishesh Handa 2013-01-02 20:13:23 UTC
Git commit 19f9f3fcc094ad65d4b11eb2fd1d5fc20c3a255e by Vishesh Handa.
Committed on 31/12/2012 at 10:26.
Pushed by vhanda into branch 'KDE/4.10'.

FileWatch: Do not always add inotify watches for removable media

Not everyone wants watches to be added to the removable media. They are
okay with losing tags/ratings on removable media. Also, certain
users do not want watches ever added for Network Shares - It results in
a large amount of network load which is not required.

The following options can disable watches in nepomukstrigirc -

[RemovableMedia]
add watches=true
add watches network share=false

By default watches are never added to network shares, and they are
always added for removable media.
REVIEW: 108047

M  +20   -2    services/filewatch/nepomukfilewatch.cpp

http://commits.kde.org/nepomuk-core/19f9f3fcc094ad65d4b11eb2fd1d5fc20c3a255e
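
A small sketch of how the two options above could be read, using Python's configparser rather than KDE's own config classes. The group and key names and the defaults come from the commit message. The function name and the rule that each key governs exactly one kind of mount are assumptions, since the actual change in nepomukfilewatch.cpp is not shown here.

```python
import configparser


def should_add_watches(rc_text, is_network_share):
    """Decide whether watches would be added for a mounted medium.

    rc_text is the content of nepomukstrigirc. A missing group or key
    falls back to the defaults stated in the commit message.
    """
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    if is_network_share:
        key, default = "add watches network share", False
    else:
        key, default = "add watches", True
    return parser.getboolean("RemovableMedia", key, fallback=default)


# Example: turn watches off for removable media as well.
example_rc = """\
[RemovableMedia]
add watches=false
add watches network share=false
"""
watch_usb_stick = should_add_watches(example_rc, is_network_share=False)
watch_nfs_share = should_add_watches(example_rc, is_network_share=True)
```

configparser rejects entries that appear before the first group header, which KDE config files may contain, so this is meant for checking a fragment like the one above rather than as a general KDE config reader.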
Comment 23 Bjoern Olausson 2013-01-03 08:43:50 UTC
(In reply to comment #20)
> Would it be possible for someone to test the patch out?
> 
> https://git.reviewboard.kde.org/r/108047/
> 
> I want to make sure it works, and I don't have any network shares. (Nor do I
> want to set one up)

Tested and it works for me. It no longer indexes the network share (extensive tests done) or a USB stick (quick one-shot test) with the following config:

[RemovableMedia]
add watches=false
add watches network share=false

I would really like to see these two new options go into the "Nepomuk/Strigi Server Configuration" panel under the KDE System Settings.

Thanks a lot!

Cheers,
Bjoern
Comment 24 regi.hops 2013-01-03 20:31:26 UTC
(In reply to comment #20)

Yep - did a quick test in VirtualBox with a USB drive and an NFS share.
Looks good.

Great job.
Comment 25 Vishesh Handa 2013-01-08 06:05:46 UTC
*** Bug 281450 has been marked as a duplicate of this bug. ***
Comment 26 korgens 2013-03-22 02:39:40 UTC
It seems that this bug is back.

Tested on Archlinux:
Linux 3.8.3-2-ARCH #1 SMP PREEMPT Sun Mar 17 13:04:22 CET 2013 x86_64 GNU/Linux

My version: KDE 4.10.1. Fresh install of KDE (deleted all .kde4, .local and .config files before logging in).

Configuration:
Configuration Settings -> Nepomuk server -> Indexing: "ignore all removable devices" 
... -> Custom folders: I've left only folders inside /home checked

How to reproduce:
1) Plug in a USB drive, copy some video files to it.
2) Try to "safely remove" it from Dolphin or from "available devices" in the system tray. The message says this is impossible because a program is accessing it.
3) From the command line, issuing lsof /dev/sdd1 shows nepomuk.

Not even root can unmount the device. Killing all nepomuk-related processes doesn't help, because they get respawned.

Question: do I open another bug or would someone reopen this same?
Comment 27 Vishesh Handa 2013-03-22 10:16:24 UTC
(In reply to comment #26)
> It seems that this bug is back.
> 
> Tested on Archlinux:
> Linux 3.8.3-2-ARCH #1 SMP PREEMPT Sun Mar 17 13:04:22 CET 2013 x86_64
> GNU/Linux
> 
> My version: KDE 4.10.1. Fresh install of KDE (deleted all .kde4, .local and
> .config files before logging in).
> 
> Configuration:
> Configuration Settings -> Nepomuk server -> Indexing: "ignore all removable
> devices" 
> ... -> Custom folders: I've left only folders inside /home checked
> 
> How to reproduce:
> 1) Plug in a USB drive, copy some video files to it.
> 2) Try to "safely remove" it from Dolphin or from "available devices" in the
> system tray. The message says this is impossible because a program is
> accessing it.
> 3) From the command line, issuing lsof /dev/sdd1 shows nepomuk.
> 
> Not even root can unmount the device. Killing all nepomuk-related processes
> doesn't help, because they get respawned.
> 
> Question: do I open another bug or would someone reopen this same?

What you're describing is bug 304943. Also, you might want to read this - http://userbase.kde.org/Nepomuk/FileIndexer
Comment 28 korgens 2013-03-22 16:24:56 UTC
Thanks for the pointer. Did anyone create the other bug report (about nepomuk ignoring the indexing settings)?