Bug 208602 - strigi indexer fails to handle symbolic links
Status: RESOLVED UNMAINTAINED
Alias: None
Product: nepomuk
Classification: Unmaintained
Component: fileindexer
Version: 4.3
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Sebastian Trueg
 
Reported: 2009-09-26 15:02 UTC by uetsah
Modified: 2018-09-04 15:22 UTC

Description uetsah 2009-09-26 15:02:13 UTC
Version:            (using KDE 4.3.1)
OS:                Linux
Installed from:    Unlisted Binary Package

After doing "ln -s ~/Documents/www ~/public_html" in order to get my new Apache configuration to work, CPU usage and disk i/o suddenly went to maximum and the system became very unresponsive. From the Nepomuk systray applet, I found that strigi was indexing the files in this 'new' public_html directory. (The Nepomuk store size also went up by several megabytes in the process.)

In addition to the fact that indexing should not be so aggressive as to prevent normal operation of the computer for a certain amount of time (I filed a feature request for tackling that problem already), it really should not have indexed those files at all, since they're only symlink "instances" of the files in the Documents/www folder, which had already been indexed previously.

Isn't Nepomuk itself all about separating physical data and logical instances, and remembering those connections in an efficient way? Well, in a way, symlinks do just that, so Nepomuk should not counteract them by keeping all that data twice in its index store. I am therefore reporting this issue as a bug.

(When I then added "~/public_html/" to the strigi index exclude filters list as a manual "fix", the Nepomuk store also didn't shrink back to its previous size as I would have expected, so I filed a feature request for that too: https://bugs.kde.org/show_bug.cgi?id=208596)
Comment 1 skierpage 2011-11-26 10:26:14 UTC
In 4.7.2, Nepomuk::IndexScheduler::analyzeDir() simply bails if a directory is a symlink, and it doesn't index my symlinked directories even when they're checked in System Settings > Desktop Search > Desktop index folders > Customize index folders… > Strigi Index Folders (bug 287593), so the behavior might have changed.
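
For illustration only, the bail-out behavior described above could look roughly like the following. This is a minimal Python sketch, not Nepomuk's actual C++ code; the function name `collect_indexable_files` is hypothetical, and the sketch assumes a POSIX-style filesystem.

```python
import os


def collect_indexable_files(root):
    """Return regular files under root without following any symlinks.

    Hypothetical helper: it only mimics the "bail if the directory is a
    symlink" behavior described in the comment above.
    """
    if os.path.islink(root):
        # The directory itself is a symlink: refuse to index it at all.
        return []
    found = []
    # followlinks=False (the default) keeps os.walk from descending into
    # symlinked subdirectories.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                # Symlinked files are skipped too; only real files are kept.
                found.append(path)
    return sorted(found)
```

With this policy a link such as ~/public_html is never indexed a second time, but it also cannot be indexed on request, which matches the behavior reported in bug 287593.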

Ideally Strigi would index the file contents only once but would know about multiple symlinks (and hard links?) to the same file inode.
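
One way to picture that idea, as a sketch only: group paths by the (device, inode) pair that `os.stat` reports, index the content once per group, and keep every path as an alias. Neither Strigi nor Nepomuk is known to work this way; `group_paths_by_inode`, `index_once`, and the `index_content` callback are hypothetical names made up for this example.

```python
import os
from collections import defaultdict


def group_paths_by_inode(paths):
    """Map (st_dev, st_ino) to the sorted list of paths naming that file."""
    groups = defaultdict(list)
    for path in paths:
        try:
            st = os.stat(path)  # follows symlinks to the target file
        except OSError:
            continue  # dangling symlink or unreadable path: skip it
        groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(names) for key, names in groups.items()}


def index_once(paths, index_content):
    """Call index_content once per underlying file and record all aliases."""
    catalog = {}
    for key, names in group_paths_by_inode(paths).items():
        catalog[key] = {
            "content": index_content(names[0]),  # content indexed a single time
            "paths": names,  # every symlink and hard link to the same inode
        }
    return catalog
```

Because `os.stat` resolves symlinks, a symlink, a hard link, and the original name all land in the same group, so the expensive content analysis runs once while searches can still report every path.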
Comment 2 Vishesh Handa 2012-12-30 13:56:02 UTC
I'm not sure what to do with this bug report.

With 4.10 (and 4.9, I think), Nepomuk does not follow symbolic links. With 4.11 we might change that, but for now we do not plan to follow them.

Can I mark this bug as fixed?
Comment 3 Andrew Crouthamel 2018-09-04 15:22:45 UTC
Hello! Sorry to be the bearer of bad news, but this project has been unmaintained for many years, so I am closing this bug. Development has moved to Baloo; please try again using the latest version and applications, and submit a new ticket for frameworks-baloo if you still have an issue. Thank you!