Summary: | Baloo indexes files temporarily mounted from other file systems | ||
---|---|---|---|
Product: | [Frameworks and Libraries] frameworks-baloo | Reporter: | Adam Fontenot <adam.m.fontenot+kde> |
Component: | general | Assignee: | baloo-bugs-null |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | nate, tagwerk19 |
Priority: | NOR | ||
Version: | 5.99.0 | ||
Target Milestone: | --- | ||
Platform: | Arch Linux | ||
OS: | Linux | ||
Latest Commit: | https://invent.kde.org/frameworks/baloo/-/commit/373cf1e567e2580145f137176d440da27c319f06 | Version Fixed In: | Frameworks 6.1 |
Sentry Crash Report: |
Description
Adam Fontenot
2022-10-15 22:09:55 UTC
(In reply to Adam Fontenot from comment #0)
> ... I'm not sure if Baloo regards these files as deleted when the
> remote is unmounted, but if it does the result is probably disk thrashing
> as it has to update its index ...

It seems to do. And yes, large-scale deletes seem to be hard work for baloo (as described in Bug 442453).

What might muddle things is that dismounting such an sshfs mount from within your home directory might not generate an inotify event saying that the files have gone. Would need to check that. I suspect if you do a "fusermount -u XXXX", baloo doesn't notice unless you do a "balooctl check" or until the next time you log in.

(In reply to Adam Fontenot from comment #0)
> ... Baloo was intelligently skipping other file systems in its indexing ...

That had apparently been implemented a while back - Bug 333433

But does seem to happen with sshfs
https://bugs.kde.org/show_bug.cgi?id=460508#c3

Although don't have a feeling for what "people would expect (or want) to happen"

Confirming...

(In reply to tagwerk19 from comment #2)
> (In reply to Adam Fontenot from comment #0)
> > ... Baloo was intelligently skipping other file systems in its indexing ...
> That had apparently been implemented a while back - Bug 333433
>
> Although don't have a feeling for what "people would expect (or want) to
> happen"

I'm thinking given Bug 333433 that Baloo should not index mounted file systems by default. Maybe an option could be provided, but the ability to manually "opt-in" specific directories by adding them to the indexing list is probably good enough.

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/96

(In reply to Adam Fontenot from comment #3)
> ... Maybe an option could be provided, but the ability to
> manually "opt-in" specific directories by adding them to the indexing list
> is probably good enough ...

Wandering off into the territory of "personal preferences" here, but I trust the idea of fewest surprises...
I think, if you plug in and mount a USB device (appearing under /run/media/<username>), it's not indexed. It's "removable". If you've taken the extra step and mounted something through /etc/fstab, it's not "removable", and it can be indexed (if you wish).

Extending the model to sshfs... A command-line sshfs mount, even if mounted within your home directory, should be considered "removable" and its contents should not be indexed by default. Whereas you should be able to index an sshfs mount that was set up in /etc/fstab.

Any folder includes/excludes in the .config/baloofilerc should take precedence.

It's not so clear-cut with Dolphin; I can imagine people expecting a search "From here" to search everything below "here", irrespective of whether it is local or remote. At least for filename searches. You can see many bugs/issues/queries logged about Dolphin/Baloo's behaviour with symlinks, where the expectation is that "symlinks are followed". I think this is a reasonable indicator...

(In reply to tagwerk19 from comment #5)
> (In reply to Adam Fontenot from comment #3)
> > ... Maybe an option could be provided, but the ability to
> > manually "opt-in" specific directories by adding them to the indexing list
> > is probably good enough ...
> Wandering off into the territory of "personal preferences" here, but I trust
> the idea of fewest surprises...

Right - I think the one case where we can say indexing definitely *shouldn't* happen is when something is mounted "temporarily" - although maybe that isn't clearly defined yet. Basically, if there's any reason to think the path might be expected to change?

I agree with you that something's being in fstab is a good sign it's "permanent" and should be indexed. However, I think if Baloo is going to do that, several footguns need to be avoided:

* The heuristics for determining which file systems are permanent need to be pretty much flawless. You could have an fstab set up so that multiple USB drives are all mounted on demand to ~/usb. That's pretty much a worst-case scenario. Files suddenly appear and disappear, Baloo trashes the database trying to delete everything, etc.
* Some method for determining that a given file system is network-based is probably needed; I think content indexing should probably be turned off for these file systems by default. The user could always opt in for individual directories as needed.
* Baloo needs to have a mechanism where downstream search tools don't see files on unmounted file systems in their searches.

In the meantime, I think the right move is to fix issues like this one (Baloo indexing huge file systems - multiple terabytes in my case - over the network) by changing the defaults so that Baloo never crosses file systems unless the user manually opts a folder in. We can make this work better when the issues above are solved.

Users with permanent file systems they want indexed are in a better position anyway. Because their paths are static, they can easily include them manually in the indexing list. I think that's one good reason to lean in the direction of not trying to do too much magic in Baloo by default. When Baloo is including stuff in directories that don't have static locations and you want it to stop, there's not much you can do about that.

My merge request from Oct 2022 never got any review. I rebased the changes and I'm leaving a note about it here in the hope that we can close this issue.

(In reply to Adam Fontenot from comment #7)
> My merge request from Oct 2022 never got any review. I rebased the changes
> and I'm leaving a note about it here in the hope that we can close this
> issue.

I remember stumbling upon this and then finding it near impossible to find it again.

https://invent.kde.org/plasma/plasma-desktop/-/issues/71

It might be of interest, it certainly runs against some of my earlier thoughts.
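The interim default proposed above, that Baloo never crosses file systems unless the user manually opts a folder in, can be illustrated with a minimal Python sketch. It is not Baloo's code: Baloo is C++ and works through Solid. `walk_same_fs` and `files_to_index` are hypothetical names. The sketch assumes that the `st_dev` device ID reported by `os.stat()` changes at a mount point. That holds for ordinary mounts, FUSE ones such as sshfs included, but a bind mount of the same file system keeps the same `st_dev`. Each opted-in folder is treated as its own walk root.

```python
import os
from collections.abc import Iterator, Sequence


def walk_same_fs(root: str) -> Iterator[str]:
    """Yield files below `root`, never descending into a directory that lives
    on a different file system (device) than `root` itself."""
    root_dev = os.stat(root).st_dev  # ID of the file system holding `root`
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering mount points.
        dirnames[:] = [
            name for name in dirnames
            if os.stat(os.path.join(dirpath, name)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)


def files_to_index(included: Sequence[str]) -> Iterator[str]:
    """Walk each folder the user opted in as its own root. An explicitly
    listed mount point is therefore indexed, but a mount that merely sits
    below another included folder (e.g. an sshfs mount in $HOME) is skipped."""
    seen: set[str] = set()
    for root in included:
        for path in walk_same_fs(root):
            if path not in seen:  # nested roots would otherwise repeat files
                seen.add(path)
                yield path
```

On a typical Linux system, walking from `/` this way would not enter /proc or /sys, because those are separate file systems. The sketch cannot tell a "permanent" fstab mount from a temporary one, which is the harder heuristic the footguns listed above are about.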
(In reply to tagwerk19 from comment #8)
> (In reply to Adam Fontenot from comment #7)
> > My merge request from Oct 2022 never got any review. I rebased the changes
> > and I'm leaving a note about it here in the hope that we can close this
> > issue.
> I remember stumbling upon this and then finding it near impossible to find
> it again.
> https://invent.kde.org/plasma/plasma-desktop/-/issues/71
> It might be of interest, it certainly runs against some of my earlier
> thoughts.

Encouraging better choices with network mounts is a good thing, but this is still needed. The issues I've seen with indexing are the result of FUSE mounts, not /etc/fstab or `mount -t nfs` type mounts that hang in the way this Plasma issue describes.

Git commit 373cf1e567e2580145f137176d440da27c319f06 by Felix Ernst, on behalf of Adam Fontenot.
Committed on 29/03/2024 at 10:32.
Pushed by felixernst into branch 'master'.

Skip indexing KDE FS volumes unless user included

In 69411a, we changed the indexer behavior so that removable media is not indexed by default. This commit tries to extend this behavior to any temporarily mounted file system. For instance, fuse.sshfs and overlay mounted file systems are managed in Solid under the /org/kde/fstab parent. Most likely, users will not want to index these file systems by default.

This commit also changes the initialization procedure for StorageDevices. We now attempt to create a cached entry for *all* Solid devices when initializing. It makes sense to do this because `createCacheEntry` is already called whenever a device is added or removed, without any further filtering. Trying to precisely specify which devices to include at the initialization stage risks leaving out devices like the /org/kde/fstab devices that are the subject of this PR.

Related: bug 390830

M +3 -3 src/file/storagedevices.cpp

https://invent.kde.org/frameworks/baloo/-/commit/373cf1e567e2580145f137176d440da27c319f06
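The rule described in the commit message can be sketched as a small decision function. This is a hypothetical Python model, not the actual C++ change to src/file/storagedevices.cpp. `Device` is a simplified, made-up stand-in for a Solid device, and exclude folders are ignored. The sketch also assumes that "unless user included" means the user listed the mount point, or a folder inside it, in the include list. Under that assumption, including a parent such as the home folder does not opt the mount in.

```python
import os
from collections.abc import Sequence
from dataclasses import dataclass

# Solid parent named in the commit message for fstab-managed mounts
# such as fuse.sshfs and overlay file systems.
FSTAB_PARENT_UDI = "/org/kde/fstab"


@dataclass(frozen=True)
class Device:
    """Hypothetical, simplified stand-in for a mounted Solid storage device."""
    udi: str
    parent_udi: str
    mount_path: str
    removable: bool = False

    def is_temporary(self) -> bool:
        # Removable media were already skipped (69411a); the commit adds
        # devices whose Solid parent is /org/kde/fstab.
        return self.removable or self.parent_udi == FSTAB_PARENT_UDI


def inside(path: str, folder: str) -> bool:
    """True if absolute `path` is `folder` itself or lies below it."""
    path, folder = os.path.normpath(path), os.path.normpath(folder)
    return os.path.commonpath([path, folder]) == folder


def should_index(path: str, devices: Sequence[Device], included: Sequence[str]) -> bool:
    for dev in devices:
        if dev.is_temporary() and inside(path, dev.mount_path):
            # Opt-in only counts if the included folder is at or below the
            # mount point; including a parent such as $HOME is not enough.
            return any(inside(path, inc) and inside(inc, dev.mount_path)
                       for inc in included)
    # Everything else follows the normal include list.
    return any(inside(path, inc) for inc in included)
```

Take an sshfs device mounted at /home/u/remote. With /home/u included, this model still indexes /home/u/docs but skips everything under /home/u/remote. Adding /home/u/remote/notes to the include list opts just that subfolder back in.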