Summary: | Baloo does not remove deleted files from index | ||
---|---|---|---|
Product: | [Frameworks and Libraries] frameworks-baloo | Reporter: | Ongun Kanat <ongun.kanat> |
Component: | general | Assignee: | Pinak Ahuja <pinak.ahuja> |
Status: | ASSIGNED --- | ||
Severity: | normal | CC: | anthony, aspotashev, b-misc, bauer.klaus.dieter, bugseforuns, d0048, dashonwwIII, eric1, francogpellegrini, heri+kde, igor.poboiko, jackhill3103, johann.hoechtl, kde, kdeu, kitts.mailinglists, leftcrane, Marco.Leise, meven, nate, oded, ongun.kanat, peter, pinak.ahuja, rulatir, skierpage, smowtenshi, sommerluk, tagwerk19, trmdi, unlovable_fridge356, zalimannard |
Priority: | HI | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Kubuntu | ||
OS: | Linux | ||
See Also: | https://bugs.kde.org/show_bug.cgi?id=429006 | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Ongun Kanat
2015-10-13 19:53:45 UTC
*** Bug 370429 has been marked as a duplicate of this bug. ***

*** Bug 373430 has been marked as a duplicate of this bug. ***

*** Bug 362226 has been marked as a duplicate of this bug. ***

*** Bug 377302 has been marked as a duplicate of this bug. ***

*** Bug 374736 has been marked as a duplicate of this bug. ***

Re-posting my comments from a duplicate bug here.

It is not possible to clear deleted files from the db; baloo returns the error:
    "Could not stat file"
This is because of line 243 in main.cpp, where non-existing files are skipped. We should be happy that we found a wrong record referring to a non-existing file in our db of files, and remove the wrong file record instead.
https://github.com/KDE/baloo/compare/master...vitamins:patch-1

Hmm, it's not that simple, since the next check also fails if id is 0, and we seem to need the id to remove the record, but it is 0 for non-existing files.
    tr.removeDocument(id)
Can we remove using only the url instead?

Clearing an existing file which is in an indexed path is also problematic, since it will get added back later on automatically, reverting the clear action.

*** Bug 388761 has been marked as a duplicate of this bug. ***

The only way to overcome this currently seems to be to reindex:
    balooctl disable
    balooctl enable

Sometimes doing a `baloo check` resolves the random files that appear, sometimes it doesn't.

This should have been fixed by https://phabricator.kde.org/D15939 and commit https://phabricator.kde.org/R293:f8897a2511c4652c203bf25f6d788d0a698e4203
Feel free to reopen if this bug still affects you.

Not fixed on 5.21.4.
Moved a whole bunch of files to an external drive several days ago. Multiple reboots AND baloo enable/disable cycles later, the files are still showing up in Krunner.

I went ahead and purged the database with "balooctl --purge" and ... the nonexistent files are still being helpfully found by krunner/kickoff.
Baloo is a framework BTW, not a part of Plasma (5.21.4 is a Plasma version).
If the index is removed entirely yet deleted files are still found in a search, then the fault is elsewhere, in whatever is caching the old content.

Well, the purge worked, after logout. So krunner/kickoff's only fault is - possibly - that they don't update results until you log out. The bug is with baloo. Try it on your system, you should get a similar result.

I should have checked the files directly from balooctl of course, but in all likelihood this is a baloo bug, given that purging the database worked.

(In reply to leftcrane from comment #11)
> Moved a whole bunch of files to an external drive several days ago. Multiple reboots AND baloo enable/disable cycles later, the files are still showing up in Krunner.

If "a whole bunch" is more than:
    sysctl fs.inotify.max_queued_events
baloo might not have seen the delete notifications.

It also seems that if deletions are not finished when baloo is closed down (on a logout, say), the entries end up stuck in the index. Some extra info:
    https://bugs.kde.org/show_bug.cgi?id=437754#c1

No, it's definitely less than that. BTW, I saw two recent reports from reddit of the same problem.

When I download a file, baloosearch can't detect it until I manually index it with balooctl index. Is it a related bug, or does it need a delay before indexing new files?

(In reply to trmdi from comment #17)
> When I download a file, baloosearch can't detect it until I manually index it with balooctl index. Is it a related bug, or does it need a delay before indexing new files?

No, I don't think there's a delay. Baloo should "pick up" a new or changed file immediately and put it in the queue for "full text" indexing. If you do a "balooctl index ..." you are telling baloo to do it there and then.

If you are running on battery then I think baloo waits with the full text index until the machine's plugged back in.
I think that it also "backs off" on indexing if it sees that the system is heavily used.

If baloo is not "immediately" noticing a new file then there's a bit of troubleshooting to do. If it doesn't notice a change but the file is found with a "balooctl check", then it's worth looking at the inotify settings (particularly if you are using Neon and have loads of folders). See what
    sysctl fs.inotify.max_user_watches
says; if this is smaller than the number of folders you have, then baloo won't see changes as they happen.

Note that "balooctl index ..." does a one-off indexing of the file, irrespective of whether it's in a folder you want indexed or not. External discs are, for example, not automatically indexed but a "balooctl index ..." would index a file on them.

There are a lot of bases to cover here; if the above doesn't help, maybe open a new bug and include all the details.

Still happening on Arch with KDE Plasma 5.22.5. The only solution is to purge the index, reindex, then log out and back in.

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/113

(In reply to Bug Janitor Service from comment #20)
> A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/113

From the MR:
> ... After I applied this patch, killed baloo_file, deleted an indexed file, and started baloo_file again,
> the deleted file didn't appear anymore in the baloosearch results. That didn't happen with the
> unpatched baloo, the deleted file was still there and trying to open it with KRunner did nothing ...

So the test sequence is:
    Create a test file
    Check that it is indexed (including file content)
    Kill baloo
    Delete the file
    Restart baloo and check whether the file is still in the index.

Yes, the file is still in the index, as per baloosearch.
This is slightly more specific than comment 0 but consistent with what I've seen after deleting a large folder and not waiting until baloo has cleared all its entries, or when there are too many deletes and notifications "overflow", as mentioned in
    https://bugs.kde.org/show_bug.cgi?id=437754#c1

It might be worth mentioning a couple of the wraiths in the mist...

Where the device number has changed and baloo has reindexed the files, deleting the test file even when baloo_file is running will not result in the earlier entry being removed. This happens with BTRFS and multiple subvols, such as with openSUSE, where "baloosearch -i searchstring" shows several hits with different DocIDs, see
    https://bugs.kde.org/show_bug.cgi?id=402154#c12

There's also the possibility that krunner caches the data from baloo and presents remembered results...

Revisited with Neon Unstable:
    Plasma: 5.27.80
    Frameworks: 5.104.0
    Qt: 5.15.8
    Kernel: 5.19.0-35-generic (64-bit)

Baloo should be able to fix this using fanotify https://man7.org/linux/man-pages/man7/fanotify.7.html for any user with linux 5.1+.

(In reply to Méven from comment #22)
> Baloo should be able to fix this using fanotify https://man7.org/linux/man-pages/man7/fanotify.7.html for any user with linux 5.1+.

I see a:
    Calling fanotify_init() requires the CAP_SYS_ADMIN capability.
presumably meaning fanotify needs admin rights.

(In reply to tagwerk19 from comment #23)
> (In reply to Méven from comment #22)
> > Baloo should be able to fix this using fanotify https://man7.org/linux/man-pages/man7/fanotify.7.html for any user with linux 5.1+.
> I see a:
> Calling fanotify_init() requires the CAP_SYS_ADMIN capability.
> presumably meaning fanotify needs admin rights.

It seems to me that's not what the man fanotify documentation says.
The example does not make use of it either.
It mentions fanotify should not be run with CAP_SYS_ADMIN or unprivileged users would have access to more than they should.
(In reply to Méven from comment #24)
> (In reply to tagwerk19 from comment #23)
> > (In reply to Méven from comment #22)
> > > Baloo should be able to fix this using fanotify https://man7.org/linux/man-pages/man7/fanotify.7.html for any user with linux 5.1+.
> > I see a:
> > Calling fanotify_init() requires the CAP_SYS_ADMIN capability.
> > presumably meaning fanotify needs admin rights.
>
> It seems to me that's not what the man fanotify documentation says.
> The example does not make use of it either.
> It mentions fanotify should not be run with CAP_SYS_ADMIN or unprivileged users would have access to more than they should.

Sorry, you are right: https://man7.org/linux/man-pages/man2/fanotify_init.2.html
The API does need CAP_SYS_ADMIN.

So baloo could achieve this using an external root-owned executable with the sticky bit, whose only role would be to send baloo the file changes in indexed directories.

(In reply to Méven from comment #25)
> The API does need CAP_SYS_ADMIN.

This was indeed true, up until Linux 5.12: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/notify/fanotify/fanotify_user.c?h=v5.12#n923

Since Linux 5.13, `CAP_SYS_ADMIN` is no longer required and instead just limits the flags you can use and the behavior you can expect: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/notify/fanotify/fanotify_user.c?h=v5.13#n1044

With 5.13 and later, without `CAP_SYS_ADMIN` you cannot set fanotify for filesystem/mount wide marks and you can only get events with a file descriptor (that you can use, AFAIU): https://patchwork.kernel.org/project/linux-fsdevel/patch/20210524135321.2190062-1-amir73il@gmail.com/

I believe this should still be good enough for Baloo's purposes, as we are only expecting Baloo to use FAN_MARK_INODE for directories listed in the "File Search" configuration.

(In reply to Oded Arbel from comment #26)
> ... I believe this should still be good enough for Baloo's purposes ...
What does fanotify offer beyond inotify?

I know deleting "too many things all at once" can overflow the inotify queue. Also that when unpacking a .tar, folders can be created and files extracted into them faster than baloo can set up notify watches on the folders. It used to be that max_user_watches was too small (on some distributions) but I think that is no longer a problem.

(In reply to tagwerk19 from comment #27)
> (In reply to Oded Arbel from comment #26)
> > ... I believe this should still be good enough for Baloo's purposes ...
> What does fanotify offer beyond inotify?
> Also that when unpacking a .tar, folders can be created and files extracted into them faster than baloo can set up notify watches on the folders.

fanotify allows you to ignore that issue by setting up one watch on each of the folders configured in Baloo's KCM and that's it - there are no more race conditions between applications creating folders and Baloo putting inotify watches on them.

> I know deleting "too many things all at once" can overflow the inotify queue.

That is possibly still going to be an issue with fanotify - the default event queue with fanotify is 16384 events, and without `CAP_SYS_ADMIN` you can't increase that size - though it is likely that baloo can consume events fast enough for this not to be a serious issue.

(In reply to Oded Arbel from comment #28)
> [...]
> fanotify allows you to ignore that issue by setting up one watch on each of the folders configured in Baloo's KCM and that's it - there are no more race conditions between applications creating folders and Baloo putting inotify watches on them.

Sorry, I will add my 50 cents here. `man fanotify` claims that watches are not recursive, and should be set up for subdirectories separately, so such a race condition is still there. Those could have been avoided if we could put a mark on the whole mount point / tree, but AFAIK that requires CAP_SYS_ADMIN.
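
As a rough illustration of the watch-limit concern raised in this thread (watches are per directory, and the advice was to compare fs.inotify.max_user_watches with the number of folders), a check could be sketched like this. It is only a sketch and not something Baloo itself does: the function names are hypothetical, it assumes one watch is needed per directory, and scanning the home directory is an assumed stand-in for whatever folders are actually configured for indexing. The limit is also shared by every program the user runs, so passing this check does not guarantee that baloo gets all the watches it needs.

```python
import os
from pathlib import Path

# Standard procfs location of the value printed by
# `sysctl fs.inotify.max_user_watches`.
MAX_USER_WATCHES = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(root):
    """Count root plus every directory below it (symlinks are not followed)."""
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def read_limit(path=MAX_USER_WATCHES):
    """Read an integer kernel limit from a procfs-style file."""
    return int(Path(path).read_text().strip())


def watches_sufficient(root, limit):
    """True if one watch per directory under root fits within limit."""
    return count_directories(root) <= limit


if __name__ == "__main__":
    # Assumption: the home directory approximates the indexed folders.
    home = Path.home()
    needed = count_directories(home)
    limit = read_limit()
    print(f"directories under {home}: {needed}")
    print(f"fs.inotify.max_user_watches: {limit}")
    print("enough watches" if needed <= limit else "watch limit too low")
```

If the directory count comes out above the limit, that would match the situation described earlier where changes are only picked up by a later "balooctl check".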
(In reply to Igor Poboiko from comment #29)
> (In reply to Oded Arbel from comment #28)
> > [...]
> > fanotify allows you to ignore that issue by setting up one watch on each of the folders configured in Baloo's KCM and that's it - there are no more race conditions between applications creating folders and Baloo putting inotify watches on them.
>
> Sorry, I will add my 50 cents here. `man fanotify` claims that watches are not recursive, and should be set up for subdirectories separately, so such a race condition is still there. Those could have been avoided if we could put a mark on the whole mount point / tree, but AFAIK that requires CAP_SYS_ADMIN.

I tested the program provided in the example and it works, reporting any event of the types we ask for that happens on a filesystem.
Here I am not sure what this "recursive" applies to.

(In reply to Méven from comment #30)
> I tested the program provided in the example and it works, reporting any event of the types we ask for that happens on a filesystem.
> Here I am not sure what this "recursive" applies to.

Did you run the test with `CAP_SYS_ADMIN`? If so, did you test filesystem marks or directory marks?

The "recursive" issue is that fanotify only improves upon the issues with inotify if you can set a watch on a directory and receive all events on all of its subdirectories - without needing to register more watches on each subdirectory. If this is not the case - as the man page definitely claims that it is not (unless you use mount or filesystem marks) - then we're still stuck with the race condition of a fast application (such as Ark) creating new directories and immediately new directories within them, and Baloo will not see files created in the subdirectories.
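
To make test reports like the one asked for above easier to compare, the two facts discussed in this thread (whether the kernel is 5.13 or later, and whether the process holds `CAP_SYS_ADMIN`) can be checked with a small script. This is a hedged sketch with hypothetical function names: it relies on the CapEff line of /proc/self/status and on CAP_SYS_ADMIN being capability number 21, and it does not call fanotify at all.

```python
import platform
import re

CAP_SYS_ADMIN_BIT = 21  # capability number from <linux/capability.h>


def kernel_at_least(release, minimum):
    """Compare a release string such as '5.19.0-35-generic' with (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


def has_cap_sys_admin(status_text):
    """Check the CapEff line of /proc/<pid>/status for CAP_SYS_ADMIN."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # hexadecimal capability bitmask
            return bool(mask & (1 << CAP_SYS_ADMIN_BIT))
    return False


if __name__ == "__main__":
    release = platform.release()
    with open("/proc/self/status") as status:
        privileged = has_cap_sys_admin(status.read())
    print(f"kernel release: {release}")
    print(f"kernel is 5.13 or later: {kernel_at_least(release, (5, 13))}")
    print(f"CAP_SYS_ADMIN effective: {privileged}")
```

Running it as the same user that runs the fanotify example shows whether that example was exercising the privileged or the unprivileged code path.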
I came to this bug report via this Reddit discussion: https://www.reddit.com/r/kde/comments/nud5kj/outdated_file_results_in_application_launcher/

It describes a possibly closely related issue, where search results in KRunner and Application Launcher may lag behind the results in baloo itself, indicating that there may be some unnecessary intermediate caching. E.g. "baloosearch cardona" already gives the expected result, but KRunner/Application Launcher give no result, or an outdated location.

*** Bug 429006 has been marked as a duplicate of this bug. ***

*** Bug 457746 has been marked as a duplicate of this bug. ***

For what it's worth, I am no longer experiencing this issue. My apologies for not coming back to this sooner.