Bug 472525 - Deleted files not being removed from baloo's index
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.107.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2023-07-23 08:58 UTC by tagwerk19
Modified: 2025-03-03 17:06 UTC

Description tagwerk19 2023-07-23 08:58:56 UTC
SUMMARY

    Files in a directory indexed by baloo can be deleted, but they do not
    disappear from the index.

    Various issues have been logged about this, with troubleshooting and
    potential root causes buried in the comments. The closest, without actually
    pinpointing the answer, is probably Bug 437754.

    The case reported here is the one where the user logs out or reboots
    before baloo has finished removing the deleted entries.

STEPS TO REPRODUCE:

1...

    Make sure you have content indexing enabled and you are indexing your
    test directory

2...

    Create a test directory with many files, following the steps in Bug 437754:

        for i in {00001..50000}; do echo "This is file $i" > file$i.txt; done

    and watch to see that the files are indexed (you may need to run
    "balooctl check").

    Delete 1,000 of these files:

        rm file20*.txt 

3...

    Log out or reboot, log back in, and check the search results:

        baloosearch file20 | wc

    Wait a while and try again.
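
    The "wc" check above counts every hit, so it cannot tell stale entries
    from files that still exist. A minimal sketch of a stricter check follows.
    It assumes baloosearch prints one absolute path per line; the function
    name count_stale is hypothetical and not part of baloo.

```shell
#!/bin/sh
# count_stale: read search results on stdin and count the paths that no
# longer exist on disk. Lines that are not absolute paths are ignored.
# (Hypothetical helper; assumes one path per line.)
count_stale() {
    stale=0
    while IFS= read -r path; do
        case $path in
            /*) [ -e "$path" ] || stale=$((stale + 1)) ;;
        esac
    done
    echo "$stale"
}

# Usage, repeated after a wait to see whether the count drops:
#   baloosearch file20 | count_stale
```

    A count that stays above zero after the restart would match the
    observed result described in this report.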

OBSERVED RESULT:

    The deleted files don't get removed from the index. The information that they've
    been deleted and need to be removed from the index has been lost in the restart.

    Running "balooctl check" does not prompt baloo to remove the entries.

EXPECTED RESULT:

    Three things:

        Baloo should remember that files have been deleted and resume removing
        entries after a restart

        A "balooctl check" should identify missing files and queue their entries
        for removal (baloo may have missed the inotify events reporting that
        the files were deleted)

        You should also be able to follow deletions with "balooctl monitor"

SOFTWARE/OS VERSIONS:

    This was tested on Fedora 38 (which has the BTRFS patch)

        Fedora Linux 38
        Plasma: 5.27.6
        Frameworks: 5.107.0
        Qt: 5.15.10

ADDITIONAL INFORMATION

    Baloo is slow to remove deleted entries from its index and seems to commit
    a change to disc after removing each file. This is a lot slower (and far harder
    on an SSD) than content indexing, where batches of files are indexed
    and then committed.

    It's possible to script a clean-up, although it's a hack. You cannot get
    a complete list of the files baloo has in its index, so you have to search
    for the file extensions you are interested in. I also haven't been able to
    get filenames with embedded spaces handled as they ought to be:

    baloosearch txt OR doc OR jpg OR jpeg OR png OR mp3 |
       sort -u |
       while read i; do
          if [ ! -e "$i" ]
             then echo $i
             fi
          done |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             while read line; do balooctl clear $line; done

    This calls "balooctl clear" with batches of files that are no longer there.
    Each clear takes longer, but the total number of disc writes is lower.
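
    The batching relies on each "sed -e 'N;s/\n/ /'" stage joining consecutive
    pairs of lines, so seven stages give up to 2^7 = 128 paths per line. The
    sketch below shows the effect on dummy input. The helper names join_pairs
    and batch_128 are hypothetical, and GNU sed is assumed.

```shell
#!/bin/sh
# join_pairs: join each pair of consecutive input lines into one line.
join_pairs() {
    sed -e 'N;s/\n/ /'
}

# batch_128: seven pair-joining stages, giving 2^7 = 128 items per line.
batch_128() {
    join_pairs | join_pairs | join_pairs | join_pairs |
        join_pairs | join_pairs | join_pairs
}

# 256 dummy items become 2 lines; awk prints the word count of each line.
seq 1 256 | batch_128 | awk '{ print NF }'
```

    When the input is not a multiple of 128, the last batch is shorter: GNU sed
    prints a leftover line unchanged when "N" finds no next line.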
Comment 1 tagwerk19 2023-07-24 07:01:34 UTC
(In reply to tagwerk19 from comment #0)
> ... the handling of filenames with embedded spaces to work as it ought to ...

This is better:

    baloosearch txt OR doc OR jpg OR jpeg OR png OR mp3 |
       sort -u |
       while read i; do
          if [ ! -e "$i" ]
             then echo "$i" | sed -e 's/ /\\ /g'
             fi
          done |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             sed -e 'N;s/\n/ /' |
             xargs -r balooctl clear
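
The backslash escaping covers spaces, but quotes or backslashes in file names could still trip up xargs. Below is a sketch of an alternative that hands the paths to xargs NUL-separated and lets xargs do the batching. It assumes an xargs that supports -0 and -r (GNU findutils does), file names without embedded newlines, and, as in the scripts above, that "balooctl clear" accepts several paths at once. The function name clear_missing and the batch size of 128 are illustrative only.

```shell
#!/bin/sh
# clear_missing CMD [ARGS...]: read paths on stdin, keep those that no longer
# exist, and pass them to CMD in batches of up to 128 arguments.
# (Hypothetical helper; lines that are not absolute paths are ignored.)
clear_missing() {
    while IFS= read -r path; do
        case $path in
            /*) [ -e "$path" ] || printf '%s\n' "$path" ;;
        esac
    done |
        sort -u |
        tr '\n' '\0' |
        xargs -0 -r -n 128 "$@"
}

# Dry run first, then the real thing:
#   baloosearch txt OR doc OR jpg | clear_missing echo balooctl clear
#   baloosearch txt OR doc OR jpg | clear_missing balooctl clear
```

Passing the command as arguments makes it easy to try the pipeline with "echo" before letting it touch the index.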