Bug 437754 - "balooctl status" can trigger high memory use
Summary: "balooctl status" can trigger high memory use
Status: CONFIRMED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: balooctl
Version: 5.82.0
Platform: Neon Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords:
Duplicates: 449713
Depends on:
Blocks:
 
Reported: 2021-05-27 21:26 UTC by tagwerk19
Modified: 2022-11-30 17:14 UTC (History)

See Also:
Latest Commit:
Version Fixed In:


Description tagwerk19 2021-05-27 21:26:43 UTC
SUMMARY:

    Baloo's index size and memory usage can balloon when running
    "balooctl status" while baloo is handling file deletions.

STEPS TO REPRODUCE:

    Create a temporary folder and create 50000 one-line files in it:

        mkdir ~/Testdir
        cd ~/Testdir
        for i in {1..50000}; do echo "This is file $i" > file$i.txt; done 

    Baloo will need a while to index these; watch progress with

        balooctl monitor 

    in one window and check the count of indexed files with

        balooctl status 

    It's quite likely that creating so many files so quickly hits the
    inotify "event limit" and baloo doesn't get told of all the new files.
    Run

        balooctl check 

    to get it to look for any files it's missed. Keep an eye on the index
    size and the memory used by baloo_file as shown by htop. There's
    nothing remarkable.

    Remove the test folder

        rm -r ~/Testdir 

    Keep watching the index size and the memory used by baloo_file. Still
    no sign anything untoward...

    Run

        balooctl status; balooctl indexSize 

    a few times and ... 

OBSERVED RESULTS:

    ... "balooctl status" takes a considerable amount of time to respond.
    The "File Size" reported by "balooctl indexSize" increases quite
    dramatically (while the "Used" count is dropping slowly).
    "balooctl monitor" does not report file deletions.

    The memory used (MEM%) by baloo_file, as shown by htop, increases
    similarly, more or less in line with the "File Size".

    For my tests, the initial "File Size" was 50 MByte with "Used" 28 MByte.
    After several runs of "balooctl status" while baloo was dealing with file
    deletions, "File Size" had risen to 2.8 GByte.

        Baloo File Indexer is running
        Indexer state: Idle
        Total files indexed: 29,476
        Files waiting for content indexing: 0
        Files failed to index: 0
        Current size of index is 2.98 GiB

        File Size: 2.98 GiB
        Used:      16.07 MiB

    MEM% was also shown as 2.8 GByte, or about 75% of total memory (on
    a 4 GByte machine).
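
    To put a number on the gap between "File Size" and "Used", a small
    helper along these lines could be fed the output of "balooctl indexSize".
    It is only a sketch: index_bloat is a made-up name, and it assumes the
    two summary lines keep the format shown above. It also accepts comma
    decimals, which show up in some locales.

```shell
# Reads "balooctl indexSize" style output on stdin and prints the file size,
# the used size (both converted to MiB) and the ratio between the two.
# index_bloat is a hypothetical helper name, not part of balooctl.
index_bloat() {
    LC_ALL=C awk '
        # Convert "<number> <unit>" to MiB; accepts "," or "." as decimal mark.
        function to_mib(num, unit,    f) {
            sub(",", ".", num)
            f = 1
            if (unit == "B")   f = 1 / 1048576
            if (unit == "KiB") f = 1 / 1024
            if (unit == "GiB") f = 1024
            return num * f
        }
        $1 == "File" && $2 == "Size:" { size = to_mib($3, $4) }
        $1 == "Used:"                 { used = to_mib($2, $3) }
        END {
            # Print nothing if the summary lines were missing or "Used" is 0.
            if (used > 0)
                printf "file=%.2f MiB used=%.2f MiB ratio=%.1f\n", size, used, size / used
        }'
}
```

    Usage would be something like

        balooctl indexSize | index_bloat

    For the figures above (2.98 GiB against 16.07 MiB) the ratio comes out
    at roughly 190.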

    The guesswork here is that "balooctl status" is counting the
    indexed files and is locking the DB so that writes don't change the
    number. However, the process of deleting entries continues and the
    changes are appended to the DB. This seems strange and better
    explanations are welcome 8-/

EXPECTED RESULTS:

    Baloo maintains a count of indexed files and "balooctl status" can
    show it without needing to lock the DB and count the entries.

    "balooctl monitor" should probably show files as they are deleted (as
    a bit of reassurance that something is happening).

SOFTWARE/OS VERSIONS:

    Checked on Neon Unstable...
    Plasma: 5.22.80
    Frameworks: 5.83.0
    Qt: 5.15.2 

ADDITIONAL INFORMATION:

    Once baloo_file memory usage has gone up, it does not drop down again.
    You need to restart baloo.
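
    Rather than eyeballing htop, the growth can be logged with a small
    sampler. This is a sketch under a few assumptions: sample_baloo is a
    made-up name, the index is assumed to live at
    ~/.local/share/baloo/index, and the process is assumed to be called
    baloo_file as htop shows it.

```shell
# Print one sample line: time of day, index file size in bytes and the
# resident memory (RSS, in KiB) of baloo_file. A "-" is printed for any value
# that cannot be read. sample_baloo is a hypothetical helper name.
sample_baloo() {
    # Assumed default index location; pass another path as $1 if it differs.
    index="${1:-$HOME/.local/share/baloo/index}"

    if [ -f "$index" ]; then
        size=$(wc -c < "$index" | tr -d ' ')
    else
        size="-"
    fi

    # Take the first matching PID if there is more than one.
    pid=$(pgrep -x baloo_file | head -n 1)
    if [ -n "$pid" ]; then
        rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    else
        rss="-"
    fi

    printf '%s index_bytes=%s baloo_file_rss_kib=%s\n' \
        "$(date +%T)" "$size" "${rss:--}"
}
```

    Run it in a loop in one window while repeating "balooctl status" in
    another:

        while sleep 5; do sample_baloo; done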
Comment 1 tagwerk19 2021-05-28 12:13:12 UTC
(In reply to tagwerk19 from comment #0)
> It's quite likely that creating so many files so quickly hits the
> inotify "event limit" and baloo doesn't get told of all the new files.
Creating the test files via a script can give you a

    kf.baloo: Inotify - too many event - Overflowed

and you need to run "balooctl check" when the script has finished to find the rest of the newly created files.

It is also possible to get an "Overflowed" message when deleting files. When this happens, baloo stops removing deleted entries and a "balooctl check" does not resolve the situation.
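
My assumption (I have not checked the Baloo source) is that the "Overflowed" message corresponds to the kernel's inotify event queue filling up, in which case the limit that matters would be fs.inotify.max_queued_events. The current inotify limits can be read from /proc; a sketch, where inotify_limits is a made-up helper name and the optional directory argument is only there so it can be pointed somewhere else for testing:

```shell
# Print the kernel's inotify limits, one "name = value" line each.
# inotify_limits is a hypothetical helper name; the directory defaults to the
# standard procfs location on Linux.
inotify_limits() {
    dir="${1:-/proc/sys/fs/inotify}"
    for name in max_queued_events max_user_instances max_user_watches; do
        # Skip entries that are missing or unreadable.
        if [ -r "$dir/$name" ]; then
            printf 'fs.inotify.%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}
```

Comparing max_queued_events with the number of files created or deleted in one go (50000 here) might show whether an overflow is to be expected.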

EXPECTED RESULTS:

Ideally, if baloo receives an "inotify overflow", it should queue up a "balooctl check".

"balooctl check" should recognise files that no longer exist in the filesystem and remove their index entries.
Comment 2 Martin Steigerwald 2021-06-26 11:20:42 UTC
tagwerk, can you show the output of "balooctl indexSize"? For me it currently is:

% balooctl indexSize
File Size: 8,12 GiB
Used:      77,17 MiB

           PostingDB:       1,36 GiB  1801.974 %
          PositionDB:       1,68 GiB  2228.958 %
            DocTerms:     784,75 MiB  1016.891 %
    DocFilenameTerms:      68,79 MiB    89.137 %
       DocXattrTerms:       4,00 KiB     0.005 %
              IdTree:      17,55 MiB    22.742 %
          IdFileName:      77,23 MiB   100.071 %
             DocTime:      51,05 MiB    66.157 %
             DocData:      38,66 MiB    50.101 %
   ContentIndexingDB:       9,32 MiB    12.077 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:      15,06 MiB    19.518 %

I do not claim I understand the output though.

2228% of what? Why 77 MiB used?
Comment 3 tagwerk19 2021-06-26 18:37:40 UTC
(In reply to Martin Steigerwald from comment #2)
> % balooctl indexSize
> File Size: 8,12 GiB
> Used:      77,17 MiB
There's some analysis/discussion/confusion about the percentages here:

    https://bugs.kde.org/show_bug.cgi?id=354636#c10

I think the "used" sizes are believable. I have copied/compressed a test index with

    mdb_copy -n -c index index.new

(from lmdb-utils) and this changes "indexSize" details from:

    File Size: 2,28 GiB
    Used:      18,99 MiB

               PostingDB:       4,89 MiB    25.735 %
              PositionDB:       4,92 MiB    25.921 %
                DocTerms:       2,47 MiB    13.001 %
        DocFilenameTerms:       1,70 MiB     8.969 %
           DocXattrTerms:       4,00 KiB     0.021 %
                  IdTree:     240,00 KiB     1.234 %
              IdFileName:       1,94 MiB    10.204 %
                 DocTime:       1,29 MiB     6.809 %
                 DocData:       1,53 MiB     8.044 %
       ContentIndexingDB:            0 B     0.000 %
             FailedIdsDB:            0 B     0.000 %
                 MTimeDB:      12,00 KiB     0.062 %

to:

    File Size: 19,37 MiB
    Used:      18,99 MiB

               PostingDB:       4,89 MiB    25.735 %
              PositionDB:       4,92 MiB    25.921 %
                DocTerms:       2,47 MiB    13.001 %
        DocFilenameTerms:       1,70 MiB     8.969 %
           DocXattrTerms:       4,00 KiB     0.021 %
                  IdTree:     240,00 KiB     1.234 %
              IdFileName:       1,94 MiB    10.204 %
                 DocTime:       1,29 MiB     6.809 %
                 DocData:       1,53 MiB     8.044 %
       ContentIndexingDB:            0 B     0.000 %
             FailedIdsDB:            0 B     0.000 %
                 MTimeDB:      12,00 KiB     0.062 %

This points at loads of "empty space" created during the deletions/status. This is on ext4, after having created 50000 files and deleted 20000. I will try the same on BTRFS.

Whether this helps any...?
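
For anyone wanting to repeat the comparison, the copy step can be wrapped up roughly as follows. This is a sketch, not a recommended procedure: compact_copy is a made-up name, the default path is the usual index location, and it deliberately stops short of replacing the original. I would assume baloo_file should not be running when the new file is moved into place.

```shell
# Write a compacted copy of the index next to the original and print both
# sizes in bytes. The original file is left untouched.
# compact_copy is a hypothetical helper name; mdb_copy comes from lmdb-utils.
compact_copy() {
    index="${1:-$HOME/.local/share/baloo/index}"

    [ -f "$index" ] || { echo "no index at $index" >&2; return 1; }
    # Do not clobber the result of an earlier run.
    [ ! -e "$index.new" ] || { echo "$index.new already exists" >&2; return 1; }

    # -n: the index is a single file, not a directory; -c: compact while copying.
    mdb_copy -n -c "$index" "$index.new" || return 1

    printf 'before: %s bytes\n' "$(wc -c < "$index" | tr -d ' ')"
    printf 'after:  %s bytes\n' "$(wc -c < "$index.new" | tr -d ' ')"
}
```

If the "after" figure is close to the "Used" value reported by "balooctl indexSize", the difference was free space inside the file.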
Comment 4 Martin Steigerwald 2021-06-26 19:20:49 UTC
It helped, but not as much as with your setup:

% ~/.local/share/baloo> balooctl indexSize
File Size: 8,12 GiB
Used:      79,78 MiB

           PostingDB:       1,36 GiB  1743.970 %
          PositionDB:       1,68 GiB  2157.734 %
            DocTerms:     785,45 MiB   984.552 %
    DocFilenameTerms:      68,79 MiB    86.226 %
       DocXattrTerms:       4,00 KiB     0.005 %
              IdTree:      17,55 MiB    22.000 %
          IdFileName:      77,23 MiB    96.803 %
             DocTime:      51,05 MiB    63.996 %
             DocData:      38,67 MiB    48.475 %
   ContentIndexingDB:       9,29 MiB    11.649 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:      15,06 MiB    18.881 %
% ~/.local/share/baloo> mdb_copy -n -c index index.new

% ~/.local/share/baloo> LANG=en ls -lh
total 13G
-rw-r--r-- 1 martin martin 8.2G Jun 26 15:21 index
-rw-r--r-- 1 martin martin 8.0K Jun 26 21:02 index-lock
-rw-r--r-- 1 martin martin 4.1G Jun 26 21:02 index.new

% ~/.local/share/baloo> mv index.new index

% ~/.local/share/baloo> balooctl indexSize            
File Size: 4,08 GiB
Used:      79,78 MiB

           PostingDB:       1,36 GiB  1743.970 %
          PositionDB:       1,68 GiB  2157.734 %
            DocTerms:     785,45 MiB   984.552 %
    DocFilenameTerms:      68,79 MiB    86.226 %
       DocXattrTerms:       4,00 KiB     0.005 %
              IdTree:      17,55 MiB    22.000 %
          IdFileName:      77,23 MiB    96.803 %
             DocTime:      51,05 MiB    63.996 %
             DocData:      38,67 MiB    48.475 %
   ContentIndexingDB:       9,29 MiB    11.649 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:      15,06 MiB    18.881 %
Comment 5 medin 2022-02-06 23:18:54 UTC
*** Bug 449713 has been marked as a duplicate of this bug. ***
Comment 6 tagwerk19 2022-10-16 21:40:23 UTC
Flagging as Confirmed on the basis of Bug 460460