Bug 470382 - Baloo unable to index any files
Summary: Baloo unable to index any files
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: balooctl
Version: 5.106.0
Platform: Arch Linux Linux
Importance: NOR major
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-29 01:09 UTC by Ovear
Modified: 2023-06-11 05:37 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Description Ovear 2023-05-29 01:09:07 UTC
STEPS TO REPRODUCE
1. Search for new files
2. Use balooctl to index those unindexed files

OBSERVED RESULT

Manually indexing through `balooctl index` gives no response, no matter what file is used (including files that do not exist).

Baloo has also stopped indexing files for days; even after running `balooctl check`, it still does not work.

Besides that, `balooctl status` takes a long time to return a result.

EXPECTED RESULT

Baloo can index new files normally.


SOFTWARE/OS VERSIONS

Operating System: EndeavourOS 
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.106.0
Qt Version: 5.15.9
Kernel Version: 6.3.4-arch1-1 (64-bit)
Graphics Platform: Wayland

ADDITIONAL INFORMATION

Nothing related was found in journalctl. inotify was checked with inotifywait and confirmed to work.

```
$ time balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 1,192,757
Files waiting for content indexing: 1,554
Files failed to index: 0
Current size of index is 5.78 GiB

real    0m50.142s
user    0m0.010s
sys     0m0.007s

$ balooctl indexSize
File Size: 5.78 GiB
Used:      290.39 MiB

           PostingDB:       1.47 GiB   517.279 %
          PositionDB:       1.92 GiB   677.729 %
            DocTerms:     666.52 MiB   229.526 %
    DocFilenameTerms:      77.92 MiB    26.832 %
       DocXattrTerms:       4.00 KiB     0.001 %
              IdTree:      20.74 MiB     7.143 %
          IdFileName:      86.97 MiB    29.950 %
             DocTime:      47.92 MiB    16.503 %
             DocData:      12.54 MiB     4.317 %
   ContentIndexingDB:      48.00 KiB     0.016 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       3.54 MiB     1.217 %

$ balooctl monitor
Press ctrl+c to stop monitoring
File indexer is running
Idle
```
Comment 1 tagwerk19 2023-06-01 19:49:07 UTC
Can you perhaps check whether baloo is being throttled by its systemd service settings? There has been a recent change to these.

Arch uses systemd, or at least Google thinks so, so try

    systemctl status --user kde-baloo

and see if that comes up with anything strange. It should give you the config file, likely:

    /usr/lib/systemd/user/kde-baloo.service

Check this and you will probably see a recently added line:

    MemoryHigh=512M

This limits the amount of memory that baloo can grab. It could be that this is too tight; the systemctl status output will show whether you are up at the limit (it gives "high" and "available" values for memory). You can try changing the line to:

    MemoryHigh=50%

and see if that makes any difference.
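
As a rough sketch of that check (not part of the original comment), the same numbers can be read without eyeballing the status output. This assumes the unit is named kde-baloo as above and that your systemd exposes the `MemoryCurrent` and `MemoryHigh` properties through `systemctl show`. The function names `memory_high_state` and `check_baloo_memory` are made up for illustration.

```shell
# Hypothetical helper: compare a unit's current memory use with its MemoryHigh.
# Both arguments are byte counts as printed by `systemctl show`;
# MemoryHigh may also be the literal string "infinity".
memory_high_state() {
    current=$1
    high=$2
    # No threshold configured: nothing to exceed.
    if [ "$high" = "infinity" ]; then
        echo "ok"
        return 0
    fi
    # Anything that is not a plain byte count (e.g. "[not set]") cannot be compared.
    case $current in ''|*[!0-9]*) echo "unknown"; return 0 ;; esac
    case $high in ''|*[!0-9]*) echo "unknown"; return 0 ;; esac
    if [ "$current" -gt "$high" ]; then
        echo "over"
    else
        echo "ok"
    fi
}

# Hypothetical wrapper: query the user unit and print one summary line.
check_baloo_memory() {
    current=$(systemctl --user show kde-baloo --property=MemoryCurrent --value)
    high=$(systemctl --user show kde-baloo --property=MemoryHigh --value)
    echo "current=$current high=$high state=$(memory_high_state "$current" "$high")"
}
```

Running `check_baloo_memory` as the logged-in user should report "over" when baloo is sitting above its threshold. MemoryHigh is a throttling threshold rather than a hard cap, so usage above it is possible.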
Comment 2 Ovear 2023-06-01 20:50:00 UTC
Hi,

Oh, changing MemoryHigh works for me. According to the status output from systemd, baloo was hitting the memory limit set by systemd.

> Memory: 600.6M (high: 512.0M available: 0B)

Is it possible to output a warning when baloo fails to get more memory? Both systemd and baloo stay silent when it happens.

Or maybe adjusting the default memory limit is an option? I am not sure what the minimum amount of memory is that keeps Baloo from freezing. 512M may be a bit too little.

But I also noticed Baloo doesn't seem to release memory after indexing.
Except that, is possible to find what file / reason it wants to index that caused those high memory demand.

> Memory: 8.1G (high: 31.2G available: 23.0G)

Anyway, thanks for your help.
I will try to add this to the troubleshooting section of Baloo's ArchWiki page.

To those who may be affected by this, here is how to increase the systemd memory limit.

1. Use systemd's edit function to override the default limit (run as your normal user, since this is a user unit)
> $ systemctl --user edit kde-baloo

2. Modify the file to something like the below.

```
### Anything between here and the comment below will become the contents of the drop-in file

[Service]
MemoryHigh=50%

### Edits below this comment will be discarded
```

3. Restart kde-baloo, and it should work.
> $ systemctl restart --user kde-baloo
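
For scripted setups, a non-interactive variant of steps 1 and 2 might look like the sketch below. It assumes the usual per-user drop-in location that `systemctl --user edit` writes to (`~/.config/systemd/user/kde-baloo.service.d/override.conf`). The function name `write_baloo_override` and its optional second argument are invented for this example.

```shell
# Hypothetical helper: write a drop-in that overrides MemoryHigh for kde-baloo.
#   $1: the new limit (defaults to 50%)
#   $2: base directory for user units (defaults to the per-user systemd config dir)
# Prints the path of the file it wrote.
write_baloo_override() {
    limit=${1:-50%}
    base=${2:-${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user}
    dropin_dir=$base/kde-baloo.service.d
    mkdir -p "$dropin_dir" || return 1
    # The limit is passed as a printf argument, so a literal "%" in it is safe.
    printf '[Service]\nMemoryHigh=%s\n' "$limit" > "$dropin_dir/override.conf" || return 1
    echo "$dropin_dir/override.conf"
}
```

After writing the file, `systemctl --user daemon-reload` followed by the restart in step 3 should pick up the new limit.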
Comment 3 Ovear 2023-06-01 20:57:05 UTC
Sorry for the typo.

> Except that, is that possible to find what file / reason it wants to index that caused those high memory demand?
Comment 4 tagwerk19 2023-06-02 05:59:10 UTC
> ... changing MemoryHigh works for me. According to the status output from systemd,
> baloo was hitting the memory limit set by systemd ...
That was a lucky guess then :-)

> Is it possible to output a warning when baloo fails to get more memory?
> Both systemd and baloo stay silent when it happens.
From what I have read, baloo doesn't get an immediate failure; it just comes under pressure to release memory. The problem is that in this case baloo does need the memory (or thinks it does)...

> But I also noticed Baloo doesn't seem to release memory after indexing.
> Except that, is possible to find what file / reason it wants to index that
> caused those high memory demand.
My working assumption has been that you are seeing the memory used for "cached" info, the memory-mapped parts of the database. In the simple case, that's no problem: when there's memory pressure, the memory will be released (and the info read back from the disc again when called for). It's more complicated if you want to write a transaction to disc, because the memory will be flagged dirty and must be held in memory until committed. That's the way, I think, that LMDB works.

What happens though is that, when baloo does its initial sweep through the filesystems to build a list of what's changed and what's new, this information is handled as a single transaction. That's an "Ouch" when you've got a very large number of changes. There's a wonderful, eye watering example in Bug 394759 (scroll down to the 13th comment). Your note that you'd indexed over a million files was a clue...

I'd say the

    MemoryHigh=50%

is reasonable, although it's a juggling act. You want to give baloo enough space to do its initial indexing without impacting the rest of the system.

The proper solution here would be for baloo to split up the work in its "first pass" and commit every 15 or 30 seconds rather than build up a single large transaction.
Comment 5 tagwerk19 2023-06-02 23:38:25 UTC
(In reply to tagwerk19 from comment #4)
> ... There's a wonderful, eye watering
> example in Bug 394759 (scroll down to the 13th comment) ...
Sorry, that should be:
    https://bugs.kde.org/show_bug.cgi?id=394750#c13

The change that introduced the limit is here:
   https://invent.kde.org/frameworks/baloo/-/merge_requests/124