Bug 470382

Summary: Baloo unable to index any files
Product: [Frameworks and Libraries] frameworks-baloo Reporter: Ovear <ovearj>
Component: balooctl Assignee: baloo-bugs-null
Status: RESOLVED WORKSFORME    
Severity: major CC: ben, nate, tagwerk19
Priority: NOR    
Version: 5.106.0   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=470665
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Ovear 2023-05-29 01:09:07 UTC
STEPS TO REPRODUCE
1. Search new files
2. Use balooctl to index those unindexed files

OBSERVED RESULT

Manually indexing with `balooctl index` gets no response, no matter which file is used (including files that do not exist).

Baloo also stopped indexing files days ago; even after running `balooctl check`, it still does not work.

Besides that, `balooctl status` takes a long time to return a result.

EXPECTED RESULT

Baloo can index new files normally.


SOFTWARE/OS VERSIONS

Operating System: EndeavourOS 
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.106.0
Qt Version: 5.15.9
Kernel Version: 6.3.4-arch1-1 (64-bit)
Graphics Platform: Wayland

ADDITIONAL INFORMATION

Nothing related was found in journalctl. inotify was checked with inotifywait and confirmed to work.

```
$ time balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 1,192,757
Files waiting for content indexing: 1,554
Files failed to index: 0
Current size of index is 5.78 GiB

real    0m50.142s
user    0m0.010s
sys     0m0.007s

$ balooctl indexSize
File Size: 5.78 GiB
Used:      290.39 MiB

           PostingDB:       1.47 GiB   517.279 %
          PositionDB:       1.92 GiB   677.729 %
            DocTerms:     666.52 MiB   229.526 %
    DocFilenameTerms:      77.92 MiB    26.832 %
       DocXattrTerms:       4.00 KiB     0.001 %
              IdTree:      20.74 MiB     7.143 %
          IdFileName:      86.97 MiB    29.950 %
             DocTime:      47.92 MiB    16.503 %
             DocData:      12.54 MiB     4.317 %
   ContentIndexingDB:      48.00 KiB     0.016 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       3.54 MiB     1.217 %

$ balooctl monitor
Press ctrl+c to stop monitoring
File indexer is running
Idle
```
Comment 1 tagwerk19 2023-06-01 19:49:07 UTC
Can you perhaps check whether baloo is being throttled by its systemd service settings? There has been a recent change to these.

Arch uses systemd, or at least Google thinks so, so try

    systemctl status --user kde-baloo

and see if that comes up with anything strange. It should give you the config file, likely:

    /usr/lib/systemd/user/kde-baloo.service

Check this and you will probably see a recently added line:

    MemoryHigh=512M

This limits the amount of memory that baloo can grab. It could be that this is too tight; you'll see whether you are up at the limit in the systemctl status output (it gives "high" and "available" values for memory). You can try changing the line to:

    MemoryHigh=50%

and see if that makes any difference.
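
The same check can be scripted. This is only a minimal sketch: it assumes the unit is called kde-baloo.service, as in the commands above, and that `systemctl show` reports `MemoryCurrent` and `MemoryHigh` in bytes (or `infinity` when no limit is set). The function names are made up for illustration.

```shell
#!/bin/sh
# memory_state CURRENT HIGH (hypothetical helper)
# Prints "no-limit", "unknown", "at-limit" or "below-limit".
memory_state() {
    current=$1
    high=$2
    case $high in
        infinity|'') echo "no-limit"; return ;;
        *[!0-9]*) echo "unknown"; return ;;
    esac
    case $current in
        ''|*[!0-9]*) echo "unknown"; return ;;   # e.g. "[not set]" when the unit is stopped
    esac
    if [ "$current" -ge "$high" ]; then
        echo "at-limit"
    else
        echo "below-limit"
    fi
}

# baloo_memory_state (hypothetical helper): read both values from systemd.
baloo_memory_state() {
    cur=$(systemctl --user show kde-baloo.service -p MemoryCurrent --value)
    lim=$(systemctl --user show kde-baloo.service -p MemoryHigh --value)
    memory_state "$cur" "$lim"
}
```

Running `baloo_memory_state` on the affected machine should print `at-limit` if baloo is being held back by the setting.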
Comment 2 Ovear 2023-06-01 20:50:00 UTC
Hi,

Oh, changing MemoryHigh works for me. According to the status output from systemd, baloo was hitting the memory limit set by systemd.

> Memory: 600.6M (high: 512.0M available: 0B)

Is it possible to output a warning when baloo fails to get more memory? Both systemd and baloo stay silent when it happens.

Or maybe adjusting the default memory limit is an option? I am not sure what the minimum amount of memory is that keeps Baloo from freezing. 512M may be a bit insufficient.

But I also noticed Baloo doesn't seem to release memory after indexing.
Except that, is possible to find what file / reason it wants to index that caused those high memory demand.

> Memory: 8.1G (high: 31.2G available: 23.0G)

Anyway, Thanks for your help. 
I will try to add this to the troubleshooting section of Baloo's ArchWiki page.

To those who may be affected by this, here is the process to increase the systemd memory limit.

1. Use systemd's edit function to override the default limit (run as your normal user, since this is a user unit)
> $ systemctl --user edit kde-baloo

2. You may modify the file to something like below.

```
### Anything between here and the comment below will become the contents of the drop-in file

[Service]
MemoryHigh=50%

### Edits below this comment will be discarded
```

3. Restart kde-baloo, and it should work.
> $ systemctl restart --user kde-baloo
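
The same three steps can be done without the interactive editor. This is a sketch under assumptions: it writes the drop-in where `systemctl --user edit` normally puts it (`~/.config/systemd/user/kde-baloo.service.d/override.conf`, or under `$XDG_CONFIG_HOME` if that is set), and the function names are hypothetical.

```shell
#!/bin/sh
# write_override DIR VALUE (hypothetical helper)
# Creates DIR/override.conf containing a [Service] section with MemoryHigh=VALUE.
write_override() {
    dir=$1
    value=$2
    mkdir -p "$dir" || return 1
    printf '[Service]\nMemoryHigh=%s\n' "$value" > "$dir/override.conf"
}

# apply_baloo_override VALUE (hypothetical helper)
# Writes the drop-in for the user unit, reloads systemd and restarts baloo.
apply_baloo_override() {
    write_override "${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user/kde-baloo.service.d" "$1" &&
        systemctl --user daemon-reload &&
        systemctl --user restart kde-baloo
}
```

For example, `apply_baloo_override 50%` should give the same result as the manual steps above. Deleting the override.conf file and restarting the unit goes back to the packaged default.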
Comment 3 Ovear 2023-06-01 20:57:05 UTC
Sorry for the typo.

> Except that, is that possible to find what file / reason it wants to index that caused those high memory demand?
Comment 4 tagwerk19 2023-06-02 05:59:10 UTC
> ... changing MemoryHigh works for me. According to the status output from systemd,
> baloo was hitting the memory limit set by systemd ...
That was a lucky guess then :-)

> Is it possible to output a warning when baloo fails to get more memory?
> Both systemd and baloo stay silent when it happens.
From what I have read, baloo doesn't get an immediate failure; it just comes under pressure to release memory. The problem is that in this case baloo does need the memory (or thinks it does)...

> But I also noticed Baloo doesn't seem to release memory after indexed.
> Except that, is possible to find what file / reason it wants to index that
> caused those high memory demand.
My working assumption has been that you are seeing the memory used for "cached" info, the memory-mapped bits of the database. In the simple case that's no problem: when there's memory pressure, the memory will be released (and the info read back from the disc again when called for). It's more complicated if you want to write a transaction to disc: the memory will be flagged dirty and must be held in memory until committed. That's the way, I think, that LMDB works.

What happens though is that, when baloo does its initial sweep through the filesystems to build a list of what's changed and what's new, this information is handled as a single transaction. That's an "Ouch" when you've got a very large number of changes. There's a wonderful, eye watering example in Bug 394759 (scroll down to the 13th comment). Your note that you'd indexed over a million files was a clue...

I'd say the

    MemoryHigh=50%

is reasonable, although it's a juggling act. You want to allow baloo enough space to do its initial indexing, but it shouldn't impact the rest of the system.

The proper solution here would be for baloo to split up the work in its "first pass" and commit every 15 or 30 seconds rather than build up a single large transaction.
Comment 5 tagwerk19 2023-06-02 23:38:25 UTC
(In reply to tagwerk19 from comment #4)
> ... There's a wonderful, eye watering
> example in Bug 394759 (scroll down to the 13th comment) ...
Sorry that should be:
    https://bugs.kde.org/show_bug.cgi?id=394750#c13

The change that introduced the limit is here:
   https://invent.kde.org/frameworks/baloo/-/merge_requests/124
Comment 6 tagwerk19 2024-07-01 07:30:08 UTC
(In reply to tagwerk19 from comment #4)
> ... The proper solution here would be for baloo to split up the work in its
> "first pass" and commit every 15 or 30 seconds rather than build up a single
> large transaction ...
That was done - a thank you to Stefan....
    https://invent.kde.org/frameworks/baloo/-/merge_requests/148
There wasn't a link between the change and this issue, though, so the info didn't get through.

OK to close this now?
Comment 7 Ovear 2024-07-07 18:48:42 UTC
Hi,

Thanks for the update! I hadn't noticed that this problem was fixed. I'll remove the overridden settings to see if kde-baloo works as expected.

For now, here's a brief update on the current memory usage, which I'm not quite sure how to interpret.

```
● kde-baloo.service - Baloo File Indexer Daemon
     Loaded: loaded (/usr/lib/systemd/user/kde-baloo.service; disabled; preset: enabled)
    Drop-In: /home/REDACTED/.config/system/user/kde-baloo.service.d
             └─override.conf
     Active: active (running) since Sat 2024-06-29 15:56:04 CST; 1 week 1 day ago
   Main PID: 3606 (baloo_file)
      Tasks: 2 (limit: 76867)
     Memory: 12.0G (high: 31.2G available: 5.2G peak: 13.2G) ----> Limit to 50% by override.conf
        CPU: 7min 39.112s
     CGroup: /user.slice/user-1000.slice/user@1000.service/background.slice/kde-baloo.service
             └─3606 /usr/lib/kf6/baloo_file
```
Comment 8 tagwerk19 2024-07-07 21:44:03 UTC
(In reply to Ovear from comment #7)
> Memory: 12.0G (high: 31.2G available: 5.2G peak: 13.2G) ----> Limit to 50% by override.conf :-/

Maybe the rest of the system is not asking for memory so Baloo is not being pushed to release it. I think I'd stepwise reduce the MemoryHigh value rather than removing the override. I'd be interested in what you discover.
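
One way to do the stepwise reduction, as a rough sketch only. Halving each time is an arbitrary choice made for illustration, the function name is hypothetical, and the sketch assumes `systemctl --user set-property --runtime` is acceptable for trying a value temporarily (a runtime setting does not survive a reboot).

```shell
#!/bin/sh
# halving_steps START_MIB FLOOR_MIB (hypothetical helper)
# Prints MemoryHigh candidates in MiB, halving from START_MIB down to FLOOR_MIB.
halving_steps() {
    value=$1
    floor=$2
    while [ "$value" -ge "$floor" ]; do
        echo "${value}M"
        value=$((value / 2))
    done
}

# Trying one candidate by hand, then watching the unit for a day or so:
#   systemctl --user set-property --runtime kde-baloo.service MemoryHigh=4096M
#   systemctl --user status kde-baloo
```

`halving_steps 4096 512` lists 4096M, 2048M, 1024M and 512M, one per line. The first value at which `balooctl6 status` starts to hang again gives a rough idea of what this particular index needs.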
Comment 9 Ovear 2024-07-10 10:52:12 UTC
Hi,

I have done some tests of this situation.

> I think I'd stepwise reduce the MemoryHigh value rather than removing the override.
Thanks for your suggestion; you predicted exactly what happened next.

First, the good news: baloo works with the updated version. It can initialize, load the index and is operational for both indexing and searching.

```
$ LANG=en_US.UTF8 balooctl6 status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 700,557
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 12.17 GiB

● kde-baloo.service - Baloo File Indexer Daemon
     Loaded: loaded (/usr/lib/systemd/user/kde-baloo.service; disabled; preset: enabled)
    Drop-In: /home/REDACTED/.config/system/user/kde-baloo.service.d
             └─override.conf
     Active: active (running) since Wed 2024-07-10 18:20:26 CST; 50s ago
    Process: 1441235 ExecCondition=/usr/bin/kde-systemd-start-condition --condition baloofilerc:Basic Settings:Indexing-Enabled:true (code=exited, status=0/SUCCESS)
   Main PID: 1441238 (baloo_file)
      Tasks: 3 (limit: 76867)
     Memory: 222.6M (high: 512.0M available: 289.3M peak: 223.5M)
        CPU: 20.640s
     CGroup: /user.slice/user-1000.slice/user@1000.service/background.slice/kde-baloo.service
             └─1441238 /usr/lib/kf6/baloo_file

```

However, there's also some bad news. About a day after I reverted my changes to systemd service files, baloo got stuck again.

`balooctl6 status` hung and didn't respond.

Upon a quick diagnosis, I found that baloo hit the memory limit again.

```
$ balooctl6 -v
baloo 6.2.0

$ ps -e -o pid,comm,pmem,rss|grep baloo
  45489 baloorunner      0.5 363196
1144015 baloo_file       1.5 984736
1322906 baloo_file_extr  1.4 922908

$ systemctl status --user kde-baloo.service 
● kde-baloo.service - Baloo File Indexer Daemon
     Loaded: loaded (/usr/lib/systemd/user/kde-baloo.service; disabled; preset: enabled)
     Active: active (running) since Mon 2024-07-08 02:47:52 CST; 2 days ago
    Process: 1144013 ExecCondition=/usr/bin/kde-systemd-start-condition --condition baloofilerc:Basic Settings:Indexing-Enabled:true (code=exited, status=0/SUCCESS)
   Main PID: 1144015 (baloo_file)
      Tasks: 5 (limit: 76867)
     Memory: 584.4M (high: 512.0M available: 0B peak: 584.9M)
        CPU: 3min 49.366s
     CGroup: /user.slice/user-1000.slice/user@1000.service/background.slice/kde-baloo.service
             ├─1144015 /usr/lib/kf6/baloo_file
             └─1322906 /usr/lib/kf6/baloo_file_extractor
```

After changing MemoryHigh to 50%, baloo quickly consumed 3 GB of memory and resumed operation.

```
Memory: 1.3G (high: 31.2G available: 5.7G peak: 3.0G)
```

I assumed this might be due to baloo indexing some files that require more memory. I am trying to isolate and reproduce this issue and will update once I make progress.

PS: I have tried creating small text files, and baloo works with the default MemoryHigh settings currently. Any advice or hints to isolate this problem would be greatly welcomed.
Comment 10 tagwerk19 2024-07-11 05:18:22 UTC
(In reply to Ovear from comment #9)
>      Memory: 584.4M (high: 512.0M available: 0B peak: 584.9M)
You can see Baloo is pushing at the limit; the system has given it a bit of leeway but it is not going to carry on doing so...
>      Memory: 1.3G (high: 31.2G available: 5.7G peak: 3.0G)
... and 512M really wasn't enough 8-/
    
I'm chasing up an issue elsewhere where it seems that if you have a large index, a "large delete" requires a disproportionate amount of memory. See https://bugs.kde.org/show_bug.cgi?id=380456#c29 and the following discussion.

> I assumed this might be due to baloo indexing some files that require more
> memory. I am trying to isolate and reproduce this issue and will update once
> I make progress.
No doubt that's possible; there are things like .mbox files (GBs of stored emails in a single file, with encoded attachments 8-)

> PS: I have tried creating small text files, and baloo works with the default
> MemoryHigh settings currently. Any advice or hints to isolate this problem
> would be greatly welcomed.
Checking whether it's baloo_file or baloo_file_extractor that takes the memory would be an initial hint (as discovered in 380456 above)
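
A small sketch for that check, assuming procps-style `ps` output with three columns (pid, command name, RSS in KiB). The function name is hypothetical. Note that `ps` truncates the command name, so baloo_file_extractor shows up as baloo_file_extr.

```shell
#!/bin/sh
# largest_baloo_process (hypothetical helper)
# Reads "pid comm rss" lines on stdin and prints the baloo process with the
# largest resident set size as "pid comm rss".
largest_baloo_process() {
    awk '$2 ~ /^baloo/ && $3 + 0 > max { max = $3 + 0; name = $2; pid = $1 }
         END { if (name != "") printf "%s %s %d\n", pid, name, max }'
}

# Usage against the live system (the trailing "=" suppresses the column headers):
#   ps -e -o pid= -o comm= -o rss= | largest_baloo_process
```

Running this every few minutes while the index is being updated should show whether the growth is in baloo_file or in the extractor.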
Comment 11 Bug Janitor Service 2024-07-26 03:46:11 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 12 Bug Janitor Service 2024-08-10 03:46:45 UTC
🐛🧹 This bug has been in NEEDSINFO status with no change for at least 30 days. Closing as RESOLVED WORKSFORME.