Bug 500665

Summary:	Baloo File Extractor uses 25-40% CPU and 3 GB of RAM while indexing an HDD and slows down the whole system
Product:	[Frameworks and Libraries] frameworks-baloo	Reporter:	postix <postix>
Component:	Baloo File Daemon	Assignee:	baloo-bugs-null
Status:	REPORTED ---
Severity:	normal	CC:	oded, sergio.callegari, slava0135, tagwerk19
Priority:	NOR	Keywords:	efficiency-and-performance
Version First Reported In:	6.11.0
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:
	Screenshot of htop
	Flamegraph for baloo_file_extractor
	Screenshot mdb PostingDB get: Caller/Callee
	Screenshot mdb page_search_root: Caller/Callee
	Screenshot PositionDB caller
	Screenshot of heaptrack summary: Recorded shortly after starting baloo_file_extractor
	Screenshot of heaptrack summary: Recorded a few minutes after starting baloo_file_extractor
	Screenshot of heaptrack summary: Recorded 25 min from start
	Screenshot of heaptrack leak flamegraph: Recorded over 25 mins from start
	Screenshot of heaptrack memory peak flamegraph: Recorded over 25 mins from start
	Screenshot of heaptrack bottom up memory leak 1/2: Recorded over 25 mins from start
	Screenshot of heaptrack bottom up memory leak 2/2: Recorded over 25 mins from start
	heaptrack: consumed memory over time (25 min recording)
	heaptrack: allocations over time (25 min recording)
	heaptrack: temp allocations over time (25 min recording)
	heaptrack: req allocation sizes (25 min recording)
	heaptrack: temp allocations over time (25 min recording)

Description postix 2025-02-24 13:28:08 UTC

Created attachment 178803 [details]
Screenshot of htop

STEPS TO REPRODUCE
1. Have two HDDs of a few TB each, formatted with NTFS and mounted under /media/foo/bar{1,2}
2. Add the mount points to the list of paths to be indexed
3. Index file names and file contents
4. Enable baloo

OBSERVED RESULT
baloo_runner starts baloo_file, which starts baloo_file_extractor

Baloo File Extractor uses 25-40% CPU and 3 GB of RAM while indexing files on the HDD.

Worse, it can slow down the whole system, making it lag and unpleasant to work with. The issues disappear immediately when baloo_file_extractor is killed.

`balooctl disable` does not stop the process right away; it takes a few minutes.

I have therefore disabled baloo entirely with `balooctl disable` for now.

EXPECTED RESULT
Baloo File Extractor never significantly slows down the system.

SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20250222
KDE Plasma Version: 6.3.1
KDE Frameworks Version: 6.11.0
Qt Version: 6.8.2
Kernel Version: 6.13.3-1-default (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen 9 5900X 12-Core Processor
Memory: 31.2 GiB of RAM
lmdb: 0.9.30-2.3

System is installed on an NVMe using BTRFS
HDDs are two WD RED 7200 RPM with a few TB and NTFS

Comment 1 postix 2025-02-24 13:31:21 UTC

Created attachment 178804 [details]
Flamegraph for baloo_file_extractor

Recorded over 2 min and 20 seconds. 

1.438E+11 (100%) aggregated cycles costs in Baloo::WriteTransaction::commit() (libKF5BalooEngine.so.5.116.0) and below.
v
5.122E+10 (35.6%) aggregated cycles costs in Baloo::PositionDB::get(QByteArray const&) (libKF5BalooEngine.so.5.116.0) and below.
8.215E+10 (57.1%) aggregated cycles costs in Baloo::PostingDB::get(QByteArray const&) (libKF5BalooEngine.so.5.116.0) and below.
v
4.809E+10 (33.4%) aggregated cycles costs in mdb_get (liblmdb-0.9.30.so) and below.
8.138E+10 (56.6%) aggregated cycles costs in mdb_get (liblmdb-0.9.30.so) and below.
v
4.808E+10 (33.4%) aggregated cycles costs in mdb_cursor_set (liblmdb-0.9.30.so) and below.
8.138E+10 (56.6%) aggregated cycles costs in mdb_cursor_set (liblmdb-0.9.30.so) and below.
v
4.787E+10 (33.3%) aggregated cycles costs in mdb_page_search_root (liblmdb-0.9.30.so) and below.
8.111E+10 (56.4%) aggregated cycles costs in mdb_page_search_root (liblmdb-0.9.30.so) and below.
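
To put the numbers above into one figure, the cycle counts can be related to the commit() total. This is only a sketch of the arithmetic in Python; it assumes the two call paths (via PositionDB::get and via PostingDB::get) do not overlap, which the flamegraph suggests but which I have not verified. The variable and function names are made up for illustration.

```python
# Cycle counts copied from the flamegraph summary above.
commit_total = 1.438e11  # Baloo::WriteTransaction::commit() and below

# mdb_page_search_root and below, split by the Baloo caller it was reached from.
search_root = {
    "PositionDB::get": 4.787e10,
    "PostingDB::get": 8.111e10,
}


def share_of_commit(cycles: float) -> float:
    """Return cycles as a percentage of the commit() total."""
    return 100.0 * cycles / commit_total


for caller, cycles in search_root.items():
    print(f"{caller}: {share_of_commit(cycles):.1f}%")

# Assumption: the two paths are disjoint, so their costs can be added.
combined = share_of_commit(sum(search_root.values()))
print(f"combined: {combined:.1f}%")
```

Under that assumption, roughly nine tenths of the cycles spent in commit() end up in mdb_page_search_root, i.e. in B-tree lookups.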

Comment 2 postix 2025-02-24 13:32:30 UTC

Created attachment 178805 [details]
Screenshot mdb PostingDB get: Caller/Callee

Comment 3 postix 2025-02-24 13:39:32 UTC

Created attachment 178806 [details]
Screenshot mdb page_search_root: Caller/Callee

The costly line in code is the following: 
https://github.com/LMDB/lmdb/blob/ce201088de95d26fc0da36ba805bf2ddc2ba74ff/libraries/liblmdb/mdb.c#L5530

> while (IS_BRANCH(mp)) {

Comment 4 postix 2025-02-24 13:44:10 UTC

Created attachment 178807 [details]
Screenshot PositionDB caller

The costly line is https://invent.kde.org/frameworks/baloo/-/blob/master/src/engine/positiondb.cpp?ref_type=heads#L83
> int rc = mdb_get(m_txn, m_dbi, &key, &val);

Comment 5 postix 2025-02-24 13:45:10 UTC

To comment 2: The costly line is https://invent.kde.org/frameworks/baloo/-/blob/master/src/engine/postingdb.cpp?ref_type=heads#L81
> int rc = mdb_get(m_txn, m_dbi, &key, &val);

Comment 6 postix 2025-02-24 14:47:50 UTC

Regarding https://bugs.kde.org/show_bug.cgi?id=334325#c3

> cat /sys/block/sd{a,b}/queue/scheduler 
> none mq-deadline kyber [bfq]
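
The kernel marks the active I/O scheduler with square brackets in that file, so bfq is in use here. If this needs to be checked for several disks from a script, a small Python sketch could look like the following; the function name is made up for illustration.

```python
import re
from typing import Optional


def active_scheduler(line: str) -> Optional[str]:
    """Return the scheduler name shown in square brackets, or None."""
    match = re.search(r"\[([^\]]+)\]", line)
    return match.group(1) if match else None


# Line copied from the output quoted above.
print(active_scheduler("none mq-deadline kyber [bfq]"))
```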

Comment 7 postix 2025-02-24 15:10:47 UTC

https://invent.kde.org/frameworks/baloo/-/issues/7
> With the PID, run chrt -p <PID> to check the current policy.
```
chrt -p (pidof baloo_file)
pid 21116's current scheduling policy: SCHED_BATCH
pid 21116's current scheduling priority: 0
```

> CFS is represented by SCHED_NORMAL. - Policy Adjustment (if necessary):
```
sudo chrt -o -p 0 (pidof baloo_file)
chrt -p (pidof baloo_file)
pid 21116's current scheduling policy: SCHED_OTHER
pid 21116's current scheduling priority: 0
```

```
chrt -p (pidof baloo_file_extractor)
pid 21300's current scheduling policy: SCHED_IDLE
pid 21300's current scheduling priority: 0
```

At this point it still lags most of the time! 

----------------------------------------------------------------------------------

https://doc.opensuse.org/documentation/leap/archive/42.3/tuning/html/book.sle.tuning/cha.tuning.taskscheduler.html

```
Get policy:
 chrt [options] -p <pid>

Policy options:
 -b, --batch          set policy to SCHED_BATCH
 -d, --deadline       set policy to SCHED_DEADLINE
 -f, --fifo           set policy to SCHED_FIFO
 -i, --idle           set policy to SCHED_IDLE
 -o, --other          set policy to SCHED_OTHER
 -r, --rr             set policy to SCHED_RR (default)
```

```
chrt -m
SCHED_OTHER min/max priority    : 0/0
SCHED_FIFO min/max priority     : 1/99
SCHED_RR min/max priority       : 1/99
SCHED_BATCH min/max priority    : 0/0
SCHED_IDLE min/max priority     : 0/0
SCHED_DEADLINE min/max priority : 0/0
```

Comment 8 postix 2025-02-24 15:13:04 UTC

^ 

:facepalm: 
> Was this written by ChatGPT?
https://invent.kde.org/frameworks/baloo/-/issues/7#note_1078251

Comment 9 Alberto Salvia Novella 2025-02-24 15:59:36 UTC

I wouldn't use that bug report for reference. Nothing written there by the AI makes sense.

It doesn't make sense to change the scheduler, restrict cores, or whatever.

No matter how smart an AI is, it has a big limitation: it is not context aware.

It can't really check whether what it is doing is actually true. Observing the real thing is the greatest source of wisdom.

Comment 10 postix 2025-02-24 16:33:16 UTC

Created attachment 178814 [details]
Screenshot of heaptrack summary: Recorded shortly after starting baloo_file_extractor

`balooctl enable`
Once the baloo_file_extractor process came up, I started recording with
`heaptrack -p (pidof baloo_file_extractor)`
for more than a minute, then stopped it with Ctrl+C. You can see the summary above.

Interestingly it says all memory - 1.1 GB - is leaked. Maybe I'm using the tool wrong?

Comment 11 postix 2025-02-24 16:39:35 UTC

Created attachment 178815 [details]
Screenshot of heaptrack summary: Recorded a few minutes after starting baloo_file_extractor

Recorded again a few minutes later while baloo_file_extractor was running all the time. Now it reports ~ 100 MB memory leaked. I'm not sure how to correctly interpret it.

Comment 12 postix 2025-02-24 16:40:30 UTC

> I  won't use that bug report for reference. Nothing written there by the AI makes sense.
No, I won't. I had blindly expected to find only useful information on invent and didn't expect a troll. How embarrassing.

Comment 13 postix 2025-02-24 17:24:22 UTC

Created attachment 178820 [details]
Screenshot of heaptrack summary: Recorded 25 min from start

I've started the recording right after baloo_file_extractor came up and stopped recording 25 minutes later by ctrl+c.

I've tried to stop baloo with `balooctl disable` and `balooctl suspend`; however, the baloo_file_extractor process is still running, now 40 minutes after it started.

Comment 14 postix 2025-02-24 17:25:09 UTC

Created attachment 178821 [details]
Screenshot of heaptrack leak flamegraph: Recorded over 25 mins from start

Comment 15 postix 2025-02-24 17:25:33 UTC

Created attachment 178822 [details]
Screenshot of heaptrack memory peak flamegraph: Recorded over 25 mins from start

Comment 16 postix 2025-02-24 17:25:57 UTC

Created attachment 178823 [details]
Screenshot of heaptrack bottom up memory leak 1/2: Recorded over 25 mins from start

Comment 17 postix 2025-02-24 17:26:13 UTC

Created attachment 178824 [details]
Screenshot of heaptrack bottom up memory leak 2/2: Recorded over 25 mins from start

Comment 18 postix 2025-02-24 17:29:38 UTC

I don't understand why anything is swapped at all while the system has sufficient memory left.
Swap is currently 3.8 GB, peak RSS was 4.2 GB, the system has 32 GB installed.
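
To attribute resident and swapped memory to one process rather than to the whole system, the VmRSS and VmSwap fields of /proc/<pid>/status can be read. The following Python sketch only shows the parsing; the sample text uses made-up values in the same order of magnitude as above, and the function name is hypothetical.

```python
def memory_fields(status_text: str) -> dict:
    """Extract VmRSS and VmSwap (in kB) from /proc/<pid>/status content."""
    wanted = {"VmRSS", "VmSwap"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # The number comes first, followed by the "kB" unit.
            result[key] = int(rest.split()[0])
    return result


# Hypothetical sample; on a real system read /proc/<pid>/status instead.
SAMPLE = "Name:\tbaloo_file_extr\nVmRSS:\t 4200000 kB\nVmSwap:\t 3800000 kB\n"
print(memory_fields(SAMPLE))
```

If VmSwap of baloo_file_extractor stays small while system-wide swap is large, the swapped pages belong to other processes.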

Comment 19 postix 2025-02-24 17:31:41 UTC

Created attachment 178825 [details]
heaptrack: consumed memory over time (25 min recording)

Comment 20 postix 2025-02-24 17:32:00 UTC

Created attachment 178826 [details]
heaptrack: allocations over time (25 min recording)

Comment 21 postix 2025-02-24 17:32:15 UTC

Created attachment 178827 [details]
heaptrack: temp allocations over time (25 min recording)

Comment 22 postix 2025-02-24 17:32:32 UTC

Created attachment 178828 [details]
heaptrack: req allocation sizes (25 min recording)

Comment 23 postix 2025-02-24 17:33:57 UTC

Created attachment 178829 [details]
heaptrack: temp allocations over time (25 min recording)

Comment 24 postix 2025-02-24 19:05:59 UTC

I've changed the setting to index file names only. Same picture.

In htop one can observe the process constantly switching between the states R (running) and D (disk sleep).
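
The same state letter that htop shows is the third field of /proc/<pid>/stat, so it could be sampled over time instead of being watched by eye. A minimal Python sketch of the parsing, with a made-up sample line (the command name in that file is truncated to 15 characters) and a hypothetical function name:

```python
def process_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ")".
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


# Hypothetical sample line; on a real system read /proc/<pid>/stat instead.
sample = "21300 (baloo_file_extr) D 21116 21116 21116 0 -1"
print(process_state(sample))
```

Polling this, for example once per second, and counting how often "D" appears would give a rough measure of how much of the time the extractor is waiting on I/O.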

Comment 25 tagwerk19 2025-02-24 19:36:57 UTC

(In reply to postix from comment #0)
> System is installed on an NVMe using BTRFS
> HDDs are two WD RED 7200 RPM with a few TB and NTFS
Let's get a few things sorted first...

Are you using the Paragon NTFS drivers or the older FUSE ntfs-3g drivers?

If you search for one of the files on your NTFS drives with "baloosearch -i one-of-your-files.txt", possibly baloosearch6, do you get just the one hit?

Look at the details of the file with "stat one-of-your-files.txt" and note down the Device: line (it will contain the Device numbers and Inode...)

Have a look and see whether Baloo is running under systemd, specifically under the systemd unit file that limits RAM usage to 500 MB. You can see with "systemctl status --user kde-baloo" and you want to look at the "Memory:" line.

Are you watching the content indexing "as it happens" with a "balooctl monitor" in a separate window? Again, balooctl may be balooctl6.

Comment 26 postix 2025-02-24 20:07:41 UTC

> Are you using the Paragon NTFS drivers or the older FUSE ntfs-3g drivers?

mount says
/dev/sda1 on /media/foo type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096)
/dev/sdb1 on /media/bar type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096)

ntfs-3g is installed and if I see it correctly openSUSE does not offer the newer Paragon NTFS driver


> If you search for one of the files on your NTFS drives with "baloosearch -i one-of-your-files.txt", possibly baloosearch6, do you get just the one hit?

For a few files I've tried I only got a single hit with `baloosearch -i myfile`

> Have a look and see whether Baloo is running under systemd, specifically under the systemd unit file that limits RAM usage to 500 MB. You can see with "systemctl status --user kde-baloo" and you want to look at the "Memory:" line.

There's no memory line for me.

> Are you watching the content indexing "as it happens" with a "balooctl monitor" in a separate window, again balooctl may be balooctl6

Nope, I am not.

Comment 27 postix 2025-02-24 20:10:55 UTC

```
> systemctl --user status kde-baloo.service
○ kde-baloo.service - Baloo File Indexer Daemon
     Loaded: loaded (/usr/lib/systemd/user/kde-baloo.service; disabled; preset: disabled)
     Active: inactive (dead) (Result: exec-condition) since Mon 2025-02-24 20:09:04 CET; 59min ago
 Invocation: a28e72acaa4f44f2970055ad21f16e51
  Condition: start condition unmet at Mon 2025-02-24 20:09:04 CET; 59min ago
    Process: 2511 ExecCondition=/usr/bin/kde-systemd-start-condition --condition baloofilerc:Basic Settings:Indexing-Enabled:true (code=exited, status=1/FAILURE)
        CPU: 6ms

systemd[2435]: Starting Baloo File Indexer Daemon...
systemd[2435]: kde-baloo.service: Skipped due to 'exec-condition'.
systemd[2435]: Condition check resulted in Baloo File Indexer Daemon being skipped.
```

Comment 28 tagwerk19 2025-02-24 20:52:41 UTC

(In reply to postix from comment #26)
> ... ntfs-3g is installed and if I see it correctly openSUSE does not offer the newer Paragon NTFS driver ...
If I remember right, the FUSE drivers don't give "stable" device numbers (that stay the same from reboot to reboot) when they mount the drive. I'm not 100% sure whether this relates to some or all FUSE drivers 

You would see whether this is the case or not with the "stat one-of-your-files.txt" check. Keep an eye on the Device: line (it will contain the Device numbers and Inode...) and see if you get different values each time you reboot.

I think the Paragon drivers arrived in the Kernel with 5.15 so they should be there for you.

> For a few files I've tried I only got a single hit with `baloosearch -i myfile`
That's good...

It may be that Baloo is able to find the Filesystem ID, even through a FUSE mount and that is stable - as it is hoped to be.

> > Are you watching the content indexing "as it happens" with a "balooctl monitor" in a separate window, again balooctl may be balooctl6
> 
> Nope, I am not.
Might be worth looking... Normally, you would see files being listed in a batch of 40, then another batch, and another.  If you get to a file "where things stop", that's suspicious...

There are files that overload Baloo; I've met scientific plots in a PDF that required massive work to unpack and sort through for the (vanishingly small amount of) plain text. On the other end of the spectrum, if you have a multi-gigabyte .mbox file, Baloo will try to index the contents....

Comment 29 tagwerk19 2025-02-24 21:05:49 UTC

(In reply to postix from comment #27)
>  Process: 2511 ExecCondition=/usr/bin/kde-systemd-start-condition --condition baloofilerc:Basic Settings:Indexing-Enabled:true (code=exited, status=1/FAILURE)
Looks funny....

> systemd[2435]: kde-baloo.service: Skipped due to 'exec-condition'.
Is Baloo actually running? If it is, it is running outside of the systemd limits.

What does "balooctl status" say?

Do you have a section in your ~/.config/baloofilerc that looks like:

    [Basic Settings]
    Indexing-Enabled=true

    ....

Comment 30 postix 2025-02-25 12:17:58 UTC

> Is Baloo actually running? If it is, it is running outside of the systemd limits.

I had disabled baloo with `balooctl disable`, rebooted, and checked
> ~/.config/baloofilerc
it says
> Indexing-Enabled=false

Then I ran `balooctl enable`:

```
balooctl monitor
Waiting for file indexer to start
Press Ctrl+C to stop monitoring
File indexer is running
Starting
Checking for stale index entries
Indexing file content
: Ok
Indexing: /media/foo/.thunderbird/$profile/Mail/Local Folders/Lokales Archiv.sbd/foo/bar/: Ok
: Ok
: Ok
(...)
```

```
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 4,266,801
Files waiting for content indexing: 1,087,769
Files failed to index: 0
Current size of index is 57.33 GiB
```

```
tail -f ~/.config/baloofilerc 
Indexing-Enabled=false
```
doesn't change. The systemd service does not seem to be started when invoking balooctl enable.

Comment 31 postix 2025-02-25 12:24:31 UTC

So I've changed the entry in baloofilerc to meet the systemd script start condition, enabled the service and rebooted:

```
> systemctl --user status kde-baloo.service
● kde-baloo.service - Baloo File Indexer Daemon
     Loaded: loaded (/usr/lib/systemd/user/kde-baloo.service; enabled; preset: disabled)
     Active: active (running) since Tue 2025-02-25 13:22:23 CET; 14s ago
 Invocation: 0cc65a24c82f47f2ac68e505c84f9945
    Process: 2516 ExecCondition=/usr/bin/kde-systemd-start-condition --condition baloofilerc:Basic Settings:Indexing-Enabled:true (code=exited, status=0/SUCCESS)
   Main PID: 2519 (baloo_file)
      Tasks: 3 (limit: 38234)
        CPU: 328ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/background.slice/kde-baloo.service
             └─2519 /usr/libexec/baloo_file

systemd[2436]: Starting Baloo File Indexer Daemon...
systemd[2436]: Started Baloo File Indexer Daemon.
baloo_file[2519]: QDBusConnection: name 'org.freedesktop.UDisks2' had owner '' but we thought it was ':1.31'
baloo_file[2519]: QDBusConnection: name 'org.freedesktop.UPower' had owner '' but we thought it was ':1.35'
```

Comment 32 postix 2025-02-25 12:27:55 UTC

> /usr/libexec/baloo_file_extractor
has come up now and is visible in the systemd status

I see 8 lines with 
> baloo_file_extractor[3629]: Invalid encoding. Ignoring "/media/foo/bar/"

balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 4,266,801
Files waiting for content indexing: 1,087,769
Files failed to index: 0
Current size of index is 57.33 GiB

Comment 33 postix 2025-02-25 12:29:11 UTC

> I see 8 lines with 
Sorry that I can't edit posts here. 8 lines corresponding to different files.

Comment 34 postix 2025-02-25 13:05:23 UTC

26 minutes after starting the indexing I see

> baloo_file[2519]: kf.baloo: KDE Baloo File Indexer has reached the inotify folder watch limit. File changes will be ignored.
in the systemd service status


```
balooctl monitor
Press ctrl+c to stop monitoring
File indexer is running
Indexing file content
```
gives no hint.

Comment 35 postix 2025-02-25 13:07:12 UTC

Interestingly 

> balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 4,266,801
Files waiting for content indexing: 1,087,769
Files failed to index: 0
Current size of index is 57.33 GiB


says Indexing file content, although I had changed the option to index file names only (including for hidden files and folders) in the System Settings KCM two days ago!

Comment 36 postix 2025-02-25 13:10:47 UTC

On Fedora 41 I see the Memory: line in the status output, while here on openSUSE Tumbleweed I do not, even though the service file contains a MemoryHigh entry:

```
 cat /usr/lib/systemd/user/kde-baloo.service
[Unit]
Description=Baloo File Indexer Daemon
PartOf=graphical-session.target

[Service]
ExecStart=/usr/libexec/baloo_file
BusName=org.kde.baloo
Slice=background.slice
ExecCondition=/usr/bin/kde-systemd-start-condition --condition "baloofilerc:Basic Settings:Indexing-Enabled:true"
# We'll basically only want to consume resources if they aren't needed anywhere else, hence weights are way low.
CPUWeight=1
IOWeight=1
# Memory should comfortably fit the binary itself, any extra data loaded ought to be subject to extreme constraints
# though - so as to avoid OOM conditions caused by baloo.
MemoryHigh=512M

[Install]
WantedBy=graphical-session.target
```

Comment 37 postix 2025-02-25 13:24:08 UTC

This looks actually fine to me:

```
cat ~/.config/baloofilerc
[Basic Settings]
Indexing-Enabled=true

[General]
dbVersion=2
exclude filters=*~,*.part,*.o,*.la,*.lo,*.loT,*.moc,moc_*.cpp,qrc_*.cpp,ui_*.h,cmake_install.cmake,CMakeCache.txt,CTestTestfile.cmake,libtool,config.status,confdefs.h,autom4te,conftest,confstat,Makefile.am,*.gcode,.ninja_deps,.ninja_log,build.ninja,*.csproj,*.m4,*.rej,*.gmo,*.pc,*.omf,*.aux,*.tmp,*.po,*.vm*,*.nvram,*.rcore,*.swp,*.swap,lzo,litmain.sh,*.orig,.histfile.*,.xsession-errors*,*.map,*.so,*.a,*.db,*.qrc,*.ini,*.init,*.img,*.vdi,*.vbox*,vbox.log,*.qcow2,*.vmdk,*.vhd,*.vhdx,*.sql,*.sql.gz,*.ytdl,*.class,*.pyc,*.pyo,*.elc,*.qmlc,*.jsc,*.fastq,*.fq,*.gb,*.fasta,*.fna,*.gbff,*.faa,po,CVS,.svn,.git,_darcs,.bzr,.hg,CMakeFiles,CMakeTmp,CMakeTmpQmake,.moc,.obj,.pch,.uic,.npm,.yarn,.yarn-cache,__pycache__,node_modules,node_packages,nbproject,core-dumps,lost+found
exclude filters version=8
exclude folders[$e]=$HOME/.cache/,$HOME/.local/share/containers/,$HOME/Development/
folders[$e]=$HOME/,/media/bar/,/media/foo/
index hidden folders=true
only basic indexing=true
```

Comment 38 postix 2025-02-25 13:47:59 UTC

After running for 1 h 22 min, with balooctl monitor open the whole time, it shows me
* 40 lines in total, out of those
* 25 starting with Indexing: /media/foo/$fileXYZ :Ok
* 15 just ": Ok"

What do these ": Ok" lines mean?

Comment 39 postix 2025-02-25 13:53:09 UTC

```
balooctl indexSize
File Size: 57.46 GiB
Used:      2.66 GiB

    PostingDB:            3.72 GiB    140.075 %
    PositionDB:         302.12 MiB     11.099 %
    DocTerms:             1.43 GiB     53.910 %
    DocFilenameTerms:   327.26 MiB     12.023 %
    DocXattrTerms:        4.00 KiB      0.000 %
    IdTree:              69.89 MiB      2.568 %
    IdFileName:         373.32 MiB     13.715 %
    DocTime:            188.30 MiB      6.918 %
    DocData:            179.86 MiB      6.608 %
    ContentIndexingDB:   34.59 MiB      1.271 %
    FailedIdsDB:               0 B      0.000 %
    MTimeDB:             62.39 MiB      2.292 %
```

3.72 / 2.66 ~ 140%. Is that correct or expected? Is there something unusual?
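
For what it is worth, the printed percentages are consistent with each database size being divided by the "Used" figure (not by "File Size"), and the individual sizes add up to considerably more than "Used". The Python sketch below just redoes that arithmetic with the values from the output above; small deviations from the printed percentages are expected because the displayed sizes are rounded. It does not explain why the sum exceeds "Used".

```python
MIB_PER_GIB = 1024
used_mib = 2.66 * MIB_PER_GIB  # "Used: 2.66 GiB"

# Sizes from the balooctl indexSize output above, converted to MiB.
sizes_mib = {
    "PostingDB": 3.72 * MIB_PER_GIB,
    "PositionDB": 302.12,
    "DocTerms": 1.43 * MIB_PER_GIB,
    "DocFilenameTerms": 327.26,
    "DocXattrTerms": 4.00 / 1024,  # 4.00 KiB
    "IdTree": 69.89,
    "IdFileName": 373.32,
    "DocTime": 188.30,
    "DocData": 179.86,
    "ContentIndexingDB": 34.59,
    "FailedIdsDB": 0.0,
    "MTimeDB": 62.39,
}


def percent_of_used(size_mib: float) -> float:
    """Size as a percentage of the 'Used' figure."""
    return 100.0 * size_mib / used_mib


for name, size in sizes_mib.items():
    print(f"{name:18} {percent_of_used(size):8.3f} %")

total_mib = sum(sizes_mib.values())
print(f"sum of databases: {total_mib / MIB_PER_GIB:.2f} GiB, "
      f"{total_mib / used_mib:.2f} x Used")
```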

Comment 40 postix 2025-02-25 13:54:17 UTC

```
balooctl config show contentIndexing
no
```

contradicting balooctl status

Comment 41 postix 2025-02-25 13:56:04 UTC

`systemctl --user stop kde-baloo.service` has stopped baloo_file and baloo_file_extractor immediately, at least this time. :)

Comment 42 postix 2025-02-25 14:37:46 UTC

I've noticed that the ntfs3 driver was blacklisted. I've now loaded it with modprobe and remounted the drives.

The high cpu/ram usage stays but the system seems to feel snappier now. Unfortunately it's hard to objectively benchmark.

Comment 43 postix 2025-02-25 14:40:35 UTC

> The high cpu/ram usage stays but the system seems to feel snappier now. Unfortunately it's hard to objectively benchmark.

but definitely not as snappy as when baloo_file_extractor is not running. It still lags a bit from time to time.

Comment 44 postix 2025-02-25 16:32:55 UTC

So, I let the system run and index for an hour: it now uses 8.3 GB of RES RAM and 100% CPU, and the fan is spinning, but at the moment it lags much less. Not sure if I should call it resolved, as it had lagged shortly after starting the indexing?

-------

> balooctl indexSize
File Size: 61.38 GiB
Used:      2.92 GiB

PostingDB:            3.79 GiB    129.721 %
PositionDB:         410.92 MiB     13.728 %
DocTerms:             1.52 GiB     52.108 %
DocFilenameTerms:   327.26 MiB     10.933 %
DocXattrTerms:        4.00 KiB      0.000 %
IdTree:              69.90 MiB      2.335 %
IdFileName:         373.34 MiB     12.472 %
DocTime:            188.30 MiB      6.291 %
DocData:            179.93 MiB      6.011 %
ContentIndexingDB:   34.50 MiB      1.153 %
FailedIdsDB:               0 B      0.000 %
MTimeDB:             62.39 MiB      2.084 %

Comment 45 Alberto Salvia Novella 2025-02-25 21:27:59 UTC

I wouldn't post all the details of your investigation.

Instead complete it, then post only the info that is relevant for someone fixing it.

Comment 46 tagwerk19 2025-02-25 22:37:38 UTC

That's a lot of information to sort through...

I'm not going to be able to get to the bottom of each of the points. There are a handful of good things, a few questions and some bits of advice though.

You are running with the ntfs3 driver now (not the ntfs-3g) and the system's working better. That is good.

Your "balooctl status" says you have 4,266,801 files, could that be true?

You have an index size of 57.33 or 61.38 GiB, which is wildly large. Your "balooctl indexSize" says 2.92 GiB actually used. That should not be...

I see on my system, if I start Baloo with a "balooctl enable" (maybe you have to give the command twice, it might not immediately start), then you are just running the binary. If you start it with "systemctl start --user kde-baloo", you are running it "within" the limits on memory; it cannot use more than 512 MB of RAM. You can check with "systemctl status --user kde-baloo" whether Baloo is running within the systemd limits.

On my system it looks as if, if you do a "balooctl enable" and then try a "systemctl start --user kde-baloo", you get that "not able to start" failure that you saw....

As a heads up... If you have such a large index you will notice performance problems if you run "balooctl enable". The systemd limits prevent that although maybe the 512MB is too small. We might come back to that later.

I think you have to take a step back, swallow, purge the index and start again.

You've got the ntfs3 kernel (rather than FUSE) drivers so you should be faster. Make sure you are running under the systemd memory limits. These should stop Baloo taking too much RAM and affecting performance. Keep a look out for multiple hits for a file, the "baloosearch -i one-of-your-files.txt", that will be a clear warning that something is wrong. Check whether you really have over 4 million files you want to index, maybe you have 2 million and they are counted twice (that would be another pointer to something wrong). It's also probably wrong to run and repeatedly rerun "balooctl status" in order to watch how the number of indexed files is changing, there's a bug about how this makes the index size grow (horribly grow). I'll see if I can find it...

I'm wondering about the iNotify limit warning, that's something that used to cause trouble but has not recently. We can come back to that later as well.

Final snippet of info, if you are getting "balooctl monitor" showing lines with a plain "OK" and no filename, don't worry. Baloo has looked at the file and it is excluded on account of its Mime type. Yes, this diagnostic could be improved...

Comment 47 tagwerk19 2025-02-26 07:56:34 UTC

> Indexing: /media/foo/.thunderbird/$profile/Mail/Local Folders/Lokales
> Archiv.sbd/foo/bar/: Ok
You are indexing hidden files/folders... It would be best to exclude .thunderbird as well as .mozilla, .cache, .local/share/Trash. Maybe also .var/apps. You might also do well to exclude application/mbox as a mime type:
    
    $ balooctl config add excludeMimetypes application/mbox

It's a subtype of text/plain so Baloo thinks it can index it - mbox files can be very large and every time they grow, Baloo wants to index them again.

Comment 48 postix 2025-02-26 11:42:33 UTC

(In reply to tagwerk19 from comment #46)
> That's a lot of information to sort through...

Unfortunately posting larger sequential updates on Bugzilla makes the whole issue not very clear, I wish I had structured my postings differently.
Anyway, a short summary of my findings:


* The role of the NTFS-3G and NTFS3 drivers is not clear yet.
  While with NTFS-3G the system responsiveness was often very bad, it seemed that with NTFS3 this was not the case.
  However, this could have been pure coincidence when testing, as with the NTFS-3G driver the system runs fine at the moment during indexing.

* CPU usage is around 100% for both drivers currently. The point above and CPU usage could be rather affected by what is currently indexed.

* RAM usage grows for both drivers over time. It seemed RES ram consumption grows faster in the case of NTFS3, but this needs to be re-checked again.

* Somehow MemoryHigh [1] seems to not be in effect, despite being defined in the systemd service file. This can be seen as the RAM usage grows above 512 MB and there's no memory line in systemctl --user status kde-baloo.service,
  while  the service is active (running). This needs to be investigated.

* A recent heaptrack showed only a marginal memory leak of 45 MB in mdb_page_malloc, which could also be a false positive. Peak heap memory consumption was 230 MB and peak RSS 2.2 GB. This time kde-baloo.service was gracefully stopped before detaching heaptrack:
  systemctl --user start kde-baloo.service
  heaptrack -p (pidof baloo_file_extractor) # wait 8 Minutes
  systemctl --user stop kde-baloo.service

[1] https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html#MemoryHigh=bytes

Comment 49 postix 2025-02-26 12:15:51 UTC

* balooctl status claims "Indexer state: Indexing file content", while it should only be indexing file names, as this is set both in the baloofilerc and in the kcm.
Likely this error comes from the fact that initially, when I decided to enable baloo and started with a fresh db after purging the old one, I had checked the option to also index file contents and only later changed the option to index file names only.

* Since using the systemd service, pausing and starting the Baloo file indexer in the KCM works immediately.

> * RAM usage grows for both drivers over time. It seemed RES ram consumption grows faster in the case of NTFS3, but this needs to be re-checked again.
It now grows over 6 GB already in case of NTFS-3G. This seems to be driver independent as well. Interestingly only 9 MB are swapped currently. Of course I cannot fully rule out that the swapping was caused by any process other than baloo_file_extractor, but it coincided very well.

> Keep a look out for multiple hits for a file, the "baloosearch -i one-of-your-files.txt", that will be a clear warning that something is wrong.
`for x in {a..z}; do baloosearch -s auto -i "$x" | cut -d' ' -f2- | uniq -d; done` finds no duplicates. So there are likely none.
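
One caveat on that check: `uniq -d` only reports repeated lines that are adjacent, so duplicates that are not next to each other in the baloosearch output would be missed unless the list is sorted first. An order-independent check could be sketched in Python as follows; the paths are made-up stand-ins for the search output and the function name is hypothetical.

```python
from collections import Counter


def duplicated(paths):
    """Return the paths that occur more than once, in first-seen order."""
    counts = Counter(paths)
    return [path for path, n in counts.items() if n > 1]


# Hypothetical paths standing in for the output of baloosearch.
hits = ["/media/foo/a.txt", "/media/bar/b.txt", "/media/foo/a.txt"]
print(duplicated(hits))
```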

> Your "balooctl status" says you have 4,266,801 files, could that be true?
Yes, lots of measurement data etc.

> I'm wondering about the iNotify limit warning, that's something that used to cause trouble but has not recently. We can come back to that later as well.
Just saw that today again with the ntfs-3g driver, but it doesn't seem to cause responsiveness issues.

* 2nd summary: There doesn't seem to be a difference between the two NTFS drivers. So many data points, almost for nothing. At least we can likely rule out a few things, and with the systemd service in use, starting / stopping finally works reliably.


> I think you have to take a step back, swallow, purge the index and start again.
I will do so a last time and this time I will only index file names right from the start. (It would of course be great if this option could be made directory dependent so that for slow drives content is not indexed but only for fast storage.)

Comment 50 postix 2025-02-26 13:51:32 UTC

systemd-analyze --user log-level debug

I see in journalctl
> Failed to establish memory pressure event source, ignoring: Operation not supported
> Failed to connect to /run/systemd/oom/io.systemd.ManagedOOM: File or folder not found
Just don't know the cause yet, especially of the first line. 

This seems to be the reason that 
> ls /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/background.slice/kde-baloo.service
shows no memory entries
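
A scripted version of that check, as a rough Python sketch: it assumes cgroup v2, where memory.* interface files such as memory.high only show up in a cgroup when the memory controller is enabled for it, and where cgroup.controllers lists the controllers available to that cgroup. The helper names are made up; the path is the one quoted above.

```python
from pathlib import Path


def memory_controller_enabled(cgroup_dir: Path) -> bool:
    """True if the cgroup exposes memory.* files such as memory.high."""
    return (cgroup_dir / "memory.high").exists()


def available_controllers(cgroup_dir: Path) -> list:
    """Controllers listed in cgroup.controllers (empty if the file is missing)."""
    path = cgroup_dir / "cgroup.controllers"
    return path.read_text().split() if path.exists() else []


unit = Path(
    "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service"
    "/background.slice/kde-baloo.service"
)
print(memory_controller_enabled(unit), available_controllers(unit))
```

If "memory" is missing from the list, MemoryHigh from the unit file has nothing to act on, which would match the growing RAM usage.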

Comment 51 tagwerk19 2025-02-26 19:37:01 UTC

I've spun up a Tumbleweed machine - and I agree ...

    ... The memory is not there in the "systemctl status --user kde-baloo".

I know that for memory limits to work (the MemoryHigh=512M in the unit file), memory accounting needs to be enabled. It looks as if Tumbleweed has it disabled in:

    /usr/lib/systemd/system.conf.d/20-defaults-SUSE.conf

This disables it by default; I'm not sure whether that implies the setting can be overridden (and where). I've not been successful here.

So, again. Every day a learning day....

Comment 52 tagwerk19 2025-02-26 20:09:29 UTC

Once you get the "Memory" line listed, you can try overriding the default limits.

   $ systemctl edit --user kde-baloo

You will then be editing an override file; add the lines:

    [Service]
    MemoryHigh=40%

What you choose will be a bit of guesswork. Given the number of files you want to index, I'd rate 512MB as too low. You don't want Baloo squeezing out the rest of the system so you need some limit.

I have also tried putting a "MemorySwapMax=0" in the unit file, in cases where I've allowed Baloo a good amount of RAM. The logic being, if Baloo starts swapping (and these would be writing dirty pages to disk), you are in deep trouble....

Comment 53 tagwerk19 2025-02-27 07:39:31 UTC

For the impact of running "balooctl status" while Baloo is indexing (or deleting), have a look at:
    https://bugs.kde.org/show_bug.cgi?id=437754

Looking at the iNotify watches error, it seems that Tumbleweed has a low default for the number of watches. It has been quite a while since this has been a problem, earlier the defaults were very low but I think they now adjust automatically. There was a reference here:
    https://bugs.kde.org/show_bug.cgi?id=454952#c4

I don't know why Tumbleweed might still have a small limit for watches but I've found:
    https://www.suse.com/support/kb/doc/?id=000020048
so suggest editing the file:
    /etc/sysctl.conf
and adding the line:
    fs.inotify.max_user_watches=524288
You will need something large...
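
To judge how large, a rough sketch under the assumption that each watched directory costs one inotify watch: count the directories under an indexed path and compare the total with the current limit. The function names are hypothetical, and the walk can take a long time on a multi-TB HDD.

```python
import os


def count_directories(root):
    """Count root and every directory below it, without following symlinks."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        total += 1
    return total


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


# Example (paths are placeholders for your own indexed folders):
# needed = count_directories("/media/foo/bar1") + count_directories("/media/foo/bar2")
# print(needed, "directories; limit is", read_watch_limit())
```

Other applications use watches too, so the limit should sit comfortably above the directory count rather than just at it.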

Comment 54 postix 2025-02-27 16:54:57 UTC

> This disables by default, I'm not sure that implies the setting can be overridden (and where). I've not been successful here.
Me neither.

https://doc.opensuse.org/documentation/leap/archive/15.2/tuning/html/book-sle-tuning/cha-tuning-cgroups.html#sec-tuning-cgroups-accounting says
> This setting is available only if the unified control group hierarchy is used, and disables MemoryLimit=. To enable the unified control group hierarchy, append systemd.unified_cgroup_hierarchy=1 as a kernel command line parameter to the GRUB 2 boot loader. Refer to Book “Reference”, Chapter 12 “The Boot Loader GRUB 2” for more details about configuring GRUB 2. 
but it hasn't made a difference. I will need to ask in the openSUSE support chat.

> so suggest editing the file:
>     /etc/sysctl.conf
> and adding the line:
>     fs.inotify.max_user_watches=524288
Thanks, I've adjusted it now.

Thanks again so far for your help and time!

Comment 55 tagwerk19 2025-02-27 17:28:23 UTC

I was able to create a
    /usr/lib/systemd/system.conf.d/70-Enable-Memory-accounting.conf
file containing:
    [Manager]
    DefaultMemoryAccounting=yes
That overrode the setting in 20-defaults-SUSE.conf

As far as I can tell, all OK on my test system

Comment 56 Oded Arbel 2025-04-15 18:38:06 UTC

I think I have the same issue - baloo_file takes ~300M of RAM and ~50% of a CPU, and htop shows it in D status all the time, while balooctl6 monitor says "Idle (Powersave)" and shows no traffic.

After restart of the systemd kde-baloo.service user service, the problem has disappeared (for now). I'll add more info when it happens again.

The kde-baloo.service on my system (Neon) has a limit of 512MB, which seems quite high to me - that's a significant chunk of memory for a background service that is supposed to be idle most of the time.

Operating System: KDE neon 6.3
KDE Plasma Version: 6.3.4
KDE Frameworks Version: 6.13.0
Qt Version: 6.8.3
Kernel Version: 6.11.0-21-generic (64-bit)
Graphics Platform: Wayland
Processors: 20 × 12th Gen Intel® Core™ i7-12700H
Memory: 31.0 GiB of RAM
Graphics Processor: Intel® Graphics

Comment 57 tagwerk19 2025-04-17 17:08:31 UTC

(In reply to Oded Arbel from comment #56)
> I think I have the same issue - baloo_file takes ~300M of RAM and ~50% of a
> CPU, and htop shows it in D status all the time, while balooctl6 monitor says
> "Idle (Powersave)" and shows no traffic.
It's possible it's clearing the records for files that have been deleted. You wouldn't see that in the monitor, you might see baloo_file (not baloo_file_extractor) busy with htop or iotop

> The kde-baloo.service on my system (Neon) has a limit of 512MB, which seems
> quite high to me - that's a significant chunk of memory for a background
> service that is supposed to be idle most of the time.
99% of the time 512MB is a good limit.

Think that that's the memory that Baloo uses as "cache", quite possibly used by clean pages that get dropped when something else needs the space. This memory is shared between the always-running baloo_file and the when-there-are-things-to-index baloo_file_extractor.

The other 1% of the time, 512MB is constricting. If you are indexing a lot and are building a big transaction, you've got dirty pages that cannot just be dropped. Maybe they get swapped (and you *really* don't want that). If you are in that 1% territory, you see Baloo slowing as systemd(?) delays giving it extra memory - Baloo spends its time reading, dropping, rereading, dropping and re-rereading clean pages that it needs. I've not kept a list of bugs that could very well be the result of the constraint but there has been a slow and steady stream. In general you suggest increasing the cap and the issue goes quiet; it's reasonable to assume that people are happy.

I think a 25% maximum is reasonable, a 40% maximum (and zero swap) is also possible. From my experience neither of these impact system performance even with pathological test cases.

Comment 58 Oded Arbel 2025-04-17 21:37:46 UTC

(In reply to tagwerk19 from comment #57)
> (In reply to Oded Arbel from comment #56)
> > I think I have the same issue - baloo_file takes ~300M of RAM and ~50% of a
> > CPU, and htop shows it in D status all the time, while balooctl6 monitor says
> > "Idle (Powersave)" and shows no traffic.

> It's possible it's clearing the records for files that have been deleted.
> You wouldn't see that in the monitor, you might see baloo_file (not
> baloo_file_extractor) busy with htop or iotop

As I mentioned - that is what I'm seeing in htop: baloo_file eating RAM and doing IO blocking while pegging the CPU in a high power state.

> Think that that's the memory that Baloo uses as "cache", quite possibly used
> by clean pages that get dropped when something else needs the space.

htop lists the ~400M RAM as RES, which - AFAIK - means less RAM for other things. If you mean "dropped" as in goes to swap, then that's no good - I'm running without swap because of a small disk (or I set up a few GB swap file when I really really need the extra VM because I'm running a large workload). Swap VM is not free.

> The
> memory is shared between the always-running baloo_file and the
> when-there-are-things-to-index baloo_file_extractor.
> 
> The 1% of the time. 512MB is constricting. 

Not enough - this is how my system currently looks:

---8<---
● kde-baloo.service - Baloo File Indexer Daemon
     Loaded: loaded (/usr/lib/systemd/user/kde-baloo.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-04-16 18:19:23 IDT; 1 day 6h ago
    Process: 5471 ExecCondition=/usr/bin/kde-systemd-start-condition --condition baloofilerc:Basic Settings:Indexing-Enabled:true (code=exited, status=0/SUCCESS)
   Main PID: 5487 (baloo_file)
      Tasks: 7 (limit: 37870)
     Memory: 638.2M (high: 512.0M available: 0B peak: 656.2M)
        CPU: 8min 47.429s
     CGroup: /user.slice/user-1000.slice/user@1000.service/background.slice/kde-baloo.service
             ├─5487 /usr/lib/x86_64-linux-gnu/libexec/kf6/baloo_file
             └─6621 /usr/lib/x86_64-linux-gnu/libexec/kf6/baloo_file_extractor

Apr 16 22:30:26 vesho baloo_file_extractor[6621]: kf.baloo: Not busy, fast indexing
... many of those log lines
---8<---

It's 638MB out of 512MB (!!). It's mostly baloo_file_extractor, with baloo_file taking only about 100MB. monitor still says "Idle" and status says:

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 5,827,389
Files waiting for content indexing: 4,051
Files failed to index: 68
Current size of index is 7.85 GiB
 
> I think a 25% maximum is reasonable, a 40% maximum (and zero swap) is also
> possible. 

40% of system RAM for the background file indexer? That seems atrocious to me. When I run a development workflow I need all the RAM that I can get, to the point that I clear trashes and caches and set up a swap file. Spending 25% of my RAM on a background file indexer - idling - seems insane to me.

But my main problem - which I think is the OP's as well - is that baloo_file (not the extractor) sometimes pegs the CPU. Keeping the CPU in a high power state is not free: it costs battery life and available time off AC, not to mention heating and fans.

I like the features I get from baloo, but if I need to manually handle the state of the background indexer (turning it off when I leave my desk, or need to start a large workload), then there is a problem.

Comment 59 tagwerk19 2025-04-18 08:55:32 UTC

(In reply to Oded Arbel from comment #58)
> Memory: 638.2M (high: 512.0M available: 0B peak: 656.2M)
That makes it pretty clear...

Baloo has been trying *long* and *hard* to get extra memory. The limit is a soft limit (MemoryHigh rather than MemoryMax) so the system does begrudgingly allocate more but with steadily increasing delay. I've not seen Baloo ever manage to get hold of 638M RAM!
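
A small sketch for reading that status line mechanically, assuming the `systemctl status` format quoted above and that the K/M/G suffixes are powers of 1024. The names `parse_size` and `parse_memory_line` are made up for illustration.

```python
import re

# Assumption: suffixes are binary multiples.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '638.2M' or '0B' to a number of bytes."""
    m = re.fullmatch(r"([0-9.]+)([BKMGT])?", text)
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * _UNITS[m.group(2) or "B"]


def parse_memory_line(line):
    """Parse 'Memory: <current> (key: value ...)' into a dict of byte counts."""
    m = re.search(r"Memory:\s+(\S+)(?:\s+\(([^)]*)\))?", line)
    if not m:
        raise ValueError("no Memory: field found")
    fields = {"current": parse_size(m.group(1))}
    for key, value in re.findall(r"(\w+):\s+(\S+)", m.group(2) or ""):
        fields[key] = parse_size(value)
    return fields


line = "     Memory: 638.2M (high: 512.0M available: 0B peak: 656.2M)"
info = parse_memory_line(line)
# Ratio above 1.0 means the cgroup is past its MemoryHigh threshold.
overshoot = info["current"] / info["high"]
```

For the line quoted here the ratio is 638.2 / 512.0, roughly 1.25, so the service sits about a quarter above its soft limit.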

I think in this case, you will need to override the limit, try "systemctl --user edit kde-baloo" as per Bug 502641

> ... It's mostly baloo_file_extractor, with baloo_file
> taking only about 100MB. monitor still says "Idle" and status says ...
That bit I don't understand - but I'm not sure how the monitor checks whether the extractor is running.

> 40% of system RAM for the background file indexer? That seems atrocious to me.
You are seeing an edge case I think. If you get baloo_file_extractor to finish, it will release the memory it's using. You are trading space for time - or the Baloo design is.

You have 5 million files, try asking for "baloosearch the | wc" and see how many hits you get and how long it takes (and worth trying it more than once) and compare to your favourite grep (again try it more than once).
    
There is an issue with baloo_file when cleaning up records for deleted files, there is room for improvement there....

> If you mean "dropped" as in goes to swap, then that's no good
No, Baloo reads pages from the database when, say, it is looking to see which files contain the term. If it doesn't modify them they are still "clean" and can be dropped (forgotten) to make way for another page it needs. If it does need that first page again it rereads it from the index. So, when memory gets tight, you see a growing number of reads from the index and that can dominate the I/O.
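
The effect can be illustrated with a toy model. This is only a sketch: it assumes a strict least-recently-used policy and a fixed number of page slots, which is a simplification of what the kernel really does, and `count_page_reads` is a made-up name.

```python
from collections import OrderedDict


def count_page_reads(accesses, capacity):
    """Count how many accesses have to be read from disk with an LRU cache."""
    cache = OrderedDict()
    reads = 0
    for page in accesses:
        if page in cache:
            # Page is still in memory: mark it as most recently used.
            cache.move_to_end(page)
        else:
            # Page was never loaded or was dropped: reread it from the index.
            reads += 1
            cache[page] = True
            if len(cache) > capacity:
                # Drop the least recently used clean page to make room.
                cache.popitem(last=False)
    return reads


# The same four pages visited in a loop, five times over.
workload = [0, 1, 2, 3] * 5
reads_when_it_fits = count_page_reads(workload, capacity=4)
reads_when_tight = count_page_reads(workload, capacity=3)
```

With room for all four pages each one is read once; with room for only three, every single access turns into a reread, even though the workload has not changed.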

> ... it costs battery life and available time off AC ...
Content indexing is paused when on battery, baloo_file stops feeding batches of files to the extractor when on battery (assuming you've got power management working). If the extractor is in the middle of indexing as in your case, it is not killed.

Comment 60 Oded Arbel 2025-04-18 17:41:17 UTC

(In reply to tagwerk19 from comment #59)
> Baloo has been trying *long* and *hard* to get extra memory.

> I think in this case, you will need to override the limit, try "systemctl
> --user edit kde-baloo" as per Bug 502641

I have purged the database and started re-indexing from scratch. This has been running for almost 24 hours now, and status now shows

Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 3,144,355
Files waiting for content indexing: 2,764,823
Files failed to index: 0
Current size of index is 9.22 GiB

(I've since deleted a large folder that I didn't need, and it had some 4M files, I've done so after reindexing started and it looks like baloo_file_extractor has been spending most of the day going through the removed files and seeing that they are gone, very very very slowly - about 9 files / sec. I think I'll restart the indexing again...)

I started the re-indexing by running `balooctl6 disable`, `balooctl6 purge`, then `systemctl --user stop kde-baloo`, followed by `balooctl6 enable` - so baloo_file isn't running under systemd (which is a different issue that I'm not going to get into) and now baloo_file takes RES 3.1GB (SHR 2.4GB) and baloo_file_extractor takes RES 4.2GB (SHR 3.8GB). I'm assuming the shared memory is memory maps that have not been cleaned out since there is no memory pressure (I'm not running any serious workload as it's the weekend, and with just some browsers my system can spare 8GB and more), and disregarding that - the actual usage is more or less what I posted in my previous comment.
 
> You have 5 million files, try asking for "baloosearch the | wc" and see how
> many hits you get and long it takes (and worth trying it more than once) and
> compare to your favourite grep (again try it more than once).

You'll find no quarrel with me about how useful baloo is. I personally use the tag search the most, and trying to run my workflows that need tag lookups without baloo (while it was purging and re-indexing) was a pain.
   
> No, Baloo reads pages from the database ... If it doesn't modify them they are still
> "clean" and can be dropped (forgotten) to make way for another page it
> needs.

Is this done automatically by the kernel, or does the application need to run a command when it encounters memory pressure?

> > ... it costs battery life and available time off AC ...
> Content indexing is paused when on battery

Thank you, I wasn't aware of that. I ran a couple of tests and it looks like baloo_file_extractor is killed when going on battery (at least in my tests where it wasn't actually doing anything interesting). But the problem I reported in comment #56 was that baloo_file - not the extractor - was taking up RAM and CPU, and that isn't stopped when going on battery. And now I realize that this isn't the OP, which clearly states "File Extractor".

So at this point I'll stop posting on this issue - unless I can reproduce the *extractor* having issues. I'll do more investigations (as time permits) and also maybe I'll just get into the habit of spending a weekend every few weeks to re-index. I can probably easily automate that.

Thank you, Tagwerk, for the useful information!

Comment 61 Oded Arbel 2025-04-18 18:45:08 UTC

OK, after deleting 90% of the files Baloo sees (that I really didn't need - one git repo that made no sense and a bunch of node_modules for projects I haven't touched in years), it behaves much better.

I wish Filelight would allow sorting disk usage by file count instead of size - Baobab does that, and that helped a lot to find the worst offenders.

Also, it would be very useful to specify folders to not index by name instead of path, so we could - for example - tell Baloo to never index a node_modules folder, no matter where it finds it in the disk, or a drive_c folder.

Comment 62 tagwerk19 2025-04-18 19:15:09 UTC

(In reply to Oded Arbel from comment #60)
> ... going through the removed files and seeing that they are gone ...
Yes, cleaning old records if you've deleted a large folder is troublesome :-/

> ... very very very slowly - about 9 files / sec ...
I suppose if it's flagged a file for content indexing and the file has gone by the time the extractor looks at it, then yes, deleting the record is good. I don't really have a feeling for whether 9 files a second is good or bad.

> ... so baloo_file isn't running under systemd (which is a different issue that I'm not
> going to get into) ...
There's a Bug on this, yes Bug 488178

> Is this done automatically by the kernel,
As far as I know, it's a kernel thing... And even if it wasn't, it would probably be concealed within the LMDB database software.

> I personally use the tag search the most
I find the tag search The Amazing Hidden Superpower in Dolphin. I couldn't go back to working on a system where you put a file into just one folder. Don't know how people can work like that!

> ... workflows that need tag lookups without baloo (while it was purging and re-indexing) was a pain ...
You can index filenames (not content) and get the xattr tags, that would be fast. I think then you can enable content indexing and start the harder part of the job.

> Thank you, Tagwerk, for the useful information!
All part of the service, and thank you for the patience :-)

Comment 63 tagwerk19 2025-04-18 19:20:31 UTC

(In reply to Oded Arbel from comment #61)
> ... for example - tell Baloo to never index a
> node_modules folder, no matter where it finds it in the disk, or a drive_c
> folder ...
I think you can add the foldername to the "exclude filters" (rather than "exclude folders"). There is a "node_modules" included in the list. I would need to check the current behaviour

Comment 64 Oded Arbel 2025-04-19 01:37:35 UTC

(In reply to tagwerk19 from comment #63)
> I think you can add the foldername to the "exclude filters" (rather than
> "exclude folders"). There is a "node_modules" included in the list. I would
> need to check the current behaviour

Right - there is, and it is included. It's not exposed in the UI, which is why I didn't think about it. I've added a few additional filters that I need and my life is now better. Thanks again 🙏

Comment 65 Vyacheslav Kovalevsky 2025-04-28 12:25:28 UTC

(In reply to tagwerk19 from comment #57)
> (In reply to Oded Arbel from comment #56)
> > I think I have the same issue - baloo_file takes ~300M of RAM and ~50% of a
> > CPU, and htop shows it in D status all the time, while balooctl6 monitor says
> > "Idle (Powersave)" and shows no traffic.
> It's possible it's clearing the records for files that have been deleted.
> You wouldn't see that in the monitor, you might see baloo_file (not
> baloo_file_extractor) busy with htop or iotop
> 
> > The kde-baloo.service on my system (Neon) has a limit of 512MB, which seems
> > quite high to me - that's a significant chunk of memory for a background
> > service that is supposed to be idle most of the time.
> 99% of the time 512MB is a good limit.
> 
> Think that that's the memory that Baloo uses as "cache", quite possibly used
> by clean pages that get dropped when something else needs the space. This
> memory is shared between the always-running baloo_file and the
> when-there-are-things-to-index baloo_file_extractor.
> 
> The 1% of the time. 512MB is constricting. If you are indexing a lot and are
> building a big transaction, you've got dirty pages that cannot just be
> dropped. Maybe they get swapped (and you *really* don't want that). If you
> are in that 1% territory, you see Baloo slowing as systemd(?) delays giving
> it extra memory - Baloo spends its time reading, dropping, rereading,
> dropping and re-rereading clean pages that it needs. I've not kept an list
> of bugs that could very well be the result of the constraint but there have
> been a slow and steady stream. In general you suggest increasing the cap and
> the issue goes quiet, reasonable to assume that people are happy.
> 
> I think a 25% maximum is reasonable, a 40% maximum (and zero swap) is also
> possible. From my experience neither of these impact system performance even
> with pathological test cases.

It has been really awful for me too, with the system being sluggish and unresponsive for seconds (and sometimes minutes!). Baloo file extractor was not really showing up in KDE System Monitor (it said it was using very few resources, like 1%, no spikes, nothing), but when I ran ioctl I saw it actually used a lot of disk IO.
I have a lot of files on my Ext4 drive (compiling kernels, creating many QEMU images etc.), and it was awful when the computer just stopped responding; I almost considered reinstalling the OS (I am using Arch Linux).

Comment 66 Oded Arbel 2025-04-28 13:21:37 UTC

(In reply to Vyacheslav Kovalevsky from comment #65)
> I have a lot of files on my Ext4 drive (compiling kernels, creating many
> QEMU images etc.), and it was very awful when computer straight stopped
> responding, I almost considered reinstalling OS (I am using Arch Linux).

You may want to exclude directories with lots of files that are not interesting to index (such as kernel source directories) - reducing the number of files that Baloo monitors is, IMO, the number one way to reduce the load from Baloo and to reduce the probability that it gets stuck.
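
To find candidates for exclusion, a rough sketch that ranks directories by the number of files in their whole subtree (similar in spirit to sorting by file count in a disk usage tool). The function names are hypothetical and nothing here is Baloo-specific; it only walks the filesystem.

```python
import os


def subtree_file_counts(root):
    """Map each directory under root to the number of files in its subtree."""
    totals = {}
    # Bottom-up walk: children are visited before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False,
                                                followlinks=False):
        total = len(filenames)
        for name in dirnames:
            # Symlinked directories are not walked, so they contribute 0.
            total += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = total
    return totals


def top_dirs_by_file_count(root, limit=10):
    """Return (path, count) pairs with the largest subtree file counts."""
    totals = subtree_file_counts(root)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Example (the path is a placeholder):
# for path, count in top_dirs_by_file_count(os.path.expanduser("~"), 20):
#     print(count, path)
```

The root itself always comes first because its count includes everything below it; the interesting entries are the deep directories that still carry most of the total.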

Comment 67 John Kizer 2025-04-29 03:02:10 UTC

*** Bug 487916 has been marked as a duplicate of this bug. ***

Comment 68 tagwerk19 2025-04-29 07:15:37 UTC

(In reply to John Kizer from comment #67)
> *** Bug 487916 has been marked as a duplicate of this bug. ***
Bug 487916 was, initially at least, about possible changes with KF6. I think there wasn't anything specific about KF6. This bug was, initially at least, focusing on issues with NTFS.

It makes some sense to keep baloo_file_extractor issues separate from baloo_file; the extractor has to get hold of the plain text. It seems that the baloo_file issues *might* be connected to the background clean up of records from deleted files.

Both however can be affected by "too restrictive" systemd limits on one hand - and cases where the limits are not applying on the other.

Comment 69 Bug Janitor Service 2025-05-16 05:35:29 UTC

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/233

Comment 70 Bug Janitor Service 2025-06-02 12:29:40 UTC

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/236