As far as I can see, the baloofilerc config file doesn't offer any option to configure the number of file indexing workers (baloo_file_extractor): https://community.kde.org/Baloo/Configuration

On my system (up-to-date Arch Linux), the indexer seems to occupy only one CPU core at a time. While this may be the preferred default on many laptops and office PCs, it severely limits the initial indexing on a computer with a lot of files and a lot of computing power. My machine is equipped with a 16-core Ryzen 3950X with SMT, so with only a single core used for indexing, the initial scan takes much longer than the hardware would otherwise allow.

So my request is the following: please consider adding an option to configure the number of parallel threads / processes used for indexing files.
I think this has to be the first request in history for Baloo to use *more* resources. :)
Are you sure the bottleneck on your system isn't I/O? Issuing I/O from multiple threads concurrently could even degrade performance. Also, I don't think the underlying database handles concurrent updates from multiple threads. It is a simple mmap'ed file, not an SQL database managed by a database server.
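To make the single-writer concern concrete, here is a minimal sketch of one way parallel extraction could coexist with a database that accepts only one writer: several workers extract terms, and a single writer thread applies all updates. This is not Baloo's code. `extract_terms`, `index_files`, and the in-memory `index` dict are hypothetical stand-ins, and Python threads only illustrate the structure (a real indexer would need separate processes or native threads for CPU-bound extraction).

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def extract_terms(path: str) -> tuple[str, list[str]]:
    # Hypothetical stand-in for the expensive extraction step:
    # derive search terms from the path instead of parsing the file.
    terms = path.lower().replace("/", " ").replace(".", " ").split()
    return path, terms


def index_files(paths: list[str], workers: int = 4) -> dict[str, list[str]]:
    results: queue.Queue = queue.Queue()
    index: dict[str, list[str]] = {}  # stand-in for the single-writer database

    def writer() -> None:
        # Only this thread ever touches the index.
        while True:
            item = results.get()
            if item is None:  # sentinel: all extraction work is finished
                break
            path, terms = item
            index[path] = terms

    def worker(path: str) -> None:
        # Extraction runs in the pool; the result is handed to the writer.
        results.put(extract_terms(path))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(worker, paths))  # wait for every extraction to complete
    results.put(None)
    writer_thread.join()
    return index
```

With this layout the worker count becomes a tunable parameter, for example `index_files(paths, workers=8)`, while writes stay serialized. Whether it would actually speed things up depends on how much of the indexing time is spent in extraction rather than in committing to the database.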
It shouldn't be I/O-bound: my system is on a Samsung SSD 970 EVO Plus 2TB with the following specs: sequential read 3500 MB/s, sequential write 3300 MB/s (SLC-cached; 1750 MB/s to TLC), 4K IOPS (read/write) 620k/560k.