427344 – Ability to configure (higher) number of indexing workers

Summary: Ability to configure (higher) number of indexing workers

Status:	REPORTED

Alias:	None

Product:	frameworks-baloo
Classification:	Frameworks and Libraries
Component:	general (other bugs)
Version First Reported In:	unspecified
Platform:	Other Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Stefan Brüns

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-10-04 21:43 UTC by philipp.l.klaus
Modified:	2024-07-09 18:47 UTC
CC List:	1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description philipp.l.klaus 2020-10-04 21:43:16 UTC

As far as I can see, the baloofilerc config file doesn't offer any option to configure the number of file indexing workers (baloo_file_extractor):
https://community.kde.org/Baloo/Configuration

On my system (up-to-date Arch Linux), it seems to occupy only one CPU core at a time. While this might be the preferred default on many laptops and office PCs, it considerably slows down the initial indexing on a computer with many files and plenty of computing power. My machine is equipped with a 16-core Ryzen 9 3950X with simultaneous multithreading (32 threads), so using only a single core for indexing means the initial scan takes much longer than the hardware would otherwise allow.

So my request is the following:
Please consider adding an option to configure the number of parallel threads or processes used for indexing files.
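To make the request concrete, here is a minimal Python sketch of what such an option could look like. Everything in it is hypothetical: the key name `extractor workers`, the `[General]` section lookup, and the `extract_text` placeholder are illustrative assumptions, not existing Baloo settings or functions. The sketch reads the worker count from INI-style text, keeps today's single-worker behaviour as the default, and clamps the value to the number of available CPUs.

```python
import configparser
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical key: baloofilerc offers no such option today.
WORKERS_KEY = "extractor workers"


def configured_worker_count(config_text: str) -> int:
    """Read a worker count from INI-style text, clamped to [1, cpu_count]."""
    # interpolation=None so values containing '%' are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    max_workers = os.cpu_count() or 1
    try:
        # fallback=1 covers a missing section or key: keep the single-worker default
        requested = parser.getint("General", WORKERS_KEY, fallback=1)
    except ValueError:
        requested = 1  # malformed value: also fall back to a single worker
    return max(1, min(requested, max_workers))


def extract_text(path: str) -> str:
    """Placeholder for the per-file extraction step (hypothetical)."""
    return f"contents of {path}"


def index_files(paths, workers: int):
    """Run extraction on a pool of `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_text, paths))
```

Threads are used only to keep the sketch short. In CPython they speed up CPU-heavy extraction only when the work releases the GIL, so a real implementation would more likely run several extractor processes, as `baloo_file_extractor` already is a separate process.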

Comment 1 Nate Graham 2020-10-05 21:12:29 UTC

I think this has to be the first request in history for Baloo to use *more* resources. :)

Comment 2 Christoph Feck 2020-11-01 13:20:29 UTC

Are you sure that I/O isn't the bottleneck on your system? Performing I/O from multiple threads concurrently could even degrade performance.
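One common heuristic for answering that question, shown here as a minimal Python sketch rather than a Baloo tool, is to compare the CPU time a single-threaded task consumes with the wall-clock time it takes. A ratio close to 1.0 means the task kept one core busy (CPU-bound); a ratio well below 1.0 means it spent most of its time waiting, for example on disk I/O. The function name `cpu_to_wall_ratio` is illustrative; for the real `baloo_file_extractor` process the same comparison would have to be made from outside with a process monitor.

```python
import time


def cpu_to_wall_ratio(work) -> float:
    """Run `work()` and return process CPU time divided by wall-clock time.

    Assumes `work` is single-threaded: process_time() counts CPU time of
    all threads in this process, so other busy threads would inflate it.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


# A CPU-heavy task versus a task that only waits.
busy = cpu_to_wall_ratio(lambda: sum(i * i for i in range(2_000_000)))
waiting = cpu_to_wall_ratio(lambda: time.sleep(0.2))
```

If indexing shows a ratio near 1.0 for the single extractor, more workers could help; if it is low, the storage or the database is the limit and extra workers would mostly add contention.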

Also, I don't think the underlying database can handle updates from concurrently running threads. It is a simple memory-mapped (mmap'ed) file, not an SQL database managed by a database server.
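If the store indeed tolerates only one writer at a time, a common way to still parallelize is to run extraction concurrently and funnel every database update through a single writer thread. The Python sketch below illustrates that generic pattern under that assumption; it is not Baloo's design, and `extract` and `commit` are hypothetical callbacks standing in for text extraction and the database write.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def index_with_single_writer(paths, extract, commit, workers: int = 4) -> None:
    """Extract in parallel, but perform every database write from one thread.

    `extract(path)` may run concurrently on `workers` threads;
    `commit(path, result)` is only ever called from the writer thread,
    so the store never sees two writers at once.
    """
    paths = list(paths)  # iterated twice below
    results: queue.Queue = queue.Queue()
    done = object()  # sentinel telling the writer to stop

    def writer() -> None:
        while True:
            item = results.get()
            if item is done:
                return
            commit(*item)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map yields results in input order as they become available
            for path, result in zip(paths, pool.map(extract, paths)):
                results.put((path, result))
    finally:
        results.put(done)  # always stop the writer, even if extraction failed
        writer_thread.join()
```

The design keeps the parallel part (parsing file contents) separate from the serialized part (writing to the index), so the number of extraction workers can grow without changing how the database is accessed.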

Comment 3 philipp.l.klaus 2021-02-16 10:12:46 UTC

It shouldn't be I/O-limited: my system runs on a Samsung SSD 970 EVO Plus 2TB with the following specifications: sequential read 3500 MB/s; sequential write 3300 MB/s (SLC-cached; 1750 MB/s TLC); 4K IOPS (read/write): 620k/560k.