Bug 356357 - Continuous index flushing with fdatasync degrades interactive performance
Status: CONFIRMED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Other Linux
Importance: HI normal
Target Milestone: ---
Assignee: Pinak Ahuja
Duplicates: 393741
 
Reported: 2015-12-07 09:36 UTC by Riku Voipio
Modified: 2022-11-30 17:15 UTC
CC list: 17 users

Attachments
disable sync to make baloo indexing less intrusive (452 bytes, patch)
2015-12-07 11:16 UTC, Riku Voipio

Description Riku Voipio 2015-12-07 09:36:43 UTC
While Baloo is indexing, the KDE UI becomes periodically unresponsive. Running strace on baloo_file_extractor shows that the unresponsiveness coincides with calls to fdatasync():

59872
pwrite(16, "D\16\6\0\0\0\0\0\t\25\4\0\303\2\0\0\1\10\0\0\26\1\0\1\1\10\0\0E\1\0\1"..., 2895872, 1625571328) = 2895872
pwrite(16, "\377w\6\0\0\0\0\0\240\376\4\0\254\3\0\0\1\10\0\0\26\1\0\1\1\241P\1\10\0\0 "..., 3850240, 1736437760) = 3850240
fdatasync(16, ...

It would be better to use MDB_NOSYNC and regenerate the index in the extremely unlikely event of database corruption.
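
The pattern in the strace output above (large writes, each followed by a blocking fdatasync()) can be reproduced with a minimal Python sketch. The helper name `write_chunks` and the `sync_each` flag are assumptions made for illustration; this is not Baloo or LMDB code, and `sync_each=False` only mimics the effect that MDB_NOSYNC would have on the write path.

```python
import os
import time

# os.fdatasync is missing on some platforms; fall back to os.fsync there.
_datasync = getattr(os, "fdatasync", os.fsync)


def write_chunks(path, chunks, sync_each):
    """Write chunks with os.pwrite, optionally forcing them to disk each time.

    Returns (bytes_written, sync_calls, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    offset = 0
    sync_calls = 0
    start = time.monotonic()
    try:
        for chunk in chunks:
            offset += os.pwrite(fd, chunk, offset)
            if sync_each:
                _datasync(fd)  # blocks until the data has reached the device
                sync_calls += 1
    finally:
        os.close(fd)
    return offset, sync_calls, time.monotonic() - start
```

Comparing the elapsed time of the two modes on a spinning disk gives a rough feel for how much of the indexing time is spent waiting in the sync call.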
Comment 1 Riku Voipio 2015-12-07 11:16:43 UTC
Created attachment 95923
disable sync to make baloo indexing less intrusive

This patch made indexing unnoticeable in the UI. As a side effect, indexing is a lot faster as it doesn't pause to sync all the time.
Comment 2 Vishesh Handa 2015-12-14 23:02:22 UTC
(In reply to Riku Voipio from comment #1)
> Created attachment 95923
> disable sync to make baloo indexing less intrusive
> 
> This patch made indexing unnoticeable in UI. As side effect, indexing is a
> lot faster as it doesn't pause to sync all the time.

I'm a little conflicted about this approach since when the index does get corrupted, it will be impossible for us to detect it. With our previous backend (xapian), we used to get lots of bug reports which were because of corrupted databases :(

Perhaps the correct approach would be to refactor `baloo_file_extractor` so as to not perform a commit so frequently. We currently do it after a fixed 40 files. Perhaps it would make sense to try and estimate the amount of changes, and then do a commit when we reach the threshold.
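
The threshold idea above can be sketched as follows. This is only an illustration under stated assumptions: `SizeBatcher`, its `commit` callback and the per-item byte estimate are hypothetical names, not part of Baloo, and how the amount of change is actually estimated is left open.

```python
class SizeBatcher:
    """Commit when the estimated size of pending changes reaches a threshold,
    instead of after a fixed number of files."""

    def __init__(self, commit, threshold_bytes):
        self._commit = commit              # callable that receives the pending items
        self._threshold = threshold_bytes
        self._pending = []
        self._pending_bytes = 0

    def add(self, item, estimated_bytes):
        """Queue one item and commit if the threshold has been reached."""
        self._pending.append(item)
        self._pending_bytes += estimated_bytes
        if self._pending_bytes >= self._threshold:
            self.flush()

    def flush(self):
        """Commit whatever is pending; does nothing when the queue is empty."""
        if self._pending:
            self._commit(self._pending)
            self._pending = []
            self._pending_bytes = 0
```

A final `flush()` is still needed when the indexer runs out of work, otherwise the last partial batch would never be committed.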

I'm not sure if I should keep this bug open or what. Especially since this is probably only a problem during first run.
Comment 3 Riku Voipio 2016-01-11 12:25:01 UTC
> I'm a little conflicted about this approach since when the index does get corrupted, it will be impossible for us to detect it. 

I think it would be better to work on detecting and recovering from corruption. It's not like a system crash while indexing is the only way the DB can be corrupted. To take a page from the crash-only software ideology[1], the idea is to concentrate on solid recovery rather than on preventing crashes and data corruption in the first place.

[1] https://lwn.net/Articles/191059/
Comment 4 Idonotexist 2016-10-14 06:44:27 UTC
(In reply to Vishesh Handa from comment #2)
> Perhaps the correct approach would be to refactor `baloo_file_extractor` so
> as to not perform a commit so frequently. We currently do it after a fixed
> 40 files. Perhaps it would make sense to try and estimate the amount of
> changes, and then do a commit when we reach the threshold.
> 
> I'm not sure if I should keep this bug open or what. Specially since this is
> probably only a problem during first run.

As I write this, Baloo is hammering my very modern system's HDD to a pulp. The disk activity LED is furiously lit. KDE's UI periodically freezes because of heavy disk I/O.

My typical solution is to
1) Pause indexing
2) Mount a 10GB ramdisk
3) Move ~/.local/share/baloo to said ramdisk
4) Symlink ~/.local/share/baloo to the ramdisk baloo
5) Resume indexing
6) When indexing is done, undo the above. 

I definitely do not think this bug should be closed. It is most certainly not caused only on first runs. The current Baloo hyperactivity was caused by my copying of a large number of small files from another system.

Vishesh, Baloo is a worthy attempt at an indexing system, and I commend your work. It uses a quality database backend in the form of LMDB. But any which way you and I might spin it, Baloo has a serious problem with I/O: it simply causes too much of it, too frequently. Numerous users have complained about this, and several currently open and closed bugs are traceable directly to this behaviour. Several users' impressions of Baloo, and KDE writ large, are tainted by Baloo's abusive disk activity.

As for how to fix this problem: 40 files per transaction commit, as you said, is not a good enough solution. At the very least, the criterion should be based on LMDB's page size and the disk block size. I also propose that this criterion not be based purely on the number of files; it should have a time component, and should not commit transactions more often than once per second. A human user couldn't care less whether newly appeared files were indexed this second or the next, and a file indexer is after all primarily, though not exclusively, for human use.

Here's a relatively simple proposal: The indexer operates on a configurable *duty cycle* D of 1%-50% and a time period T of 1s-3600s. For (1-D)*T seconds per period, Baloo sleeps. For D*T seconds per period, Baloo *exclusively* performs data/metadata reads from the filesystem, keeping an eye on wall-clock time. Once D*T seconds of work have elapsed, make a *single transaction* containing all of the stuff that the indexer read in the previous duty cycle. Then go back to sleep again. In this way, exactly one mdb_txn_commit() and fdatasync()/msync() occurs per time period, they are likely to have accumulated far more than 40 files worth of information, and 50-99% of I/O bandwidth is available for other uses, such as satisfying the desktop UI's needs.
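
A minimal sketch of one such period, assuming hypothetical `read_next` and `commit` callbacks (neither exists in Baloo under these names). The clock and sleep functions are injectable so the timing logic can be exercised without actually waiting.

```python
import time


def duty_cycle_windows(duty, period):
    """Return (work_seconds, sleep_seconds) for duty cycle D and period T."""
    if not 0.01 <= duty <= 0.5:
        raise ValueError("duty cycle D must be between 1% and 50%")
    if not 1 <= period <= 3600:
        raise ValueError("period T must be between 1 s and 3600 s")
    work = duty * period
    return work, period - work


def run_period(read_next, commit, duty, period,
               clock=time.monotonic, sleep=time.sleep):
    """Read for D*T seconds, commit once, then sleep for (1-D)*T seconds.

    read_next() returns the next item to index, or None when there is no work.
    Returns the number of items committed in this period.
    """
    work, rest = duty_cycle_windows(duty, period)
    deadline = clock() + work
    batch = []
    while clock() < deadline:
        item = read_next()
        if item is None:
            break
        batch.append(item)
    if batch:
        commit(batch)  # the single transaction of this period
    sleep(rest)
    return len(batch)
```

The sketch checks the deadline only between items, so a single slow read can overrun the work window; a real implementation would have to decide how to handle that.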
Comment 5 Lukas Ba. 2017-01-03 02:24:39 UTC
> Vishesh, Baloo is a worthy attempt at an indexing system, and I commend your work. It uses a quality database backend in the form of LMDB. But any which way you and I might spin it, Baloo has a serious problem with I/O: it simply causes too much of it, too frequently. Numerous users have complained about this, and several currently open and closed bugs are traceable directly to this behaviour. Several users' impressions of Baloo, and KDE writ large, are tainted by Baloo's abusive disk activity.

https://linux.die.net/man/1/ionice
[...]
Idle
    A program running with idle io priority will only get disk time when no other program has asked for disk io for a defined grace period. The impact of idle io processes on normal system activity should be zero. This scheduling class does not take a priority argument. Presently, this scheduling class is permitted for an ordinary user (since kernel 2.6.25). 
Best effort
[...]

Baloo's priority is set to Idle, which means it should not cause "abusive" disk activity, unless your IO scheduler does not support ionice. Could this be your problem?
https://blogs.kde.org/2014/10/15/ubuntus-linux-scheduler-or-why-baloo-might-be-slowing-your-system-1404

> Here's a relatively simple proposal: The indexer operates on a configurable *duty cycle* D of 1%-50% and a time period T of 1s-3600s. For (1-D)*T seconds per period, Baloo sleeps. For D*T seconds per period, Baloo *exclusively* performs data/metadata reads from the filesystem, keeping an eye on wall-clock time. Once D*T seconds of work have elapsed, make a *single transaction* containing all of the stuff that the indexer read in the previous duty cycle. Then go back to sleep again. In this way, exactly one mdb_txn_commit() and fdatasync()/msync() occurs per time period, they are likely to have accumulated far more than 40 files worth of information, and 50-99% of I/O bandwidth is available for other uses, such as satisfying the desktop UI's needs.

You are suggesting rate limiting the IO. Rate limiting is inferior to scheduling (which is already being done) because:

1. The rate limit wastes (1-D) of the available bandwidth in idle situations, that is, when Baloo is the only application using IO.

2. Let Bmin < 1 be the share of bandwidth needed to satisfy the user's requirements. The (1-D) left over by rate limiting might still be smaller than Bmin. Scheduling with Idle priority leaves 1 instead of (1-D) to the user, which is enough in any case. Also, we no longer need to find Bmin.


Anyway, minimizing the caused IO is still useful.
Comment 6 Riku Voipio 2017-01-03 09:05:53 UTC
(In reply to kdeu from comment #5)
> https://linux.die.net/man/1/ionice

> Baloo's priority is set to Idle, which means it should not cause "abusive"
> disk activity.

ionice is being used, and it does a good job of making sure the crawling activity happens at lower priority than other use.

The effect of ionice is ruined by aggressive fdatasync usage when writing the large LMDB database. It appears fdatasync causes disk writes from a kernel thread that has collected all buffered disk writes. Buffers don't carry the IO priority info with them. The kernel thread just sees the red flag "please commit this data ASAP" and then thinks "to keep the FS consistent, I should also commit lots of other unwritten pages just to be sure".

Try the patch I made. The disk light still flashes like mad but it doesn't ruin interactive use anymore. Iopriority works as expected until you ask the kernel to be sure writes get to disk too. 

(In reply to Vishesh Handa from comment #2)
> I'm not sure if I should keep this bug open or what. Specially since this is probably only a problem during first run.

It also appears when doing operations like switching branches in huge git trees (linux, chromium), copying directories etc.
Comment 7 Lukas Ba. 2017-01-03 17:28:40 UTC
> I think it would better to work on detecting and recovering corruption.

> MDB_NOSYNC Don't flush system buffers to disk
> when committing a transaction. This optimization
> means a system crash can corrupt the database or 
> lose the last transactions if buffers are not yet
> flushed to disk.

Are there any experiences with using LMDB and NOSYNC?

Are there tools/ways to recover a corrupted database or lost transactions with LMDB? They might not even exist yet, if people don't use NOSYNC.

I think that developing such a tool is out of the scope of this project.
It seems that LMDB was built around the idea of not needing recovery.

> How safe is your DB? LMDB is crash-proof on all current filesystem designs.
Comment 8 Riku Voipio 2017-01-03 21:12:27 UTC
(In reply to kdeu from comment #7)
> Are there any experiences with using LMDB and NOSYNC?

Personally I've happily used that with patch attached since filing this bug - except for a handful of upgrades when I forgot the patch, only to notice that suddenly the disk light flashing means jittery UI again.

Baloo makes finding files on the local HD almost as easy as finding public files with Google. It's really sad if people disable Baloo because it's causing the desktop to freeze and stutter.

> Are there tools/ways to recover a corrupted database or lost transactions
> with LMDB? They might not even exist yet, if people don't use NOSYNC.

You are assuming that under the current configuration LMDB can't get corrupted. File systems are nasty and even with fdatasync there are caveats. But for most users, sudden crashes (especially in the middle of transactions) are really rare events.

Lost transactions are not a problem, entries would be just regenerated in next index scanning. Recovering the DB is somewhat pointless - you can just regenerate it from scratch, if under idle iopriority the indexing really has no user impact.
Comment 9 Lukas Ba. 2017-01-03 23:24:56 UTC
(In reply to Riku Voipio from comment #8)
> (In reply to kdeu from comment #7)
> > Are there any experiences with using LMDB and NOSYNC?
> 
> Personally I've happily used that with patch attached since filing this bug
> - except for a handful of upgrades when I forgot the patch, only to notice
> that suddenly the disk light flashing means jittery UI again.

I mean experiences with many users, and with crashes. If you don't test for crashes that's not a test for data corruption. You probably didn't crash your computer on purpose.
 
> You are assuming that under current configuration LMDB can't get corrupted.
> File systems are nasty and even with fdatasync there are caveats.

That's not true. Without NOSYNC, LMDB is safe and does not get corrupted. See:

http://openldap-devel.openldap.narkive.com/k1bbhN5H/lmdb-crash-consistency-again#post7
> All in all a bunch of bogus reporting; claiming that all DBs are broken when
> in fact LMDB is perfectly correct

> But for
> most users, sudden crashes (especially in the middle of transactions) are really
> rare events.

There are Linux users who suffer from frequent power outages.
> we used to get lots of bug reports which were because of corrupted databases

> Recovering the DB is somewhat pointless - you can just
> regenerate it from scratch, if under idle iopriority the indexing really has
> no user impact.

Yes, you can regenerate from scratch, but how do you detect when you have to do that? This concern was already mentioned in comment #2:

> I'm a little conflicted about this approach since when the index does get corrupted,
> it will be impossible for us to detect it. With our previous backend (xapian), we
> used to get lots of bug reports which were because of corrupted databases :(
Comment 10 Nate Graham 2018-11-26 21:26:08 UTC
40 files per sync seems reasonable for incremental additions after the DB has already been populated during the initial indexing operation. It seems like the place where this really gets people is during that initial indexing, where the system's responsiveness can be degraded due to the heavy IO. If we have a way to detect the initial indexing operation, maybe we could use a less aggressive sync policy there, either increasing the number of files before each sync, or switching to a time-based sync or something.

Stefan and/or Igor, does this idea make any sense?
Comment 11 Feng 2018-12-01 13:06:34 UTC
(In reply to Nate Graham from comment #10)
> 40 files per sync seems reasonable for incremental additions after the DB
> has already been populated during the initial indexing operation. It
> seems like the place where this really gets people is during that initial
> indexing, where the system's responsiveness can be degraded due to the heavy
> IO. If we have a way to detect the initial indexing operation, maybe we
> could use a less aggressive sync policy there, either increasing the number
> of files before each sync, or switching to a time-based sync or something.
> 
> Stefan and/or Igor, does this idea make any sense?

My laptop has 32 GB of memory and an SSD drive. But when Baloo is indexing, I have to reboot it manually, because everything freezes except the power button.

Is it possible for Baloo to do its indexing cached in memory instead of doing instant I/O on disk?
Comment 12 Stefan Brüns 2018-12-01 14:06:38 UTC
Eventually the data has to be flushed to disk. The flushing has to be done in a specific order, to guarantee the on-disk data is consistent.

You can of course delay the flush, but then you are just shifting the stutters from one time instant to a different one.
Comment 13 Nate Graham 2019-05-12 23:24:58 UTC
*** Bug 393741 has been marked as a duplicate of this bug. ***
Comment 14 Kai Krakow 2019-09-29 19:37:39 UTC
I added some patches before finding this bug. My findings are that disabling read-ahead on the database helps somewhat in low-memory situations, but the biggest problem is fsync: that call will actually sync the whole filesystem and not just the database file, and doing that constantly is toxic to performance. It's as simple as that. Here are the links: https://bugs.kde.org/show_bug.cgi?id=404057 and https://github.com/kakra/baloo/commits/fixes/bko-404057. Some of these patches may not be needed at all; some optimize for corner cases. But we should really turn off fsync at the very least.

If you don't want to disable fsync, then LMDB is probably the wrong tool to do the job. You'd then need some append-only database with garbage collection (LMDB is already acting a lot like this). I'm pretty sure LMDB is actually a bad choice for baloo, if, and only if, you expect it to be the only software needing to do IO. But after some research, I think LMDB is not the wrong tool, thus we need to adjust how Baloo uses it.

The devs of LMDB say that it is safe to use without fsync on any current Linux filesystem (it can lose transactions but it won't corrupt). It is not safe to use on some hypothetical filesystems (it could corrupt).

Can we please at least let the user decide and allow him to shoot his own foot? Maybe a config option or env variable?
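
An opt-in of this kind could be as small as the sketch below. The variable name `BALOO_DB_NOSYNC` is purely hypothetical and is not an existing Baloo setting; the sketch only shows how such an environment switch might be parsed so that the default stays on the safe (syncing) side.

```python
import os


def nosync_requested(environ=os.environ, name="BALOO_DB_NOSYNC"):
    """Return True only if the (hypothetical) opt-in variable is set to a truthy value.

    Anything unset, empty or unrecognised keeps the default of syncing on commit.
    """
    value = environ.get(name, "").strip().lower()
    return value in {"1", "true", "yes", "on"}
```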

Baloo already has some sort of recovery: if it fails to open the database, it will simply purge and recreate it. Maybe it could somehow detect corruption during use and act similarly? I'm not sure whether LMDB functions would return errors or simply crash. In the first case, it should be easy.
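
The "purge and recreate on open failure" behaviour can be illustrated with a stand-in index file. Everything here is an assumption for illustration: the `MAGIC` header, `CorruptIndexError` and the three functions are hypothetical and unrelated to the real LMDB file format or Baloo's code; a header check is also far weaker than real corruption detection.

```python
import os

MAGIC = b"IDX1"  # hypothetical header marking a valid index file


class CorruptIndexError(Exception):
    """Raised when the index file does not look valid."""


def open_index(path):
    """Open the index, raising CorruptIndexError if the header is wrong."""
    with open(path, "rb") as f:
        if f.read(len(MAGIC)) != MAGIC:
            raise CorruptIndexError(path)
    return path


def create_index(path):
    """Create a fresh, empty index."""
    with open(path, "wb") as f:
        f.write(MAGIC)
    return path


def open_or_rebuild(path):
    """Return (path, rebuilt): purge and recreate the index if it cannot be opened."""
    try:
        return open_index(path), False
    except (FileNotFoundError, CorruptIndexError):
        if os.path.exists(path):
            os.remove(path)
        return create_index(path), True
```

The same pattern would extend to errors raised during use, provided the database layer reports them as errors rather than crashing.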

I also like the time-based instead of count-based approach much more: Linux already flushes data after no more than 30s, why not just use the same amount?
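
A time-based policy might look like the following sketch, which uses the 30 s figure mentioned above as its default. `TimedBatcher` and its `commit` callback are hypothetical names, not Baloo code, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TimedBatcher:
    """Commit pending items once at least interval_seconds have passed
    since the previous commit, regardless of how many items are pending."""

    def __init__(self, commit, interval_seconds=30.0, clock=time.monotonic):
        self._commit = commit          # callable that receives the pending items
        self._interval = interval_seconds
        self._clock = clock
        self._pending = []
        self._last_commit = clock()

    def add(self, item):
        """Queue one item and commit if the interval has elapsed."""
        self._pending.append(item)
        if self._clock() - self._last_commit >= self._interval:
            self.flush()

    def flush(self):
        """Commit whatever is pending and restart the interval."""
        if self._pending:
            self._commit(self._pending)
            self._pending = []
        self._last_commit = self._clock()
```

Because the check only runs when an item is added, a trailing batch needs an explicit `flush()` (or a timer) once the indexer goes idle.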

Regarding fsync: I'm not sure if LMDB uses fsync or fdatasync, or if this is even a choice. The developers say in their documentation it's fsync; the strace by Riku says fdatasync. Whichever is used, it's a problem: you cannot expect users to use the software if it totally destroys their user experience.

Baloo should be designed around the idea that corruption can occur and luckily it's easy to recover from it: Just rebuild the database.

So the proposed solution is really about: How do we properly detect database corruption?
Comment 15 tagwerk19 2021-08-04 10:49:46 UTC
Is this still an issue ... ?

    ... To the extent that it can be pinned down to syncing writes
    to the index?

I think there are still things to look at - for example batching up the initial indexing when there are *very* *many* new files to index (Bug 394750), adjusting the number of files "indexed in a batch" when content indexing (Bug 373021), and dealing with many deleted items (possibly Bug 437754 or Bug 353874; I'm not sure there's a bug specifically about clearing up deleted items being slow).

I think these are more related to making sure you commit before you "risk" using swap, while still making maximum use of RAM so that you don't commit too often.

I don't think the attached patch

    https://bugs.kde.org/attachment.cgi?id=95923

that avoids the "sync" after each transaction was applied. This was also proposed (2019/09) here:

    https://bugs.kde.org/show_bug.cgi?id=404057#c12

It may be that "batching up" the indexing, implying fewer, larger transactions, reduced the advantage.

For Bug 400704, most of the reports date from 2017/2018.

I am tempted to flag this as "needs info" to see if there are other test cases that need to be looked at...