Bug 404057 - Uses an insane amount of memory (RSS/PSS) writing a *ton* of data while re-indexing unchanged files
Status: CONFIRMED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.54.0
Platform: Debian unstable Linux
Priority: NOR  Severity: normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-07 11:48 UTC by Martin Steigerwald
Modified: 2023-01-01 11:49 UTC
CC List: 12 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Reduce stack pressure (1.35 KB, patch) - 2019-09-28 15:37 UTC, Kai Krakow
Experimental: Reduce mmap by one magnitude (1.57 KB, patch) - 2019-09-28 15:56 UTC, Kai Krakow
Prepare simpler coding of environment flags (1.28 KB, patch) - 2019-09-28 20:40 UTC, Kai Krakow
Disable read-ahead of mmap access (1.11 KB, patch) - 2019-09-28 20:42 UTC, Kai Krakow
Don't fsync the file-system (1.18 KB, patch) - 2019-09-28 20:44 UTC, Kai Krakow

Description Martin Steigerwald 2019-02-07 11:48:57 UTC
SUMMARY
I see that baloo_file_extractor easily uses 5 GiB or more of RSS (resident memory). The Proportional Set Size, which attributes shared memory proportionately to all of the processes that share it, is almost as high. So it appears to me that the process uses almost all of that memory for itself.



STEPS TO REPRODUCE
1. Have it index a lot of files
2. Watch memory usage 
3. If you want to push it beyond all sanity:
   - have it go at the results of git clone https://github.com/danielmiessler/SecLists.git
   - here it eats the resources of a quite potent laptop with 16 GiB of RAM as if there were no tomorrow.

OBSERVED RESULT
Sample of smemstat -T:
   PID      Swap       USS       PSS       RSS User       Command
  4791     0,0 B  6136,7 M  6142,8 M  6169,7 M martin     /usr/bin/baloo_file_extractor

   PID      Swap       USS       PSS       RSS User       Command
  4791     0,0 B  4595,1 M  4598,2 M  4617,6 M martin     /usr/bin/baloo_file_extractor

Yes, there are times when Baloo even frees some memory again, just to use even more later on.

Granted, this laptop has 16 GiB of RAM, but this still appears excessive to me. I also see the machine actually swapping.

Also the disk I/O it generates is beyond anything that I would even consider to be remotely sane for a laptop or any desktop machine:

pidstat -p 4791 -d 1
Linux 5.0.0-rc4-tp520 (merkaba)         07.02.2019      _x86_64_        (4 CPU)

12:32:21      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:32:22     1000      4791  75736,00      0,00      0,00       4  baloo_file_extr
12:32:23     1000      4791  33348,00 111232,00      0,00       3  baloo_file_extr
12:32:24     1000      4791  54288,00      0,00      0,00       4  baloo_file_extr
12:32:25     1000      4791  20516,00 119616,00      0,00       2  baloo_file_extr
12:32:26     1000      4791  24296,00      0,00      0,00       2  baloo_file_extr
12:32:27     1000      4791  35532,00      0,00      0,00       3  baloo_file_extr
12:32:28     1000      4791  32548,00 113112,00      0,00       3  baloo_file_extr
12:32:29     1000      4791  26720,00      0,00      0,00       1  baloo_file_extr
12:32:30     1000      4791  24048,00 103496,00      0,00       6  baloo_file_extr
12:32:31     1000      4791   7636,00      0,00      0,00      71  baloo_file_extr
12:32:32     1000      4791  16208,00      0,00      0,00      36  baloo_file_extr
12:32:33     1000      4791  18048,00      0,00      0,00      67  baloo_file_extr
12:32:34     1000      4791  23236,00      0,00      0,00      63  baloo_file_extr
12:32:35     1000      4791  16700,00      0,00      0,00      61  baloo_file_extr
12:32:36     1000      4791  20736,00 122392,00      0,00      23  baloo_file_extr
12:32:37     1000      4791  26752,00      0,00      0,00      36  baloo_file_extr
12:32:38     1000      4791  42456,00      0,00      0,00       4  baloo_file_extr
12:32:39     1000      4791  25156,00 118104,00      0,00       2  baloo_file_extr
12:32:40     1000      4791  12828,00      0,00      0,00       1  baloo_file_extr
12:32:41     1000      4791  14512,00      0,00      0,00       3  baloo_file_extr
12:32:42     1000      4791   7384,00      0,00      0,00       0  baloo_file_extr
12:32:43     1000      4791   2316,00 420664,00      0,00       1  baloo_file_extr
12:32:44     1000      4791      0,00  56520,00      0,00       0  baloo_file_extr
12:32:45     1000      4791      0,00  75188,00      0,00       0  baloo_file_extr
12:32:46     1000      4791      0,00  55376,00      0,00       0  baloo_file_extr
12:32:47     1000      4791      0,00  64496,00      0,00      33  baloo_file_extr
12:32:48     1000      4791      0,00      0,00      0,00      85  baloo_file_extr
12:32:49     1000      4791      0,00      0,00      0,00      89  baloo_file_extr
12:32:50     1000      4791      0,00      0,00      0,00      86  baloo_file_extr
12:32:51     1000      4791     16,00      0,00      0,00      83  baloo_file_extr
12:32:52     1000      4791   2772,00    220,00      0,00      58  baloo_file_extr
12:32:53     1000      4791  28056,00      4,00      0,00       3  baloo_file_extr
12:32:54     1000      4791  81328,00      0,00      0,00       8  baloo_file_extr
12:32:55     1000      4791  71740,00      0,00      0,00       8  baloo_file_extr
12:32:56     1000      4791  46088,00      0,00      0,00       6  baloo_file_extr
12:32:57     1000      4791  44320,00      0,00      0,00       5  baloo_file_extr
12:32:58     1000      4791  29576,00      0,00      0,00       4  baloo_file_extr
12:32:59     1000      4791  41568,00      0,00      0,00       5  baloo_file_extr
12:33:00     1000      4791  31244,00      0,00      0,00       5  baloo_file_extr

12:33:00      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
12:33:01     1000      4791  23764,00      0,00      0,00       4  baloo_file_extr
12:33:02     1000      4791  24272,00      0,00      0,00       5  baloo_file_extr
12:33:03     1000      4791  19840,00      0,00      0,00       5  baloo_file_extr
12:33:04     1000      4791  22096,00      0,00      0,00       5  baloo_file_extr
12:33:05     1000      4791  14696,00      0,00      0,00       4  baloo_file_extr
12:33:06     1000      4791  14204,00      0,00      0,00       4  baloo_file_extr
12:33:07     1000      4791  12336,00      0,00      0,00       3  baloo_file_extr
12:33:08     1000      4791  23796,00      0,00      0,00       3  baloo_file_extr
12:33:09     1000      4791  21076,00      0,00      0,00       3  baloo_file_extr
12:33:10     1000      4791   8280,00 194116,00      0,00       2  baloo_file_extr
12:33:11     1000      4791    744,00 777584,00      0,00       4  baloo_file_extr

Yep, that is right: that is roughly 770 MiB in a single second!

12:33:12     1000      4791    160,00      0,00      0,00      39  baloo_file_extr
12:33:13     1000      4791     16,00      0,00      0,00      90  baloo_file_extr
12:33:14     1000      4791      0,00      0,00      0,00      53  baloo_file_extr
12:33:15     1000      4791      0,00      0,00      0,00     139  baloo_file_extr
12:33:16     1000      4791      0,00      0,00      0,00     103  baloo_file_extr
12:33:17     1000      4791      0,00  29072,00      0,00      88  baloo_file_extr
12:33:18     1000      4791      0,00  70980,00      0,00      68  baloo_file_extr
^C
Durchschn.:  1000      4791  19701,54  42669,68      0,00      26  baloo_file_extr

Yes, that is about 42 MiB/s! But on the other hand, the index size does not increase at nearly that rate. So what does it actually write there? The index is currently at 9,48 GiB.

Now I have a gem here:

   PID      Swap       USS       PSS       RSS User       Command
  4791     0,0 B  8615,9 M  8617,1 M  8630,8 M martin     /usr/bin/baloo_file_extractor

According to balooctl status during that time it indexed:
[…]SecLists/Passwords/Common-Credentials/10-million-password-list-top-100000.txt: OK
[…]SecLists/Passwords/Common-Credentials/10-million-password-list-top-1000000.txt

Seriously there are two things wrong with that:
- That file is *only* 8.2 MiB big
- There is never ever an excuse to use 8 GiB of RSS for file indexing.

I bet there should be a size limit on what to index. Baloo certainly should not try to index files which are several GiB in size.

And yes, I can tell it to exclude those, but then it's something else. In my opinion it is Baloo's responsibility to keep resource usage in check.

So in short: recent Baloo (I did not see this prior to KDE Frameworks 5.54, at least not to this extent) basically manages to hog a ThinkPad T520 with a Sandy Bridge dual core, 16 GiB of RAM, and a dual-SSD BTRFS RAID 1.

For now I let it run, in the hope that it eventually completes and stays quiet without me having to kill its processes, as it does not appear to respond to balooctl stop in a reasonable time either.


EXPECTED RESULT
More reasonable memory and I/O usage while indexing. Basically, Baloo should stay in the background. IMHO there is never, ever an excuse for baloo_file_extractor to use 8 GiB or more of RSS. Never… ever…


SOFTWARE/OS VERSIONS
Linux: Debian Unstable
KDE Plasma Version: 5.14.5
KDE Frameworks Version: 5.54
Qt Version: 5.11.3
Comment 1 Martin Steigerwald 2019-02-07 11:51:34 UTC
I stopped it, as the laptop became unresponsive due to Baloo's activity. This is not going to fly. I may let it do its work overnight or so, when I do not use the laptop.
Comment 2 Martin Steigerwald 2019-02-07 21:17:56 UTC
I disabled it completely for now, because an index size of 17 GiB is more than I am willing to accept:

LANG=C balooctl status
Baloo File Indexer is running
Indexer state: Idle
Indexed 542390 / 559358 files
Current size of index is 17.64 GiB

That index size grew while indexing maybe 3,000 new files. It was basically crunching through around 540,000 files for *hours*. During that time it added about 8 GiB to the index.

According to balooctl indexSize there is clearly a bug somewhere:

balooctl indexSize
Actual Size: 17,64 GiB
Expected Size: 6,31 GiB

           PostingDB:       1,70 GiB    73.408 %
          PositionDB:       2,49 GiB   107.739 %
            DocTerms:       1,15 GiB    49.863 %
    DocFilenameTerms:      56,76 MiB     2.397 %
       DocXattrTerms:            0 B     0.000 %
              IdTree:      10,19 MiB     0.430 %
          IdFileName:      42,19 MiB     1.781 %
             DocTime:      25,41 MiB     1.073 %
             DocData:      34,28 MiB     1.448 %
   ContentIndexingDB:     540,00 KiB     0.022 %
         FailedIdsDB:       4,00 KiB     0.000 %
             MTimeDB:      11,02 MiB     0.465 %

Just see actual versus expected size.

I am willing to keep this index for a while in case you'd like me to run some command on it.

I now see something else in ~/.xsession-errors:

"somefilename in home" id seems to have changed. Perhaps baloo was not running, and this file was deleted + recreated

This message appeared more than 150000 times in ~/.xsession-errors:

% grep -c "Perhaps baloo was not running" ~/.xsession-errors
153331

The files are on a BTRFS RAID 1 and most have not been changed… however, I have already stopped Baloo quite often due to excessive resource usage. So yes, it has not been running all the time.
Comment 3 Kai Krakow 2019-02-08 00:19:36 UTC
@Martin Thanks for pointing me here.

I can confirm the observations:

RSS can grow easily above 3-4 GB.

baloo_file_extractor generates a lot of I/O with high throughput (sometimes 100 MB/s), mostly while scraping PDF files (e.g. my Calibre library), up to the point that the whole desktop becomes unresponsive and laggy. It's mostly read accesses, with writes coming in bursts once in a while. Btrfs in particular has its problems with these access patterns. The DB is already created nocow.

The index file seems to be growing and growing. Last time I purged it when it reached 19 GB. This is about the point when the system becomes unusable due to IO stalls.

"balooctl" cannot really do anything: Run "balooctl stop" and it wouldn't stop (or restart instantly). Run "balooctl disable" and it will be back on next reboot. Run "balooctl start" and it says that another instance is already running even when there isn't. I'm not sure if baloo is currently even able to monitor and know its own status.

The VSS of at least two baloo processes is 256 GB. While I know that this is only allocated, not used, it still seems to have an effect on kernel memory allocation performance. The system feels snappier when I "killall baloo", even when baloo was idle and only used minor amounts of memory. It should probably just not do that. I'm not sure if this is due to using mmap, but if it is, it may explain a lot of the overwhelming I/O patterns.

Eventually baloo finishes if you let it run long enough. But the whole process repeats from scratch when rebooting the machine. The counter for indexed files grows by a huge amount after each reboot - as if it neither properly detects duplicates nor cleans up old entries. It looks like it detects all files as new/modified (which is not true) and adds them to the index again.

CPU usage was moderate and nothing I care about too much because it runs at low CPU priority.

System specs:

Linux 4.20.6-gentoo with CK patchset, i7-3770K, 16 GB RAM
BFQ-MQ IO scheduler
4-disk RAID-1 btrfs running through bcache on a 400G SSD caching partition
systemd with dbus-user-session

Baloo database directory is made nocow (otherwise I get very rhythmic IO noise from the harddisks as it seems to rewrite data over and over again, resulting in a lot of fragmentation and cow relocations)

Wishlist entry:
It should be possible to easily move baloo into a cgroup (maybe it could create one itself, or we could configure it to optionally join a systemd slice) so I could limit its memory usage. Modern kernels will limit cache usage that way, too. Currently, when baloo is running, it dominates the disk cache for its own purposes. OTOH, maybe it's just missing proper cache hinting via fadvise().

Limiting memory usage via cgroups is already pretty effective for browsers, see here:
https://github.com/kakra/gentoo-cgw

I already considered doing something similar for baloo but I think it's preferable if it would manage its own resource usage better by itself.

Baloo could also monitor loadavg and reduce its impact on system performance automatically. Here's an example which has been very successful:
https://github.com/Zygo/bees/commit/e66086516fdb9f9cc2d703fb8101f6116ce169e9

It inverts the loadavg function to calculate the current point-in-time load and adjusts its resource impact based on that, targeting a user-defined loadavg. This commit worked magic on system responsiveness while the daemon is running and working.
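
To illustrate the idea (a minimal sketch with my own names and constants, not code from bees or baloo):

    #include <cmath>   // std::exp

    // The kernel updates the 1-minute load average every 5 seconds as
    //   load = old_load * k + runnable_tasks * (1 - k),  with k = exp(-5/60).
    // Inverting that gives an estimate of the current number of runnable
    // tasks from two consecutive 1-minute samples:
    double instantaneousLoad(double prevLoad1, double currLoad1)
    {
        const double k = std::exp(-5.0 / 60.0);
        return (currLoad1 - prevLoad1 * k) / (1.0 - k);
    }

    // A worker could sample the load average every few seconds (e.g. via
    // getloadavg()) and sleep or pause its queue whenever this estimate
    // exceeds a user-defined target such as "number of CPUs - 1".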
Comment 4 Martin Steigerwald 2019-02-08 07:39:38 UTC
Thanks, Kai, for confirming this issue. I can also confirm that I was not able to stop any of this reliably via balooctl stop / suspend / whatever… just killing the Baloo process (SIGTERM was enough) helped.

Let's not discuss possible solutions, I'd say, in order to keep the report concise, and first let the developer have a say. I believe there is a bug somewhere, as I did not see this excessive behavior with earlier versions. Baloo has often been a resource hog, but not to this extent. I am not convinced that a high VSZ has a big influence, so I'd focus on RSS/PSS usage. Also, IMHO Baloo should not need to resort to cgroup limits - IMHO there is a bug here, and I'd first see about addressing it and then decide whether anything else is needed. No need to discuss any of that either. Let's just provide the developer with what we found, the facts. (I know it can be tempting to provide further guidance. I have been holding Linux performance analysis & tuning trainings for almost a decade.)
Comment 5 Kai Krakow 2019-02-08 08:08:18 UTC
@Martin
Yes, you're right. Let's keep this nice and tidy. Let's see how a solution works out. I'll then look into maybe opening new issues or looking into the source code to suggest a patch.
Comment 6 Martin Steigerwald 2019-04-16 13:10:28 UTC
This has now also happened for the second user on this laptop, totaling another 16 GiB of database, with baloo_file_extractor again at somewhere between 4.4 and 5 GiB of RSS.

So I disabled Baloo completely for the second user as well.
Comment 7 Kai Krakow 2019-09-28 15:37:37 UTC
Created attachment 122918 [details]
Reduce stack pressure

This patch reduces stack pressure by looping instead of recursing when skipping files to index. The recursion can add up a lot when skipping many files/directories in succession.
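
Purely illustrative sketch of the change (hypothetical names, not the actual patch):

    #include <deque>
    #include <string>

    // Stand-in for the real "should this entry be indexed?" filter.
    static bool shouldSkip(const std::string &path)
    {
        return !path.empty() && path.back() == '~';   // e.g. skip backup files
    }

    // Before: one recursive call per skipped entry, so skipping thousands of
    // files in a row consumes thousands of stack frames:
    //
    //   std::string nextToIndex(std::deque<std::string> &pending) {
    //       if (pending.empty()) return {};
    //       std::string f = pending.front(); pending.pop_front();
    //       if (shouldSkip(f)) return nextToIndex(pending);   // recursion
    //       return f;
    //   }
    //
    // After: loop until the first entry that is not skipped; stack use stays flat.
    std::string nextToIndex(std::deque<std::string> &pending)
    {
        while (!pending.empty()) {
            std::string f = pending.front();
            pending.pop_front();
            if (!shouldSkip(f))
                return f;
        }
        return {};
    }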
Comment 8 Kai Krakow 2019-09-28 15:56:54 UTC
Created attachment 122919 [details]
Experimental: Reduce mmap by one magnitude

This patch reduces the memory map size for LMDB by one order of magnitude (16 instead of 256 GB). After applying the patch, I purged the DB and restarted baloo.
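
In principle the change boils down to the LMDB map size, which is also the hard upper limit for the DB file. Roughly (illustrative values and flags, not the exact patch):

    #include <lmdb.h>
    #include <cstddef>

    MDB_env *openIndexEnv(const char *indexPath)
    {
        MDB_env *env = nullptr;
        mdb_env_create(&env);
        // The map size is LMDB's maximum database size. Baloo maps ~256 GiB
        // by default; the experimental patch uses 16 GiB instead.
        mdb_env_set_mapsize(env, std::size_t(16) * 1024 * 1024 * 1024);
        mdb_env_open(env, indexPath, 0, 0600);   // error handling omitted
        return env;
    }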

It churns along nicely now; I/O is down to less than 10 MB/s instead of the constant 50-100 MB/s before. Also, actions that obviously do a bunch of memory allocations in Plasma (like opening the app drawer) now run much more smoothly again (instantly instead of with a noticeable, if subjective, delay). The whole system feels much smoother again. I'm guessing that the constant dirty-page writebacks, page faults and VMM handling introduce a lot of TLB flushes because mappings into the process are constantly updated. It also seems to introduce a lot of I/O overhead. I'm not sure why that is, but this big mmap does seem to have drawbacks. A lot of random accesses into the mmap may cause unintentional read-ahead and unpredictable I/O patterns, and may dominate the cache, which is what I believe causes the excessive I/O behavior.

This patch (together with the previous patch) makes my system run much nicer again. I can actually use krunner again without causing a lot of additional IO and lags. My system has 32 GB of RAM.

Looking at all this, I wonder if LMDB is really the right tool. It is tempting to use it, but from the project documentation it seems to be intended as a read-mostly database. This is clearly not what baloo does with it, especially during re-indexing/updating or first indexing. The mmap size seems to be tightly bound to the maximum DB size which, looking at my above test results, limits the scaling of baloo a lot.

It should probably not be too difficult to swap LMDB with another key/value database better fitting the usage pattern (bursts of lots of write transactions with only occasional read transactions when a user actually searches for something). LMDB (as the DBE backing the OpenLDAP project) seems to be designed for exactly the opposite usage pattern.

Are there any more thoughts on this? Any idea which key/value DBE could fit better? What about multi-threading? The current code seems to run with only one thread anyway, despite using Qt's thread pool classes. I'd volunteer to invest some spare time into swapping out LMDB for something different.
Comment 9 Kai Krakow 2019-09-28 17:24:08 UTC
Here's more evidence of why LMDB may be a particularly bad choice for the workload applied by baloo: It is btree organized, and writing and maintaining btrees will result in a lot of random I/O. At some point in time, when the DB has become big enough or scrambled enough due to constant updates, this will backfire badly resulting in very bad I/O patterns.

https://blog.dgraph.io/post/badger-lmdb-boltdb/

Baloo should migrate to a key/value store that is much better at writing data and maintaining its internals. Read performance of the database should probably not be the primary concern; what matters is the performance of long-term writing and updating: it should maintain good read and write performance over time. According to the article, LMDB doesn't (unless you give it the full 256 GB of RAM and lock it into memory).

Researching a little further, we can find a quite different picture:
https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

It says that LMDB has exceptional write performance and maintains good performance overall. Maybe this would need some benchmarks, but it probably holds true only when the DB fully fits into memory all the time. And looking at the design description in that article, we can easily see the downsides: the database can only ever increase in size, even more so when writing concurrently (there's no locking, so during concurrent access it will append to the database). It also never re-organizes its internal structure; it just reuses memory blocks allocated from a free-blocks tree without taking HDD access patterns into account. And LMDB's design pays off best only with big values. I don't think that is what baloo stores.

The article further says that LMDB can (on hypothetical file systems) fail on Linux when not using fsync(). Was fsync() added to LMDB for such a hypothetical case? This would be fatal to system performance.

LMDB seems to be baked into a lot of KV databases due to its seemingly good performance.

So actually, this would need a lot more insight to decide whether LMDB is suitable for baloo (maybe it is but it isn't used optimally). Someone with more real-world experience of KV databases and associated usage patterns may comment on this.

Currently, limiting the mmap size helps a lot here. And as Martin mentioned, there's clearly a bug somewhere resulting in massive write workloads and exceptional growth of the database. Maybe it's just a really bad access pattern by coincidence that results in exceptionally bad behavior of LMDB. I was very happy with baloo's performance for a long time until it suddenly broke some day. I'm not even sure that's baloo's fault: judging from the commit subjects, the code hasn't undergone any substantial changes in a long time, only small fixes and tweaks. There is commit b0890aca71aa4f0fdabe65ee7b7fbd0bc844d8b8 after KF 5.27.0 which bumped the maximum index size from 5 GB to 256 GB. @Martin May this be around the time (end of 2016) when it broke for you? Your "balooctl indexSize" example seems to suggest there's a big rollover of copy-on-write operations leaving unused memory blocks behind (maybe too small to be effectively reused) and thus blowing up the DB file size.
Comment 10 Kai Krakow 2019-09-28 20:40:27 UTC
Created attachment 122925 [details]
Prepare simpler coding of environment flags

This simply prepares the following patches and introduces no functional change.
Comment 11 Kai Krakow 2019-09-28 20:42:16 UTC
Created attachment 122926 [details]
Disable read-ahead of mmap access

We should not use read-ahead when accessing the database, because it can introduce thrashing in low-memory situations as read-aheads start dominating the cache. This also takes some pressure off the file system.
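
Illustratively, this corresponds to opening the LMDB environment with the MDB_NORDAHEAD flag (the patch may wire it up differently):

    #include <lmdb.h>

    int openWithoutReadahead(MDB_env *env, const char *indexPath)
    {
        // MDB_NORDAHEAD asks LMDB to turn off OS read-ahead on its map, so a
        // random page fault no longer drags in neighbouring pages and floods
        // the page cache on a memory-constrained desktop.
        return mdb_env_open(env, indexPath, MDB_NORDAHEAD, 0600);
    }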
Comment 12 Kai Krakow 2019-09-28 20:44:49 UTC
Created attachment 122927 [details]
Don't fsync the file-system

Let's not stress the system with fsync() after each DB transaction. This database can easily be rebuilt in case it crashes. This patch removes latency spikes and input lag from the desktop environment while the indexer is running. It also seems to increase general indexing throughput while lowering the performance impact.
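
The effect is roughly that of opening the LMDB environment with MDB_NOSYNC (how the patch achieves it may differ):

    #include <lmdb.h>

    int openWithoutFsync(MDB_env *env, const char *indexPath)
    {
        // MDB_NOSYNC: don't flush/fsync when committing a transaction.
        // Commits become cheap, but a system crash can lose recent
        // transactions or corrupt the DB - acceptable here only because the
        // index can be rebuilt from scratch.
        return mdb_env_open(env, indexPath, MDB_NOSYNC, 0600);
    }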
Comment 13 Kai Krakow 2019-09-28 20:49:10 UTC
I've added some patches to my experimental patchset after crunching through some of the documentation and articles available. The system responsiveness has improved a lot. Can anyone confirm that these patches help?
Comment 14 Kai Krakow 2019-09-28 22:10:16 UTC
Meanwhile, my patched indexer has started the content indexing phase. I also added back all the expensive directories I had excluded previously. It's currently indexing with a mixed R/W workload of up to 200 MB/s (most of the time 50-100 MB/s) without much of an impact on desktop performance. Looks good so far.
Comment 15 Martin Steigerwald 2019-09-28 22:33:00 UTC
Kai, thank you very much for your work on this! About an alternative to LMDB… I am not sure at the moment. Will think about it.
Comment 16 Martin Steigerwald 2019-09-28 22:34:49 UTC
I bet I can just compile baloo via kdesrc. Does it have many dependencies on KF libraries? I do not want to break my system at this point in time, and it has been a long time since I last used kdesrc.
Comment 17 Kai Krakow 2019-09-28 22:36:10 UTC
(In reply to Martin Steigerwald from comment #15)
> Kai, thank you very much for your work on this! About an alternative to
> LMDB… I am not sure at the moment. Will think about it.

Currently, with my patches, it seems mostly fine with LMDB. The next few days of usage will paint a better picture. I think baloo just needs to handle LMDB a little more optimally - for example, does it use batched writes?
Comment 18 Kai Krakow 2019-09-28 22:40:07 UTC
(In reply to Martin Steigerwald from comment #16)
> I bet I can just compile baloo via kdesrc. Does it have many dependencies on
> KF libraries? I do not like to break my system at this point in time and it
> has been a long time I last used kdesrc.

I have no idea. I'm on Gentoo and just did "git format-patch origin/master", then symlinked my baloo Git src to /etc/portage/patches/kde-frameworks/baloo. So at least in Gentoo, that's a single package. Here are the deps from Gentoo:

DEPEND="
        $(add_frameworks_dep kconfig)
        $(add_frameworks_dep kcoreaddons)
        $(add_frameworks_dep kcrash)
        $(add_frameworks_dep kdbusaddons)
        $(add_frameworks_dep kfilemetadata)
        $(add_frameworks_dep ki18n)
        $(add_frameworks_dep kidletime)
        $(add_frameworks_dep kio)
        $(add_frameworks_dep solid)
        $(add_qt_dep qtdbus)
        $(add_qt_dep qtdeclarative)
        $(add_qt_dep qtgui)
        $(add_qt_dep qtwidgets)
        >=dev-db/lmdb-0.9.17
"

I could attach my binary package if you want to try (it's basically a tar.gz, you could extract just the binaries and libs).
Comment 19 Kai Krakow 2019-10-07 15:16:16 UTC
After testing this for a few days, with my patches it works flawlessly: no performance impact, krunner finds results immediately without thrashing the HDD, etc. That is, until you reboot: while with the patches it no longer has any perceived negative impact on desktop responsiveness, I see that Baloo still re-indexes all files.

Steps to reproduce:

1. Start baloo for the first time (aka "remove your index")
2. Let it complete a full indexing cycle
3. Observe "balooctl indexSize" to see that there's no file left to index
4. Also observe that it has data in "ContentIndexingDB"
5. Reboot
6. Observe "balooctl indexSize" to see that "ContentIndexingDB" changed back
   to 0 bytes
7. Observe "balooctl status" and wait until it finished checking for missing
   files
8. Observe both commands to see how baloo now refills the ContentIndexingDB
   and adds all your files to the index again, resulting in double the amount
   of files after finishing the second run
9. From now on, on every reboot, the index only grows, constantly forgetting
   the ContentIndexingDB and re-adding all files.

This behavior is with and without my patches.

@Martin Is this what you've seen, too?
Comment 20 Martin Steigerwald 2019-10-07 18:41:10 UTC
Dear Kai,

(In reply to Kai Krakow from comment #19)
> After testing this a few days, with my patches it works flawlessly: No
> performance impact, krunner finds result immediately without thrashing the
> HDD, etc. That is, until you reboot: While with the patches it has no longer
> any perceived negative impact on desktop responsiveness, I see that Baloo
> still re-indexes all files.
> 
> Steps to reproduce:
> 
> 1. Start baloo for the first time (aka "remove your index")
> 2. Let it complete a full indexing cycle
[…]
> This behavior is with and without my patches.
> 
> @Martin Is this what you've seen, too?

I did not verify every step exactly as you did, but I have seen in the last one or two weeks that Baloo re-indexes all files several times. For me it did not happen on every reboot, but sometimes. I did not notice this behavior as clearly before, as I had Baloo disabled because it consumed too much memory. I enabled Baloo on my main laptop again when KDE Frameworks 5.62 entered Debian unstable, because I read it contains a lot of improvements, and I indeed found that Baloo indexes fewer files. I have read about a size limit and better filters for files with certain filename endings.

I did not yet manage to test with your patches, but I can certainly confirm that Baloo from KDE Frameworks 5.62 re-indexes all files. Shall I open a new bug report about that, if there isn't one already?
Comment 21 Kai Krakow 2019-10-08 11:53:30 UTC
@Martin

I think your bug report is already really about this issue: Re-indexing all files over and over again and consuming a lot of memory and IO that way.

The performance aspects of this are already covered by another bug report:
https://bugs.kde.org/show_bug.cgi?id=356357
Comment 22 Martin Steigerwald 2019-10-08 18:44:45 UTC
@Kai: Fine with me. I don't care all that much which bug report is about which problem. Do you already have any idea why it re-indexes all the files?
Comment 23 Kai Krakow 2019-10-08 18:48:49 UTC
I'll have a look at that soon. First I'd like to get the "Reduce stack pressure" patch upstreamed (also as a learning exercise, because this is my first KDE contribution). I've uploaded it to Phabricator and already received review feedback.

I have some ideas about what may be causing this problem (loss of the ContentIndexingDB), which in turn is probably the immediate cause of the DB growing and the increasing I/O pressure.
Comment 24 Nate Graham 2019-10-09 14:38:18 UTC
Nice work. Would you be interested in submitting the other patches on phabricator too?
Comment 25 Kai Krakow 2019-10-09 15:00:08 UTC
Yes, that's my plan. But I'd like to refine them a bit first. Especially turning fsync() off seems to invite a big controversy about whether it should be done or not. So I will research the side effects a little more and try to come up with a better solution.

FWIW, I'd like to submit a read-ahead patch next (it fixes low-memory problems), and then I'd like to fix the problem of the ContentIndexingDB getting dropped for some reason, which is, at least for Martin and me, the most prominent origin of ever-increasing I/O pressure and memory usage. Later I'll look into reducing the transaction commits dynamically, which should relax the fsync problem, and maybe even into sizing the mmap dynamically (which should be possible according to some initial research).
Comment 26 Kai Krakow 2019-10-09 15:05:14 UTC
Comment on attachment 122918 [details]
Reduce stack pressure

Stack pressure patch obsoleted by https://phabricator.kde.org/D24502
Comment 27 Kai Krakow 2019-10-11 18:21:44 UTC
Comment on attachment 122925 [details]
Prepare simpler coding of environment flags

Has been merged into master
Comment 28 Kai Krakow 2019-10-11 20:30:51 UTC
(In reply to Kai Krakow from comment #19)
> After testing this a few days, with my patches it works flawlessly: No
> performance impact, krunner finds result immediately without thrashing the
> HDD, etc. That is, until you reboot: While with the patches it has no longer
> any perceived negative impact on desktop responsiveness, I see that Baloo
> still re-indexes all files.
> 
> […]
> This behavior is with and without my patches.
> 
> @Martin Is this what you've seen, too?

Following up with more details:

The problem seems to be the following:

After reboot, the indexer finds all files as changed. For every file in the index, it will log to stdout/stderr:

"path to file" id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created

This results in a lot of transactions to the database, blowing it up in size (due to its lock-free implementation it appends to the database before freeing the old data) and creating high I/O pressure due to a lot of fsync() calls in short succession.

What follows: after removing all the seemingly changed files from the database, it re-indexes all those files. This in turn appends to the database again, it seems, probably because it is unlikely to find big enough "holes" at the beginning of the database: although a lot of data has been removed, the freed space has probably been filled by metadata updates, leaving no room to put the content index data back in.

This access pattern adds up after some time, spreading data out more and more and leading to very random access patterns. The kernel starts to struggle with the mmap because it constantly pages in new pages: due to the random and widespread access patterns, it becomes less and less likely that memory pages are already resident. Access behavior becomes more and more seeky; the database contents could be said to be too fragmented. This introduces high desktop latency because baloo starts to dominate the cache with its mmap. After all, we should keep in mind that LMDB's design is made for systems primarily running only the database, not mixed with desktop workloads.

The phabricator site already has some valuable analysis and ideas of this which I collected here: https://phabricator.kde.org/T11859

Things that currently help here a lot:

  - Remove fsync from baloo via patch (this seems to have the biggest impact)
  - Limiting the working set memory baloo can use by using cgroups

Removing fsync from baloo could mean that the database is not crash-safe. Thus, I suggest not using my fsync patch upstream until such situations have been tested extensively (I do not know how to do that; it's a tedious task depending on a vast number of factors) or until someone comes up with some clever recovery/transaction idea. Maybe the LMDB author has some more insight on this.

Limiting memory with cgroups helps because cgroups can limit RAM usage by accounting for both heap and cache usage: it effectively keeps baloo from dominating the cache and thus impacting desktop performance too much. The read-ahead patch reduces additional pressure on cache occupancy. It should also be possible to use madvise()/fadvise() to actively tell the kernel that baloo no longer uses some memory, or doesn't plan to use it in the future. I'm not sure whether baloo and/or LMDB use these functions, or how they use them.
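
To illustrate the kind of hinting I mean (a sketch, not actual baloo or LMDB code):

    #include <sys/mman.h>    // madvise()
    #include <fcntl.h>       // posix_fadvise()
    #include <sys/types.h>
    #include <cstddef>

    void hintDoneWithData(void *mapAddr, std::size_t len, int fd, off_t off)
    {
        // For a mapped region we are done with: drop the pages now; they get
        // re-read from the file if ever touched again, instead of lingering
        // and pushing other programs' pages out of the cache.
        madvise(mapAddr, len, MADV_DONTNEED);

        // Same idea for plain file reads done outside the mmap (e.g. the
        // documents being extracted): tell the kernel not to keep them cached.
        posix_fadvise(fd, off, static_cast<off_t>(len), POSIX_FADV_DONTNEED);
    }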

Also, I wonder if LMDB uses MAP_HUGETLB. It may be worth checking if flipping this setting improves or worsens things because I can think of different scenarios:

 1. hugetlb uses 2 MB page size which could reduce the IOPS needed to work
    on spatially near data in the DB (good)
 2. 2 MB page size could increase the IO throughput needed when paging DB data
    in, thus negatively impacting the rest of the system (bad)
 3. LMDB should grow the database in bigger chunks to reduce external
    fragmentation of the index file, hugetlb could help that (undecided)
 4. hugetlb with very random, spread-out access patterns could increase the
    memory pressure (bad)
 5. 4k pages with very random access patterns could reduce memory pressure
    (good)
 6. hugetlb would improve sequential access patterns by reducing IOPS pressure
    (undecided)
 7. hugetlb would reduce TLB lookups in the processor, said to have an up
    to 10% performance improvement of memory intensive workloads (good)
 8. hugetlb can introduce allocation stalls which leads to very perceivable
    lags in desktop performance because the kernel more likely has to
    defragment memory to huge page allocations (bad)
 9. There are systems out there that support 1 GB page size; we definitely don't
    want that - it would effectively lock the whole DB into memory (bad)

Maybe - and we can see discussions around this topic in Phabricator - it makes sense to put more effort into designing the database scheme and access patterns around one of the above scenarios and optimizing for it, whichever fits best. The current consensus seems to be that it isn't really optimized around any design pattern - it just does its thing. And of course some bugs need fixing, like the one discarding all the contents on each reboot. But that seems to be tightly coupled with some of the design decisions that went into the database scheme.

Also, in the bees project (https://github.com/Zygo/bees) for example, the author found that using mmap without memlock performs very badly on a system busy with other tasks. So he decided to lock anonymous memory and use a writeback thread which writes data back in chunks big enough to avoid introducing too much fragmentation. LMDB currently does a similar thing: writing to the database is not done through the mmap. But memory isn't locked, and locking memory isn't an option here, as the DB potentially gets bigger than the system's RAM.

In this light, I wonder if it's possible for LMDB (or some other DB engine) to be mmap-based but use some sort of journal: locking the journal into memory (mmap-backed) and using a writeback thread that writes the journal to the DB at regular intervals could work very well. It would double the amount of data written, though. Baloo already seems to follow a similar idea: it batches updates into transactions of 40 files. But that approach is not optimal from various perspectives. I'd like to work on this aspect next.
Comment 29 Kai Krakow 2019-10-11 21:07:20 UTC
During the re-index job after reboot, baloo shows very strange behavior when evaluating the indexSize:

$ balooctl indexSize
File Size: 6,83 GiB
Used:      25,48 MiB

           PostingDB:   1.018,96 MiB  3998.360 %
          PositionDB:       1,73 GiB  6957.158 %
            DocTerms:     909,35 MiB  3568.271 %
    DocFilenameTerms:     131,17 MiB   514.700 %
       DocXattrTerms:       4,00 KiB     0.015 %
              IdTree:      31,09 MiB   121.980 %
          IdFileName:     107,08 MiB   420.172 %
             DocTime:      67,19 MiB   263.642 %
             DocData:      67,14 MiB   263.473 %
   ContentIndexingDB:       3,99 MiB    15.650 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:      12,53 MiB    49.172 %


Also, "File Size" increases with every reboot while "Used" stays around the same when compared after baloo has decided to idle.
Comment 30 Martin Steigerwald 2019-10-11 21:15:34 UTC
(In reply to Kai Krakow from comment #28)
> (In reply to Kai Krakow from comment #19)
[…]
> Following up with more details:
> 
> The problem seems to be the following:
> 
> After reboot, the indexer finds all files as changed. For every file in the
> index, it will log to stdout/stderr:
> 
> "path to file" id seems to have changed. Perhaps baloo was not running, and
> this file was deleted + re-created

Kai, please see my comment #2 of this bug report:

https://bugs.kde.org/show_bug.cgi?id=404057#c2

I got exactly the same message. I have just been rereading what I wrote, as I was no longer aware of it.

So yes, that is indeed an important issue here. I believe this bug is about at least two or three independent issues, but as you said, let's have it be about the re-indexing of files. I bet getting rid of needless re-indexing will be one of the most effective ways to sort out the performance issues with Baloo. I changed the bug subject accordingly.
Comment 31 Kai Krakow 2019-10-11 21:27:29 UTC
As far as I understand from researching the discussions in phabricator, this problem won't be easy to fix as it is baked into the design decision that defined the database scheme.

Based on the fact that the DocId (which is used to determine whether a file still exists, has changed, or was just moved unmodified) is a 64-bit value composed of a 32-bit st_dev (device id) and a 32-bit ino (inode number), I see two problems here (a small sketch of that packing follows the two points below):

1. btrfs uses 64-bit inode numbers, at some point, numbers will overflow and the baloo database becomes confused.

2. multi-dev btrfs (and I think you also use that, as I do) may have an unstable st_dev number across reboots, resulting in changed DocIds every once in a while after reboot
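
A simplified sketch of that packing (illustrative only, not a copy of the real code):

    #include <sys/stat.h>
    #include <cstdint>

    // Both values are truncated to 32 bits, so a 64-bit inode number (btrfs)
    // can eventually collide, and an unstable st_dev (multi-device btrfs,
    // NFS, FUSE) yields a different id for the same file after a reboot.
    uint64_t docIdFor(const struct stat &st)
    {
        return (uint64_t(uint32_t(st.st_dev)) << 32) | uint32_t(st.st_ino);
    }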

The phabricator discussions point out that other (primarily network) file systems suffer the same problem: They have unstable st_dev values maybe even after reconnect. User space file systems even depend on mount order for the st_dev value.

Changing this is no easy task; it would require a format change which either invalidates your DB or needs a migration. So I'm currently evaluating whether it makes sense to switch to a key/value store that doesn't rely on mmap (as it has clear downsides on a typical desktop system). This would make it easy to change the database scheme in the same step, as the index would have to be recreated anyway. I'm currently digging into Facebook's RocksDB; it looks mostly good except that it was optimized solely around flash-based storage.
Comment 32 Kai Krakow 2019-10-11 21:43:14 UTC
(In reply to Martin Steigerwald from comment #30)
> I believe this bug is at
> least about two or three independent issues, but as you told, let's have it
> about the re-indexing files thing. I bet getting rid of needlessly
> re-indexing files will be one of the most effective to sort out performance
> issues with Baloo. I changed bug subject accordingly.

That's why I started a task to better distinguish between which problem is what in Phabricator:
https://phabricator.kde.org/T11859

And that's why I suggested a few comments ago to concentrate on fixing the re-indexing bug first.

But it turned out to be not that easy. Together with the other performance issues and especially the tight coupling of mmap, access patterns, and available memory, I think it's worth rethinking if LMDB is still the right tool:

mmap introduces a lot of problems and there's no easy way around them. There are too many things to consider when optimizing access patterns. It's unlikely to happen anytime soon, looking at the problems already exposed by the database scheme.

LMDB seems to be designed around the idea of being the only user of system RAM, or at least of only using a smallish part of it (which may not be that small if you have huge amounts of RAM). That's unlikely to be the situation on systems where baloo is used.

Bad design choices have already been made and been meshed deeply into the database scheme, which makes it difficult to migrate existing installations.

BTW: Following up on your comment #2 was totally unintentional but actually I followed up and explained why that happens. ;-)
Comment 33 Kai Krakow 2019-10-11 21:46:19 UTC
Also, LMDB is totally the wrong tool when using 32-bit systems because your index cannot grow beyond a certain size before crashing baloo.

I'm not sure if 32-bit systems are still a thing - but if they are, the decision for LMDB was clearly wrong.
Comment 34 Martin Steigerwald 2019-10-11 21:53:43 UTC
I would not worry all that much about redoing the database for a format change. In the end, what we have now re-indexes everything anyway. And yes, I have a multi-device BTRFS, actually a BTRFS RAID 1.

I wonder whether Baloo would be better off by using filesystem UUID together with 64 bit inode number as an identifier. However using a complete filesystem UUID may need too much storage, I don't know.

Another idea would be to mark each indexed file with a kind of ID or timestamp using extended attributes. However, that might get lost as well, and it won't work on filesystems that do not support those.

A third idea would be to write a 32-bit identifier as a filesystem ID into a hidden file below the root directory of the filesystem, or into a hidden subdirectory of it. This would at least avoid using an identifier that could change. It would not solve the issue that 32 bits are not enough to store a 64-bit inode number. However, this might be the change that is easiest to apply short-term.
Comment 35 Martin Steigerwald 2019-10-11 21:58:19 UTC
I like the idea of using one DB per filesystem. This way you can save the complete filesystem UUID and/or other identifying information *once* and use the full 64 bits for the inode number.
Comment 36 Kai Krakow 2019-10-11 21:58:59 UTC
Oh, nice... Sometimes it helps to talk about a few things.

I could think of the following solution:

Add another UUID->CounterID mapping table to the database; that is easy to achieve. Every time we encounter a new UUID, we set the CounterID to one above the maximum value in the DB and use that as a filesystem identifier.

We can now bit-reverse the CounterID so that the least significant bits switch position with the highest. The result is XOR'ed with the 64-bit inode number. Et voilà: there's a DocId.
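
Roughly like this (a sketch of the proposal, with made-up names):

    #include <cstdint>

    // Reverse the bit order of a 64-bit value so that a small, monotonically
    // increasing counter ends up occupying the *high* bits.
    uint64_t reverseBits(uint64_t v)
    {
        uint64_t r = 0;
        for (int i = 0; i < 64; ++i) {
            r = (r << 1) | (v & 1);
            v >>= 1;
        }
        return r;
    }

    // fsCounter: small id allocated from the UUID->CounterID table
    // inode:     full 64-bit inode number reported by the filesystem
    uint64_t makeDocId(uint64_t fsCounter, uint64_t inode)
    {
        return reverseBits(fsCounter) ^ inode;
    }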

What do you think?
Comment 37 Martin Steigerwald 2019-10-11 22:04:01 UTC
It is probably too late for me, or I am not into programming enough at the moment, to fully understand what you propose. :) It may be something to bring to the suitable Phabricator task, maybe:

Overhaul Baloo database scheme: https://phabricator.kde.org/T9805
Comment 38 Kai Krakow 2019-10-11 22:04:26 UTC
(In reply to Martin Steigerwald from comment #35)
> I like the idea to use one DB per filesystem. This way you can save the
> complete filesystem UUID and/or other identifying information *once* and use
> the full 64 bit for the inode number thing.

Such things are always difficult (the same goes for your hidden ID file in the root of an FS): it requires permissions you may not have, so you may decide to use the topmost directory you have write permission to. And at that point I'd say the storage path is effectively undefined.

Another solution: name the index files by UUID and store them at a defined location. But then we have other problems:

Do you really want multiple multi-GB LMDB files mapped at once into RAM? With even more chaotic random access patterns (and their potential to push your precious cache out of memory)? Also, multi-file databases are hard to keep in sync with each other: at some point we may need to ensure integrity across all the files. This won't end well. I'm all in for a single-DB approach.
Comment 39 Martin Steigerwald 2019-10-12 08:39:39 UTC
Good morning Kai. Overnight I think I got your idea about the UUID->CounterID mapping table. Sounds like an approach that can work. About… the several-databases thing… I get the disadvantages you mention. Of course it could use PolicyKit to ask for permission to create a folder… however… usually Baloo is also *per user*, not *per filesystem*. I have two users with different Baloo databases on my laptop, one within ecryptfs layered on top of BTRFS.
Comment 40 Kai Krakow 2019-10-12 13:38:59 UTC
Further research confirms: btrfs has unstable device ids because it exposes subvolumes as virtual block devices without their own device nodes in /dev. Thus, device id numbers are allocated dynamically at runtime by the kernel; the same happens for NFS and FUSE file systems. The latter usually even have unstable inode numbers.

So currently, baloo is actually only safe to use on ext2/3/4 and xfs as the most prominent examples. On many other filesystems it will reindex files, and will even be unable to return proper results because reverse mapping of DocIds to filesystem paths is unreliable.

This problem is deeply baked into the design of Baloo.

My idea of merging device id and ino into one 64-bit integer wouldn't need much modification to the existing code/storage format in theory. But apparently this would make the reverse mapping impossible, because the functions couldn't extract the device id from the DocId to convert it back to a mount path.

Additionally, btrfs shares the UUID between all subvolumes, so using it would produce duplicate inode numbers, which would confuse baloo. Btrfs has UUID_SUB for this instead.

After all, it seems we'd need a specialized GUID generator per filesystem type. Since GUID formats may differ widely, I'd suggest creating a registry table inside the database, much like my counter ID idea outlined above. Each new GUID would simply be registered in the table with a monotonically increasing number, which can be used as the device ID part of the DocId. We'd still need to expand the DocId eventually, but temporarily this would do.
Comment 41 tagwerk19 2021-04-28 11:55:52 UTC
(In reply to Kai Krakow from comment #40)
> Further research confirms: btrfs has unstable device ids because it exposes
> subvolumes as virtual block devices without their own device node in /dev.
This has resurfaced in Bug 402154.
There are several reports related to openSuSE - BTRFS with multiple subvols....
    https://bugs.kde.org/show_bug.cgi?id=402154#c12
Comment 42 Joachim Wagner 2022-01-12 17:15:24 UTC
In addition to btrfs's mount-time allocated device numbers, the device numbers of dm-crypt devices can also be an issue for users with multiple such devices as the device minor numbers are not stable across restarts.

(I assume the numbers depend on the timing of luksOpen for each device, further assuming backing devices are probed in parallel. This may be specific to the Linux distribution and whether the same passphrase is used for the devices. When I find the time, I'll create new Luks key slots with substantially different iter-time to test this.)