Hi, for some unexplained reason my system was slow. I checked and found baloo_file_extractor was consuming 1.2 GB of memory! Baloo indexing is now disabled on my system (furthermore, I do not need it). It should be possible to act on memory/indexing settings. Regards, Steph

Reproducible: Always
Same here (except in my case it can eat more than 3 GB of memory). I believe that bug https://bugs.kde.org/show_bug.cgi?id=332421 should be reopened.
I'm afraid a generic "consumes too much memory" doesn't give us much information on how to fix this. This bug is very specific to a kind of file which was being indexed. It could even be a bug in the underlying library used to fetch the metadata from the file. Please reopen the bug if you're willing to provide more information. Relevant info which could be useful:
1. Try reproducing the issue with a fresh index (balooctl disable && balooctl enable) — this is a good way to start.
2. Try excluding some folders which are being indexed, and possibly try to track the problem file down.
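The fresh-index suggestion above can be sketched as a short shell session (a sketch assuming the standard `balooctl` CLI from Frameworks 5; run it as the affected user while watching memory in System Monitor or htop):

```shell
# Stop the indexer and mark it disabled; this also lets you safely
# remove or rebuild the on-disk index.
balooctl disable

# Re-enable it; indexing restarts and rebuilds the index from scratch.
balooctl enable

# Watch what is currently being indexed, and check overall progress.
balooctl monitor
balooctl status
```

If the runaway memory use returns during the rebuild, `balooctl monitor` should show which file was being indexed at that moment, which is the specific information requested above.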
Maybe you are not using your computer to store files... Take a whole directory tree (roughly 1 GB or so) of mixed files: pictures (raw photos, high- and low-resolution JPEGs), SVG files, LibreOffice documents, C/C++ development files (e.g. the Android source tree), plus a Thunderbird IMAP account with more than 1000 messages. Lots of files get generated on our computers that we cannot fully control. I mean a classic developer's machine; you may simply not have tried with 1 GB and more of files. This should easily trigger the problem.
(In reply to Vishesh Handa from comment #2)
> 2. Try excluding some folders which are being indexed, and possibly try and
> track the file down.

This doesn't help. Even excluding all of my home directory has no effect on baloo_file_extractor's memory consumption.

> 1. Try reproducing the issue with a fresh index (balooctl disable &&
> balooctl enable) and a good way to start.

This does help; at least now baloo_file_extractor consumes a reasonable amount of RAM. So at the moment it looks like a bug involving old index files. I'll send more information if it starts eating memory again.
This is an old bug, but after the recent upgrade to Frameworks 5.80.0, Baloo gobbles up a tremendous amount of memory as it re-indexes all files in the $HOME folder. It doesn't appear to free memory as it indexes, and my system becomes laggy and unresponsive.

STEPS TO REPRODUCE
1. You may need a $HOME directory with a lot of files. My personal case is 318 GB of data, ~1M files, ~76K subdirectories, of all types: documents, audio, video.
2. Upgrade to Frameworks 5.80.0 and reboot.
3. Upon logging in, Baloo will want to re-index all files. Use "System Monitor" to observe baloo_file_extractor and baloo_file.

OBSERVED RESULT
The "Shared Memory" attribute of baloo_file_extractor rises continuously instead of staying steady or falling. The "Memory" attribute is similarly high, at 1 GB. In my case Shared Memory easily reaches 3.4 GB and my swap file grows to 5.4 GB, with no other apps running.

EXPECTED RESULT
Memory usage of Baloo should stay roughly constant rather than rising indefinitely and spilling into swap.

SOFTWARE/OS VERSIONS
Operating System: KDE neon 5.21
KDE Plasma Version: 5.21.2
KDE Frameworks Version: 5.80.0
Qt Version: 5.15.2
Kernel Version: 5.4.0-67-generic
OS Type: 64-bit
Graphics Platform: X11
Memory: 7.7 GiB of RAM

ADDITIONAL INFORMATION
I'm attaching some screenshots of System Monitor and htop. Note the memory, CPU and swap usage; they are all very high and, as a side effect, my laptop is not responsive. I had to let it run overnight to finish indexing, but even then, after a reboot it wanted to re-index everything again(!).
Created attachment 136761 [details] System Monitor showing baloo processes
Created attachment 136763 [details] htop showing baloo processes
Created attachment 136764 [details] System Monitor showing baloo processes as a history graph
For me baloo has been acting up every now and then, and I'd like to finally get to the bottom of this. The behavior is currently catatonic; after posting this issue I will kill baloo but not change any configuration or data, so the behavior can be reproduced if someone wants to continue researching this issue.

1. baloo_file_extractor takes a lot of CPU and memory. Here's its line in htop:
----8<----
  PID USER  PRI NI VIRT  RES   SHR  S CPU% MEM% TIME+   Command
74566 odeda  39 19 257G 10.1G 9006M S 102. 32.2 8h02:42 /usr/bin/baloo_file_extractor
----8<----
(It's a 4-core system; the CPU usage looks like a single thread that tries to take up an entire CPU but is slowed down a bit by IO on my fast NVMe, and the "over 100%" is a sampling error on the part of htop.)

2. `balooctl monitor` shows almost no activity, with occasional bursts of a couple dozen entries that look like this:
----8<----
Indexing: /home/odeda/.cache/mozilla/firefox/i1m74zv1.default/cache2/entries/CE2BB927E036CFCEE27E7795DFB198E7C41A14B6: Ok
----8<----
It should not be indexing `~/.cache`, as `~/.config/baloofilerc` has this:
exclude folders[$e]=$HOME/.cache/,$HOME/mnt/,$HOME/snap/,[and a few other things]
The weird excluded-folder behavior may have something to do with the fact that I have a trailing slash on my $HOME:
----8<----
$ balooctl config show excludeFolders
kf.baloo: Folder cache: std::vector("/home/odeda//.cache/": excluded, "/home/odeda//snap/": excluded, "/home/odeda//mnt/": excluded, "/home/odeda/": included)
/home/odeda//.cache/
/home/odeda//snap/
/home/odeda//mnt/
----8<----

3. The index file is huge (about 19 GB), which doesn't make a lot of sense to me.
`balooctl indexSize` has this to say:
----8<----
File Size: 18.75 GiB
Used:      948.13 MiB

PostingDB:           2.93 GiB   316.627 %
PositionDB:         85.44 MiB     9.011 %
DocTerms:            1.39 GiB   149.920 %
DocFilenameTerms:  152.72 MiB    16.107 %
DocXattrTerms:       8.39 MiB     0.885 %
IdTree:             35.69 MiB     3.764 %
IdFileName:        175.18 MiB    18.476 %
DocTime:            92.85 MiB     9.793 %
DocData:            43.49 MiB     4.587 %
ContentIndexingDB: 448.00 KiB     0.046 %
FailedIdsDB:             0 B      0.000 %
MTimeDB:            26.48 MiB     2.793 %
----8<----
and to that I can only say "wahhh?!?!?"

Here's also `balooctl status`:
----8<----
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 2,103,903
Files waiting for content indexing: 6,832
Files failed to index: 0
Current size of index is 18.75 GiB
----8<----
> 3. The index file is huge - about 19GB, which doesn't make a lot of sense to
> me. `balooctl indexSize` has this to say:
>
> ----8<----
> File Size: 18.75 GiB
> Used:      948.13 MiB
>
> PostingDB:           2.93 GiB   316.627 %
> PositionDB:         85.44 MiB     9.011 %
> DocTerms:            1.39 GiB   149.920 %
> DocFilenameTerms:  152.72 MiB    16.107 %
> DocXattrTerms:       8.39 MiB     0.885 %
> IdTree:             35.69 MiB     3.764 %
> IdFileName:        175.18 MiB    18.476 %
> DocTime:            92.85 MiB     9.793 %
> DocData:            43.49 MiB     4.587 %
> ContentIndexingDB: 448.00 KiB     0.046 %
> FailedIdsDB:             0 B      0.000 %
> MTimeDB:            26.48 MiB     2.793 %
> ----8<----
>
> and to that I can only say "wahhh?!?!?"

After reviewing the code at https://github.com/KDE/baloo/blob/master , I'm even more befuddled by the above numbers:
1. "Used" is `DatabaseSize.expectedSize`.
2. The percentages are computed as 100 * "entry size" / "Used", so the 316% is internally consistent: PostingDB really is reported as larger than "Used".
3. But `DatabaseSize.expectedSize` is calculated (src/engine/transaction.cpp:474) by adding up the sizes of all of the entries listed, so it cannot be smaller than the sum of its parts unless one of the parts is negative, which it can't be, since the sizes are of type `size_t` (which, unless something really weird is going on on the build server, should be unsigned long int). There's something about page sizes, but that isn't relevant to the above calculation, which seems to suggest that a/(a+b) > 1 where both a and b are non-negative integers.
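To make the arithmetic concrete, a quick shell calculation with the reported values (rounded to whole MiB; 2.93 GiB is roughly 3000 MiB) reproduces the suspicious percentage:

```shell
# Reported sizes, rounded to whole MiB for integer arithmetic.
postingdb_mib=3000   # PostingDB: 2.93 GiB
used_mib=948         # Used: 948.13 MiB

# Same formula described above: 100 * entry_size / Used.
echo $(( 100 * postingdb_mib / used_mib ))   # prints 316
```

The point of the exercise: if "Used" really were the sum of all entry sizes, no single entry could exceed 100% of it, so either "Used" is not the plain sum, or the entry sizes and "Used" are measured in different units (e.g. pages vs. bytes).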
BTW, here's the result of running the `mdb_stat` tool from lmdb-utils on the baloo index:
----8<----
$ mdb_stat -af <path-to-index-db>
Freelist Status
  Tree depth: 2
  Branch pages: 1
  Leaf pages: 41
  Overflow pages: 5046
  Entries: 3253
  Free pages: 2566315
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 12
Status of docfilenameterms
  Tree depth: 4
  Branch pages: 315
  Leaf pages: 38726
  Overflow pages: 0
  Entries: 2104603
Status of docterms
  Tree depth: 4
  Branch pages: 633
  Leaf pages: 79407
  Overflow pages: 284028
  Entries: 2103699
Status of documentdatadb
  Tree depth: 3
  Branch pages: 90
  Leaf pages: 11012
  Overflow pages: 38
  Entries: 664790
Status of documenttimedb
  Tree depth: 3
  Branch pages: 187
  Leaf pages: 23555
  Overflow pages: 0
  Entries: 2111124
Status of docxatrrterms
  Tree depth: 3
  Branch pages: 21
  Leaf pages: 2040
  Overflow pages: 86
  Entries: 31253
Status of failediddb
  Tree depth: 0
  Branch pages: 0
  Leaf pages: 0
  Overflow pages: 0
  Entries: 0
Status of idfilename
  Tree depth: 4
  Branch pages: 363
  Leaf pages: 44411
  Overflow pages: 0
  Entries: 2120309
Status of idtree
  Tree depth: 3
  Branch pages: 52
  Leaf pages: 6960
  Overflow pages: 2118
  Entries: 223613
Status of indexingleveldb
  Tree depth: 3
  Branch pages: 3
  Leaf pages: 49
  Overflow pages: 0
  Entries: 5471
Status of mtimedb
  Tree depth: 3
  Branch pages: 42
  Leaf pages: 6719
  Overflow pages: 0
  Entries: 2111124
Status of positiondb
  Tree depth: 4
  Branch pages: 6657
  Leaf pages: 735531
  Overflow pages: 328761
  Entries: 42876611
Status of postingdb
  Tree depth: 4
  Branch pages: 6181
  Leaf pages: 657348
  Overflow pages: 105167
  Entries: 45851508
----8<----
(In reply to Oded Arbel from comment #10)
> $ mdb_stat -af <path-to-index-db>
> Freelist Status
> ...
> Free pages: 2566315

If it says 2566315 free pages (and a page is 4K?), that's a lot of space in the file not being used.

Have you tried copying the index with mdb_copy? I've just tried:
    mdb_copy -n -c index index.copy
It certainly seems to think for a while, but index.copy was smaller by more or less the space taken by the free pages.
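For anyone wanting to try the compaction suggested above, a sketch of the full procedure (the index path is an assumption based on Baloo's default location; stop the indexer first so nothing writes to the database during the copy, and keep a backup):

```shell
# Stop the indexer so the database is not modified during the copy.
balooctl disable

cd ~/.local/share/baloo   # default Baloo index location (assumed)

# Compact the index into a new file:
#   -n  treat "index" as a plain file rather than a directory (MDB_NOSUBDIR)
#   -c  compact while copying, omitting free pages
mdb_copy -n -c index index.copy

# Swap in the compacted copy, keeping the original as a backup.
mv index index.bak
mv index.copy index

balooctl enable
```

Note this only reclaims disk space already marked free; as discussed below, LMDB reuses free pages on its own, so this is a workaround rather than a fix.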
(In reply to tagwerk19 from comment #11)
> If it says 2566315 free pages (and a page is 4K?), that's a lot of space in
> the file not being used.
>
> Have you tried copying the index with mdb_copy?
>
> I've just tried
>     mdb_copy -n -c index index.copy
> It certainly seems to think for a while but the index.copy was smaller by
> 'more or less' the count of the free pages.

Shouldn't baloo "auto trim" the index by itself? This is not something a user would know to do. Also, this doesn't explain the weird percentages.
(In reply to Oded Arbel from comment #12)
> Shouldn't baloo "auto trim" the index by itself? This is not something a
> user would know to do. Also, this doesn't explain the weird percentages.

I'm reading http://www.lmdb.tech/doc/ . It looks like once the database has grown, it does not shrink; free pages are, however, reused. The question is whether this has an impact on performance...
(In reply to Oded Arbel from comment #9)
> The weird excluded folder behavior may has something to do with the fact I
> have a trailing slash on my $HOME:
>
> ----8<----
> $ balooctl config show excludeFolders
> kf.baloo: Folder cache: std::vector("/home/odeda//.cache/": excluded,
> "/home/odeda//snap/": excluded, "/home/odeda//mnt/": excluded,
> "/home/odeda/": included)
> /home/odeda//.cache/
> /home/odeda//snap/
> /home/odeda//mnt/
> ----8<----

Oooh. Indeed. If I "bend things" so that I have a trailing slash in my $HOME, the include/exclude folder lines (for subfolders) in baloofilerc stop working.

If I include
folders[$e]=$HOME
then an exclude
folders[$e]=$HOME/.cache/
doesn't work.

If I want to index a set of subfolders,
folders[$e]=$HOME/Documents/,$HOME/Music/,$HOME/Pictures/,$HOME/Videos/
doesn't work either.

It's not going to catch many people, but it's probably worth reporting as a separate bug.
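The failure mode is easy to demonstrate at the string level. A minimal shell sketch (paths illustrative; Baloo's real matching is done in C++, this just mirrors the string logic) showing why the doubled slash breaks a prefix comparison:

```shell
# HOME with a trailing slash, as in the report.
home="/home/odeda/"

# Naively joining "$HOME" and "/.cache/" produces a doubled slash.
joined="${home}/.cache/"
echo "$joined"            # prints /home/odeda//.cache/

# A path as actually reported by the filesystem has single slashes,
# so a simple string-prefix check against the joined path fails.
real="/home/odeda/.cache/mozilla"
case "$real" in
  "$joined"*) echo "excluded" ;;
  *)          echo "NOT excluded" ;;
esac                      # prints NOT excluded
```

The two strings name the same directory on disk, but as strings `/home/odeda//.cache/` is not a prefix of `/home/odeda/.cache/mozilla`, which would explain why the exclude entries are cached yet never match.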