Baloo has been running on my desktop system for 5 days and I still see no significant progress. Once the file indexer is running, the desktop can no longer be used: no response, no mouse pointer. Dead. So currently I have to run the desktop search indexer every night.

Performance comparison (files waiting for content indexing)
====================
Yesterday: 122.834
Today: 120.836
Difference: 1998

1998 files in 8 hours? That's very slow!

balooctl status
====================
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 368.400
Files waiting for content indexing: 120.836
Files failed to index: 0
Current size of index is 44,92 GiB
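A quick back-of-the-envelope check of those numbers, as a small Python sketch. It assumes the dots in "122.834" are thousands separators (German locale) and that the difference was measured over the 8 hours mentioned; the rate is assumed to stay constant, which it may well not.

```python
def parse_de_int(text: str) -> int:
    """Parse an integer written with '.' as the thousands separator, e.g. '122.834'."""
    return int(text.replace(".", ""))


def eta_hours(before: str, after: str, hours: float) -> tuple[float, float]:
    """Return (files per hour, hours needed for the remaining files) at the observed rate."""
    done = parse_de_int(before) - parse_de_int(after)
    rate = done / hours
    remaining = parse_de_int(after)
    return rate, remaining / rate


if __name__ == "__main__":
    rate, eta = eta_hours("122.834", "120.836", 8)
    print(f"{rate:.0f} files/hour, about {eta:.0f} hours ({eta / 24:.1f} days) to go")
```

At roughly 250 files per hour, the remaining ~121,000 files would need about 484 hours of indexing, which is about 20 days non-stop or around 60 nights at 8 hours a night.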
(In reply to sourcemaker from comment #0)
> balooctl status
> ...
> Current size of index is 44,92 GiB

Which I suspect is more than your RAM... See what "balooctl indexSize" says, particularly whether there's a big difference between "File Size" and "Used".

I think you are not using BTRFS? I seem to remember you were using Arch...

I know there are times (when one process is reading the index while the indexer is writing) when the index size can explode. There is an LMDB utility for copying/compacting the index, see:

http://www.lmdb.tech/doc/man1/mdb_copy_1.html

Anecdotal experience is that "it worked a few times for me"; your mileage may of course vary...
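If you want to script the "File Size" vs "Used" check, here's a small Python sketch that reads saved "balooctl indexSize" output and reports how much of the index file is not in use. It assumes the output format shown in this thread (one "Name: value unit" pair per line, comma or dot as the decimal separator, no thousands separators); the format may differ between Baloo versions.

```python
import re

# Binary unit multipliers as printed by balooctl (e.g. "47,23 GiB").
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '47,23 GiB' (comma or dot as decimal separator) to bytes."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*(B|KiB|MiB|GiB|TiB)\s*", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number = float(match.group(1).replace(",", "."))
    return int(number * UNITS[match.group(2)])


def unused_fraction(index_size_output: str) -> float:
    """Fraction of the index file that is not reported as 'Used'."""
    fields = {}
    for line in index_size_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("File Size", "Used"):
            fields[key.strip()] = parse_size(value)
    return 1 - fields["Used"] / fields["File Size"]


if __name__ == "__main__":
    sample = "File Size: 47,23 GiB\nUsed: 2,50 GiB\n"
    print(f"{unused_fraction(sample):.0%} of the index file is not in use")
```

With the figures posted in the next comment (47,23 GiB file, 2,50 GiB used) that comes out at about 95% unused.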
balooctl indexSize
==============
File Size: 47,23 GiB
Used: 2,50 GiB
PostingDB: 2,03 GiB 81.320 %
PositionDB: 3,13 GiB 125.602 %
DocTerms: 1,25 GiB 49.970 %
DocFilenameTerms: 24,62 MiB 0.963 %
DocXattrTerms: 4,00 KiB 0.000 %
IdTree: 4,73 MiB 0.185 %
IdFileName: 26,76 MiB 1.047 %
DocTime: 13,90 MiB 0.544 %
DocData: 7,43 MiB 0.291 %
ContentIndexingDB: 3,25 MiB 0.127 %
FailedIdsDB: 0 B 0.000 %
MTimeDB: 6,21 MiB 0.243 %

Memory
=======
16 GB
(In reply to sourcemaker from comment #2)
> File Size: 47,23 GiB
> Used: 2,50 GiB

I think you don't have a lot to lose by trying mdb_copy. I'm not sure what the Arch package is, but "dnf install lmdb" works on Fedora. Then:

pkill baloo
cd ~/.local/share/baloo
mdb_copy -n -c index index.new

and wait... When it has finished, rename the files to swap them. You should then see "balooctl status" and "balooctl indexSize" taking their info from the compacted index.

Good luck
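The same steps as a Python sketch, in case it helps to script them. Two assumptions on my part: the original index is kept as "index.old" (the comment above only says to swap the files), and the default is a dry run that just returns the commands it would run.

```python
import os
import subprocess
from pathlib import Path

BALOO_DIR = Path.home() / ".local/share/baloo"


def compaction_commands(index_dir: Path) -> list[list[str]]:
    """The shell steps from the comment above, as argument lists."""
    return [
        ["pkill", "baloo"],  # stop the baloo processes first
        # -n: the environment is a single file, not a directory; -c: compact while copying
        ["mdb_copy", "-n", "-c", str(index_dir / "index"), str(index_dir / "index.new")],
    ]


def swap_index(index_dir: Path) -> None:
    """Keep the old index as index.old and move the compacted copy into place."""
    os.replace(index_dir / "index", index_dir / "index.old")
    os.replace(index_dir / "index.new", index_dir / "index")


def compact(index_dir: Path = BALOO_DIR, dry_run: bool = True) -> list[list[str]]:
    """Run (or, by default, only list) the compaction steps."""
    commands = compaction_commands(index_dir)
    if not dry_run:
        for argv in commands:
            # pkill exits with status 1 when nothing matched, so don't treat that as fatal
            subprocess.run(argv, check=argv[0] != "pkill")
        swap_index(index_dir)
    return commands


if __name__ == "__main__":
    for argv in compact():
        print(" ".join(argv))
```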
mdb_copy -n -c index index.new
=========================
File Size: 26,50 GiB
Used: 2,50 GiB
PostingDB: 2,03 GiB 81.320 %
PositionDB: 3,13 GiB 125.602 %
DocTerms: 1,25 GiB 49.970 %
DocFilenameTerms: 24,62 MiB 0.963 %
DocXattrTerms: 4,00 KiB 0.000 %
IdTree: 4,73 MiB 0.185 %
IdFileName: 26,76 MiB 1.047 %
DocTime: 13,90 MiB 0.544 %
DocData: 7,43 MiB 0.291 %
ContentIndexingDB: 3,25 MiB 0.127 %
FailedIdsDB: 0 B 0.000 %
MTimeDB: 6,21 MiB 0.243 %
(In reply to sourcemaker from comment #4)
> mdb_copy -n -c index index.new
> =========================
> File Size: 26,50 GiB
> Used: 2,50 GiB

Hmm... Not as much space recovered as I had hoped. I am guessing that won't help you much...

There's a behaviour in LMDB where, if one process is reading the index when another wants to write, the data written is appended rather than reusing free space. It's there to help "crash proof" the index. You might hit this with Baloo if you run "balooctl status", as this counts the files "indexed" and "to be done". If you are counting a large number of files while indexing is going on, you might fall into the trap. I hit it after deleting some thousands of files; have a look at Bug 437754.

I wonder if your next step is to reindex from scratch, keeping an eye on progress with "balooctl monitor", and maybe restricting indexing to the directories you are interested in, at least initially. A common compromise is Documents, Music, Pictures and Videos.
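To make the "appended" behaviour a bit more concrete, here's a deliberately simplified Python model of a copy-on-write store. It is not LMDB's code and not its exact free-list rules; it only shows the general effect that while a long-running reader holds an old snapshot, pages freed by later writes can't be recycled, so each commit grows the file instead.

```python
from typing import Optional


def simulate(commits: int, dirty_pages: int, reader_snapshot: Optional[int] = None) -> int:
    """Toy copy-on-write store: return the file size in pages after `commits` writes.

    Each commit writes `dirty_pages` fresh pages and frees the same number of old
    ones. A page freed by transaction t may only be reused if no reader holds a
    snapshot older than t, because such a reader might still be looking at it.
    """
    file_pages = dirty_pages          # the initial tree
    freed: dict[int, int] = {}        # transaction id -> pages it freed, not yet reused
    for txn in range(1, commits + 1):
        needed = dirty_pages
        for t in sorted(freed):
            if reader_snapshot is not None and t > reader_snapshot:
                break                 # still visible to the old reader: can't reuse
            take = min(needed, freed[t])
            freed[t] -= take
            needed -= take
            if freed[t] == 0:
                del freed[t]
            if needed == 0:
                break
        file_pages += needed          # nothing reusable left: grow ("append to") the file
        freed[txn] = dirty_pages
    return file_pages
```

With no reader, simulate(10, 5) stays at 10 pages; with a reader pinned at the starting snapshot, simulate(10, 5, reader_snapshot=0) ends at 55 pages, growing by 5 pages per commit.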
I hope there are updates soon. In the current version it is unfortunately a waste of time.
(In reply to sourcemaker from comment #6)
> In the current version it is unfortunately a waste of time.

We don't know what, in your case, caused the index file to grow so much larger than the "Used" size; matching it to Bug 437754 is something of a guess.

However, if this is an issue (when content indexing as well as when doing bulk deletes), maybe baloo_file_extractor could "hold off" committing a transaction while another process is reading. I have no idea whether there's a practical way of doing this; it would need someone with pretty deep knowledge of the Baloo code and LMDB to say.
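Just to illustrate the "hold off" idea, here's a hypothetical Python sketch of a writer that waits, with a timeout, until no readers are active before committing. It is not how Baloo or LMDB work today, and it only coordinates threads inside one process; the Baloo case involves separate processes (baloo_file_extractor, balooctl), which would need some cross-process mechanism instead.

```python
import threading


class ReaderGate:
    """Hypothetical helper: lets a writer wait (with a timeout) until no readers are active."""

    def __init__(self) -> None:
        self._readers = 0
        self._cond = threading.Condition()

    def begin_read(self) -> None:
        with self._cond:
            self._readers += 1

    def end_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def wait_until_idle(self, timeout: float) -> bool:
        """Return True once no reader is active, or False if `timeout` seconds pass first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._readers == 0, timeout=timeout)


if __name__ == "__main__":
    gate = ReaderGate()
    gate.begin_read()                          # e.g. a status query counting files
    print(gate.wait_until_idle(0.1))           # False: a reader is active, so hold off
    threading.Timer(0.1, gate.end_read).start()
    print(gate.wait_until_idle(2.0))           # True: the reader finished, safe to commit
```

The timeout matters: a writer that waited indefinitely could be starved by a steady stream of readers, so after a timeout it would presumably have to commit anyway.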
Is there any news about this problem?
(In reply to sourcemaker from comment #8)
> Are there any news about this problem?

There's been a change here:

https://invent.kde.org/frameworks/baloo/-/merge_requests/124

that makes use of the ability systemd gives to limit memory usage. The change is pretty aggressive, limiting memory usage to 512M. There's a follow-on change here:

https://invent.kde.org/frameworks/baloo/-/merge_requests/148

that fixes one of the problems that constraining the memory triggers.

My guess is that the "not usable" problem depends on memory (or swap) rather than CPU or IO. I've been setting my limits to 50% of RAM and zero swap. Your mileage, as they say, may vary...
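For a sense of scale, here's a small Python sketch comparing the two settings on a 16 GiB machine. It follows the unit rules described in the systemd.resource-control man page (K/M/G/T suffixes are base 1024, percentages are relative to physical RAM) but only handles the simple forms used in this thread; values such as "infinity" are not covered.

```python
import re

# systemd reads K, M, G and T suffixes with base 1024 for memory limits.
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def resolve_memory_limit(spec: str, physical_ram: int) -> int:
    """Turn a MemoryHigh=/MemoryMax=-style value such as '512M' or '50%' into bytes."""
    spec = spec.strip()
    if spec.endswith("%"):
        # Percentages are taken relative to installed physical memory.
        return int(physical_ram * float(spec[:-1]) / 100)
    match = re.fullmatch(r"(\d+)([KMGT]?)", spec)
    if not match:
        raise ValueError(f"unsupported value: {spec!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


if __name__ == "__main__":
    ram = 16 * 1024**3
    for spec in ("512M", "50%"):
        print(spec, "->", resolve_memory_limit(spec, ram) / 1024**3, "GiB")
```

So on 16 GiB the shipped cap is half a GiB, while 50% allows 8 GiB, i.e. sixteen times as much.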
Unfortunately Baloo still doesn't work. 16 GB of RAM and Baloo doesn't finish.
Operating System: Debian GNU/Linux 12
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.103.0
Qt Version: 5.15.8
Kernel Version: 6.1.0-10-amd64 (64-bit)
Graphics Platform: X11
Processors: 4 × Intel® Core™ i7-7600U CPU @ 2.80GHz
Memory: 15,5 GiB of RAM
Graphics Processor: Mesa Intel® HD Graphics 620
Manufacturer: Dell Inc.
Product Name: Latitude 7480

The system seems to be running baloo_file_extractor continually, which is quite frustrating, so I've just suspended it.

balooctl status output:

balooctl status
kf.i18n: KLocalizedString: Using an empty domain, fix the code. msgid: "Unknown" msgid_plural: "" msgctxt: ""
kf.i18n: KLocalizedString: Using an empty domain, fix the code. msgid: "Indexing file content" msgid_plural: "" msgctxt: ""
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 514 133
Files waiting for content indexing: 187 525
Files failed to index: 0
Current size of index is 8,72 GiB
(base) dieter@dell7480:~$ balooctl indexSize
File Size: 8,72 GiB
Used: 1,32 GiB
PostingDB: 2,43 GiB 183.621 %
PositionDB: 1,69 GiB 127.993 %
DocTerms: 1,09 GiB 82.760 %
DocFilenameTerms: 28,98 MiB 2.139 %
DocXattrTerms: 0 B 0.000 %
IdTree: 7,56 MiB 0.558 %
IdFileName: 32,74 MiB 2.417 %
DocTime: 19,79 MiB 1.461 %
DocData: 8,41 MiB 0.621 %
ContentIndexingDB: 4,77 MiB 0.352 %
FailedIdsDB: 0 B 0.000 %
MTimeDB: 5,80 MiB 0.428 %
(base) dieter@dell7480:~$ balooctl suspend
(In reply to dietervdwes from comment #11)
> System continually seems to run the baloo_file_extractor, quite frustrating so I've just suspended it.

Do you see it indexing? If you run:

balooctl monitor

does it report files being indexed? That should happen in batches of 40.

Could you be running BTRFS? There is a bug where BTRFS disks were mounted with "varying" device numbers: the device number wasn't stable from reboot to reboot. Baloo uses a combination of the device number and inode for an internal "ID" for indexed files; if it sees a file "reappear" with a different ID, it thinks it's a new file and that it should be indexed again. This caught openSUSE people a lot, and then Fedora a little. There's a patch on the way.

A final thing to try, as mentioned in comment 9, is to run:

systemctl status --user kde-baloo.service

and see if the memory (RAM) is being constrained to 512M. This can slow indexing down to a crawl, particularly when Baloo starts to swap. There's a balancing act here; I've changed my MemoryHigh to 50% (and MemorySwapMax to 0) with:

systemctl edit --user kde-baloo.service
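A rough Python illustration of why an unstable device number makes files look new. Baloo's real ID layout may differ; the packing below (device number in the high 32 bits, inode in the low 32 bits) is an assumption chosen only to show the effect, and the device numbers in the demo are made up.

```python
import os


def file_id(dev: int, inode: int) -> int:
    """Illustrative 64-bit ID: device number in the high 32 bits, inode in the low 32.

    This packing is an assumption for illustration, not Baloo's actual code.
    """
    return ((dev & 0xFFFFFFFF) << 32) | (inode & 0xFFFFFFFF)


def id_for_path(path: str) -> int:
    """Build the illustrative ID from a real file's device number and inode."""
    st = os.stat(path)
    return file_id(st.st_dev, st.st_ino)


if __name__ == "__main__":
    inode = 4242
    before = file_id(dev=45, inode=inode)   # hypothetical device number at one boot
    after = file_id(dev=47, inode=inode)    # same file, different device number next boot
    print(before == after)                  # False: the "same" file now looks like a new one
    print(id_for_path("."))
```

Anything keyed on such an ID will see every file on the affected filesystem as new after a reboot, which matches the "indexes everything again" symptom.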
(In reply to tagwerk19 from comment #12)
> Do you see it indexing? If you run:
> balooctl monitor
> does it report files being indexed? Should happen in batches of 40.
> ...
> Could you be running BTRFS? ...
> ...
> Final thing to try, as mentioned in comment 9, is to run:
> systemctl status --user kde-baloo.service
> and see if the Memory (RAM) is being constrained to 512M. ...

Thanks for the advice @tagwerk19@innerjoin.org!

- Using an ext4 filesystem on a 500 GB SSD.
- In the task manager it seems to use ~6 GB of RAM (of 16).
- It now seems to behave correctly: indexing stopped when I unplugged the laptop's power (which didn't seem to happen before) and started again when I plugged it back in.
- It seems to index quite a lot of cache files etc.
I'll look into whether it's possible to restrict indexing of certain folders, e.g.:

Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/00283fc3ee9c5ea7_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/23a15434b2138b9a_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/3db4f9689a74b257_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/e672388e6f5f77e4_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/1eff7d439b2e4b3c_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/c53849efe36a4cc6_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/2acb2a00b0fb6992_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/9c3f563612e461f6_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/9a1a919e044c1354_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/304857e39b157c35_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/b1d3fec74b3f4f98_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/46340e3d2df5b165_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/e0a41d38b2aea0f0_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/327824f1949fc9b3_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/233c36800c5209ff_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/e9cb387796466985_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/814dbb36d4e9b40d_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/4d0b64368efc240a_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/e567041e16f71b7f_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/f5bba5d32579072d_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/da787546514876b6_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Code Cache/js/9ae5a9b974e524b3_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/f4beda8473c0e78b_0: Ok
Indexing: /home/dieter/.config/google-chrome/Default/Service Worker/CacheStorage/d55a62ee4934dd0a67863044121c781b05e4f716/f22490e6-d91f-4273-be81-98f26c6966d0/a184eabc3b57ad8c_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/c7da8b20dd325d1a_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/b345c172ded21244_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/a68920e68d520f0e_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/524955e6f54a61fa_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/9dbb1301c3ee55b4_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/b1369a74ae8446f2_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/813fe8f08248c179_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/6a875ccda31c4a0f_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/13f5183ba5587dfc_0: Ok
Indexing: /home/dieter/.config/google-chrome/Default/Service Worker/CacheStorage/d55a62ee4934dd0a67863044121c781b05e4f716/f22490e6-d91f-4273-be81-98f26c6966d0/e6c97907d5a7e71c_1: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/bae3495d40a21ebf_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Code Cache/js/fdc4f366870daa4c_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/c9378570e252edac_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/ad197d4d75c10818_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/6ad7d699c9129519_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/c0f2678d6ddba309_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/47636d6f10c1745a_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/d481bbb76bdd6293_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/155fb8b05c934a85_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/f3dd65a9959b6da6_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/52f5c13cb374fd7d_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/16c00a9369d87c23_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/84a7aa4b8d92a5fe_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/d050372f244e475f_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/435344909b381574_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/1aa212e23c789916_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/68831b9972d75863_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/fdea91b5b0c66970_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/3ded238b254e8923_0: Ok
Indexing: /home/dieter/.config/google-chrome/Default/Service Worker/CacheStorage/d55a62ee4934dd0a67863044121c781b05e4f716/f22490e6-d91f-4273-be81-98f26c6966d0/e41978e4f01f6216_0: Ok
Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/e69d228168e25820_0: Ok
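To see at a glance where the indexer is spending its time, here's a small Python sketch that tallies the "Indexing: ...: Ok" lines from balooctl monitor by leading directory. It assumes the line format shown above; grouping by the first 4 path components (e.g. /home/dieter/.cache) is an arbitrary choice. Directories that dominate the count, like ~/.cache here, would be the first candidates to exclude from indexing.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Non-greedy so that several entries run together on one line are still split correctly.
LINE = re.compile(r"Indexing: (.+?): Ok")


def tally(monitor_output: str, depth: int = 4) -> Counter:
    """Count indexed files per leading directory (first `depth` path components)."""
    counts: Counter = Counter()
    for match in LINE.finditer(monitor_output):
        parts = PurePosixPath(match.group(1)).parts[:depth]
        counts[str(PurePosixPath(*parts))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Indexing: /home/dieter/.cache/google-chrome/Default/Cache/Cache_Data/00283fc3ee9c5ea7_0: Ok\n"
        "Indexing: /home/dieter/.cache/google-chrome/Default/Code Cache/js/9ae5a9b974e524b3_0: Ok\n"
        "Indexing: /home/dieter/.config/google-chrome/Default/Service Worker/CacheStorage/x_0: Ok\n"
    )
    for prefix, n in tally(sample).most_common():
        print(n, prefix)
```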
(In reply to dietervdwes from comment #13)
> ... It seems like it did correctly now when the power plug of laptop was
> removed (which didn't seem to happen previously) and started again when I
> plugged it in ...

Yes, it should notice when it's on battery and stop content indexing. It finishes its current batch of files first, though.

> ... It seems to index quite a lot of cache files etc ...

Ahhh! Yes. If you've configured it to index hidden files/folders, you could catch a *lot* of files you don't know about. Have a look at Bug 434705 (even if we didn't find exactly what was happening in one particular case).
I'm currently trying to index the Akonadi directory with all emails. Indexing takes far too long and never ends.
(In reply to sourcemaker from comment #15)
> I'm currently trying to index the Akonadi directory with all emails.
> Indexing takes far too long and never ends.

Are these separate .eml files, or big .mbox files with concatenated mails?

In both cases watch out for encoded attachments; you can end up indexing strings you'll never want to search for. Have a look at Bug 460882. Depending on whether Akonadi stores mail as .eml or .mbox, you might want to add a comment to that bug...
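To get a feel for how much of a mail is encoded attachment rather than readable text, here's a Python sketch using the standard library's email package. It lists each leaf part of a single message with its transfer encoding and the length of its still-encoded body; base64 parts are the ones that turn into long strings nobody will search for. It works on one message at a time, so a whole .mbox would need splitting first.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def encoded_parts(raw: bytes) -> list[tuple[str, str, int]]:
    """Return (content type, transfer encoding, encoded length) for each leaf part."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    report = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).lower()
        payload = part.get_payload()  # not decoded: this is the text as stored on disk
        report.append((part.get_content_type(), cte, len(payload)))
    return report


if __name__ == "__main__":
    demo = EmailMessage()
    demo["Subject"] = "Invoice"
    demo.set_content("Please find the invoice attached.")
    demo.add_attachment(bytes(range(256)) * 12, maintype="application",
                        subtype="pdf", filename="invoice.pdf")
    for row in encoded_parts(demo.as_bytes()):
        print(row)
```

In the demo, a one-line text body sits next to a 3 KiB attachment that takes up more than 4000 characters of base64 in the file.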
It's stored as maildir.
(In reply to sourcemaker from comment #17)
> It's stored as maildir.

So there'll be some/several/many .mbox files (application/mbox without the .mbox suffix).

Baloo will try to index them (but won't separate out individual messages; you'll just get the .mbox file as a result). You will have trouble with encoded message parts (including attachments) giving Baloo a lot to do... and Baloo will also attempt to index the file however big it is, so a 1 GB .mbox will probably kill it.

There's now a pretty strict systemd limit on Baloo (see "systemctl --user status kde-baloo"): it caps the RAM usage at 512 MB. This could significantly slow the indexing of large .mbox files. Check how big your files are and think about changing the memory cap. My preference (for what it's worth) is to increase the cap to 50% and prevent Baloo from using swap:

MemoryHigh=50%
MemorySwapMax=0B
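For the "check how big your files are" step, here's a small Python sketch that lists the largest files under a directory. The mail location in the demo (~/.local/share/local-mail) is a hypothetical example; adjust it to wherever your Akonadi/maildir data actually lives. The 100 MiB threshold is also just an illustrative choice.

```python
import os
from pathlib import Path


def large_files(root: str, min_bytes: int) -> list[tuple[int, str]]:
    """Return (size, path) for files under `root` of at least `min_bytes`, biggest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)


if __name__ == "__main__":
    # Hypothetical location: adjust to your own mail directory.
    mail_root = Path.home() / ".local/share/local-mail"
    for size, path in large_files(str(mail_root), 100 * 1024**2)[:20]:
        print(f"{size / 1024**2:8.1f} MiB  {path}")
```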