The baloo_file process has been running for five hours and uses about 4±2 GiB of RAM, causing swapping, and not a single file has been indexed yet:

$ balooctl -v
baloo 5.46.0
$ balooctl status
Baloo File Indexer is running
Indexer state: Initial Indexing
Indexed 0 / 0 files
Current size of index is 21.26 GiB
$ ps -C baloo_file -o comm,etime,%cpu,%mem,vsz,rss
COMMAND         ELAPSED %CPU %MEM       VSZ     RSS
baloo_file     05:09:33 43.6 32.5 274650148 3965904
$ ls -lh .local/share/baloo/index
-rw-rw-r-- 1 tyl tyl 22G May 27 14:04 .local/share/baloo/index

This link suggested I file this bug: https://community.kde.org/Baloo/Debugging.

I really like the idea of Baloo, so I wish for it to work a bit better. I don't know how often Baloo works flawlessly.

My setup is barely unusual: I have some directories with a million small files (records of Go games obtained from this command: https://github.com/espadrine/badukjs/blob/master/Makefile#L13), and some files which are quite big, like a few Linux .iso. In total, I have about 150 GiB in /home — including the 22 GiB of Baloo index, which is now a significant amount of "0 files indexed".

If that large folder and the iso are the files that baloo_file chokes on, could we make Baloo give up if it spends more than 10 seconds on a single file or folder? (An `ls` on the Go games folder takes 11 minutes.)

But really, I only care about indexing the contents of my PDFs and LibreOffice documents, and maybe my images. All told, a few thousand files.

Philosophically, it makes more sense to whitelist files by type than to index files that are unlikely to be properly read. Looking through the configuration parameters, it looks like files are blacklisted by type. It would make more sense to whitelist them: there are more file types that are unreadable than there are supported ones. Most users only care about indexing of .pdf, .docx and .jpg files, maybe a handful of others. I don't see a use-case for indexing an .iso file. Yet it is neither in excludeFilters nor in excludeMimetypes by default.

Aside. Is Baloo indexing file paths themselves? It would be both pretty inefficient and a duplication of effort, since mlocate does it stellarly and yet unnoticeably. /var/lib/mlocate is 98 MiB and `locate *.pdf` takes about a second to run.

Could we make Baloo stream its processing? For each file extension in the whitelist we discussed, it would regularly use locate(1) to get them, feed them to the content indexer if they were updated, and that's it.

Finally, when Baloo does pointless busywork, it would be welcome to have more debugging tools. balooctl could have a command to debug what baloo_file is currently indexing.
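To illustrate the whitelist-by-type idea from the report above, here is a minimal Python sketch. It is not how Baloo selects files; the function name `select_indexable`, the `ALLOWED_EXTENSIONS` set, and the example paths are all hypothetical, and only the three extensions come from the report.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: the three extensions named in the report.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".jpg"}


def select_indexable(paths, allowed=ALLOWED_EXTENSIONS):
    """Return only the paths whose extension is on the allowlist.

    The comparison is case-insensitive, so "IMG.JPG" counts as ".jpg".
    """
    return [p for p in paths if PurePosixPath(p).suffix.lower() in allowed]


# Made-up example paths, for illustration only.
candidates = [
    "/home/user/docs/report.pdf",
    "/home/user/isos/distro.iso",
    "/home/user/go-games/game-000001.sgf",
    "/home/user/photos/IMG_0001.JPG",
]
indexable = select_indexable(candidates)
print(indexable)
```

With an allowlist, the .iso and the Go game records are skipped without needing an exclusion rule for each unsupported type, which is the trade-off the report argues for.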
I can confirm this issue as well on ArchLinux with the same version. Originally my disk drive filled up, apparently causing the index db to get corrupted. baloo_file_extr was eating up CPU and my journal was filled with:

May 27 09:38:17 mymachine kdeinit5[2549]: ()
May 27 09:38:18 mymachine kdeinit5[2549]: ()
May 27 09:38:18 mymachine kdeinit5[2549]: ()
May 27 09:38:18 mymachine kdeinit5[2549]: ()
May 27 09:38:20 mymachine kdeinit5[2549]: ()
May 27 09:38:20 mymachine kdeinit5[2549]: ()
May 27 09:38:20 mymachine kdeinit5[2549]: ()

So I nuked baloo with `balooctl disable`. After re-enabling it with `balooctl enable`, I'm seeing the behavior described in the original description.
Possibly related, balooctl stop/suspend/disable has no effect on the baloo_file process. I had to kill it manually.
I am also seeing this issue: baloo_file_extractor uses all of RAM and half of swap (20 GB total), slowing down the entire desktop. I'll try to investigate further and see what's using up so much memory.
(In reply to Thaddee Tyl from comment #0)
> If that large folder and the iso are the files that baloo_file chokes on,
> could we make Baloo give up if it spends more than 10 seconds on a single
> file or folder? (An `ls` on the Go games folder takes 11 minutes.)

I guess in that case Baloo can indeed choke on this directory. And, I guess, this setup is indeed somewhat unusual. I suggest adding this folder to the "exclude" list (it's available inside "systemsettings", in the "Workspace -> Search -> File Search" category).

Do these Go game files have a special mime-type? We have a list of blacklisted mimetypes inside Baloo (which currently includes mostly source-code files); we can blacklist it by default (as it hardly contains useful information for indexing, right?).

> Philosophically, it makes more sense to whitelist files by type than to
> index files that are unlikely to be properly read. Looking through the
> configuration parameters, it looks like files are blacklisted by type. It
> would make more sense to whitelist them: there are more file types that are
> unreadable than there are supported ones. Most users only care about
> indexing of .pdf, .docx and .jpg files, maybe a handful of others. I don't
> see a use-case for indexing an .iso file. Yet it is neither in
> excludeFilters nor in excludeMimetypes by default.

Baloo relies on the KFileMetaData framework to index files: if we can extract data from a file, we do it. But it only supports use-cases relevant for users (documents, pictures, audio files, etc.). I'm not really sure iso-files are being indexed at all, as KFileMetaData does not support them (well, because there is not much to be extracted that is relevant for the user...).

> Aside. Is Baloo indexing file paths themselves? It would be both pretty
> inefficient and a duplication of effort, since mlocate does it stellarly and
> yet unnoticeably. /var/lib/mlocate is 98 MiB and `locate *.pdf` takes about
> a second to run.

It does. Also, Baloo is supposed to be self-sufficient; it's not supposed to be used together with mlocate, it's a separate indexing system.

> Finally, when Baloo does pointless busywork, it would be welcome to have
> more debugging tools.
> balooctl could have a command to debug what baloo_file is currently indexing.

"balooctl monitor" does that. (Well, unfortunately it was not possible to print the _current_ file being indexed, but this will be fixed in the 5.52 release, see https://cgit.kde.org/baloo.git/commit/?id=a9696978322c08d19ece0a67f430aee391e3918d)
Michael, we'd be happy to add the go files to the blacklist if you can 1) confirm that their content is never worth indexing and 2) provide their file extension. Thanks!
Nate, I never found out what baloo was hanging on; Igor Poboiko found the Go game files issue. Is there a way to see what baloo is doing? (`balooctl status` isn't working; I guess I can strace?)
FWIW `balooctl status` should work much better in the upcoming KDE Frameworks 5.53, if you can update.
I found a similar problem with baloo 5.54.0 on KDE neon 5.14 in a fresh installation. Baloo was consuming 100% of the CPU. I disabled it and the CPU went back to normal.
*** Bug 403866 has been marked as a duplicate of this bug. ***
Same issue here on a fresh install of openSUSE 15.1. It kills my system completely. Makes baloo unusable. Sad. Considering that the report was opened one and a half years ago, I assume we can't hope for a fix. Sad.
(In reply to p from comment #10)
> Same issue here on a fresh install of openSUSE 15.1. It kills my system
> completely. Makes baloo unusable. Sad. Considering that the report was opened
> one and a half years ago, I assume we can't hope for a fix. Sad.

So, which files does it choke on? What does `balooctl status` / `balooctl monitor` report?
*** Bug 427819 has been marked as a duplicate of this bug. ***
(In reply to Thaddee Tyl from comment #0)
> ... My setup is barely unusual ...

May I read that as "My setup is fairly unusual"? 8-)

> ... I have some directories with a million small files (records of
> Go games obtained from this command:
> https://github.com/espadrine/badukjs/blob/master/Makefile#L13)

Wow... Maybe some time has gone by and the number of recorded games has crept up, but I've just downloaded and unpacked nearly 2 million .sgf files (that end up in a single, flat directory). That's going to be a torture test!

First off. Yes, I see the described behaviour: baloo_file fills RAM and disk for hours with no visible progress.

This is with the current Neon Unstable...

Plasma: 5.22.80
Frameworks: 5.85.0
Qt: 5.15.3
Filesystem: Ext4

This hadn't been marked "Confirmed" but, yes, reproducible...

Digging down into the "torture test": extracting the files from the tar archives overwhelms inotify. Baloo reports

Inotify - too many event - Overflowed

Baloo attempts to index the files where it gets the notification, but it will only discover "the remainder" on a "balooctl check" or on the next logon.

I see "baloo_file" running at 100% and with steadily growing memory use. It's listing all the files it will need to index (it's not got as far as indexing content). However, I see the same behaviour with content indexing disabled, so it is an issue with baloo_file and not baloo_file_extractor.

It seems that baloo_file wants to build the list of unindexed files as a single transaction. "balooctl check" does not show anything happening; the information is being collected but not appearing on disc.

Testing on a VM with 16 GB RAM, I could index 1.4 million files (it took almost an hour, without content indexing) and it was possible to see the memory use creeping up during the process and the results committed to disc right at the end. With the full 2 million files, it filled RAM and swap in 90 minutes and baloo_file hung with what looked like a corrupt/truncated index written to disc (the filesize of index was the size of RAM. Interesting but maybe a coincidence).

It was possible to index the full 2 million files if they were copied "in batches" into an indexed directory and baloo_file allowed to catch up after each copy.

I think there is something to be fixed here...

When baloo is indexing content it does it with batches of files (40 files, then the next 40 and so on) and commit the results after each. It would make sense to batch the initial indexing, something like a commit every 15 seconds perhaps. That would also allow people to see that something was happening with "balooctl status"

More speculatively... The "40 file" batch size for content indexing is very, very low for the small .sgf files; the full text index would take days (weeks?) to complete. This limit can shrink; maybe it should be allowed to grow as well.

I'd place the baloo_file and baloo_file_extractor issues into different pigeon holes here.
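The batching suggestion in the comment above (commit after a bounded number of files or after roughly 15 seconds, instead of one giant transaction) can be sketched as follows. This is an illustrative Python model only, not Baloo's code: `index_in_batches`, its parameters, and the `commit` callback are hypothetical names, and the default batch size of 1000 is an arbitrary assumption.

```python
import time


def index_in_batches(paths, commit, max_batch=1000, max_seconds=15.0,
                     clock=time.monotonic):
    """Feed paths to commit() in bounded batches and return the commit count.

    A batch is flushed when it holds max_batch entries or when max_seconds
    have passed since the previous flush, whichever happens first, so the
    pending list never grows with the total number of files.
    """
    batch = []
    last_flush = clock()
    commits = 0
    for path in paths:
        batch.append(path)
        if len(batch) >= max_batch or clock() - last_flush >= max_seconds:
            commit(batch)
            commits += 1
            batch = []
            last_flush = clock()
    if batch:
        # Flush whatever is left after the last full batch.
        commit(batch)
        commits += 1
    return commits


# Usage: record the size of each committed batch for 2500 pretend files.
batch_sizes = []
total_commits = index_in_batches(
    (f"game-{i}.sgf" for i in range(2500)),
    lambda batch: batch_sizes.append(len(batch)),
)
print(total_commits, batch_sizes)
```

Because each commit makes progress visible, a status query between flushes would see a growing count rather than "0 / 0 files" until the very end.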
I can confirm this is still a problem. RAM usage isn't really that much of an issue; it caps out at 4 GB out of the 16 GB my system has. The problem is both disk usage and one CPU thread pegged at 100%. I never had this issue until recently, but I can pin down exactly what caused Baloo to malfunction.

What I did: A couple of weeks ago I took up modding Grim Dawn. To modify anything in that game you're required to fully extract a database file, which then becomes about 60k loose text files. I then began using the search function in Kate to give me the ability to do mass edits on those files. Since it was rapid file creation and deletion (Kate creates swap files every time you make a change to something and then deletes those when you save) and overall changes to a ton of files, that's when Baloo started going haywire on my system.

I hope this helps in reproducing the issue with Baloo.
(In reply to hugomaia from comment #14)
> ... rapid file creation and deletion (Kate creates swap files every
> time you make a change to something and then deletes those when you save)
> and overall changes to a ton of files, that's when Baloo started going
> haywire on my system.

If I edit a "testfile.txt" file, I see Kate creating a ".testfile.txt.kate-swp" (checking on Neon Testing).

The question is whether these are a problem...

If I have content indexing and "Index hidden files and folders" enabled and run "balooctl monitor", I see baloo indexing the ".kate-swp" file periodically as the source file is edited.

Checking my ~/.config/baloofilerc file, I see that "*.swp" files are excluded but there's no mention of "*.kate-swp"

You might check to see whether you are indexing hidden files and disable this if not needed. It would probably make sense to edit your ~/.config/baloofilerc and add an exclusion for "*.kate-swp".

I suppose that if you are modifying "a ton" of files, you might be giving baloo a lot to do. It may also be that you are hitting Bug 442453, where baloo is having to delete "large numbers" of files.
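Why the existing "*.swp" exclusion does not cover Kate's swap files can be checked with a small Python sketch. It assumes the exclude filters behave like shell-style globs matched against the file name; Python's `fnmatch` is used here as a stand-in and is not Baloo's own matcher.

```python
from fnmatch import fnmatchcase

# File name and patterns taken from the comment above.
swap_file = ".testfile.txt.kate-swp"
default_filter = "*.swp"
proposed_filter = "*.kate-swp"

# The name ends in "-swp", not ".swp", so the default pattern misses it.
matches_default = fnmatchcase(swap_file, default_filter)
matches_proposed = fnmatchcase(swap_file, proposed_filter)
print(matches_default, matches_proposed)
```

Under that assumption only the proposed "*.kate-swp" pattern matches, which is consistent with the swap files showing up in "balooctl monitor".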
(In reply to tagwerk19 from comment #15)
> Checking my ~/.config/baloofilerc file, I see that "*.swp" files are
> excluded but there's no mention of "*.kate-swp"

See also: https://bugs.kde.org/show_bug.cgi?id=269518#c9
(In reply to tagwerk19 from comment #13)
> I think there is something to be fixed here...
>
> When baloo is indexing content it does it with batches of files
> (40 files, then the next 40 and so on) and commit the results after
> each. It would make sense to batch the initial indexing, something
> like a commit every 15 seconds perhaps. That would also allow people
> to see that something was happening with "balooctl status"

That's been done. There's been a patch to limit memory use with systemd/cgroups:
https://invent.kde.org/frameworks/baloo/-/merge_requests/121

Together with the fix speculated about above, to commit regularly during the initial indexing:
https://invent.kde.org/frameworks/baloo/-/merge_requests/148

I'll leave this "Waiting for Info" but I think it can probably be closed...
Dear Bug Submitter, This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information. For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging If you have already provided the requested information, please mark the bug as REPORTED so that the KDE team knows that the bug is ready to be confirmed. Thank you for helping us make KDE software even better for everyone!
🐛🧹 This bug has been in NEEDSINFO status with no change for at least 30 days. Closing as RESOLVED WORKSFORME.