I just upgraded to Kubuntu 14.04 with the latest KDE, and it included Baloo. I don't track KDE development that closely, so I didn't know a new indexing/Nepomuk system was landing in the new KDE. Soon after I got my machine set up, my fan was going like crazy from Baloo indexing stuff. This is a problem because I have a huge home directory full of source code and data files (many tens of GBs) for which indexing does no good. I could not find a way to turn it off, suspend it, or stop it; the only option I found was to exclude my home dir. I did this and it seemed to have no effect. I ended up having to use a hack I found online to disable it. This is just a bad experience. I know there are users out there who may benefit from file indexing, but there are many of us who would like some control over what gets indexed and would like to at least be able to disable it. Please add an initial configuration step on OS install to help users choose what to index.

Reproducible: Always

Steps to Reproduce:
1. Have a large home dir
2. Lots of data files in plain text, source code, archives
3. Turn on computer

Actual Results:
CPU core pegged. Lots of junk written to disk.

Expected Results:
Not indexing my data files, source code, git repos, etc.
How long did it take for the initial indexing to take place? Could you provide some estimates on how long there was high CPU usage? Was it just there for a couple of minutes?
I went to lunch and came back (about 45 min) and it was still pegging 1 CPU thread. So my estimate is all of 1 CPU and for at least 45 min.
Do you remember what the exact process name was?
I believe it was baloo, but I also saw baloo_file_extractor and baloo_clean (I think those were the names). After I added my home dir to the ignore list, I didn't see baloo any more, but both the extractor and the cleaner continued to run. In particular, I saw large amounts of I/O from the cleaner.
I'm guessing the cleaner was from after you disabled it. That's a bug, I've pushed a fix for today. If you are up to it, please enable it again and try to see what is wrong. I've compiled a list of debugging instructions over here - http://community.kde.org/Baloo/Debugging
OK, do you want me to have it index, or just turn it back on? I turned it back on; currently baloo_file_cleaner looks like it is writing as fast as it can. It is using about 10% CPU and writing from 1000 K/s to 33000 K/s.
Total DISK READ :  0.00 B/s | Total DISK WRITE :  1589.09 K/s
Actual DISK READ:  0.00 B/s | Actual DISK WRITE:     7.76 M/s
  TID  PRIO  USER      DISK READ   DISK WRITE  SWAPIN       IO  COMMAND
  487  be/3  root       0.00 B/s     0.00 B/s  0.00 %  40.96 %  [jbd2/sda7-8]
 7346  idle  henderso   0.00 B/s  1589.09 K/s  0.00 %   1.32 %  baloo_file_cleaner
The cleaner seems to have some problems. I've improved it a little bit, but those fixes will only be there in 4.13.1.

Until then, remove ~/.local/share/baloo/file/* and your config file at ~/.config/baloofilerc, and kill all baloo processes.

$ pkill -9 baloo
$ baloo_file
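Before deleting anything, it may help to see exactly which files the workaround above touches. The following is a minimal dry-run sketch in Python, not part of Baloo: the function name `baloo_reset_targets` is hypothetical, and the two paths are the ones named in the comment above. It only lists files and removes nothing.

```python
import glob
import os
from typing import List


def baloo_reset_targets(home: str) -> List[str]:
    """List the files the workaround says to remove (dry run, deletes nothing)."""
    # Everything under the index directory named in the comment above.
    targets = glob.glob(os.path.join(home, ".local/share/baloo/file/*"))
    # The config file named in the comment above, if it exists.
    config = os.path.join(home, ".config/baloofilerc")
    if os.path.exists(config):
        targets.append(config)
    return sorted(targets)


if __name__ == "__main__":
    for path in baloo_reset_targets(os.path.expanduser("~")):
        print(path)
```

Review the printed paths, then remove them by hand and restart baloo_file as described above.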
OK, rerunning. It seems to be behaving a bit better now (not using 100% of a core all the time). I will see if it is still running after lunch.
So I decided to re-enable Desktop Search in KRunner while this was going on, and it caused a crash. Attached is the stack trace.
Created attachment 86212 [details] Krunner crash
This happened after I enabled the Desktop Search plug-in. Doing any search seems to cause the crash (but results from Desktop Search seem to be returned in the meantime, which is weird).
OK, after about 30 min I am seeing the behaviour which caused me to disable it the first time. `baloo_file_extractor` is using 100% of a core to do its thing. Unfortunately, I didn't grab all the numbers before it finished. I am wondering if what is going on is that it is hitting a bunch of big files and indexing them. I did get an approximate range of ids, so I am checking that. How do I convert these ids to file paths?
nvm. I didn't see balooshow
Created attachment 86213 [details] pdf with parsing errors.
Many of my PDFs had parsing errors. Here is an example.
OK, as I suspected, the problem files are data files. These are plain text files which are large (10s of MBs to GBs). I don't really know what to say other than: don't index large files. I don't think anyone realistically wants full text search on very large files. I would say a good rule of thumb would be: if the file is over 10 MB and appears to be text, don't index it. In general, I would blacklist everything except for a set of known file extensions and still do a sanity check on file size.

I know you don't want people to "think about file search", and in order to do that you need to put some limits on what it can do. A user like myself does not get any benefit whatsoever from full text file search. (And I say this as someone who has implemented full text search with regular expression support using both disk-based tries and n-gram approaches.) You need to be sensitive to this fact. KDE is used by the scientific community, and they have big data files and will not appreciate this type of behaviour. (You will not do a better job indexing a sequence file, for instance, than the custom-built tools for that. You don't want to be doing that. So you should not.)
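The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the proposal, not how Baloo filters files: the 10 MB threshold comes from the comment, while the extension allowlist, the NUL-byte text check, and the names `should_index` and `looks_like_text` are assumptions made for the example.

```python
import os

MAX_TEXT_BYTES = 10 * 1024 * 1024  # 10 MB cap, taken from the rule of thumb above
# Hypothetical allowlist; a real one would need to be much longer.
ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".odt"}


def looks_like_text(sample: bytes) -> bool:
    """Crude sniff (an assumption): data without NUL bytes is treated as text."""
    return b"\x00" not in sample


def should_index(path: str) -> bool:
    """Return True only for allowlisted files that pass the size check."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False  # everything outside the allowlist is skipped
    if os.path.getsize(path) <= MAX_TEXT_BYTES:
        return True
    # Over the cap: skip the file if it appears to be text.
    with open(path, "rb") as handle:
        return not looks_like_text(handle.read(4096))
```

With this sketch, a multi-gigabyte plain text data file is rejected either by its extension or by the size check, without its contents ever being read in full.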
The errors in the PDF you uploaded are irrelevant. Those come from the underlying library, poppler. Regarding the text files, yes, we definitely need to not index huge data files. I have a few other similar bug reports. I'll add a fix. If there are no objections, then I'll mark this bug as a duplicate of those. I'm not too keen on asking before starting the initial indexing.
> The errors in the PDF you uploaded are irrelevant. Those come from the
> underlying library, poppler.
>
> Regarding the text files, yes, we definitely need to not index huge data
> files. I have a few other similar bug reports. I'll add a fix.

OK, sounds good.

> If there are no objections, then I'll mark this bug as a duplicate of
> those.
> I'm not too keen on asking before starting the initial indexing.

I disagree with this. I also disagree with your approach to configuration. It would be nice to say "index this one directory over here" instead of saying what not to index. However, it is your decision. I will just offer the feedback that I would prefer to be asked and I would prefer to have more control over what gets indexed.

Thanks for spending time on this bug report and working on the software.
Someone else has taken the old Baloo config code (before it was rewritten) and is making an app out of it with all of the advanced options that some users want. That should satisfy you w.r.t configuration.
> Someone else has taken the old Baloo config code (before it was rewritten)
> and is making an app out of it with all of the advanced options that some
> users want. That should satisfy you w.r.t configuration.

I mean, the whole point of this is that it is integrated into the desktop and that I can't disable it. What good is such an app? You guys have made it impossible to use any of the PIM software without using all of this stuff. (In fact, I have often given up on KOrganizer et al. since KDE 4 because of the issues with Akonadi. I am giving it another shot right now; it seems like the bugs are finally fixed, so the system actually works.)

Even if it is a config file, you should make it possible to configure it. KDE and (Linux|BSD) are about being in control of your software.
Dear developers,

Please allow users to disable functionality.
If users are being hurt by Akonadi, please let users disable it.
If users are being hurt by baloo_file, please let users disable it.

I certainly recognize your efforts aiming to make things faster in certain circumstances but, if in other circumstances or scenarios the functionality has a side effect which ends up making things worse, then users just need to have the option to simply disable it.

My desktop was incredibly slow.
I have a very large code base: 75 modules with thousands of files; not rarely, I have 2, 3 or 4 "copies" of this structure on disk due to the workflow we adopted here. The end result is that the computer became unresponsive in certain circumstances. Unresponsive to the point of not even echoing what I type onto the screen.

Please, please! Let users turn off functionality.

Thanks
(In reply to Richard Gomes from comment #20)
> Dear developers,
>
> Please allow users to disable functionality.
> If users are being hurt by Akonadi, please let users disable it.

Akonadi can be disabled. Also, a bug report about Baloo is not a place to complain about it.

> If users are being hurt by baloo_file, please let users disable it.

You can disable baloo_file. There is the KCM and the `$ balooctl disable` command. How much simpler does it need to be?

> I certainly recognize your efforts aiming to make things faster in certain
> circumstances but, if in other circumstances or scenarios the functionality
> has a side effect which ends up making things worse, then users just need to
> have the option to simply disable it.

How is it worse? And in comparison to what? These vague statements might sound nice, but they do not help me diagnose any actual problem you're having. Unless you just wanted to rant, in which case, please take it somewhere else.

> My desktop was incredibly slow.
> I have a very large code base: 75 modules with thousands of files; not
> rarely, I have 2, 3 or 4 "copies" of this structure on disk due to the
> workflow we adopted here. The end result is that the computer became
> unresponsive in certain circumstances. Unresponsive to the point of not even
> echoing what I type onto the screen.

Because of ..?

> Please, please! Let users turn off functionality.

It can be turned off.
Dear Vishesh Handa,

It was not my intention to "just rant" or "just complain". I have been using KDE 100% of my brain time since 2004 and feel very satisfied with it. I have made relevant contributions to FOSS myself.

Let me be very clear and very straight to the point in this communication.

1. Baloo is repeating what Akonadi did before with regard to I/O impact. If this was not communicated properly before, it is now: please be aware of the mistakes Akonadi made in the past regarding I/O usage and the impact on overall system performance.
2. Running iotop (which monitors I/O usage) on my desktop, I found the system nearly blocked waiting for I/O due to the activity of baloo_file. I suppose that the system was in fact totally stalled earlier, when I was typing and not even able to see characters echoing in the terminal window.
3. I recognize my mistakes: one of those is not being aware of the possibility of disabling Baloo.
4. Being unaware that Baloo can be disabled, I found a blog post which forcibly disables it. http://4nakama.net/2014/04/19/how-to-disable-baloo-in-latest-kde-and-kubuntu-14-04/
5. I've also removed the cache directory Baloo uses.
6. I've restarted the system and it is behaving very well now, as it always did before I upgraded from an old version of Kubuntu.

We have evidences now.

Thanks a lot,
Thank you for letting me know that some users, especially on Kubuntu, experience IO issues [1]. We're fully aware of it, and I was quite annoyed by Ubuntu for it.

> We have evidences now.

To do what? This kind of language does not sound good.

[1] https://blogs.kde.org/2014/10/15/ubuntus-linux-scheduler-or-why-baloo-might-be-slowing-your-system-1404

I'm closing this bug. We currently enable Baloo by default and we have no current plans of changing that. It can be easily disabled. Perhaps we can show some kind of progress dialog. But we currently have no plans of asking the user if they want Baloo. Keeping this bug report open isn't helping anyone.
Thank you for pointing out that Ubuntu pushed a scheduler which does not support ionice properly. This information is relevant. Thanks.
I understand what you are doing and why (or at least I think I do), but I still want to comment because I believe this is not a good decision.

I upgraded to 14.04 recently and got hurt by this. My computer was so slow I couldn't do anything with it at all! It took me a long time to figure out that the slowness was due to baloo. You have to understand my situation completely, though: once I figured out it was baloo and that it was an indexer that was slowing my computer down, I thought: "well, I will let it do its thing and it will stop in a few hours". So I left my computer running baloo for a whole 24h. When I returned, it was still using 100% of my computer, which was totally unresponsive. 24h later, same thing.

Using balooctl disable is not doing the trick either. It kicks in baloo_cleaner, which makes my computer unresponsive for hours too. I had to kill it. I lost hours because of all of this and it really pissed me off at the time. I'm trying to make a calm statement about this so we can think of a good solution.

I concur that asking the user is no good: a desktop should be easy to use, and such a question can't be answered by a newbie. It would destroy the user experience and interrupt the user. Bad design too. I think it would be cool if baloo ran only when the user is away from the computer, or for no more than 2-3 consecutive minutes every 30 when they are there. Or anything else in that spirit. But baloo is in dire need of ensuring that it does not prevent serious work from being done. Otherwise it defeats the point of the better UX it should bring.

My home is a TB big, but it is mostly media files plus some GBs of code (around 13). I don't understand why it is disturbing baloo this much, but it does.
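The scheduling idea in the comment above (index freely while the user is away, otherwise only a few minutes out of every 30) could look roughly like the Python sketch below. It is an illustration of the suggestion, not Baloo's behaviour: the class `IndexBudget` and its methods are hypothetical, how "user is away" gets detected is left out, and the budget is tracked as total time per window rather than strictly consecutive minutes.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 30 * 60  # "every 30" minutes, from the comment above
BUDGET_SECONDS = 3 * 60   # upper end of "2-3 minutes", from the comment above


@dataclass
class IndexBudget:
    """Tracks indexing time spent in the current window while the user is active."""
    window_start: float = 0.0
    used: float = 0.0

    def allowed(self, now: float, user_idle: bool) -> bool:
        """Return True if the indexer may run at time `now` (in seconds)."""
        if user_idle:
            return True  # nobody is waiting on the machine
        if now - self.window_start >= WINDOW_SECONDS:
            # A new 30-minute window begins; reset the budget.
            self.window_start = now
            self.used = 0.0
        return self.used < BUDGET_SECONDS

    def record(self, seconds: float) -> None:
        """Account for `seconds` of indexing done while the user was active."""
        self.used += seconds
```

An indexer loop would call `allowed()` before each batch of files and `record()` afterwards, sleeping whenever `allowed()` returns False.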
Oh, I realized that it's totally reasonable to think my problems might be due to a below-average computer. I don't think that is the case. My home sits on a 5×2 TB RAID6 array handled by a dedicated card with 512 MB of flash cache, giving 3× the bandwidth of a normal 7200 rpm disk. The CPU is an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz and it has 32 GB of RAM available to it. I'm saying this because I don't believe an indexing process should make such a computer unresponsive for even a second. It did so for 48 hours and didn't give a hint that it would stop.