Bug 333695 - Ask before starting initial indexing process.
Status: CLOSED INTENTIONAL
Alias: None
Product: Baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: unspecified
Platform: Kubuntu Linux
Importance: NOR major
Target Milestone: ---
Assignee: Vishesh Handa
URL: http://askubuntu.com/questions/437635...
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-21 17:41 UTC by Tim Henderson
Modified: 2015-05-26 07:44 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Krunner crash (16.54 KB, text/plain)
2014-04-22 16:43 UTC, Tim Henderson
pdf with parsing errors. (1.03 MB, application/pdf)
2014-04-22 16:59 UTC, Tim Henderson

Description Tim Henderson 2014-04-21 17:41:20 UTC
I just upgraded to Kubuntu 14.04 with the latest KDE, and it included Baloo. I don't track KDE development that closely, so I didn't know a new indexing/Nepomuk system was landing in the new KDE. Soon after I got my machine set up, my fan was going like crazy from Baloo indexing stuff. This is a problem because I have a huge home directory full of source code and data files (many tens of GBs) for which indexing does no good. I could not find a way to turn it off, suspend it, stop it, or do anything else besides setting it to exclude my home dir. I did this and it seemed to have no effect.

I ended up having to use a hack I found online to disable it. This is just a bad experience. I know there are users out there who may benefit from file indexing, but there are many of us who would like some control over what to index and would like to at least be able to disable it. Please add an initial configuration step on OS install to help users choose what to index.

Reproducible: Always

Steps to Reproduce:
1. Have a large home dir
2. Lots of datafiles in plain text, source code, archives
3. Turn on computer
Actual Results:  
CPU core pegged. Lots of junk written to disk.

Expected Results:  
Not indexing my datafiles, source code, git repos, etc...
Comment 1 Vishesh Handa 2014-04-22 14:33:47 UTC
How long did it take for the initial indexing to take place?

Could you provide some estimates on how long there was high CPU usage? Was it just there for a couple of minutes?
Comment 2 Tim Henderson 2014-04-22 14:47:59 UTC
I went to lunch and came back (about 45 min) and it was still pegging 1 CPU thread. So my estimate is all of 1 CPU and for at least 45 min.
Comment 3 Vishesh Handa 2014-04-22 14:55:26 UTC
Do you remember what the exact process name was?
Comment 4 Tim Henderson 2014-04-22 15:24:44 UTC
I believe it was baloo, but I also saw baloo_file_extractor? and baloo_clean? (I think those were the names). After I added my home dir to the ignore list I didn't see baloo any more, but both the extractor and the cleaner continued to run. In particular I saw large amounts of I/O from the cleaner.
Comment 5 Vishesh Handa 2014-04-22 15:52:45 UTC
I'm guessing the cleaner was from after you disabled it. That's a bug, I've pushed a fix for today.

If you are up to it, please enable it again and try to see what is wrong. I've compiled a list of debugging instructions over here - http://community.kde.org/Baloo/Debugging
Comment 6 Tim Henderson 2014-04-22 16:05:01 UTC
Ok, do you want me to have it index? Or just turn it back on? I turned it
back on; currently baloo_file_cleaner looks like it is writing as fast as it
can. It is using about 10% CPU and writing from 1000 K/s to 33000 K/s.
Comment 7 Tim Henderson 2014-04-22 16:07:27 UTC
 Total DISK READ :       0.00 B/s | Total DISK WRITE :    1589.09 K/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       7.76 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
  487 be/3 root        0.00 B/s    0.00 B/s  0.00 % 40.96 % [jbd2/sda7-8]
 7346 idle henderso    0.00 B/s 1589.09 K/s  0.00 %  1.32 % baloo_file_cleaner
Comment 8 Vishesh Handa 2014-04-22 16:14:05 UTC
The cleaner seems to have some problems. I've improved it a little bit, but those fixes will only be there in 4.13.1

Till then, remove ~/.local/share/baloo/file/* and your config file at ~/.config/baloofilerc, and kill all baloo processes.

$ pkill -9 baloo
$ baloo_file
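
The file cleanup described above can be sketched in Python. This is only an illustration: the helper name `reset_baloo_state` is hypothetical and not part of Baloo, and the two paths are the ones given in this comment. It takes the home directory as a parameter so it can be tried on a scratch directory first, and it assumes all Baloo processes have already been stopped (for example with the pkill command above).

```python
import shutil
from pathlib import Path
from typing import List


def reset_baloo_state(home: Path) -> List[Path]:
    """Remove Baloo's file index contents and config under `home`.

    Hypothetical helper: the paths are those named in the comment above.
    Returns the list of paths that were removed.
    """
    removed = []

    # Equivalent of: rm -r ~/.local/share/baloo/file/*
    index_dir = home / ".local" / "share" / "baloo" / "file"
    if index_dir.is_dir():
        for entry in sorted(index_dir.iterdir()):
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry)

    # Equivalent of: rm ~/.config/baloofilerc
    config = home / ".config" / "baloofilerc"
    if config.exists():
        config.unlink()
        removed.append(config)

    return removed
```

Calling `reset_baloo_state(Path.home())` would act on the real home directory; running it a second time removes nothing and returns an empty list.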
Comment 9 Tim Henderson 2014-04-22 16:23:21 UTC
OK, rerunning. It seems to be behaving a bit better now (not using 100% of a core all the time). I will see if it is still running after lunch.
Comment 10 Tim Henderson 2014-04-22 16:41:49 UTC
So I decided to re-enable Desktop Search in KRunner while this was going on, and it caused a crash. Attached is the stack trace.
Comment 11 Tim Henderson 2014-04-22 16:43:05 UTC
Created attachment 86212 [details]
Krunner crash

This happened after I enabled the Desktop Search plug-in. Doing any search seems to cause the crash (but results from Desktop Search seem to be returned in the meantime, which is weird).
Comment 12 Tim Henderson 2014-04-22 16:56:08 UTC
Ok, after about 30 min I am seeing the behaviour which caused me to disable it the first time. `baloo_file_extractor` is using 100% of a core to do its thing. Unfortunately, I didn't grab all the numbers before it finished. I am wondering if what is going on is that it is hitting a bunch of big files and indexing them. I did get an approximate range, so I am checking that. How do I convert these ids to file paths?
Comment 13 Tim Henderson 2014-04-22 16:57:26 UTC
nvm. I didn't see balooshow
Comment 14 Tim Henderson 2014-04-22 16:59:47 UTC
Created attachment 86213 [details]
pdf with parsing errors.

many of my PDFs had parsing errors. Here is an example.
Comment 15 Tim Henderson 2014-04-22 17:17:01 UTC
Ok, as I suspected, the problem files are data files. These are plain text files which are large: tens of MBs to GBs. I don't really know what to say other than don't index large files. I don't think anyone realistically wants full text search on very large files. I would say a good rule of thumb would be: if the file is over 10 MB and appears to be text, don't index it. In general I would blacklist everything except for a set of known file extensions and still do a sanity check on file size. I know you don't want people to "think about file search", and in order to do that you need to put some limits on what it can do.

A user like myself does not get any benefit whatsoever from full text file search. (And I say this as someone who has implemented full text search with regular expression support using both disk based tries and ngram approaches) You need to be sensitive to this fact. KDE is used by the scientific community and they have big data files and will not appreciate this type of behaviour. (You will not do a better job indexing a sequence file (for instance) than the custom built tools for that. You don't want to be doing that. So you should not.)
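
The rule of thumb proposed in this comment (an allowlist of known extensions plus a size sanity check) could look roughly like the sketch below. It is not Baloo's actual logic: the function name, the suffix list, and the exact constant are assumptions made for illustration, with the 10 MB cap taken from the suggestion above.

```python
from pathlib import Path

# Assumed policy values for illustration only.
MAX_TEXT_BYTES = 10 * 1024 * 1024  # the 10 MB rule of thumb from the comment
ALLOWED_SUFFIXES = frozenset({".txt", ".md", ".odt", ".pdf"})  # hypothetical allowlist


def should_index_content(path: Path,
                         max_bytes: int = MAX_TEXT_BYTES,
                         allowed=ALLOWED_SUFFIXES) -> bool:
    """Return True only for known file types that are small enough.

    Everything not on the allowlist is skipped, and allowed files
    still have to pass the size check.
    """
    if path.suffix.lower() not in allowed:
        return False
    return path.stat().st_size <= max_bytes
```

With this policy a multi-gigabyte sequence file is rejected twice over: its extension is not on the list, and it would fail the size check even if it were.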
Comment 16 Vishesh Handa 2014-04-25 16:07:14 UTC
The errors in the PDF you uploaded are irrelevant. Those come from the underlying library, poppler.

Regarding the text files, yes, we definitely need to not index huge data files. I have a few other similar bug reports. I'll add a fix.

If there are no objections, then I'll mark this bug as a duplicate of those. I'm not too keen on asking before starting the initial indexing.
Comment 17 Tim Henderson 2014-04-25 16:15:18 UTC
> The errors in the PDF you uploaded are irrelevant. Those come from the
> underlying library, poppler.

> Regarding the text files, yes, we definitely need to not index huge data
> files. I have a few other similar bug reports. I'll add a fix.

ok sounds good.

> If there are no objections, then I'll mark this bug as a duplicate of
> those. I'm not too keen on asking before starting the initial indexing.

I disagree with this. I also disagree with your approach to configuration.
It would be nice to say index this one directory over here instead of
saying what not to index. However, it is your decision. I will just offer
the feedback that I would prefer to be asked and I would prefer to have
more control over what gets indexed.

Thanks for spending time on this bug report and working on the software.
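
The difference between "say what to index" and "say what not to index" can be made concrete with a small sketch. This is not how Baloo's configuration works; the function names and the example paths are hypothetical. Nothing is indexed unless it sits under an explicitly included directory, and excludes can still carve out subtrees.

```python
from pathlib import PurePosixPath


def is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if `path` is `root` itself or lies somewhere below it."""
    return path == root or root in path.parents


def wants_indexing(path, include_roots, exclude_roots=()) -> bool:
    """Opt-in policy: index only paths under an include root, minus excludes."""
    p = PurePosixPath(path)
    if any(is_under(p, PurePosixPath(r)) for r in exclude_roots):
        return False
    return any(is_under(p, PurePosixPath(r)) for r in include_roots)
```

For example, `wants_indexing("/home/tim/data/run1.csv", ["/home/tim/Documents"])` is false without the user ever having to list `/home/tim/data` as an exclusion.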
Comment 18 Vishesh Handa 2014-04-25 16:22:17 UTC
Someone else has taken the old Baloo config code (before it was rewritten) and is making an app out of it with all of the advanced options that some users want. That should satisfy you w.r.t configuration.
Comment 19 Tim Henderson 2014-04-25 16:29:10 UTC
> Someone else has taken the old Baloo config code (before it was rewritten)
> and is making an app out of it with all of the advanced options that some
> users want. That should satisfy you w.r.t configuration.

I mean, the whole point of this is that it is integrated into the desktop
and that I can't disable it. What good is such an app? You guys have made
it impossible to use any of the PIM software without using all of this
stuff. (In fact I have often given up on KOrganizer et al. since KDE 4
because of the issues with Akonadi. I am giving it another shot right now;
it seems like the bugs are finally fixed so the system actually works.)

Even if it is a config file, you should make it possible to configure it.
KDE and (Linux|BSD) are about being in control of your software.
Comment 20 Richard Gomes 2015-02-10 10:20:34 UTC
Dear developers,

Please allow users to disable functionality.
If users are being hurt by Akonadi, please let users disable it.
If users are being hurt by baloo_file, please let users disable it.

I certainly recognize your efforts aiming to make things faster in certain circumstances but, if in other circumstances or scenarios the functionality has a side effect which ends up making things worse, then users just need to have the option to simply disable it.

My desktop was incredibly slow.
I have a very large code base: 75 modules with thousands of files; often I have 2, 3 or 4 "copies" of this structure on disk due to the workflow we adopted here. The end result is that the computer became unresponsive in certain circumstances, to the point of not even echoing what I typed onto the screen.

Please, please! Let users turn off functionality.

Thanks
Comment 21 Vishesh Handa 2015-02-10 11:30:42 UTC
(In reply to Richard Gomes from comment #20)
> Dear developers,
> 
> Please allow users to disable functionality.
> If users are being hurt by Akonadi, please let users disable it.

Akonadi can be disabled. Also, a bug report about Baloo is not a place to complain about it. 

> If users are being hurt by baloo_file, please let users disable it.

You can disable baloo_file. There is the KCM and `$ balooctl disable` command. How much simpler does it need to be?

> 
> I certainly recognize your efforts aiming to make things faster in certain
> circumstances but, if in other circumstances or scenarios the functionality
> has a side effect which ends up making things worse, then users just need to
> have the option to simply disable it.
> 

How is it worse? And in comparison to what? These vague statements might sound nice, but they do not help me diagnose any actual problem you're having. Unless you just wanted to rant, in which case, please take it somewhere else.

> My desktop was incredibly slow.
> I have a very large code base: 75 modules with thousands of files; not
> rarely, I have 2, 3 or 4 "copies" of this structure on disk due to the
> workflow we adopted here. The end result is that the computer became
> unresponsive in certain circumstances, to the point of not even echoing
> what I typed onto the screen.

Because of ..?

> 
> Please, please! Let users turn off functionality.

It can be turned off.
Comment 22 Richard Gomes 2015-02-10 14:52:25 UTC
Dear Vishesh Handa,

It was not my intention to "just rant" or "just complain". I have been using KDE full time since 2004 and feel very satisfied with it. I have made relevant contributions to FOSS myself.

Let me be very clear and very straight to the point in this communication.

1. Baloo is repeating what Akonadi did before with regard to its impact on I/O. If this was not communicated properly before, it is now: please be aware of the mistakes Akonadi made in the past regarding I/O usage and its impact on overall system performance.

2. Running iotop (which monitors I/O usage) on my desktop, I found the system nearly blocked waiting for I/O due to the activity of baloo_file. I suppose the system was in fact totally stalled earlier, when I was typing and not even able to see characters echoing in the terminal window.

3. I recognize my mistakes: one of those is not being aware of the possibility of disabling Baloo.

4. Being unaware that Baloo can be disabled, I've found a blog post which forcibly disables it.
http://4nakama.net/2014/04/19/how-to-disable-baloo-in-latest-kde-and-kubuntu-14-04/

5. I've also removed the cache directory Baloo utilizes.

6. I've restarted the system and it is behaving very well now, as it always did before I upgraded from an old version of Kubuntu.

We have evidences now.

Thanks a lot,
Comment 23 Vishesh Handa 2015-02-10 16:45:40 UTC
Thank you for letting me know that some users, especially on Kubuntu, experience IO issues [1]. We're fully aware of it, and I was quite annoyed by Ubuntu for it.

> We have evidences now.
To do what? This kind of language does not sound good.

[1] https://blogs.kde.org/2014/10/15/ubuntus-linux-scheduler-or-why-baloo-might-be-slowing-your-system-1404

I'm closing this bug. We currently enable Baloo by default and we have no current plans of changing that. It can be easily disabled. Perhaps we can show some kind of progress dialog. But we currently have no plans of asking the user if they want Baloo. 

Keeping this bug report open isn't helping anyone.
Comment 24 Richard Gomes 2015-02-10 20:02:31 UTC
Thank you for pointing out that Ubuntu pushed a scheduler which does not support ionice properly. This information is relevant. Thanks.
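
Since the point here is that I/O priorities only help when the active block I/O scheduler honours them, it can be useful to check which scheduler a disk is using. On Linux the file /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets. The sketch below parses that format; the helper names are made up for this example, and which schedulers respect ionice on a given kernel is something to verify against the kernel documentation rather than assume.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed (active) scheduler name, or None if none is marked.

    `text` is the content of /sys/block/<device>/queue/scheduler,
    for example "noop [deadline] cfq".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device, sysfs=Path("/sys/block")):
    """Read and parse the scheduler file for a block device such as 'sda'."""
    return active_scheduler((sysfs / device / "queue" / "scheduler").read_text())
```

On a Linux machine, `read_scheduler("sda")` would report the scheduler for the disk that shows up in the iotop output earlier in this report, assuming that device exists.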
Comment 25 Quentin RAYNAUD 2015-05-26 07:32:49 UTC
I understand what you are doing and why (or at least I think I do) but I still want to comment because I believe this is not a good decision. I upgraded to 14.04 recently and got hurt by this.

My computer was so slow I couldn't do anything with it at all! It took me a long time to figure out that the slowness was due to baloo. You have to understand my situation completely though: once I figured out it was baloo and that it was an indexer that was slowing my computer down, I thought, "Well, I will let it do its thing and it will stop in a few hours," so I left my computer running baloo for a whole 24h.

When I returned it was still using 100% of my computer, which was totally unresponsive. 24h later, same thing. Using balooctl disable is not doing the trick either. It kicks off baloo_cleaner, which makes my computer unresponsive for hours too. I had to kill it.

I lost hours because of all of this and it really pissed me off at the time. I'm trying to make a calm statement about this so we can think of a good solution.

I concur that asking the user is no good: a desktop should be easy to use, and such a question can't be answered by a newbie. It would destroy the user experience and interrupt the user. Bad design too.

I think it would be cool if baloo ran only when the user is away from the computer, or for no more than 2-3 consecutive minutes every 30 when they are there. Or anything else in that spirit.
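
The throttling idea in this comment can be sketched as a simple time budget. This is an illustration of the suggestion, not anything Baloo implements: the class name and the default values (3 minutes of work per 30-minute window) are assumptions, and how "user is away" is detected is left to the caller.

```python
class IndexingBudget:
    """Allow indexing freely while the user is idle, otherwise only
    up to `budget_s` seconds of work per `period_s` seconds."""

    def __init__(self, budget_s=180.0, period_s=1800.0):
        self.budget_s = budget_s
        self.period_s = period_s
        self.window_start = 0.0
        self.used_s = 0.0

    def may_run(self, now, user_idle):
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.period_s:
            self.window_start = now
            self.used_s = 0.0
        return user_idle or self.used_s < self.budget_s

    def record(self, seconds, user_idle):
        # Only work done while the user is present counts against the budget.
        if not user_idle:
            self.used_s += seconds
```

An indexer loop would call `may_run(time.monotonic(), user_idle)` before each batch and `record(...)` after it, sleeping whenever the answer is no.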

But baloo badly needs to ensure it is not preventing serious work from being done. Otherwise it defeats the point of the better UX it should bring. My home is about a TB in size, but it is mostly media files plus some GB of code (around 13). I don't understand why it is disturbing baloo this much, but it does.
Comment 26 Quentin RAYNAUD 2015-05-26 07:44:05 UTC
Oh, I realized that it's totally reasonable to think my problems might be due to a below average computer. I don't think it is…

My home sits on a 5×2 TB RAID6 array handled by a dedicated card with 512 MB of flash cache, giving 3× the bandwidth of a normal 7200 rpm disk. The CPU is an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz and it has 32 GB of RAM available to it.

I'm saying this because I don't believe an indexing process should make such a computer unresponsive for even a second. It did for 48 hours and didn't give a hint it would stop doing so…