Bug 334060 - baloo_file_cleaner clashes with indexer?
Summary: baloo_file_cleaner clashes with indexer?
Status: RESOLVED FIXED
Alias: None
Product: Baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: unspecified
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Vishesh Handa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-29 08:32 UTC by Uwe Dippel
Modified: 2014-05-05 13:37 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In: 4.13.1


Description Uwe Dippel 2014-04-29 08:32:41 UTC
After indexing had completely eaten up my resources, I removed my HOME from Desktop Search. However, it would not stop eating resources. Since the configuration had changed, I then tried to clean up baloo:
$ baloo_file_cleaner
That was yesterday at around 14:50. It started with low numbers. This morning, a few minutes after 9:00 (I let it run through the night with all other applications closed), it stands at some 83000, and it is still going.

What I find strange, at the very least, is the indexer, or whatever this is (I am not sure this is the correct command sequence):
$ ps ax | grep baloo
14601 pts/9    DN+   15:34 baloo_file_cleaner
$ balooshow 14601
Object::connect: No such signal org::freedesktop::UPower::DeviceAdded(QDBusObjectPath)
Object::connect: No such signal org::freedesktop::UPower::DeviceRemoved(QDBusObjectPath)
14601 /X/Y/Z/0801.BIN
$ 

This 0801.BIN was already shown by this command yesterday, and it is still shown today. So what is baloo working on here? It is just a 1 MB BIOS file!

Isn't something wrong if it works on this one file for more than 15 hours?




Reproducible: Always
Comment 1 Uwe Dippel 2014-05-02 08:11:04 UTC
Okay, the baloo_file_cleaner run has finished: after 56 hours (!) of wall time, processing 393326 files (yes, we have a file server), with 1:37 hours (!) of actual CPU time.

Phew. Over! Over? Nooooo: although the process terminated properly, without error, baloo_file_cleaner starts again after a reboot, at file 3006, counting up. The CPU running the process is continuously at 100%, and the hard disk is spinning continuously.

Help! What can I do to finally stop this mess??
Comment 2 Uwe Dippel 2014-05-02 10:08:41 UTC
This may deserve a separate bug report on file_cleaner, but I really wonder:

The second round of baloo_file_cleaner finished after some 2 hours of wall time and some 21 minutes of CPU time.

The third round started at around 30000 and is still running. The indexer is off: it is disabled in the conf file and does not show up in $ ps ax | grep baloo either. That confirms it: the indexer is off, and has been off.
So why does the third round still clean (delete?) files in the excluded $HOME, files that it already went through earlier?
And why did it start at around 30000? It *looks* as if it only really processed the files up to that point (they are not being revisited) and has to redo the rest when there is a large number of them. Only guesswork, of course.
Comment 3 Uwe Dippel 2014-05-02 15:59:01 UTC
Okay, done. After the third round, some 61 hours of wall time and more than 2 hours of CPU time in total, starting baloo_file_cleaner a fourth and further time finally returns the prompt immediately.

However, the index data is no smaller than it was before. How can I get the confidential information out of the way?
$ baloo_file_cleaner [This is okay now!]
$ ls -lR
.:
total 4
drwxrwxr-x 2 myname myname 4096 May  2 17:55 file
./file:
total 2186980
-rw-r--r-- 1 myname myname  240780288 May  2 17:45 fileMap.sqlite3
-rw-rw-r-- 1 myname myname          0 May  2 17:55 flintlock
-rw-rw-r-- 1 myname myname         28 Apr 25 12:34 iamchert
-rw-rw-r-- 1 myname myname      19325 May  1 18:35 position.baseA
-rw-rw-r-- 1 myname myname      19325 May  1 18:35 position.baseB
-rw-rw-r-- 1 myname myname 1271439360 May  1 18:35 position.DB
-rw-rw-r-- 1 myname myname       3794 May  1 18:35 postlist.baseA
-rw-rw-r-- 1 myname myname       3794 May  1 18:35 postlist.baseB
-rw-rw-r-- 1 myname myname  412606464 May  1 18:35 postlist.DB
-rw-rw-r-- 1 myname myname        239 May  1 18:35 record.baseA
-rw-rw-r-- 1 myname myname        239 May  1 18:35 record.baseB
-rw-rw-r-- 1 myname myname   14368768 May  1 18:35 record.DB
-rw-rw-r-- 1 myname myname       4521 May  1 18:35 termlist.baseA
-rw-rw-r-- 1 myname myname       4521 May  1 18:35 termlist.baseB
-rw-rw-r-- 1 myname myname  300171264 May  1 18:35 termlist.DB
$
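A rough way to check whether a cleaner run actually shrank the index, rather than reading the sizes off ls -lR, is to total the file sizes in the index directory. A minimal Python sketch, assuming the index lives in ~/.local/share/baloo/file (the path from the error message quoted in Comment 5) and that the directory is flat, as in the listing above; index_size_bytes is a hypothetical helper, not part of Baloo:

```python
import os


def index_size_bytes(index_dir: str) -> int:
    """Sum the sizes of the regular files directly inside index_dir.

    In the listing above the index directory is flat (fileMap.sqlite3,
    *.DB, *.baseA/*.baseB, ...), so a non-recursive scan is assumed here.
    """
    total = 0
    with os.scandir(index_dir) as entries:
        for entry in entries:
            # Count only plain files; skip subdirectories and symlinks.
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


if __name__ == "__main__":
    # Assumed location, taken from the error message quoted in Comment 5.
    path = os.path.expanduser("~/.local/share/baloo/file")
    if os.path.isdir(path):
        print(f"{index_size_bytes(path) / 2**20:.1f} MiB in {path}")
```

Running it before and after a cleaner run gives a single number to compare, instead of a dozen file sizes.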
Comment 4 Vishesh Handa 2014-05-02 16:08:29 UTC
Are you sure the cleaner started up on reboot? It is typically only started if you modify the configuration in any way.

The large amount of time it takes has been fixed for 4.13.1.
Comment 5 Uwe Dippel 2014-05-02 16:17:16 UTC
No, I excluded /home/me. Then I rebooted: no more indexer. Good!!
Now, to get everything un-indexed (we *do* have tens of thousands of confidential files on file servers that I mounted into my $HOME!), I started the cleaner myself.
What I don't get:
1. Why did I have to start it three times before it stopped scrubbing what had already been scrubbed?
2. The database files seem unchanged in size. I *need* them gone.
I didn't dare delete them, since I read from others that gwenview and dolphin start throwing errors without those files, something like
Serious Error:  Permission denied
Couldn't stat '/home/myname/.local/share/baloo/file'
and that gwenview segfaults.
Comment 6 Vishesh Handa 2014-05-05 13:37:11 UTC
The scrubber isn't something that has been getting a lot of love. It can probably be improved quite a bit. It's currently quite slow.

However, it does have a number of improvements in 4.13.1 (releasing next week), so you shouldn't have this problem. It now simply deletes .local/share/baloo/file when you disable indexing.

Marking this bug as fixed with 4.13.1.
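The change described in Comment 6 replaces a per-file scrub with removal of the whole index directory once indexing is disabled. A minimal Python sketch of that idea, purely illustrative: this is not Baloo's C++ code, drop_index is a hypothetical name, and it assumes indexing has already been turned off, since it discards the entire index:

```python
import os
import shutil


def drop_index(index_dir: str) -> bool:
    """Hypothetical illustration: remove the whole index directory at once.

    Instead of visiting every previously indexed file and deleting its
    entries one by one, the database files are discarded wholesale.
    Returns True if a directory was removed, False if none existed.
    """
    if not os.path.isdir(index_dir):
        return False
    shutil.rmtree(index_dir)
    return True
```

The work then no longer scales with the number of previously indexed files (393326 in Comment 1), which appears to be what made the old scrubber run for days in this report.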