Bug 358548

Summary: baloo_file_extractor high CPU and memory usage
Product: [Frameworks and Libraries] frameworks-baloo
Reporter: rgnodev
Component: Baloo File Daemon
Assignee: Pinak Ahuja <pinak.ahuja>
Status: RESOLVED FIXED
Severity: major
CC: abderrahman.najjar, aspotashev, bjoernv, eforgeot, gabmen, hyc, josephomorrow, pinak.ahuja, rgnodev, stefan.bruens
Priority: NOR
Version First Reported In: unspecified
Target Milestone: ---
Platform: Arch Linux
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: my baloofilerc file
My index file.

Description rgnodev 2016-01-25 18:20:48 UTC
Yesterday I used KRunner to search for a PDF file I had downloaded, but it was not found. I thought Baloo might not be running, so I typed 'baloo' and an entry named 'File Indexer Monitor' was displayed. I clicked on it, but nothing happened. Shortly afterwards I noticed the laptop fan spinning hard, so I opened KSysGuard and found that baloo_file_extractor was using over 50% CPU, at times reaching 100%. Things did not improve over the following minutes, and today I am seeing very similar behaviour. Additionally, when I search for files stored on my laptop using KRunner, not all of them are found (a file is only displayed if I opened it recently).

Reproducible: Always

Steps to Reproduce:
1. Download Antergos from https://antergos.com/try-it/, using the 2015.12.29 i686 ISO image.
2. Install the system (on VirtualBox, for example).
3. Put some files (.pdf, .ods, .odt, .mp3) inside your /home subfolders (for example, /Documents, /Music), and download some others from the web, but do not open any of them.
4. Try to find any of these files by typing their names or file extension in KRunner.
5. Type 'baloo' in KRunner.
6. Click the 'File Indexer Monitor'.
7. Check for baloo_file_extractor CPU usage.

Actual Results:  
Step 4: no results for your files will be displayed.
Step 6: the Indexer Monitor window will not appear.
Step 7: baloo_file_extractor uses over 50% of your CPU, sometimes reaching 100%.

Expected Results:  
Step 4: your files should be displayed.
Step 6: a window showing the file indexer's current status/activity should be displayed.
Step 7: this CPU usage should decrease/stop after a while.

1. I ran baloosearch for the files that were not found, with no results (only the elapsed time was printed).
2. I ran balooshow on the files that were not found, with no results (only a message saying that no index information was found).
3. I ran ps aux | grep baloo_file_extractor, and then ran balooshow on each of the numbers this command returned. I got an error message (the «fileID» is not identical to Baloo's real «fileID»; this is a bug).
4. I typed the names of some of the files on my laptop into KRunner (with and without file extension). I got some results: under an 'Audio' or a 'File' tag (depending on whether I typed an .ods, .odt or .mp3 file name), an icon or a list of icons is displayed, without any additional information alongside. These icons are all the same: small windows with a '?' inside. If I click on any of them I get this message: wrong URL format.
5. I checked whether the baloo_file process was running, and it was.
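
A note on item 3: the numbers printed by ps aux are process statistics, so passing them to balooshow is unlikely to identify an indexed file. The following is a minimal sketch that labels the columns of one ps line taken from the terminal output in this report. The labels follow the usual ps aux column order (USER, PID, %CPU, %MEM, VSZ, RSS); the variable and label names are only illustrative.

```shell
# One line of `ps aux` output, copied from this report.
line='roberto    582 37.0  2.4 1182224 42936 ?       RNl  12:58  11:24 /usr/bin/baloo_file_extractor'

# Label the first six whitespace-separated columns.
labelled=$(printf '%s\n' "$line" | awk '{
    printf "user=%s pid=%s cpu_pct=%s mem_pct=%s vsz_kib=%s rss_kib=%s", $1, $2, $3, $4, $5, $6
}')
echo "$labelled"
```

Read this way, 582 is the process ID of baloo_file_extractor, 37.0 its CPU share, and 42936 its resident memory in KiB. None of these values refers to an entry in the index.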

Here are the outputs I got from terminal:

[roberto@roberto-pc ~]$ ps aux | grep baloo_file_extractor
roberto    582 37.0  2.4 1182224 42936 ?       RNl  12:58  11:24 /usr/bin/baloo_file_extractor
roberto    799  0.0  0.1   4724  2360 pts/1    S+   13:29   0:00 grep baloo_file_extractor
[roberto@roberto-pc ~]$ ps aux | grep baloo_file_extractor
roberto    582 37.1  2.4 1182224 42936 ?       RNl  12:58  11:43 /usr/bin/baloo_file_extractor
roberto    801  0.0  0.1   4724  2288 pts/1    S+   13:30   0:00 grep baloo_file_extractor
[roberto@roberto-pc ~]$ balooshow 582
The fileID is not identical to Baloo's actual fileID
This is a bug
GivenID: 582 ActualID: 0
GivenINode: 0 ActualINode: 0
GivenDeviceID: 582 ActualDeviceID: 0
No index information found
[roberto@roberto-pc ~]$ balooshow 37.0
No index information found
[roberto@roberto-pc ~]$ balooshow 2.4
No index information found
[roberto@roberto-pc ~]$ balooshow 1182224
The fileID is not identical to Baloo's actual fileID
This is a bug
GivenID: 1182224 ActualID: 0
GivenINode: 0 ActualINode: 0
GivenDeviceID: 1182224 ActualDeviceID: 0
No index information found
[roberto@roberto-pc ~]$ balooshow 42936
The fileID is not identical to Baloo's actual fileID
This is a bug
GivenID: 42936 ActualID: 0
GivenINode: 0 ActualINode: 0
GivenDeviceID: 42936 ActualDeviceID: 0
No index information found
[roberto@roberto-pc ~]$ balooshow ?
No index information found
[roberto@roberto-pc ~]$ balooshow 799
The fileID is not identical to Baloo's actual fileID
This is a bug
GivenID: 799 ActualID: 0
GivenINode: 0 ActualINode: 0
GivenDeviceID: 799 ActualDeviceID: 0
No index information found
[roberto@roberto-pc ~]$ balooshow 0.0
No index information found
[roberto@roberto-pc ~]$ balooshow 0.1
No index information found
[roberto@roberto-pc ~]$ balooshow 4724
The fileID is not identical to Baloo's actual fileID
This is a bug
GivenID: 4724 ActualID: 0
GivenINode: 0 ActualINode: 0
GivenDeviceID: 4724 ActualDeviceID: 0
No index information found
[roberto@roberto-pc ~]$ balooshow 2360
The fileID is not identical to Baloo's actual fileID
This is a bug
GivenID: 2360 ActualID: 0
GivenINode: 0 ActualINode: 0
GivenDeviceID: 2360 ActualDeviceID: 0
No index information found
[roberto@roberto-pc ~]$

[roberto@roberto-pc ~]$ balooshow /home/roberto/Servet.pdf
No index information found
[roberto@roberto-pc ~]$ baloosearch pdf

Elapsed: 0.407895 msecs
[roberto@roberto-pc ~]$ baloosearch Servet
Elapsed: 0.965624 msecs
[roberto@roberto-pc ~]$ baloosearch Servet.pdf
Elapsed: 28.6488 msecs
[roberto@roberto-pc ~]$ /home/roberto/Documents/Bases\ de\ Datos/Asignaturas.csv
bash: /home/roberto/Documents/Bases de Datos/Asignaturas.csv: Permission denied
[roberto@roberto-pc ~]$ baloosearch /home/roberto/Documents/Bases\ de\ Datos/Asignaturas.csv
Elapsed: 23.2182 msecs
[roberto@roberto-pc ~]$ balooshow /home/roberto/Documents/Bases\ de\ Datos/Asignaturas.csv
No index information found
[roberto@roberto-pc ~]$ baloorearch /home/roberto/Documents/Bases\ de\ Datos/Cursos.csv
bash: baloorearch: command not found
[roberto@roberto-pc ~]$ cd 
[roberto@roberto-pc ~]$ cd /home/roberto/Documents/Bases\ de\ Datos
[roberto@roberto-pc Bases de Datos]$ cd ..
[roberto@roberto-pc Documents]$ cd ..
[roberto@roberto-pc ~]$ baloorearch Tempest
bash: baloorearch: command not found
[roberto@roberto-pc ~]$ baloosearch Tempest

Elapsed: 1.17574 msecs
[roberto@roberto-pc ~]$ baloosearch mp3
Elapsed: 0.981139 msecs

Leaving it marked as 'Major', because Plasma's searching capabilities are a very useful feature of this desktop environment.
Comment 1 rgnodev 2016-01-25 18:24:20 UTC
Created attachment 96838 [details]
my baloofilerc file
Comment 2 rgnodev 2016-01-28 18:18:43 UTC
Created attachment 96890 [details]
My index file.
Comment 3 rgnodev 2016-01-28 18:24:48 UTC
I ran 'balooctl status' several times, even after doing 'balooctl disable' followed by a system restart and 'balooctl enable'. It reported that 40 of 1140 files were indexed, and these numbers stayed the same every time I ran 'balooctl status'. Maybe this is an issue affecting the integrity of the underlying database, so I uploaded my 'index' file.
Should I open a new bug report?
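
The manual check described in this comment (running the same status command repeatedly and comparing the numbers) can be sketched as a small helper. This is only an illustration: the function name `stalled` is made up, and it assumes the status output contains nothing that changes between runs other than the progress counts.

```shell
# stalled INTERVAL CMD [ARGS...]
# Succeeds (exit 0) when CMD prints identical output twice, INTERVAL seconds apart.
stalled() {
    interval=$1
    shift
    first=$("$@" 2>&1)
    sleep "$interval"
    second=$("$@" 2>&1)
    [ "$first" = "$second" ]
}

# Demo with a command whose output never changes
# (placeholder text, not real balooctl output).
if stalled 0 echo "unchanged status text"; then
    echo "no progress between samples"
fi
```

On an affected system, `stalled 60 balooctl status` would compare two samples taken a minute apart.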
Comment 4 Gabriele Menna 2016-04-30 14:38:20 UTC
Hello. I am also seeing high CPU and RAM usage by the baloo_file_extractor process.

ps aux | grep baloo:
gabo      1063  0.7  0.5 5608116 33196 ?       SNl  16:15   0:08 /usr/bin/baloo_file
gabo      1163  8.1 34.6 6224476 2085264 ?     RNl  16:16   1:25 /usr/bin/baloo_file_extractor

It looks like no parameter was passed to the baloo_file_extractor process. How can I find out which file is being indexed?

Can I provide any further information that would make this easier to fix?
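
On the question of which file is being indexed: one generic Linux approach, not a Baloo feature, is to list the files the process currently has open through /proc. The sketch below assumes the extractor keeps a file open while it processes it, which this report does not confirm; the function name `open_files` is made up for illustration. The index database itself will probably show up in the list as well.

```shell
# open_files PID: print the absolute paths a process currently has open.
# Linux only: relies on the /proc/PID/fd symlinks.
open_files() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            /*) printf '%s\n' "$target" ;;  # real paths; skips socket:[...] and pipe:[...]
        esac
    done
}

# Demo on the current shell. For the extractor in the ps output above,
# the call would be `open_files 1163`.
open_files "$$"
```

Reading another process's descriptors requires running as the same user or as root.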
Comment 5 Stefan Brüns 2016-07-04 17:58:12 UTC
At least the idfilename db is partially corrupt; e.g. the /home directory appears several times, although keys should be unique.

mdb_dump -n -p -s idfilename index.lmdb | grep -B1 home
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
--
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
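
The duplicate check in this comment can be automated. The following is a minimal sketch, assuming the `mdb_dump -p` layout of header lines up to `HEADER=END`, followed by alternating key and value lines until `DATA=END`. The function name `count_duplicate_keys` is made up for illustration, and the sample input is a shortened copy of the output above wrapped in an assumed header.

```shell
# count_duplicate_keys: read `mdb_dump -p` output on stdin and print
# "<count><TAB><key>" for every key that occurs more than once.
count_duplicate_keys() {
    awk '
        /^HEADER=END/ { in_data = 1; n = 0; next }
        /^DATA=END/   { in_data = 0; next }
        in_data { n++; if (n % 2 == 1) seen[$0]++ }   # odd lines are keys
        END { for (k in seen) if (seen[k] > 1) printf "%d\t%s\n", seen[k], k }
    '
}

# Demo on a two-record sample with the same key twice.
count_duplicate_keys <<'EOF'
VERSION=3
format=print
type=btree
HEADER=END
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
 \02\08\00\00)\f4\05\00
 \00\00\00\00\00\00\00\00home
DATA=END
EOF
```

Against a real index the call would be `mdb_dump -n -p -s idfilename index.lmdb | count_duplicate_keys`. Any output at all would indicate repeated keys in a database whose keys should be unique.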
Comment 6 Stefan Brüns 2016-07-04 18:00:14 UTC
Fun fact:
LMDB on-disk format seems to be dependent on sizeof(ptr_t) - I can open the db index file on my 32bit RPi1, but not on my x86_64.
Comment 7 Howard Chu 2016-07-06 22:58:05 UTC
(In reply to Stefan Brüns from comment #6)
> Fun fact:
> LMDB on-disk format seems to be dependent on sizeof(ptr_t) - I can open the
> db index file on my 32bit RPi1, but not on my x86_64.

Correct, LMDB files are architecture-dependent. If you want to avoid this word-size dependency, you should define MDB_VL32 when building on 32bit arches - then it will be 64bit clean and identical to the 64bit build.
Comment 8 Howard Chu 2016-07-06 23:00:02 UTC
(MDB_VL32 is not in a public release, only in mdb.master.)
Comment 9 Christoph Cullmann 2016-09-12 11:46:03 UTC
*** Bug 358956 has been marked as a duplicate of this bug. ***
Comment 10 Christoph Cullmann 2016-09-12 11:48:32 UTC
*** Bug 361696 has been marked as a duplicate of this bug. ***
Comment 11 Eric Forgeot 2016-11-29 09:41:15 UTC
I've upgraded my Linux Mint 17.3 to Linux Mint 18, installed the new KDE Plasma (based on Qt 5) and cleaned out my previous KDE settings.

KDE 5 looks great, but it is very unresponsive because of Baloo, which eats most of the CPU. It took up to 10 seconds to open a new window in Konsole, for example. At first I removed file search for my home folder, and it seemed to be OK. I waited a couple of days, then rebooted the computer. Then Baloo started to eat most of the CPU again, and this lasted for more than 30 minutes. It was very annoying because I couldn't use the computer (15 seconds to open a new tab in Firefox).

Then I completely disabled file search in ~/.config/baloofilerc, and now everything is very responsive and fine.

(My computer is quite powerful: quad-core, 8 GB RAM and an SSD.)
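
For readers who want to check the same setting, here is a minimal sketch that reads the indexing switch from a baloofilerc-style file. The group name `[Basic Settings]` and the key `Indexing-Enabled` are assumptions about the file layout, not something stated in this report, and the function name is made up. The `balooctl disable` command mentioned earlier in this report is the more direct way to switch indexing off.

```shell
# indexing_enabled FILE: print the value of Indexing-Enabled in [Basic Settings].
# Group and key names are assumed, see the note above.
indexing_enabled() {
    awk -F= '
        /^\[/ { in_basic = ($0 == "[Basic Settings]"); next }
        in_basic && $1 == "Indexing-Enabled" { print $2; exit }
    ' "$1"
}

# Demo against a throwaway file with the assumed layout.
sample=$(mktemp)
printf '[Basic Settings]\nIndexing-Enabled=false\n' > "$sample"
indexing_enabled "$sample"
rm -f "$sample"
```

On a real system the call would be `indexing_enabled ~/.config/baloofilerc`. It prints nothing if the key is absent.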


Baloo should just NOT be enabled by default on KDE.
Comment 12 Stefan Brüns 2016-11-29 15:12:33 UTC
(In reply to Eric Forgeot from comment #11)
> I've upgraded my Linux Mint 17.3 to Linux Mint 18, installed the new KDE
> Plasma (based on Qt 5) and cleaned out my previous KDE settings.
> 
> KDE 5 looks great, but it is very unresponsive because of Baloo, which eats
> most of the CPU. It took up to 10 seconds to open a new window in Konsole,
> for example. At first I removed file search for my home folder, and it
> seemed to be OK. I waited a couple of days, then rebooted the computer.
> Then Baloo started to eat most of the CPU again, and this lasted for more
> than 30 minutes. It was very annoying because I couldn't use the computer
> (15 seconds to open a new tab in Firefox).
> 
> Then I completely disabled file search in ~/.config/baloofilerc, and now
> everything is very responsive and fine.
> 
> (My computer is quite powerful: quad-core, 8 GB RAM and an SSD.)
> 
> 
> Baloo should just NOT be enabled by default on KDE.

Mint 18 has a severely outdated KDE/KF5 5.6, blame Mint.
Comment 13 Stefan Brüns 2018-10-31 18:03:00 UTC
Infinite loops are fixed with https://phabricator.kde.org/D12335