Bug 437197

Summary: Baloo seems to be too aggressive and blocks disc access
Product: [Frameworks and Libraries] frameworks-baloo Reporter: Ian Proudler <i.proudler>
Component: Baloo File Daemon Assignee: Stefan Brüns <stefan.bruens>
Status: RESOLVED FIXED    
Severity: wishlist CC: baloo-bugs-null, nate, tagwerk19
Priority: NOR    
Version: 5.68.0   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Ian Proudler 2021-05-16 11:40:08 UTC
SUMMARY

This is not a bug per se. I hope this is the correct place to mention it.

I don't have a big issue with baloo. Its search capability is great. Note I have content indexing switched on.

Whenever I change a file's content, baloo immediately re-indexes it. The problem I have is that my files are on a HDD and baloo seems to be able to block (perhaps by just being first) access to the disk. This causes a problem with GNU Octave. It uses the timestamp on the file to decide if it needs to 'recompile' a script. Sometimes Octave decides the file does not have to be 'recompiled' even though I have edited it. I suspended baloo and the problem with Octave seems to have gone.

Also, when I first log in to my machine, baloo springs into life and blocks access to the HDD for several seconds.

Incidentally, I found that, with content indexing on, baloo was very active almost all of the time when I used dolphin. Then I added '.directory' to 'exclude filters'. (I believe '.directory' is something to do with dolphin.) I think it might help to make this the default (for KDE anyway).

Possibly 'block' is too strong a word. It also might have a technical connotation.  'Slow down access' might be a better description.

STEPS TO REPRODUCE
1. just using my computer and watching the disk access and balooctl monitor


OBSERVED RESULT
baloo accesses the disk immediately after a file has been saved.

EXPECTED RESULT
Perhaps it would help if baloo waited for a few seconds?
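
To make the suggestion concrete, here is a minimal Python sketch of a "wait a few seconds" (debounce) policy. It is only an illustration of the idea, not a description of how baloo schedules its work: the class name `Debouncer`, its methods and the 5-second quiet period are all hypothetical.

```python
class Debouncer:
    """Hold back changed paths until they have been quiet for a while.

    Hypothetical illustration only; not taken from baloo.
    """

    def __init__(self, quiet_seconds):
        self.quiet_seconds = quiet_seconds
        self._last_change = {}  # path -> time of the most recent change

    def touch(self, path, now):
        # Every new save restarts the quiet period for that path.
        self._last_change[path] = now

    def due(self, now):
        # Return (and forget) the paths that have been quiet long enough.
        ready = sorted(
            path
            for path, changed_at in self._last_change.items()
            if now - changed_at >= self.quiet_seconds
        )
        for path in ready:
            del self._last_change[path]
        return ready


if __name__ == "__main__":
    d = Debouncer(quiet_seconds=5)
    d.touch("script.m", now=0)
    d.touch("script.m", now=3)   # saved again, timer restarts
    print(d.due(now=6))          # still inside the quiet period
    print(d.due(now=8))          # quiet for 5 seconds, ready to index
```

With a policy like this, a file saved several times in quick succession would be read once, after the editing burst is over.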


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Kubuntu 20.04, kernel 5.4.0-73-generic
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.68.0
Qt Version: 5.12.8

ADDITIONAL INFORMATION
Comment 1 tagwerk19 2021-05-17 08:56:14 UTC
Did you want to index hidden files and folders?

You might be able to cut down the number of files "looked at" on startup if you exclude the .cache directory and wastebasket (ref: Bug 434705)
Comment 2 Stefan Brüns 2021-05-17 11:47:41 UTC
(In reply to tagwerk19 from comment #1)
> Did you want to index hidden files and folders?
> 
> You might be able to cut down the number of files "looked at" on startup if
> you exclude the .cache directory and wastebasket (ref: Bug 434705)

Adding .directory to the exclude list is definitely the wrong approach.

Either something is rewriting .directory files recurrently, then there is something wrong with the application, or a change is detected although there is none.
Comment 3 tagwerk19 2021-05-17 19:36:33 UTC
(In reply to Stefan Brüns from comment #2)
> Either something is rewriting .directory files recurrently ...
> ... or a change is detected although there is none.
Possible way forward (if focusing on the .directory files), is to install inotify-tools and run

    inotifywait -mr ~ | grep -F .directory

in one window and compare this to what baloo is indexing by running:

    balooctl monitor

in a second...

If I do this, running dolphin with per-folder settings enabled and pressing Ctrl-H a few times to show and hide "hidden" files, inotify shows a lock and temporary file being created, the .directory file being deleted and the tempfile renamed back to .directory.

For me that seems sensible - and it's nice to see inotify showing the actions

In parallel, balooctl monitor reindexes the .directory file each time I change the config.

So, for me, it's behaving. Do you see anything suspicious?

    Neon Unstable
    Plasma: 5.22.80
    Frameworks: 5.83.0
    Qt: 5.15.2
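
The temporary-file-then-rename sequence that inotify reports is the usual "safe save" pattern. A minimal Python sketch of that pattern follows, assuming nothing about how dolphin or KConfig actually implement it: the helper name `atomic_write` is hypothetical, the lock file is left out, and `os.replace` swaps the file in one step rather than deleting `.directory` first.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)     # rename over the target
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, ".directory")
        atomic_write(target, "[Settings]\nHiddenFilesShown=true\n")
        atomic_write(target, "[Settings]\nHiddenFilesShown=false\n")
        print(open(target).read())
```

Because each save ends with a brand-new file taking over the name, a watcher has good reason to treat every settings change as a changed file, which matches the reindexing seen in balooctl monitor.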
Comment 4 Ian Proudler 2021-05-18 14:03:20 UTC
(In reply to tagwerk19 from comment #3)
> (In reply to Stefan Brüns from comment #2)
> > Either something is rewriting .directory files recurrently ...
> > ... or a change is detected although there is none.
> Possible way forward (if focusing on the .directory files), is to install
> inotify-tools and run
> 
>     inotifywait -mr ~ | grep .directory
> 
> in one window and compare this to what baloo is indexing by running:
> 
>     balooctl monitor
> 
> in a second...
> 
> If I do this, running dolphin with per-folder settings enabled and pressing
> Ctrl-H a few times to show and hide "hidden" files, inotify shows a lock and
> temporary file being created, the .directory file being deleted and the
> tempfile renamed back to .directory.
> 
> For me that seems sensible - and it's nice to see inotify showing the actions
> 
> In parallel, balooctl monitor reindexes the .directory file each time I
> change the config.
> 
> So, for me, it's behaving. Do you see anything suspicious?
> 
>     Neon Unstable
>     Plasma: 5.22.80
>     Frameworks: 5.83.0
>     Qt: 5.15.2

I don't see anything wrong as such. Baloo is doing what it is supposed to do. It's just that it leaps into action immediately rather than waiting for the PC to idle. 

When baloo reads files it seems, on occasions, to slow down access to the HDD. I was just suggesting that it would be nice if baloo had a lower priority than the user when it came to accessing the disk. 

Although I'm not 100% sure, I feel that baloo's aggressive access to the disk is interfering with the operation of Octave. It certainly slows down the PC when I first log in.

Thanks to everyone for their input.
Comment 5 Stefan Brüns 2021-05-18 15:12:52 UTC
Baloo already uses lowest priorities for both I/O and CPU. It also slows itself down by sleeping between files while the computer is used interactively. It delays more resource intensive operations on login, so other resource hungry applications like firefox can start faster.

Baloo does everything possible to be friendly to other applications. But every now and then, it *has* to commit its data to disk.
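
For readers unfamiliar with these techniques, the following Python sketch shows the general idea of a background worker that drops to the lowest CPU priority and sleeps between files. It is not baloo's code: the function names and the half-second pause are hypothetical, and I/O priority is left out because the Python standard library has no wrapper for it.

```python
import os
import time


def lower_cpu_priority():
    """Raise this process's niceness to 19, the lowest CPU priority on Linux."""
    current = os.getpriority(os.PRIO_PROCESS, 0)
    if current < 19:
        os.nice(19 - current)
    return os.getpriority(os.PRIO_PROCESS, 0)


def process_politely(paths, handle, pause_seconds=0.5):
    """Handle each path in turn, sleeping in between to leave room for others."""
    results = []
    for position, path in enumerate(paths):
        results.append(handle(path))
        if position < len(paths) - 1:
            time.sleep(pause_seconds)
    return results


if __name__ == "__main__":
    print("niceness is now", lower_cpu_priority())
    print(process_politely(["a.txt", "b.txt"], str.upper, pause_seconds=0.1))
```

Even with both measures in place, the writes that commit the index still have to reach the disk at some point, which is the caveat made above.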
Comment 6 tagwerk19 2021-05-19 07:22:49 UTC
Here's another guess, maybe worth mentioning...

You are doing content indexing - and indexing a lot of data? How big has your .local/share/baloo/index file become? What does

    balooctl indexSize

say? It could be that baloo is wanting more bits of the database in memory than there is memory available. In that case there'll be lots of going back and forth to disc. Having a look at your system load (memory, swap) might give some clues
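
A rough way to make that comparison is sketched below in Python, assuming a Linux system with /proc/meminfo and the index at the path mentioned above. The function names are hypothetical, and the ratio is only a hint, since not all of the index needs to be in memory at once.

```python
import os


def mem_available_bytes(meminfo_text):
    """Extract MemAvailable from the contents of /proc/meminfo (values are in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def index_vs_memory(index_path, meminfo_path="/proc/meminfo"):
    """Return (index size, available memory, size / available) in bytes."""
    index_size = os.path.getsize(index_path)
    with open(meminfo_path) as handle:
        available = mem_available_bytes(handle.read())
    return index_size, available, index_size / available


if __name__ == "__main__":
    path = os.path.expanduser("~/.local/share/baloo/index")
    size, available, ratio = index_vs_memory(path)
    print(f"index: {size / 2**30:.2f} GiB, available: {available / 2**30:.2f} GiB")
    print(f"index is {ratio:.0%} of available memory")
```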
Comment 7 Ian Proudler 2021-05-20 13:30:43 UTC
(In reply to Stefan Brüns from comment #5)
> Baloo already uses lowest priorities for both I/O and CPU. It also slows
> itself down by sleeping between files while the computer is used
> interactively. It delays more resource intensive operations on login, so
> other resource hungry applications like firefox can start faster.
> 
> Baloo does everything possible to be friendly to other applications. But
> every now and then, it *has* to commit its data to disk.

My apologies. Perhaps my issue lies with the Linux kernel.
Comment 8 Ian Proudler 2021-05-20 13:34:27 UTC
(In reply to tagwerk19 from comment #6)
> Here's another guess, maybe worth mentioning...
> 
> You are doing content indexing - and indexing a lot of data? How big has
> your .local/share/baloo/index file become? What does
> 
>     balooctl indexSize
> 
> say? It could be that baloo is wanting more bits of the database in memory
> than there is memory available. In that case there'll be lots of going back
> and forth to disc. Having a look at your system load (memory, swap) might
> give some clues

Thanks for the suggestion.

I have 8GB RAM. I have conky running on the desktop showing RAM and swap usage, but I've never noticed baloo causing swap usage. I will take more notice in future.

balooctl indexSize:

File Size: 2.58 GiB
Used:      1.61 GiB

           PostingDB:     495.27 MiB    29.957 %
          PositionDB:     886.12 MiB    53.598 %
            DocTerms:     257.66 MiB    15.585 %
    DocFilenameTerms:       4.32 MiB     0.262 %
       DocXattrTerms:       4.00 KiB     0.000 %
              IdTree:     744.00 KiB     0.044 %
          IdFileName:       3.21 MiB     0.194 %
             DocTime:       1.82 MiB     0.110 %
             DocData:       2.57 MiB     0.155 %
   ContentIndexingDB:            0 B     0.000 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       1.54 MiB     0.093 %
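
As a sanity check on these figures, the short Python sketch below parses the report and adds up the per-database sizes. The parsing helper is hypothetical and written only for the output format pasted above. On these numbers the databases sum to about 1.61 GiB, matching "Used", the percentages are shares of that sum, and roughly 0.97 GiB of the 2.58 GiB file is not accounted for by any database.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
LINE = re.compile(r"^\s*(\w[\w ]*?):\s+([\d.]+)\s+(B|KiB|MiB|GiB)\b")

REPORT = """\
File Size: 2.58 GiB
Used:      1.61 GiB

           PostingDB:     495.27 MiB    29.957 %
          PositionDB:     886.12 MiB    53.598 %
            DocTerms:     257.66 MiB    15.585 %
    DocFilenameTerms:       4.32 MiB     0.262 %
       DocXattrTerms:       4.00 KiB     0.000 %
              IdTree:     744.00 KiB     0.044 %
          IdFileName:       3.21 MiB     0.194 %
             DocTime:       1.82 MiB     0.110 %
             DocData:       2.57 MiB     0.155 %
   ContentIndexingDB:            0 B     0.000 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:       1.54 MiB     0.093 %
"""


def parse_sizes(report):
    """Map each label in the report to its size in bytes."""
    sizes = {}
    for line in report.splitlines():
        match = LINE.match(line)
        if match:
            name, value, unit = match.groups()
            sizes[name] = float(value) * UNITS[unit]
    return sizes


if __name__ == "__main__":
    sizes = parse_sizes(REPORT)
    databases = {k: v for k, v in sizes.items() if k not in ("File Size", "Used")}
    total = sum(databases.values())
    print(f"sum of databases: {total / UNITS['GiB']:.2f} GiB")
    print(f"PositionDB share: {databases['PositionDB'] / total:.1%}")
    print(f"file size minus used: {(sizes['File Size'] - sizes['Used']) / UNITS['GiB']:.2f} GiB")
```

PositionDB and PostingDB together make up well over 80 % of the used space.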
Comment 9 Ian Proudler 2021-05-22 09:47:53 UTC
(In reply to tagwerk19 from comment #6)
> Here's another guess, maybe worth mentioning...
> 
> You are doing content indexing - and indexing a lot of data? How big has
> your .local/share/baloo/index file become? What does
> 
>     balooctl indexSize
> 
> say? It could be that baloo is wanting more bits of the database in memory
> than there is memory available. In that case there'll be lots of going back
> and forth to disc. Having a look at your system load (memory, swap) might
> give some clues

Just in case it helped, I decided to 'purge' the index. I'm happy to report that so far, baloo is indexing files and contents without any obvious impact on my system.

So perhaps there was something wrong with my index file. The last time I reset it was some years ago.
Comment 10 tagwerk19 2021-05-22 11:30:41 UTC
Good news!

Yes, there were reports of trouble with 'older' indexes where reindexing was a solution (Bug 431664).

Sounds like it's OK to close.