Bug 427455 - Have baloo default to XDG folders instead of user home
Status: RESOLVED INTENTIONAL
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Other Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Stefan Brüns
Reported: 2020-10-08 15:58 UTC by Thiago Sueto
Modified: 2023-01-01 19:24 UTC

Description Thiago Sueto 2020-10-08 15:58:32 UTC
I think it would make sense to set the xdg user directories (Documents, Downloads, Music, Pictures, Videos and Desktop) as the default folders for baloo search instead of $HOME.

* Terminals default to home, so when you git clone, wget/curl or use something like kdesrc-build, it will likely be there and be scanned.
* VirtualBox and some other applications default their general folders there too.
* With Dolphin now defaulting to the last used folder, users won't necessarily see $HOME as their default folder when running Dolphin, so there would be less random files scattered around $HOME.

Such things might create unnecessary initial checks for files to be indexed.

Advantages:

* Fewer files to scan > quicker scan > less CPU time > fewer user complaints.
* Fewer bug reports regarding unexpected files provided by random applications that shouldn't be indexed to begin with.
* Fewer file types that could cause issues to baloo.
* The search KCM won't have a one- or two-item list under "Folder specific configuration".
* Because the search KCM is more populated, users would feel more compelled to check the interface and improve baloo filters since it's just two clicks in a combobox.
* For further making it easier for users, adding the folder .config as "Not indexed" to "Folder specific configuration" would be useful both for the above and to make the setting "Index hidden files and folders" more meaningful.
* Technical users and developers won't suffer from this since they can just configure specific folders to index.
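
For reference, the XDG user directories named above are conventionally recorded in ~/.config/user-dirs.dirs as lines such as XDG_DOCUMENTS_DIR="$HOME/Documents". The following is a minimal sketch of turning that file's content into a folder list. It is not how baloo reads its configuration; the function name `xdg_user_dirs` and the `WANTED` tuple are hypothetical, chosen only for illustration.

```python
import re

# Matches lines such as: XDG_DOCUMENTS_DIR="$HOME/Documents"
_LINE = re.compile(r'^\s*XDG_([A-Z]+)_DIR="(.*)"\s*$')

# The six directories named in the proposal (illustrative selection).
WANTED = ("DOCUMENTS", "DOWNLOAD", "MUSIC", "PICTURES", "VIDEOS", "DESKTOP")


def xdg_user_dirs(text, home):
    """Map XDG user-dir names to paths, given user-dirs.dirs content.

    Hypothetical helper: `text` is the file content, `home` the user's
    home directory as a string.
    """
    found = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # comments, blank lines, anything unexpected
        name, value = match.groups()
        if name in WANTED:
            # Values are either absolute or start with $HOME
            if value.startswith("$HOME"):
                value = home + value[len("$HOME"):]
            found[name] = value
    return found
```

Passing the content of user-dirs.dirs and the home directory returns a dictionary keyed by directory name; entries such as Templates, which the proposal does not mention, are skipped.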
Comment 1 Thiago Sueto 2020-10-08 16:17:33 UTC
In my case, for instance, the files I want indexed are those in Documents. With this setting, I have 1748 files there from the last six months, and I'm a heavy baloo user who relies on it for work. It takes a bit more than two minutes to scan all of it on an SSD.

I also localize KDE software and sometimes dabble with kdesrc-build, and enabling indexing of $HOME instantly yields over 177,000 files.

Not indexing the localization folders drops the number to 169,000.

If I instead don't index the kdesrc-build folders, it gets down to 17,000 files.

So it's quite a significant improvement.
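
A comparison like the one above can be roughly reproduced with a short script. This is a minimal sketch, not how baloo counts or filters files: it walks a directory tree and skips any excluded subtree. The function name `count_files` is hypothetical.

```python
import os
from pathlib import Path


def count_files(root, excluded=()):
    """Count the files os.walk reports under root, skipping excluded subtrees.

    A rough stand-in for "how many files would be scanned"; none of
    baloo's own filters are applied (hypothetical helper).
    """
    root = Path(root).resolve()
    skip = {Path(p).resolve() for p in excluded}
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        here = Path(dirpath)
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if (here / d) not in skip]
        total += len(filenames)
    return total
```

Calling `count_files(Path.home())` and then `count_files(Path.home(), excluded=[Path.home() / "kde"])` shows how much a single build folder contributes, assuming the kdesrc-build tree lives in ~/kde.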
Comment 2 Stefan Brüns 2020-10-08 20:06:21 UTC
(In reply to Thiago Sueto from comment #0)
> I think it would make sense to set the xdg user directories (Documents,
> Downloads, Music, Pictures, Videos and Desktop) as the default folders for
> baloo search instead of $HOME.
> 
> * Terminals default to home, so when you git clone, wget/curl or use
> something like kdesrc-build, it will likely be there and be scanned.
> * VirtualBox and some other applications default their general folders there
> too.
> * With Dolphin now defaulting to the last used folder, users won't
> necessarily see $HOME as their default folder when running Dolphin, so there
> would be less random files scattered around $HOME.
> 
> Such things might create unnecessary initial checks for files to be indexed.
> 
> Advantages:
> 
> * Fewer files to scan > quicker scan > less CPU time > fewer user
> complaints.

No. Users already complain if baloo does not find files below '/'.

> * Fewer bug reports regarding unexpected files provided by random
> applications that shouldn't be indexed to begin with.

These files may be located anywhere.

> * Fewer file types that could cause issues to baloo.

Unrelated. If there is a problem with some file type, file a bug. Again, these files may also be located under e.g. Documents.

> * The search KCM won't have a one- or two-item list under "Folder specific
> configuration".

Why is this bad?

> * Because the search KCM is more populated, users would feel more compelled
> to check the interface and improve baloo filters since it's just two clicks
> in a combobox.
> * For further making it easier for users, adding the folder .config as "Not
> indexed" to "Folder specific configuration" would be useful both for the
> above and to make the setting "Index hidden files and folders" more
> meaningful.
> * Technical users and developers won't suffer from this since they can just
> configure specific folders to index.

You can just disable indexing $HOME and add Documents etc.

To me this just looks like you want to make *your personal preference* the default ...
Comment 3 Thiago Sueto 2020-10-08 21:43:29 UTC
Well, I can't argue with that: it is a personal preference. I just thought it would work well as opposed to something like "Just disable default content indexing", which is what openSUSE has been doing for a while.

Baloo has been in a vicious cycle where a common workaround is to disable it entirely instead of reporting bugs, and without reporting bugs users keep complaining about issues they face, further harming its image and motivating others to disable it. So I assumed that changing the defaults, even if not in the specific way I suggested, could improve this situation. I don't mind if this bug report is closed if you think the current situation is alright.

As for your points:

> No. Users already complain if baloo does not find files below '/'.
I recall that one, but that's definitely something that takes quite a while to index and uses a bunch of resources.
I wouldn't personally see it as a good move for defaults since this might result in at least half an hour of indexing after login (I haven't really tested this to see the time spent on indexing). It's more the kind of thing that the user should choose to do knowing the consequences instead.

> > * Fewer bug reports regarding unexpected files provided by random
> > applications that shouldn't be indexed to begin with.
> 
> These files may be located anywhere.
I agree, but my point was more about priority/likelihood.
I lack the data, but I'd assume people populate their Documents and Music folders with searchable items more than home itself.
I'd also assume that config files are less likely to be searched.
If priority/likelihood is irrelevant here, my whole suggestion is irrelevant too.

> > * Fewer file types that could cause issues to baloo.
> 
> Unrelated. If there is a problem with some file type, file a bug. Again,
> > these files may also be located under e.g. Documents.
I agree, it can happen anywhere. Just the likelihood might be different, as mentioned above.

> > * The search KCM won't have a one- or two-item list under "Folder specific
> > configuration".
> 
> Why is this bad?
This is not bad, but it could be friendlier. Having more default suggested folders that can be toggled by two clicks is certainly more inviting to use than adding each individual folder manually. Extra folders could still be added. If anything, I'd like to see this specific suggestion, even if the rest isn't done. What do you think?

> > * Because the search KCM is more populated, users would feel more compelled
> > to check the interface and improve baloo filters since it's just two clicks
> > in a combobox.
> > * For further making it easier for users, adding the folder .config as "Not
> > indexed" to "Folder specific configuration" would be useful both for the
> > above and to make the setting "Index hidden files and folders" more
> > meaningful.
> > * Technical users and developers won't suffer from this since they can just
> > configure specific folders to index.
> 
> You can just disable indexing $HOME and add Documents etc.
I am one of those technical users, so it is not an issue to me personally. I'm just suggesting things.
Comment 4 Nate Graham 2020-10-12 17:56:09 UTC
In general, it's best to use bug reports to describe a problem rather than proposing a solution, because there's the danger (from your perspective) that the solution is rejected and you're left frustrated because you still have the same problem. :)

---

And we have to reject your proposed solution, sorry. :)

Basically you are proposing using a whitelist instead of a blacklist (which is the current approach). GNOME's Tracker uses a whitelist and behaves as you're describing, as a point of comparison.

The problem with using a whitelist is that it puts the burden on the user to include all of the files they care about being indexed, which may be outside of the standard XDG dirs. For example, if the user creates a folder in ~ called "Books" and puts their books and epubs etc. in it, those files won't be indexed under your proposal, and the user will need to know that they have to manually include this location. That's not very user friendly. In my experience, a lot of regular users totally ignore the ~/Documents folder and make up their own file organization structure. Sometimes they put their things on the Desktop, which is an XDG location, so their files would be indexed. Sometimes they make new folders inside ~, which results in them not being inside XDG locations, so they don't get indexed.

---

FWIW Baloo currently has virtual disk related stuff and git repos in its blacklist, so your examples aren't actually problems right now (unless there's a bug). Using kdesrc-build is a corner case since by default it does out-of-source builds (good) which results in files being created outside of the git repos, which means they will be indexed.

For this situation, we have a few options:
1. Improve our developer documentation to specifically recommend adding ~/kde to the "don't index these paths" list in the Search KCM
2. Have kdesrc-build do the above automatically the first time it's run

If you personally want to use the whitelist approach exclusively, you can add your home folder to the exclusions list and then add ~/Documents to the inclusions list.
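
The difference between the two approaches can be sketched as a small decision function. This is an illustration only and not baloo's actual implementation: it assumes that the most specific (deepest) matching folder rule wins and that a path matching no rule is not indexed. The function name `is_indexed` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_indexed(path, include, exclude):
    """Return True if the most specific matching folder rule is an include.

    Assumption (not taken from baloo's source): the deepest matching
    folder wins, and a path that matches no rule is not indexed.
    """
    path = PurePosixPath(path)
    best_depth, indexed = -1, False
    for folders, verdict in ((include, True), (exclude, False)):
        for folder in folders:
            folder = PurePosixPath(folder)
            if path == folder or folder in path.parents:
                depth = len(folder.parts)
                if depth > best_depth:
                    best_depth, indexed = depth, verdict
    return indexed


# Blacklist style (the current default described above): index all of home.
blacklist_style = {"include": ["/home/user"], "exclude": []}

# Whitelist style (the personal setup suggested above): exclude home, include Documents.
whitelist_style = {"include": ["/home/user/Documents"], "exclude": ["/home/user"]}
```

With `whitelist_style`, `is_indexed("/home/user/Books/novel.epub", **whitelist_style)` is False while `is_indexed("/home/user/Documents/report.odt", **whitelist_style)` is True, which is the "Books" situation described above. With `blacklist_style`, both paths are indexed.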
Comment 5 Dennis Schridde 2023-01-01 11:57:06 UTC
Fedora defaults to this behaviour: https://src.fedoraproject.org/rpms/kf5-baloo/blob/f37/f/baloo-5.67.0-baloofile_config.patch
Comment 6 tagwerk19 2023-01-01 19:24:42 UTC
(In reply to Dennis Schridde from comment #5)
> Fedora defaults to this behaviour:
> https://src.fedoraproject.org/rpms/kf5-baloo/blob/f37/f/baloo-5.67.0-baloofile_config.patch
This introduces a bundle of eccentricities in Dolphin. Dolphin "falls back" to a more basic search (a "there and then" recursive scan down into your folders) if you move to an unindexed folder in Dolphin and then do a "From Here" search - and as in Fedora your $HOME is not indexed, you will normally end up with the slower/basic search.

You can go down quite a rabbit hole if you want to work out exactly when Dolphin search makes use of baloo and when not, see:

    https://bugs.kde.org/show_bug.cgi?id=424871#c4

and there's another layer of complication if you have symlinks, see Bug 447119.