Bug 434919

Summary: Baloo does not exclude folders correctly if $HOME has a trailing slash
Product: [Frameworks and Libraries] frameworks-baloo Reporter: Oded Arbel <oded>
Component: general Assignee: Stefan Brüns <stefan.bruens>
Status: RESOLVED FIXED    
Severity: normal CC: nate, tagwerk19
Priority: NOR    
Version: 5.80.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In: 5.82

Description Oded Arbel 2021-03-25 07:21:19 UTC
SUMMARY
When $HOME is set with a trailing slash (my /etc/passwd entry ended up that way; I'm not sure what caused it, but I'm pretty sure it wasn't a manual edit), the exclude folder list generated by the Baloo System Settings configuration module contains paths with double-slash components:

----8<----
$ grep exclude\ folders ~/.config/baloofilerc 
exclude folders[$e]=$HOME/.cache/,$HOME/mnt/,$HOME/snap/
$ balooctl config show excludeFolders
kf.baloo: Folder cache: std::vector("/home/odeda//.cache/": excluded, "/home/odeda//snap/": excluded, "/home/odeda//mnt/": excluded, "/home/odeda/": included)
/home/odeda//.cache/
/home/odeda//snap/
/home/odeda//mnt/
----8<----

When that happens, Baloo ignores the exclude folders configuration and scans the excluded paths.
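
A minimal sketch of why the double slash could matter, assuming the exclude list is compared against file paths as plain string prefixes (the commit message further down says the folder path is "used in string matches", but Baloo's real matching code is not shown in this report). The helper `is_excluded` and the file path `on_disk` are hypothetical and only illustrate the mismatch:

```python
def is_excluded(path, exclude_folders):
    """Hypothetical prefix check, not Baloo's actual code."""
    return any(path.startswith(folder) for folder in exclude_folders)


# Folder list as reported by `balooctl config show excludeFolders` above.
configured = [
    "/home/odeda//.cache/",
    "/home/odeda//snap/",
    "/home/odeda//mnt/",
]

# Assumed example of a path as the indexer would see it on disk (single slash).
on_disk = "/home/odeda/.cache/thumbnails/a.png"

# The configured prefix contains "//", so it never matches the on-disk path.
print(is_excluded(on_disk, configured))  # False
```

Under that assumption, every excluded folder silently stops matching, which is consistent with the behaviour described here.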

STEPS TO REPRODUCE
1. Set up /etc/passwd with a slash at the end of your home directory, then log out and back in
2. Use System Settings to remove all excluded folders and then add some back in
3. Wait for baloo to start indexing files

OBSERVED RESULT
Baloo indexes files in the excluded folders.

EXPECTED RESULT
Baloo should understand that `//` does not refer to a directory with an empty name. Such paths should be canonicalized the same way coreutils' realpath does it, and the excluded folders should not be scanned.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 
KDE Plasma Version: 5.21.80
KDE Frameworks Version: 5.81.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION
Comment 1 tagwerk19 2021-03-25 08:02:35 UTC
Can confirm...
Comment 2 Nate Graham 2021-03-25 21:57:47 UTC
Interesting. I'll investigate.
Comment 3 Oded Arbel 2021-03-25 23:09:05 UTC
(In reply to Nate Graham from comment #2)
> Interesting. I'll investigate.

This change solved the problem for me: https://invent.kde.org/frameworks/baloo/-/merge_requests/43
Comment 4 Nate Graham 2021-03-25 23:31:55 UTC
Oh nice!

An alternative or supplemental solution might be to sanitize the input paths entered using the KCM, which lives in plasma-desktop.
Comment 5 Oded Arbel 2021-03-26 05:19:23 UTC
(In reply to Nate Graham from comment #4)
> Oh nice!
> 
> An alternative or supplemental solution might be to sanitize the input paths
> entered using the KCM, which lives in plasma-desktop.

That wouldn't work as an alternative in this case, because the KCM sets up the paths using $HOME/ (and I think that's correct), so in my case the KCM would still cause the config file to contain the double slash.

That code change does cause the KCM to display the configured paths as "~//folder name", which is a bit weird, so some modification there might also be in order. I'll take a look.
Comment 6 Oded Arbel 2021-04-21 17:45:29 UTC
Git commit 7b0cab3aa4b1c8b1eb6a393c2226630923998da5 by Oded Arbel.
Committed on 21/04/2021 at 07:45.
Pushed by ngraham into branch 'master'.

When adding a folder to configuration, normalize the path semantically

The folder path is later used in string matches so it must be normalized
by removing double directory separators, up-dir components and other parts
of a Unix directory spec that are valid but confuse `canBeSearched()`.

M  +1    -1    src/file/fileindexerconfig.cpp

https://invent.kde.org/frameworks/baloo/commit/7b0cab3aa4b1c8b1eb6a393c2226630923998da5
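
For readers who want to see the idea in isolation, here is a rough sketch of the kind of normalization the commit message describes. It is written in Python with `posixpath.normpath` as a stand-in and is not the code from the commit. The helper names `normalize_folder` and `is_excluded`, the prefix-matching logic, and the convention of keeping one trailing slash are all assumptions made for illustration:

```python
import posixpath


def normalize_folder(path):
    """Lexically clean a folder path and keep exactly one trailing slash.

    Hypothetical helper: collapses repeated separators and resolves
    "." and ".." components without touching the filesystem.
    """
    cleaned = posixpath.normpath(path)
    return cleaned if cleaned.endswith("/") else cleaned + "/"


def is_excluded(path, exclude_folders):
    """Assumed prefix match against the normalized folder list."""
    return any(path.startswith(normalize_folder(f)) for f in exclude_folders)


configured = ["/home/odeda//.cache/", "/home/odeda/mnt/../snap/"]

print(normalize_folder("/home/odeda//.cache/"))      # /home/odeda/.cache/
print(normalize_folder("/home/odeda/mnt/../snap/"))  # /home/odeda/snap/
print(is_excluded("/home/odeda/.cache/thumbnails/a.png", configured))  # True
```

Note that this cleanup is purely lexical: unlike coreutils' realpath in its default mode, it does not resolve symbolic links, so a `..` that follows a symlink may be resolved differently than the kernel would resolve it.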