Bug 490981 - baloosearch6: --directory switch results in no matches
Summary: baloosearch6: --directory switch results in no matches
Status: RESOLVED WORKSFORME
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 6.4.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-29 15:10 UTC by Eugene Shalygin
Modified: 2025-02-21 03:46 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Eugene Shalygin 2024-07-29 15:10:38 UTC
baloosearch6 <string> returns a list of files within the home directory (an incomplete one, by the way). baloosearch6 -d <home_directory> <same_string> returns nothing.


STEPS TO REPRODUCE
1.  baloosearch6 -d $HOME <whatever> 


SOFTWARE/OS VERSIONS
Operating System: Gentoo Linux 2.15
KDE Plasma Version: 6.1.3
KDE Frameworks Version: 6.4.0
Qt Version: 6.7.2
Kernel Version: 6.10.1-gentoo (64-bit)
Comment 1 Nate Graham 2024-07-30 19:40:15 UTC
Cannot reproduce; works for me exactly as described with current git master.
Comment 2 tagwerk19 2024-08-16 19:38:38 UTC
Likely
    https://bugs.kde.org/show_bug.cgi?id=474973#c27

That used to happen on systems running BTRFS (such as openSUSE and, more recently, Fedora): the system gave no guarantee that the BTRFS partition was mounted with the same device number as on the previous boot. Files were then indexed multiple times, and filtering for files under a given folder quite often failed.

The solution is to summon up patience, purge the index and reindex from clean. It was my mistake not to notice the effect of the patch on non-BTRFS systems; see Bug 477068.
Comment 3 Eugene Shalygin 2024-08-25 09:47:26 UTC
Although I use BTRFS, this behaviour can be observed without remounting the partition between indexing and querying.
Comment 4 tagwerk19 2024-08-25 22:50:31 UTC
(In reply to Eugene Shalygin from comment #3)
> Although I use BTRFS, this behaviour can be observed without remounting the
> partition between indexing and querying.
If you want to step through the tests... Maybe there's an effect we've not identified....

For one-of-your-files.txt, try:

    $ baloosearch -i one-of-your-files.txt

You might get more than one "hit". The "-i" asks baloosearch to show the internal DocID that Baloo uses. If you get more than one result with different DocIDs and the same directory/filename, then you have had (now or sometime in the past) the BTRFS device number problem.
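
If there are many hits, the duplicate check can be scripted. This is only a sketch: it assumes each result line has the form "<DocID> <path>" with no leading whitespace, which may not match your baloosearch version exactly, and find_duplicate_docids is a made-up helper name, not part of Baloo.

```shell
# Hypothetical helper: reads lines of the ASSUMED form "<DocID> <path>" on stdin
# and reports every path that appears with more than one distinct DocID.
find_duplicate_docids() {
    awk '
        NF >= 2 {
            id = $1
            path = $0
            # Strip the leading DocID; the rest is the path (it may contain spaces).
            sub(/^[^ ]+ +/, "", path)
            if (!((path, id) in seen)) {
                seen[path, id] = 1
                count[path]++
                ids[path] = ids[path] " " id
            }
        }
        END {
            for (p in count)
                if (count[p] > 1)
                    print p ":" ids[p]
        }
    '
}
```

It would be used as "baloosearch -i one-of-your-files.txt | find_duplicate_docids". Any path it prints was indexed under more than one DocID; no output means no duplicates were found in that result set.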

You can check the details for the file on your filesystem with "stat":

    $ stat one-of-your-files.txt
    
and make a note of the "Device" line. It will have the device number and the inode number. If you reboot, the device number may change. Previously Baloo was confused by this change; now it digs deeper and reads the FSID of the filesystem (which does not change) and bases the DocID on that.
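
To make the before/after-reboot comparison easier, the relevant values can be printed on their own. A minimal sketch, assuming GNU coreutils "stat"; show_file_ids is a hypothetical helper name, and the filesystem ID printed here is what stat reports, which is not guaranteed to be byte-for-byte the value Baloo derives its DocID from.

```shell
# Hypothetical helper: print the identifiers discussed above for one file.
# Assumes GNU coreutils stat (format options differ on BSD/macOS).
show_file_ids() {
    # %d = device number (decimal), %i = inode number
    stat -c 'device=%d inode=%i' -- "$1"
    # In filesystem mode (-f), %i is the filesystem ID in hex
    stat -f -c 'fsid=%i' -- "$1"
}
```

Run "show_file_ids one-of-your-files.txt" before and after a reboot: on an affected BTRFS setup the device value may differ between boots while the inode and fsid values should stay the same.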

If you reboot and see the stat details change while baloosearch gives the same results as before, then you are on safe ground. You will, however, need to reindex:

   $ systemctl --user stop kde-baloo
   $ balooctl purge
   $ systemctl --user start kde-baloo

and you can watch the reindexing progress with "balooctl monitor". It might take a while.

If your baloosearch results continue to change, we'll have to dig down and see what the filesystem is (anything layered on top of BTRFS?)
Comment 5 tagwerk19 2025-01-20 19:52:19 UTC
(In reply to tagwerk19 from comment #4)
> If you want to step through the tests... Maybe there's an effect we've not identified....
Did you find anything?
Comment 6 Eugene Shalygin 2025-01-22 12:22:02 UTC
baloosearch6 -d $HOME seems to be working, but content search does not return matches I'd expect it to include. For example, it finds a certain string inside a PDF file, but does not find it in .c source files...
Comment 7 tagwerk19 2025-01-22 14:38:46 UTC
(In reply to Eugene Shalygin from comment #6)
> ... but does not find it in .c source files ...
There's a set of exclusions for code. They are defined as mime-type exclusions, which means that Baloo indexes the filenames and metadata but not the content. You can see what's excluded with:
    $ balooctl6 config list excludeMimetypes | sort
and remove the exclusion for C with
    $ balooctl6 config rm excludeMimetypes text/x-csrc
and C header files with
    $ balooctl6 config rm excludeMimetypes text/x-chdr
You can do similar for C++ if you wish.

... You may find you need "balooctl" rather than "balooctl6"
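
For C++, the same pattern might look like the sketch below. It assumes the mime types text/x-c++src and text/x-c++hdr appear in your exclusion list (check the "config list" output first), and it only prints the commands rather than running them. Both function names are made up for illustration.

```shell
# Hypothetical helper: use balooctl6 if it is installed, otherwise fall back
# to the name "balooctl".
pick_balooctl() {
    if command -v balooctl6 >/dev/null 2>&1; then
        echo balooctl6
    else
        echo balooctl
    fi
}

# Hypothetical helper: print (do not run) the commands that would remove the
# C++ exclusions. The two mime types are ASSUMED to be in your exclusion list.
print_cpp_unexclude_commands() {
    ctl=$(pick_balooctl)
    for mime in text/x-c++src text/x-c++hdr; do
        echo "$ctl config rm excludeMimetypes $mime"
    done
}
```

Review the printed lines and then run them by hand. Files that were already indexed without content may need to be reindexed before their content becomes searchable.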
Comment 8 Eugene Shalygin 2025-01-22 21:26:37 UTC
Thank you! It is really unexpected that Baloo excludes anything by default, and there is no hint of that on the KCM page. May I ask why it excludes basically all the file types I would like it to index?
Comment 9 tagwerk19 2025-01-22 23:13:26 UTC
(In reply to Eugene Shalygin from comment #8)
> ... This is really unexpected that baloo does exclude something by
> default, and there is no hint to that in the KCM page. May I ask why does it
> exclude basically all the file types I would like it to index? ...
I agree it's pretty well concealed; it was a surprise to me.

My guess as to "why" is that it is easy to download (and delete) large code repositories, and this could really bog Baloo down, although deleting repositories is still more work than it should be...

Two things have changed: you now download to SSDs rather than HDDs, and a strict cap on Baloo's memory use has been introduced. It can now index without affecting the rest of the system.
Comment 10 Bug Janitor Service 2025-02-06 03:46:57 UTC
🐛🧹 ⚠️ This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information, then set the bug status to REPORTED. If there is no change for at least 30 days, it will be automatically closed as RESOLVED WORKSFORME.

For more information about our bug triaging procedures, please read https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging.

Thank you for helping us make KDE software even better for everyone!
Comment 11 Bug Janitor Service 2025-02-21 03:46:51 UTC
🐛🧹 This bug has been in NEEDSINFO status with no change for at least 30 days. Closing as RESOLVED WORKSFORME.