Bug 437189 - Allow "index file content" per file type so it can be enabled for media but not for documents.
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Engine
Version: 5.82.0
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords:
Duplicates: 437396
Depends on:
Blocks:
 
Reported: 2021-05-16 06:58 UTC by Munzir Taha
Modified: 2021-05-30 17:46 UTC (History)
CC List: 6 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Munzir Taha 2021-05-16 06:58:33 UTC
SUMMARY
Searching by metadata is very useful, and many users would want it without necessarily having the full contents of all their documents indexed.

Example:

baloosearch "type:Audio" "Duration>300"
shows no results unless I enable "Also index file content", which would also index the 1 TB of documents in the folder that contains the audio files.

STEPS TO REPRODUCE
1. uncheck "Also index file content"
2. baloosearch "type:Audio" "Duration>1"

OBSERVED RESULT
No results shown

EXPECTED RESULT
There should be a way to disable indexing file content but still enable metadata lookup

SOFTWARE/OS VERSIONS
KDE Plasma Version: 5.21.5 
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2+kde+r192

ADDITIONAL INFORMATION
I found this
https://bugs.kde.org/show_bug.cgi?id=417170
But it's marked as resolved, and I couldn't figure out why it was closed as fixed.
Comment 1 tagwerk19 2021-05-16 10:30:27 UTC
(In reply to Munzir Taha from comment #0)
> ADDITIONAL INFORMATION
> I found this
> https://bugs.kde.org/show_bug.cgi?id=417170
> But it's marked as resolved, and I couldn't figure out why it was closed as fixed.
I would suspect that "Duration" is a tag embedded in the audio (MP3?) file. Embedded tags mean that baloo has to open the file, find the tags and index them.

If you run something like EasyTag, it can show you these ID3 tags.

Baloo also indexes tags, comments, ratings that are held as file attributes - not internal to the file but held together with other file attributes such as modification times (roughly). You can query these extended attributes with the getfattr command:
    getfattr -d yourfile
If you tag something in Dolphin or give it a rating, you can see these extended attributes changing. You can add/change these attributes without making any change _to_ your files, and baloo indexes them without requiring "content indexing".

Watch out, not all filesystems are able to hold extended attributes. If you've added a tag to a file on an Ext4 or BTRFS filesystem and then copied the file to a USB formatted with FAT, the extended attributes are lost.
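
As a minimal sketch of reading these attributes from a script: the helper below takes `getfattr -d` output on stdin and prints the value of one attribute. The function name `xattr_value` is made up for illustration, and the attribute names `user.xdg.tags`, `user.xdg.comment` and `user.baloo.rating` are, as far as I know, the ones Dolphin writes; check your own `getfattr -d` output to confirm.

```shell
# xattr_value KEY: read `getfattr -d FILE` output on stdin and print the
# value stored under KEY (hypothetical helper, assumes plain-text values).
xattr_value() {
    awk -v key="$1" '
        index($0, key "=") == 1 {
            val = substr($0, length(key) + 2)   # text after KEY=
            gsub(/^"|"$/, "", val)              # strip the surrounding quotes
            print val
        }'
}
```

Usage would be something like `getfattr -d yourfile | xattr_value user.xdg.tags`. Values that getfattr prints in an encoded form are passed through unchanged.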

Have to say
    https://api.kde.org/frameworks/baloo/html/searching.html
doesn't make it clear that tags can come from different sources (but I don't think that if baloo reads a tag, it remembers where it read it from...)

I've not tried it but if you want to index audio and not documents, you might be able to make use of "exclude mimetypes"
    https://community.kde.org/Baloo/Configuration#Exclude_Mimetypes
Comment 2 Munzir Taha 2021-05-16 12:08:08 UTC
> EasyTag it can show you these ID3 tags

Yes, I know how to display the Duration of the file, e.g.:

$ mediainfo M.mp3 |grep Duration
Duration                                 : 18 s 468 ms

But then how can a normal user show all the files in my system with duration > 18s?

If baloo considers such attributes file contents, that's fine but then we need "index contents" to be per file type. So, one can enable "index file content" for Audio but not for Text, Documents, PDF, ...
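
As a stopgap outside baloo, a sketch along these lines lists files longer than a given number of seconds by asking mediainfo directly. It assumes mediainfo is installed and that `--Inform='General;%Duration%'` prints the duration in milliseconds; `duration_ms` and `longer_than` are hypothetical helper names, and this reads every file on each run rather than using an index.

```shell
# duration_ms FILE: print the duration in milliseconds.
# Assumes mediainfo is installed; swap in another probe if it is not.
duration_ms() {
    mediainfo --Inform='General;%Duration%' "$1"
}

# longer_than SECONDS FILE...: print each FILE whose duration exceeds SECONDS.
# The body runs in a subshell, so its variables do not leak into the caller.
longer_than() (
    limit_ms=$(( $1 * 1000 ))
    shift
    for f in "$@"; do
        ms=$(duration_ms "$f" 2>/dev/null) || continue
        ms=${ms%%.*}                                 # drop any fractional part
        case $ms in ''|*[!0-9]*) continue ;; esac    # skip files without a duration
        if [ "$ms" -gt "$limit_ms" ]; then
            printf '%s\n' "$f"
        fi
    done
)
```

For example, `longer_than 18 ~/Music/*.mp3` would print the MP3 files in ~/Music that run longer than 18 seconds.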
Comment 3 Munzir Taha 2021-05-16 12:17:52 UTC
Strange! Seems baloo does index those attributes without enabling content indexing in tmpfs but not in ext4.

~> balooctl config list contentIndexing
no

~> balooshow ~/Music/MtoF.mp3 
27e32fc00000805 2053 41825020 /home/munzir/Music/MtoF.mp3
        Mtime: 1219511595 2008-08-23T20:13:15
        Ctime: 1547560464 2019-01-15T16:54:24

~> cp ~/Music/MtoF.mp3 /tmp/

~> balooctl index /tmp/MtoF.mp3 
Indexing /tmp/MtoF.mp3
File(s) indexed

~> balooshow /tmp/MtoF.mp3
22b00000025 37 555 /tmp/MtoF.mp3
        Mtime: 1621167208 2021-05-16T15:13:28
        Ctime: 1621167208 2021-05-16T15:13:28
        Cached properties:
                Bitrate: 128000
                Channels: 2
                Duration: 18
                Sample Rate: 44100

~> df -hT ~/Music/ /tmp/
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/sda5      ext4   1.4T  1.2T   68G  95% /
tmpfs          tmpfs  3.9G   80M  3.8G   3% /tmp
Comment 4 tagwerk19 2021-05-16 18:37:17 UTC
(In reply to Munzir Taha from comment #3)
> Strange! Seems baloo does index those attributes without enabling content
> indexing in tmpfs but not in ext4.
OK, there's some weirdness here....

> ~> balooctl config list contentIndexing
> no
No content indexing by default, but ...

> ~> balooctl index /tmp/MtoF.mp3 
> Indexing /tmp/MtoF.mp3
> File(s) indexed
... you've asked that /tmp/MtoF.mp3 be indexed.

It seems that "balooctl index" requests a content index even if the default is "off". I think it's the request rather than anything related to ext4 versus tmpfs.

However, there does seem to be a glitch with: 
    balooctl index ~/Music/MtoF.mp3 
    balooshow -x ~/Music/MtoF.mp3 
You don't get the tags.

Whereas clearing the entry and requesting an index:
    balooctl clear ~/Music/MtoF.mp3 
    balooctl index ~/Music/MtoF.mp3 
    balooshow -x ~/Music/MtoF.mp3 
Seems to give you the tags...
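
The clear-then-index sequence above can be wrapped in a small helper. This is only a sketch of the workaround, not a fix: `reindex` and the `BALOOCTL`/`BALOOSHOW` override variables are names invented here, and it uses only the three commands shown above.

```shell
# Command names can be overridden, e.g. to test against a stub.
: "${BALOOCTL:=balooctl}" "${BALOOSHOW:=balooshow}"

# reindex FILE...: clear each file's entry, request a fresh index,
# then show what baloo stored. Stops at the first failing step.
reindex() {
    for f in "$@"; do
        "$BALOOCTL" clear "$f" &&
        "$BALOOCTL" index "$f" &&
        "$BALOOSHOW" -x "$f" || return 1
    done
}
```

Usage would be `reindex ~/Music/MtoF.mp3`.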
Comment 5 Munzir Taha 2021-05-18 18:56:23 UTC
> It seems that "balooctl index" requests a content index even if the default is "off".

Which is definitely a bug.
balooctl status after balooctl check always shows:
Files waiting for content indexing: 0

I tried to check with balooctl monitor which used to show the files being indexed and it doesn't even show any result however hard I tried. Weirdness++
Comment 6 tagwerk19 2021-05-18 21:45:31 UTC
(In reply to Munzir Taha from comment #5)
> Which is definitely a bug.
Well, unexpected :-)

> balooctl status after balooctl check always shows:
> Files waiting for content indexing: 0
I don't think anything appears in the 'Files waiting' count if you are not indexing content. "Basic" indexing is done immediately. Baloo queues up files for content indexing and sends them off in batches to a separate process.

> I tried to check with balooctl monitor which used to show the files being
> indexed and it doesn't even show any result however hard I tried. Weirdness++
Balooctl monitor lists the files as the "content indexing" is done. If you have not selected content indexing, you just get an "Indexing modified files" when baloo is told of a change.

However, it does seem that if you manually ask for something to be indexed:

    balooctl index alonglongfile

this doesn't appear in "balooctl monitor"

This doesn't answer your original question of 

> But then how can a normal user show all the files in my system with duration > 18s?
I'm not sure there's a way without enabling content indexing as you need to read the files to read the "embedded" tags.
Comment 7 Munzir Taha 2021-05-20 10:54:28 UTC
> However, it does seem that if you manually ask for something to be indexed:
>    balooctl index alonglongfile
> this doesn't appear in "balooctl monitor"

Exactly what I meant by mentioning *monitor* not working. So, to summarize:

1. balooctl index should obey the config files or there should be another command to do that.
2. If balooctl monitor only shows files when "content indexing" is requested, it shouldn't differentiate between manual and automatic requests. Though I don't see why it shouldn't monitor basic indexing too.
3. Allow content indexing per file type, so one can enable it for audio and video but not for text or documents as an example.
Comment 8 tagwerk19 2021-05-20 11:49:48 UTC
*** Bug 437396 has been marked as a duplicate of this bug. ***