SUMMARY
Searching by metadata is very useful, and many users would like to have it without necessarily having the full contents of all their documents indexed.

Example:
baloosearch "type:Audio" "Duration>300"
shows no results unless I enable "Also index file content", which would index the 1TB of documents in the folder that contains the audio files.

STEPS TO REPRODUCE
1. Uncheck "Also index file content"
2. baloosearch "type:Audio" "Duration>1"

OBSERVED RESULT
No results shown

EXPECTED RESULT
There should be a way to disable indexing of file content but still enable metadata lookup

SOFTWARE/OS VERSIONS
KDE Plasma Version: 5.21.5
KDE Frameworks Version: 5.82.0
Qt Version: 5.15.2+kde+r192

ADDITIONAL INFORMATION
I found this:
https://bugs.kde.org/show_bug.cgi?id=417170
But it's marked as resolved, and I couldn't figure out why it was closed as fixed.
(In reply to Munzir Taha from comment #0)
> ADDITIONAL INFORMATION
> I found this
> https://bugs.kde.org/show_bug.cgi?id=417170
> But it's marked as resolved and couldn't figure out why it's closed as fixed.

I would suspect that "Duration" is a tag embedded in the audio (MP3?) file. Embedded tags mean that baloo has to open the file, find the tags and index them. If you run something like EasyTag it can show you these ID3 tags.

Baloo also indexes tags, comments and ratings that are held as file attributes: not internal to the file, but held together with other file attributes such as modification times (roughly). You can query these extended attributes with the getfattr command:

    getfattr -d yourfile

If you tag something in Dolphin or give it a rating, you can see these extended attributes changing. You can add/change these attributes without making any change _to_ your files, and baloo indexes them without requiring "content indexing".

Watch out, not all filesystems are able to hold extended attributes. If you've added a tag to a file on an Ext4 or BTRFS filesystem and then copied the file to a USB stick formatted with FAT, the extended attributes are lost.

I have to say that

    https://api.kde.org/frameworks/baloo/html/searching.html

doesn't make it clear that tags can come from different sources (but I don't think that if baloo reads a tag, it remembers where it read it from...)

I've not tried it, but if you want to index audio and not documents, you might be able to make use of "exclude mimetypes":

    https://community.kde.org/Baloo/Configuration#Exclude_Mimetypes
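As an illustration of the getfattr check mentioned above, here is a minimal sketch that pulls the tag list out of `getfattr -d` output. It assumes that Dolphin stores tags in an attribute named user.xdg.tags and that getfattr prints the value as a plain quoted string (it switches to an encoded form for non-printable values); the helper name xdg_tags is hypothetical.

```shell
# Hypothetical helper: read `getfattr -d` output on stdin and print the
# value of user.xdg.tags (assumed attribute name) without the quotes.
# Prints nothing if the attribute is absent or not a plain quoted string.
xdg_tags() {
    sed -n 's/^user\.xdg\.tags="\(.*\)"$/\1/p'
}
```

Intended usage would be something like `getfattr -d yourfile | xdg_tags`, run before and after tagging the file in Dolphin.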
> EasyTag it can show you these ID3 tags

Yes, I know how to display the Duration of the file, e.g.

$ mediainfo M.mp3 |grep Duration
Duration : 18 s 468 ms

But then how can a normal user show all the files in my system with duration > 18s?
If baloo considers such attributes file contents, that's fine, but then we need "index contents" to be per file type. So, one can enable "index file content" for Audio but not for Text, Documents, PDF, ...
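For comparison, the question above can be answered without any index by scanning files directly, which is slow but shows what a duration query has to do. This is only a sketch: it assumes the mediainfo CLI is installed and that its `--Inform='General;%Duration%'` template reports the duration in milliseconds; the function names and the extension list are hypothetical.

```shell
# Hypothetical helper: print "<duration_ms><TAB><path>" for audio files under $1.
# Assumes mediainfo is installed and reports General;%Duration% in milliseconds.
scan_durations() {
    find "$1" -type f \( -name '*.mp3' -o -name '*.flac' -o -name '*.ogg' \) |
    while IFS= read -r f; do
        ms=$(mediainfo --Inform='General;%Duration%' "$f")
        # Skip files for which mediainfo reports no duration.
        if [ -n "$ms" ]; then
            printf '%s\t%s\n' "$ms" "$f"
        fi
    done
}

# Hypothetical helper: from "<duration_ms><TAB><path>" lines on stdin,
# print the paths whose duration is greater than $1 seconds.
longer_than() {
    awk -F '\t' -v min="$1" '$1 + 0 > min * 1000 { print $2 }'
}
```

Something like `scan_durations ~/Music | longer_than 18` would then list the matching files. Every file has to be opened on each run, which is exactly the cost an index is meant to avoid.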
Strange! Seems baloo does index those attributes without enabling content indexing in tmpfs but not in ext4.

~> balooctl config list contentIndexing
no
~> balooshow ~/Music/MtoF.mp3
27e32fc00000805 2053 41825020 /home/munzir/Music/MtoF.mp3
    Mtime: 1219511595 2008-08-23T20:13:15
    Ctime: 1547560464 2019-01-15T16:54:24
~> cp ~/Music/MtoF.mp3 /tmp/
~> balooctl index /tmp/MtoF.mp3
Indexing /tmp/MtoF.mp3
File(s) indexed
~> balooshow /tmp/MtoF.mp3
22b00000025 37 555 /tmp/MtoF.mp3
    Mtime: 1621167208 2021-05-16T15:13:28
    Ctime: 1621167208 2021-05-16T15:13:28
    Cached properties:
        Bitrate: 128000
        Channels: 2
        Duration: 18
        Sample Rate: 44100
~> df -hT ~/Music/ /tmp/
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/sda5      ext4   1.4T  1.2T   68G  95% /
tmpfs          tmpfs  3.9G   80M  3.8G   3% /tmp
(In reply to Munzir Taha from comment #3)
> Strange! Seems baloo does index those attributes without enabling content
> indexing in tmpfs but not in ext4.

OK, there's some weirdness here...

> ~> balooctl config list contentIndexing
> no

No content indexing by default, but...

> ~> balooctl index /tmp/MtoF.mp3
> Indexing /tmp/MtoF.mp3
> File(s) indexed

...you've asked that /tmp/MtoF.mp3 be indexed. It seems that "balooctl index" requests a content index even if the default is "off". I think it's the explicit request that makes the difference, rather than anything related to ext4 versus tmpfs.

However, there does seem to be a glitch with:

    balooctl index ~/Music/MtoF.mp3
    balooshow -x ~/Music/MtoF.mp3

You don't get the tags. Whereas clearing the entry and requesting an index:

    balooctl clear ~/Music/MtoF.mp3
    balooctl index ~/Music/MtoF.mp3
    balooshow -x ~/Music/MtoF.mp3

seems to give you the tags...
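The clear-then-index sequence from the comment above can be wrapped in a small check, sketched here. It uses only the commands already shown in this thread and assumes that balooshow prints a "Cached properties:" line when embedded tags were indexed, as in the /tmp/MtoF.mp3 output earlier; both function names are hypothetical.

```shell
# Succeeds if balooshow-style text on stdin contains a "Cached properties:"
# section (assumed to mean that embedded tags were indexed).
has_cached_properties() {
    grep -q 'Cached properties:'
}

# Hypothetical wrapper: clear the entry for $1, request an index again,
# then check whether balooshow reports cached properties for it.
reindex_and_check() {
    balooctl clear "$1" &&
    balooctl index "$1" &&
    balooshow -x "$1" | has_cached_properties
}
```

For example, `reindex_and_check ~/Music/MtoF.mp3 && echo "tags indexed"` would report whether the workaround took effect for that file.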
> It seems that "balooctl index" requests a content index even if the default is "off".

Which is definitely a bug.

balooctl status after balooctl check always shows:
Files waiting for content indexing: 0

I tried to check with balooctl monitor, which used to show the files being indexed, and it doesn't show any result however hard I try. Weirdness++
(In reply to Munzir Taha from comment #5)
> Which is definitely a bug.

Well, unexpected :-)

> balooctl status after balooctl check always shows:
> Files waiting for content indexing: 0

I don't think anything appears in the 'Files waiting' count if you are not indexing content. "Basic" indexing is done immediately. Baloo queues up files for content indexing and sends them off in batches to a separate process.

> I tried to check with balooctl monitor which used to show the files being
> indexed and it doesn't even show any result however hard I tried. Weirdness++

Balooctl monitor lists the files as the "content indexing" is done. If you have not selected content indexing, you just get an "Indexing modified files" message when baloo is told of a change.

However, it does seem that if you manually ask for something to be indexed:

    balooctl index alonglongfile

this doesn't appear in "balooctl monitor".

This doesn't answer your original question of

> But then how can a normal user show all the files in my system with duration > 18s?

I'm not sure there's a way without enabling content indexing, as you need to read the files to read the "embedded" tags.
> However, it does seem that if you manually ask for something to be indexed:
> balooctl index alonglongfile
> this doesn't appear in "balooctl monitor"

Exactly, that's what I meant by mentioning *monitor* not working.

So, to summarize:
1. balooctl index should obey the config files, or there should be another command that does.
2. If balooctl monitor only shows files when "content indexing" is requested, it shouldn't differentiate between manual and automatic requests. Though I don't see why it shouldn't monitor basic indexing too.
3. Allow content indexing per file type, so one can enable it for audio and video but not for text or documents, as an example.
*** Bug 437396 has been marked as a duplicate of this bug. ***