SUMMARY

After receiving suggestions on my earlier request (bug 450006), I was able to use the side bar (F11) to tick suggested tags from a list and to add new tags to that list. But then I ran into duplicate entries for the same tagged file names. For example:

- The_Monster.mp4
- The_Monster.mp4(1)

Both entries are listed in the tags:/ folder as located in /mnt/NAS/Filme/, although the file exists only ONCE. I had no idea why. I tried to resume and restart baloo, but this didn't help. No change.

Finally I decided to delete the index and the baloo file in my home directory, ~/.local/share/baloo. After doing this I resumed baloo, which then rebuilt the index. So far, so good, I thought. It looked fine.

Then I went on tagging files and also added new tags, which were ticked automatically and therefore applied to the file I was tagging. I also pressed F5 within the tags:/ folder. But when I then wanted to tag my next file(s), the tag I had just added was not offered for selection. On the other hand, when I open the file context menu (right mouse click) and open the 'Tags' entry, my recently added tag IS there. I don't really understand this.

I would really like to know how I can manually restart indexing and how I can update the tag list entries. And where is the tag folder located? I don't mean tags:/, I mean the real location on disk.

STEPS TO REPRODUCE

1. Use Dolphin and press F11 (once).
2. Add a file and select it.
3. Select 'tags' in the side bar.
4. Add a new tag (it is ticked automatically) and save.
5. Press F5.
6. Add another file and select it.
7. Select 'tags' in the side bar and look for the recently added tag: *** not listed ***

When I add the same tag again, this file (and the one before as well) is listed in the tags:/ folder!

OBSERVED RESULT

Looking for the recently added tag: *** not listed ***
When opening the file context menu, Tags, the added tags are shown, regardless of whether they are ticked or not.

EXPECTED RESULT

The recently added tag should be listed so that it can be ticked for future files.

SOFTWARE/OS VERSIONS

Linux: Arch Linux, Kernel 5.16.8
KDE Plasma Version: Plasma 5.24.0 (08/02/2022)
KDE Frameworks Version: 5.90.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION

I consulted Bug 311204, but it is not similar. I also consulted Bug 401019.

I confess that the baloo topic is hard for me to understand, because the terminology differs from my native language. I know a little about balooctl, and there is also a built-in baloo search configuration in the KDE System Settings.

Can I completely reset baloo and then restart it with my (old) assigned folders, so that I get a working baloo that keeps the tags I recently entered in the assigned data folders? What do I need to do to achieve this?

-Linuxfluesterer
When you add a tag, you are adding something to the "metadata" for the file. Think of it as something like the modification time; it's there in the filesystem but not part of the file (and not written to another file/folder somewhere else).

(You will find that there _are_ also tags embedded in files, EXIF and ID3 tags. A good example being "Duration"...)

If you've tagged, say, The_Monster.mp4 with "Test", you can see the tags with the command line tool getfattr:

    $ getfattr -d The_Monster.mp4
    ...
    user.xdg.tags="Test"

OK, that's the first stage. A point to watch here is that the Linux filesystems (such as ext2/3/4, btrfs) can deal with the "metadata" but a FAT system on a USB stick cannot. So you need to be careful copying files to a USB stick...

The second stage is that Baloo "is told that" there's been a change and goes and indexes the new information. Here it will read the info for "The_Monster.mp4" and write it to its database (which, as you've found, is a big file under .local/share/baloo). You asked "where is the location of the tag folder"; the answer is that when you look at, say, tags:/Test, Baloo is doing a lookup in its index and giving you the results. There's quite some magic behind the scenes.

Assuming that Baloo has indexed the file, you can ask what info it has on it:

    $ balooshow -x The_Monster.mp4

This will give you a load of data that should include "Test" and also the other metadata "embedded" in the file.

This should work with the file on your local disc. It's possible (quite possible) it won't if the file is on a NAS, so that's the first test to try: compare the behaviour "locally" and on the NAS.

(In reply to linuxfluesterer from comment #0)
> But then I had a problem with duplicate entries of the same file names
> marked with tags. This looks like (e.g.):
> -The_Monster.mp4
> -The_Monster.mp4(1)

This is with Dolphin? I know I've seen the same (in earlier times, not so often now). You can see what Baloo gives you with:

    $ baloosearch -i The_Monster.mp4

If you get a single line, then Dolphin is generating the "ghost file" with the "(1)". If there's more than one line, then it's a problem with Baloo: it has seen the file more than once and not forgotten the earlier version. That's the second test to try...

> Finally I decided to delete the index and baloo file in my home directory
> ~/.local/share/baloo. After doing this I resumed baloo, which then rebuilt
> the index. I thought, so far, so good.

The cleanest way of getting Baloo to restart the indexing "from scratch" is:

    $ balooctl purge

That stops the indexing processes, deletes the index and restarts the processes.

I think I'll pause here and let you try things out; we can then work out what to do next.

Wishing you luck
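The first-stage check described above (does the tag really sit in the file's extended attributes?) can be scripted. The following is a minimal bash sketch, not part of Baloo: the helper name `extract_tags` is hypothetical, and it assumes that `getfattr -d` prints the value as plain quoted text and that several tags are stored comma-separated in `user.xdg.tags`. Values containing non-ASCII characters may be printed in an encoded form that this sketch does not handle.

```shell
#!/usr/bin/env bash

# extract_tags (hypothetical helper): read `getfattr -d` style output on stdin
# and print the tags found in user.xdg.tags, one per line.
# Assumption: the value is plain text and tags are separated by commas.
extract_tags() {
    sed -n 's/^user\.xdg\.tags="\(.*\)"$/\1/p' | tr ',' '\n'
}

# Typical use (needs the getfattr tool and a filesystem that supports
# extended attributes):
#   getfattr -d The_Monster.mp4 | extract_tags
```

Running this once against a local copy of a file and once against the copy on the NAS mount gives a quick comparison: if the tags show up locally but not on the mount, the metadata is not visible through the mount and Baloo has nothing to index.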
(In reply to tagwerk19 from comment #1)

Hello Tagwerk,

thank you very much for this detailed information about baloo. I've read it, and I will have to read it again and test things out to understand how to deal with baloo and the tools for handling the files, especially their tags.

But before I read your information, I had already checked the directory I assigned to baloo, using the example tags I added yesterday afternoon, when the described problem with the missing tags appeared. I finally decided to stop baloo with 'disable' and then delete the baloo database and the index.lock file. Then I started baloo again, with only the home directory assigned (in KDE System Settings -> Search). It took a while until baloo began to index my home directory. OK...

Then I added my intended data directory to be indexed, whose files I had tagged yesterday morning, and baloo began to index this directory as well (I checked the size of the baloo database). In the 'directory' tags:/ all tags of my data directory were then listed. This looked fine.

Now today the bad news: while baloo remembers the tagged files in my home directory, it 'forgets' my data directory. No additional tags were shown in tags:/! Only the tags from my home directory are listed. Even pressing F5 in the tags:/ directory didn't help. In System Settings -> Search, my data directory is marked as 'indexed'.

I have to mention that my data directory is mounted over the network. After mounting, I tag my files within the remote directory. I had in mind that tagging on network-mounted devices should actually be possible. Am I wrong?

Does baloo not automatically load its database after booting or after starting KDE Plasma? I'm not eager to purge the baloo database every time and restart indexing from scratch.

-Linuxfluesterer
... Some additional information:

Yes, I had multiple shadow files "(1)" in Dolphin.

And I still do not get the same tag list when I click on 'edit tags' for a file in the side bar (F11, meta information???). Only the tags already assigned to the currently selected file appear there. When I instead work with the file context menu -> Tags entry, I RECEIVE ALL possible tags.

So my question is: why are the possible tags not listed for ticking when I work with the F11 side bar? How can I get them (back)? Is it possible to 'pull' the tag entries from 'tags:/' into the suggestions when editing tags with F11?

-Linuxfluesterer
Created attachment 146778
Tag Dialog (second example)
(In reply to linuxfluesterer from comment #3)
> ... I still have not the same tag list, when I click on 'edit tags'
> for a file on the side bar (F11, meta information???). Here appear only the
> already assigned tags to the actually marked file. When I instead work with
> the file menu, ->tags entry, I RECEIVE ALL possible tags ...

You mean, rather than:
    https://bugsfiles.kde.org/attachment.cgi?id=146601
you see just the tags you've selected:
    https://bugsfiles.kde.org/attachment.cgi?id=146778
Is that right?

Baloo builds its list of tags "from the tags it has seen". If there's a file flagged with "Comedy", then you'll get "Comedy" appearing in the "All Tags" folder and in the pop-up lists. If you delete that file, then the choice "Comedy" will disappear from the lists (with the proviso that it may take a moment for Baloo and Dolphin to notice, and maybe an F5).

It's also possible, if you have purged and are rebuilding the index, that you've caught Baloo in the middle of its reindexing.

So you get different lists depending on whether you right-click on the file and look at the "Assign Tags" pop-up list or press F11 and look at the tags in the Information Panel? I will see if I can see the same...
(In reply to linuxfluesterer from comment #2)
> ... After mounting, I tag my files within the remote directory. I've in mind, that
> tagging network mounted devices should be actually possible. Am I wrong? ...

This is what the "getfattr", "balooshow" and "baloosearch" tests should find out. I don't have experience running baloo on a NAS (and there are different sorts of NAS) but I've read it described as not supported.

There are a couple of reasons why the indexing might not work.

Firstly, Baloo might not see the tags (because the information is not visible on the NAS). Checking with "getfattr" should tell you this:

    $ getfattr -d The_Monster.mp4

Secondly, Baloo might be seeing the "file identifiers" changing. If you try:

    $ baloosearch -i The_Monster.mp4

look to see whether you get a long hex number before the matching filename. You might get something like:

    1d5a0000fc11 /home/test/Videos/The_Monster.mp4

If you have multiple results listed, do they come with different numbers? If you log out and back in again, or reboot your systems, do you get different numbers?
Hello Tagwerk,

I will have to investigate with the commands you suggested. But one thing I can tell you here: I've built my own NAS with a 'real' Arch Linux on an Intel Pentium-class processor. So it's just an ordinary 64-bit Linux computer with 2 hard disks inside. I'm running x11vnc, html, mariadb, ftp and nfs servers on it. The file system on the device holding the data directory I want to index is ext4. I mount the remote directory via nfs, and I'm trying to index this mounted directory with baloo from my host.

-Linuxfluesterer
(In reply to linuxfluesterer from comment #7)
> ... The file system of the
> device of my data directory I want to index is ext4. I mount the remote
> directory via nfs and I'm trying to index this mounted directory with baloo
> from my host ...

Baloo should have little trouble with Ext4...

Baloo builds its internal "file identifier" from the inode and the device number. With Ext2/3/4 the inode is stable, and only in unusual situations (when adding/removing discs or changing the order they are mounted) do you see the device number change.

If you are mounting with NFS, watch to see whether you get different inode values each time you mount. It would be interesting to see what you see...

There's an additional command line tool you can try to get extra information about a file "on disc":

    $ stat The_Monster.mp4

That should give you a line including the "Device:" and "Inode:" numbers. If these keep changing then baloo is going to have trouble :-/

Good Luck...
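To watch whether the "Device:" and "Inode:" values stay the same across mounts or reboots, the stat output can be recorded once and compared later. This is a minimal bash sketch under stated assumptions: the function name `check_identity` and the state file are hypothetical, and it relies on GNU stat's `-c` format option (`%d` for the device number in decimal, `%i` for the inode).

```shell
#!/usr/bin/env bash

# check_identity FILE STATEFILE (hypothetical helper)
# First call: records "device inode" of FILE in STATEFILE.
# Later calls: report whether the recorded values still match.
check_identity() {
    local file=$1 state=$2 now
    now=$(stat -c '%d %i' -- "$file") || return 1
    if [ ! -f "$state" ]; then
        printf '%s\n' "$now" > "$state"
        echo "recorded: $now"
    elif [ "$(cat "$state")" = "$now" ]; then
        echo "unchanged: $now"
    else
        echo "changed: $(cat "$state") -> $now"
    fi
}

# Example: run once now and again after the next remount or reboot.
#   check_identity /mnt/NAS/Filme/The_Monster.mp4 ~/monster.identity
```

The state file should live on a local disc so that it survives unmounting the share.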
Hello Tagwerk,

you asked for the output of:

    baloosearch --id The_Monster.mp4
    6e5300000050 /mnt/NAS/Filme/The_Monster.mp4
    6e5300000055 /mnt/NAS/Filme/The_Monster.mp4
    Elapsed: 1.45639 msec

I have more duplicate results like the one above. For some files (e.g. foo.bar(1)), I can untick the tag under which 'foo.bar' is listed twice. Sometimes foo.bar(1) disappears after pressing F5, sometimes not. No idea why.

In any case, even when foo.bar(1) has disappeared after F5 in the tags:/ folder, the command:

    baloosearch --id foo.bar
    6e5300000050 /mnt/NAS/Filme/foo.bar
    6e5300000055 /mnt/NAS/Filme/foo.bar
    Elapsed: 1.45639 msec

... still returns two entries.

On your question about the tag suggestions: I see the tags I have selected:
    https://bugsfiles.kde.org/attachment.cgi?id=146601
but the tag I added (new tag!) for the file just before is missing.

-Linuxfluesterer
(In reply to linuxfluesterer from comment #9)
> ... I have more duplicate results as above ...

Thanks for the results!

I don't know how NFS mounts behave but, I have to say, this is not what I was expecting:

> baloosearch --id The_Monster.mp4
> 6e5300000050 /mnt/NAS/Filme/The_Monster.mp4
> 6e5300000055 /mnt/NAS/Filme/The_Monster.mp4
>
> baloosearch --id foo.bar
> 6e5300000050 /mnt/NAS/Filme/foo.bar
> 6e5300000055 /mnt/NAS/Filme/foo.bar

Both files are listed with the same IDs, which could possibly make sense if the file was renamed, but I'm guessing you didn't do that.

It is possible to see, though, that the NFS disc has been mounted with different device numbers, "50" and "55", and that your files have been indexed twice...

Apologies, I think you're going to need a different tool; Baloo is not going to cope...
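The device numbers "50" and "55" can be read off the IDs mechanically. The following bash sketch splits an ID as printed by `baloosearch -i` into two halves. It rests on an assumption about the layout that is inferred from the IDs in this thread rather than from any documented interface: the device number occupies the low 32 bits and the inode the high 32 bits. The function name `decode_baloo_id` is hypothetical.

```shell
#!/usr/bin/env bash

# decode_baloo_id HEXID (hypothetical helper)
# ASSUMPTION: low 32 bits = device number, high 32 bits = inode.
decode_baloo_id() {
    local id=$(( 16#$1 ))   # interpret the argument as a hexadecimal number
    printf 'device=%d inode=%d\n' $(( id & 0xffffffff )) $(( id >> 32 ))
}

decode_baloo_id 6e5300000050   # prints: device=80 inode=28243
decode_baloo_id 6e5300000055   # prints: device=85 inode=28243
```

Under that assumption the two IDs share one inode and differ only in the device half (0x50 = 80 and 0x55 = 85), which fits the reading that the same file was indexed under two different mounts.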
Thank you for the quick reply, Tagwerk, but to prevent a misunderstanding here:

> baloosearch --id foo.bar
> 6e5300000050 /mnt/NAS/Filme/foo.bar
> 6e5300000055 /mnt/NAS/Filme/foo.bar

The foo.bar was just an example, like 'The_Monster.mp4', because there are some more files (I don't know how many in total) that are listed the same way as 'The_Monster.mp4'. The other files have different IDs, of course.

Some time ago I worked with file IDs in order to make space-saving backups with 'rsync', before this option was added to 'rsync' itself. It was called 'versioning backup'.

The point is just that with baloo indexing there are two IDs for one and the same file, in the same location, and this is the case for several files.

-Linuxfluesterer
(In reply to linuxfluesterer from comment #11)
> ... The point is just, that there with baloo indexing, there are two Ids for one
> and the same file, in the same location, and this for several files ...

Maybe it's worth checking with

    $ stat foo.bar

as this gives you the Device Number/Inode values, and seeing if these change with a reboot.

Baloosearch is saying it has seen foo.bar with the two IDs; that can be at different times, so maybe 6e5300000050 one day and 6e5300000055 after a reboot.
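Since several files are affected, it can help to list every path that Baloo knows under more than one ID. A minimal bash/awk sketch follows, assuming the `baloosearch -i` output format shown in this thread (a hex ID, a space, then an absolute path); the helper name `duplicate_paths` is hypothetical.

```shell
#!/usr/bin/env bash

# duplicate_paths (hypothetical helper): read `baloosearch -i` output on stdin
# and print each path that appears with more than one ID, followed by the IDs.
# Lines that do not look like "<hex id> /path" (such as the timing line) are ignored.
duplicate_paths() {
    awk '
        /^[0-9a-f]+ \// {
            id = $1
            sub(/^[0-9a-f]+ /, "")        # leave only the path in $0
            ids[$0] = ids[$0] " " id
            count[$0]++
        }
        END {
            for (path in count)
                if (count[path] > 1)
                    print path ":" ids[path]
        }
    '
}

# Typical use (the search term is only an example):
#   baloosearch -i The_Monster.mp4 | duplicate_paths
```

A path reported here with two IDs is one file that Baloo has recorded twice; an empty result alongside a visible "(1)" entry in Dolphin would point at Dolphin instead.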
Sorry, Tagwerk, I can no longer send comments (replies), because my comment is 'Automatically blocked': its content is assumed to be spam. I am told to contact the administrator. I did not expect such blocking, and I don't know who the administrator is or how to tell him that my comment is meant seriously.

-Linuxfluesterer
(In reply to tagwerk19 from comment #12)
> Maybe it's worth checking with
> $ stat foo.bar
> as this gives you the Device Number/Inode values and seeing if these change
> with a reboot.

If this information is not blocked, here is the essence of my investigation: the ONLY change I registered, but an essential one, is that the device ID differs between before shutdown and after reboot.

Before:
    File: /mnt/NAS/Filme/The_Monster.mp4
    Size: 200912874    Blocks: 392408    IO Block: 1048576    regular file
    Device: 0,83    Inode: 18827    Links: 1

After:
    File: /mnt/NAS/Filme/The_Monster.mp4
    Size: 200912874    Blocks: 392408    IO Block: 1048576    regular file
    Device: 0,73    Inode: 18827    Links: 1

Hope this helps.

-Linuxfluesterer
(In reply to linuxfluesterer from comment #14)
> ...
> Before:
> Device: 0,83 Inode: 18827 Links: 1
>
> After:
> Device: 0,73 Inode: 18827 Links: 1
> ...

OK, things are falling into place :-) The inode isn't changing, which is good, but the mount appears under a different device number.

Baloo sees "two" files (as it builds its index on the ID, a combination of the device number and inode), and Baloosearch gives two results (or one file, twice, with the different IDs).

Dolphin is doing the same sort of search and presents two results, although it appends a "(1)" so it can display the two.

There's no way to guess how Dolphin manages to edit tags in this case; it keeps information cached (and you see that you may need an F5 to refresh things). If it thinks it has two files? Hmm... dunno.

Do you set up the NFS mount in /etc/fstab (and are you mounting more than one share)? If so, that might explain why you get different device numbers: it depends on which share is mounted first.
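The effect of the changing device number can be worked through with the stat values quoted above. This bash sketch builds an identifier from a device number and an inode. It is an illustration of the idea, not Baloo's actual code: the layout (inode in the high 32 bits, device number in the low 32 bits) is an assumption, as is reading "Device: 0,83" as the plain device number 83. The function name `make_file_id` is hypothetical.

```shell
#!/usr/bin/env bash

# make_file_id DEVICE INODE (hypothetical helper)
# ASSUMPTION: identifier = (inode << 32) | device, printed in hex.
make_file_id() {
    local dev=$1 inode=$2
    printf '%x\n' $(( (inode << 32) | (dev & 0xffffffff) ))
}

make_file_id 83 18827   # before the reboot: prints 498b00000053
make_file_id 73 18827   # after the reboot:  prints 498b00000049
```

The inode half (498b) is identical in both results, yet the two identifiers differ, so an index keyed on the combined value treats the remounted file as a second, new file.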
(In reply to tagwerk19 from comment #15)
> Do you set up the NFS mount in /etc/fstab? (and are you mounting more than
> one share?), if so that might explain why you get different device numbers.
> It depends on which share is mounted first.

Thanks again for the reply, Tagwerk.

I don't have my nfs device in my host's /etc/fstab, because I'm not always using my NAS. I always shut down the NAS and power it off when shutting down my host. So sometimes I power on my NAS BEFORE booting my host, and sometimes I power on (boot) the NAS WHILE I'm already working on my host. In both cases I mount the nfs share manually in a Konsole with:

    mount 192.168.178.xx:/srv/nfs4/sda5/nfs/Daten /mnt/NAS

Btw, today, after booting again, the stat command shows that the device ID has changed again. Now it is:

    Device: 0,82    Inode: 18827    Links: 1

-Linuxfluesterer
(In reply to linuxfluesterer from comment #16)
> ... today, after booting again, the stat command shows, that the Device Id
> has changed again. Now it is:
> Device: 0,82 Inode: 18827 Links: 1

What's needed is a way of saying to the mount, "use a given minor device number", and I've not seen that described anywhere.

It has been requested and apparently oft discussed:
    https://github.com/util-linux/util-linux/issues/1562
and
    https://debbugs.gnu.org/cgi/bugreport.cgi?bug=53212

... hat tip to Joachim Wagner.
(In reply to tagwerk19 from comment #17)
> What's needed is a way of saying to the mount, "use a given minor device
> number" and I've not seen that described anywhere.

Hello Tagwerk, hello Joachim,

yesterday I first created a 'dummy' file in my home directory carrying all the tags I assumed I would need when tagging my data directory (because the home directory must be indexed by default). Then I went on testing, adding tags to files in my mounted nfs share.

This morning it looked good: there seemed to be no duplicate entries in the list of files for a given tag, despite the fact that the 'dummy' file finally appeared three times in 'baloosearch Alle_Tags'. Later in the day, more and more tagged files in my data directory appeared twice. So I completely reset baloo again, deleted the index and restarted.

Then I searched with Startpage again, but changed my keywords to 'linux kde baloo indexing mounted nfs directories', and found this interesting link, which confirms the problem with the device ID:
    https://www.reddit.com/r/kde/comments/dhd14c/baloo_cant_index_nfs_mount/

In a German forum there was a hint to mount with the sshfs file system instead of nfs. I could mount my share as sshfs, but after I added this new mount point in System Settings -> Search as an index path, I cannot tag any file under this mount point: adding tags is not offered at all, although it is the same shared directory as before with nfs.

-Linuxfluesterer
(In reply to linuxfluesterer from comment #18)
> ... the 'dummy' file appeared finally three times in 'baloosearch Alle_Tags' ...

The problem is that Baloo is on shifting sands: it "sees" files reappear with different IDs and thinks they are new files.

> ... this interesting link, which confirms the problem with the Device ID:
> https://www.reddit.com/r/kde/comments/dhd14c/baloo_cant_index_nfs_mount/

Baloo is critically dependent on file locking: there's the indexing process writing information to the file and other processes wanting to read the index in order to do searches. Locking has *got* to work.

There's also a lot of information that has to be written quickly to the index during the indexing (particularly content indexing); the .local/share/baloo folder has to be on a local disc, and it is better if that local disc is an SSD.

Maybe it's possible to index remote filesystems. Although, as you see, there's a labyrinth of different traps and pitfalls to navigate. With your sshfs mount, did you watch to see whether the file IDs (device number/inode) stayed "fixed", or did they also change?

There's a missing piece of the puzzle: it should be that whatever NAS you are using can index the files itself, locally, and that the client asks the server for the search results. I've no idea whether anyone is doing that; it would be a big project...

Sorry!
(In reply to tagwerk19 from comment #19)
> There's a missing piece of the puzzle; it should be that whatever NAS you
> are using can index the files itself/locally and what the client does is ask
> the server for the search results. I've no idea whether anyone is doing
> that, it would be a big project...
>
> Sorry!

Hello all, especially Tagwerk19,

no need to say sorry. It seems that when I rebuild my indexes after disabling baloo and resetting it (deleting the ~/.local/share/baloo directory), the new indexing is done much faster, and apart from the work of resetting baloo, no other problem appears in the new indexes. OK, when I add tags to files after resetting, the virtual tag directories are not updated immediately, but it seems they are after a refresh (pressing F5).

If this behaviour after a reset of baloo no longer destroys my indexes, I can live with it. As I said, rebuilding the indexes after a reset (deleted index directory) seems to be much faster, only a few seconds. Btw., Arch, KDE and baloo are all installed on an M.2 NVMe drive connected to a PCIe 3 slot on the board.

Anyway, I want to thank you for your efforts to find a solution. If I find something interesting and new, I'll let you know.

Nevertheless, I would like to keep this thread open, if you agree...

-Linuxfluesterer
(In reply to linuxfluesterer from comment #20)
> ...I will thank you for your efforts to find a solution. If I'll find
> something interesting new, I'll let you know ...

Thank you for your work and patience as well. It was a bit of exploration and we found a few more bits of jigsaw to try to put together :-)

> Nevertheless, I would like to keep this thread opened, if you agree...

I think that's fair - we didn't find a solution