SUMMARY

STEPS TO REPRODUCE
1. Reboot system

OBSERVED RESULT
baloo_file_extractor goes through all indexed folders and reindexes everything

EXPECTED RESULT
It should only index new/changed files

SOFTWARE/OS VERSIONS
Linux: 4.15.0
KDE Plasma Version: 5.14.4
KDE Frameworks Version: 5.53.0
Qt Version: 5.11.2
Yes, I have been observing very similar behaviour with many previous baloo versions for years, up to the latest 5.57 (I use openSUSE Leap 15.0 with the latest KDE packages from the "KDE Frameworks 5" repositories).

To make it clear: it is not merely resetting the index, it is endlessly accumulating duplicates of it. Very symptomatic: upon _each_ reboot, the file counter (in "balooctl status") is _exactly_ _increased_ by the actual number of files, and indexing seems to painstakingly rebuild and append a new duplicate index each time, not realizing that these are the very same files it had already indexed before the reboot. I suspect it would behave the same after launching a "balooctl check", but I never tried this.

I should mention that I use a BTRFS file system, so this might be part of the problem (and why so few people seem to experience the same issue). Maybe there is some misunderstanding between baloo and the way BTRFS reports or sets the file attributes used to identify files as "already indexed"? I can't remember for sure, but I don't seem to remember seeing this problem before I switched to the BTRFS file system (I was using EXT4 previously, more than 2 years ago): there were definitely problems with baloo at the time (crashes and such), but I don't remember seeing the same symptom of duplicate indexing. I have seen that there is a distinct bug report about suspiciously similar BTRFS problems: https://bugs.kde.org/show_bug.cgi?id=401863

Of course, reboot after reboot, this behaviour triggers a never-ending increase of resource bloat, not to mention hours of slowdown after each reboot due to high CPU and memory load while the indexer browses all the files again to add an Nth duplicate to its index. At this stage, I would at least like to work around the problem by performing only a first-time indexing run and then stopping the file content indexer while still being able to search the index (in krunner or dolphin).
But unfortunately I never found any reliable way to stop the file content indexer and still use the search engine (shouldn't one of "balooctl stop" or "balooctl suspend" allow this?), so the only option seems to be disabling baloo completely and losing its search abilities.

Moreover, in my experience, once the file content indexer has been started, the only way to really stop it is to kill the "baloo_file" process manually: otherwise it will survive all "balooctl" commands, including "balooctl disable". Maybe this part belongs in a separate bug report: here also I found other similar bug reports complaining about "balooctl" not actually stopping or suspending "baloo_file" operation:
https://bugs.kde.org/show_bug.cgi?id=404121
https://bugs.kde.org/show_bug.cgi?id=353559

But none of these reports mentions that the search engine should remain usable even after the file content indexer has been stopped/killed. Also, file _content_ search should remain operational even after the file indexer has been instructed to stop indexing file content (in my experience, disabling the "file content indexing" option also immediately reduces the search scope to file names only, despite the existence of a recent content index). Or did I miss something about baloo search usage?
I have also noticed similar behavior. I am hypothesizing that the problem is because I instructed baloo to also index a secondary drive which is using encrypted ZFS. Unfortunately on my last reboot my encrypted ZFS partition did not automatically mount, but after mounting it baloo started to reindex it. The reindexing takes hours and a lot of resources that slows down the system. This has been a recurring issue.
KF 5.53 and even 5.57 is way too old. Update to a current version.
(In reply to Stefan Brüns from comment #3)
> KF 5.53 and even 5.57 is way too old. Update to a current version.

Thanks a lot for following up on this. I haven't been stuck on 5.57: I regularly update my system; currently I am running KDE Frameworks 5.68.0 with Plasma 5.18.3 (under openSUSE Leap 15.1, if that matters). The situation with baloo re-indexing is still *exactly* the same as last year, and it has been the same with (almost) every single KF release in the interval (not sure I tested with all of them, but pretty close).

No need to go into details again: they are still exactly the same as in my earlier post, and the bug title summarizes it in a self-explanatory fashion. The first indexing run seems to go all right, then _after_reboot_ it apparently thinks that all files are new and re-indexes everything, not realizing they are exactly the same files. Reboot after reboot, this ends up with an ever-increasing index size, multiplying the number of indexed files vs the number of actual files.

As I said in the previous post, I suspect that this may be due to some specificity of the BTRFS file system I use, but I have no way to test that hypothesis. Something in the BTRFS operation (system-reboot related) may induce baloo into thinking that the files are new / not already indexed...? At least I don't see any other peculiarity of my system, so, BTRFS aside, this problem would likely affect many users and wouldn't have gone almost unnoticed with so few reports: there would be tons of bug reports about the same thing.

HOWEVER, regarding the second part of my previous comment, I noticed that the command "balooctl suspend" now behaves as expected: it stops the frantic indexer, but I am still able to use the search function. So that's at least some substantial recovered functionality that makes baloo much, much better than the dead weight it was before for me! Thanks a lot to whoever improved this behavior!
You can thank Stefan for putting in tons and tons of work into Baloo. :) As of 5.76 I no longer have any problems like this. It currently gets stuck on one of my files, but now notices this and skips that file, preventing this kind of endless re-indexing behavior. Are you still seeing it with Frameworks 5.75 or later?
(In reply to Nate Graham from comment #5)
> You can thank Stefan for putting in tons and tons of work into Baloo. :)
>
> As of 5.76 I no longer have any problems like this. It currently gets stuck
> on one of my files, but now notices this and skips that file, preventing
> this kind of endless re-indexing behavior. Are you still seeing it with
> Frameworks 5.75 or later?

Of course, Stefan deserves zillions of thanks for working on this. Over the last few months and years, Baloo has definitely become much better. Even in my case it is now completely usable, provided I issue a "balooctl suspend" each time I open a new session to prevent this strange "re-indexing" behaviour.

Currently I am still on KF 5.75 (openSUSE Leap 15.2). I have just run a test overnight after reading your message, and unfortunately I must confirm that the behaviour is still present and Baloo is still re-indexing everything after each reboot. I am looking forward to 5.76 to see if this is actually fixed.
Thanks for the info!
My system seems to be affected too. In my case / is BTRFS, while /home/MyUser/Desktop or /home/MyUser/Music (and all other user folders) are on separate ZFS datasets; none of them are encrypted, but both use compression among other features. Baloo will reindex all /home/MyUser/ZFSMonutedDirectories on reboot.

I'll leave some additional specs of another system (mine) in hopes that it could be helpful:

Operating System: KDE neon 5.20
KDE Plasma Version: 5.20.2
KDE Frameworks Version: 5.75.0
Qt Version: 5.15.0
Kernel Version: 5.4.0-52-generic
OS Type: 64-bit
Processors: 4 × Intel® Core™ i5-4670 CPU @ 3.40GHz
Memory: 15.6 GiB of RAM
Graphics Processor: GeForce GTX 1060 3GB/PCIe/SSE2
btrfs-progs v5.4.1
zfs-0.8.3-1ubuntu12.4
zfs-kmod-0.8.3-1ubuntu12.4

ZFS INFO for an affected dataset
--------------------
zfs get all Link/Home/Music
NAME             PROPERTY              VALUE                  SOURCE
Link/Home/Music  type                  filesystem             -
Link/Home/Music  creation              sáb jun  6 16:13 2020  -
Link/Home/Music  used                  20.8G                  -
Link/Home/Music  available             655G                   -
Link/Home/Music  referenced            20.8G                  -
Link/Home/Music  compressratio         1.01x                  -
Link/Home/Music  mounted               yes                    -
Link/Home/Music  quota                 none                   default
Link/Home/Music  reservation           none                   default
Link/Home/Music  recordsize            128K                   default
Link/Home/Music  mountpoint            /home/MyUser/Music     local
Link/Home/Music  sharenfs              off                    default
Link/Home/Music  checksum              on                     default
Link/Home/Music  compression           on                     inherited
Link/Home/Music  atime                 on                     default
Link/Home/Music  devices               on                     default
Link/Home/Music  exec                  on                     default
Link/Home/Music  setuid                on                     default
Link/Home/Music  readonly              off                    default
Link/Home/Music  zoned                 off                    default
Link/Home/Music  snapdir               hidden                 default
Link/Home/Music  aclinherit            restricted             default
Link/Home/Music  createtxg             261                    -
Link/Home/Music  canmount              on                     default
Link/Home/Music  xattr                 on                     default
Link/Home/Music  copies                1                      default
Link/Home/Music  version               5                      -
Link/Home/Music  utf8only              off                    -
Link/Home/Music  normalization         none                   -
Link/Home/Music  casesensitivity       sensitive              -
Link/Home/Music  vscan                 off                    default
Link/Home/Music  nbmand                off                    default
Link/Home/Music  sharesmb              off                    default
Link/Home/Music  refquota              none                   default
Link/Home/Music  refreservation        none                   default
Link/Home/Music  guid                  14467266995499749484   -
Link/Home/Music  primarycache          all                    default
Link/Home/Music  secondarycache        all                    default
Link/Home/Music  usedbysnapshots       51.8M                  -
Link/Home/Music  usedbydataset         20.8G                  -
Link/Home/Music  usedbychildren        0B                     -
Link/Home/Music  usedbyrefreservation  0B                     -
Link/Home/Music  logbias               latency                default
Link/Home/Music  objsetid              167                    -
Link/Home/Music  dedup                 off                    default
Link/Home/Music  mlslabel              none                   default
Link/Home/Music  sync                  standard               default
Link/Home/Music  dnodesize             legacy                 default
Link/Home/Music  refcompressratio      1.01x                  -
Link/Home/Music  written               0                      -
Link/Home/Music  logicalused           21.1G                  -
Link/Home/Music  logicalreferenced     21.0G                  -
Link/Home/Music  volmode               default                default
Link/Home/Music  filesystem_limit      none                   default
Link/Home/Music  snapshot_limit        none                   default
Link/Home/Music  filesystem_count      none                   default
Link/Home/Music  snapshot_count        none                   default
Link/Home/Music  snapdev               hidden                 default
Link/Home/Music  acltype               off                    default
Link/Home/Music  context               none                   default
Link/Home/Music  fscontext             none                   default
Link/Home/Music  defcontext            none                   default
Link/Home/Music  rootcontext           none                   default
Link/Home/Music  relatime              on                     inherited
Link/Home/Music  redundant_metadata    all                    default
Link/Home/Music  overlay               off                    default
Link/Home/Music  encryption            off                    default
Link/Home/Music  keylocation           none                   default
Link/Home/Music  keyformat             none                   default
Link/Home/Music  pbkdf2iters           0                      default
Link/Home/Music  special_small_blocks  0                      default
Link/Home/Music  com.sun:auto-snapshot on                     inherited
ZFS is not supported.
(In reply to Nate Graham from comment #5)
> You can thank Stefan for putting in tons and tons of work into Baloo. :)
>
> As of 5.76 I no longer have any problems like this. It currently gets stuck
> on one of my files, but now notices this and skips that file, preventing
> this kind of endless re-indexing behavior. Are you still seeing it with
> Frameworks 5.75 or later?

Just had an update to 5.76 today. Unfortunately the problem doesn't seem to be solved in my case (BTRFS). I just did a test run (balooctl disable; balooctl purge; balooctl enable), waited for the indexing to finish (ca. 500k files), rebooted... and unfortunately, after scanning for new files to index, it starts re-indexing everything just like before (pushing the "total" number of indexed files to 1 million). So again I suspended the indexer.

If there is anything I can test or submit to help diagnose the root cause, just let me know.
Still same behaviour with frameworks 5.81.0 (OpenSuSE Leap 15.2, BTRFS file system).
(In reply to Pierre Baldensperger from comment #11)
> Still same behaviour with frameworks 5.81.0 (OpenSuSE Leap 15.2, BTRFS file
> system).

It's on a reboot and not when you log out and back in again?

Try a simple test... Maybe set up a test user so you don't have to reindex everything. Create a test file and check its details:

echo "Hello Penguin" > testfile.txt
stat testfile.txt
balooshow -x testfile.txt

I get:

$ stat testfile.txt
  File: testfile.txt
  Size: 14              Blocks: 8          IO Block: 4096   regular file
Device: 38h/56d         Inode: 5089        Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1001/    test)   Gid: (  100/   users)
Access: 2021-04-26 23:38:06.214398262 +0200
Modify: 2021-04-26 23:38:06.214398262 +0200
Change: 2021-04-26 23:38:06.214398262 +0200
 Birth: 2021-04-26 23:38:06.214398262 +0200

$ balooshow -x testfile.txt
13e100000038 56 5089 testfile.txt [/home/test/testfile.txt]
    Mtime: 1619473086 2021-04-26T23:38:06
    Ctime: 1619473086 2021-04-26T23:38:06
    Internal Info
    Terms: Mplain Mtext T5 T8
    File Name Terms: Ftestfile Ftxt
    XAttr Terms:

Keep an eye on the "Device:" number (the 38 hex, 56 decimal above). Reboot and run the stat and balooshow again. Interesting to know if the device number has changed, and whether the balooshow details have also changed...
This might also explain the instances of a search finding many copies of the same file. Looking at the filesystem with many subvols:

$ df
Filesystem     1K-blocks    Used  Available Use% Mounted on
devtmpfs         1994516       0    1994516   0% /dev
tmpfs            2006844       0    2006844   0% /dev/shm
tmpfs             802740    1368     801372   1% /run
tmpfs               4096       0       4096   0% /sys/fs/cgroup
/dev/vda2       31447040 8928724   22190748  29% /
/dev/vda2       31447040 8928724   22190748  29% /.snapshots
/dev/vda2       31447040 8928724   22190748  29% /root
/dev/vda2       31447040 8928724   22190748  29% /var
/dev/vda2       31447040 8928724   22190748  29% /srv
/dev/vda2       31447040 8928724   22190748  29% /home
/dev/vda2       31447040 8928724   22190748  29% /opt
/dev/vda2       31447040 8928724   22190748  29% /usr/local
/dev/vda2       31447040 8928724   22190748  29% /boot/grub2/x86_64-efi
/dev/vda2       31447040 8928724   22190748  29% /boot/grub2/i386-pc
tmpfs            2006848       0    2006848   0% /tmp
tmpfs             401368      36     401332   1% /run/user/1001

and rebooting half a dozen times, I get:

$ baloosearch "Hello Penguin"
/home/test/testfile.txt
/home/test/testfile.txt
/home/test/testfile.txt
/home/test/testfile.txt
/home/test/testfile.txt
Elapsed: 1.27381 msecs

It seems clear that these files are reindexed after the system has been rebooted. It also seems to be the case that files in the index whose internal IDs do not match anything existent on the filesystem are not cleaned up.

SOFTWARE/OS VERSIONS
openSUSE Tumbleweed 20210325
Plasma: 5.21.3
Frameworks: 5.80.0
Qt: 5.15.2
(In reply to tagwerk19 from comment #12)
> Try a simple test...
> (...)
> Interesting to know if the device number has changed, and whether the
> balooshow details have also changed...

Thank you very much for the helpful hints in diagnosing this. You are spot on!

Indeed the device number changes after every reboot:

$ diff stat1.log stat2.log
< Périphérique : 35h/53d   Inœud : 24588954   Liens : 1
---
> Périphérique : 37h/55d   Inœud : 24588954   Liens : 1

And there is a corresponding change in balooshow:

$ diff baloo1.log baloo2.log
< 177329a00000035 53 24588954 testfile.txt [/home/test/testfile.txt]
---
> 177329a00000037 55 24588954 testfile.txt [/home/test/testfile.txt]

A baloosearch returns the same file twice. And I do indeed have a bunch of subvols.

Now hopefully somebody who knows the internals of baloo's deduplication criteria might be able to understand where this behaviour is coming from, and confirm that this is likely a BTRFS-specific problem.
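As an aside, the balooshow lines above suggest how the document ID is built: the first hex field appears to hold the inode in the upper 32 bits and the device number in the lower 32 bits. A small sketch, inferred purely from the printed values (not baloo's actual code):

```python
# Sketch of the apparent document-ID layout, inferred from the balooshow
# output above: (inode << 32) | device, each clipped to 32 bits.
# This is an inference from the printed values, not baloo's implementation.

def doc_id(inode: int, dev: int) -> int:
    """Combine the lower 32 bits of inode and device into one id."""
    return ((inode & 0xFFFFFFFF) << 32) | (dev & 0xFFFFFFFF)

def split_doc_id(doc: int) -> tuple:
    """Recover (inode, dev) from a document id."""
    return doc >> 32, doc & 0xFFFFFFFF

# The two ids from the diff above: same inode 24588954, devices 53 and 55,
# hence two different document ids for the same file.
before = doc_id(24588954, 53)
after = doc_id(24588954, 55)
assert before == 0x177329a00000035
assert after == 0x177329a00000037
```

This matches the earlier example too: 13e100000038 decodes to inode 5089 (0x13e1) on device 56 (0x38). Since the device number changes across reboots, the same file gets a brand-new id each time.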
Can probably flag this as CONFIRMED then...
(In reply to Pierre Baldensperger from comment #14)
> Now hopefully somebody who knows the internals of baloo deduplication
> criteria might be able to understand where this behaviour is coming from,
> and confirm that this is likely a BTRFS-specific problem.

I think it's a question of "levels of indirection": BTRFS adds an extra level, and it could be that other filesystems do so as well. I have a feeling this is going to be awkward.

Looking at a system with two BTRFS discs, 'vda1' and 'vdb1', they also appear with different minor device numbers - the same as subvols on a single disc. Hmmm....
Looks as if this is a long term issue... Scroll down to the last posts in Bug 404057
(In reply to tagwerk19 from comment #17)
> Looks as if this is a long term issue...
> Scroll down to the last posts in Bug 404057

Yep, this problem has already been deeply analyzed and is well understood. The referenced bug report includes a lot of thoughts, possible solutions, and also a few real improvements as patches - some of those were merged. There are also links to Phabricator with extended discussion. I suggest reading all of it to understand the problem (some later comments revise earlier conclusions).

Sadly, I mostly lost interest in this issue in favor of other more important or personal stuff. I simply ditched baloo since then, as I wasn't really using it that much anyway.

But if anyone wants to take on the effort of crafting patches, they might want to start by implementing the mapping table from volume/subvolume UUID to a virtual device number - that virtual device number would then be used instead of the real one. This way, a distinct file system would always show up as the same device number in baloo, no matter on which device node it appeared. It solves almost all of the problems mentioned here. I volunteer to mentor/help with such an implementation; I'm just too bad with Qt/KDE APIs to kickstart it myself.

Later improvements should look at access patterns and how to optimize them. Maybe LMDB can be used in a better way, optimized for background desktop access patterns; otherwise it may need to be replaced with some other backend that's better at writing data to the database (i.e., less scattering of data over time). LMDB is optimized for reads and appends, much less for random writes (but the latter is the most prominent access pattern for a desktop search index). So if we stay with LMDB, baloo needs to be optimized to prefer appends over rewrites - without blowing up the DB size too much. It may mean purging still-existing data from the LMDB mmap in favor of a bigger contiguous block of free DB memory.
Also, aggressive write coalescing is needed to avoid fragmentation access patterns in filesystems.
BTW: Such a UUID-to-deviceId mapping table would allow baloo to properly support most yet-unsupported filesystems, probably also ZFS. With such an idea implemented, the only remaining requirements for a supported filesystem would be that it has stable inodes across re-mounts/re-boots (most have, some don't) and that it supports reverse lookups (inode to path).

The problematic design decision is how baloo identifies files: each file is assigned a devId/inodeId number (the lower 32 bits of each, combined into a 64-bit fileId). If this magic number changes (which happens on zfs, btrfs, nfs...), the file appears as new. But neither Linux nor POSIX state anywhere that this can be used as an id to uniquely identify files - unless you never remount or reboot. Also, re-used inode numbers (especially after clipping to 32 bits) will completely mess up and confuse baloo.

So this needs a multi-step fix: First (and most importantly), introduce virtual deviceIds by implementing a mapping table "volume/subvolId <-> virtualDeviceId", where virtualDeviceId would be a monotonically increasing number used uniquely throughout the index as a device id. Next step: Enlarge fileIds from 64 to 128 bits, so they can be crafted from 64-bit devId/inode without clipping/wraparound.

On the pro side, such a mapping table would also allow properly cleaning up index data from the DB for file systems that are no longer needed. Currently, baloo never knows whether a file system will appear again or not. This could be implemented in one of the later steps as a sort of housekeeping optimization.
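To illustrate the first step of that proposal, here is a minimal sketch of the "volume/subvolume UUID -> virtual device id" indirection (the class and method names are hypothetical, not baloo code; persistence is omitted):

```python
# Minimal sketch of the proposed UUID -> virtual-device-id mapping table.
# Hypothetical names; a real implementation would persist the table
# alongside the index and never reuse an allocated id.

class DeviceIdMap:
    def __init__(self):
        self._by_uuid = {}   # filesystem/subvolume UUID -> virtual id
        self._next_id = 1    # monotonically increasing, never reused

    def virtual_id(self, fs_uuid: str) -> int:
        """Return a stable id for a filesystem/subvolume UUID,
        allocating a fresh one the first time the UUID is seen."""
        if fs_uuid not in self._by_uuid:
            self._by_uuid[fs_uuid] = self._next_id
            self._next_id += 1
        return self._by_uuid[fs_uuid]

m = DeviceIdMap()
# Hypothetical UUID for the /home subvolume; st_dev may be 39 on one boot
# and 46 on the next, but the UUID - and hence the virtual id - stays put.
home = m.virtual_id("cef844b93a5a00ff/subvol-263")
assert m.virtual_id("cef844b93a5a00ff/subvol-263") == home
```

The point of the indirection is that the virtual id, not st_dev, goes into the fileId, so a reboot that shuffles minor device numbers no longer changes any document id.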
(In reply to Kai Krakow from comment #18)
> ... I suggest to
> read that entirely to understand the problem ...

I've done my best :-) Thank you for the info!

In https://bugs.kde.org/show_bug.cgi?id=404057#c35 you have the idea of an "Index per Filesystem", but then the idea seems to have been put to the side. You mention "storage path" as a problem? Would the way "local wastebaskets" are managed on mounted filesystems be a model? They have to deal with the same issues as you've listed.

https://phabricator.kde.org/T9805 has a mention of "... inside encrypted containers", see this also in Bug 390830.

As background thoughts... Things like "Tags:" folders in Dolphin and incremental searches when you type into Krunner depend on baloosearch being lightning fast. It would be a shame to lose the ability to search for phrases, as in

baloosearch Hello_Penguin

as opposed to

baloosearch "Hello Penguin"

I'm guessing BTRFS usage is going to grow.
As a workaround in openSUSE, my test install had an /etc/fstab:

UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /                       btrfs  defaults                         0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /var                    btrfs  subvol=/@/var                    0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /usr/local              btrfs  subvol=/@/usr/local              0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /srv                    btrfs  subvol=/@/srv                    0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /root                   btrfs  subvol=/@/root                   0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /opt                    btrfs  subvol=/@/opt                    0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /home                   btrfs  subvol=/@/home                   0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /boot/grub2/x86_64-efi  btrfs  subvol=/@/boot/grub2/x86_64-efi  0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /boot/grub2/i386-pc     btrfs  subvol=/@/boot/grub2/i386-pc     0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /.snapshots             btrfs  subvol=/@/.snapshots             0  0
UUID=a500b70e-4811-4db2-ab79-34e99b142b57  swap                    swap

It seems that the BTRFS mounts are performed in parallel, and there seems to be no option to specify that specific mounts appear with fixed device numbers.

It is however possible to add "x-systemd.requires" options in the /etc/fstab that suggest "an order" that mounts are done in - and the device numbers seem to be allocated in the order of the mounts. This can only be described as a hack, and it is quite likely fragile.

If "/home" is set to depend on "/" and "/.snapshots", and the other BTRFS subvols are set to depend on "/home", then the mount order is better defined and the device number allocated for /home *seems* stable.
With the "x-systemd.requires" options added, my /etc/fstab looks like:

UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /                       btrfs  defaults                                                0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /var                    btrfs  subvol=/@/var,x-systemd.requires=/home                  0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /usr/local              btrfs  subvol=/@/usr/local,x-systemd.requires=/home            0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /srv                    btrfs  subvol=/@/srv,x-systemd.requires=/home                  0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /root                   btrfs  subvol=/@/root,x-systemd.requires=/home                 0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /opt                    btrfs  subvol=/@/opt,x-systemd.requires=/home                  0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /home                   btrfs  subvol=/@/home,x-systemd.requires=/,x-systemd.requires=/.snapshots  0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /boot/grub2/x86_64-efi  btrfs  subvol=/@/boot/grub2/x86_64-efi,x-systemd.requires=/home  0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /boot/grub2/i386-pc     btrfs  subvol=/@/boot/grub2/i386-pc,x-systemd.requires=/home   0  0
UUID=19af1d21-f5c2-4518-a839-a3f2afdb199c  /.snapshots             btrfs  subvol=/@/.snapshots                                    0  0
UUID=a500b70e-4811-4db2-ab79-34e99b142b57  swap                    swap   defaults                                                0  0
(In reply to tagwerk19 from comment #20)
> (In reply to Kai Krakow from comment #18)
> > ... I suggest to
> > read that entirely to understand the problem ...
> I've done my best :-) Thank you for the info!
>
> In:
> https://bugs.kde.org/show_bug.cgi?id=404057#c35
> You have the the idea of an "Index per Filesystem" but then the idea seems

I didn't... I explained why that would not work.

> to have been put to the side. You mention "storage path" as a problem? Would
> the way "local wastebaskets" are managed on mounted filesystems be a model?
> They have to deal with the same issues as you've listed.

The problem is that you would have to deal with proper synchronization when multiple databases are used. That is not just "find a writeable storage location and register this location somewhere". Also, you would need to have all these different DBs opened at the same time, and LMDB is a memory-mapped database with random access patterns. So you'd multiply the memory pressure with each location, and that would dominate the filesystem cache.

> https://phabricator.kde.org/T9805

This mentions "store an identifier per tracked device, e.g the filesystem UUID", which is probably my idea. Instead of using dev_id directly, the database should have a lookup table where filesystem UUIDs are stored as a simple list. The index into this list can be used as the new dev_id for the other tables.

> Has a mention of "... inside encrypted containers", see this also in Bug
> 390830.

Encrypted containers should never be indexed in a global database, as that would leak information from the encrypted container. The easiest solution would be to just not index encrypted containers unless the database itself is stored in an encrypted container - but that's also just a bandaid. Maybe encrypted containers should not be indexed at all. Putting LMDB on an encrypted container may also have very bad side-effects on the performance side.

> As background thoughts...
> Things like "Tags:" folders in Dolphin and incremental searches
> when you type into Krunner depend on baloosearch being lightning fast.

Having multiple databases, one per filesystem, can only make this slower, by definition, because you'd need to query multiple databases. From my personal experience with fulltext search engines (ElasticSearch), I can only tell you that querying multiple indexes and recombining the results properly is a huge pita, and it's going to slow things way down. So the multiple-database idea is probably a dead end.

> It would be a shame to lose the ability to search for phrases as in
> baloosearch Hello_Penguin
> as opposed to
> baloosearch "Hello Penguin"
>
> I'm guessing BTRFS usage is going to grow.

The point is: Neither Linux nor POSIX state anywhere that a dev_id from stat() is unique across reboots or remounts. This is even less true for inode numbers on some remote filesystems or non-inode filesystems (where inode numbers are virtual and may be allocated from some runtime state). Those are not stable ids. At least for native Linux filesystems we can expect inode numbers to be stable, as those are stored inside the FS itself (the dev_id isn't, but the UUID is).

On a side note: In this context it would make sense to provide baloo as a system-wide storage and query service shared by multiple users, with an indexer running per user (to index encrypted containers). It's the only way to support these ideas:

- safe access to encrypted containers
- the database can be isolated from being readable by users (prevents information leakage)
- solves the problem of multiple users indexing the same data multiple times
- has capabilities to properly read UUIDs from filesystems/subvolumes (some FS only allow this for root)
- can guard/filter which results are returned to users (by respecting FS ACLs and permission bits)
- shared index locations (e.g. /usr/share/docs) would be indexed just once

On the contra side:

- needs some sort of synchronization between multiple indexers (to work around race conditions so that multiple indexers do not read and index the same files twice); this could be solved by running the indexer within the system-wide service too, but access to encrypted containers would need to be evaluated
There is nothing new to add here. Please refrain from any further comments if they do not add any new information.

1. Baloo uses inodes and device ids as document IDs. Inodes can be considered stable on all supported file systems, while device ids will (may) change when adding additional drives, plugging in external storage in varying order, and apparently also for BTRFS subvolumes. This is a current design limitation. All this has been known for years, and is also mentioned in some Phabricator tasks. Nothing new here.

2. Work has been under way to fix this for quite some time, and several places where the db storage layer and filesystem layer were tightly coupled have already been cleaned up, though this is work in progress.

3. This restructuring takes time, and I only do this in my spare time. Tons of rude and abusive comments on e.g. reddit/kde and Phoronix have taken their toll, and I no longer spend the amount of time on Baloo (and KFileMetadata for the extractors) I once did. Lack of review(ers) also does not help.

If you really want to support development and show some appreciation, I have a Liberapay account: https://liberapay.com/StefanB/donate
(In reply to tagwerk19 from comment #21)
> It is however possible to add "x-systemd.requires" options in the /etc/fstab
> that suggest "an order" that mounts are done in - and the device numbers
> seem to be allocated in the order of the mounts.
>
> ... quite likely fragile ...

With further tests, too fragile :-(

We need either a way of specifying that a mount uses a given device number, or for baloo to adapt to running on such "shifting sands".
(In reply to tagwerk19 from comment #24)
> With further tests, too fragile :-(

Maybe the "subvolid" that findmnt gives you is better:

findmnt -T testfile.txt

reports the mount point, the filesystem type and a "subvolid" (in the case of BTRFS). It seems that "subvolid" is stable. It is possible to change it, but it doesn't change on its own (as far as I can tell...)
Getting a stable ID is not the hard part, but changing everything in the internals to use an indirection layer is.
While the issue seems to be clear now, I'd like to add a baloo log message supporting that. This should also make this bug more discoverable when people are searching for it. In my system journal, I get the following message for each indexed file:

kf.baloo: "/home/some/file" id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created

This is in line with the internal ID changes brought up above.
@Kai: Is there a bug / feature request for the system-wide indexing you mention? I'd like to add more to the contra side.

@Stefan: Rather than changing code in baloo to implement the mapping from UUID+subvolumeID to internal-fs-ID, how about executing baloo in a wrapper that redefines `stat()` to modify st_dev? Yes, this is a hack, but it may be enough while waiting for https://github.com/util-linux/util-linux/issues/1562 .
An alternative to relying on UUIDs and sub-volume IDs is to assume mount points of filesystems do not change and to proceed as follows:

* Have a persistent table `I` mapping mount points to an internal filesystem ID (currently the device number stat.st_dev).
* In each run, start with an empty table `M` mapping device numbers to mount points.
* During indexing, query stat.st_dev as usual. If stat.st_dev is not yet in `M`, find out what the mount point is and add it. Otherwise, obtain the mount point from `M`. (We could do without table `M`, but that would be slow, and table `M` is expected to stay tiny. If in doubt, use proper cache logic to limit the size of `M`.)
* Look up our internal filesystem ID in `I` with the mount point. If not in `I` yet, allocate a new ID for it.
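The steps above can be sketched as follows - a minimal illustration under the stated assumption that mount points are stable; `mount_point_of()` is a stub standing in for a real lookup (e.g. via /proc/self/mountinfo):

```python
# Sketch of the mount-point-based scheme described above.
# I, M and the id allocation follow the bullet points; mount_point_of()
# is a hypothetical stub, not a real mountinfo parser.
import itertools

I = {}   # persistent: mount point -> internal filesystem ID
M = {}   # per-run cache: st_dev -> mount point
_next_id = itertools.count(1)

def mount_point_of(path):
    # Stub: a real version would find the longest mountinfo prefix of path.
    return "/home" if path.startswith("/home") else "/"

def internal_fs_id(path, st_dev):
    if st_dev not in M:              # cache miss: one mount-point lookup
        M[st_dev] = mount_point_of(path)
    mnt = M[st_dev]
    if mnt not in I:                 # first sight of this filesystem
        I[mnt] = next(_next_id)
    return I[mnt]

# Same file, different st_dev after a "reboot": same internal ID.
a = internal_fs_id("/home/test/testfile.txt", 40)
M.clear()                            # new run starts with an empty cache
b = internal_fs_id("/home/test/testfile.txt", 46)
assert a == b
```

Since `I` is keyed by mount point rather than by st_dev, the ID survives the device-number reshuffling seen in the stat snippets elsewhere in this report.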
(In reply to Joachim Wagner from comment #28)
> https://github.com/util-linux/util-linux/issues/1562 .

Like :-)

Seems that there are two options here: a fix to filesystem/mount to permit a "specified" device number (minor device number...), or a reengineering of baloo to use a longer-and-unique disc/partition ID.

I'll pick out Stefan's comment #26:
> Getting a stable ID is not the hard part, but changing everything in the
> internals to use an indirection layer is.

Stefan has stepped down as maintainer. Baloo improved *massively* on his watch and thanks are due. A new enthusiast would however be welcome.
(In reply to Joachim Wagner from comment #29)
> An alternative to relying on UUIDs and sub-volume IDs is to assume mount
> points of filesystems do not change and to proceed as follows...

Apologies, I fear you'll have to step through your process for me. I'm somehow missing something...

If I look on Tumbleweed I can see results from:

    stat testfile
    stat -f testfile
    findmnt -nT testfile

These give me the major/minor device numbers + inode of the testfile, the "filesystem ID", and the mount point (with BTRFS subvol/subvolid). The minor device number jumps around with reboots; the filesystem ID, subvol and subvolid seem solid. Snippets of my last reboots and updates:

    Device: 0,40  Inode: 2506  ID: cef844b93a5a00ff  BTRFS: subvolid=263,subvol=/@/home
    Device: 0,39  Inode: 2506  ID: cef844b93a5a00ff  BTRFS: subvolid=263,subvol=/@/home
    Device: 0,46  Inode: 2506  ID: cef844b93a5a00ff  BTRFS: subvolid=263,subvol=/@/home
    Device: 0,40  Inode: 2506  ID: cef844b93a5a00ff  BTRFS: subvolid=263,subvol=/@/home
    Device: 0,41  Inode: 2506  ID: cef844b93a5a00ff  BTRFS: subvolid=263,subvol=/@/home
    Device: 0,43  Inode: 2506  ID: cef844b93a5a00ff  BTRFS: subvolid=263,subvol=/@/home

At the moment, with every reboot, baloo indexes or reindexes the testfile "under" its new docID (device number/inode), and over time it gathers quite a collection of entries:

    baloosearch -i testfile
    9ca00000027 /home/test/testfile
    9ca0000002f /home/test/testfile
    9ca0000002e /home/test/testfile
    9ca0000002d /home/test/testfile
    9ca0000002c /home/test/testfile
    9ca0000002b /home/test/testfile
    9ca0000002a /home/test/testfile
    9ca00000029 /home/test/testfile
    9ca00000028 /home/test/testfile

The mapping would have to work when indexing (going from full filename to an invariant, unique, internal docID) and when searching (going from the docID to the canonical filename).
(In reply to tagwerk19 from comment #31)
> Apologies, I fear you'll have to step through your process for me. I'm
> somehow missing something...
> [...]
> The mapping would have to work when indexing (going from full filename to an
> invariant, unique, internal docID)

I only described the indexing part. The docID is the pair (filesystemID, inode_number) where filesystemID := I(mount_point(filepath)). M is only introduced to make determining mount_point(filepath) more efficient by using cached values M(stat.st_dev(filepath)). The number of cache entries never exceeds the number of mounted filesystems.

> and when searching (going from the docID
> to the canonical filename).

To get from docID to the filepath without storing the filepath, one can maintain a reverse map of I to get the mount point for a given internal filesystem ID. Once one has the mount point, one can get the current stat.st_dev for the filesystem, which is currently used to get the filepath for a given inode_number.

I am suggesting this alternative because the current proposal requires filesystem-specific code, such as looking for the special string "subvolid" in findmnt output. Another filesystem may call it something else. One doesn't want to write code for each possible filesystem and update it each time somebody publishes a new filesystem.
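The search direction described here can be sketched in a few lines. This is a hypothetical helper (the function name and the shape of table `I` are assumptions for illustration): invert `I` to recover the mount point, then stat it to get today's st_dev, which the existing inode-to-path machinery would then use.

```python
import os

def current_st_dev_for(fs_id, table_I):
    """Given a baloo-internal filesystem ID and the persistent table I
    (mount point -> internal ID), return today's st_dev for that
    filesystem by statting its mount point."""
    reverse_I = {v: k for k, v in table_I.items()}  # internal ID -> mount point
    mount_point = reverse_I[fs_id]
    return os.stat(mount_point).st_dev
```

Combined with the inode_number half of the docID, this would let the searcher resolve a result to a path using whatever inode lookup the indexer already has, without storing any system-provided ID in the database.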
(In reply to Joachim Wagner from comment #32)
> ... a given internal filesystem ID ...

Maybe that's where I'm getting muddled... statvfs and "stat -f" give a 64-bit "Filesystem ID" and I was imagining you were talking about that. If I've followed the breadcrumbs right, this comes from the UUID (for BTRFS). Ref: http://lkml.iu.edu/hypermail/linux/kernel/0809.0/0593.html

It looks straightforward to get the filesystem ID for a file. However, it needs more space than a device number and thus a lookup table.

> ... One doesn't want
> to write code for each possible filesystem and update it each time somebody
> publishes a new filesystem ...

Perhaps the f_fsid field is sufficient.
(In reply to tagwerk19 from comment #33)
> statvfs and "stat -f" give a 64 bit "Filesystem ID" and I was imagining you
> were talking about that.

No, I meant the "baloo-internal filesystem ID", a sequentially allocated number as in the proposal discussed before. The difference in my proposal is that a new mount point triggers the allocation, rather than a new UUID+subvolid pair that may be difficult to obtain.

> http://lkml.iu.edu/hypermail/linux/kernel/0809.0/0593.html

It says "For bfs and xfs it's the block device". This means the ID from stat -f is NOT suitable as a filesystem ID, as the block device major:minor can change. Examples: (1) with 2 or more NVMe SSDs, the first SSD is always /dev/nvme0n1 and the 2nd /dev/nvme1n1, but it is random which one gets 259:0; (2) with 2 or more dm-crypt devices with the same iter-time, it is random which one becomes /dev/dm-0, which is always 254:0.

> It looks straightforward to get the filesystem ID for a file.

I haven't yet seen anywhere here a filesystem ID that is stable across restarts and accessible in a standardised way for any filesystem type. Hence my proposal to move away from system-provided IDs and to use the mount point as an identifier instead.
(In reply to Joachim Wagner from comment #34)
> ... Hence my proposal
> to move away from system-provided IDs and to use the mount point
> as an identifier instead ...

Accepted. Although I think we need to look at "what we can trust most". If ext2/3/4, BTRFS and NTFS give a stable filesystem ID, we should make the most of it, to help when mounting storage on a different mount point (saying: yes, we know this disc) or when mounting different storage on a fixed mount point (this isn't the disc it used to be). If the mount point and filesystem ID disagree, provided it's a reliable filesystem ID, we should go with the filesystem ID.

This would mean including the filesystem ID in your "I" table and carefully making judgements when a disc is seen to move, vanish or reappear.

Having kept tabs on baloo issues for a couple of years, the majority of the "reindexing" or "duplicated results" issues have been from OpenSUSE, and thus BTRFS with multiple subvols. I don't remember seeing any reports mentioning XFS, but then you are not prompted for filesystem type when submitting a bug report. Maybe there were some that mentioned Mandriva but I never got to the bottom of those. I don't know the status with ZFS.

If we wanted an intellectual challenge to shake out the edge cases, we can think about how to deal with symbolic links 8-]
(In reply to tagwerk19 from comment #35) > ... mentioned Mandriva ... Maybe Manjaro ...
(In reply to tagwerk19 from comment #35)
> Accepted.
> Although I think we need to look at "what we can trust most".
> [...]
> This would mean including the filesystem ID in your "I" table and careful
> making judgements when a disc is seen to move, vanish or reappear.

Yes, a hybrid approach would be a good default, as long as the filesystem ID does not change with the major:minor of the block device. For filesystems for which baloo does not know how to get a filesystem ID, the ID could be "N/A", and any transition between N/A and a proper ID would also mean that the filesystem is new.

> Having kept tabs on baloo issues for a couple of years, the majority of the
> "reindexing" or "duplicated results" issues have been from OpenSUSE [...]

The openSUSE installer uses btrfs by default.

> [...] I don't remember seeing any reports mentioning
> XFS [...] I don't know the status with ZFS.

I'd think XFS users typically either have a simple setup or use LVM on top of a complex storage setup, and LVM seems to allocate the /dev/dm-* devices in a predictable order; at least I was using this setup for many years without baloo reindexing repeatedly.

> think how to deal with symbolic links 8-]

This should go into a separate feature or documentation request. I see 4 decisions to make, either hard-coded or configurable:

(1) Symlinks to other folders: If the target folder is indexed anyway, the link can be ignored. If not, the default probably should be not to follow the link, as the target folder is under a folder that the user specifically excluded from indexing. (follow yes/no)

(2) Indexing of the path of the target: One could index symlinks by treating them like text files that contain just the target path as plain text. (index path yes/no)

(3) Content indexing for symlinks to files: If the target is indexed anyway, the question is whether to enter the symlink as a duplicate result under a different name. If not, like for folders, the default probably should be not to index the file, but this probably should be configurable, as users may want to use symlinks to bring otherwise excluded files into the index.
(In reply to Joachim Wagner from comment #37)
> (1) Symlinks to other folders: If the target folder is indexed anyway the
> link can be ignored. If not, the default probably should be not to follow
> the link as the target folder is under a folder that the user specifically
> excluded from indexing.

Symlinks provide a bit of an edge case :-) There is a stream of reported issues. At the moment baloo deliberately avoids following symlinks when indexing, whereas dolphin searches do follow them. There's a summary under Bug 447119.

A commonly reported scenario is that people have a separately mounted disc with a symlink to it (as a way to give extra space for ~/Pictures, ~/Videos or whatever). What might (should?) happen here if we look at mount points? I can see that:

    stat -f ~/symlinkto/myfile

or:

    findmnt -nT ~/symlinkto/myfile

give the Filesystem ID and mount point for the destination disc and ignore the fact that you have followed a symlink to get to it.

I'd say it makes sense to deal with the canonical names (on the destination device) while indexing, and to do any adjustments wrt symlinks when returning search results. Does the "mount point" idea work here?
(In reply to tagwerk19 from comment #38)
> Symlinks provide a bit of an edge case :-)
> [...]
> Does the "mount point" idea work here?

I don't know the internals of the indexer implementation, so I cannot say for sure. I would have thought the current indexer calls `stat()` on every file and therefore has no problem noticing that it is on a different filesystem after following a symlink. If following symlinks posed a problem to the current indexer, that would mean the indexer works differently than I thought.

Switching to using the mount point, filesystem ID and subvolid, I'd again have assumed these three are queried for every file to be indexed (using a volatile cache with stat.st_dev as the key to speed things up). If this check is performed for every file to be indexed, I don't see how there would be any problem when following symlinks, other than surprising users who thought that adding a folder to "Do not search in these locations" (GUI) would exclude its contents from the index.
My baloo index file is 32GiB large right now, more than any other folder on my file system, and my file system is filled up by 100%, my PC crashed during an update and doesn't boot anymore because there is no linux kernel. Thanks baloo.
(In reply to Lukas Ba. from comment #40) > My baloo index file is 32GiB large right now, more than any other folder on > my file system, and my file system is filled up by 100%, my PC crashed > during an update and doesn't boot anymore because there is no linux kernel. > Thanks baloo. OpenSUSE? (and multiple BTRFS subvolumes)?
(In reply to tagwerk19 from comment #41) > (In reply to Lukas Ba. from comment #40) > > My baloo index file is 32GiB large right now, more than any other folder on > > my file system, and my file system is filled up by 100%, my PC crashed > > during an update and doesn't boot anymore because there is no linux kernel. > > Thanks baloo. > OpenSUSE? (and multiple BTRFS subvolumes)? ArchLinux, with multiple BTRFS subvolumes, my setup is described here https://wiki.archlinux.org/title/Snapper#Suggested_filesystem_layout
(In reply to Lukas Ba. from comment #42) > ... ArchLinux, with multiple BTRFS subvolumes ... You could try the tests in Comment 12, Comment 13 Very likely that baloo is "seeing" your home drive mounted with different minor device numbers and assuming that all the files it sees are new files. Not good. There's a possible mitigation in Comment 21 that may be worth building on. It is a hack, fragile and I suspect I've seen the Minor Device Number jump even with it in place but I think it's an improvement. ... I missed the fact that Arch runs with BTRFS
> [...] home drive mounted with different
> minor device numbers

This phrasing is likely to cause confusion. Better to refer to the value `stat.st_dev` that baloo actually uses, which does not have a "minor":

* Block devices have major and minor device numbers that used to be 8 bits each but were extended to a wider range about 15 years ago. These are stable across restarts for hard drive partitions but are allocated dynamically for device-mapper devices, e.g. LUKS encryption layers. NVMe SSDs have been observed to receive a single major, such that the minors of the second SSD and its partitions change (at the next restart) when the number of partitions on the first SSD is modified.

* Filesystems have a device number that some (single-volume) filesystems derive from the block device number and that other filesystems, e.g. btrfs, set in some other way. It is supposed to be unique for each filesystem over the uptime of a system but may change at each restart. The stat() system call returns this value as `stat.st_dev`. Filesystems with subvolumes produce a different device number for each subvolume.
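A quick way to see the distinction is to inspect `st_dev` directly. This small Python demonstration (not part of baloo) splits the filesystem device number into the major,minor pair that tools like `stat` print; on a btrfs subvolume the major is typically 0, i.e. there is no real block device behind it.

```python
import os

# The filesystem device number that stat() reports for a file or directory.
st = os.stat("/")
fs_dev = st.st_dev

# The same value split into the major,minor pair shown by `stat`.
# For btrfs subvolumes the major is typically 0 (an anonymous device),
# which is exactly the dynamically allocated number discussed above.
major, minor = os.major(fs_dev), os.minor(fs_dev)
print(f"st_dev={fs_dev} -> {major},{minor}")
```

Comparing this output before and after a reboot (on btrfs) shows the minor jumping around while the path stays the same, which is the mismatch behind this bug.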
(In reply to Joachim Wagner from comment #44) > ... This phrasing is likely to cause confusion ... Accepted. There are many layers (and history) here, thanks for the explanation. The challenge is to find a solid procedure for troubleshooting (as in compare the results from "stat" and "balooshow") and simple terms to use when describing them :-/
Thank you, Joachim Wagner and tagwerk19@innerjoin.org, for your insightful comments, and Stefan Brüns and all the contributors for your efforts. My inputs:

* We need a way to list all the filesystems that are part of the index. (This would increase visibility into what is going on, both for bug reports and for users' understanding of what baloo is doing.) Ideally the command would show the date when the filesystem was last mounted.

* Files on filesystems that are not mounted should not appear in search results. However, these files should remain in the index, to support indexing of removable drives that may or may not be mounted at each boot, and they should not be cleaned up automatically.

* We need a command to clean certain filesystems from the index, and also a form of this command that cleans all the filesystems that are not currently mounted. Some removable drives may never come back and we don't need them in the index anymore; let the user decide if they want to delete them from the index.
This might be *slightly* off-topic, but here's a hacky workaround that I can live with until this is properly fixed:

1. Run baloo as a systemd service which has been (io)niced as much as possible.
2. Use `ExecStartPre` to run a script that checks whether the device ID of $HOME/.config has changed and, if so, blows away the baloo index and rebuilds.

This means that after most restarts it will be about 20 minutes before everything has been indexed, but that's totally acceptable (for me). This is the script:

```
#!/usr/bin/env bash
set -eEuo pipefail

STAT_FILE="$XDG_CACHE_HOME/baloo.state"
test -f "$STAT_FILE" || touch "$STAT_FILE"

if [ "$(stat --format='%Hd,%d' "$HOME/.config")" != "$(cat "$STAT_FILE")" ]; then
    echo "Device number changed. Resetting baloo index"
    balooctl suspend
    balooctl disable
    pkill -9 baloo_file || true
    balooctl purge
    balooctl enable
    balooctl suspend
    stat --format='%Hd,%d' "$HOME/.config" > "$STAT_FILE"
fi

# in case it was started outside systemd
pkill -9 baloo_file || true
```
Created attachment 159031 [details] Patch making Baloo use the FSID as device ID Until somebody implements one of the fancy solutions proposed in the comments above, here is a fairly trivial stop-gap fix for the issue. With this patch, Baloo derives the device ID from the FSID instead of the device number. This fixes the issue completely for Btrfs and ext4 (where the FSID is derived from the UUID of the filesystem plus subvolume ID on Btrfs) while it does not hurt XFS and similar filesystems (where the FSID is just the device number, so nothing really changes). I'm going to submit a Gitlab merge request for this shortly, but here's the patch if anyone wants to try it out. For inspiration, here's a patched RPM built on Copr for Fedora: https://copr.fedorainfracloud.org/coprs/tootea/earlybird/build/5922301/ After installing a patched Baloo, things will be reindexed once (because all the device IDs will change) and then never again. It's best to run balooctl purge to first get rid of all the old duplicates with various device IDs (and logout to pick up the new libraries and make search work again).
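The idea behind the patch can be checked from userspace too. Python exposes the same `f_fsid` field via `os.statvfs()` (since 3.7), so this small sketch, with an assumed helper name, mimics the fallback logic described above: prefer the FSID where the filesystem provides one, otherwise fall back to the device number. On btrfs and ext4 the returned value stays the same across reboots, which is exactly what makes it a usable device ID.

```python
import os

def fsid_or_dev(path):
    """Illustrative sketch of the patch's approach (not baloo's actual code):
    prefer the statvfs f_fsid, which btrfs and ext4 derive from the
    filesystem UUID (plus subvolume ID on btrfs) and which is therefore
    stable across reboots; fall back to st_dev if no fsid is reported."""
    fsid = os.statvfs(path).f_fsid
    return fsid if fsid != 0 else os.stat(path).st_dev
```

Running this on the same path before and after a reboot of a btrfs system should return the same value, in contrast to the `st_dev` behaviour shown in earlier comments.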
Nice, please do submit it as a merge request!
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/131
(In reply to Nate Graham from comment #49)
> Nice, please do submit it as a merge request!

Does sound good, not least because it's starting to affect Fedora BTRFS systems. It used to be nearly always OpenSUSE that had trouble.

Is there a list anywhere of "what the FSID is"? We will definitely meet NTFS (both the Fuse NTFS-3G and the Paragon code), encrypted volumes and remote filesystems (Samba and NFS), and ExFat as well. We might meet ZFS and also possibly(?) OverlayFS. Baloo is not supported for all of these but that doesn't stop people trying :-)

I agree it would be prudent to "clean and start afresh", at least if people are content indexing. It might also clear out existing corruptions (Bug 464226). Heads up, though, that there's been a recent change to severely limit the amount of memory baloo can use: https://invent.kde.org/frameworks/baloo/-/commit/db223ed1fe300c0809b743711eda55364a67a491 which could impact a reindex (currently its initial "what files are where" scan is a single transaction: Bug 394750, and wander down to the 13th comment).

A note that Igor Poboiko has been doing stuff and that https://invent.kde.org/poboiko/baloo/-/commit/b17e96700f3da2202e27bf80fe423c458c3cc62d might be useful.

Wishing you good luck! ~
I've been testing the patch on NixOS on kf 5.106, and it's working great across 3 reboots of a couple of mounted subvolumes.
(In reply to André M from comment #52)
> I've been testing the patch on NixOS on kf 5.106, and it's working great
> across 3 reboots of a couple of mounted subvolumes.

Can you share how you've done this? At the moment I have baloo from unstable:

    let
      nixos-unstable = import <nixos-unstable> {};
    in {
      environment.systemPackages = with pkgs; [
        nixos-unstable.libsForQt5.baloo
        nixos-unstable.libsForQt5.baloo-widgets
      ];
    };

I'm only at the beginning of my Nixos journey though 8-/
(In reply to tagwerk19 from comment #53)
> (In reply to André M from comment #52)
> > I've been testing the patch on NixOS on kf 5.106, and it's working great
> > across 3 reboots of a couple of mounted subvolumes.
> Can you share how you've done this? At the moment I have baloo from unstable:

From what you said, it looks like you're on nixos stable (22.11 I suppose) and trying unstable for specific packages. Baloo itself is a kdeFrameworks package, so it's harder to override, because it gets pulled in indirectly by other KDE packages when you `services.xserver.desktopManager.plasma5.enable = true`. Therefore, the best way I know would be to replace the whole plasma5Packages scope using an overlay (it'll still trigger several recompilations, because again baloo is an indirect dependency of several packages):

    final: prev: let
      nixos-unstable = import <nixos-unstable> { inherit (prev) system; };
    in {
      plasma5Packages = nixos-unstable.plasma5Packages.overrideScope' (finalx: prevx:
        let
          kdeFrameworks = prevx.kdeFrameworks.overrideScope' (finaly: prevy: {
            baloo = prevy.baloo.overrideAttrs (attrs: {
              version = "${attrs.version}-patched";
              patches = (attrs.patches or []) ++ [
                # baloo in btrfs
                (final.fetchpatch {
                  url = "https://bugsfiles.kde.org/attachment.cgi?id=159031";
                  sha256 = "sha256-hCtNXUpRhIP94f7gpwTGWWh1h/7JRRJaRASIwHWQjnY=";
                })
              ];
            });
          });
        in (kdeFrameworks // { inherit kdeFrameworks; }));
    }
(In reply to André M from comment #54) > ... the best way I know would be to replace the whole plasma5Packages scope using an overlay ... Thank you. That looks way, way deeper than anything I've tried before :-/
(In reply to André M from comment #52) > I've been testing the patch on NixOS on kf 5.106, and it's working great FWIW, I've been running with this patch for some time now (also on NixOS) and it works beautifully with btrfs on dm-crypt on LVM on NVMe.
*** Bug 470665 has been marked as a duplicate of this bug. ***
For those of us who cannot install a patch, what version of KDE do we need to wait for? Will it go into 5.27.6?
There has been nothing official from KDE about this patch getting merged, so it’s too early to say anything about which release it will be in. That being said, we will start carrying this in NixOS as it is a huge step up on the user experience front for anyone on btrfs.
Git commit c735faf5a6a3ef3d29882552ad0a9264a294e038 by Nate Graham, on behalf of Tomáš Trnka.
Committed on 07/09/2023 at 17:52.
Pushed by ngraham into branch 'master'.

Use the FSID as the device identifier where possible

The device number returned by stat() in st_dev is not persistent in many cases. Btrfs subvolumes or partitions on NVMe devices are assigned device numbers dynamically, so the resulting device ID is typically different after every reboot, forcing Baloo to repeatedly reindex all files.

Fortunately, filesystems like Btrfs or ext4 return a persistent unique filesystem ID as f_fsid from statvfs(), so we can use that when available. Other filesystems like XFS derive the FSID from the device number of the underlying block device, so switching to the FSID does not change anything.

Related: bug 471289

M  +21   -1    src/engine/idutils.h

https://invent.kde.org/frameworks/baloo/-/commit/c735faf5a6a3ef3d29882552ad0a9264a294e038
Git commit 7a1c09ed1b7aa9a7a093f8d715b42a4aedd0f7b6 by Nate Graham, on behalf of Tomáš Trnka.
Committed on 07/09/2023 at 17:54.
Pushed by ngraham into branch 'kf5'.

Use the FSID as the device identifier where possible

The device number returned by stat() in st_dev is not persistent in many cases. Btrfs subvolumes or partitions on NVMe devices are assigned device numbers dynamically, so the resulting device ID is typically different after every reboot, forcing Baloo to repeatedly reindex all files.

Fortunately, filesystems like Btrfs or ext4 return a persistent unique filesystem ID as f_fsid from statvfs(), so we can use that when available. Other filesystems like XFS derive the FSID from the device number of the underlying block device, so switching to the FSID does not change anything.

Related: bug 471289
(cherry picked from commit c735faf5a6a3ef3d29882552ad0a9264a294e038)

M  +21   -1    src/engine/idutils.h

https://invent.kde.org/frameworks/baloo/-/commit/7a1c09ed1b7aa9a7a093f8d715b42a4aedd0f7b6
Given that this is a pretty annoying bug, will it be backported to KF5 or will all Btrfs users need to wait for KF6 and Plasma 6 to become stable before they can use Baloo (in > 6 months from now)? I realise that it is all hands on deck for KF6 now, but as this is a bug, and the patch clearly works on KF5 already, it would be really nice if KF5’s Baloo got fixed too.
(In reply to Matija Šuklje from comment #62) > ... will it be backported to KF5 ... There were two commit messages... the second mentions kf5 (In reply to Nate Graham from comment #61) > Pushed by ngraham into branch 'kf5'. So I guess so :-)
Fixed in 5.111, sweet! Thank you! :D
Thank you dearly for fixing this long standing issue!
Worth being aware of Bug 475919: the "one off" reindexing done after the fix also affects machines that did not previously reindex their folders, basic ext4 setups for example.