SUMMARY With the file indexer enabled and idling, creating multiple directories and files at once in a monitoring path (for example as a result of a tarball extraction), file indexer crashes with this in the log: ----8<---- Apr 03 09:52:55 vesho baloo_file[3346599]: kf.baloo: "/home/odeda/Documents/bigor/test/foo/" EventCreate Apr 03 09:52:55 vesho baloo_file[3346599]: kf.baloo: "/home/odeda/Documents/bigor/test/a" EventCreate Apr 03 09:52:55 vesho baloo_file[3346599]: ASSERT: "!filePath.endsWith(QLatin1Char('/'))" in file /home/odeda/cvs/baloo/src/file/newfileindexer.cpp, line 38 Apr 03 09:52:56 vesho plasmashell[3346599]: KCrash: Attempting to start /usr/bin/baloo_file Apr 03 09:52:56 vesho plasmashell[3346599]: KCrash: crashing... crashRecursionCounter = 2 Apr 03 09:52:56 vesho plasmashell[3346599]: KCrash: Application Name = baloo_file path = /usr/bin pid = 3346599 Apr 03 09:52:56 vesho plasmashell[3346599]: KCrash: Arguments: /usr/bin/baloo_file Apr 03 09:52:56 vesho plasmashell[3346599]: KCrash: Attempting to start /usr/lib/x86_64-linux-gnu/libexec/drkonqi ----8<---- STEPS TO REPRODUCE 1. Configure file indexer to monitor a smaller directory structure, so initial indexing can complete quickly (my test has just a few thousands text, pdf and image files). 2. Make sure the indexer is stopped (`killall baloo_file baloo_file_extractor`) 3. Purge the database (`balooctl purge`) 4. Start the file indexer (I use the Info Center's File Index Monitor) 5. Quickly dump a few new files and directories into the monitored directory. I use this repro script: (mkdir ~/TESTDIR/test -p; base64 -d | tar -C ~/TESTDIR/test -jx) <<<QlpoOTFBWSZTWfR8wfsAAI97hcCAAIBAAf+AAzN/AJ4AAAEAEAAIMADWxCKRMU0AGg09QAMVNDQA9QNAACKmpGjQxGIAxH6pkDB8E5T2JBOI+AJTwSQaq2ZB4BmZJIyiC0dg7wbxoIEDQYHMYGZSrXIoxAVcQdAId8+1732iFgchknCJg5TOQzGBlfYYE4SoyZI4DAFWpgMwItPa83EgGoVLOLO1IhoDSivkc4H6NyYeweoNAaggKx0H6GYEjIyNYgfxdyRThQkPR8wfsA== This creates the directory structure test/foo/bar with a single empty file in each directory. OBSERVED RESULT Baloo daemon crashes. EXPECTED RESULT Baloo should not crash. SOFTWARE/OS VERSIONS Linux/KDE Plasma: KDE Plasma Version: 5.21.80 KDE Frameworks Version: 5.81.0 Qt Version: 5.15.2
Fix submitted in https://invent.kde.org/frameworks/baloo/-/merge_requests/48
Originally posted this to Bug 435161... Not sure whether this is any way related, it's a glitch I get "every so often". When I've missed capturing it, I've not found a way of reproducing it. This time though.... cd ~/Documents mkdir temp echo "Hello Penguin" > temp/file1.txt balooshow -x temp/file1.txt gave: 144da00000fc01 64513 1330592 temp/file1.txt [/home/test/.kde/share/config/kdeglobals] Mtime: 1617136526 2021-03-30T22:35:26 Ctime: 1617136526 2021-03-30T22:35:26 Cached properties: Line Count: 158 Internal Info Terms: 0 0.025 ... Mplain Mtext T5 T8 X20-158 activebackground activeblend activeforeground backgroundalternate background ... widgetstyle window wm File Name Terms: Fkdeglobals XAttr Terms: lineCount: 158 That is, the details of something else completely (a different file, indexed at a different time. Had to trim the list of terms as the full list looked too much like spam). The "Balooctl monitor" does not show "file1.txt" being indexed. After a reboot I get the "expected" 144da00000fc01 64513 1330592 temp/file1.txt [/home/test/Documents/temp/file1.txt] Mtime: 1617270546 2021-04-01T11:49:06 Ctime: 1617270546 2021-04-01T11:49:06 Cached properties: Line Count: 1 Internal Info Terms: Mplain Mtext T5 T8 X20-1 hello penguin File Name Terms: Ffile1 Ftxt XAttr Terms: lineCount: 1 I'm not sure whether this is close enough that I can set the status to CONFIRMED Neon Unstable Plasma: 5.21.80 Frankeworks: 5.81.0 Qt: 5.15.2 Journalctl(*) shows a matching... Apr 01 12:21:38 localhost.localdomain baloo_file[1100]: 5714849124383745 "/home/test/Documents/temp/file1.txt" renaming "kdeglobals" to "file1.txt" Apr 01 12:21:38 localhost.localdomain baloo_file[1100]: 5643818955242497 "/home/test/.kde/share/config/kdeglobals" renaming "Trolltech.conf" to "kdeglobals" I'm guessing these "renamings" are not real. If I look back through the logs there are some large batches of these messages. (Don't seem able to filter baloo messages with "journalctl -u baloo_file", is this at all possible?)
(In reply to tagwerk19 from comment #2) > Originally posted this to Bug 435161... > > Not sure whether this is any way related, it's a glitch I get "every so > often". When I've missed capturing it, I've not found a way of reproducing > it. This time though.... > > cd ~/Documents > mkdir temp > echo "Hello Penguin" > temp/file1.txt > balooshow -x temp/file1.txt > > gave: > > 144da00000fc01 64513 1330592 temp/file1.txt > [/home/test/.kde/share/config/kdeglobals] This seems completely unrelated. Its better if you open a new ticket. Please email me with the detail of the new ticket and I will try to take a look at it.
Done, see Bug 435434 Cannot add anything new I'm afraid. "It happens" but not able to reproduce it at will.
For what it's worth, I have now seen this.... It happens in certain situations; running the test code "wasn't sufficient" to trigger the assertion. For me, it did not seem necessary to force a race condition, that is to create the folder tree quickly. Once baloo_file had crashed and was restarted, a single "mkdir" was enough to cause it to crash again. Didn't need to recreate the same folder with a mkdir, creating a different folder also caused a crash. It seemed deterministic enough that a 'balooctl purge' and rebuild gets you to back to a situation where you could trigger the assertion again with a "mkdir" It was however not reproduceable to the extent that you could cause a crash (by following the same sequence of commands) in a different user/home directory. Not sure how much this helps, but I think it's enough to flag a CONFIRM.
(In reply to tagwerk19 from comment #2) > Journalctl(*) shows a matching... > > Apr 01 12:21:38 localhost.localdomain baloo_file[1100]: 5714849124383745 > "/home/test/Documents/temp/file1.txt" renaming "kdeglobals" to "file1.txt" > Apr 01 12:21:38 localhost.localdomain baloo_file[1100]: 5643818955242497 > "/home/test/.kde/share/config/kdeglobals" renaming "Trolltech.conf" to > "kdeglobals" > > I'm guessing these "renamings" are not real. If I look back through the logs > there are some large batches of these messages. If this is thread creep, apologies... I want to flag the issue here but maybe should follow it on Bug 435434 Having had a look through batches of "renaming" failures/messages, the files mentioned are newly created. I suspect deleted and rewritten, with newly created files reusing "just released" inodes. Is baloo proof against this? Gut feeling is it makes life that *little* bit more difficult. I am assuming if you delete a file and create a directory, you don't get the same inode. Could be though that that assumption is not correct in all cases :-/ I'm looking on Neon Testing (and an ext4 filesystem). I don't see any "renaming" messages on Fedora (BTRFS) ... Information from https://stackoverflow.com/questions/51019714/does-recreating-a-file-reuse-the-previous-inode
(In reply to tagwerk19 from comment #6) > I want to flag the issue here but maybe should follow it on Bug 435434 Yea, it looks like follow up from the issue that you reported in Bug 435434, so it should probably go there as well. > I suspect deleted and rewritten, with newly > created files reusing "just released" inodes. > > Is baloo proof against this? Baloo doesn't deal in inodes - it only cares about file paths, so unlikely any shenanigans with inodes will cause baloo to misbehave.
(In reply to Oded Arbel from comment #7) > Baloo doesn't deal in inodes - it only cares about file paths, so unlikely > any shenanigans with inodes will cause baloo to misbehave. Let's say "I'm not sure about that" Just looked at Opensuse and that's running with BTRFS so I think you won't see the same issues as I see on Neon (although it would be interesting if you see any "renaming" messages) Baloo deals with file renamings so has to "know" inodes - or some other invariant.
(In reply to tagwerk19 from comment #8) > (In reply to Oded Arbel from comment #7) > > Baloo doesn't deal in inodes - it only cares about file paths, so unlikely > > any shenanigans with inodes will cause baloo to misbehave. > Let's say "I'm not sure about that" ... > Baloo deals with file renamings so has to "know" inodes - or some other > invariant. I'm pretty sure about it. Baloo uses inotify (https://man7.org/linux/man-pages/man7/inotify.7.html) to track file system events. inotify sends notification about rename events, that includes the original path text and the new path text. inotify may handle inodes internally, but it does not expose that information to user-space application - including Baloo. Here is the relevant code: https://invent.kde.org/frameworks/baloo/-/blob/master/src/file/kinotify.h
(In reply to Oded Arbel from comment #7) > Baloo doesn't deal in inodes - it only cares about file paths, so unlikely > any shenanigans with inodes will cause baloo to misbehave. Thats wrong. Inotify uses pathes exclusively, so does any code in baloo based on inotify. The storage backend in baloo uses inodes.
(In reply to Stefan Brüns from comment #10) > Thats wrong. Inotify uses pathes [...] > The storage backend in baloo uses inodes. Yes, regards what I meant - I'm pretty sure that tagwerk19 was worried about using inodes for detecting renames, which is what I meant by "not using inodes".
(In reply to Stefan Brüns from comment #10) > Thats wrong. Inotify uses pathes exclusively, so does any code in baloo > based on inotify. > > The storage backend in baloo uses inodes. I see "id's" listed in the journalctl messages, the 5714849124383745 and 5643818955242497 in Comment 2 above... I was assuming these were hashes of the filename or something in combination with a modification time but it seems that they are a construct of the dev number and inode. I assume the "renaming" messages appear because baloo has missed the inotify changes and, after having woken up, discovered that a file it thought it knew about is really a different file. So, not so much "detecting" a rename but realising that one has happened. What I don't know is whether this is bad(tm) (Thanks for the conversation, it helps...)
(In reply to tagwerk19 from comment #12) > What I don't know is whether this is bad(tm) Oooooo... There could be a problem in that device numbers are not necessarily fixed. Looking at openSuse that has a set of BTRFS subvols. /Home is a subvol and its minor device number can (and does) change on reboot. You can see this by creating a testfile, "stat" it to get the device number, check that that device number appears in the balooshow results. Then reboot and stat the file again. I get, for example: $ stat testfile.txt File: testfile.txt Size: 14 Blocks: 8 IO Block: 4096 regular file Device: 38h/56d Inode: 5089 Links: 1 ... followed by, after a reboot: $ stat testfile.txt File: testfile.txt Size: 14 Blocks: 8 IO Block: 4096 regular file Device: 35h/53d Inode: 5089 Links: 1 ... I think baloo sees the change and reindexes the files... I've asked on Bug 402154 "Baloo reindexes everything after every reboot" to do this test and see if they see the device number changing.... It maybe that there's also a connection to Bug 401863
I'm not sure whether the original issue is fixed or still reproduceable... The issues with BTRFS have however been fixed: For KF6 https://invent.kde.org/frameworks/baloo/-/merge_requests/131 and cherrypicked for KF5 https://invent.kde.org/frameworks/baloo/-/merge_requests/169 Will leave this a "Waiting for Info" in case you want to keep the call alive
I can no longer reproduce the issue using the original reproducer on my current Plasma 6.1.3 setup. That's mostly what I can say about the situation.
Thank you!