Summary: | Filesearch runner does not find files that don't have any category assigned | | |
---|---|---|---|
Product: | [Plasma] krunner | Reporter: | Schlaefer <openmail+kde> |
Component: | filesearch | Assignee: | baloo-bugs-null |
Status: | RESOLVED FIXED | | |
Severity: | normal | CC: | alexander.lohnau, bonirar716, natalie_clarius, nate, plasma-bugs, tagwerk19, yvan |
Priority: | NOR | | |
Version: | 5.25.4 | | |
Target Milestone: | --- | | |
Platform: | Other | | |
OS: | Linux | | |
See Also: | https://bugs.kde.org/show_bug.cgi?id=464583 | | |
Latest Commit: | | Version Fixed In: | 6.0 |
Sentry Crash Report: | | | |
Description
Schlaefer
2022-08-05 12:19:09 UTC
I do not know how to fix this issue since I do not see a way to get all results and find the category for them. We for sure want to use the categories to group the results in the UI. Maybe sb. who knows baloo better can give this a look.

I think the problem is that Baloo does not assign the file any category, and consequently it won't show up in the file search results that specifically query the individual categories in the code you linked to.

Note the information in Terms:

> $ touch myfile.foobar
> $ balooshow -x myfile.foobar
> 94995300010303 66307 9738579 yourfile.foobar [/home/natalie/myfile.foobar]
> Mtime: 1660051514 2022-08-09T15:25:14
> Ctime: 1660051514 2022-08-09T15:25:14
>
> Internal Info
> Terms: Mapplication Moctet Mstream
> File Name Terms: Ffoobar Fmyfile
> XAttr Terms:

As opposed to a .txt file:

> $ touch theirfile.txt
> $ balooshow -x theirfile.txt
> 94984b00010303 66307 9738315 test.foobar [/home/natalie/theirfile.txt]
> Mtime: 1660050158 2022-08-09T15:02:38
> Ctime: 1660050158 2022-08-09T15:02:38
>
> Internal Info
> Terms: Mplain Mtext T5 T8
> File Name Terms: Ftheirfile Ftxt
> XAttr Terms:

"T5" and "T8" indicate categories Document and Text, see https://invent.kde.org/frameworks/kfilemetadata/-/blob/master/src/types.h#L20. "Mplain" and "Mtext" indicate the mime type.
> $ file --mime-type theirfile.txt
> theirfile.txt: text/plain

When a file is created empty with an unknown file extension, it gets mime type "inode/x-empty":

> $ file --mime-type myfile.foobar
> myfile.foobar: inode/x-empty

This does not get mapped to any category by Baloo: https://github.com/KDE/baloo/blob/509dcd8da6b2e21723838f27003ac72d9a267a1a/src/file/basicindexingjob.cpp#L63

When the file has text content, even with an unknown file extension, it gets mime type "text/plain":

> $ echo "test" > myfile.foobar
> $ file --mime-type myfile.foobar
> myfile.foobar: text/plain

But for some reason Baloo does not represent this in the term information: the *.foobar file has mime type ("M") "application/octet-stream" and no categories ("T"), even if the file is created non-empty and has mime type text/plain to begin with. So something seems to go wrong already on the side of Baloo, which does not set the term information correctly.

But independently of that, it may be useful to also be able to retrieve results that do not have any category: either through an additional query in the runner for Baloo results with empty type, if such a query is possible, or by adding to Baloo a generic fallback type for any file that has not received any other type, and adding that in the runner.

A possibly relevant merge request was started @ https://invent.kde.org/plasma/plasma-workspace/-/merge_requests/2006

Probably the same:
https://bugs.kde.org/show_bug.cgi?id=442898
https://bugs.kde.org/show_bug.cgi?id=420339

*** Bug 442898 has been marked as a duplicate of this bug. ***

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/kfilemetadata/-/merge_requests/62

A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/85

There's an extra twist...
You did a:

    touch myfile.foobar
    balooshow -x myfile.foobar

and got

    Terms: Mapplication Moctet Mstream

If I try this (on Neon), I get:

    Terms: Mapplication Mx Mzerosize

If I do the same for myfile.txt

    touch myfile.txt
    balooshow -x myfile.txt

I still get:

    Terms: Mapplication Mx Mzerosize

The "twist" seems to be that if baloo is doing content indexing, it flags all(?) empty files as "application/x-zerosize". If I purge and reindex without content, I get "application/octet-stream" for the empty myfile.foobar (and "text/plain" for myfile.txt).

If I look a bit closer with kmimetypefinder, I get:

    $ kmimetypefinder myfile.txt
    text/plain
    $ kmimetypefinder myfile.foobar
    application/x-zerosize

and also:

    $ echo "Hello Penguin" > myfile.foobar
    $ kmimetypefinder myfile.foobar
    text/plain

and that seems reasonable: for an empty file, if the filename indicates a mimetype, use it; if not, say application/x-zerosize.

I'd say baloo ought to give the same results here irrespective of whether it is content indexing or not, and it would probably make sense if it follows kmimetypefinder logic, so:

    "text/plain" for an empty myfile.txt
    "application/x-zerosize" for an empty unrecognised filetype (myfile.foobar, in this case)
    "text/plain" for an unrecognised filetype with (text) content

Krunner would then list an empty myfile.txt but not an empty myfile.foobar. Maybe this is good enough? Or am I missing something?

(In reply to Natalie Clarius from comment #4)
> Probably the same:
> https://bugs.kde.org/show_bug.cgi?id=442898
> https://bugs.kde.org/show_bug.cgi?id=420339

Yes, I think so. Thanks, I'll flag as a duplicate.

*** This bug has been marked as a duplicate of bug 420339 ***

Is this a duplicate? The original issue doesn't depend on an empty file; isn't that just a coincidence of the simplified example?

Yes. The issue is that the files don't have a category assigned.
One of the cases where this happens is for empty files; matlab, IPython notebook files and the like are other instances of the same problem: Baloo doesn't assign a T term, so KRunner doesn't retrieve them.

I think there should be an easy way to "open up" the search criteria in krunner to show all results, something like "Show more?". It makes me a bit uncomfortable that krunner and baloosearch can give different sets of answers; for me that goes against the principle of "least surprise".

The behaviour with empty files muddles the issue and it would be nice to sort out (within baloo).

(In reply to Natalie Clarius from comment #11)
> ... matlab, IPython notebook files and the like ...

At the risk of going down rabbit holes and on the assumption that others have a better understanding of what's happening:

Matlab files:
    Ought to be recognised, although "*.m" also matches "text/x-objcsrc" in the freedesktop.org mimetype list. kmimetypefinder depends on "magic".
    For a file "test.m", baloo indexes content and baloosearch finds the file by name and content.
    Krunner lists it as text (Neon Testing).

IPython Notebook files:
    kmimetypefinder shows a "test.ipynb" file as "application/x-ipynb+json".
    Baloo does not index content and baloosearch only finds the file by name.
    Krunner does not list the file.

That's what we're trying to do. That some files are missing is a bug which the open MR is intended to solve, not an intentional restriction which should be kept or worked around by complicating the UI.

(In reply to tagwerk19 from comment #12)
> I think there should be an easy way to "open up" the search criteria in
> krunner to show all results, something like "Show more?". It makes me a bit
> uncomfortable that krunner and baloosearch can give different sets of
> answers, for me that goes against the principle of "least surprise".
>
> The behaviour with empty files muddles the issue and it would be nice to
> sort out (within baloo)

That's what we're trying to do.
That some files are missing is a bug which the open MR is intended to solve, not an intentional restriction which should be kept or worked around by further complicating the UI.

Of course there is still the fact that KRunner will cap the overall amount of results shown, but that's not specific to the Baloo runner and a topic for a different thread, if at all.

(In reply to tagwerk19 from comment #13)
> Matlab files:
>
> Krunner lists it as text (Neon Testing)

Ah, right, that was a different bug (files of category "text" not found) that got fixed with https://invent.kde.org/plasma/plasma-workspace/-/merge_requests/1658.

(In reply to Natalie Clarius from comment #16)
> ... Ah, right, that was a different bug (files of category "text" not found) ...

Think there might still be some bits to untangle...

For a "test.m" file, an empty one to start with, and without indexing file content, in an up-to-date Neon Testing:

    Plasma: 5.25.5
    Frameworks: 5.97.0
    Qt: 5.15.5

I get:

    $ touch test.m
    $ kmimetypefinder test.m
    text/x-matlab
    $ balooshow -x test.m
    Terms: Mmatlab Mtext Mx T8
    $ krunner test.m
    Listed. Categorised as "Text"

Then a file that does not match any of the "magic" in the mimetypes list:

    $ echo "Hello Penguin" > test.m
    $ kmimetypefinder test.m
    text/x-objcsrc
    $ balooshow -x test.m
    Terms: Mmatlab Mtext Mx T8 (*1)
    $ krunner test.m
    Listed. Categorised as "Text"

Then one that matches the "magic":

    $ echo "##Hello Penguin" > test.m
    $ kmimetypefinder test.m
    text/x-matlab
    $ balooshow -x test.m
    Terms: Mmatlab Mtext Mx T8
    $ krunner test.m
    Listed. Categorised as "Text"

Not sure what baloo is doing in "*1" above but the rest seems OK.
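The comparison made by hand above (MIME type reported by kmimetypefinder versus the M-terms in the balooshow output) can be sketched as a small helper. This is only an illustration: `mime_to_terms` is a hypothetical name, and the rule it applies (split the MIME string on non-alphanumeric characters, prefix each part with "M", sort) is an assumption inferred from the examples in this thread, not taken from the Baloo source.

```shell
#!/bin/sh
# Hypothetical helper: derive the M-terms one would expect to see in
# "balooshow -x" output for a given MIME type.
# Assumption (inferred from the examples in this thread only): the MIME
# string is split on non-alphanumeric characters, each part gets an "M"
# prefix, and the terms are printed in sorted order.
mime_to_terms() {
    printf '%s\n' "$1" \
        | tr -c '[:alnum:]\n' '\n' \
        | sed '/^$/d; s/^/M/' \
        | LC_ALL=C sort \
        | tr '\n' ' ' \
        | sed 's/ $//'
}

# Example: compare with the Terms line printed by balooshow.
mime_to_terms "text/x-matlab"   # prints: Mmatlab Mtext Mx
echo
```

With this assumption, "text/x-objcsrc" would give "Mobjcsrc Mtext Mx", which is not what balooshow printed in the "*1" case, so the index entry there seems to follow the file name rather than the type kmimetypefinder reports.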
Purging and reindexing with file content, for an empty "test.m" file:

    $ rm test.m; touch test.m
    $ kmimetypefinder test.m
    text/x-matlab
    $ balooshow -x test.m
    Terms: Mapplication Mx Mzerosize (*2)
    $ krunner test.m
    Not Listed, except as one of the "Recent Files"

Then a file that does not match any of the "magic" in the mimetypes list:

    $ echo "Hello Penguin" > test.m
    $ kmimetypefinder test.m
    text/x-objcsrc
    $ balooshow -x test.m
    Terms: Mmatlab Mtext Mx T8 X20-1 hello penguin (*3)
    $ krunner test.m
    Not Listed, except as one of the "Recent Files" (*4)

Then one that does match the "magic":

    $ echo "##Hello Penguin" > test.m
    $ kmimetypefinder test.m
    text/x-matlab
    $ balooshow -x test.m
    Terms: Mmatlab Mtext Mx T8 X20-1 hello penguin
    $ krunner test.m
    Not Listed, except as one of the "Recent Files" (*4)

I think the application/x-zerosize (the *2) that baloo seems to add is a shame. I think this needs a fix, but I can see that in this case krunner wouldn't list the file (as it's not text).

Not sure what's happening with "*3"; it's the same behaviour as "*1" further up. There might be some double guessing going on. I think I'd be happier trusting the mime type data.

Also not sure what's happening with "*4"; that also doesn't seem right. According to Comment 2, the T8 implies the file is Text, and my assumption is that Krunner should then list it.

It probably doesn't matter about the ambiguity with the mime type (text/x-objcsrc or text/x-matlab) as they are both "text". It might matter in other cases.

It's disturbing that krunner gives you better results if baloo is not indexing content 8-]

I realise this is an edge case and this writeup is a bit long, but I've deliberately dug down as it might pinpoint something underlying. I would be happy to repeat with other examples and see if there are patterns (I think that's part of triaging...)

For the files that baloo doesn't assign a T-term, like application/x-zerosize, this is what the currently open MR would fix.
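Whether an index entry carries a T-term at all can be read off the Terms line that `balooshow -x` prints. A minimal sketch, assuming the line is passed in as a single string; `describe_terms` is a hypothetical helper name, and only T5 = Document and T8 = Text (the two values named in this thread) are translated, while any other number is printed as-is rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: report the categories ("T" terms) found in a
# balooshow "Terms:" line, or "no category" if there are none.
# Only T5 (Document) and T8 (Text) are named in this thread; other
# numbers are passed through unchanged rather than guessed.
describe_terms() {
    types=""
    for term in $1; do
        case "$term" in
            T5) types="$types Document" ;;
            T8) types="$types Text" ;;
            T[0-9]*) types="$types type#${term#T}" ;;
        esac
    done
    if [ -n "$types" ]; then
        echo "categories:$types"
    else
        echo "no category"
    fi
}

describe_terms "Terms: Mplain Mtext T5 T8"        # categories: Document Text
describe_terms "Terms: Mapplication Mx Mzerosize" # no category
```

An entry that comes back as "no category" is the kind of file this bug is about: it is in the index, but a query restricted to individual categories will not return it.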
For the files that are only listed as recent files, this doesn't mean that the baloo runner doesn't find them. It's just that KRunner filters out duplicates, and the same file found both by baloo and among recent files is such a case. I'm not sure about the logic of which of the two results (the baloo runner one or the recent files one) wins, but that would be a separate issue. The main point is that KRunner overall will find the file.

I haven't actually done a test run with file content indexing disabled, but you could test my recent files hypothesis (i.e. the file is found by the baloo runner, it just gets outrun by the recent files match) by disabling the recent files plugin and seeing if the file then shows up as a text file result.

Thanks for the help in figuring this out!

(In reply to Natalie Clarius from comment #19)
> ... disabling the recent files plugin and seeing if then the file shows up as a text file ...

Good catch. Yes, if I disable the Recent Files plugin, I see the test.m file listed as Text.

However, I would expect "Recent Files" to work the same way, independent of whether baloo is indexing content or not. We might have explained "*4" but maybe we now have a "*5" :-)

(In reply to Natalie Clarius from comment #18)
> For the files that baloo doesn't assign a T-term like application/x-zerosize, this is what
> the currently open MR would fix...

I'll stick with: baloo filename search results should not depend on whether content indexing is enabled or not. In this instance, baloo should consider an empty "test.txt" as "Text".

However, Krunner does seem to be doing something extra:

    $ krunner test.txt

lists an empty "test.txt" as "Document" ...

The baloo and recent files runner plugins don't change their behavior depending on whether content indexing is enabled. If there are differences in the runner results, it's due to Baloo sending different matches.
If content indexing is enabled, Baloo may find matches other than the text file, which can change the relative ranking of the file match, and might explain why in this situation it loses against the recent files result.

I'm not sure it's unexpected that, in general, type assignment can be influenced by also taking content into account. Specifically, that "application/x-zerosize" is preferred over "text/plain" for an empty text file is perhaps less than ideal. That's an issue on the side of the indexing service rather than the runner plugin though, so if you think that's an issue I would suggest filing a bug report for Baloo.

But in any event, the runner plugin doesn't do anything extra to the type assignment. If it reports a file as being type document, then that's information it got from Baloo.

(In reply to Natalie Clarius from comment #22)
> ... If there are differences in the runner results, it's due to Baloo sending
> different matches... If it reports a file as being type document, then that's
> information it got from Baloo ...

Is this something that can be seen by setting debug flags? I tried creating ~/.config/QtProject/qtlogging.ini with:

    [rules]
    kf.*.debug=true

This gave some information but not results from a baloo "lookup".

How much debug output you see depends on how many debug statements have been set in the source code; Baloo and the runner are currently not very verbose in that respect, so if you want to dig deeper and generate more info about what's going on, you'd have to build baloo and plasma-workspace from source and set some debug statements yourself.

The types you can get from the T-terms with balooshow -x, as you've already done. "Document" is type #5 (i.e. T5); see https://invent.kde.org/frameworks/kfilemetadata/-/blob/master/src/types.h#L20.

For the matches Baloo finds (and reports to the runner plugin), you can run `baloosearch`.

*** Bug 464583 has been marked as a duplicate of this bug. ***
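For reference, the rule suggested earlier in this thread for choosing a MIME type (follow the file name if it is recognised, otherwise use application/x-zerosize for an empty file and text/plain for text content) can be sketched as below. This is a sketch of that proposal only, not of what Baloo or kmimetypefinder actually implement: `type_from_name` and `expected_mime` are hypothetical names, the name lookup covers just the two patterns seen in this thread, and content sniffing ("magic") for recognised names is deliberately left out.

```shell
#!/bin/sh
# Hypothetical name lookup: only the two patterns seen in this thread.
type_from_name() {
    case "$1" in
        *.txt) echo "text/plain" ;;
        *.m)   echo "text/x-matlab" ;;
        *)     echo "" ;;
    esac
}

# Sketch of the proposed rule.
#   $1 = file name
#   $2 = "empty" or "text" (what the file content looks like)
expected_mime() {
    by_name=$(type_from_name "$1")
    if [ -n "$by_name" ]; then
        echo "$by_name"                 # the file name decides
    elif [ "$2" = "empty" ]; then
        echo "application/x-zerosize"   # empty, unrecognised name
    elif [ "$2" = "text" ]; then
        echo "text/plain"               # unrecognised name, text content
    else
        echo "unknown"                  # case not covered by the proposal
    fi
}

expected_mime myfile.txt empty      # text/plain
expected_mime myfile.foobar empty   # application/x-zerosize
expected_mime myfile.foobar text    # text/plain
```

Under this rule an empty myfile.txt would still be treated as text, and so could be listed, while an empty myfile.foobar would not, independent of whether content indexing is enabled.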