SUMMARY
Baloo splits file and directory names into searchable words when they are written in one of the following styles:
1. kebab-case
2. snake_case
3. space separated

However, Baloo does not do well with camelCase. As someone who used camelCase for years, I just chalked it up to Baloo being bad at searching for filenames, when it was only this kind of naming scheme that it was bad at.

STEPS TO REPRODUCE
1. In Dolphin, create the following in some subdirectory: "oneFileTest.txt", "one_file_test.txt", "one-file-test.txt", and "one file test.txt"
2. Use Baloo in Dolphin via `ctrl+f` and search for the strings "one", "file", and "test", filtering them with the "Filter" feature (pressing `/` with the preview pane focused) if needed

OBSERVED RESULT
All 4 files show up when you search for "one", but "oneFileTest.txt" does not show up in the other searches

EXPECTED RESULT
All 4 files would ideally show up in all the searches

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch 6.8.2-arch2-1
KDE Plasma Version: 6.0.2
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2

ADDITIONAL INFORMATION
For me, Baloo had a bit of a bad rep because I'd search for sub-strings of filenames I knew existed, but it would not find them because I had been using camelCase for years. I feel that if Baloo worked with camelCase and PascalCase like it does with the others, a lot of people would have a much better time using it.
What a nice idea! A vote of support :-)
CamelCase is not actually something which can be trivially split. Yes, it would work for the cases you presented, but there are too many cases where it would not work, e.g. mixed-case acronyms. While these are not very common among English acronyms, Baloo also has to work for other languages. Also, trademark names often use mixed case (either because they are actually acronyms, to make them stand out, or just to make them trademark-able at all).
(In reply to Stefan Brüns from comment #2)
> ... e.g. mixed-case acronyms ... trademark names ...

Hmmm.... So things like "iPad" and "NaN" ...
... or "LaTeX" :-/
... or "McArthur"

It probably wouldn't matter if you found "iPad" when searching for "pad". I think whatever algorithm is used would still need to index "ipad", "nan" and "latex".

On the plus side, the benefit of just being able to search C++ code would be remarkable.

I think a list of "awkward edge cases" (or is that awkwardEdgeCases?) would be needed to see if there are useful patterns or traps...
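To make the edge cases concrete, here is a minimal Python sketch of the most naive rule: split wherever a lowercase letter is followed by an uppercase one. This is only an illustration, not Baloo's tokenizer; the function name `naive_camel_split` is made up for this example, and the pattern only looks at ASCII letters.

```python
import re

def naive_camel_split(name: str) -> list[str]:
    """Split at every ASCII lowercase-to-uppercase boundary, then lowercase the parts.

    Hypothetical helper for illustration only; not part of Baloo.
    """
    # Insert a space at each zero-width boundary between [a-z] and [A-Z].
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    return [part.lower() for part in spaced.split()]

for word in ["oneFileTest", "iPad", "NaN", "LaTeX", "McArthur"]:
    print(word, "->", naive_camel_split(word))
# oneFileTest -> ['one', 'file', 'test']
# iPad -> ['i', 'pad']
# NaN -> ['na', 'n']
# LaTeX -> ['la', 'te', 'x']
# McArthur -> ['mc', 'arthur']
```

With this rule alone, a search for "latex" or "nan" would no longer match, which is why the whole lowercased word would still have to be indexed as well.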
(In reply to Stefan Brüns from comment #2)
> ... While these are not very common for english acronyms ...

If I look through:

https://en.wikipedia.org/wiki/Lists_of_acronyms

there are a small handful. I haven't read it all...

I would say that, provided Baloo indexes the whole name, it could helpfully split on the "camelCase" boundaries as well. It would need to avoid single letters (mW, MiB, IoT etc).

The question is what "traps for unwary" look like in other languages...
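A rough Python sketch of that idea, under some assumptions: the whole lowercased name is always kept as a term, camelCase fragments are added on top, and "avoid single letters" is read as "drop any fragment shorter than two characters". The name `candidate_terms` and the `min_len` parameter are hypothetical, and nothing here reflects how Baloo actually builds its index.

```python
import re

def candidate_terms(name: str, min_len: int = 2) -> list[str]:
    """Return the whole lowercased name plus its camelCase fragments.

    Hypothetical sketch: fragments shorter than min_len are dropped,
    and the ASCII-only pattern ignores non-English letters.
    """
    terms = [name.lower()]  # the full word is always searchable
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    for fragment in spaced.split():
        fragment = fragment.lower()
        if len(fragment) >= min_len and fragment not in terms:
            terms.append(fragment)
    return terms

print(candidate_terms("oneFileTest"))  # ['onefiletest', 'one', 'file', 'test']
print(candidate_terms("iPad"))         # ['ipad', 'pad']
print(candidate_terms("mW"))           # ['mw']
print(candidate_terms("MiB"))          # ['mib', 'mi']
```

Names like "MiB" and "IoT" still leave a stray two-letter fragment ("mi", "io"), so a length filter alone only reduces the noise; it does not remove it.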
(In reply to tagwerk19 from comment #4)
> (In reply to Stefan Brüns from comment #2)
> The question is what "traps for unwary" look like in other languages...

German: BAFöG, MwSt, GmbH ;-)
(In reply to Stefan Brüns from comment #5)
> German: BAFöG, MwSt, GmbH ;-)

OK, I'll give you MwSt :-)

If I look through:

https://en.wikipedia.org/wiki/List_of_German_abbreviations

I also get KaDeWe, DuÖAV, HTBLuVA, KfzPflVV, StGB, StVO

If we get too many exceptions, we could have a list of "known acronyms", look this up and avoid splitting those words. An option to have in reserve perhaps...
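The "known acronyms" fallback could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the contents of `KNOWN_ACRONYMS`, the function names, and the choice to use Python's Unicode-aware `str.islower()` / `str.isupper()` so that letters such as "ö" count as lowercase. It is not a proposal for the actual Baloo code.

```python
# Illustrative exception list only; a real list would need curating.
KNOWN_ACRONYMS = {"mwst", "gmbh", "bafög", "kadewe", "stgb", "stvo"}

def split_camel(word: str) -> list[str]:
    """Split at each lowercase-to-uppercase boundary (Unicode-aware)."""
    parts, start = [], 0
    for i in range(1, len(word)):
        if word[i - 1].islower() and word[i].isupper():
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts

def terms_with_exceptions(word: str) -> list[str]:
    """Index the whole word; add fragments unless the word is a known acronym."""
    whole = word.lower()
    if whole in KNOWN_ACRONYMS:
        return [whole]  # listed acronym: never split
    terms = [whole]
    for part in split_camel(word):
        part = part.lower()
        if len(part) > 1 and part not in terms:
            terms.append(part)
    return terms

print(terms_with_exceptions("GmbH"))         # ['gmbh']
print(terms_with_exceptions("BAFöG"))        # ['bafög']
print(terms_with_exceptions("oneFileTest"))  # ['onefiletest', 'one', 'file', 'test']
```

Anything not on the list, such as "KfzPflVV", would still be split, so the quality of the result depends entirely on how complete the list is.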