| Summary: | [Enhancement] Have Baloo split camelCase words | ||
|---|---|---|---|
| Product: | [Frameworks and Libraries] frameworks-baloo | Reporter: | Kody <kodyvonbargen> |
| Component: | general | Assignee: | baloo-bugs-null |
| Status: | REPORTED --- | ||
| Severity: | normal | CC: | postix, stefan.bruens, tagwerk19 |
| Priority: | NOR | ||
| Version First Reported In: | 6.0.0 | ||
| Target Milestone: | --- | ||
| Platform: | Arch Linux | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
|
Description
Kody
2024-03-31 17:05:55 UTC
What a nice idea! A vote of support :-)

CamelCase is not actually something which can be trivially split. Yes, it would work for the cases you presented, but there are too many cases where it would not work, e.g. mixed-case acronyms. While these are not very common for English acronyms, Baloo also has to work for other languages. Also, trademark names often have mixed cases (either because they are actually acronyms, to let them stand out, or to just make them trademark-able at all).

(In reply to Stefan Brüns from comment #2)
> ... e.g. mixed-case acronyms ... trademark names ...
Hmmm.... So things like "iPad" and "NaN" ...
... or "LaTeX" :-/
... or "McArthur"
It probably wouldn't matter if you found "iPad" when searching for "pad". I think whatever algorithm is used would still need to index "ipad", "nan" and "latex". On the plus side, the benefit of just being able to search C++ code would be remarkable. I think a list of "awkward edge cases" (or is that awkwardEdgeCases?) would be needed to see if there are useful patterns or traps...

(In reply to Stefan Brüns from comment #2)
> ... While these are not very common for English acronyms ...
If I look through:
https://en.wikipedia.org/wiki/Lists_of_acronyms
there are a small handful. Haven't read it all... I would say, provided that Baloo indexes the whole name, it could helpfully split on the "camelCase" boundaries. Would need to avoid single letters (mW, MiB, IoT etc). The question is what "traps for unwary" look like in other languages...

(In reply to tagwerk19 from comment #4)
> (In reply to Stefan Brüns from comment #2)
> The question is what "traps for unwary" look like in other languages...
German: BAFöG, MwSt, GmbH ;-)

(In reply to Stefan Brüns from comment #5)
> German: BAFöG, MwSt, GmbH ;-)
OK, I'll give you MwSt :-) If I look through:
https://en.wikipedia.org/wiki/List_of_German_abbreviations
I also get KaDeWe, DuÖAV, HTBLuVA, KfzPflVV, StGB, StVO. If we get too many exceptions, we could have a list of "known acronyms", look this up and avoid splitting those words. An option to have in reserve perhaps...
|
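
To make the rules floated in this thread concrete, here is a minimal Python sketch. It is not Baloo code and not a proposed patch. The names `KNOWN_ACRONYMS`, `split_camel_case` and `index_terms` are hypothetical, and the exception list only holds examples quoted in the comments above. It assumes three rules: the whole word is always indexed, a word is left unsplit if any fragment would be a single letter (one reading of "avoid single letters"), and words on the "known acronyms" list are never split.

```python
# Hypothetical exception list, filled only with examples from this thread.
KNOWN_ACRONYMS = {"ipad", "nan", "latex", "mwst", "kadewe", "stgb", "stvo"}


def split_camel_case(word):
    """Split at every lowercase-to-uppercase transition.

    str.islower()/str.isupper() are Unicode-aware, so "ö" counts as lowercase.
    """
    parts = []
    start = 0
    for i in range(1, len(word)):
        if word[i - 1].islower() and word[i].isupper():
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


def index_terms(word):
    """Return the lowercased terms to index for one word."""
    whole = word.lower()
    terms = [whole]  # the whole word is always indexed

    # Assumption: words on the exception list are never split.
    if whole in KNOWN_ACRONYMS:
        return terms

    parts = split_camel_case(word)
    # Assumption: if any fragment is a single letter (mW, MiB, IoT),
    # do not split at all.
    if len(parts) < 2 or any(len(p) < 2 for p in parts):
        return terms

    for part in parts:
        lowered = part.lower()
        if lowered not in terms:
            terms.append(lowered)
    return terms


if __name__ == "__main__":
    for w in ["awkwardEdgeCases", "iPad", "MiB", "BAFöG", "MwSt", "McArthur"]:
        print(w, index_terms(w))
```

Under these assumptions "awkwardEdgeCases" gains the extra terms "awkward", "edge" and "cases", while "iPad", "MiB", "IoT", "NaN", "LaTeX", "GmbH" and "BAFöG" stay whole because of the single-letter rule alone. "McArthur" still splits into "mc" and "arthur", and "MwSt" or "KaDeWe" would split into two-letter fragments without the exception list, which is the sort of trap a list of awkward edge cases would need to cover.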