SUMMARY
In the field labelled "Enter a search term or character here", try an emoji like:
😕 1F615 Confused Face
The Unicode value or the assigned name ("confused face") works, but not the actual emoji if you copy/paste it. Filtering to a single character also doesn't appear to work, which might be the actual problem.

STEPS TO REPRODUCE
1. Enter "😕" in the search field.

OBSERVED RESULT
Nothing happens.

EXPECTED RESULT
Results should filter to this glyph/emoji.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Manjaro KDE (up to date), Kernel 4.19.69-1-MANJARO
KDE Plasma Version: 5.16.4
KDE Frameworks Version: 5.61.0
Qt Version: 5.13.0

ADDITIONAL INFORMATION
Typing "confused fac" gives 1 result (1F615), but "confused face" returns 2 results (U+FACE is the second result, and it has no mention of "confused"). It appears you can enter multiple Unicode values to add those glyphs/characters directly to the results, ignoring any other keywords. Is that a bug?
You can also add any emoji, and it appears to be treated as an empty character that doesn't affect the results, as if it was never entered. It seems to be treated like a blank/space: type " confused", remove "confused", and the results are not updated; clear the entire field and it resets. Searching for just " " has no effect either (which is fine).
Emoji do appear to work if you add a space before or after them. For single characters, it appears you need a space both before and after.
☝ 261D WHITE UP POINTING INDEX
✌ 270C VICTORY HAND
These two (and others) behave a little differently. Paste the glyph into the search field, press Backspace once (nothing appears to be deleted), then press Space twice; now a result shows. Pasting and then adding a single space has no effect. It's unclear what causes that; perhaps it's due to UTF-8/UTF-16 differences. These glyphs sit in between a single Latin character and an emoji: a Latin character is 1 UTF-8 byte, emoji seem to be around 4 UTF-8 bytes, and these glyphs are 3 UTF-8 bytes but only 1 UTF-16 code unit. Perhaps there will be similar issues with other glyphs/emoji, such as sequences using the ZWJ (Zero Width Joiner), which combine multiple emoji into a single one.
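A small Python sketch can make the UTF-8/UTF-16 guess above concrete. Python is used only for illustration (KCharSelect itself is a Qt/C++ application), and the `lengths` helper is hypothetical: it simply encodes each sample string and counts the result.

```python
from typing import Tuple


def lengths(s: str) -> Tuple[int, int, int]:
    """Return (UTF-8 bytes, UTF-16 code units, code points) for s."""
    utf8_bytes = len(s.encode("utf-8"))
    # UTF-16-LE adds no byte-order mark, and each code unit is 2 bytes.
    utf16_units = len(s.encode("utf-16-le")) // 2
    # A Python str is a sequence of code points.
    code_points = len(s)
    return utf8_bytes, utf16_units, code_points


samples = {
    "LATIN SMALL LETTER Q": "q",
    "WHITE UP POINTING INDEX": "\u261D",
    "WHITE UP POINTING INDEX + VS16": "\u261D\uFE0F",
    "CONFUSED FACE": "\U0001F615",
}

for label, text in samples.items():
    u8, u16, cp = lengths(text)
    print(f"{label:32} UTF-8 bytes={u8} UTF-16 units={u16} code points={cp}")
```

It reports 1/1/1 for "q", 3/1/1 for ☝ on its own, 6/2/2 once the invisible U+FE0F is appended, and 4/2/1 for 😕, whose single code point needs a UTF-16 surrogate pair.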
The glyphs in my previous comment were copied over, but copying them from here won't reproduce the issue. It does not appear to be something I can copy here directly, but they can be sourced from getemoji.com. Presumably that site includes some other data when copying, like a skin tone. The two glyphs/emoji from the previous comment were rendering in black and white in Chrome for me, which I've tracked down to Noto Sans Symbols2 (the DejaVu Sans equivalent does not take precedence for some reason, despite being listed earlier by an fc-match query).
---
🕵 1F575 SLEUTH OR SPY + ♀ 2640 FEMALE SIGN == 🕵️♀️
The glyph on the right of the line above should be a single one that you can select and input into the KCharSelect search field. In my case I see it rendered as the two glyphs shared earlier, but they are treated as a single glyph for selection and arrow-key navigation. You cannot remove the first part of the glyph, but you can remove the second part via Backspace (twice). The first part then requires two more backspaces, followed by a space, before it is identified in the results as the first part of the glyph.
(🕵️) This one seems to reproduce via copy/paste: it requires two backspaces before the space.
(🕵) This one does not; it is what remains after the two backspaces. Just add a space (before or after the glyph) to get results.
---
Could the issue I'm describing be related to code points?
https://emojipedia.org/female-sleuth/
It consists of these codepoints:
🕵 U+1F575
(invisible) U+FE0F
(invisible) U+200D
♀ U+2640
(invisible) U+FE0F
FE0F is the invisible codepoint for Variation Selector-16: https://emojipedia.org/variation-selector-16/
200D is the invisible codepoint for Zero Width Joiner: https://emojipedia.org/zero-width-joiner/

😕 1F615 Confused Face is a single codepoint.
https://emojipedia.org/rolling-on-the-floor-laughing/

☝ 261D WHITE UP POINTING INDEX and ✌ 270C VICTORY HAND are both two codepoints (the second codepoint is the invisible FE0F Variation Selector-16):
https://emojipedia.org/white-up-pointing-index/
https://emojipedia.org/victory-hand/
That solves the backspace mystery :)

While some single-codepoint emoji happily return results with a single added space, those that required two spaces accept them in any order (" q", " q ", "q ") and still work; they didn't need to be strictly before and after. I'm not sure what makes an emoji like "Confused Face" require only a single space to trigger results.
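Listing the code points of a pasted string is an easy way to spot invisible ones like these. Below is a minimal Python sketch using the standard `unicodedata` module. The `describe` helper is hypothetical, and the sequence is written with escapes so the invisible code points survive copying.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """List each code point in text as 'U+XXXX NAME'."""
    out = []
    for ch in text:
        # Code points without a name (e.g. unassigned ones) get a placeholder.
        name = unicodedata.name(ch, "<unnamed>")
        out.append(f"U+{ord(ch):04X} {name}")
    return out


# Female sleuth: SLEUTH OR SPY, VS16, ZWJ, FEMALE SIGN, VS16.
FEMALE_SLEUTH = "\U0001F575\uFE0F\u200D\u2640\uFE0F"
for line in describe(FEMALE_SLEUTH):
    print(line)
```

For the female sleuth it prints U+1F575 SLEUTH OR SPY, U+FE0F VARIATION SELECTOR-16, U+200D ZERO WIDTH JOINER, U+2640 FEMALE SIGN and U+FE0F again. For ☝ with its variation selector, it prints U+261D WHITE UP POINTING INDEX followed by U+FE0F.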
The search field will only start its incremental searching when at least three characters are typed, or when Return is pressed. The idea is to avoid showing 10,000+ results when starting to type common letters such as 'e' or 'a'.

KCharSelect doesn't implement the Unicode Emoji standard. It only works with codepoints.

The reason a single space is sufficient for SMP characters is indeed a bug. It counts the number of UTF-16 code units, and Emoticon characters need two, so a single space raises the count to three. In other words, if I fixed this bug, you would need to add two spaces. I doubt this is what you were after. Could you clarify how the decision when to start the incremental search could be improved?
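The trigger rule described above can be modelled in a few lines. This is a hypothetical Python model, not KCharSelect's code. Qt's `QString::length()` counts UTF-16 code units, which would produce exactly this behaviour, but whether KCharSelect uses that call is an assumption here.

```python
# Hypothetical model of the trigger rule described above; not KCharSelect code.
MIN_LENGTH = 3  # search starts once the query reaches this length


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units, the unit QString::length() counts."""
    return len(text.encode("utf-16-le")) // 2


def triggers_current(text: str) -> bool:
    """Current rule as described: count UTF-16 code units."""
    return utf16_length(text) >= MIN_LENGTH


def triggers_by_code_points(text: str) -> bool:
    """Rule after the described fix: count code points instead."""
    return len(text) >= MIN_LENGTH


for query in ["\U0001F615 ", "\U0001F615  ", "\u261D ", "\u261D  ", "q ", "q  "]:
    # ascii() keeps the output printable on any console encoding.
    print(ascii(query), triggers_current(query), triggers_by_code_points(query))
```

Under the current rule, "😕 " (a surrogate pair plus one space) already triggers the search, while "☝ " and "q " do not. Counting code points instead makes all three need two spaces, which matches the remark that fixing the bug would require two spaces.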
The word FACE is unfortunately also a hex word. A group of 4 or 5 hex digits are treated as codepoints.
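The hex-word rule can be sketched as follows. This assumes the query is split on whitespace and that a token of exactly 4 or 5 hex digits (either case) is read as a code point. The `hex_code_points` helper is hypothetical and only models the rule as stated, not KCharSelect's actual parser.

```python
import re
from typing import List

# Assumed rule: a whitespace-separated token of exactly 4 or 5 hex digits
# (either case) is read as a code point.
HEX_TOKEN = re.compile(r"[0-9A-Fa-f]{4,5}")


def hex_code_points(query: str) -> List[int]:
    """Return the code points implied by hex-looking tokens in query."""
    return [int(tok, 16) for tok in query.split() if HEX_TOKEN.fullmatch(tok)]


for query in ["confused face", "confused fac", "1F601 \U0001F602 1F923"]:
    print(ascii(query), [f"U+{cp:04X}" for cp in hex_code_points(query)])
```

Under this rule "confused face" yields U+FACE, "confused fac" yields nothing (FAC is only three digits), and "1F601 😂 1F923" yields U+1F601 and U+1F923. This is consistent with the results reported in this thread.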
> I doubt this is what you were after.

Actually, consistency would help at least; the mixed space behaviour was confusing.

> Could you clarify how the decision when to start the incremental search could be improved?

How do users tend to use the search field? If I want all glyphs related to "q" and notice I get instant results with some inputs, then, with no visible minimum-input requirement, padding the query with spaces until it reaches three UTF-16 code units is far from obvious.
If I want to search for an emoji such as "🤣", again, pasting it into the search field gives no immediate result. I only found out by accident that adding spaces triggered a result, and later learned that their order was not important. The minimum requirement only makes things confusing for what I'd imagine is a common use case: looking up a single character/glyph without knowing its text name or Unicode value (or whatever U+1F923 is).
Getting plenty of results during input isn't an issue, it already shows everything in the current subsection for an empty search field. It's not a realistic performance concern, so as the user types multiple characters into the query, those results will filter regardless? Otherwise, some indication of the minimum input would be helpful.
You mention the user can press Return/Enter to avoid the spaces, but there is no "Search" button in the UI, just immediate results once the minimum input is reached, so for a user this is a less obvious action. One might assume that pressing Enter in a field does the expected thing, but for me that idea was dispelled because I had previously seen immediate results when searching by Unicode value; it just did not occur to me to try.

> KCharSelect doesn't implement the Unicode Emoji standard. It only works with codepoints.

Could you detect this like Konsole does? It notices invisible/zero-width codepoints and offers to remove them.
Perhaps you could remove FE0F (it is valid to search by its value, just not by its "rendered" glyph) and the ZWJ codepoint, so that pasting the female spy would yield two separate glyphs (perhaps separated by a space?). Alternatively, upon detection, simply inform the user that this type of input is not supported by KCharSelect, only single/individual codepoints (and let the user figure out what that means).

> The word FACE is unfortunately also a hex word. A group of 4 or 5 hex digits are treated as codepoints.

Yes, I understand that. My confusion was why that result is returned when the other part of the query contains "confused", which has nothing to do with the FACE result. For example, the query "1F601 😂 1F923" returns only the two codepoints specified; the emoji glyph in the middle is ignored. Similarly, "😁 😂 🤣" returns no results. Something is wrong with the query/filtering here: Unicode values are treated roughly as "OR", but emoji glyphs behave like "AND" keywords (they won't return a result unless all keywords are relevant). "confused 😕 fac" is similar: the emoji glyph itself doesn't appear to affect the results at all, except on its own as "😕 ".
---
SUMMARY
It'd be nice if these behaviours were consistent, and if the user were informed when certain inputs are not supported. This article might be of interest: https://hsivonen.fi/string-length/ It points out how various languages handle such input/strings differently, and why. Perhaps the Rust crate it mentions could be used to improve the current parsing? (Though its being in another language probably makes that a hard no?)
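The suggestion above (drop FE0F, split on the ZWJ, and tell the user when that happened) could look roughly like this. It is a hypothetical Python sketch: `split_emoji_sequence` is an invented name, and Konsole's actual detection logic is not reproduced here.

```python
from typing import Tuple

VS16 = "\uFE0F"  # VARIATION SELECTOR-16 (invisible)
ZWJ = "\u200D"   # ZERO WIDTH JOINER (invisible)


def split_emoji_sequence(query: str) -> Tuple[str, bool]:
    """Hypothetical pre-processing of a pasted query.

    Drops VS16 characters and replaces each ZWJ with a space, so a ZWJ
    sequence becomes separate single-code-point search terms. Also reports
    whether anything changed, so the UI could tell the user.
    """
    cleaned = query.replace(VS16, "").replace(ZWJ, " ")
    return cleaned, cleaned != query


female_sleuth = "\U0001F575\uFE0F\u200D\u2640\uFE0F"
cleaned, changed = split_emoji_sequence(female_sleuth)
print(ascii(cleaned), changed)  # ascii() so any console encoding can print it
```

Only the invisible characters themselves are touched, so typing the hex value "FE0F" as text still searches for the variation selector, as suggested above.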
> Getting plenty of results during input isn't an issue, it already shows everything in the current subsection for an empty search field. It's not a realistic performance concern, so as the user types multiple characters into the query, those results will filter regardless?

Yes, it is a problem, because the search is not limited to the current subsection; it searches the complete Unicode range of 100,000+ code points. Try typing "a a" to see the list you would already get if a single "a" started a search without pressing Return. Or try "c c"...
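To get a feel for the numbers, the sketch below counts how many code points have a Unicode name containing a single letter, using Python's `unicodedata` name table. It assumes plain case-insensitive substring matching on names, which is not necessarily how KCharSelect matches. The `count_names_containing` helper is hypothetical, so treat the output only as an order of magnitude.

```python
import sys
import unicodedata


def count_names_containing(fragment: str) -> int:
    """Count code points whose Unicode name contains fragment (case-insensitive)."""
    fragment = fragment.upper()  # Unicode character names are upper case
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        # Code points without a name (e.g. unassigned ones) fall back to "".
        if fragment in unicodedata.name(chr(cp), "")
    )


for letter in ("a", "c"):
    print(letter, count_names_containing(letter))
```

Both letters appear in names such as "CJK UNIFIED IDEOGRAPH-4E00", so each single-letter count runs well past 10,000.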