SUMMARY
A few emojis appear duplicated. They have the same name and look the same. This is very noticeable in the symbols tab, for instance. I've tried using "noto-fonts-emoji" and "ttf-joypixels", I get the same issue with both; it seems completely independent from the font used.

STEPS TO REPRODUCE
1. Open the Emoji selector;
2. Go to the "Symbols" tab.

OBSERVED RESULT
Lots of emojis are repeated.

EXPECTED RESULT
They shouldn't be repeated.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.11.4 (available in About System)
KDE Plasma Version: 5.21.2
KDE Frameworks Version: 5.79.0
Qt Version: 5.15.2
Created attachment 136522 [details] Duplicate emojis
I can reproduce the issue for those and other emojis under the Symbols category.
I did a little research on this issue; documenting my findings for anyone (possibly myself) able to fix this in the future.

The issue is that the emojier app pulls its list of emojis from ibus (or libibus), and this list contains *all* emoji forms, including versions of emoji that are not fully qualified. So e.g. there are four transgender flags 🏳️‍⚧️ because there are three unqualified forms in addition to the fully qualified one. However, Unicode says [1]:

> The recommended behavior is:
>
> User Input:
>
> only fully-qualified emoji zwj sequences should be generated by keyboards and other user input devices.

In other words, the correct behavior for the emojier would be to display as selectable options only the emoji forms that are fully qualified. Probably the appropriate way to fix this would be to filter the results from ibus at [2].

[1] https://unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes
[2] https://invent.kde.org/plasma/plasma-desktop/-/blob/master/applets/kimpanel/backend/ibus/emojier/emojierplugin.cpp
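To illustrate the kind of filtering meant here, a minimal Python sketch follows. It assumes the status of each sequence is available in the layout used by Unicode's emoji-test.txt (code points, then a status field, then a comment); the sample lines and the function name `fully_qualified` are made up for illustration, and this is not what the emojier or ibus actually does.

```python
# Sample lines written in the layout of Unicode's emoji-test.txt.
# Assumption: the real emojier gets its list from ibus, which does not
# carry this status field; this only shows the filtering step itself.
SAMPLE = """\
1F3F3 FE0F 200D 26A7 FE0F ; fully-qualified     # transgender flag
1F3F3 200D 26A7 FE0F      ; unqualified         # transgender flag
1F3F3 FE0F 200D 26A7      ; minimally-qualified # transgender flag
1F3F3 200D 26A7           ; unqualified         # transgender flag
1F3F4                     ; fully-qualified     # black flag
"""


def fully_qualified(lines):
    """Return the emoji strings whose status field is 'fully-qualified'."""
    result = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if ";" not in line:
            continue  # blank line or pure comment
        codepoints, status = (field.strip() for field in line.split(";", 1))
        if status == "fully-qualified":
            # Turn "1F3F3 FE0F ..." into the actual string of code points.
            result.append("".join(chr(int(cp, 16)) for cp in codepoints.split()))
    return result


selectable = fully_qualified(SAMPLE.splitlines())
print(len(selectable))  # 2: one transgender flag and the black flag
```

With this input the four transgender flag forms collapse to the single fully qualified one.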
Thanks for investigating! Feel free to submit a merge request, even if it's a speculative one.
I had a look at fixing this and ran into problems. Specifically, the IBus interface for fetching emojis is *really* limited. In particular:

1. IBus itself is affected by this bug. Its internal emoji tool shows the four transgender flags 🏳️‍⚧️, although they are (incorrectly!!) hidden under variants.
2. None of the data we can easily get out of IBus contains information about whether a particular emoji is fully qualified or not.
3. IBus's variant handling is slapped on top of a data format that was clearly not built to support modifiers. At runtime, they take their pregenerated dictionary of emoji and extract a single "base" character, which is used to hide any emoji sequence beginning with that character behind a menu. This is incorrect behavior - as a result the rainbow and trans flags are hidden behind a white flag, since that's the first character of their sequence.

For these reasons I think the only way forward is for KDE to maintain its own emoji dataset and update it with every Unicode release.

The basic problem is that it's very hard to know whether a given emoji is fully qualified or not without using hardcoded data. For example: the code receives the emoji U+1F3F4 (🏴). It needs to know whether this emoji is fully qualified (if not, it discards it). It turns out this one is. But U+1F3F3 (🏳) on its own is not. The *only* reason there's a difference is that the black flag is listed in the emoji data files as having a default emoji presentation, and the white flag is not (it has a text presentation on most platforms).

In addition, if you want proper handling of modifiers using sub-menus (which would be really nice, especially if you could set a gender / skin color default), you'll probably need more data than you can get out of IBus.

Not sure if I'm the right person to work on this. My knowledge of C++ is pretty limited.
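To make the black flag / white flag example concrete, here is a rough Python sketch of a qualification check driven by hardcoded data. The two sets are a tiny hand-written excerpt standing in for the Unicode emoji data files, the function name is hypothetical, and the rule is simplified (it ignores keycap and tag sequences, among other things), so treat it as an illustration rather than a complete implementation of UTS #51.

```python
VS16 = 0xFE0F  # emoji presentation selector
MODIFIERS = set(range(0x1F3FB, 0x1F400))  # skin tone modifiers U+1F3FB..U+1F3FF

# Hand-written excerpt; a real implementation would need the full data files.
EMOJI = {0x1F3F3, 0x1F3F4, 0x1F308, 0x26A7}
EMOJI_PRESENTATION = {0x1F3F4, 0x1F308}  # default emoji presentation


def is_fully_qualified(text):
    """Simplified check: every emoji character must either have default
    emoji presentation or be followed by VS16 or a skin tone modifier."""
    cps = [ord(ch) for ch in text]
    for i, cp in enumerate(cps):
        if cp not in EMOJI:
            continue  # selectors and joiners are not emoji characters
        nxt = cps[i + 1] if i + 1 < len(cps) else None
        if cp in EMOJI_PRESENTATION or nxt == VS16 or nxt in MODIFIERS:
            continue  # this character is qualified
        return False
    return True


print(is_fully_qualified("\U0001F3F4"))  # black flag alone: True
print(is_fully_qualified("\U0001F3F3"))  # white flag alone: False
```

The point of the sketch is that the answer depends entirely on the contents of `EMOJI_PRESENTATION`, which cannot be derived from the string itself.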
I could probably code an ugly workaround, like stripping out the presentation selector (assuming QString allows iterating by code points), and using a hashmap to store and find the longest emoji sequence in a group, if this is unlikely to be fixed any other way.
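Roughly what that workaround could look like, sketched in Python rather than C++ (Python strings iterate by code point, which sidesteps the QString question). It leans on the assumption that, among forms differing only in presentation selectors, the longest one is the fully qualified one; the function name `dedupe` is made up.

```python
VS16 = "\uFE0F"  # emoji presentation selector


def dedupe(emoji_list):
    """Group entries that are identical once presentation selectors are
    stripped, and keep only the longest entry of each group."""
    best = {}  # stripped form -> longest original form seen so far
    for emoji in emoji_list:
        key = emoji.replace(VS16, "")
        if key not in best or len(emoji) > len(best[key]):
            best[key] = emoji
    # dicts keep insertion order, so groups stay in first-seen order
    return list(best.values())


forms = [
    "\U0001F3F3\u200D\u26A7",              # no selectors
    "\U0001F3F3\uFE0F\u200D\u26A7",        # selector on the flag only
    "\U0001F3F3\u200D\u26A7\uFE0F",        # selector on the symbol only
    "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F",  # selectors on both
    "\U0001F3F4",                          # black flag
]
unique = dedupe(forms)
print(len(unique))  # 2
```

This hides the duplicates without needing qualification data, but it is a heuristic and would not help with proper variant sub-menus.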
> For these reasons I think the only way forward is for KDE to maintain its own emoji
> dataset and update it with every Unicode release

This is very likely outside the scope of what we can feasibly do and maintain. I think a more fruitful approach might be to go fix the bugs in IBus instead.
(In reply to Nate Graham from comment #6)
> This is very likely outside of the scope of what we can feasibly do and
> maintain. I think a more fruitful approach might be to go fix the bugs in
> IBus instead.

Fair point. I was worried that if we were waiting on IBus to implement better APIs, we'd have to wait (at least) until they release a new stable version. They seem to do only one or two a year.

That said, I've filed an upstream bug so we can get their feedback on possible improvements to the API: https://github.com/ibus/ibus/issues/2356

I'm not adding it to the URL field, since you can't have multiple URLs in Bugzilla, and we're not closing this as an upstream bug, since it requires more work on our end.
*** Bug 448262 has been marked as a duplicate of this bug. ***
*** Bug 448367 has been marked as a duplicate of this bug. ***
Can anyone still reproduce this? Seems to be fixed for me!
Also cannot reproduce. I guess it must have been fixed upstream somewhere.
Oh wait, no, we fixed this ourselves, and we did eventually end up duplicating the data, it seems. I hope someone remembers to keep it up to date.
This is fixed in https://invent.kde.org/plasma/plasma-desktop/-/commit/8e251dbce5dd95e43074acc8d43926ae8e004119

Is the intention to track releases of Noto? I'm happy to subscribe myself to Noto Emoji releases and make sure a PR gets sent as necessary, though I shouldn't be the only person tracking this.

Incidentally, the data appears to need updating already. This script says that the data was last generated for Emoji 14.0, and Noto has since released an update for 15.0. I can confirm that the emojier is missing emoji from 15.0, e.g. the jellyfish.
That would be amazing, thanks. Even more amazing would be a bot that does that automatically, like the Flathub bot that tracks upstream releases and submits PRs automatically.