434184 – Duplicate emojis

Status:	RESOLVED FIXED

Alias:	None

Product:	plasmashell
Classification:	Plasma
Component:	Emoji Selector
Version:	5.21.2
Platform:	Arch Linux Linux

Importance:	NOR normal
Target Milestone:	1.0
Assignee:	Plasma Bugs List

Duplicates (2):	448262 448367
Reported:	2021-03-09 11:58 UTC by João Figueiredo
Modified:	2023-06-13 21:17 UTC

Attachments
Duplicate emojis (31.56 KB, image/png) 2021-03-09 11:59 UTC, João Figueiredo

Description João Figueiredo 2021-03-09 11:58:13 UTC

SUMMARY
A few emojis appear duplicated. They have the same name and look the same. This is very noticeable in the Symbols tab, for instance. I've tried both "noto-fonts-emoji" and "ttf-joypixels" and get the same issue with each, so it seems completely independent of the font used.

STEPS TO REPRODUCE
1. Open the Emoji selector;
2. Go to the "Symbols" tab.

OBSERVED RESULT
Lots of emojis are repeated.

EXPECTED RESULT
They shouldn't be repeated

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.11.4
(available in About System)
KDE Plasma Version: 5.21.2
KDE Frameworks Version: 5.79.0
Qt Version: 5.15.2

Comment 1 João Figueiredo 2021-03-09 11:59:02 UTC

Created attachment 136522
Duplicate emojis

Comment 2 Nate Graham 2021-03-09 19:37:34 UTC

I can reproduce the issue for those and other emojis under the Symbols category.

Comment 3 Adam Fontenot 2021-10-05 13:04:42 UTC

I did a little research on this issue; documenting my findings for anyone (possibly myself) able to fix this in the future.

The issue is that the emojier app pulls its lists of emojis from ibus (or libibus), and this list contains *all* emoji forms, including versions of emoji that are not fully qualified. So e.g. there are four transgender flags 🏳️‍⚧️ because there are three unqualified forms.

However, Unicode says [1]:

> The recommended behavior is:
> 
> User Input:
> 
>    only fully-qualified emoji zwj sequences should be generated by keyboards and other user input devices.

In other words, the correct behavior for the emojier would be to only display as selectable options the emoji forms that are fully qualified. Probably the appropriate way to fix this would be to filter the results from ibus at [2].

[1] https://unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes

[2] https://invent.kde.org/plasma/plasma-desktop/-/blob/master/applets/kimpanel/backend/ibus/emojier/emojierplugin.cpp
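
To illustrate the filtering idea in this comment, here is a minimal Python sketch. It is not the emojier's C++ code. It assumes a qualification table is available from somewhere, here a few sample lines written in the layout of Unicode's emoji-test.txt with the emoji glyph left out of the trailing comment. The names parse_statuses and keep_fully_qualified are hypothetical, and the candidate list is only a stand-in for whatever ibus returns.

```python
# Sample lines in the layout of Unicode's emoji-test.txt:
#   <code points> ; <status> # <version> <name>
# (illustrative excerpt; the glyph normally shown after '#' is omitted)
SAMPLE = """\
1F3F3 FE0F 200D 26A7 FE0F ; fully-qualified     # E13.0 transgender flag
1F3F3 200D 26A7 FE0F      ; unqualified         # E13.0 transgender flag
1F3F3 FE0F 200D 26A7      ; minimally-qualified # E13.0 transgender flag
1F3F3 200D 26A7           ; unqualified         # E13.0 transgender flag
"""


def parse_statuses(text):
    """Map each emoji string to its qualification status."""
    statuses = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        codepoints, status = (part.strip() for part in data.split(";", 1))
        emoji = "".join(chr(int(cp, 16)) for cp in codepoints.split())
        statuses[emoji] = status
    return statuses


def keep_fully_qualified(candidates, statuses):
    """Keep only the candidates that the table marks as fully qualified."""
    return [e for e in candidates if statuses.get(e) == "fully-qualified"]


statuses = parse_statuses(SAMPLE)
candidates = list(statuses)  # stand-in for the list received from ibus
selectable = keep_fully_qualified(candidates, statuses)
print(len(candidates), "forms in,", len(selectable), "selectable")
```

With the four forms of the transgender flag as input, only the fully-qualified sequence survives the filter.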

Comment 4 Nate Graham 2021-10-05 15:04:25 UTC

Thanks for investigating! Feel free to submit a merge request, even if it's a speculative one.

Comment 5 Adam Fontenot 2021-10-06 07:48:16 UTC

I had a look at fixing this and ran into problems. Specifically, the ibus interface for fetching emojis is *really* limited. In particular:

1. Ibus itself is affected by this bug. Its internal emoji tool shows the four transgender flags 🏳️‍⚧️ although they are (incorrectly!!) hidden under variants.

2. None of the data we can easily get out of Ibus contains information about whether a particular emoji is fully qualified or not.

3. Ibus's variant handling is slapped on top of a data format that was clearly not built to support modifiers. At runtime, they take their pregenerated dictionary of emoji and extract a single "base" character which is used to hide any emoji sequence beginning with that character behind a menu. This is incorrect behavior - as a result the rainbow and trans flags are hidden behind a white flag, since that's the first character of their sequence.
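
The grouping problem in point 3 can be sketched in a few lines of Python. This is only a model of the behavior as described above, not IBus's actual code; group_by_first_code_point is a hypothetical helper.

```python
from collections import defaultdict

WHITE_FLAG = "\U0001F3F3"
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"
TRANSGENDER_FLAG = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"
BLACK_FLAG = "\U0001F3F4"


def group_by_first_code_point(emojis):
    """Group every sequence under its first code point (the "base")."""
    groups = defaultdict(list)
    for emoji in emojis:
        # Python strings index by code point, so emoji[0] is the base character.
        groups[emoji[0]].append(emoji)
    return dict(groups)


groups = group_by_first_code_point(
    [WHITE_FLAG, RAINBOW_FLAG, TRANSGENDER_FLAG, BLACK_FLAG]
)
for base, members in groups.items():
    print(f"U+{ord(base):04X}: {len(members)} entries")
```

Both ZWJ flags land in the white flag's group because U+1F3F3 is their first code point, which is why they end up hidden behind it.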

For these reasons I think the only way forward is for KDE to maintain its own emoji dataset and update it with every Unicode release. The basic problem is that it's very hard to know whether a given emoji is fully qualified or not without using hardcoded data. For example:

The code receives the emoji 1F3F4 (🏴). It needs to know whether this emoji is fully qualified (if not, it discards it). It turns out this one is. But 1F3F3 (🏳) is not. The *only* reason there's a difference is that the black flag is listed in the emoji data files as having a default emoji presentation, and the white flag does not (it has a text presentation on most platforms).
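
A rough Python sketch of why hardcoded data is needed for this check. The two sets are a tiny hand-picked excerpt standing in for the Emoji and Emoji_Presentation properties from Unicode's emoji data files, and is_fully_qualified is a hypothetical, simplified test: it ignores skin-tone modifiers, keycaps and tag sequences.

```python
VS16 = "\uFE0F"  # emoji presentation selector

# Illustrative excerpt only; a real check needs the full Unicode property data.
EMOJI = {0x1F3F3, 0x1F3F4, 0x26A7, 0x1F308}
EMOJI_PRESENTATION = {0x1F3F4, 0x1F308}  # default to emoji presentation


def is_fully_qualified(sequence):
    """Simplified: each emoji character must default to emoji presentation
    or be followed directly by VS16."""
    for i, ch in enumerate(sequence):
        cp = ord(ch)
        if cp not in EMOJI:
            continue  # joiners and selectors are not checked themselves
        if cp in EMOJI_PRESENTATION:
            continue
        if sequence[i + 1:i + 2] != VS16:
            return False
    return True


print(is_fully_qualified("\U0001F3F4"))        # black flag alone
print(is_fully_qualified("\U0001F3F3"))        # white flag alone
print(is_fully_qualified("\U0001F3F3\uFE0F"))  # white flag + VS16
```

Without the EMOJI_PRESENTATION table the function has no way to tell the black flag from the white flag, which is the point being made here.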

In addition, if you want proper handling of modifiers using sub-menus (which would be really nice, especially if you could set a gender / skin color default), you'll probably need more data than you can get out of Ibus.

Not sure if I'm the right person to work on this. My knowledge of C++ is pretty limited. I could probably code an ugly workaround, like stripping out the presentation selector (assuming QString allows iterating by code points), and using a hashmap to store and find the longest emoji sequence in a group, if this is unlikely to be fixed any other way.
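
For reference, the workaround described above might look roughly like this in Python (the real change would be C++ with QString). It assumes the fully-qualified form is always present in the list and is always the longest member of its group; dedupe_keep_longest is a hypothetical name.

```python
VS16 = "\uFE0F"  # emoji presentation selector


def dedupe_keep_longest(emojis):
    """Group sequences that differ only by VS16 and keep the longest of each."""
    longest = {}  # key: sequence with selectors stripped -> longest form seen
    for emoji in emojis:
        key = emoji.replace(VS16, "")
        # len() counts code points in Python, so "longest" means most selectors.
        if key not in longest or len(emoji) > len(longest[key]):
            longest[key] = emoji
    return list(longest.values())


forms = [
    "\U0001F3F3\u200D\u26A7",              # unqualified
    "\U0001F3F3\u200D\u26A7\uFE0F",        # unqualified
    "\U0001F3F3\uFE0F\u200D\u26A7",        # minimally qualified
    "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F",  # fully qualified
    "\U0001F3F4",                          # black flag, unrelated entry
]
deduped = dedupe_keep_longest(forms)
print(len(forms), "->", len(deduped))
```

This avoids needing qualification data, but it silently keeps an unqualified form whenever the fully-qualified one is missing from the input.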

Comment 6 Nate Graham 2021-10-07 16:51:45 UTC

> For these reasons I think the only way forward is for KDE to maintain its own emoji
> dataset and update it with every Unicode release
This is very likely outside of the scope of what we can feasibly do and maintain. I think a more fruitful approach might be to go fix the bugs in IBus instead.

Comment 7 Adam Fontenot 2021-10-07 23:05:09 UTC

(In reply to Nate Graham from comment #6)
> This is very likely outside of the scope of what we can feasibly do and
> maintain. I think a more fruitful approach might be to go fix the bugs in
> IBus instead.

Fair point. I was worried that if we were waiting on IBus to implement better APIs we'd have to wait (at least) until they release a new stable version. They seem to do only one or two a year.

That said, I've filed an upstream bug so we can get their feedback on possible improvements to the API: https://github.com/ibus/ibus/issues/2356

Not adding it to the URL field since you can't have multiple URLs in Bugzilla and we're not closing this as an upstream bug since it requires more work on our end.

Comment 8 Nate Graham 2022-01-11 16:20:29 UTC

*** Bug 448262 has been marked as a duplicate of this bug. ***

Comment 9 Gabriel 2022-02-22 17:18:05 UTC

*** Bug 448367 has been marked as a duplicate of this bug. ***

Comment 10 João Figueiredo 2023-06-09 12:55:36 UTC

Can anyone still reproduce this? Seems to be fixed for me!

Comment 11 Nate Graham 2023-06-09 20:54:10 UTC

Also cannot reproduce. I guess it must have been fixed upstream somewhere.

Comment 12 Nate Graham 2023-06-09 20:55:19 UTC

Oh wait, no, we fixed this ourselves, and we did eventually end up duplicating the data, it seems.

I hope someone remembers to keep it up to date.

Comment 13 Adam Fontenot 2023-06-10 23:19:20 UTC

This is fixed in https://invent.kde.org/plasma/plasma-desktop/-/commit/8e251dbce5dd95e43074acc8d43926ae8e004119

Is the intention to track releases of Noto? I'm happy to subscribe myself to Noto Emoji releases and make sure a PR gets sent as necessary, though I shouldn't be the only person tracking this.

Incidentally, the data appears to need to be updated already. This script says that the data was last generated for Emoji 14.0, and Noto has since released for 15.0. I can confirm that the emojier is missing emoji from 15.0, e.g. the jellyfish.
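
One way a staleness check could work, as a hedged Python sketch: compare the Emoji version the bundled data was generated for against the per-entry version tags in a newer emoji-test.txt style list. The sample lines are an illustrative excerpt with the glyph omitted, and emoji_newer_than is a hypothetical helper, not part of the existing script.

```python
import re

# Illustrative lines in the layout of Unicode's emoji-test.txt.
SAMPLE = """\
1F9A0 ; fully-qualified # E11.0 microbe
1FAB8 ; fully-qualified # E14.0 coral
1FABC ; fully-qualified # E15.0 jellyfish
"""

# Matches the "E<major>.<minor> <name>" part of the trailing comment.
ENTRY_VERSION = re.compile(r"\bE(\d+)\.(\d+)\s+(.+)$")


def emoji_newer_than(text, dataset_version):
    """Return names of entries introduced after dataset_version."""
    newer = []
    for line in text.splitlines():
        data, _, comment = line.partition("#")
        match = ENTRY_VERSION.search(comment)
        if not data.strip() or match is None:
            continue  # skip blank lines and comment-only lines
        version = (int(match.group(1)), int(match.group(2)))
        if version > dataset_version:
            newer.append(match.group(3).strip())
    return newer


missing = emoji_newer_than(SAMPLE, (14, 0))
print(missing)
```

A non-empty result for the version the data was generated with would be the signal to regenerate it.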

Comment 14 Nate Graham 2023-06-13 21:17:37 UTC

That would be amazing, thanks. Even more amazing would be a bot that does that automatically, like the FlatHub bot that tracks upstream releases and submits PRs automatically.