Bug 455255

Summary: Supplementary plane unicode characters don't load anymore.
Product: [Applications] krita Reporter: wolthera <griffinvalley>
Component: File formatsAssignee: Krita Bugs <krita-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: alvin, halla
Priority: NOR    
Version: git master (please specify the git hash!)   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: test file.
Supplementary characters working, though color emoji don't get drawn.
Supplementary unicode characters getting stripped completely.

Description wolthera 2022-06-14 14:27:38 UTC
Created attachment 149685 [details]
test file.

SUMMARY
Noticed this first in the text branch, but generally supplementary plane characters don't seem to get loaded anymore.

STEPS TO REPRODUCE
1. Load the attached SVG into Krita.
2. Observe whether or not the characters are loaded, and whether or not they are present in the SVG code text editor.
3. Try to add supplementary characters into the SVG text editor. For example, by using the Emoji Selector KDE application.

OBSERVED RESULT
Supplementary plane characters seem to get stripped upon loading the text into Krita.

EXPECTED RESULT
They should be left alone.

I tested with krita-5.1.0-prealpha-61295bd-x86_64.appimage where it still works, and krita-5.2.0-prealpha-d2b57f1f17-x86_64.appimage where it doesn't work. There was some brief suggestions during the meeting it may have to do with this: https://invent.kde.org/graphics/krita/-/commit/0db3b4718cdf2da4d3461906993716bf5eb6dc66
Comment 1 wolthera 2022-06-14 14:28:35 UTC
Created attachment 149686 [details]
Supplementary characters working, though color emoji don't get drawn.
Comment 2 wolthera 2022-06-14 14:29:06 UTC
Created attachment 149687 [details]
Supplementary unicode characters getting stripped completely.
Comment 3 Halla Rempt 2022-06-15 08:38:44 UTC
Huh, that testfile doesn't declare an encoding. I though that was mandatory...
Comment 4 wolthera 2022-06-15 09:25:17 UTC
The default for SVG is utf16, isn't it? Either way, we'd still have a problem: I copied the top of that from a Krita authored svg document, so all existing kra files have this problem.
Comment 5 Halla Rempt 2022-06-15 09:32:48 UTC
As far as I know it's utf-8 by default.  If I add utf-8 as encoding, firefox still loads the file normally, so that's not the problem, I think.
Comment 6 Alvin Wong 2022-06-15 09:35:12 UTC
It can't be UTF-16 when all the ASCII chars in the file take only 1 byte each.
Comment 7 Alvin Wong 2022-06-15 15:18:35 UTC
I tried reverting 0db3b4718cdf2da4d3461906993716bf5eb6dc66 but it didn't fix the issue.

Just looking at the code, SVG import goes through QXmlInputSource. From its doc:

> This class recognizes the encoding of the data by reading the encoding
> declaration in the XML file if it finds one, and reading the data using
> the corresponding encoding. If it does not find an encoding declaration,
> then it assumes that the data is either in UTF-8 or UTF-16, depending on
> whether it can find a byte-order mark.

Checking QXmlInputSource::data() in SvgParser::createDocumentFromSvg, it does seem to read the non-BMP chars correctly (as two QChar as expected). These chars are lost after going through QDomDocument::setContent. I skimmed through the git log but I didn't notice any changes in caf9f20f..868c011a8 that might have cause this.
Comment 8 Alvin Wong 2022-06-15 15:26:34 UTC
I found the commit that caused the regression: https://invent.kde.org/graphics/krita/-/commit/516d26e11ebb103e39e589a67bf33a236a1b6e53

Presumably QDomDocument did not try to fetch both QChar of a surrogate pair before deciding the char is invalid...
Comment 9 Alvin Wong 2022-06-15 16:02:50 UTC
Filed the bug report upstream: https://bugreports.qt.io/browse/QTBUG-104362

I guess we have to just revert that commit anyway.
Comment 10 Alvin Wong 2022-06-15 16:09:37 UTC
Git commit 4abe62d7a737673ad609ff593d04b33f4b6be636 by Alvin Wong.
Committed on 15/06/2022 at 16:07.
Pushed by alvinwong into branch 'master'.

Revert "Drop characters that create invalid XML"

This change breaks loading non-BMP chars from SVG files.

This reverts commit 516d26e11ebb103e39e589a67bf33a236a1b6e53.

M  +0    -4    krita/main.cc

https://invent.kde.org/graphics/krita/commit/4abe62d7a737673ad609ff593d04b33f4b6be636
Comment 11 Alvin Wong 2022-06-15 16:11:00 UTC
Git commit b846a63f2aba98c774014b70bd943a26be1e643b by Alvin Wong.
Committed on 15/06/2022 at 16:10.
Pushed by alvinwong into branch 'krita/5.1'.

Revert "Drop characters that create invalid XML"

This change breaks loading non-BMP chars from SVG files.

This reverts commit 516d26e11ebb103e39e589a67bf33a236a1b6e53.
(cherry picked from commit 4abe62d7a737673ad609ff593d04b33f4b6be636)

M  +0    -4    krita/main.cc

https://invent.kde.org/graphics/krita/commit/b846a63f2aba98c774014b70bd943a26be1e643b
Comment 12 wolthera 2022-06-15 19:33:30 UTC
Thanks! That's really bizarre though...