Summary: | Supplementary plane unicode characters don't load anymore. | ||
---|---|---|---|
Product: | [Applications] krita | Reporter: | wolthera <griffinvalley> |
Component: | File formats | Assignee: | Krita Bugs <krita-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alvin, halla |
Priority: | NOR | ||
Version: | git master (please specify the git hash!) | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | https://invent.kde.org/graphics/krita/commit/b846a63f2aba98c774014b70bd943a26be1e643b | Version Fixed In: | |
Attachments: | test file.; Supplementary characters working, though color emoji don't get drawn.; Supplementary unicode characters getting stripped completely. | ||
Description
wolthera
2022-06-14 14:27:38 UTC
Created attachment 149686 [details]
Supplementary characters working, though color emoji don't get drawn.
Created attachment 149687 [details]
Supplementary unicode characters getting stripped completely.
Huh, that test file doesn't declare an encoding. I thought that was mandatory... The default for SVG is UTF-16, isn't it?
Either way, we'd still have a problem: I copied the top of that from a Krita-authored SVG document, so all existing kra files have this problem.
As far as I know it's UTF-8 by default. If I add UTF-8 as the encoding, Firefox still loads the file normally, so that's not the problem, I think.
It can't be UTF-16 when all the ASCII chars in the file take only 1 byte each.
I tried reverting 0db3b4718cdf2da4d3461906993716bf5eb6dc66 but it didn't fix the issue.
Just looking at the code, SVG import goes through QXmlInputSource. From its doc:
> This class recognizes the encoding of the data by reading the encoding
> declaration in the XML file if it finds one, and reading the data using
> the corresponding encoding. If it does not find an encoding declaration,
> then it assumes that the data is either in UTF-8 or UTF-16, depending on
> whether it can find a byte-order mark.
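The byte-size argument and the byte-order-mark fallback quoted above can be checked with a small sketch. This is plain Python rather than Qt code: `guess_encoding` is a hypothetical helper that only mimics the documented rule for files without an encoding declaration, and the sample string is made up for illustration.

```python
import codecs

def guess_encoding(data):
    """Hypothetical helper: mimic the BOM-based fallback described in the doc quote."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No UTF-16 byte-order mark found: assume UTF-8.
    return "utf-8"

# Made-up, ASCII-only snippet without an encoding declaration.
sample = '<svg xmlns="http://www.w3.org/2000/svg"/>'

utf8_bytes = sample.encode("utf-8")
utf16_bytes = sample.encode("utf-16")  # Python prepends a byte-order mark here

# ASCII characters take one byte each in UTF-8 ...
utf8_is_one_byte_per_char = len(utf8_bytes) == len(sample)
# ... but two bytes each in UTF-16, plus two bytes for the byte-order mark.
utf16_is_two_bytes_per_char = len(utf16_bytes) == 2 * len(sample) + 2
```

So a file whose ASCII characters occupy one byte each and that has no byte-order mark would be treated as UTF-8 under this rule, which matches the reasoning in the comments above.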
Checking QXmlInputSource::data() in SvgParser::createDocumentFromSvg, it does seem to read the non-BMP chars correctly (as two QChar as expected). These chars are lost after going through QDomDocument::setContent. I skimmed through the git log but I didn't notice any changes in caf9f20f..868c011a8 that might have caused this.
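For readers unfamiliar with why a non-BMP character shows up as two QChar, here is a minimal Python sketch. It also illustrates one way such characters could get lost: a validity check applied to each 16-bit unit on its own rejects both halves of a surrogate pair. That second part is an assumption for illustration only, not a description of what QDomDocument actually does; all function names are hypothetical. The valid ranges are the `Char` production from XML 1.0.

```python
def to_utf16_units(text):
    """Split text into 16-bit UTF-16 code units (the size of one QChar)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

def is_xml_char(value):
    """XML 1.0 'Char' production applied to a single numeric value."""
    return (value in (0x9, 0xA, 0xD)
            or 0x20 <= value <= 0xD7FF
            or 0xE000 <= value <= 0xFFFD
            or 0x10000 <= value <= 0x10FFFF)

def filter_per_unit(units):
    """Naive filter: checks every code unit alone, so surrogates are dropped."""
    return [u for u in units if is_xml_char(u)]

def filter_pair_aware(units):
    """Combine each surrogate pair into one code point before checking it."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if 0xD800 <= u <= 0xDBFF and has_low:
            code_point = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            if is_xml_char(code_point):
                out.extend([u, units[i + 1]])
            i += 2
        else:
            if is_xml_char(u):
                out.append(u)
            i += 1
    return out

# U+1F600 lies outside the BMP, so it becomes the pair 0xD83D, 0xDE00.
units = to_utf16_units("a\U0001F600b")
stripped = filter_per_unit(units)   # only 'a' and 'b' survive
kept = filter_pair_aware(units)     # all four code units survive
```

The per-unit variant silently removes the emoji while leaving the surrounding ASCII intact, which is the same symptom as in the second attachment.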
I found the commit that caused the regression: https://invent.kde.org/graphics/krita/-/commit/516d26e11ebb103e39e589a67bf33a236a1b6e53
Presumably QDomDocument did not try to fetch both QChar of a surrogate pair before deciding the char is invalid...
Filed the bug report upstream: https://bugreports.qt.io/browse/QTBUG-104362
I guess we have to just revert that commit anyway.
Git commit 4abe62d7a737673ad609ff593d04b33f4b6be636 by Alvin Wong.
Committed on 15/06/2022 at 16:07.
Pushed by alvinwong into branch 'master'.
Revert "Drop characters that create invalid XML"
This change breaks loading non-BMP chars from SVG files.
This reverts commit 516d26e11ebb103e39e589a67bf33a236a1b6e53.
M +0 -4 krita/main.cc
https://invent.kde.org/graphics/krita/commit/4abe62d7a737673ad609ff593d04b33f4b6be636
Git commit b846a63f2aba98c774014b70bd943a26be1e643b by Alvin Wong.
Committed on 15/06/2022 at 16:10.
Pushed by alvinwong into branch 'krita/5.1'.
Revert "Drop characters that create invalid XML"
This change breaks loading non-BMP chars from SVG files.
This reverts commit 516d26e11ebb103e39e589a67bf33a236a1b6e53.
(cherry picked from commit 4abe62d7a737673ad609ff593d04b33f4b6be636)
M +0 -4 krita/main.cc
https://invent.kde.org/graphics/krita/commit/b846a63f2aba98c774014b70bd943a26be1e643b
Thanks! That's really bizarre though...