Summary: | Broken interpretation of Unicode XML entities beyond BMP (SMP) | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | MD <angasule> |
Component: | khtml | Assignee: | Konqueror Developers <konq-bugs> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | kevin.kofler |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Debian testing | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: |
Description
MD
2009-12-16 17:04:12 UTC
I believe the problem is in the file khtml/htmltokenizer.cpp, where this code is found:

    case Hexadecimal:
    {
        int uc = EntityChar.unicode();
        int ll = qMin<uint>(src.length(), 8);
        while(ll--) {
            QChar csrc(src->toLower());
            cc = csrc.cell();
            if(csrc.row() || !((cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'f'))) {
                break;
            }
            uc = uc*16 + (cc - ( cc < 'a' ? '0' : 'a' - 10));
            cBuffer[cBufferPos++] = cc;
            ++src;
        }
        EntityChar = QChar(uc);
        Entity = SearchSemicolon;
        break;
    }
    case Decimal:
    {
        int uc = EntityChar.unicode();
        int ll = qMin(src.length(), 9-cBufferPos);
        while(ll--) {
            cc = src->cell();
            if(src->row() || !(cc >= '0' && cc <= '9')) {
                Entity = SearchSemicolon;
                break;
            }
            uc = uc * 10 + (cc - '0');
            cBuffer[cBufferPos++] = cc;
            ++src;
        }
        EntityChar = QChar(uc);
        if(cBufferPos == 9)
            Entity = SearchSemicolon;
        break;
    }

I think this code should generate two QChar in the case of Unicode code points not in the Basic Multilingual Plane. Furthermore, I believe uc should be an unsigned int.

I can confirm this. Testcase: http://www.yaronet.com/posts.php?s=130411 (should show a reversed B, assuming you have a font which covers Deseret, such as G. Douros's Analecta (gdouros-analecta-fonts in Fedora)).

(No, I'm not interested in Mormon liturgy at all, I just picked that character because it showed up in kernel.org's April Fools joke. ;-) )

Thank you for the bug report. As this report hasn't seen any changes in 10 years or more, we ask if you can please confirm that the issue still persists. If this bug is no longer persisting or relevant please change the status to resolved.
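For reference, the suggestion in the description (emit two QChar values for code points outside the BMP) comes down to standard UTF-16 surrogate-pair encoding. The following is only a minimal sketch in plain C++ without Qt: the helper name `toUtf16` is hypothetical and not part of khtml, and replacing invalid values with U+FFFD is an assumption about the desired behaviour, not something the tokenizer is known to do.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not khtml code): encode one Unicode code point
// as one or two UTF-16 code units.
std::vector<char16_t> toUtf16(std::uint32_t codePoint)
{
    // Assumption: values above U+10FFFF and lone surrogates are replaced
    // by U+FFFD REPLACEMENT CHARACTER.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        return { static_cast<char16_t>(0xFFFD) };

    // BMP code points fit in a single 16-bit unit.
    if (codePoint < 0x10000)
        return { static_cast<char16_t>(codePoint) };

    // Supplementary planes: subtract 0x10000, then split the remaining
    // 20 bits into a high (lead) and a low (trail) surrogate.
    const std::uint32_t v = codePoint - 0x10000;
    return { static_cast<char16_t>(0xD800 + (v >> 10)),
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };
}
```

With this sketch, a Deseret code point such as U+10400 yields the pair 0xD801, 0xDC00, while a BMP code point such as U+0041 yields a single unit. Taking the code point as an unsigned 32-bit value matches the remark above that uc should be unsigned.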