| Summary: | Recoded data in multipart/form-data | ||
|---|---|---|---|
| Product: | [Applications] konqueror | Reporter: | Tim Landscheidt <tim> |
| Component: | khtml | Assignee: | Konqueror Bugs <konqueror-bugs-null> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | chrislb, finex, maarten, maksim |
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | Fedora RPMs | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
Description
Tim Landscheidt
2007-12-15 21:01:50 UTC
Well, submitting a bug about Konqueror with Konqueror seems to be a little problem :-). Please check the mentioned link for the raw data.

Confirmed on Konqueror 4 (trunk r797319).

Still a bug in 4.2.

Sorry, but I don't see the preview button there?

Sorry, that is because the article cannot be edited by anonymous users. Please try <URI:http://de.wikipedia.org/w/index.php?title=Gotische_Sprache&action=edit> instead. The button is labelled "Änderungen zeigen" ("Show changes").

Thanks. What seems to happen is the following: that text is outside the Basic Multilingual Plane, so in UTF-16 it gets represented as a surrogate pair. Then, when we're serializing it out into UTF-8, the encoding of the first half of the pair succeeds and that of the second fails, so the second half gets escaped into a numeric character reference of the form &#NNNNN;. Because there is then only half of a pair followed by an &, it (probably) gets swallowed up by the encoding pass.

The following may be one approach, but I want to consult with some people before committing it. Hmm, do we even need to do the escaping pass for a Unicode codec like UTF-8 in the first place?

--- html/html_formimpl.cpp (revision 925164)
+++ html/html_formimpl.cpp (working copy)
@@ -201,11 +201,17 @@
 inline static QString escapeUnencodeable(const QTextCodec* codec, const QString& s) {
     QString enc_string;
     const int len = s.length();
+
+    // Workaround below: the utf8 codec reports it can't encode the second half of a surrogate
+    // pair, so we need to force-feed it to it
+    // ### this may not quite right if it's malformed, though
+    bool utf8 = (codec->mibEnum() == 106);
+
     for(int i=0; i <len; ++i) {
         const QChar c = s[i];
-        if (codec->canEncode(c))
+        if (codec->canEncode(c) || (utf8 && 0xDC00 <= c && c <= 0xDFFF)) {
             enc_string.append(c);
-        else {
+        } else {
             QString ampersandEscape;
             ampersandEscape.sprintf("&#%u;", c.unicode());
             enc_string.append(ampersandEscape);
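
To make the surrogate-pair issue concrete, here is a minimal sketch in plain C++ (no Qt) of what happens to a single non-BMP character. The code point U+10330 (a Gothic letter) is only an assumed example; the report does not say which characters triggered the problem. `toSurrogatePair` and `ampersandEscape` are hypothetical helpers written for this illustration, with the latter mirroring the `"&#%u;"` format from the patch.

```cpp
#include <string>
#include <utility>

// Split a code point above U+FFFF into its UTF-16 surrogate pair
// (high half first, low half second).
std::pair<char16_t, char16_t> toSurrogatePair(char32_t cp) {
    const char32_t v = cp - 0x10000;
    return std::make_pair(static_cast<char16_t>(0xD800 + (v >> 10)),
                          static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
}

// Escape one UTF-16 code unit the way the per-character loop does.
// Applied to half of a pair, this yields a reference to a surrogate
// value, which is not a character on its own.
std::string ampersandEscape(char16_t unit) {
    return "&#" + std::to_string(static_cast<unsigned>(unit)) + ";";
}
```

For U+10330 the pair is D800 DF30, so escaping only the low half produces `&#57136;`, which no longer identifies the original character.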

SVN commit 929608 by orlovich:

Make sure we properly group surrogate pairs when letting the codec check whether it can encode them or not, so we don't mess up non-BMP characters (which can show up on wikipedia, at least)

BUG: 154142

M +36 -12 html_formimpl.cpp

WebSVN link: http://websvn.kde.org/?view=rev&revision=929608

SVN commit 929611 by orlovich:

Merged revision 929608: Make sure we properly group surrogate pairs when letting the codec check whether it can encode them or not, so we don't mess up non-BMP characters (which can show up on wikipedia, at least)

BUG: 154142

M +36 -12 html_formimpl.cpp

WebSVN link: http://websvn.kde.org/?view=rev&revision=929611

*** Bug 180416 has been marked as a duplicate of this bug. ***
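
The committed change itself is not shown here, so the following is only a sketch of the idea named in the commit message: group each surrogate pair into one code point before asking whether it can be encoded. It uses plain C++ rather than Qt, and `canEncode` is a hypothetical predicate standing in for the codec check. Escaping an unpaired half by its own value is an assumption of this sketch, not something the commit message states.

```cpp
#include <cstddef>
#include <functional>
#include <string>

inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Copy encodable characters through unchanged and replace the rest with
// "&#N;", treating a valid surrogate pair as a single code point.
std::u16string escapeUnencodeable(const std::u16string& s,
                                  const std::function<bool(char32_t)>& canEncode) {
    std::u16string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        std::size_t units = 1;
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
            // Combine both halves into the code point they represent.
            cp = 0x10000 + ((static_cast<char32_t>(s[i]) - 0xD800) << 10)
                         + (static_cast<char32_t>(s[i + 1]) - 0xDC00);
            units = 2;
        }
        if (canEncode(cp)) {
            out.append(s, i, units);  // keep the pair (or single unit) intact
        } else {
            const std::string digits = std::to_string(static_cast<unsigned long>(cp));
            out += u"&#";
            out.append(digits.begin(), digits.end());
            out += u';';
        }
        i += units - 1;  // skip the low half if a pair was consumed
    }
    return out;
}
```

With a predicate that accepts everything, a string containing U+10330 passes through unchanged; with an ASCII-only predicate, the pair is escaped once as `&#66352;` instead of as two separate halves.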