KDE Bug Tracking System
Bug 82018: Extended unicode characters are not converted t...
Product: konqueror
Component: general
Status: RESOLVED
Resolution: FIXED
Target: ---
Version: unspecified
Priority: NOR
Severity: wishlist
Votes: 20
Opened: 2004-05-22 20:59
Last Changed: 2004-10-17 13:43:25

Description:
Version: (using KDE 3.2.1)
Installed from: Gentoo Packages
Compiler: gcc 3.3 CXXFLAGS=CFLAGS=-O3 -march=athlon-xp
OS: Linux

This is quite important to me, because I need full Unicode support for daily computer work, especially for German umlauts and Japanese characters on the same web page or in the same text. That was exactly the situation I had when posting to a forum today. Unfortunately, the resulting text contained the KDE-typical question marks instead of the Kanji I entered.

A quick investigation revealed that the encoding of the HTML file containing the post form was set to ISO-8859-1 (http-equiv tag). This led me to conclude that Konqueror/KHTML assumes all text entered by the user must be in the same encoding as the page. Even forcing the page to use UTF-8 (via View->Set Encoding) had no effect.

When I tried the same with Mozilla 1.6, it silently converted the non-ISO-8859-1 text into Unicode entities, as it should (Mozilla was set to UTF-8 mode explicitly). I assume this feature is available in other browsers too (I haven't tried yet, though), so I think you might consider adding it to Konqueror very soon.
Comment #1 Egmont Koblinger 2004-09-05 11:33:02
I can second this request, and I think it's a severe bug rather than a mere wish. At least Mozilla and Opera convert characters that do not fit into the charset of the page containing the form into Unicode entities, so e.g. if the page is ISO-8859-1, then a euro sign becomes "&#8364;" (which, for a GET submission, gets further escaped to %26%238364%3B in the URL). This way no information is lost, and the user will most likely see exactly the character he entered on the resulting page (unless the server explicitly does something to break it). The current Konqueror behavior leads to data loss without notifying the user, and furthermore assumes some character set knowledge on the user's part, which it shouldn't. Users shouldn't need to know what ISO-8859-1, Unicode or UTF-8 are; they should only see things working properly.
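
The conversion described in this comment can be sketched in standalone C++. This is only an illustration, not Mozilla's, Opera's or KHTML's code: the helper names toLatin1WithEntities and percentEncode are made up for this example, the target charset is assumed to be ISO-8859-1, and the percent-encoding escapes everything outside the RFC 3986 unreserved set, which may differ in detail from what a particular browser does.

```cpp
#include <string>

// Hypothetical helper: encode a sequence of Unicode code points as
// ISO-8859-1 bytes. Code points above U+00FF cannot be represented, so
// they are written as decimal numeric character references instead.
std::string toLatin1WithEntities(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            // ISO-8859-1 maps U+0000..U+00FF directly to one byte each.
            out.push_back(static_cast<char>(cp));
        } else {
            out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
        }
    }
    return out;
}

// Hypothetical helper: percent-encode every byte that is not an
// RFC 3986 "unreserved" character, as needed for a GET query string.
std::string percentEncode(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        const bool unreserved =
            (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
            (b >= '0' && b <= '9') ||
            b == '-' || b == '_' || b == '.' || b == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back('%');
            out.push_back(hex[b >> 4]);
            out.push_back(hex[b & 0x0F]);
        }
    }
    return out;
}
```

With these helpers, a euro sign (U+20AC) becomes the 7 bytes &#8364; in the form data, and percent-encoding that string yields %26%238364%3B, the value quoted above.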
Comment #2 Egmont Koblinger 2004-09-05 11:33:40
Sorry, forgot to say, I have KDE 3.3.
Comment #3 Gregor Riepl 2004-09-05 14:42:03
Wow, someone saw that bug report at last! I hope I can draw the KDE developers' attention to it, so we won't have to wait much longer for a fix.
Comment #4 Allan Sandfeld 2004-10-11 15:12:46
Created an attachment (id=7837)
Patch for ampersand-escaping characters

The attached patch should be applied to khtml/html. I have not fully tested it, but try and see if it helps.
Comment #5 Egmont Koblinger 2004-10-11 16:23:50
Works nicely for me, thanks. Just a small question... I'm not familiar with the Unicode handling of Qt, but .unicode() returning an unsigned short really shocked me, as Unicode definitely has characters above 65536. Is anything known about how this will be handled in future versions of Qt? Will a new function be introduced, or will the return value of .unicode() be extended to unsigned long? In the latter case, I'd recommend using

    ampersandEscape.sprintf("&#%lu;", (unsigned long)(c.unicode()));

instead of

    ampersandEscape.sprintf("&#%hu;", c.unicode());

It doesn't hurt, and it works better if Qt changes to UCS-4.
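
To illustrate the concern about 16-bit values: if a string is stored as UTF-16 code units, a character above U+FFFF occupies two units (a surrogate pair), and escaping each unit separately would produce two wrong references. The following is a minimal sketch in standalone C++, not Qt code; the function name numericReferences is hypothetical, and it escapes every character only to keep the example short. Whether a given Qt version stores surrogate pairs this way is an assumption here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: write every character of a UTF-16 string as a
// decimal numeric character reference, combining surrogate pairs into
// a single code point first.
std::string numericReferences(const std::u16string& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        unsigned long cp = units[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < units.size()) {
            const unsigned long low = units[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Standard UTF-16 decoding of a high/low surrogate pair.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        // An unpaired surrogate falls through and is emitted as-is.
        out += "&#" + std::to_string(cp) + ";";
    }
    return out;
}
```

For example, U+1D11E (stored as the pair 0xD834 0xDD1E) comes out as the single reference &#119070; rather than as two references to the individual surrogates.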
Comment #6 Allan Sandfeld 2004-10-16 15:53:26
CVS commit by carewolf:

Ampersand-escape otherwise unencodable characters. Matches Gecko behavior.
FEATURE:82018

  M +6 -2      ChangeLog   1.303
  M +25 -7     html/html_formimpl.cpp   1.387 [POSSIBLY UNSAFE: printf]

--- kdelibs/khtml/ChangeLog  #1.302:1.303
@@ -1,2 +1,6 @@
+2004-10-16  Allan Sandfeld Jensen <kde@carewolf.com>
+        * html/html_formimpl.cpp: Escape otherwise unencodable characters.
+        Matches the behavior of Gecko.
+
 2004-10-15  Stephan Kulow  <coolo@kde.org>
 
@@ -4,5 +8,5 @@
         got items when we calculate a height for items (#87466)
 
-        * css/html4.css: changing default horizontal margins for H1-H6 from
+        * css/html4.css: changing default horizontal margins for H1-H6 from
         auto to 0 (#91327)
 
@@ -33,5 +37,5 @@
         * rendering/render_block.cpp (layoutBlockChildren): simpler
         implementation for compact display: do not insert the
-        compact child within the next block anymore.
+        compact child within the next block anymore.
         Solves lot of problems with host blocks having non-inline children.
 
--- kdelibs/khtml/html/html_formimpl.cpp  #1.386:1.387
@@ -174,7 +174,25 @@ static QCString encodeCString(const QCSt
 }
 
+// ### This function only encodes to numeric ampersand escapes,
+// ### we could use standard ampersand values as well.
+inline static QString escapeUnencodeable(const QTextCodec* codec, const QString& s) {
+    QString enc_string;
+    int len = s.length();
+    for(int i=0; i <len; i++) {
+        QChar c = s[i];
+        if (codec->canEncode(c))
+            enc_string.append(c);
+        else {
+            QString ampersandEscape;
+            ampersandEscape.sprintf("&#%u;", c.unicode());
+            enc_string.append(ampersandEscape);
+        }
+    }
+    return enc_string;
+}
+
 inline static QCString fixUpfromUnicode(const QTextCodec* codec, const QString& s)
 {
-    QCString str = codec->fromUnicode(s);
+    QCString str = codec->fromUnicode(escapeUnencodeable(codec,s));
     str.truncate(str.length());
     return str;
Comment #7 Gregor Riepl 2004-10-17 13:43:25
Thanks a lot for the fix! I'll try it on the next KDE upgrade.
Platform: Gentoo Packages
OS: Linux

People
Reporter: Gregor Riepl
Assigned To: Konqueror Developers
CC: egmont gmail com

Attachments
Patch for ampersand-escaping characters (2.83 KB, patch), 2004-10-11 15:12, Allan Sandfeld