Summary: | Remap GB2312 and GBK encoding to GB18030 | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | Funda Wang <fundawang> |
Component: | khtml parsing | Assignee: | Konqueror Developers <konq-bugs> |
Status: | CONFIRMED --- | ||
Severity: | wishlist | CC: | nicolasg, secludedsage |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: |
Description
Funda Wang
2005-02-07 20:26:32 UTC
How is this bug going? It seems that you'll need to add some code like this to decoder.cpp:

    if(!enc.isEmpty() && (enc == "gb2312" || enc == "gbk" ) ) {
        enc = "gb18030";
        setEncoding( enc.data(), true );
    }

How is it going? Is it possible to get this into KDE 3.5?

Nicolas, any chance your recent changes to the charset names address this as well?

If they don't, I think it's ok to add this code, even though it's a special case.

On Wednesday 05 October 2005 05:44, Thiago Macieira wrote:
> (...)
> Nicolas, any chance your recent changes to the charset names address this
> as well?
>
> If they don't, I think it's ok to add this code, even though it's a special
> case.

No, the changes were only to fix alias names. Here it is more a matter of replacing one encoding with another, as the encoding name is abused.

I think that fixing it in KHTML is better here, as in KCharsets somebody might really want to write something in an encoding, knowing that there is a difference between the two.

Have a nice day!

Personally, I do think GB2312 and GBK should be replaced by GB18030 everywhere, in kate and amaroK, for instance. There is no need to maintain GB2312 and GBK any further, because GB18030 covers them perfectly. The same character gets the same code in GB2312, GBK and GB18030, as long as the corresponding encoding covers the character. And in fact, when we talk about GB*, we usually mean the encoding and are rarely concerned with the charset.

If it is possible, I think defining GB2312 and GBK as aliases of GB18030 KDE-wide would be preferable.

On Wednesday 05 October 2005 15:06, Funda Wang wrote:
> (...)
>
> If it is possible, I think define GB2312 and GBK as the aliases of GB18030
> KDE-wide is more preferred.

Well, we depend on Qt. I do not know what names Qt currently defines for all these encodings. (But indeed, if it is the same encoding in Qt, then we should not care.)

Have a nice day!

On Wednesday 05 October 2005 15:57, Nicolas Goutte wrote:
[bugs.kde.org quoted mail]
I have looked in Qt3 and Qt4. In both versions, all three encodings are
defined (and not as aliases of each other).
The role of KCharsets is not really to change the behaviour of Qt's
QTextCodec. (Apart from the problem that some KDE developers want the
whole class out of KDE4.)
So if you really feel that three encodings are too many, perhaps you should
try to discuss with Trolltech. As far as I understand the comment in the
code, Trolltech keeps it for interaction with older software.
>
> Have a nice day!
IMHO the KCharsets problem should be treated as a separate bug; let's stick to the KHTML problem. I think the substitution is harmless and almost trivial, and it will help pages render much better.

Selecting gb18030 manually is not a perfect solution, since it forces the entire page into a single codec. In many cases where frames are used, some frames use the gb18030 charset while others use utf-8 (pages with Google ads, for example).

I feel konqueror should map gb2312 to gb18030 on its own instead of waiting for Qt to make the change.

Many Simplified Chinese pages contain Traditional characters, which cannot be displayed with the default konqueror encoding (gb2312). As a Chinese user myself, I find this very annoying. No other browser that I use has this glitch.

*** Bug 223911 has been marked as a duplicate of this bug. ***
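
For reference, the substitution discussed in this report can be sketched as a standalone function. This is only an illustration under stated assumptions: the helper name `remapLegacyGbEncoding` is hypothetical, it uses `std::string` rather than the string type used in decoder.cpp, and the case-insensitive comparison is an addition that the snippet in the report does not have. It is not the actual KHTML code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of KHTML or Qt): returns the encoding name
// a decoder would use for a charset label declared by a page or frame.
std::string remapLegacyGbEncoding(std::string enc)
{
    // Assumption: labels are compared case-insensitively, so lower-case first.
    std::transform(enc.begin(), enc.end(), enc.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Treat GB2312 and GBK as GB18030, as proposed in this report.
    if (enc == "gb2312" || enc == "gbk")
        return "gb18030";

    // Any other label is returned unchanged apart from lower-casing.
    return enc;
}
```

Because the mapping is applied to each declared label separately, a frame declaring utf-8 would keep its own encoding while a gb2312 frame on the same page would be decoded as gb18030, which is the case that manual selection cannot handle.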