Bug 98798 - Remap GB2312 and GBK encoding to GB18030
Summary: Remap GB2312 and GBK encoding to GB18030
Status: CONFIRMED
Alias: None
Product: konqueror
Classification: Applications
Component: khtml parsing
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Konqueror Developers
URL:
Keywords:
Duplicates: 223911
Depends on:
Blocks:
 
Reported: 2005-02-07 20:26 UTC by Funda Wang
Modified: 2012-01-13 07:24 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Description Funda Wang 2005-02-07 20:26:32 UTC
Version:            (using KDE Devel)
Installed from:    Compiled sources
OS:                Linux

As you may know, GB18030 is designed to provide a larger character repertoire than GB2312 and GBK; i.e., GB18030 covers GBK, while GBK covers GB2312. So there is no need to provide all of them (GB2312, GBK and GB18030) in the encoding menu.

Furthermore, lots of sites in China declare GB2312 encoding in the HTTP header while the page content is actually in GBK/GB18030. That is to say, those pages contain some characters that cannot be represented in GB2312. Those characters are rendered as squares in Konqueror. But if I select GB18030 manually, everything seems fine.

A good solution would be to remap the GB2312 and GBK encodings to GB18030 internally in khtml, and to remove the GB2312/GBK menu items from Konqueror.
Comment 1 Funda Wang 2005-04-24 23:17:14 UTC
How is this bug going?

It seems that you would need to add some code like this to decoder.cpp:

// Proposed special case: decode pages declared as GB2312 or GBK with GB18030.
if (!enc.isEmpty() && (enc == "gb2312" || enc == "gbk"))
{
    enc = "gb18030";
    setEncoding(enc.data(), true);
}
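
A minimal standalone sketch of the same substitution, for illustration only. It uses std::string rather than KHTML's own string types, and remapDeclaredCharset is a hypothetical helper name, not an existing KHTML function. It also lower-cases the label first, on the assumption that charset labels should be compared case-insensitively.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of KHTML): given the charset label declared
// by a page or HTTP header, return the lower-cased codec name to decode with.
std::string remapDeclaredCharset(std::string label)
{
    // Normalise case so "GB2312", "gb2312" and "Gb2312" are treated alike.
    std::transform(label.begin(), label.end(), label.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // GB18030 covers both GBK and GB2312, so decode either one as GB18030.
    if (label == "gb2312" || label == "gbk")
        return "gb18030";

    // Every other label is passed through (lower-cased) unchanged.
    return label;
}
```

With this sketch, a page declared as "GB2312" or "gbk" would be decoded as "gb18030", while a label such as "UTF-8" would only be lower-cased.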
Comment 2 Funda Wang 2005-07-03 04:05:53 UTC
How is it going?
Comment 3 Funda Wang 2005-10-05 05:36:28 UTC
Is it possible to get this into KDE 3.5?
Comment 4 Thiago Macieira 2005-10-05 05:44:02 UTC
Nicolas, any chance your recent changes to the charset names address this as well?

If they don't, I think it's ok to add this code, even though it's a special case.
Comment 5 Nicolas Goutte 2005-10-05 14:43:57 UTC
On Wednesday 05 October 2005 05:44, Thiago Macieira wrote:
(...)
> Nicolas, any chance your recent changes to the charset names address this
> as well?
>
> If they don't, I think it's ok to add this code, even though it's a special
> case.


No, the changes only fixed alias names. Here it is more a matter of replacing 
one encoding by another, as the encoding name is abused.

I think that fixing it in KHTML is better here, as in KCharsets somebody might 
really want to write something in an encoding, knowing that there is a 
difference between the two.

Have a nice day!
Comment 6 Funda Wang 2005-10-05 15:06:49 UTC
Personally, I do think GB2312 and GBK should be replaced by GB18030 everywhere: kate and amaroK, for instance. There is no need to maintain GB2312 and GBK any further, because GB18030 covers them completely. The same character gets the same code in GB2312, GBK and GB18030, as long as the corresponding encoding covers that character. And in fact, when we talk about GB*, we usually mean the encoding and are rarely concerned with the charset.

If possible, I think defining GB2312 and GBK as aliases of GB18030 KDE-wide would be preferable.
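
The superset relationship can be illustrated at the byte level. The sketch below is only an illustration under stated assumptions: GbLevel and smallestGbEncoding are hypothetical names, and the function checks byte ranges only, not whether a code point is actually assigned in the respective standard. For example, the bytes 0xD6 0xD0 (the character "中") fall in the GB2312 range and mean the same thing in all three encodings, 0x81 0x40 is valid only from GBK upwards, and 0x81 0x30 0x81 0x30 is a four-byte sequence that exists only in GB18030.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical classification, ordered from smallest to largest encoding.
enum class GbLevel { Invalid, Gb2312, Gbk, Gb18030 };

// Returns the smallest GB encoding whose byte ranges admit the sequence
// starting at p (n bytes available). Range check only; no table lookup.
GbLevel smallestGbEncoding(const std::uint8_t* p, std::size_t n)
{
    auto in = [](std::uint8_t b, int lo, int hi) { return b >= lo && b <= hi; };

    if (n == 0)
        return GbLevel::Invalid;

    // ASCII bytes are shared by all three encodings.
    if (p[0] < 0x80)
        return GbLevel::Gb2312;

    if (n >= 2) {
        // GB2312 (EUC-CN) two-byte range.
        if (in(p[0], 0xA1, 0xF7) && in(p[1], 0xA1, 0xFE))
            return GbLevel::Gb2312;
        // GBK widens both lead and trail byte ranges (trail 0x7F excluded).
        if (in(p[0], 0x81, 0xFE) && in(p[1], 0x40, 0xFE) && p[1] != 0x7F)
            return GbLevel::Gbk;
    }

    // GB18030 adds four-byte sequences: lead, digit, lead, digit.
    if (n >= 4 && in(p[0], 0x81, 0xFE) && in(p[1], 0x30, 0x39) &&
        in(p[2], 0x81, 0xFE) && in(p[3], 0x30, 0x39))
        return GbLevel::Gb18030;

    return GbLevel::Invalid;
}
```

Any sequence classified as Gb2312 or Gbk here is also a structurally valid GB18030 sequence, which is why decoding with GB18030 should not break pages that really are GB2312 or GBK.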
Comment 7 Nicolas Goutte 2005-10-05 15:57:27 UTC
On Wednesday 05 October 2005 15:06, Funda Wang wrote:
(...)
>
> If it is possible, I think define GB2312 and GBK as the aliases of GB18030
> KDE-wide is more preferred.


Well, we depend on Qt. I do not know what Qt has defined as names currently 
for all these encodings. (But indeed if it is the same encoding in Qt, then 
we should not care.)

Have a nice day!
Comment 8 Nicolas Goutte 2005-10-05 16:13:51 UTC
On Wednesday 05 October 2005 15:57, Nicolas Goutte wrote:
[bugs.kde.org quoted mail]

I have looked in Qt3 and Qt4. In both versions, all three encodings are 
defined (and not as alias of each other).

The role of KCharsets is not really to change the behaviour of Qt's 
QTextCodec. (Apart from the problem that there are KDE developers who want 
the whole class out of KDE4.)

So if you really feel that three encodings are too many, perhaps you should 
try to discuss with Trolltech. As far as I understand the comment in the 
code, Trolltech keeps it for interaction with older software.


Comment 9 LuRan 2005-11-04 05:07:01 UTC
IMHO the KCharsets problem should be treated as another bug; let's stick to the
KHTML problem. I think the substitution is harmless and almost trivial, and it
will help render such pages much better. Selecting gb18030 manually is not a
perfect solution, since it forces the entire page into a single codec, while in
many cases where frames are used, some frames use the gb18030 charset and others
use utf-8 (pages with Google ads, for example).
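
The difference between the two approaches on a framed page can be sketched as follows. This is a hypothetical model, not KHTML code: Frame, codecsWithManualOverride and codecsWithRemap are made-up names, and charset labels are assumed to be lower-case already.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one frame and the charset it declares (lower-case).
struct Frame {
    std::string name;
    std::string declaredCharset;
};

// Manual selection in the menu: every frame is forced to the same codec.
std::vector<std::string> codecsWithManualOverride(const std::vector<Frame>& frames,
                                                  const std::string& forced)
{
    return std::vector<std::string>(frames.size(), forced);
}

// Internal remap: only frames declaring gb2312/gbk change; others keep theirs.
std::vector<std::string> codecsWithRemap(const std::vector<Frame>& frames)
{
    std::vector<std::string> codecs;
    codecs.reserve(frames.size());
    for (const Frame& f : frames) {
        const bool legacyGb = f.declaredCharset == "gb2312" || f.declaredCharset == "gbk";
        codecs.push_back(legacyGb ? std::string("gb18030") : f.declaredCharset);
    }
    return codecs;
}
```

For a page with a gb2312 main frame and a utf-8 ad frame, the manual override would decode both frames as gb18030, whereas the remap would decode the main frame as gb18030 and leave the ad frame as utf-8.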
Comment 10 Pace Sie 2006-10-07 00:19:06 UTC
I feel Konqueror should map gb2312 to gb18030 on its own instead of waiting for Qt to make the change. Many Simplified Chinese pages contain traditional characters that cannot be displayed with Konqueror's default encoding (gb2312). As a Chinese user myself, I find this very annoying. No other browser that I use has this glitch.
Comment 11 Dawit Alemayehu 2012-01-13 07:24:43 UTC
*** Bug 223911 has been marked as a duplicate of this bug. ***