Bug 243513

Summary: KCharsets does not support CP949 encoding.
Product: [Frameworks and Libraries] frameworks-kcodecs Reporter: darklin20
Component: generalAssignee: kdelibs bugs <kdelibs-bugs>
Severity: wishlist CC: hein, kde, simonandric5
Priority: NOR    
Version: 5.1.0   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: patch for KCharsets to support CP949
Improvement of patch in comment #2
Test for the patch

Description darklin20 2010-07-03 17:33:20 UTC
Version:           4.4 (using KDE 4.4.4) 
OS:                Linux

I reported this problem to konversation team.

But they said this is kdelib's problem, and with my short look into kcharsets.cpp, I think they're right.

It looks like KCharsets is a wrapper class to use functions of QTextCodec easily.

There's two encoding for Korean, EUC-KR and CP949, and QTextCodec of Qt4 supports both of them.

See the list in "Detailed Description" of next page.

CP949 is superset of EUC-KR so, some characters are only included in CP949.
Tha lack of CP949 support in KDE4 is especially crucial problem when transport some texts over network or read text file etc...

If KCharsets is a wrapper for QTextCodec, I think that it's possible for KDE to support CP949 encoding just by adding some strings for cp949 to the arrays contains the full list of supporting encodings like language_for_encoding_string or builtin_string.

Reproducible: Didn't try
Comment 1 Eike Hein 2010-07-03 19:28:26 UTC
*** Bug 243495 has been marked as a duplicate of this bug. ***
Comment 2 darklin20 2010-07-04 02:17:33 UTC
Created attachment 48574 [details]
patch for KCharsets to support CP949
Comment 3 darklin20 2010-07-04 02:18:57 UTC
I wrote a patch for this issue.(See attached file.)
What I did is just add "cp 949" to input data for language_for_encoding_string and builtin_string, and generate codes with generate_string_table.pl.
I tested within konversation and Kate, and it looks working fine.
Comment 4 Shinjo Park 2010-07-04 22:56:16 UTC
Right, CP949 is superset of EUC-KR, covering every 11172 composable characters. EUC-KR only covers 2350, just 1/5 of what we need.

This bug affects everywhere CP949 decoding is needed, including webpages in Konqueror. Judging from changelog, CP949 support is included since Qt 4.3.0. It looks too late to be included in KDE 4.5, I hope this will be included as soon as possible.
Comment 5 Christoph Feck 2010-07-05 04:09:24 UTC
If the patch in comment #2 is all there is to close this bug, then I am all for getting it into the KDE SC 4.5 branch.
Comment 6 darklin20 2010-07-05 11:07:30 UTC
(In reply to comment #5)
> If the patch in comment #2 is all there is to close this bug, then I am all for
> getting it into the KDE SC 4.5 branch.

The patch is only for KCharsets, and there might be other related parts which should be corrected also.
I'm not sure that the patch is enough to add an encoding support because I'm not a KDE developer (even not a KDE app. developer).
I think the kdelibs developers need to glimpse the patch at least, before they apply it.
Comment 7 Shinjo Park 2010-07-05 23:01:31 UTC
Created attachment 48613 [details]
Improvement of patch in comment #2

I checked for patch #2, and made some improvement. As I think just appending new encoding at the end of list is not so good, I moved CP949 to the next of EUC-KR in both string tables. No strings are added/modified, no need to break message freeze. I tested it with myself, at least it works for me.
Comment 8 Shinjo Park 2010-07-05 23:05:46 UTC
Created attachment 48614 [details]
Test for the patch

This test program checks for "cp 949" in the KCharsets' encoding list, and try to create QTextCodec with all possible aliases, en/decode string in CP949 encoeding. Without patch in comment #5, the test will fail in step 1. With the patch, all steps wiil be passed.
Comment 9 Shinjo Park 2010-11-09 18:16:43 UTC
Any updates since KDE SC 4.5?
Comment 10 darklin20 2013-12-01 06:17:28 UTC
Nobody cares? KDE still lacks CP949 support.
Comment 11 Eike Hein 2015-01-15 19:05:24 UTC
Well, I care now (you guessed it, HanIRC). I'll investigate a bit.
Comment 12 Eike Hein 2015-01-15 19:14:20 UTC
Looks like this patch never made it in, indeed.

박신조씨, do you want to resurrect your patch and resubmit it via http://git.reviewboard.kde.org/? I'll make sure we get it on the way.
Comment 13 Eike Hein 2015-01-17 17:33:48 UTC
Git commit 7d4ca9a42be6f4361bef31a3f32644be7bdbf8c3 by Eike Hein, on behalf of Park Shinjo.
Committed on 17/01/2015 at 17:32.
Pushed by hein into branch 'KDE/4.14'.

Add support for CP949 to KCharsets.

M  +21   -14   kdecore/localization/kcharsets.cpp

Comment 14 Eike Hein 2015-01-17 17:34:47 UTC
After testing, I've now pushed attachment #2 [details] to kdelibs, with attribution to Park Shinjo. I've also created a new patch for the kcodecs framework with included test coverage that I will follow next.
Comment 15 Eike Hein 2015-01-17 17:36:46 UTC
Git commit 3f3e074237c52b49e06083a38ab6624a541e63d1 by Eike Hein.
Committed on 17/01/2015 at 17:34.
Pushed by hein into branch 'master'.

Add support for CP949 to KCharsets.

This is a recreation of a patch written originally for kdelibs
by Park Shinjo, which entered kdelibs.git as 7d4ca9a4 prior to
this. The tables in this version were regenerated clean and
test coverage was added and tests done with both ICU and non-ICU
builds of Qt.

M  +2    -0    autotests/kcharsetstest.cpp
M  +21   -14   src/kcharsets.cpp

Comment 16 Eike Hein 2015-01-17 22:52:37 UTC
Git commit f9a5d5cb8577d7afd013f6f345224bf8658941c4 by Eike Hein.
Committed on 17/01/2015 at 22:45.
Pushed by hein into branch 'master'.

Improve the mapping to Qt 5's codec names.

This fixes the test failure in the ICU build on CI.

M  +0    -1    autotests/kcharsetstest.cpp
M  +10   -10   src/kcharsets.cpp

Comment 17 Shinjo Park 2015-01-18 07:00:46 UTC
I was about to check out both KDE4 and KF5,  감사합니다 ;) Please also indicate original author of the patch if possible, as my role was revising it.
Comment 18 Eike Hein 2015-01-18 15:10:16 UTC
Probably good for completeness' sake: Basically I tested the KF5 version of Konversation against both ICU-enabled and ICU-disabled builds of Qt 5 to verify, with the testcase of sending and receiving "잌" which isn't covered by EUC-KR. With an ICU build, this becomes mostly a GUI improvement - even just selecting EUC-KR will cause Qt to use QIcuCodec for encode/decode, which will transparently use ICU's windows-949 implementation. With ICU disabled, Qt's own CP949 implementation needs to be used and has to be specifically requested, that's when this addition is the biggest win. "windows-949" is the name it understands in both modes, that's why it's the best alias.

In Qt 4 on the other hand it's just known as "cp949" and QTextCodec::codecForName() actually understands both "cp949" and "cp 949" (and so does the KCharsets wrapper now). Testing the KDE 4 version of Konversation worked, too.

As for attribution ... darklin20 not setting a real name here made this a bit tricky due to our real name policy for contributions. But the commits all reference this ticket, and since the code is generated and the input data differs in your versions I think it's OK to call this two different patches. Nonetheless, many thanks to darklin20  for getting the ball rolling and I'm truly sorry it took so long for us to respond.
Comment 19 darklin20 2015-01-18 15:29:21 UTC
I don't really care about credit. I just want to use Konversation as my default IRC client. Thank you so much.