Bug 13063 - BUG:konsole cannot deal with CJK correctly.
Status: CLOSED FIXED
Alias: None
Product: konsole
Classification: Applications
Component: general
Version: unspecified
Platform: unspecified Other
Importance: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
 
Reported: 2000-10-17 00:03 UTC by Lars Doelle
Modified: 2003-07-09 03:29 UTC

Attachments
Some random Tibetan characters (UTF-8) (43 bytes, text/plain)
2003-07-08 03:40 UTC, Chelsea Buchanan & Keith Briscoe
Snapshot of the Tibetan characters in KWrite (27.43 KB, image/jpeg)
2003-07-09 01:57 UTC, Thiago Macieira

Description Lars Doelle 2000-10-17 00:01:11 UTC
(*** This bug was imported into bugs.kde.org ***)

On Thursday 12 October 2000 05:01 yangbt@legend.com.cn wrote:

> When I select an iso10646-1 font that includes CJK glyphs, the CJK
> characters are shown correctly in konsole, but the ASCII characters are
> shown at double the width they should be (so "root" is shown as "r o o t").

Yang

konsole uses a fixed-width grid so that characters line up properly
under each other. Problems are likely when a variable-width font is
used.

Now that you are trying to use CJK in konsole, I'd love to learn how it
should look. I'd be glad if you could explain it to me or put me in contact
with anyone who uses CJK regularly on an X terminal.

Regards Lars
Comment 1 Chelsea Buchanan & Keith Briscoe 2003-07-08 03:32:45 UTC
I wouldn't say I'm a regular CJK user, and I'm not sure what the "proper
behavior" is, if it is even defined, but I've noticed this on konsole as well.

My LANG=en_US.UTF-8.  Konsole works great with Latin-X characters, and even
Katakana, but when I try to display CJK ideographs or Tibetan characters, extra
spacing is inserted horizontally between the characters.

I really REALLY am no expert in this, but since KWord doesn't insert spacing, I
don't believe Konsole should (okay, I know Konsole has the grid to contend with).

Anyway, the function wcwidth() can be used to determine how many columns a wide
character takes up when displayed on the console.  I figure it should go like this:

ABCD??EFG

Where "??" is a single CJK ideograph that has a wcwidth of 2.

Right now, it looks more like ABCD   ?? EFG
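
As a rough illustration of the expected layout, the sketch below totals the
columns a string should occupy on a fixed-width grid. It does not call the C
wcwidth(); it approximates it with Python's unicodedata module, and the helper
names char_columns and string_columns are made up for this example. Control
characters and ambiguous-width characters are not handled.

```python
import unicodedata


def char_columns(ch: str) -> int:
    """Approximate wcwidth(): columns one code point occupies on a terminal grid."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn over the preceding character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters span two cells
    return 1  # assumption: everything else takes one cell


def string_columns(text: str) -> int:
    """Total number of grid columns needed to display text."""
    return sum(char_columns(ch) for ch in text)


line = "ABCD\u4e2dEFG"  # U+4E2D stands in for the "??" ideograph
print(string_columns(line))  # prints 9
```

With U+4E2D standing in for the ideograph, the line needs 4 + 2 + 3 = 9
columns, which matches the ABCD??EFG layout above.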

And this is the interesting part: when you SELECT CJK text, the highlighted
characters are left-shifted so that they take up what I would consider to be the
correct number of columns.  So if you can figure out what's going on there, it
may already be coded!

I'll try to create an attachment of some random wide characters for you to play
with.
Comment 2 Chelsea Buchanan & Keith Briscoe 2003-07-08 03:40:16 UTC
Created attachment 1968
Some random Tibetan characters (UTF-8)

These Tibetan characters should take only one or two columns, but they appear
to take four apiece.

Also of note: KWrite also displays the extra spaces, KWord does not.  So this
may be an issue with all monospacing.  It doesn't explain why konsole behaves
differently when selecting text, however.  Selecting text in KWrite does not
behave this way.
Comment 3 Waldo Bastian 2003-07-08 10:51:18 UTC
Which version of KDE are you using? 
 
 
Comment 4 Chelsea Buchanan & Keith Briscoe 2003-07-08 16:43:17 UTC
Current released version (I'm at work now--can't check).  I've seen this on RedHat and SuSE, both out-of-the-box and updated-to-current.

I thought about this last night, and I think I have an explanation of why this is happening: when KDE determines how much room a character takes up in a monospaced font, it sets aside width based on the number of BYTES that make up the character.  So since I'm using UTF-8, one-byte single-width characters look okay, and two-byte double-width characters also look okay.  But four-byte characters are always wrong.  The original reporter was using an encoding where all characters are two bytes (I think), and that's why all characters took two columns.

If this is right, this is actually a pretty serious bug in our Unicode support.  Search bugs.kde.org for comments containing "wcwidth" and you'll see another bug in KMail where KMail wraps Japanese messages incorrectly because it's not getting the character width right.  If you consider that, with combining diacritical marks, a single character can actually be three or four wide characters, which can each be up to four bytes long, this could get ugly!
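
To make the distinction concrete, here is a small sketch that compares three
different counts for the same text: UTF-8 bytes, Unicode code points, and base
(non-combining) characters. The sample strings and the measure helper are
illustrative assumptions, not taken from any KDE code.

```python
import unicodedata

# Sample strings chosen only for illustration.
SAMPLES = {
    "ASCII letter": "r",
    "e + combining acute": "e\u0301",
    "CJK ideograph": "\u4e2d",
    "Tibetan letter KA": "\u0f40",
}


def measure(text: str) -> tuple[int, int, int]:
    """Return (UTF-8 byte count, code point count, non-combining character count)."""
    utf8_bytes = len(text.encode("utf-8"))
    code_points = len(text)
    base_chars = sum(1 for ch in text if not unicodedata.combining(ch))
    return utf8_bytes, code_points, base_chars


for label, text in SAMPLES.items():
    utf8_bytes, code_points, base_chars = measure(text)
    print(f"{label:22} bytes={utf8_bytes} code_points={code_points} base={base_chars}")
```

The three counts disagree for everything except plain ASCII, so none of them
can safely stand in for the others when reserving columns.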

Then again, maybe I'm out of my depth.  I'm very new to Unicode programming, frankly.
Comment 5 Waldo Bastian 2003-07-08 17:12:39 UTC
konsole reserves space based on the result of wcwidth for the given 
unicode-character. The original encoding of the character plays no role 
whatsoever. 
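
A small sketch of why the encoding should not matter: the same ideograph is
encoded with several codecs, giving different byte counts, yet once decoded to
a code point the width lookup gives the same answer. This is not Konsole's
code; it uses Python's unicodedata.east_asian_width as a stand-in for wcwidth,
and columns_after_decoding is a hypothetical helper.

```python
import unicodedata

CODECS = ("utf-8", "utf-16-le", "gb2312", "euc_jp")


def columns_after_decoding(raw: bytes, codec: str) -> int:
    """Decode raw bytes, then sum an approximate column width per code point."""
    text = raw.decode(codec)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


ideograph = "\u4e2d"  # a CJK ideograph, chosen only as an example
for codec in CODECS:
    raw = ideograph.encode(codec)
    print(f"{codec:10} bytes={len(raw)} columns={columns_after_decoding(raw, codec)}")
```

The byte count varies with the codec, while the column count stays at two in
every case.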
 
Konsole's CJK handling was broken in KDE 3.1.2 but should be better again in 
the upcoming KDE 3.1.3. So the problems that you experience may be due to 
that, or they may be caused by some other, probably font-related, problem. 
 
How many characters are there supposed to be in the attachment that you 
created?  
 
Comment 6 Chelsea Buchanan & Keith Briscoe 2003-07-08 21:40:32 UTC
The attachment should contain 14 Tibetan characters.

I'd be interested to see how KWrite displays the file in KDE 3.1.3-CVS.  If it
looks okay on your end (and if you'd prefer if I chose a slightly more common
language, let me know what multibyte languages you can display) I'll consider
that good enough to wait and see how KDE 3.1.3 works out for me.

Also, if the text displays fine for me in KWord (even using the same fixed-width
font), is it safe to assume the font is okay?
Comment 7 Thiago Macieira 2003-07-09 01:57:11 UTC
Created attachment 1973
Snapshot of the Tibetan characters in KWrite

Both in Konsole and in KWrite (which use fixed-width fonts), I see 14
single-spaced characters.
Comment 8 Chelsea Buchanan & Keith Briscoe 2003-07-09 03:29:45 UTC
Beautiful!  Thank you all, I will consider this issue fixed in CVS and will stop
bugging you (no pun intended).