Bug 74190

Summary: Console Unicode support not 100% there
Product: [Applications] konsole Reporter: Chelsea Buchanan & Keith Briscoe <cheeth>
Component: general Assignee: Konsole Developer <konsole-devel>
Status: RESOLVED FIXED    
Severity: normal CC: adaptee, hazelnusse, ismail, jnelson-kde, purpleposeidon
Priority: NOR    
Version: 1.3   
Target Milestone: ---   
Platform: RedHat Enterprise Linux   
OS: Linux   
Attachments: Collection of Unicode test files
tibetan characters displayed in konsole

Description Chelsea Buchanan & Keith Briscoe 2004-02-05 03:38:08 UTC
Version:            (using KDE 3.2.0)
Installed from:    RedHat RPMs
OS:          Linux

I just upgraded to KDE 3.2 and thought I'd re-test the Unicode support in Konsole.  It looks like Konsole cannot display characters from the following Unicode ranges, even if the system supports them:

0x0d80-0x0dff (Sinhala)
0x10000-0x1007f (Linear B Syllabary)
0xf0000-0xffffd (Supplementary Private Use Area A)
It looks like 0x0600-0x06ff (Arabic) may also be broken, but I'm not sure since I don't know the language and it IS displaying SOMETHING.  It doesn't look right though.

My testing was not exhaustive, so there may be more bad ranges.  The following free TrueType fonts support these scripts:

Free Sans supports Sinhala and Arabic (http://savannah.nongnu.org/projects/freefont/)
Penuturesu supports Linear B Syllabary (http://www.i18nguy.com/unicode/unicode-font.html)
Code2001 uses the Supplementary PUA for Tengwar and Cirth support (http://home.att.net/~jameskass/code2001.htm)

On a positive note, Braille, Katakana, Tibetan, Syriac, and Runic all seem to work quite well.

I will attach some text files (UTF-8 encoded) along with some images of how I would expect them to look (more or less), using Mozilla as the reference.
Comment 1 Chelsea Buchanan & Keith Briscoe 2004-02-05 04:11:29 UTC
Created attachment 4517 [details]
Collection of Unicode test files

For each script, there is an HTML file, a UTF-8 encoded TXT file, and a
reference PNG image.

The PNG images were created by pointing Mozilla at the HTML files, and the HTML
files are nothing more than the TXT files with some extra code to tell the
browser that it's UTF-8.
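
A minimal sketch of how test files like these could be generated, assuming Python 3. The helper names (range_text, wrap_html) and the 16-characters-per-row layout are illustrative assumptions, not the method actually used for the attachment.

```python
import html

def range_text(first, last, per_line=16):
    """Return the code points first..last as text, per_line characters per row."""
    chars = [chr(cp) for cp in range(first, last + 1)]
    rows = [" ".join(chars[i:i + per_line]) for i in range(0, len(chars), per_line)]
    return "\n".join(rows) + "\n"

def wrap_html(text, title):
    """Wrap plain text in a minimal HTML page that declares UTF-8."""
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head><body><pre>\n"
        f"{html.escape(text)}"
        "</pre></body></html>\n"
    )

# Sinhala block from the report; unassigned code points are included as-is.
sinhala = range_text(0x0D80, 0x0DFF)
txt_bytes = sinhala.encode("utf-8")                        # contents of the TXT file
html_bytes = wrap_html(sinhala, "Sinhala").encode("utf-8")  # contents of the HTML file
```

The two byte strings can be written out with open(name, "wb"). The same helper works for ranges above U+FFFF, such as 0x10000-0x1007f.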
Comment 2 Chelsea Buchanan & Keith Briscoe 2004-02-05 04:13:11 UTC
Okay, last comment for now: that's a ZIP archive I attached.
Comment 3 Ken Deeter 2004-04-09 23:54:40 UTC
This might be Qt related. Do these files work with things like kate or kedit?
Comment 4 Ismail Donmez 2004-04-10 17:08:09 UTC
The Bash 2.05b release has 7 pending patches, some of them for wide-character input/output. Is your bash up to date?
Comment 5 Chelsea Buchanan & Keith Briscoe 2004-04-10 17:12:44 UTC
Dang, mid-air collision!  Kate and Kedit also fail.  There's also a Konqueror bug filed here: http://bugs.kde.org/show_bug.cgi?id=77348.  Could very well be a common Qt bug.  Lack of support for the Supplementary Private Use Area could indicate that Qt is limited to a two-byte datatype for wide characters, or it could just be a bug like the others.

Konsole/KWrite tend to function a little differently than Konqui because they force everything into monospace, so I filed separate bugs.  I'm using Bash 2.05b.0(1)-release (i386-redhat-linux-gnu).
Comment 6 Thiago Macieira 2005-01-07 17:07:19 UTC
Qt does not support characters outside the Basic Multilingual Plane. That means you're restricted to U+0000 to U+FFFF.
Comment 7 Chelsea Buchanan & Keith Briscoe 2005-01-07 18:14:33 UTC
Agreed, lack of support for anything above U+FFFF is a Qt bug.  I've already filed a bug with Trolltech to make QChar support 32 bits of data instead of 16.

This bug still applies to konsole for Sinhala and Arabic, however.  Could also be Qt's font substitution bug http://bugs.kde.org/show_bug.cgi?id=47682 but I doubt it because it still fails when you set Konsole's font to a font which contains these characters.
Comment 8 Thiago Macieira 2005-01-07 19:12:06 UTC
QString can support 21-bit Unicode chars via UTF-16 surrogate pairs. The problem with that is that QChar won't be able to handle single codepoints. And QString-to-UTF32 conversion will be more difficult.
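
To illustrate the surrogate-pair point, here is a short sketch in Python 3 (not Qt code) of the standard UTF-16 arithmetic. The function names are made up for this example. It shows why one code point above U+FFFF takes two 16-bit units, so a single 16-bit character type cannot hold it.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000                      # 20 significant bits remain
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# First code point of the Linear B Syllabary range from this report.
high, low = to_surrogates(0x10000)
# First code point of Supplementary Private Use Area A.
pua_high, pua_low = to_surrogates(0xF0000)
```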
Comment 9 Robert Knight 2006-08-07 17:08:40 UTC
Hello,

Display of Unicode characters has improved quite a bit in Konsole for KDE 4, mainly because a major bug in character conversion was inadvertently fixed :)

Unfortunately I cannot test the Sinhala test text file because I cannot get Sinhalese to display correctly in any Qt application.  Advice on how to do this would be helpful.
Comment 10 Chelsea Buchanan & Keith Briscoe 2006-08-08 07:02:32 UTC
All I know is that the Free Sans font has the right glyphs.  You can get it here: http://savannah.nongnu.org/projects/freefont/

Anything beyond that is beyond me!
Comment 11 Chelsea Buchanan & Keith Briscoe 2008-01-12 23:12:31 UTC
Okay, now testing with KDE 4.0.0 (openSUSE 10.3).  This is not a complete test of all Unicode ranges, just my little test ranges.

These ranges display nothing:
0x0f00-0x0fff Tibetan (KDE3 does NOT have a problem with this)

The following ranges appear to be fixed in KDE4:
0x10000-0x1007f (Linear B Syllabary)
0x0600-0x06ff (Arabic)

Either these ranges display with occasional problems or there's a problem with my test files:
0xf0000-0xffffd (Supplementary Private Use Area A)
0x0d80-0x0dff (Sinhala)

I'll try to generate new test files to demonstrate the remaining problems.  My previous test Arabic text file appears screwed up, so it's possible I've got bigger problems on my end.
Comment 12 Chelsea Buchanan & Keith Briscoe 2008-01-14 06:48:16 UTC
Okay, my "occasional problems" were caused by fonts that were missing a glyph here and there.

So the status AFAIK is that everything I've tested works great, with the notable exception of Tibetan, which appears to be a regression from KDE3.

For testing purposes, a free font (utibetan.ttf) containing Tibetan glyphs can be downloaded here: http://www.wazu.jp/gallery/Fonts_Tibetan.html.  Tibetan is also in the Arial Unicode (arialuni) font, included with Microsoft Office.
Comment 13 Luke Peterson 2009-07-03 08:30:27 UTC
I'm using the Konsole that ships with Kubuntu 9.04, and I can't get the Unicode combining diacritic "combining dot above" (U+0307) to work.  For example, if I go into Python, I get:
luke@DELL-E1505:~/lib/python/sympy(master)$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print(u'a\u0307')
a
>>>

Which, at least on my computer, is an a with no dot above it.  Other diacritics work though:
>>> print(u'a\u0352')
>>>

I'm trying to use the dot above some letters for some symbolic mathematics where things like dx/dt would be represented more compactly by using the Newton notation for the derivative, i.e., an x with a dot above it.

I don't think it is an issue with the font I'm using, which is DejaVu Sans Mono, which is supposed to support this character.

xterm prints this just fine on my computer, with the exact same commands.
Comment 14 Jon Nelson 2010-06-17 23:32:18 UTC
Regarding comment 13 - see bug 96536.  konsole (as of 4.3.3) does not support display of NFD (decomposed, i.e. base letter plus combining character) sequences.
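
A possible workaround sketch, assuming Python 3 and its standard unicodedata module: normalizing to NFC replaces a base letter plus combining mark with a single precomposed code point where Unicode defines one. Whether a given Konsole version and font then render the precomposed form is an assumption that has not been tested here.

```python
import unicodedata

def compose(text):
    """Normalize to NFC so base+combining pairs become precomposed where possible."""
    return unicodedata.normalize("NFC", text)

a_dot = compose("a\u0307")    # composes to U+0227, a with dot above
x_dot = compose("x\u0307")    # composes to U+1E8B, x with dot above
fermata = compose("a\u0352")  # no precomposed form exists, stays two code points

for label, s in (("a+0307", a_dot), ("x+0307", x_dot), ("a+0352", fermata)):
    print(label, [f"U+{ord(ch):04X}" for ch in s])
```

This only helps for pairs that have a precomposed code point. Sequences such as a + U+0352 still need combining-character support in the terminal.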
Comment 15 poseidon 2010-07-30 01:18:58 UTC
Some box drawing symbols are drawn as nothing, even with different fonts. (I've only tried the monospace ones that konsole shows in the profile dialog.) The glyphs it has trouble with are "╭╮╯╰╱╲╳". Konsole Version 2.4.2. All other KDE applications are able to draw those symbols. Might be related, might be a different bug.
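
For reference, a small Python 3 sketch that lists the code points of the glyphs quoted above, using only the standard unicodedata module. It just identifies the characters and says nothing about how Konsole draws them.

```python
import unicodedata

glyphs = "╭╮╯╰╱╲╳"
code_points = [ord(ch) for ch in glyphs]

# Print each glyph's code point and official Unicode name.
for ch in glyphs:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The seven glyphs form one contiguous run, U+256D through U+2573, inside the Box Drawing block.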
Comment 16 Jekyll Wu 2011-07-29 11:06:42 UTC
(In reply to comment #15)
> Some box drawing symbols are drawn as nothing, even with different fonts. 

That problem would be bug #210329.
Comment 17 Jekyll Wu 2011-12-05 08:28:06 UTC
Created attachment 66388 [details]
tibetan characters displayed in konsole

(In reply to comment #12)
> So the status AFAIK is that everything I've tested works great, with the
> notable exception of Tibetan, which appears to be a regression from KDE3.

I just tested konsole 2.8 using a snippet from http://www.alanwood.net/unicode/tibetan.html. The result seems fine to me, although I think some characters are calculated and displayed with the wrong width. Maybe that width problem is related to bug 41744 and bug 186826.
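
To make the width question concrete, here is a rough Python 3 sketch of how a terminal might count display cells. This is a simplified approximation in the spirit of wcwidth, not Konsole's actual algorithm, and the function name cell_width is made up for the example. It assumes nonspacing and enclosing marks take zero cells and East Asian wide/fullwidth characters take two.

```python
import unicodedata

def cell_width(text):
    """Approximate the number of terminal cells needed to display text."""
    width = 0
    for ch in text:
        # Nonspacing (Mn) and enclosing (Me) marks attach to the previous cell.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        # East Asian Wide/Fullwidth characters occupy two cells, the rest one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

tibetan = "\u0f40\u0f72"   # Tibetan letter KA followed by vowel sign I (a combining mark)
katakana = "\u30ab"        # Katakana KA, an East Asian wide character
```

Under these assumptions the Tibetan pair needs one cell even though it is two code points. A renderer that counts one cell per code point would compute a different width than one that treats the vowel sign as zero-width.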
Comment 18 Jekyll Wu 2011-12-07 08:19:06 UTC
Closing as FIXED in the sense that all specific problems mentioned in this report seem to be fixed now, not in the sense that Konsole now provides 100% Unicode support.