Bug 273535 - Konsole removes some important Unicode characters from input, such as ZWNJ
Status: RESOLVED UNMAINTAINED
Alias: None
Product: konsole
Classification: Applications
Component: font
Version: 2.8.3
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-18 07:35 UTC by Ebrahim Mohammadi
Modified: 2017-02-13 03:21 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Ebrahim Mohammadi 2011-05-18 07:35:03 UTC
Version:           unspecified (using KDE 4.6.2) 
OS:                Linux

Konsole removes some important Unicode characters such as ZWNJ (Zero-Width Non-Joiner) from input.

Reproducible: Always

Steps to Reproduce:
Paste the following string, which includes a ZWNJ, into a Konsole terminal. You may want to enable Konsole's bidirectional rendering option so that it is rendered correctly.

نیم‌فاصله

Actual Results:  
نیمفاصله

Expected Results:  
نیم‌فاصله
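
Because ZWNJ is invisible, it can be hard to tell by eye whether a pasted string still contains it. A minimal sketch of a check, assuming the text is available as UTF-8 bytes; `countZwnj` is a hypothetical helper written for illustration and is not part of Konsole:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Konsole): counts occurrences of
// U+200C ZERO WIDTH NON-JOINER in a UTF-8 encoded string.
std::size_t countZwnj(const std::string &utf8)
{
    // U+200C is encoded in UTF-8 as the three bytes E2 80 8C.
    static const std::string zwnj = "\xE2\x80\x8C";
    std::size_t count = 0;
    for (std::size_t pos = utf8.find(zwnj); pos != std::string::npos;
         pos = utf8.find(zwnj, pos + zwnj.size())) {
        ++count;
    }
    return count;
}
```

Running this on the pasted input and on what the shell actually received should give 1 and 0 respectively if the character was stripped as described.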
Comment 1 Ebrahim Mohammadi 2011-05-18 07:42:07 UTC
Another important removed character is U+0654.
Comment 2 Jekyll Wu 2012-04-26 03:04:54 UTC
"The zero-width non-joiner (ZWNJ) is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively. This is also an effect of a space character, but a ZWNJ is used when it is desirable to keep the words closer together."  

So a ZWNJ is meant to increase the space between two letters. That does not fit well with the columns x rows model used by Konsole, if I understand it correctly.

Below is the code where ZWNJ is currently simply ignored (the category for ZWNJ is QChar::Other_Format):

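void Screen::displayCharacter(unsigned short c)
{
    // w is the display width of c in terminal columns.
    int w = konsole_wcwidth(c);
    if (w < 0)
        return;   // negative width: nothing is displayed
    else if (w == 0) {
        // Of the zero-width characters, only non-spacing marks continue
        // past this point; ZWNJ (QChar::Other_Format) returns here.
        if (QChar(c).category() != QChar::Mark_NonSpacing)
            return;
    ....
}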
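
The effect of those early returns can be modelled without Qt. The following is only a sketch: the `Category` enum is a simplified stand-in for the Qt categories named above, both function names are hypothetical, and the second function is one conceivable relaxation of the check rather than a fix taken from Konsole. It assumes, as the excerpt implies, that ZWNJ reaches the `w == 0` branch.

```cpp
// Simplified stand-ins for the Qt categories named in the excerpt;
// these values are illustrative and are not the QChar API.
enum class Category { MarkNonSpacing, OtherFormat, Other };

// Mirrors the early returns in the excerpt: true means the function
// would return before doing anything with the character.
bool droppedByExcerpt(int width, Category category)
{
    if (width < 0)
        return true;
    if (width == 0 && category != Category::MarkNonSpacing)
        return true;
    return false;
}

// Hypothetical variant: zero-width format characters such as ZWNJ
// are also allowed past the check.
bool droppedByVariant(int width, Category category)
{
    if (width < 0)
        return true;
    if (width == 0 && category != Category::MarkNonSpacing
        && category != Category::OtherFormat)
        return true;
    return false;
}
```

With width 0 and `Category::OtherFormat`, `droppedByExcerpt` returns true while `droppedByVariant` returns false. Whether a character that gets past the check can then be stored sensibly in a fixed grid of cells is a separate question that this sketch does not address.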

I have no idea what U+0654 is meant for. Ebrahim Mohammadi, could you provide a test case? It would also be better to re-check these problems using a KDE SC newer than 4.8 (or just a Konsole newer than 2.8.0).
Comment 3 Kurt Hindenburg 2017-02-13 03:21:23 UTC
Please open a new ticket if this is still an issue with a recent KDE5 version.