Bug 83278 - If I cut -b1 arabic_file in konsole, the output displayed in screen is not correct
Status: RESOLVED FIXED
Alias: None
Product: konsole
Classification: Unclassified
Component: general
Version: unspecified
Platform: Unlisted Binaries Linux
Importance: NOR normal with 40 votes
Target Milestone: ---
Assignee: Konsole Developer
Reported: 2004-06-12 21:42 UTC by Munzir Taha
Modified: 2006-08-07 05:43 UTC (History)

Attachments
a sample arabic file (12 bytes, text/plain)
2004-12-20 14:41 UTC, Munzir Taha

Description Munzir Taha 2004-06-12 21:42:54 UTC
Version:            (using KDE 3.2.2)
Installed from:    Unspecified Linux
OS:                Linux

[mimo@localhost mimo]$ cat arabic_file
ع
ر
ب
ي
١
٢
٣
٤

[mimo@localhost mimo]$ cut -b1 arabic_file
�
 �
  �
   �
    �
     �
      �
       �
        [mimo@localhost mimo]$

The problem is the leading spaces. I can understand that the squares stand for invalid bytes, but I can't understand why there are leading spaces before them!

This may also prove useful:
[mimo@localhost mimo]$ hexdump arabic_file
0000000 b9d8 d80a 0ab1 a8d8 d90a 0a8a a1d9 d90a
0000010 0aa2 a3d9 d90a 0aa4
0000018

[mimo@localhost mimo]$ cut -b1 arabic_file |hexdump
0000000 0ad8 0ad8 0ad8 0ad9 0ad9 0ad9 0ad9 0ad9
0000010
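
For readers comparing the two dumps: `hexdump` prints 16-bit little-endian words by default, so `b9d8` stands for the bytes `d8 b9`. The following minimal Python sketch rebuilds the file from the characters listed above (assuming it is UTF-8 with one character per line) and keeps the first byte of each line. The helper name `cut_b1` is made up for illustration and only approximates what `cut -b1` does.

```python
# Rebuild the eight-line file shown by `cat arabic_file`
# (assumption: UTF-8, one character per line, trailing newline).
chars = ["ع", "ر", "ب", "ي", "١", "٢", "٣", "٤"]
data = "".join(c + "\n" for c in chars).encode("utf-8")


def cut_b1(raw: bytes) -> bytes:
    """Keep only the first byte of each line (hypothetical stand-in for `cut -b1`)."""
    out = bytearray()
    # split() leaves an empty piece after the final newline; drop it.
    for line in raw.split(b"\n")[:-1]:
        out += line[:1] + b"\n"
    return bytes(out)


result = cut_b1(data)
print(len(data), data.hex(" "))      # byte count and bytes of the whole file
print(len(result), result.hex(" "))  # byte count and bytes after the cut
```

The second line of output should contain only `d8`/`d9` lead bytes, each followed by `0a`, which is what the piped hexdump above reports.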
Comment 1 Waldo Bastian 2004-06-14 11:56:15 UTC
Which encoding is this?
Comment 2 Munzir Taha 2004-06-14 19:16:51 UTC
The original arabic_file is UTF-8:
$ file arabic_file
arabic_file: UTF-8 Unicode text

But since an Arabic character in UTF-8 is two bytes, cutting out the first byte won't produce anything useful. The squares may indicate an invalid UTF-8 sequence, but no leading spaces should be inserted in any case.
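
As a rough illustration of the point about two-byte characters, this Python sketch encodes the four letters from the file and then tries to decode a lone first byte. It uses Python's own UTF-8 codec, which is only a stand-in here and says nothing about how Konsole or xterm decode their input.

```python
first_bytes = []
for ch in "عربي":
    encoded = ch.encode("utf-8")
    first_bytes.append(encoded[:1])
    # Each of these letters occupies two bytes in UTF-8.
    print(ch, len(encoded), encoded.hex(" "))

lone = first_bytes[0]  # the lead byte on its own, as left behind by cutting byte 1

# A strict decoder rejects a lead byte with no continuation byte.
try:
    lone.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD, the usual "square" placeholder.
replaced = lone.decode("utf-8", errors="replace")
print(strict_ok, repr(replaced))
```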
Comment 3 Brian Beck 2004-11-21 17:50:22 UTC
Just curious, does this behavior occur using xterm or another terminal program?

Also, could you post the file you're using? I don't have the skills to fix it, but I was wondering whether it still does this in KDE 3.3.

Thanks
Comment 4 Munzir Taha 2004-12-20 14:41:38 UTC
Created attachment 8739 [details]
a sample arabic file
Comment 5 Munzir Taha 2004-12-20 14:42:48 UTC
Yes, it also happens in xterm. I've just attached a sample arabic_file.
Comment 6 Robert Knight 2006-08-07 05:39:36 UTC
Fixed in Konsole for KDE 4 as a side effect of changing the way in which the incoming character stream is decoded for display.
Comment 7 Robert Knight 2006-08-07 05:43:23 UTC
To clarify, the new behaviour when running cut -b1 on the above file is to print four blank lines (i.e. invalid character sequences produce no output).
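
The behaviour described here can be imitated with a short Python sketch. It assumes the 12-byte attachment holds the four letters, one per line, which matches its size but is not confirmed in this report. Python's `errors="ignore"` mode stands in for "invalid sequences produce nothing"; it is not Konsole's decoder.

```python
# Assumed content of the 12-byte sample: four two-byte letters, one per line.
sample = "ع\nر\nب\nي\n".encode("utf-8")

# Keep the first byte of every line, approximating `cut -b1`.
cut = b"".join(line[:1] + b"\n" for line in sample.split(b"\n")[:-1])

# Drop invalid sequences instead of showing a placeholder for them.
shown = cut.decode("utf-8", errors="ignore")
print(repr(shown))
```

Under these assumptions only the newlines survive decoding, so a terminal would display four empty lines.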