Bug 83278

Summary:	If I cut -b1 arabic_file in konsole, the output displayed on screen is not correct
Product:	[Applications] konsole	Reporter:	Munzir Taha <munzirtaha>
Component:	general	Assignee:	Konsole Developer <konsole-devel>
Status:	RESOLVED FIXED
Severity:	normal
Priority:	NOR
Version:	unspecified
Target Milestone:	---
Platform:	Unlisted Binaries
OS:	Linux
Attachments:	a sample arabic file

Description Munzir Taha 2004-06-12 21:42:54 UTC

Version:            (using KDE 3.2.2)
Installed from:    Unspecified Linux
OS:                Linux

[mimo@localhost mimo]$ cat arabic_file
ع
ر
ب
ي
١
٢
٣
٤

[mimo@localhost mimo]$ cut -b1 arabic_file
�
 �
  �
   �
    �
     �
      �
       �
        [mimo@localhost mimo]$

The problem is the leading spaces. I can understand that the squares stand for invalid bytes, but I can't understand why each line is indented one column further than the previous one!

This may also prove useful:
[mimo@localhost mimo]$ hexdump arabic_file
0000000 b9d8 d80a 0ab1 a8d8 d90a 0a8a a1d9 d90a
0000010 0aa2 a3d9 d90a 0aa4
0000018

[mimo@localhost mimo]$ cut -b1 arabic_file |hexdump
0000000 0ad8 0ad8 0ad8 0ad9 0ad9 0ad9 0ad9 0ad9
0000010
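
A note for reading the dumps above: hexdump without options prints 16-bit words in host byte order, so on a little-endian machine (an assumption here, though typical for x86 Linux) each pair of bytes appears swapped: the word b9d8 is the bytes d8 b9. A minimal Python sketch that undoes the swap and recovers the original text from the first dump:

```python
import struct

# Words copied from the first hexdump in this report.
words = "b9d8 d80a 0ab1 a8d8 d90a 0a8a a1d9 d90a 0aa2 a3d9 d90a 0aa4".split()

# Assumption: the dump was taken on a little-endian host, so each
# 16-bit word is stored low byte first ("<H").
raw = b"".join(struct.pack("<H", int(w, 16)) for w in words)

# Print the bytes in file order, then the decoded text.
print(" ".join(f"{b:02x}" for b in raw))
text = raw.decode("utf-8")
print(text, end="")
```

Running hexdump -C on the file would show the bytes in file order directly, without the swap.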

Comment 1 Waldo Bastian 2004-06-14 11:56:15 UTC

Which encoding is this?

Comment 2 Munzir Taha 2004-06-14 19:16:51 UTC

The original arabic_file is UTF-8:
$ file arabic_file
arabic_file: UTF-8 Unicode text

But since an Arabic character in UTF-8 is two bytes, cutting out only the first byte can't produce anything useful. The squares may mean "not a valid UTF-8 sequence", but no leading spaces should be inserted in any case.
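
To illustrate the point about two-byte characters, here is a small Python sketch (not part of the original report). It rebuilds the file contents from the characters shown in the description, keeps only the first byte of each line the way cut -b1 does, and checks whether the result decodes. The helper name cut_b1 is made up for this example.

```python
# File contents as shown by `cat arabic_file` in the description.
data = "ع\nر\nب\nي\n١\n٢\n٣\n٤\n".encode("utf-8")

def cut_b1(raw: bytes) -> bytes:
    """Hypothetical stand-in for `cut -b1`: first byte of every line."""
    return b"".join(line[:1] + b"\n" for line in raw.splitlines())

out = cut_b1(data)
print(" ".join(f"{b:02x}" for b in out))

# Each kept byte (0xd8 or 0xd9) is the lead byte of a two-byte
# sequence, so on its own it is not valid UTF-8.
invalid_lines = 0
for line in out.splitlines():
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        invalid_lines += 1

print(f"{invalid_lines} of {len(out.splitlines())} lines are invalid UTF-8")
```

The bytes printed by the sketch are the same ones that cut -b1 arabic_file | hexdump reports, once the word-wise byte swapping of hexdump is taken into account.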

Comment 3 Brian Beck 2004-11-21 17:50:22 UTC

Just curious, does this behavior occur using xterm or another terminal program?

Also, could you post the file you're using? I don't have the skills to fix it, but I was wondering whether it still does this in KDE 3.3.

Thanks

Comment 4 Munzir Taha 2004-12-20 14:41:38 UTC

Created attachment 8739
a sample arabic file

Comment 5 Munzir Taha 2004-12-20 14:42:48 UTC

Yes, it also happens in xterm. I've just attached a sample arabic_file.

Comment 6 Robert Knight 2006-08-07 05:39:36 UTC

Fixed in Konsole for KDE 4 as a side effect of changing the way in which the incoming character stream is decoded for display.

Comment 7 Robert Knight 2006-08-07 05:43:23 UTC

To clarify, the new behaviour when running cut -b1 on the above file is to print 4 blank lines (i.e. invalid character sequences produce no output).
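
The difference between the two display policies can be sketched in Python. This is only an illustration using Python's own UTF-8 decoder and its standard error handlers; it is not Konsole's decoding code. The input bytes are the output of cut -b1 for the eight-line file shown in the description.

```python
# Output of `cut -b1 arabic_file`, taken from the hexdump in the description.
cut_output = bytes.fromhex("d80ad80ad80ad90ad90ad90ad90ad90a")

# Policy 1: show a replacement character (U+FFFD) for each invalid
# sequence, similar to the squares in the original report.
replaced = cut_output.decode("utf-8", errors="replace")

# Policy 2: drop invalid sequences entirely, so only the newlines
# survive and every line comes out blank.
ignored = cut_output.decode("utf-8", errors="ignore")

print(repr(replaced))
print(repr(ignored))
```

Under the second policy each input line becomes an empty line, which matches the "invalid sequences produce nothing" behaviour described above.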