83278 – If I cut -b1 arabic_file in konsole, the output displayed in screen is not correct

Status:	RESOLVED FIXED

Alias:	None

Product:	konsole
Classification:	Applications
Component:	general
Version:	unspecified
Platform:	Unlisted Binaries Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Konsole Developer

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-06-12 21:42 UTC by Munzir Taha
Modified:	2006-08-07 05:43 UTC
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
a sample arabic file (12 bytes, text/plain) 2004-12-20 14:41 UTC, Munzir Taha

Description Munzir Taha 2004-06-12 21:42:54 UTC

Version:            (using KDE 3.2.2)
Installed from:    Unspecified Linux
OS:                Linux

[mimo@localhost mimo]$ cat arabic_file
ع
ر
ب
ي
١
٢
٣
٤

[mimo@localhost mimo]$ cut -b1 arabic_file
�
 �
  �
   �
    �
     �
      �
       �
        [mimo@localhost mimo]$

The problem is the leading spaces. I understand that the squares stand for invalid bytes, but I can't understand why each line is indented with leading spaces!

This may also prove useful:
[mimo@localhost mimo]$ hexdump arabic_file
0000000 b9d8 d80a 0ab1 a8d8 d90a 0a8a a1d9 d90a
0000010 0aa2 a3d9 d90a 0aa4
0000018

[mimo@localhost mimo]$ cut -b1 arabic_file |hexdump
0000000 0ad8 0ad8 0ad8 0ad9 0ad9 0ad9 0ad9 0ad9
0000010
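
As a cross-check of the two hexdump listings above, here is a minimal Python sketch. It assumes the file holds exactly the eight characters shown by `cat`, each followed by a newline, and that `hexdump` printed its default 16-bit words on a little-endian machine (so each byte pair appears swapped). The helper names `cut_b1` and `hexdump_words` are made up for illustration.

```python
# Assumed contents of arabic_file: the eight characters from the report,
# one per line, written as escapes so the source stays ASCII-only.
text = "\u0639\n\u0631\n\u0628\n\u064a\n\u0661\n\u0662\n\u0663\n\u0664\n"
data = text.encode("utf-8")


def cut_b1(raw):
    """Emulate `cut -b1`: keep only the first byte of every line."""
    return b"".join(line[:1] + b"\n" for line in raw.splitlines())


def hexdump_words(raw):
    """Format bytes as 16-bit little-endian hex words (default hexdump style)."""
    if len(raw) % 2:
        raw += b"\x00"  # pad an odd trailing byte, as hexdump does
    return [f"{raw[i + 1]:02x}{raw[i]:02x}" for i in range(0, len(raw), 2)]


first_bytes = cut_b1(data)
print(" ".join(hexdump_words(data)))
print(" ".join(hexdump_words(first_bytes)))
```

Read byte by byte, the `cut -b1` output is only a UTF-8 lead byte (`d8` or `d9`) followed by a newline (`0a`) on each line; it contains no space characters (`20`), so the indentation is added at display time.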

Comment 1 Waldo Bastian 2004-06-14 11:56:15 UTC

Which encoding is this?

Comment 2 Munzir Taha 2004-06-14 19:16:51 UTC

The original arabic_file is UTF-8:
$ file arabic_file
arabic_file: UTF-8 Unicode text

But since an Arabic character in UTF-8 is two bytes, cutting out only the first byte won't produce anything useful. The squares may indicate an invalid UTF-8 sequence, but no leading spaces should be inserted in any case.
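
To illustrate the two-byte point, here is a small Python sketch. It uses Python's own UTF-8 decoder, which is only a stand-in and not what Konsole or xterm use internally; the variable names are illustrative.

```python
ch = "\u0639"  # ARABIC LETTER AIN, the first line of arabic_file
encoded = ch.encode("utf-8")  # two bytes: lead byte plus continuation byte
lead = encoded[:1]            # what `cut -b1` keeps from that line

# A lone lead byte is not a complete UTF-8 sequence, so strict decoding fails.
try:
    lead.decode("utf-8")
    decodes = True
except UnicodeDecodeError:
    decodes = False

# With errors="replace" the decoder substitutes U+FFFD, the replacement
# character that terminals typically draw as a square or question mark.
replaced = lead.decode("utf-8", errors="replace")

print(encoded.hex(), lead.hex(), decodes, repr(replaced))
```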

Comment 3 Brian Beck 2004-11-21 17:50:22 UTC

Just curious, does this behavior occur using xterm or another terminal program?

Also, could you post the file you're using? I don't have the skills to fix it, but I was wondering whether it still happens in KDE 3.3.

Thanks

Comment 4 Munzir Taha 2004-12-20 14:41:38 UTC

Created attachment 8739
a sample arabic file

Comment 5 Munzir Taha 2004-12-20 14:42:48 UTC

Yes, it also happens in xterm. I've just attached a sample arabic_file.

Comment 6 Robert Knight 2006-08-07 05:39:36 UTC

Fixed in Konsole for KDE 4 as a side effect of changing the way in which the incoming character stream is decoded for display.

Comment 7 Robert Knight 2006-08-07 05:43:23 UTC

To clarify, the new behaviour when running cut -b1 on the above file is to print 4 blank lines (i.e. invalid character sequences produce no output).
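
The described behaviour can be approximated with Python's decoder in `errors="ignore"` mode. This is only a sketch of the effect, not Konsole's actual decoding code, and it assumes the attached sample holds four two-byte characters, one per line, which would be consistent with its 12-byte size.

```python
# Assumed 12-byte sample: four two-byte Arabic characters, one per line.
sample = "\u0639\n\u0631\n\u0628\n\u064a\n".encode("utf-8")

# Emulate `cut -b1`: keep the first byte of each line plus the newline.
first_bytes = b"".join(line[:1] + b"\n" for line in sample.splitlines())

# Drop invalid sequences instead of drawing a placeholder for them.
shown = first_bytes.decode("utf-8", errors="ignore")

print(repr(shown))
```

Only the newlines survive decoding, which corresponds to the four blank lines described above.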