| Summary: | If I cut -b1 arabic_file in konsole, the output displayed in screen is not correct | ||
|---|---|---|---|
| Product: | [Applications] konsole | Reporter: | Munzir Taha <munzirtaha> |
| Component: | general | Assignee: | Konsole Bugs <konsole-bugs-null> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | ||
| Priority: | NOR | ||
| Version First Reported In: | unspecified | ||
| Target Milestone: | --- | ||
| Platform: | Unlisted Binaries | ||
| OS: | Linux | ||
| Latest Commit: | | Version Fixed/Implemented In: | |
| Sentry Crash Report: | |||
| Attachments: | a sample arabic file | ||
Which encoding is this?

The original arabic_file file is UTF-8:

$ file arabic_file
arabic_file: UTF-8 Unicode text

But since an Arabic character in UTF-8 is two bytes, cutting the first byte won't generate anything useful.

Squares may mean an invalid UTF-8 sequence, but no leading spaces should be embedded anyway.

Just curious, does this behavior occur using xterm or another terminal program? Also, could you post the file you're using? I don't have the skills to fix it, but I was wondering if it still does this in KDE 3.3. Thanks.

Created attachment 8739: a sample arabic file

Yes, it also happens in xterm. I've just attached a sample arabic_file.

Fixed in Konsole for KDE 4 as a side effect of changing the way in which the incoming character stream is decoded for display.

To clarify, the new behaviour when running cut -b1 on the above file is to print 4 blank lines (i.e. invalid character sequences produce nothing at the output).
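The point made in the comments, that the first byte of a two-byte UTF-8 character is not a valid sequence on its own, can be checked with a small Python sketch. The file contents are reconstructed from the `cat` and `hexdump` output in the report; `cut_first_byte` is a hypothetical helper that only mimics `cut -b1`. Python's `errors="replace"` and `errors="ignore"` decoders are used here purely as an analogy for "show a square" versus "print nothing"; they are not Konsole's actual decoding code.

```python
# Contents of arabic_file as shown in the report: eight lines, each holding
# one two-byte UTF-8 character followed by a newline.
data = "ع\nر\nب\nي\n١\n٢\n٣\n٤\n".encode("utf-8")


def cut_first_byte(raw: bytes) -> bytes:
    """Hypothetical stand-in for `cut -b1`: keep the first byte of each line."""
    lines = raw.split(b"\n")[:-1]  # drop the empty piece after the final newline
    return b"".join(line[:1] + b"\n" for line in lines)


cut = cut_first_byte(data)

# Every remaining byte is a UTF-8 lead byte (0xd8 or 0xd9) with no
# continuation byte after it, so strict decoding fails.
try:
    cut.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Analogy only: a lenient decoder can substitute U+FFFD (often drawn as a
# square or "?") or drop the invalid byte entirely.
replaced = cut.decode("utf-8", errors="replace")
ignored = cut.decode("utf-8", errors="ignore")

print(cut.hex(" "))
print(strict_ok)
print(repr(replaced))
print(repr(ignored))
```

In neither lenient mode does the decoded text contain a space character, which is consistent with the reporter's view that the leading spaces come from the terminal's rendering rather than from `cut`.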
Version: (using KDE KDE 3.2.2)
Installed from: Unspecified Linux
OS: Linux

[mimo@localhost mimo]$ cat arabic_file
ع
ر
ب
ي
١
٢
٣
٤
[mimo@localhost mimo]$ cut -b1 arabic_file
�
�
�
�
�
�
�
�
[mimo@localhost mimo]$

The problem is the leading spaces. I can understand that the squares mean invalid bytes, but I can't understand why there are leading spaces!

This may also prove useful:

[mimo@localhost mimo]$ hexdump arabic_file
0000000 b9d8 d80a 0ab1 a8d8 d90a 0a8a a1d9 d90a
0000010 0aa2 a3d9 d90a 0aa4
0000018
[mimo@localhost mimo]$ cut -b1 arabic_file |hexdump
0000000 0ad8 0ad8 0ad8 0ad9 0ad9 0ad9 0ad9 0ad9
0000010
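When reading the dumps above, note that `hexdump` without options prints 16-bit words in host byte order, so on a little-endian machine the byte pair `d8 b9` appears as `b9d8`. A minimal Python sketch of that grouping follows; `hexdump_words` is a hypothetical helper, and both the little-endian byte order and the zero padding of an odd trailing byte are assumptions about the reporter's setup rather than facts stated in the report.

```python
def hexdump_words(raw: bytes) -> list:
    """Group bytes into 16-bit little-endian words, formatted as 4 hex digits.

    Assumption: mirrors default `hexdump` output on a little-endian host.
    """
    if len(raw) % 2:
        raw += b"\x00"  # assumed padding for an odd trailing byte
    return [
        format(int.from_bytes(raw[i:i + 2], "little"), "04x")
        for i in range(0, len(raw), 2)
    ]


# File contents reconstructed from the `cat` output in the report.
data = "ع\nر\nب\nي\n١\n٢\n٣\n٤\n".encode("utf-8")

words = hexdump_words(data)
print(" ".join(words[:8]))   # first row of the dump (offset 0000000)
print(" ".join(words[8:]))   # second row of the dump (offset 0000010)
print(format(len(data), "07x"))  # final offset line
```

Read this way, the second dump (`0ad8`, `0ad9`) is simply a lone lead byte followed by a newline (`d8 0a` or `d9 0a`) on each line, with no space bytes (`20`) anywhere in the data.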