The current version of kcharselect (including that in the Git repository) has a number of Unicode blocks missing:

  Arabic Extended-A
  Meetei Mayek Extensions
  Sundanese Supplement

The section name "Other Scripts" should be "American Scripts" to conform with the Unicode charts at http://www.unicode.org/charts/.

In addition, I would like the block "General Punctuation" to be the first listed in "Symbols", as this is by far the most commonly used block. Listing it first saves having to use the secondary drop-down box to select it, saving at least a few keystrokes or mouse movements (which get annoying rather quickly).

I will attach a patch that fixes all of these problems.
Created attachment 70357 [details] Add missing Unicode blocks, rearrange some blocks

1. The following missing Unicode blocks are added:
     Arabic Extended-A
     Meetei Mayek Extensions
     Sundanese Supplement
2. The section "Other Scripts" is renamed "American Scripts" to match the Unicode charts.
3. "General Punctuation" is listed first in the section "Symbols".
4. Various blocks in the section "Mathematical Symbols" are rearranged into alphabetical order.
Created attachment 70358 [details] Diff from running kcharselect-generate-datafile.py

Diff created by running the patched kcharselect-generate-datafile.py script on Unicode 6.1 data, as compared with the kdelibs Git repository.
For those who want to fix the issues mentioned in this bug without waiting, and without even recompiling kcharselect, all that needs to be done is to place the new version of kcharselect-data in the appropriate directory. I have successfully used the new version of kcharselect-data with KDE SC 4.8.2. Of course, this means that one new string ("American Scripts") will not be localised.

To fix this bug on a temporary basis:

1. Download the new kcharselect-data from my website:
     http://www.zap.org.au/~john/misc/kcharselect-data
2. Place the downloaded file in your KDE apps directory, creating the directory first if it does not exist:
     mkdir -p ~/.kde/share/apps/kcharselect
     cp kcharselect-data ~/.kde/share/apps/kcharselect
   (Your path may be slightly different.) Kcharselect will now display the missing Unicode blocks.
3. Remember to remove your copy of kcharselect-data once the upstream version of the file is fixed.
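Step 2 above can also be scripted. The following is only a minimal sketch in Python, assuming the file has already been downloaded into the current directory. The function name install_data_file is hypothetical, and the target directory is the per-user path given above (yours may differ).

```python
import shutil
from pathlib import Path


def install_data_file(source: Path, target_dir: Path) -> Path:
    """Copy kcharselect-data into target_dir, creating the directory if needed."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(source, target_dir))


if __name__ == "__main__":
    # Assumed locations: the downloaded file sits in the current directory,
    # and the KDE apps directory is the per-user path mentioned above.
    source = Path("kcharselect-data")
    target = Path.home() / ".kde" / "share" / "apps" / "kcharselect"
    if source.is_file():
        print("Installed", install_data_file(source, target))
    else:
        print("kcharselect-data not found in the current directory")
```

Creating the directory before copying avoids the case where cp would otherwise create a plain file named kcharselect.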
I have updated kcharselect-data for Unicode 7.0. In particular, all missing BMP blocks have been added:

  Latin Extended-E
  Arabic Extended-A
  Meetei Mayek Extensions
  Myanmar Extended-B
  Sundanese Supplement
  Combining Diacritical Marks Extended

I have also renamed and rearranged the sections to more closely match those on the Unicode website at http://www.unicode.org/charts/ :

  European Alphabets => European Scripts
  Philippine Scripts => Indonesia and Oceania Scripts
  South East Asian Scripts => Southeast Asian Scripts
  Other Scripts => American Scripts

Comment 3 remains valid: users can update their kcharselect-data immediately, if desired.

It would be nice to add all the non-BMP characters to KCharSelect, but given that QChar is 16-bit only, I'm not sure how one would do that. Even in Qt5, QChar is still 16-bit...

I'm disappointed that my previous patch, created two years ago, was not applied. Is KCharSelect being maintained? Please apply this patch!
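On the 16-bit point above: a code point above U+FFFF does not fit into one 16-bit unit and is represented in UTF-16 by a surrogate pair, so a single 16-bit value cannot hold it. The sketch below illustrates this with Python's standard library only; it does not use Qt, and the helper name utf16_units is hypothetical.

```python
import struct


def utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit UTF-16 code units needed for one code point."""
    data = chr(code_point).encode("utf-16-be")
    # Each UTF-16 code unit is two bytes; big-endian byte order is used here.
    return list(struct.unpack(f">{len(data) // 2}H", data))


# U+2020 DAGGER (General Punctuation) is in the BMP: one 16-bit unit.
print([hex(u) for u in utf16_units(0x2020)])
# U+1F600 GRINNING FACE is outside the BMP: two units (a surrogate pair).
print([hex(u) for u in utf16_units(0x1F600)])
```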
Comment on attachment 70357 [details] Add missing Unicode blocks, rearrange some blocks

diff -ruNa kcharselect.orig/kcharselect-generate-datafile.py kcharselect/kcharselect-generate-datafile.py
--- kcharselect.orig/kcharselect-generate-datafile.py	2014-07-02 09:35:18.516222690 +1000
+++ kcharselect/kcharselect-generate-datafile.py	2014-07-02 09:32:04.825658372 +1000
@@ -102,13 +102,14 @@
 
 # based on http://www.unicode.org/charts/
 sectiondata = '''
-SECTION European Alphabets
+SECTION European Scripts
 Basic Latin
 Latin-1 Supplement
 Latin Extended-A
 Latin Extended-B
 Latin Extended-C
 Latin Extended-D
+Latin Extended-E
 Latin Extended Additional
 Armenian
 Coptic
@@ -137,6 +138,7 @@
 SECTION Middle Eastern Scripts
 Arabic
 Arabic Supplement
+Arabic Extended-A
 Arabic Presentation Forms-A
 Arabic Presentation Forms-B
 Hebrew
@@ -144,6 +146,11 @@
 Samaritan
 Syriac
 
+SECTION Central Asian Scripts
+Mongolian
+Phags-pa
+Tibetan
+
 SECTION South Asian Scripts
 Bengali
 Common Indic Number Forms
@@ -156,6 +163,7 @@
 Limbu
 Malayalam
 Meetei Mayek
+Meetei Mayek Extensions
 Ol Chiki
 Oriya
 Saurashtra
@@ -166,33 +174,34 @@
 Thaana
 Vedic Extensions
 
-SECTION Philippine Scripts
-Buhid
-Hanunoo
-Tagalog
-Tagbanwa
-
-
-SECTION South East Asian Scripts
-Balinese
-Batak
-Buginese
+SECTION Southeast Asian Scripts
 Cham
-Javanese
 Kayah Li
 Khmer
 Khmer Symbols
 Lao
 Myanmar
 Myanmar Extended-A
+Myanmar Extended-B
 New Tai Lue
-Rejang
-Sundanese
 Tai Le
 Tai Tham
 Tai Viet
 Thai
 
+SECTION Indonesia and Oceania Scripts
+Balinese
+Batak
+Buginese
+Buhid
+Hanunoo
+Javanese
+Rejang
+Sundanese
+Sundanese Supplement
+Tagalog
+Tagbanwa
+
 SECTION East Asian Scripts
 Bopomofo
 Bopomofo Extended
@@ -220,23 +229,18 @@
 Yi Radicals
 Yi Syllables
 
-SECTION Central Asian Scripts
-Mongolian
-Phags-pa
-Tibetan
-
-SECTION Other Scripts
+SECTION American Scripts
 Cherokee
 Unified Canadian Aboriginal Syllabics
 Unified Canadian Aboriginal Syllabics Extended
 
 SECTION Symbols
+General Punctuation
 Braille Patterns
 Control Pictures
 Currency Symbols
 Dingbats
 Enclosed Alphanumerics
-General Punctuation
 Miscellaneous Symbols
 Miscellaneous Technical
 Optical Character Recognition
@@ -249,17 +253,17 @@
 Arrows
 Block Elements
 Box Drawing
-Supplemental Arrows-A
-Supplemental Arrows-B
 Geometric Shapes
 Letterlike Symbols
 Mathematical Operators
-Supplemental Mathematical Operators
 Miscellaneous Mathematical Symbols-A
 Miscellaneous Mathematical Symbols-B
 Miscellaneous Symbols and Arrows
 Number Forms
 Superscripts and Subscripts
+Supplemental Arrows-A
+Supplemental Arrows-B
+Supplemental Mathematical Operators
 
 SECTION Phonetic Symbols
 IPA Extensions
@@ -268,8 +272,9 @@
 Phonetic Extensions Supplement
 Spacing Modifier Letters
 
-SECTION Combining Diacritical Marks
+SECTION Combining Diacritics
 Combining Diacritical Marks
+Combining Diacritical Marks Extended
 Combining Diacritical Marks Supplement
 Combining Diacritical Marks for Symbols
 Combining Half Marks
@@ -284,7 +289,6 @@
 Specials
 Variation Selectors
 '''
-# TODO: rename "Other Scripts" to "American Scripts"
 
 categoryMap = { # same values as QChar::Category
     "Mn": 1,
@@ -533,7 +537,7 @@
 
     def getBlockList(self):
         return self.blockList
-    
+
     def getSectionList(self):
         return self.sectionList
 
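The sectiondata string patched above has a simple line-based layout: a line starting with SECTION opens a section, and each following non-empty line names a Unicode block in that section. The sketch below shows one way to read that layout. It is not taken from kcharselect-generate-datafile.py, the helper name parse_sections is hypothetical, and the sample is abbreviated from the diff.

```python
def parse_sections(text: str) -> dict[str, list[str]]:
    """Map each section name to its block names, preserving file order."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        if line.startswith("SECTION "):
            current = line[len("SECTION "):]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Abbreviated sample in the same layout as the patched sectiondata.
SAMPLE = '''
SECTION American Scripts
Cherokee
Unified Canadian Aboriginal Syllabics
Unified Canadian Aboriginal Syllabics Extended

SECTION Symbols
General Punctuation
Braille Patterns
'''

sections = parse_sections(SAMPLE)
print(list(sections))          # section names in file order
print(sections["Symbols"][0])  # first block listed under "Symbols"
```

Because the order of lines is preserved, moving "General Punctuation" to the top of its section in the data is enough to make it the first block listed.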
Created attachment 87507 [details] Add missing Unicode blocks, rearrange some blocks
Created attachment 87508 [details] Diff from running kcharselect-generate-datafile.py
Created attachment 87509 [details] Generated kcharselect-data
Created attachment 87510 [details] Diff from running kcharselect-generate-datafile.py
I am not sure whether the patches require an updated Qt; in other words, whether KCharSelect can only support the Unicode version that Qt supports. Hopefully the KCharSelect maintainer finds some time to review the changes and provide some useful comments. On the other hand, thinking about bug 142625, KCharSelect might need an overhaul anyway. Again, thanks for the patches. I guess it would be useful if you applied for a KDE developer account.
My patches do NOT require an updated Qt that can handle non-BMP characters.
Created attachment 98809 [details] Add missing Unicode blocks, rearrange some blocks (for KF5)

Updated for Git version of kcharselect as at 2016-04-06 (for KF5).

Created attachment 98810 [details] Generated kcharselect-data file

Updated for KF5 and Unicode 8.0.

Created attachment 98811 [details] Diff from running kcharselect-generate-datafile.py

The change generated by running the updated kcharselect-generate-datafile.py on data obtained from Unicode 8.0. The resulting kcharselect-translation.cpp file needs to be placed in kwidgetsaddons/src (KF5).
I am really disappointed that no one has applied these simple patches in over four years! These patches are still very much necessary.

I have updated the patch for KF5 and Unicode 8.0. The resulting data file can also be used with KDE4. Only BMP characters (<= 0xFFFF) are included for compatibility.

====

For those who want to fix the issues mentioned in this bug without waiting, and without even recompiling kcharselect, all that needs to be done is to place the new version of kcharselect-data in the appropriate directory. I have successfully used the new version of kcharselect-data with the latest KF5. Of course, this means that one new string ("American Scripts") will not be localised.

To fix this bug on a temporary basis:

1. Download the new kcharselect-data from my website:
     http://www.zap.org.au/~john/misc/kcharselect-data
2. Place the downloaded file in the appropriate KF5 and KDE4 directories:
     mkdir -p ~/.local/share/kf5/kcharselect
     cp kcharselect-data ~/.local/share/kf5/kcharselect
     mkdir -p ~/.kde/share/apps/kcharselect
     cp kcharselect-data ~/.kde/share/apps/kcharselect
   (Your paths may be slightly different.) Kcharselect will now display the missing Unicode blocks.
3. Remember to remove your copies of kcharselect-data once the upstream versions of the file are fixed.

And I hope that will be soon!
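The BMP-only restriction mentioned above can be expressed as a filter over block ranges. The sketch below assumes input lines in the "start..end; name" layout used by the Unicode Blocks.txt file. The function name bmp_blocks and the sample lines are illustrative only, and this is not claimed to be how kcharselect-generate-datafile.py does it.

```python
def bmp_blocks(lines):
    """Yield (start, end, name) for blocks that lie entirely within the BMP."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        range_part, name = line.split(";", 1)
        start_hex, end_hex = range_part.split("..")
        start, end = int(start_hex, 16), int(end_hex, 16)
        if end <= 0xFFFF:  # same limit as stated above
            yield start, end, name.strip()


# Sample lines in Blocks.txt layout; the last block is outside the BMP.
SAMPLE = [
    "# Blocks.txt-style sample",
    "08A0..08FF; Arabic Extended-A",
    "1CC0..1CCF; Sundanese Supplement",
    "AAE0..AAFF; Meetei Mayek Extensions",
    "1F600..1F64F; Emoticons",
]

for start, end, name in bmp_blocks(SAMPLE):
    print(f"{start:04X}..{end:04X} {name}")
```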
KCharSelect is pretty out of date, which is too bad. It also needs emoji support, which GNOME Character Map has. I feel strange using gucharmap on my Plasma desktop, but it's my only option right now.
Git commit 9ba72a807a18da73c05e3e99f1c9799cf95f0c36 by Christoph Feck, on behalf of John Zaitseff.
Committed on 14/07/2016 at 19:41.
Pushed by cfeck into branch 'master'.

Add missing Unicode blocks; improve ordering

Reviewed by Christoph Feck

M  +78   -66   kcharselect-generate-datafile.py

http://commits.kde.org/kcharselect/9ba72a807a18da73c05e3e99f1c9799cf95f0c36
Git commit deeb355fea88559dea8e36150db8f55f22c5a494 by Christoph Feck, on behalf of John Zaitseff.
Committed on 14/07/2016 at 19:41.
Pushed by bshah into branch 'master'.

Add missing Unicode blocks; improve ordering

Reviewed by Christoph Feck

M  +78   -66   kcharselect-generate-datafile.py

http://commits.kde.org/kwidgetsaddons/deeb355fea88559dea8e36150db8f55f22c5a494