Bug 431866 - radselect: Radicals within each column are displayed in random order
Status: RESOLVED FIXED
Alias: None
Product: kiten
Classification: Applications
Component: general (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Joseph Kerian
Reported: 2021-01-21 02:53 UTC by Frédéric Brière
Modified: 2021-01-31 22:59 UTC

Description Frédéric Brière 2021-01-21 02:53:27 UTC
While the radicals in radselect are grouped in columns based on their stroke count, the list of radicals in any column is not sorted in any way, resulting in them being shuffled in random order at every launch.  With up to 40 radicals in some columns, this makes radselect nearly unusable.

Obviously, the solution is to sort those radicals, and I can think of three possible orders:

1) The order in which they are listed within radkfile

The order of radicals within radkfile was not chosen at random, but is very similar (though not identical) to the order of Kangxi radicals, which is a de facto standard for most dictionaries and has been drilled into the heads of Japanese schoolchildren.  (XJDIC also uses this order when pressing "R" within Radical Lookup Mode, though it displays them *horizontally* and without line breaks, so they all fit within a single screen.)

  PROS:
    - Convenient for people already familiar with the standard radicals order
    - Might be familiar to XJDIC users

  CONS:
    - Inconvenient for people *not* familiar with the standard order (aka most Westerners)
    - Requires much scrolling down for some very common radicals (e.g. ⺾)
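
As a rough illustration of option 1, the sketch below groups radicals by stroke count while keeping the order in which they appear in the file.  It assumes the usual radkfile layout, where each radical starts with a "$ <radical> <strokes>" header line followed by lines of kanji, and "#" lines are comments.  The sample text and the function name columns_in_file_order are made up for this example and are not Kiten code.

```python
from collections import defaultdict

# Tiny made-up excerpt in radkfile layout (the real file is EUC-JP encoded
# and far longer); the kanji lists here are illustrative only.
SAMPLE = """\
# comment line
$ 一 1
一七万
$ ｜ 1
中
$ 二 2
二五
$ 亠 2
京
"""


def columns_in_file_order(text):
    """Map stroke count -> radicals, preserving their order in the file."""
    columns = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("$"):
            continue  # skip comments and kanji lines
        _, radical, strokes = line.split()[:3]
        columns[int(strokes)].append(radical)
    return dict(columns)
```

Because the radicals are appended as they are read, each column comes out in file order without any explicit sort.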

2) Decreasing frequency order

In other words, the radical in each column which appears in the most kanji goes on top, and the rarest one sinks to the bottom.  Ties (common with rare radicals) could be broken with method #1.

  PROS:
    - Intuitive for most users
    - Most common radicals are always within reach
  CONS:
    - Order is somewhat arbitrary and unstable, as it can vary due to radkfile updates (e.g. when ⺣ was added to all kanji containing 馬, making it suddenly jump in frequency)
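
A minimal sketch of option 2, assuming each radical is already associated with the set of kanji that contain it.  The function name, the sample column and the kanji counts are hypothetical and only meant to show the sort key: descending frequency first, position in radkfile as the tie-breaker.

```python
def sort_column_by_frequency(radicals, kanji_by_radical):
    """Most frequent radical first; ties keep the radkfile (input) order."""
    # Position of each radical in the incoming (radkfile) order.
    file_index = {radical: i for i, radical in enumerate(radicals)}
    return sorted(
        radicals,
        key=lambda r: (-len(kanji_by_radical[r]), file_index[r]),
    )


# Hypothetical three-stroke column, listed in file order, with made-up counts.
column = ["口", "土", "氵", "⺾"]
kanji_by_radical = {
    "口": set("右古号"),
    "土": set("地"),
    "氵": set("海池"),
    "⺾": set("花"),
}
```

With this data, sort_column_by_frequency(column, kanji_by_radical) puts 口 first and 氵 second, while 土 and ⺾ (tied at one kanji each) stay in their radkfile order.  Adding one more kanji to the set for ⺾ would move it ahead of 土, which is the instability mentioned above.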

3) Unicode string order

While sorting radicals as Unicode strings doesn't seem like a good idea to me, it is technically an option, so I figured I might as well mention it.

  PROS:
    - It's very easy to program  :)
  CONS:
    - The result would only be meaningful to robots
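
For completeness, option 3 amounts to a plain string sort.  The snippet below is only a sketch with a hypothetical three-radical column; it shows that the resulting order follows code points and nothing else.

```python
# Hypothetical column of three-stroke radicals.
column = ["氵", "口", "土"]

# Python compares strings by code point, so this is "Unicode string order".
by_code_point = sorted(column)

# The code points that actually drive the order.
code_points = [f"U+{ord(radical):04X}" for radical in by_code_point]
```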


The solution I'm opting for is to allow the user to choose between #1 (radkfile) and #2 (frequency), with the latter being the default, figuring that most users would be more comfortable with it.  I'm about to submit a merge request doing that, but I thought it would nevertheless be a good idea to file a bug report first, explaining the possible options with their pros and cons, and giving an opportunity for others to voice their opinion.
Comment 1 2wxsy58236r3 2021-01-21 05:25:32 UTC
I prefer solution #1, but...



radkfile's Problem 1
--------------------

Some radicals are not represented using the proper glyphs. Is it because the file uses EUC-JP encoding?

For example, radkfile uses "化" to represent the radical "亻", "个" to represent the radical "𠆢", and "刈" to represent the radical "刂".


radkfile's Problem 2
--------------------

As stated in [1], the radicals/elements used in the decomposition are NOT the same as the classical 214 radicals.

For example, it "created" a radical "滴" (to represent "滴" without "氵"), but "滴" without "氵" is not a standard radical. You won't find it in a Japanese dictionary.



If the above 2 problems can be solved, then solution #1 would be the best. A learner can switch to a paperback Japanese dictionary easily if the program lists _accurate_ radicals.



[1] http://nihongo.monash.edu/kradinf.html
Comment 2 Frédéric Brière 2021-01-21 06:04:59 UTC
(In reply to 2wxsy58236r3 from comment #1)
> Some radicals are not represented using the proper glyphs. Is it because the
> file uses EUC-JP encoding?

Yes.  (Or, to be more precise, it is restricted to JIS X 0208.)

> For example, radkfile uses "化" to represent the radical "亻", "个" to
> represent the radical "𠆢", and "刈" to represent the radical "刂".

I've already written a patch that replaces those substitutes with the proper characters, but since it would conflict with another merge request, it's on the back-burner at the moment.
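
To make the idea concrete, here is a minimal sketch of such a replacement, not the actual patch.  It only covers the three pairs quoted above (radkfile may contain other substitutes), and the names SUBSTITUTES and display_glyph are invented for the example.

```python
# Substitute character found in radkfile -> radical it stands for.
# Only the three pairs quoted in this report are listed here.
SUBSTITUTES = {"化": "亻", "个": "𠆢", "刈": "刂"}


def display_glyph(radical):
    """Return the glyph to display for a radical read from radkfile."""
    return SUBSTITUTES.get(radical, radical)
```

Radicals without an entry in the table are returned unchanged, so the lookup is safe to apply to every radical in the file.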

> As stated in [1], the radicals/elements used in the decomposition are NOT
> the same as the classical 214 radicals.

Exactly.  If you see this as a problem, you seem to be mistaken about the purpose of kitenradselect, which is simply to provide an easy way to look up kanji based on their written form.  Take it as an alternative to SKIP, the Four-Corner Method, and other similar methods.
Comment 3 2wxsy58236r3 2021-01-21 07:27:14 UTC
Thanks for the response. You are right that radkfile isn't matching a kanji to the respective radical.
e.g. 摘 has the element 口 so it is included in the 口 radical/element section.

The button to open kitenradselect is labelled "部首の選択" ("radical selection"), so it may be misleading. I thought it would work the same way as looking up a kanji by its Kangxi radical in a Japanese dictionary.

In my opinion, adding a lookup method which is based on the standard Kangxi radicals (in addition to the existing method of lookup by element / non-standard radicals) will be good, but I guess the structure of radkfile and the large number of kanji makes this difficult, unless someone volunteers to create a file which properly matches a kanji to its respective radical.
Comment 4 Frédéric Brière 2021-01-21 08:02:45 UTC
(In reply to 2wxsy58236r3 from comment #3)
> In my opinion, adding a lookup method which is based on the standard Kangxi
> radicals (in addition to the existing method of lookup by element /
> non-standard radicals) will be good, but I guess the structure of radkfile
> and the large number of kanji makes this difficult, unless someone
> volunteers to create a file which properly matches a kanji to its respective
> radical.

This information is already available in the kanjidic file, although I think the kanji browser would be a more appropriate location to display/search it.  Feel free to file a bug asking for this functionality.
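
As a hedged sketch of where that information lives: in the plain-text kanjidic format, each line carries a "B" field (the Nelson radical number) and, only where it differs, a "C" field (the classical radical number).  The function name and the abbreviated sample lines used with it are illustrative, and real entries have many more fields.

```python
import re


def classical_radical_number(kanjidic_line):
    """Return the classical radical number from one kanjidic line.

    Uses the C field when present, otherwise falls back to the B field.
    """
    bushu = classical = None
    # Skip the kanji itself and its JIS code, then scan the coded fields.
    for field in kanjidic_line.split()[2:]:
        if field.startswith("{"):
            break  # English meanings start here; no more coded fields
        if re.fullmatch(r"B\d+", field):
            bushu = int(field[1:])
        elif re.fullmatch(r"C\d+", field):
            classical = int(field[1:])
    return classical if classical is not None else bushu
```

For an abbreviated line such as "亜 3021 U4e9c B1 C7 G8 S7 {Asia}", this returns 7, while a line without a C field falls back to its B number.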
Comment 5 Bug Janitor Service 2021-01-28 00:19:41 UTC
A possibly relevant merge request was started @ https://invent.kde.org/education/kiten/-/merge_requests/19
Comment 6 Frédéric Brière 2021-01-31 22:59:13 UTC
Git commit c49a0b28f9e5a69a79b9b10082336293aba80ea4 by Frédéric Brière.
Committed on 31/01/2021 at 16:00.
Pushed by aacid into branch 'master'.

radselect: Add option to display radicals in decreasing frequency order

M  +12   -1    radselect/buttongrid.cpp
M  +3    -0    radselect/buttongrid.h
M  +9    -0    radselect/radical.cpp
M  +1    -0    radselect/radical.h
M  +4    -0    radselect/radselectconfig.kcfg
M  +7    -0    radselect/radselectprefdialog.ui
M  +2    -0    radselect/radselectview.cpp

https://invent.kde.org/education/kiten/commit/c49a0b28f9e5a69a79b9b10082336293aba80ea4