(In reply to Tyson Tan from bug 419321)
> Actually, I don't think the current Search box is working properly for CJK
> languages yet. For example, it can search "笔刷" and return some results, but
> searching "笔刷预设" returns nothing. There must be some issues with the word
> dividing logic.

- `jieba` split it into two separate terms, "笔刷" and "预设", in the search index.
- The client-side search code can only split terms by whitespace, so "笔刷预设" (or any run of continuous CJK characters, for that matter) is treated as one term.
- The search code probably only finds exact matches.

(In case you are interested, you can check the generated index at [1]: paste its contents into a JS beautifier [2] and enable "Unescape printable chars encoded as \xNN or \uNNNN".)

There are ways to provide `jieba` with custom dictionary terms and segmentation rules (check its readme [3] for more info). I think we can initialize them from `conf.py` if you would like to add some. However, *if* we make "笔刷预设" a full term in the index, then the search term "笔刷" will most likely no longer match the pages indexed under "笔刷预设", which might be worse than the current behaviour.

We can check whether the upstream `sphinx_rtd_theme` search code has any improvements worth backporting, but most likely we will have to hack together our own logic for matching search terms to get the behaviour we want.

[1]: https://docs.krita.org/zh_CN/searchindex.js
[2]: https://beautifier.io/
[3]: https://github.com/fxsjy/jieba
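To illustrate the mismatch described in the three points above, here is a minimal sketch. It is not the actual Sphinx or theme search code: `sampleIndex` and `searchExact` are hypothetical stand-ins for the generated index and the client-side lookup, and the document IDs are made up.

```javascript
// Hypothetical stand-in for the terms stored in searchindex.js.
// At build time jieba has already split "笔刷预设" into "笔刷" and "预设".
const sampleIndex = new Map([
  ["笔刷", [0, 1]],
  ["预设", [1, 2]],
]);

// Assumed client-side behaviour: split the query on whitespace only,
// then look up each piece as an exact key in the index.
function searchExact(query) {
  const terms = query.split(/\s+/).filter((t) => t.length > 0);
  return terms.map((t) => ({ term: t, docs: sampleIndex.get(t) || [] }));
}

console.log(searchExact("笔刷"));      // one term, found in two documents
console.log(searchExact("笔刷预设"));  // one term, no documents
console.log(searchExact("笔刷 预设")); // two terms, each found
```

With a space typed between the two words the lookup succeeds, which is why the problem only shows up for continuous CJK text.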
Git commit f67b09964d2ba0547bdfa5e73fa11ddcb0c98fe6 by Alvin Wong.
Committed on 25/08/2021 at 14:10.
Pushed by alvinwong into branch 'master'.

Try to split search term into smaller parts for zh and ja

When sphinx generates the search index, terms get split into the smallest
logical parts. For example, "笔刷预设介绍" will be split into three
individual terms: "笔刷", "预设" and "介绍". The search page JavaScript does
not know how to do segmentation (it wouldn't be feasible anyway, since it
needs a dictionary). Therefore we add extra logic here that attempts to
further split the search terms according to the terms available in the
search index, to make the search function more useful for Chinese and
Japanese.

This logic requires every part of the search term to be an existing term in
the index, as in "笔刷预设介绍". If the search term is instead "笔刷预设道路"
and "道路" does not exist in the index, this logic will not apply the split
and the search will yield no results.

M  +52   -0    theme/static/searchtools.js_t

https://invent.kde.org/documentation/docs-krita-org/commit/f67b09964d2ba0547bdfa5e73fa11ddcb0c98fe6
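For readers who want a feel for the splitting rule in the commit message above, here is a rough sketch. It is not the code from the commit: the function name `splitByIndex`, the `knownTerms` set and the dynamic-programming approach are all assumptions chosen for illustration. It only reproduces the stated rule that a split is applied when every part exists in the index.

```javascript
// Hypothetical set of terms taken from the search index.
const knownTerms = new Set(["笔刷", "预设", "介绍"]);

// Try to split `term` into pieces that all exist in `terms`.
// Returns an array of pieces, or null when no complete split exists.
function splitByIndex(term, terms) {
  const chars = Array.from(term); // iterate by code point, not UTF-16 unit
  // best[i] holds a split of the first i characters, or null if there is none.
  const best = new Array(chars.length + 1).fill(null);
  best[0] = [];
  for (let end = 1; end <= chars.length; end++) {
    for (let start = 0; start < end; start++) {
      if (best[start] === null) continue;
      const piece = chars.slice(start, end).join("");
      if (terms.has(piece)) {
        best[end] = best[start].concat(piece);
        break; // smallest start wins, i.e. the longest final piece
      }
    }
  }
  return best[chars.length];
}

console.log(splitByIndex("笔刷预设介绍", knownTerms)); // [ '笔刷', '预设', '介绍' ]
console.log(splitByIndex("笔刷预设道路", knownTerms)); // null, "道路" is not indexed
```

A `null` result corresponds to the case where the split is not applied and the original, unsplit term is searched as before.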
Git commit 2d63f23f4b21a182812fda8755ca94a58e661c01 by Alvin Wong.
Committed on 25/08/2021 at 14:19.
Pushed by alvinwong into branch 'krita/5.0'.

Try to split search term into smaller parts for zh and ja

When sphinx generates the search index, terms get split into the smallest
logical parts. For example, "笔刷预设介绍" will be split into three
individual terms: "笔刷", "预设" and "介绍". The search page JavaScript does
not know how to do segmentation (it wouldn't be feasible anyway, since it
needs a dictionary). Therefore we add extra logic here that attempts to
further split the search terms according to the terms available in the
search index, to make the search function more useful for Chinese and
Japanese.

This logic requires every part of the search term to be an existing term in
the index, as in "笔刷预设介绍". If the search term is instead "笔刷预设道路"
and "道路" does not exist in the index, this logic will not apply the split
and the search will yield no results.

(cherry picked from commit f67b09964d2ba0547bdfa5e73fa11ddcb0c98fe6)

M  +52   -0    theme/static/searchtools.js_t

https://invent.kde.org/documentation/docs-krita-org/commit/2d63f23f4b21a182812fda8755ca94a58e661c01
Git commit f1068941ba2b3918630b2e03f99bdb92864b683c by Alvin Wong.
Committed on 25/08/2021 at 14:41.
Pushed by alvinwong into branch 'master'.

Add the split search terms (zh & ja) to be highlighted

M  +4    -2    theme/static/searchtools.js_t

https://invent.kde.org/documentation/docs-krita-org/commit/f1068941ba2b3918630b2e03f99bdb92864b683c
Git commit f8dfe32226a6258a916c3dbc600928001ac96a3e by Alvin Wong.
Committed on 25/08/2021 at 14:43.
Pushed by alvinwong into branch 'krita/5.0'.

Add the split search terms (zh & ja) to be highlighted

(cherry picked from commit f1068941ba2b3918630b2e03f99bdb92864b683c)

M  +4    -2    theme/static/searchtools.js_t

https://invent.kde.org/documentation/docs-krita-org/commit/f8dfe32226a6258a916c3dbc600928001ac96a3e