Summary: | Search box on docs.krita.org does not work with Chinese search terms | ||
---|---|---|---|
Product: | [Applications] krita | Reporter: | Tyson Tan <tysontanx> |
Component: | Documentation | Assignee: | Krita Bugs <krita-bugs-null> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alvin, griffinvalley, scottpetrovic |
Priority: | NOR | ||
Version First Reported In: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | Working Chinese search |
Description
Tyson Tan
2020-03-28 01:29:55 UTC
It appears that Sphinx has issues supporting CJK (Chinese/Japanese/Korean) phrase search:

The CJK Phrase Search discussion provides how-tos for at least Chinese: http://sphinxsearch.com/forum/view.html?id=11148
Sphinx's documentation about its RLP function: http://sphinxsearch.com/docs/current.html#conf-phrase-boundary
The bug report about RLP not supporting Japanese and Korean: http://sphinxsearch.com/bugs/view.php?id=1673

For the time being, I suggest adding a hint on the website to let CJK users know that they should use search engines like Google or DuckDuckGo instead.

I wonder what version we are building the docs site with. It does seem like in late 2017 there was a fix to Sphinx that would get this working?
https://github.com/sphinx-doc/sphinx/pull/4171/commits/44489029776b587ac1494df31d382cf8e595f2fa

Maybe we just need to use a newer version of Sphinx for the build.

(In reply to 2wxsy58236r3 from comment #2)
> For the time being, I suggest adding a hint in the website to let CJK users
> know that they should use search engines like Google or DuckDuckGo instead.

I've already added such a sentence to the top page in the translation today. But it will take another build of the KDE Chinese project, then another update of the English version, before the other translations are synchronized as well, which can take months if the timing is unfortunate.

China has blocked Google and DDG and basically everything except Bing, so we can only use Bing to search international websites (and Krita.org is considered international). Chinese search providers don't index Krita.org, or give it a very low rank.

(In reply to Scott Petrovic from comment #3)
> I wonder what version we are building the docs site with. It does seem like
> in late 2017 there was a fix to Sphinx that would get this working?
> https://github.com/sphinx-doc/sphinx/pull/4171/commits/44489029776b587ac1494df31d382cf8e595f2fa
>
> Maybe we just need to use a newer version of Sphinx for the build

I hope so, or it could also be some arguments in the Sphinx configuration that we need to change.

This seems to stem from a problem with Sphinx's HTML search: https://github.com/sphinx-doc/sphinx/issues/1918

Some Sphinx sites use Elasticsearch for this, but I have no idea how exactly they're integrating that...

Yeah, that feels like a legitimate cause for such a problem. I hope we can find a way soon.

I did notice the English search to be kind of weird too. It's not returning everything, but it's not as bad as the CJK versions, which return nothing.

People also tend to "not see" the User Manual after landing on a page. This is caused by how the left column is structured: the User Manual link looks like a title for the whole Content column once you land on an actual page. Plus, some information is hidden under page titles where you wouldn't expect it, making it very difficult to discover without a proper search function.

I have to act as a human index for the time being. I have also put some crucial information on the Chinese equivalents of Quora and Reddit, where it can be properly indexed and gets high priority in the local search engines.

Created attachment 139951: Working Chinese search

Sphinx does have Chinese search support. The catch is that it relies on the `jieba` library [1] [2] to perform Chinese word segmentation, and that library has not been installed in the build environment. [3]

[1]: https://github.com/sphinx-doc/sphinx/blob/b09acabf0010ca95bab6f89012bb0e367cc1248e/sphinx/search/zh.py#L19
[2]: https://pypi.org/project/jieba/
[3]: https://invent.kde.org/sysadmin/ci-tooling/-/blob/master/system-images/static-websites/Dockerfile#L37

P.S.:

(In reply to Tyson Tan from comment #1)
> [...]

Mind that sphinxsearch is completely unrelated to sphinx-doc.

Ah! I'll install jieba tonight, and if I can confirm that it works, I'll make a sysadmin ticket for it.

Yeah, this seems to work. Made a sysadmin ticket.

Thank you guys! I think the issue is now solved! :D

Since it's now broken again, please allow me to reopen this bug.

See: https://phabricator.kde.org/T14693

Everything should be working now. Thanks!

One thing to note: the English version shows the highlighted keyword in the extracted text of the result list. The Chinese version only shows metadata for some reason. But it's already much more useful than it was before.

(In reply to Tyson Tan from comment #13)
> The English version shows highlighted keyword in extracted text of the
> result list. The Chinese version only shows metadata for some reason.

That is caused by Sphinx using the untranslated sources for the search result... I suppose you may open another bug for this and assign me to it, and I might investigate further some time in the future.

Thanks! I've reported it as Bug 439989 and assigned you to it.

Actually, I don't think the current search box is working properly for CJK languages yet. For example, it can search "笔刷" and return some results, but searching "笔刷预设" returns nothing. There must be some issue with the word-dividing logic.

Shall we reopen this again, or do you think this is a different bug?

The search function does work for some of the terms, so I would consider it a separate issue for housekeeping purposes.

Opened bug 440246.
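
For anyone reading this later and wondering why a missing word segmentation library would make Chinese search return nothing, here is a minimal sketch in plain Python. It is not Sphinx's or jieba's actual code: the vocabulary, the sample documents, and all function names are made up for illustration, and the greedy dictionary matcher is only a crude stand-in for a real segmenter. The point is just that Chinese text has no spaces between words, so a tokenizer that splits on word boundaries indexes a whole phrase as one token, and shorter query terms never match it.

```python
import re

# Hypothetical toy vocabulary standing in for a real segmentation dictionary.
VOCAB = {"笔刷", "预设", "使用", "图层"}

def split_naive(text):
    """Split into runs of word characters (fine for space-separated languages)."""
    return re.findall(r"\w+", text)

def split_segmented(text, vocab=VOCAB, max_len=4):
    """Greedy forward maximum matching against a vocabulary.

    A crude stand-in for a real Chinese segmenter such as jieba.
    """
    tokens = []
    for run in re.findall(r"\w+", text):
        i = 0
        while i < len(run):
            # Try the longest candidate first; fall back to a single character.
            for size in range(min(max_len, len(run) - i), 0, -1):
                piece = run[i:i + size]
                if size == 1 or piece in vocab:
                    tokens.append(piece)
                    i += size
                    break
    return tokens

def build_index(docs, splitter):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in splitter(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def search(index, query, splitter):
    """Return the ids of documents that contain every token of the query."""
    hits = [index.get(token, set()) for token in splitter(query)]
    return set.intersection(*hits) if hits else set()

# Hypothetical sample pages (no spaces between words, as in real Chinese text).
DOCS = {"brush_presets": "使用笔刷预设", "layers": "使用图层"}

naive_index = build_index(DOCS, split_naive)
segmented_index = build_index(DOCS, split_segmented)

print(search(naive_index, "笔刷", split_naive))              # no match
print(search(segmented_index, "笔刷", split_segmented))      # finds brush_presets
print(search(segmented_index, "笔刷预设", split_segmented))  # finds brush_presets
```

In this toy setup the multi-word query only matches because the index and the query go through the same splitter. That is a property of the sketch and says nothing definite about what happens on docs.krita.org.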