Bug 419321 - Search box on docs.krita.org does not work with Chinese search terms
Status: RESOLVED FIXED
Alias: None
Product: krita
Classification: Applications
Component: Documentation (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Krita Bugs
Reported: 2020-03-28 01:29 UTC by Tyson Tan
Modified: 2021-07-25 06:06 UTC

Attachments
Working Chinese search (106.47 KB, image/png)
2021-07-08 13:26 UTC, Alvin Wong

Description Tyson Tan 2020-03-28 01:29:55 UTC
The search box on the Chinese version of docs.krita.org is not working. In fact, it has probably never worked. Whenever the search term is Chinese, it returns no matches, even though many pages have in fact been fully translated.

For example, if we search for the Chinese word for brush, 笔刷, it returns nothing. But if we search for the English word "brush" on the Chinese version, we still get the untranslated pages containing the word "brush".

This issue greatly impacts the usefulness of Krita's documentation for Chinese users. The translation effort does not reach many of the end users who search the website for help. Whether there is something I can do as a translator, or whether it is in fact a Sphinx configuration issue, any help is greatly appreciated.
Comment 1 Tyson Tan 2020-03-28 01:46:02 UTC
It appears that Sphinx has issues supporting CJK (Chinese/Japanese/Korean) phrase search:

A CJK phrase search discussion that provides how-tos for at least Chinese:
http://sphinxsearch.com/forum/view.html?id=11148

Sphinx's documentation about its RLP function:
http://sphinxsearch.com/docs/current.html#conf-phrase-boundary

The bug report about RLP not supporting Japanese and Korean:
http://sphinxsearch.com/bugs/view.php?id=1673
Comment 2 2wxsy58236r3 2020-03-28 13:53:16 UTC
For the time being, I suggest adding a hint in the website to let CJK users know that they should use search engines like Google or DuckDuckGo instead.
Comment 3 Scott Petrovic 2020-03-28 14:24:26 UTC
I wonder what version we are building the docs site with. It does seem like in late 2017 there was a fix to Sphinx that would get this working?
https://github.com/sphinx-doc/sphinx/pull/4171/commits/44489029776b587ac1494df31d382cf8e595f2fa

Maybe we just need to use a newer version of Sphinx for the build
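
A minimal sketch for checking which Sphinx version a build environment actually has, assuming Python 3.8+ and that Sphinx is installed under the distribution name `sphinx`. The helper name `installed_version` is made up for illustration and is not part of the docs build scripts:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints e.g. "Sphinx: <version>" or "Sphinx: None" if Sphinx is missing.
    print("Sphinx:", installed_version("sphinx"))
```

Running this inside the same environment (or container image) that builds the docs would show whether the build predates the fix linked above.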
Comment 4 Tyson Tan 2020-03-29 13:03:15 UTC
(In reply to 2wxsy58236r3 from comment #2)
> For the time being, I suggest adding a hint in the website to let CJK users
> know that they should use search engines like Google or DuckDuckGo instead.

I already added such a sentence to the top page of the translation today. But it will take another build of the KDE Chinese project, and then another update of the English version, before the other translations are synchronized as well, which can take months if the timing is unfortunate. China has blocked Google, DDG, and basically everything except Bing, so we can only use Bing to search international websites (Krita.org counts as one). Chinese search providers either don't index Krita.org or give it a very low rank.

(In reply to Scott Petrovic from comment #3)
> I wonder what version we are building the docs site with. It does seem like
> in late 2017 there was a fix to Sphinx that would get this working?
> https://github.com/sphinx-doc/sphinx/pull/4171/commits/
> 44489029776b587ac1494df31d382cf8e595f2fa
> 
> Maybe we just need to use a newer version of Sphinx for the build

I hope so. Or it could be some options in the Sphinx configuration that we need to change.
Comment 5 wolthera 2020-08-29 15:31:24 UTC
This seems to stem from a problem with Sphinx's HTML search: https://github.com/sphinx-doc/sphinx/issues/1918

Some Sphinx sites use Elasticsearch for this, but I have no idea how exactly they integrate it...
Comment 6 Tyson Tan 2020-08-31 01:56:30 UTC
Yeah, that feels like a plausible cause for such a problem. I hope we can find a fix soon. I did notice that the English search is kind of weird too: it doesn't return everything, though it's not as bad as the CJK versions, which return nothing.

People also tend to overlook the User Manual after landing on a page. This is caused by how the left column is structured: the User Manual link looks like a title for the whole Content column once you land on an actual page. Plus, some information is hidden under page titles where you wouldn't expect it, making it very difficult to discover without a proper search function.

I have to act as a human index for the time being. I have also put some crucial information on the Chinese equivalents of Quora and Reddit, where it can be properly indexed and ranks highly in the local search engines.
Comment 7 Alvin Wong 2021-07-08 13:26:04 UTC
Created attachment 139951
Working Chinese search

Sphinx does have Chinese search support. The catch is that it relies on the `jieba` library [1] [2] to perform Chinese word segmentation, and `jieba` has not been installed in the build environment. [3]

[1]: https://github.com/sphinx-doc/sphinx/blob/b09acabf0010ca95bab6f89012bb0e367cc1248e/sphinx/search/zh.py#L19
[2]: https://pypi.org/project/jieba/
[3]: https://invent.kde.org/sysadmin/ci-tooling/-/blob/master/system-images/static-websites/Dockerfile#L37
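
A minimal sketch of that dependency, for checking a build environment by hand. The function names `jieba_available` and `segment_query` are hypothetical, and falling back to the unsegmented text when `jieba` is missing is a simplification for illustration, not a copy of what Sphinx does. It assumes `jieba.cut_for_search`, which to my understanding is the call used in the linked `zh.py`:

```python
import importlib.util
from typing import List


def jieba_available() -> bool:
    """Return True if the `jieba` package can be imported in this environment."""
    return importlib.util.find_spec("jieba") is not None


def segment_query(text: str) -> List[str]:
    """Split `text` into search terms with jieba if present, else return it whole."""
    if jieba_available():
        import jieba  # imported lazily so the script also runs without jieba

        return [word for word in jieba.cut_for_search(text) if word.strip()]
    # Without a segmenter, a run of Chinese text stays a single unsplit term.
    return [text]


if __name__ == "__main__":
    print("jieba installed:", jieba_available())
    print("terms:", segment_query("笔刷"))
```

If the first line prints `False` in the build environment, installing `jieba` there (for example with `pip install jieba`) is the change being proposed here.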


P.S.:

(In reply to Tyson Tan from comment #1)
> [...]

Note that sphinxsearch (the Sphinx search server) is completely unrelated to sphinx-doc (the Sphinx documentation generator).
Comment 8 wolthera 2021-07-08 13:28:35 UTC
Ah! I'll install jieba tonight and if I can confirm that works, I'll make a sysadmin ticket for it.
Comment 9 wolthera 2021-07-09 11:05:01 UTC
Yeah, this seems to work. Made a sysadmin ticket.
Comment 10 Tyson Tan 2021-07-10 03:01:14 UTC
Thank you guys! I think the issue is now solved! :D
Comment 11 Tyson Tan 2021-07-11 13:35:38 UTC
Since it's now broken again, please allow me to reopen this bug.

See:
https://phabricator.kde.org/T14693
Comment 12 Alvin Wong 2021-07-17 07:19:59 UTC
Everything should be working now.
Comment 13 Tyson Tan 2021-07-17 07:25:36 UTC
Thanks!

One thing to note:
The English version shows highlighted keyword in extracted text of the result list. The Chinese version only shows metadata for some reason.

But it's already much more useful compared to what it was before.
Comment 14 Alvin Wong 2021-07-17 07:30:43 UTC
(In reply to Tyson Tan from comment #13)
> The English version shows highlighted keyword in extracted text of the
> result list. The Chinese version only shows metadata for some reason.

That is caused by Sphinx using the untranslated sources for the search results... I suppose you could open another bug for this and assign it to me, and I might investigate further at some point in the future.
Comment 15 Tyson Tan 2021-07-18 00:57:39 UTC
Thanks! I've reported it as Bug 439989 and assigned you to it.
Comment 16 Tyson Tan 2021-07-24 18:42:09 UTC
Actually, I don't think the current search box is working properly for CJK languages yet. For example, it can search for "笔刷" and return some results, but searching for "笔刷预设" returns nothing. There must be some issue with the word segmentation logic. Shall we reopen this bug again, or do you think this is a different bug?
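
To illustrate one way a segmentation mismatch could produce this symptom, here is a toy sketch. It is not a confirmed diagnosis and not Sphinx's actual search code: the index contents, page names, and the `search` helper are all hypothetical. It assumes the index stores individual segmented words while the query arrives as one unsplit string:

```python
from typing import Dict, List, Set

# Hypothetical toy index: each segmented word maps to the pages containing it.
INDEX: Dict[str, Set[str]] = {
    "笔刷": {"brushes.html", "brush_presets.html"},
    "预设": {"brush_presets.html", "presets.html"},
}


def search(terms: List[str]) -> Set[str]:
    """Return the pages that contain every term; unknown terms match nothing."""
    if not terms:
        return set()
    per_term = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*per_term)


if __name__ == "__main__":
    # The unsplit compound is not a key in the index, so nothing matches.
    print(search(["笔刷预设"]))
    # Split the same way as the index, the query finds the shared page.
    print(search(["笔刷", "预设"]))
```

In this toy setup the single word is found because it happens to be an index key, while the compound only matches once it is split the same way the index was built.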
Comment 17 Alvin Wong 2021-07-25 06:06:25 UTC
The search function does work for some terms, so I would consider it a separate issue for housekeeping purposes. Opened bug 440246.