Bug 464439

Summary: Chinese characters are wrongly separated with spaces using speech to text
Product: [Applications] kdenlive Reporter: sunruikang2000
Component: Video Effects & Transitions Assignee: Jean-Baptiste Mardelle <jb>
Status: CONFIRMED ---    
Severity: normal CC: erjiang
Priority: NOR    
Version First Reported In: 22.12.1   
Target Milestone: ---   
Platform: Microsoft Windows   
OS: Microsoft Windows   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description sunruikang2000 2023-01-18 04:45:55 UTC
When I use speech to text to recognize Chinese, the Chinese characters are wrongly separated by spaces. I think it may be an issue with the CJK word recognition engine. CJK text should not have spaces between words.

It looks like this:
希望 大家 明白
hope (wrong space) everyone (wrong space) understand

It should be like this:
希望大家明白
hope (no space) everyone (no space) understand
Comment 1 erjiang 2023-01-25 02:16:59 UTC
I think I’ve noticed this too in the past. My guess is that it’s just what vosk (the speech recognizer) outputs, but maybe we can just detect whether the language is Chinese and remove the spaces in Kdenlive. A workaround is to edit the subtitle file in a text editor and remove the spaces.
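
A minimal sketch of the "remove the spaces" idea, for illustration only. It is not Kdenlive's or vosk's actual code: the function name `remove_cjk_spaces` and the choice of Unicode ranges are assumptions. It drops whitespace only when both neighbours are CJK characters, so text in other languages is left alone.

```python
import re

# Assumed set of "CJK" code points: CJK punctuation, Han ideographs
# (Extension A and the basic block) and fullwidth forms. Other ranges
# (e.g. kana, hangul) are deliberately not covered in this sketch.
_CJK = r"\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef"

# Spaces or tabs that have a CJK character on both sides.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def remove_cjk_spaces(text: str) -> str:
    """Remove whitespace that sits between two CJK characters."""
    return _SPACE_BETWEEN_CJK.sub("", text)


if __name__ == "__main__":
    print(remove_cjk_spaces("希望 大家 明白"))
    print(remove_cjk_spaces("hello world"))
```

With the example from the description, `希望 大家 明白` becomes `希望大家明白`, while `hello world` keeps its space.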
Comment 2 sunruikang2000 2023-04-26 06:58:37 UTC
(In reply to erjiang from comment #1)
> I think i’ve noticed this too in the past. My guess is that it’s just what
> vosk (the speech recognizer) outputs, but maybe we can just detect if the
> language is Chinese and remove the spaces in Kdenlive. A workaround is to
> edit the subtitle file in a text editor and remove the spaces.

Yes, using a text editor is a temporary workaround. But it is still a little difficult, because users may accidentally break the ".srt" format while editing.
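
To reduce the risk of damaging the ".srt" structure by hand, the cleanup could be scripted. The following is a rough sketch under assumptions, not an official tool: `fix_srt` is a hypothetical helper, and it assumes the usual SubRip layout (a numeric index line, a timing line containing `-->`, then text lines). Index, timing and blank lines are copied unchanged, and only text lines are edited.

```python
import re

# Assumed CJK ranges: CJK punctuation, Han ideographs, fullwidth forms.
_CJK = r"\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef"
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def fix_srt(content: str) -> str:
    """Remove spaces between CJK characters in subtitle text lines only."""
    fixed = []
    for line in content.splitlines(keepends=True):
        stripped = line.strip()
        # Leave cue numbers, timing lines and blank separators untouched.
        if not stripped or stripped.isdigit() or "-->" in line:
            fixed.append(line)
        else:
            fixed.append(_SPACE_BETWEEN_CJK.sub("", line))
    return "".join(fixed)


if __name__ == "__main__":
    sample = (
        "1\n"
        "00:00:01,000 --> 00:00:02,500\n"
        "希望 大家 明白\n"
    )
    print(fix_srt(sample), end="")
```

For a real file, the content could be read with `open(path, encoding="utf-8")`, passed through `fix_srt`, and written to a new file so the original subtitle stays available as a backup.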