The VOSK API-compatible models used for speech-recognition subtitles are not bad, but the Whisper models are better and detect punctuation automatically. Because VOSK does not detect punctuation, its output is unreadable: you cannot tell where sentences begin and end. A temporary workaround is to run the recasepunc model separately. However, it puts a full stop at the end of every subtitle line rather than where the sentence actually ends, which may be four subtitles later. As a result, every subtitle has to be edited by hand to remove the excess full stops. I hope to see Whisper support soon.
Git commit 7c1936bb44b592eaf3174f559fbb97cf61cc2bcc by Jean-Baptiste Mardelle.
Committed on 13/03/2023 at 11:54.
Pushed by mardelle into branch 'release/23.04'.

Add support for whisper speech recognition engine for automated subtitling

M  +3    -0    data/scripts/CMakeLists.txt
A  +14   -0    data/scripts/checkgpu.py
A  +65   -0    data/scripts/whispertosrt.py
A  +63   -0    data/scripts/whispertotext.py
M  +2    -0    src/core.h
M  +79   -2    src/dialogs/kdenlivesettingsdialog.cpp
M  +1    -1    src/dialogs/kdenlivesettingsdialog.h
M  +87   -27   src/dialogs/speechdialog.cpp
M  +1    -0    src/dialogs/speechdialog.h
M  +196  -36   src/dialogs/textbasededit.cpp
M  +4    -0    src/dialogs/textbasededit.h
M  +20   -0    src/kdenlivesettings.kcfg
M  +23   -1    src/pythoninterfaces/abstractpythoninterface.cpp
M  +4    -2    src/pythoninterfaces/abstractpythoninterface.h
M  +163  -6    src/pythoninterfaces/speechtotext.cpp
M  +9    -3    src/pythoninterfaces/speechtotext.h
M  +204  -92   src/ui/configspeech_ui.ui
M  +104  -58   src/ui/speechdialog_ui.ui
M  +24   -2    src/ui/textbasededit_ui.ui

https://invent.kde.org/multimedia/kdenlive/commit/7c1936bb44b592eaf3174f559fbb97cf61cc2bcc
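The commit adds a data/scripts/whispertosrt.py helper. Its contents are not shown here, so the following is only a minimal sketch of what a Whisper-to-SRT conversion generally involves. It is not Kdenlive's code. It assumes segments shaped like the "segments" list returned by openai-whisper's transcribe(), which are dicts with "start" and "end" (in seconds) and "text". The function names format_timestamp and segments_to_srt are hypothetical. Because Whisper's segment text already carries punctuation, each subtitle keeps the sentence boundaries the model detected, and no separate recasepunc pass is needed.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from Whisper-style segments.

    Assumption: each segment is a dict with "start", "end" and "text" keys,
    as in the "segments" list returned by openai-whisper's transcribe().
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper usually prefixes segment text with a space; strip it.
        text = seg["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)
```

With openai-whisper installed, a usage might look like `result = whisper.load_model("base").transcribe("clip.wav")` followed by `segments_to_srt(result["segments"])`. The model name and file path here are placeholders.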