Bug 467172

Summary: Add ability to use Whisper models for speech recognition
Product: [Applications] kdenlive
Reporter: calibre705
Component: Video Effects & Transitions
Assignee: Jean-Baptiste Mardelle <jb>
Status: RESOLVED FIXED    
Severity: wishlist    
Priority: NOR    
Version First Reported In: 22.12.3   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   

Description calibre705 2023-03-10 21:12:26 UTC
The VOSK-API-compatible models used for speech recognition subtitles are not bad, but the Whisper models are better and detect punctuation automatically. Because VOSK does not detect punctuation, it produces unreadable sentences where you cannot tell where each one begins and ends.

A temporary workaround is to run the recasepunc model separately, but it puts a full stop at the end of every subtitle line rather than where the sentence actually ends (which may be four subtitles later), so every subtitle has to be edited to remove the excess full stops.

I hope to see Whisper support soon.
Comment 1 Jean-Baptiste Mardelle 2023-03-15 07:36:52 UTC
Git commit 7c1936bb44b592eaf3174f559fbb97cf61cc2bcc by Jean-Baptiste Mardelle.
Committed on 13/03/2023 at 11:54.
Pushed by mardelle into branch 'release/23.04'.

Add support for whisper speech recognition engine for automated subtitling

M  +3    -0    data/scripts/CMakeLists.txt
A  +14   -0    data/scripts/checkgpu.py
A  +65   -0    data/scripts/whispertosrt.py
A  +63   -0    data/scripts/whispertotext.py
M  +2    -0    src/core.h
M  +79   -2    src/dialogs/kdenlivesettingsdialog.cpp
M  +1    -1    src/dialogs/kdenlivesettingsdialog.h
M  +87   -27   src/dialogs/speechdialog.cpp
M  +1    -0    src/dialogs/speechdialog.h
M  +196  -36   src/dialogs/textbasededit.cpp
M  +4    -0    src/dialogs/textbasededit.h
M  +20   -0    src/kdenlivesettings.kcfg
M  +23   -1    src/pythoninterfaces/abstractpythoninterface.cpp
M  +4    -2    src/pythoninterfaces/abstractpythoninterface.h
M  +163  -6    src/pythoninterfaces/speechtotext.cpp
M  +9    -3    src/pythoninterfaces/speechtotext.h
M  +204  -92   src/ui/configspeech_ui.ui
M  +104  -58   src/ui/speechdialog_ui.ui
M  +24   -2    src/ui/textbasededit_ui.ui

https://invent.kde.org/multimedia/kdenlive/commit/7c1936bb44b592eaf3174f559fbb97cf61cc2bcc
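The commit adds data/scripts/whispertosrt.py, whose name suggests it turns Whisper output into SRT subtitles; its contents are not shown here. The following is a minimal, independent sketch of that kind of conversion, not the Kdenlive script. It assumes the segment format returned by the openai-whisper package, where `transcribe()` returns a dict whose `"segments"` list holds entries with `"start"` and `"end"` times in seconds and a `"text"` string. The function names `format_timestamp` and `segments_to_srt` are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (assumed dicts with start/end/text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text usually begins with a space; strip it for display.
        text = seg["text"].strip()
        # Each cue: 1-based index, time range, then the subtitle text.
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 4.0, "text": " How are you?"},
    ]
    print(segments_to_srt(demo))
```

With openai-whisper installed, the input could come from something like `whisper.load_model("base").transcribe("audio.wav")["segments"]`. Because the segment text already carries the punctuation Whisper produced, each cue keeps that punctuation as-is instead of having a full stop appended to every line, which is the recasepunc problem described in the report.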