Bug 467172 - Add ability to use Whisper models for speech recognition
Summary: Add ability to use Whisper models for speech recognition
Status: RESOLVED FIXED
Alias: None
Product: kdenlive
Classification: Applications
Component: Video Effects & Transitions
Version: 22.12.3
Platform: Arch Linux Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Jean-Baptiste Mardelle
URL:
Keywords:
Depends on:
Blocks:
Reported: 2023-03-10 21:12 UTC by calibre705
Modified: 2023-03-15 07:36 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description calibre705 2023-03-10 21:12:26 UTC
The VOSK API-compatible models used for speech-recognition subtitles are not bad, but the Whisper models are better and detect punctuation automatically. Because VOSK does not detect punctuation, it produces unreadable sentences: you cannot tell where one begins and where it ends.

A temporary workaround is to run the recasepunc model separately, but it puts a full stop at the end of every line of text rather than where the sentence actually ends (for example, four subtitles later). As a result, every subtitle has to be edited by hand to remove the extra full stops.

I hope to see Whisper support soon.
Comment 1 Jean-Baptiste Mardelle 2023-03-15 07:36:52 UTC
Git commit 7c1936bb44b592eaf3174f559fbb97cf61cc2bcc by Jean-Baptiste Mardelle.
Committed on 13/03/2023 at 11:54.
Pushed by mardelle into branch 'release/23.04'.

Add support for whisper speech recognition engine for automated subtitling

M  +3    -0    data/scripts/CMakeLists.txt
A  +14   -0    data/scripts/checkgpu.py
A  +65   -0    data/scripts/whispertosrt.py
A  +63   -0    data/scripts/whispertotext.py
M  +2    -0    src/core.h
M  +79   -2    src/dialogs/kdenlivesettingsdialog.cpp
M  +1    -1    src/dialogs/kdenlivesettingsdialog.h
M  +87   -27   src/dialogs/speechdialog.cpp
M  +1    -0    src/dialogs/speechdialog.h
M  +196  -36   src/dialogs/textbasededit.cpp
M  +4    -0    src/dialogs/textbasededit.h
M  +20   -0    src/kdenlivesettings.kcfg
M  +23   -1    src/pythoninterfaces/abstractpythoninterface.cpp
M  +4    -2    src/pythoninterfaces/abstractpythoninterface.h
M  +163  -6    src/pythoninterfaces/speechtotext.cpp
M  +9    -3    src/pythoninterfaces/speechtotext.h
M  +204  -92   src/ui/configspeech_ui.ui
M  +104  -58   src/ui/speechdialog_ui.ui
M  +24   -2    src/ui/textbasededit_ui.ui

https://invent.kde.org/multimedia/kdenlive/commit/7c1936bb44b592eaf3174f559fbb97cf61cc2bcc
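The commit adds a data/scripts/whispertosrt.py script, but its contents are not shown here. The sketch below is not that script. It illustrates, under stated assumptions, the kind of conversion such a script performs: turning Whisper's timed, punctuated segments into SRT subtitles. It assumes the segments have the shape returned by the `openai-whisper` package's `transcribe()` (a list of dicts with `start`, `end` and `text` keys). The helper names `format_timestamp` and `segments_to_srt` are hypothetical.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from Whisper-style segments (assumed keys: 'start', 'end', 'text')."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text usually carries a leading space; strip it.
        # Punctuation stays wherever the model put it, so no full stop is forced per line.
        text = seg["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates subtitle blocks with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical sample data. Real input might come from something like
    # whisper.load_model("base").transcribe("audio.wav")["segments"].
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.04, "text": " This sentence continues"},
    ]
    print(segments_to_srt(sample))
```

Because punctuation comes from the model's own output, a sentence that spans several subtitles keeps a single full stop at its real end. This is the behaviour the recasepunc workaround above could not provide.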