467172 – Add ability to use Whisper models for speech recognition

Status:	RESOLVED FIXED

Alias:	None

Product:	kdenlive
Classification:	Applications
Component:	Video Effects & Transitions
Version:	22.12.3
Platform:	Arch Linux Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Jean-Baptiste Mardelle

Reported:	2023-03-10 21:12 UTC by calibre705
Modified:	2023-03-15 07:36 UTC
CC List:	0 users

Latest Commit:	https://invent.kde.org/multimedia/kdenlive/commit/7c1936bb44b592eaf3174f559fbb97cf61cc2bcc

Description calibre705 2023-03-10 21:12:26 UTC

The VOSK API-compatible models used for speech-recognition subtitles are not bad, but the Whisper models are better and detect punctuation automatically. Because VOSK does not detect punctuation, it produces unreadable text in which you cannot tell where sentences begin and end.

A temporary workaround is to run the recasepunc model separately, but it puts a full stop at the end of every line of text rather than where the sentence actually ends, which may be 4 subtitles later. Every subtitle then has to be edited to remove the excess full stops.

I hope to see Whisper support soon.

Comment 1 Jean-Baptiste Mardelle 2023-03-15 07:36:52 UTC

Git commit 7c1936bb44b592eaf3174f559fbb97cf61cc2bcc by Jean-Baptiste Mardelle.
Committed on 13/03/2023 at 11:54.
Pushed by mardelle into branch 'release/23.04'.

Add support for whisper speech recognition engine for automated subtitling

M  +3    -0    data/scripts/CMakeLists.txt
A  +14   -0    data/scripts/checkgpu.py
A  +65   -0    data/scripts/whispertosrt.py
A  +63   -0    data/scripts/whispertotext.py
M  +2    -0    src/core.h
M  +79   -2    src/dialogs/kdenlivesettingsdialog.cpp
M  +1    -1    src/dialogs/kdenlivesettingsdialog.h
M  +87   -27   src/dialogs/speechdialog.cpp
M  +1    -0    src/dialogs/speechdialog.h
M  +196  -36   src/dialogs/textbasededit.cpp
M  +4    -0    src/dialogs/textbasededit.h
M  +20   -0    src/kdenlivesettings.kcfg
M  +23   -1    src/pythoninterfaces/abstractpythoninterface.cpp
M  +4    -2    src/pythoninterfaces/abstractpythoninterface.h
M  +163  -6    src/pythoninterfaces/speechtotext.cpp
M  +9    -3    src/pythoninterfaces/speechtotext.h
M  +204  -92   src/ui/configspeech_ui.ui
M  +104  -58   src/ui/speechdialog_ui.ui
M  +24   -2    src/ui/textbasededit_ui.ui

https://invent.kde.org/multimedia/kdenlive/commit/7c1936bb44b592eaf3174f559fbb97cf61cc2bcc
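
For readers wondering what a script such as data/scripts/whispertosrt.py has to do, here is a minimal sketch. It is not the code from the commit above. It assumes the transcription step yields a list of segments, each a dict with "start" and "end" times in seconds and a "text" string, which is the shape the openai-whisper Python package returns under the "segments" key of its transcribe() result. The function names format_timestamp and segments_to_srt are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: turn Whisper-style segments into SRT subtitle text.
# Not the Kdenlive implementation; names and structure are assumptions.

def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text usually carries a leading space; strip it.
        text = seg["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


# With openai-whisper installed, the segments could be obtained like this
# (assumption, shown as a comment so the sketch runs without the package):
#   import whisper
#   segments = whisper.load_model("base").transcribe("clip.wav")["segments"]
```

Because Whisper emits punctuation in the segment text itself, no separate recasepunc pass is needed before writing the subtitles.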