Created attachment 157420 [details]
The log given by Kdenlive upon trying to compute using Whisper on the GPU.

SUMMARY
When trying to use my GTX 1660S GPU for Whisper voice recognition (under Subtitles), PyTorch crashes and no subtitles are produced.

STEPS TO REPRODUCE (requires experimental build 526 of the AppImage / Version 23.07.70 (rev. b7fd236cd))
1. Add speech to an audio track
2. Go to Settings -> Configure Kdenlive -> Speech To Text and select Whisper as the Speech Engine
3. Set the device to your GPU (GTX 1660S or other GTX 16XX series cards) (may require a restart before it shows up)
4. Apply and OK the window
5. Select the audio track
6. Go to Project -> Subtitles -> Speech Recognition
7. Use any combination of settings in the dialog
8. Click on "Show log" when it says it has finished; the log shows that it crashed and nothing has been added to the timeline

OBSERVED RESULT
No subtitles added; voice recognition crashed.

EXPECTED RESULT
Subtitles added, no crash.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch (rolling)
KDE Plasma Version:
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8 (built against 5.15.8)

ADDITIONAL INFORMATION
This is a common problem on GTX 16XX series cards; I believe the options --no-half --precision=full --use-cudnn are used as a workaround when passing to PyTorch.
Searching the Web, I found a similar bug reported against Whisper: https://github.com/openai/whisper/discussions/88 It seems the workaround you mentioned, not using fp16 (half-precision), can work around the issue: maybe you can try modifying the whisper code to not use fp16 and see if that fixes it?
(In reply to erjiang from comment #1)
> Searching the Web, there's a similar bug reported here against Whisper:
> https://github.com/openai/whisper/discussions/88
>
> Seems like what you said about not using fp16 (half-precision) can work
> around the issue: maybe you can try modifying the whisper code to not use
> fp16 and see if that fixes it?

Changing data/scripts/whispertosrt.py:
line 44: result = model.transcribe(source, task=sys.argv[5], language=sys.argv[6], verbose=False, fp16=False)
line 46: result = model.transcribe(source, task=sys.argv[5], verbose=False, fp16=False)

Changing data/scripts/whispertotext.py:
line 47: result = model.transcribe(source, task=sys.argv[4], language=sys.argv[5], verbose=False, fp16=False)
line 49: result = model.transcribe(source, task=sys.argv[4], verbose=False, fp16=False)

This fixes it: GPU usage looks good and transcription is very fast. It took a fair bit of figuring out given the complete lack of documentation. However, fp16=False will be slower on non-16XX GPUs, so a possible improvement would be to detect whether the selected device is a 16XX GPU and only disable fp16 in that case. If you could make this into a commit, that would be great!
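The 16XX-detection idea above could be sketched roughly like this. The helper below is hypothetical (not from the Kdenlive source); in the real scripts the device name would come from torch.cuda.get_device_name(), but the decision logic itself is a pure function so it can be shown standalone:

```python
import re

def should_use_fp16(device_name: str) -> bool:
    """Decide whether to pass fp16=True to whisper's model.transcribe().

    GTX 16XX (Turing TU11x) cards are known to produce broken results or
    crashes with half-precision in some PyTorch builds, so fall back to
    FP32 for them and keep FP16 everywhere else.
    """
    return re.search(r"GTX 16\d\d", device_name) is None
```

In the patched scripts this would replace the hard-coded fp16=False, e.g. (assuming torch is available): result = model.transcribe(source, task=sys.argv[5], verbose=False, fp16=should_use_fp16(torch.cuda.get_device_name(0))).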
A possibly relevant merge request was started @ https://invent.kde.org/multimedia/kdenlive/-/merge_requests/399
Git commit 856fdf59a631e53aa0ce94decd5d8f921c135f28 by Jean-Baptiste Mardelle. Committed on 15/05/2023 at 11:27. Pushed by mardelle into branch 'master'. Add an option to manually disable FP16 on Whisper in settings page. M +8 -3 data/scripts/whispertotext.py M +0 -4 src/dialogs/kdenlivesettingsdialog.cpp M +6 -1 src/dialogs/speechdialog.cpp M +8 -3 src/dialogs/textbasededit.cpp M +4 -0 src/kdenlivesettings.kcfg M +19 -12 src/ui/configspeech_ui.ui https://invent.kde.org/multimedia/kdenlive/commit/856fdf59a631e53aa0ce94decd5d8f921c135f28
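The commit above threads a user-facing "disable FP16" setting through to the Python transcription scripts. A minimal sketch of how such a toggle might map onto the transcribe() keyword arguments, with all names hypothetical (the actual commit builds the call inline rather than via a helper):

```python
from typing import Optional

def transcribe_kwargs(disable_fp16: bool, task: str,
                      language: Optional[str] = None) -> dict:
    """Build keyword arguments for whisper's model.transcribe().

    Hypothetical helper: maps the Kdenlive settings toggle onto the
    fp16 parameter, and only passes a language when one was selected.
    """
    kwargs = {"task": task, "verbose": False, "fp16": not disable_fp16}
    if language:
        kwargs["language"] = language
    return kwargs
```

A caller would then do something like: result = model.transcribe(source, **transcribe_kwargs(True, "transcribe", "en")) to force FP32 on a GTX 16XX card.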