502530 – When trying to use the transcribe feature I get an error and no subtitles

Status:	REPORTED

Alias:	None

Product:	kdenlive
Classification:	Applications
Component:	Title Clips & Subtitles (other bugs)
Version First Reported In:	24.12.3
Platform:	Homebrew (macOS) macOS

Importance:	NOR normal
Target Milestone:	---
Assignee:	Jean-Baptiste Mardelle

Reported:	2025-04-07 17:39 UTC by roy432002.rd
Modified:	2025-04-07 17:43 UTC (History)
CC List:	0 users

Description roy432002.rd 2025-04-07 17:39:25 UTC

SUMMARY
I wanted to use the transcribe feature to add subtitles to my home movie, but I keep getting "No speech detected", and an error appears when I press "Show log".
I had to download the Whisper model myself because Kdenlive's built-in downloader seems to be stuck; I don't know whether this is relevant.

STEPS TO REPRODUCE
1. Go to a sequence
2. Select a clip in it
3. Press "Transcribe" in the "Speech Editor" menu

OBSERVED RESULT
"No speech detected" and the following error:

/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py:75: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
Traceback (most recent call last):
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 58, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', '/private/var/folders/_t/3_t8tnnx3cb0j7bdgw3hsdd40000gn/T/kdenlive-ZcKKVn.wav', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 183.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py", line 176, in <module>
    sys.exit(main())
  File "/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py", line 158, in main
    result = run_whisper(source, model, device, task, language)
  File "/Applications/kdenlive.app/Contents/Resources/scripts/whisper/whispertotext.py", line 140, in run_whisper
    result = loadedModel.transcribe(source, **transcribe_kwargs)
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/transcribe.py", line 133, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 140, in log_mel_spectrogram
    audio = load_audio(audio)
  File "/Users/<My Username>/Library/Application Support/kdenlive/venv/lib/python3.9/site-packages/whisper/audio.py", line 60, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.3.9.4)
  configuration: --enable-libmp3lame --cc=/usr/bin/clang --cxx=/usr/bin/clang++ --enable-libopus --enable-libvorbis --enable-libvpx --enable-libass --enable-libaom --enable-libdav1d --enable-libzimg --arch=arm64 --disable-debug --disable-doc --enable-gpl --enable-version3 --enable-nonfree --enable-openssl --disable-xlib --disable-libxcb --enable-libx264 --enable-libx265 --enable-rpath --install-name-dir='@rpath' --prefix=/Users/gitlab/ws/builds/GZwHuM5x/0/sysadmin/ci-management/macos-arm-clang --libdir=/Users/gitlab/ws/builds/GZwHuM5x/0/sysadmin/ci-management/macos-arm-clang/lib --disable-static --enable-shared
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[in#0 @ 0x60000313c200] Error opening input: Invalid data found when processing input
Error opening input file /private/var/folders/_t/3_t8tnnx3cb0j7bdgw3hsdd40000gn/T/kdenlive-ZcKKVn.wav.
Error opening input files: Invalid data found when processing input
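
The last lines of the log show ffmpeg rejecting the temporary kdenlive-*.wav file itself ("Invalid data found when processing input"), which would be consistent with an empty or malformed WAV export rather than a Whisper problem. As a rough first check, assuming a copy of such a temporary file can still be found on disk, the sketch below reads the first 12 bytes and tests for the standard RIFF/WAVE signature. The function name `inspect_wav_header` is hypothetical and not part of Kdenlive or Whisper, and a passing check does not prove the rest of the file is valid.

```python
import os
import struct


def inspect_wav_header(path):
    """Return size information if `path` starts with a RIFF/WAVE header, else None.

    Hypothetical helper for diagnosis only; it looks at the first 12 bytes
    and nothing else, so it cannot detect damage later in the file.
    """
    with open(path, "rb") as fh:
        header = fh.read(12)
    if len(header) < 12:
        # Empty or truncated file: ffmpeg would also fail to probe it.
        return None
    riff_tag, chunk_size, wave_tag = struct.unpack("<4sI4s", header)
    if riff_tag != b"RIFF" or wave_tag != b"WAVE":
        return None
    return {
        # The RIFF chunk size excludes the 8 bytes of tag and size field.
        "declared_size": chunk_size + 8,
        "actual_size": os.path.getsize(path),
    }
```

A result of None, or a declared size that differs from the actual size, would suggest that the audio export step produced a bad file before Whisper was ever involved.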

EXPECTED RESULT
Some subtitles for my clip

ADDITIONAL INFORMATION
Although I downloaded the model manually, it passes the "Check model integrity" test in the "Manage models" menu. I have also run "Check Configuration" and updated the dependencies, all from within Kdenlive.
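
Because the model was downloaded by hand, an independent checksum can help rule out a corrupted download. The sketch below computes the SHA-256 digest of a file with Python's standard `hashlib` module. It assumes a reference digest for the model is available from wherever the file was obtained, and it is not a description of what Kdenlive's "Check model integrity" does internally. The function name `sha256_of_file` is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that a multi-gigabyte model file
    does not need to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the published digest, if one exists, shows whether the manually downloaded file matches the original.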

Comment 1 roy432002.rd 2025-04-07 17:43:25 UTC

Some more information on my system:

Kdenlive: 24.12.3
Package Type: Unknown/Default
MLT: 7.30.0
Qt: 6.8.1 (built against 6.8.1 arm64-little_endian-lp64)
Frameworks: 6.11.0
System: macOS Sequoia (15.3)
Kernel: darwin 24.3.0
CPU: arm64
Windowing System: cocoa
GPU: 
Movit (GPU): disabled
Track Compositing: qtblend