Bug 512987

Summary: No Hardware Acceleration with Nvidia nvenc
Product: [Applications] kdenlive Reporter: kido <latlon>
Component: Rendering & Export Assignee: Jean-Baptiste Mardelle <jb>
Status: RESOLVED NOT A BUG    
Severity: normal CC: berndmj
Priority: NOR Keywords: triaged
Version First Reported In: 25.08.3   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: screenshots

Description kido 2025-12-05 17:40:03 UTC
Created attachment 187367
screenshots

SUMMARY

When rendering a project, Kdenlive uses little CPU and almost no GPU: CPU load is around 15%, and GPU load is 1-5%.
A test video with a duration of 1:25 is encoded at an average speed of 6 frames/sec., taking 6 min. 57 sec.
When rendering the project, I tried the following presets with hardware acceleration:

    1. NVENC H264 ABR (ab=160k acodec=aac channels=2 f=mp4 real_time=-1 threads=0 vb=6000k vcodec=h264_nvenc) 
    2. NVENC H265 ABR (ab=160k acodec=aac channels=2 f=mp4 real_time=-1 threads=0 vb=6000k vcodec=hevc_nvenc) 
    3. NVENC AV1 VBR (ab=160k acodec=aac channels=2 f=mp4 rc=constqp real_time=-1 threads=0 vcodec=av1_nvenc vq=72 vqp=72) 

But if I run the command:

ffmpeg -hwaccel cuda -i 194.mp4 -c:v h264_nvenc -vf "scale=2560:1440" -b:v 10M output.mp4

The same test video is encoded in 11 seconds, with the GPU loaded at 100%.
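
For scale, the figures above can be turned into rough throughput numbers. This is only a back-of-the-envelope sketch in Python: it assumes the reported average of 6 frames/sec. is exact, and all variable names are illustrative. The Kdenlive render and the ffmpeg command are not necessarily the same workload, so the ratio is indicative only.

```python
# Rough throughput comparison from the figures reported in this description.
# Assumption: the 6 frames/sec. average reported for Kdenlive is exact.

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds

clip_duration_s = to_seconds(1, 25)     # length of the test video
kdenlive_render_s = to_seconds(6, 57)   # Kdenlive render time
kdenlive_fps = 6.0                      # reported average render speed
ffmpeg_encode_s = 11                    # ffmpeg encode time

frames = kdenlive_fps * kdenlive_render_s       # implied frame count
source_fps = frames / clip_duration_s           # implied clip frame rate
ffmpeg_fps = frames / ffmpeg_encode_s           # implied ffmpeg throughput
speedup = kdenlive_render_s / ffmpeg_encode_s   # ffmpeg vs. Kdenlive

print(f"implied frame count:  {frames:.0f}")
print(f"implied source rate:  {source_fps:.1f} frames/sec")
print(f"implied ffmpeg speed: {ffmpeg_fps:.0f} frames/sec")
print(f"ffmpeg is about {speedup:.0f}x faster")
```

The implied source rate of roughly 29 frames/sec. is consistent with a 30 fps clip, which suggests the reported numbers fit together.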

STEPS TO REPRODUCE
1. Start rendering a project with nvenc preset

OBSERVED RESULT
Very slow rendering without hw acceleration on GPU

EXPECTED RESULT
Rendering with hw acceleration on GPU

SOFTWARE/OS VERSIONS
    • openSUSE Tumbleweed 20251127 
    • KDE Plasma: 6.5.3 
    • KDE Frameworks: 6.20.0 
    • Qt: 6.10.1 
    • Kernel: 6.12.59-1-longterm (64-bit) 
    • Wayland 
    • CPU: AMD Ryzen 7 3700X 8-Core Processor 
    • RAM: 32 GB
    • GPU: NVIDIA GeForce RTX 5060 Ti 16 GB (driver 580.95.05)
Comment 1 Bernd 2025-12-07 19:09:53 UTC
Thank you for your report.

Please note that rendering your project involves two steps: 1) Applying all effects, compositions and transitions; 2) Encoding the rendered frames using the selected encoder and adding it into the selected container.

Step 1) is done exclusively by MLT, the underlying framework for all the compositing and filtering, which does not utilize the GPU due to unresolved issues between MLT, movit (a library for GPU acceleration), and Kdenlive. Discussions and work are ongoing but due to the small team size progress is slower than we would like.

Step 2) is the only one where GPU acceleration is possible by selecting NVENC or VAAPI profiles. But this is only the smallest portion of the rendering, so the GPU is mostly idle during the Kdenlive rendering process.
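
Why accelerating only step 2) helps so little can be sketched with Amdahl's law. The following is a minimal Python illustration, not a measurement: the 20% encoding share and the 10x encoder speedup are hypothetical numbers, and the function name is made up for this example.

```python
def overall_speedup(encode_fraction: float, encode_speedup: float) -> float:
    """Amdahl's law: only the encoding share of the render time gets faster.

    encode_fraction: share of total render time spent in step 2) (0..1).
    encode_speedup:  how much faster the GPU encoder is than the CPU encoder.
    """
    if not 0.0 <= encode_fraction <= 1.0:
        raise ValueError("encode_fraction must be between 0 and 1")
    if encode_speedup <= 0.0:
        raise ValueError("encode_speedup must be positive")
    return 1.0 / ((1.0 - encode_fraction) + encode_fraction / encode_speedup)

# Hypothetical numbers: encoding is 20% of the render, GPU encodes 10x faster.
print(round(overall_speedup(0.20, 10.0), 2))
```

Under these assumed numbers the whole render gets only about 1.2x faster, and even an infinitely fast encoder could not exceed 1.25x, because step 1) still takes 80% of the original time.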
Comment 2 kido 2025-12-08 19:38:36 UTC
(In reply to Bernd from comment #1)
> Thank you for your report.
> 
> Please note that rendering your project involves two steps: 1) Applying all
> effects, compositions and transitions; 2) Encoding the rendered frames using
> the selected encoder and adding it into the selected container.
> 
> Step 1) is done exclusively by MLT, the underlying framework for all the
> compositing and filtering, which does not utilize the GPU due to unresolved
> issues between MLT, movit (a library for GPU acceleration), and Kdenlive.
> Discussions and work are ongoing but due to the small team size progress is
> slower than we would like.
> 
> Step 2) is the only one where GPU acceleration is possible by selecting
> NVENC or VAAPI profiles. But this is only the smallest portion of the
> rendering, so the GPU is mostly idle during the Kdenlive rendering process.

Hi Bernd! Thank you for your work, first of all!
Now I understand why Kdenlive does not use the GPU constantly during rendering.
However, I conducted the following experiment:
I took a 4K test video and, in the first case, transcoded it with the command:
ffmpeg -hwaccel cuda -i 194.mp4 -c:v h264_nvenc -vf "scale=2560:1440" -b:v 12M output.mp4
It took about 11 seconds, with the GPU loaded at 100%.
In the second case, I did the same thing using Kdenlive. No filters, effects, or compositing. Only rescale to 1440p with the parameters ab=160k acodec=aac channels=2 f=mp4 real_time=-1 threads=0 vb=12000k vcodec=h264_nvenc.
Result: encoding time 1 minute 25 seconds, CPU load ~17%, GPU load ~25-35%.
That makes ffmpeg about 8 times faster (11 seconds versus 85 seconds)!
Why doesn't Kdenlive use the GPU to its full capacity in this case? Or if it can't use the GPU, why doesn't it use the CPU to its full capacity?
Comment 3 Bernd 2025-12-09 14:56:31 UTC
> Why doesn't Kdenlive use the GPU to its full capacity in this case? Or if it can't use the GPU, why doesn't it use the CPU to its full capacity?

I am guessing here, but MLT doesn't "know" there are no effects or compositions and just touches every frame looking for something to do. That still takes time. And because parallel processing of frames can lead to artifacts (frames may need to "know" about the previous and following ones), it cannot be applied throughout. So by the time MLT has processed a frame and sent it off to the GPU for encoding, enough time has passed for the GPU to have finished its previous tasks and to have been sitting idle.
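
The effect guessed at above can be sketched as a toy two-stage pipeline in Python. This is an assumption-laden model, not how MLT is implemented: one serial stage prepares frames, one encoder consumes them, and the per-frame times (10 ms and 3 ms) are hypothetical.

```python
def encoder_utilization(n_frames: int, produce_s: float, encode_s: float) -> float:
    """Fraction of wall time the encoder is busy in a two-stage pipeline.

    Frames leave the serial frame-processing stage every produce_s seconds.
    The encoder starts a frame once it is ready and the previous encode is done.
    """
    ready = 0.0         # time at which the current frame leaves stage 1
    encoder_free = 0.0  # time at which the encoder finishes its last frame
    busy = 0.0          # total time the encoder spent encoding
    for _ in range(n_frames):
        ready += produce_s
        start = max(ready, encoder_free)  # wait for the frame or the encoder
        encoder_free = start + encode_s
        busy += encode_s
    return busy / encoder_free

# Hypothetical timings: 10 ms of frame processing, 3 ms of GPU encoding.
print(f"{encoder_utilization(1000, 0.010, 0.003):.0%}")
```

With these made-up timings the encoder is busy only about 30% of the time, because it always waits for the slower serial stage. If the timings are swapped, the encoder becomes the bottleneck and stays almost fully loaded.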

Your examples just illustrate that Kdenlive/MLT is missing a step where it determines what is to be done and, if it's just a rescale and encode, steps out of the way and lets ffmpeg do its job. But how often is that really the case? In 99.999% of cases, a project has at least several cuts and more than one clip.

Work is underway to make better use of GPUs during editing, playback, and rendering. But resources are few and other tasks are many ...
Comment 4 kido 2025-12-10 02:22:23 UTC
(In reply to Bernd from comment #3)
> > Why doesn't Kdenlive use the GPU to its full capacity in this case? Or if it can't use the GPU, why doesn't it use the CPU to its full capacity?
> 
> I am guessing here, but MLT doesn't "know" there are no effects or
> compositions and just touches every frame looking for something to do. That
> still takes time. And because parallel processing of frames can lead to
> artifacts (frames may need to "know" about the previous and following ones),
> it cannot be applied throughout. So by the time MLT has processed a frame and
> sent it off to the GPU for encoding, enough time has passed for the GPU to
> have finished its previous tasks and to have been sitting idle.
> 
> Your examples just illustrate that Kdenlive/MLT is missing a step where it
> determines what is to be done and, if it's just a rescale and encode, steps
> out of the way and lets ffmpeg do its job. But how often is that really the
> case? In 99.999% of cases, a project has at least several cuts and more than
> one clip.

Thanks for the detailed answer. Now I understand a little better how Kdenlive works.

> 
> Work is underway to make better use of GPUs during editing, playback, and
> rendering. But resources are few and other tasks are many ...

Thank you for your work. I wish you success in achieving your goals.