Bug 499563 - Kdenlive mask generation plugin crashes when using CUDA
Summary: Kdenlive mask generation plugin crashes when using CUDA
Status: RESOLVED FIXED
Alias: None
Product: kdenlive
Classification: Applications
Component: Video Effects & Transitions (other bugs)
Version First Reported In: git-master
Platform: Flatpak Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Jean-Baptiste Mardelle
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-02-05 20:41 UTC by Paul Brown
Modified: 2025-02-18 20:54 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description Paul Brown 2025-02-05 20:41:44 UTC
SUMMARY

I have an Nvidia RTX 4060 Ti GPU with 8 GB of VRAM that I am using to test Kdenlive's AI mask rendering plugin.

The plugin detects the card correctly, and CUDA works well in Krita and Blender.

In Kdenlive, however, the mask generation process crashes with an out-of-memory error while trying to allocate 48 MiB.

STEPS TO REPRODUCE
1. Set the card as Device in Configure > Plugins > Object Detection
2. Use a clip with a moving object in the foreground
3. Select the object by drawing a rectangle around it
4. When the masked object is highlighted, click "Generate mask"

OBSERVED RESULT

The process crashes with the following error:

---
Traceback (most recent call last):
  File "/app/share/kdenlive/scripts/automask/sam-objectmask.py", line 209, in <module>
    for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 57, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/sam2_video_predictor.py", line 603, in propagate_in_video
    current_out, pred_masks = self._run_single_frame_inference(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/sam2_video_predictor.py", line 758, in _run_single_frame_inference
    ) = self._get_image_feature(inference_state, frame_idx, batch_size)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/sam2_video_predictor.py", line 714, in _get_image_feature
    backbone_out = self.forward_image(image)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/sam2_base.py", line 469, in forward_image
    backbone_out = self.image_encoder(img_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/backbones/image_encoder.py", line 31, in forward
    features, pos = self.neck(self.trunk(sample))
                              ^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/backbones/hieradet.py", line 292, in forward
    x = blk(x)
        ^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/backbones/hieradet.py", line 165, in forward
    x = x + self.drop_path(self.mlp(self.norm2(x)))
                           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/sam2_utils.py", line 133, in forward
    x = self.act(layer(x)) if i < self.num_layers - 1 else layer(x)
        ^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/activation.py", line 734, in forward
    return F.gelu(input, approximate=self.approximate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.00 MiB. GPU 0 has a total capacity of 7.63 GiB of which 44.06 MiB is free. Process 105157 has 7.57 GiB memory in use. Of the allocated memory 7.26 GiB is allocated by PyTorch, and 155.32 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
---
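For what it's worth, the last line of the traceback itself points at a possible mitigation: telling PyTorch's caching allocator to use expandable segments. A hedged workaround (not verified with this plugin) is to set the variable in the environment Kdenlive starts from:

```shell
# Reduce CUDA allocator fragmentation, as suggested by the error message.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```

Since this report uses the Flatpak build, the variable has to reach the sandbox, e.g. via `flatpak override --user --env=PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True org.kde.kdenlive`. Note this only works around fragmentation; it cannot help when the model genuinely needs more than the card's 8 GB.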

EXPECTED RESULT

The mask for the selected object should be generated for all frames.

SOFTWARE/OS VERSIONS

Operating System: Arch Linux 
KDE Plasma Version: 6.2.5
KDE Frameworks Version: 6.10.0
Qt Version: 6.8.2
Kernel Version: 6.13.1-arch1-1 (64-bit)
Graphics Platform: Wayland
Comment 1 farid 2025-02-07 01:59:58 UTC
I can reproduce the crash, but I can't capture a log because Kdenlive freezes instead of crashing when run under gdb.
Comment 2 Jean-Baptiste Mardelle 2025-02-07 12:53:28 UTC
There is a memory issue with SAM2 when trying to process a video longer than a few seconds.
How long is the zone you are trying to apply the mask to? Does it work if you try to create a mask for just 10-20 frames?
Comment 3 farid 2025-02-07 14:24:02 UTC
With a few frames it does work, but with something like 2 seconds I get a freeze, a crash, or this error:

Resize Array, COLS:
1
NumPy Array:
{0: array([[2239, 1293]])}
NumPy Array:
{0: array([1])}
using device: cuda:0
Traceback (most recent call last):
  File "/usr/share/kdenlive/scripts/automask/sam-objectmask.py", line 104, in <module>
    sam2_model = build_sam2(model_cfg, sam2_checkpoint, device=device)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/sam2/build_sam.py", line 94, in build_sam2
    model = model.to(device)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  [Previous line repeated 4 more times]
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1329, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 5.76 GiB of which 9.31 MiB is free. Including non-PyTorch memory, this process has 286.00 MiB memory in use. Process 32606 has 296.00 MiB memory in use. Process 32629 has 298.00 MiB memory in use. Process 32651 has 358.00 MiB memory in use. Process 32676 has 738.00 MiB memory in use. Process 32697 has 1020.00 MiB memory in use. Process 32715 has 1020.00 MiB memory in use. Process 32734 has 1.12 GiB memory in use. Process 32758 has 698.00 MiB memory in use. Of the allocated memory 188.77 MiB is allocated by PyTorch, and 5.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Comment 4 Paul Brown 2025-02-07 16:57:35 UTC
(In reply to Jean-Baptiste Mardelle from comment #2)
> There is a memory issue with SAM2 when trying to process a video longer than
> a few seconds. 
> How long is the zone you are trying to apply the mask ?

22 seconds.

> Does it work if you
> try to create a mask for like 10-20 frames?

Yes. I cut the video down to 5 seconds (125 frames) and it renders the mask no problem.
Comment 5 Jean-Baptiste Mardelle 2025-02-17 15:33:01 UTC
I just pushed a rather large update to the object segmentation module. In Kdenlive Settings > Plugins > Object Detection, there is now a checkbox, "Offload video to CPU to save GPU Memory". It makes SAM2 use system RAM instead of VRAM, which should allow you to create longer masks. On a GPU with 12 GB, I could create a mask for a maximum of about 300 frames in Full HD; with the offload option, I can go up to 700 frames (on a system with 32 GB of RAM). Please check and let me know if it improves things for you.

I also improved user feedback during the process, and if the process crashes you should now be able to see a log.
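The offload behaviour described above can be sketched in isolation (with hypothetical stand-ins, not the actual SAM2/Kdenlive code): the decoded frame stack stays in system RAM, and only the frame currently being segmented is copied to the GPU, so VRAM usage stays roughly constant regardless of clip length.

```python
import torch

# Fall back to CPU so the sketch also runs on machines without CUDA.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for a decoded clip: the whole frame stack lives in
# system RAM. A real 700-frame Full HD clip would be ~16 GiB as float32,
# far more than an 8 GB card could hold.
frames = torch.rand(8, 3, 270, 480)  # small frames to keep the sketch cheap

def fake_segment(img: torch.Tensor) -> torch.Tensor:
    """Stand-in for SAM2 inference: threshold the per-pixel channel mean."""
    return img.mean(dim=0) > 0.5

masks = []
for frame in frames:
    gpu_frame = frame.to(device)   # only one frame occupies VRAM at a time
    mask = fake_segment(gpu_frame)
    masks.append(mask.to("cpu"))   # move the result back to system RAM
    del gpu_frame                  # let the allocator reuse the VRAM
```

The trade-off is extra host-to-device copies per frame, which is why the offload mode is a checkbox rather than the default.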
Comment 6 farid 2025-02-17 19:00:10 UTC
I don't get the crash anymore after the changes. Thanks, JB!
Comment 7 Paul Brown 2025-02-18 20:47:33 UTC
(In reply to Jean-Baptiste Mardelle from comment #5)
> I just pushed a rather large update to the object segmentation module. In
> Kdenlive Settings > Plugins > Object Detection, there is now a checkbox
> "Offload video to CPU to save GPU Memory". This causes SAM2 to use the RAM
> instead of the VRAM, which should allow you to create longer masks. For me,
> on GPU with 12Gb, I could create a mask with a maximum of about 300 frames
> in Full HD. With the offload option, I can go up to 700 frames (on a 32Gb
> RAM system). Please check and let me know if it improved things for you.
> 
> I also improved user feedback during the process and if the process crashes,
> you should be able to see a log.

Works for me too. Thanks!