SUMMARY
I have an Nvidia GeForce RTX 4060 Ti GPU that I am using to test Kdenlive's AI mask rendering plugin. The card has 8 GB of VRAM. The plugin detects the card correctly, and CUDA works well in Krita and Blender. In Kdenlive, however, the mask generation process crashes with an out-of-memory error while trying to allocate 48 MiB.

STEPS TO REPRODUCE
1. Set the card as Device in Configure > Plugins > Object Detection
2. Use a clip with a moving object in the foreground
3. Select the object by drawing a rectangle around it
4. When the masked object is highlighted, click "Generate mask"

OBSERVED RESULT
The process crashes with the following error:
---
Traceback (most recent call last):
  File "/app/share/kdenlive/scripts/automask/sam-objectmask.py", line 209, in <module>
    for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 57, in generator_context
    response = gen.send(request)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/sam2_video_predictor.py", line 603, in propagate_in_video
    current_out, pred_masks = self._run_single_frame_inference(
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/sam2_video_predictor.py", line 758, in _run_single_frame_inference
    ) = self._get_image_feature(inference_state, frame_idx, batch_size)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/sam2_video_predictor.py", line 714, in _get_image_feature
    backbone_out = self.forward_image(image)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/sam2_base.py", line 469, in forward_image
    backbone_out = self.image_encoder(img_batch)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/backbones/image_encoder.py", line 31, in forward
    features, pos = self.neck(self.trunk(sample))
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/backbones/hieradet.py", line 292, in forward
    x = blk(x)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/backbones/hieradet.py", line 165, in forward
    x = x + self.drop_path(self.mlp(self.norm2(x)))
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/sam2/modeling/sam2_utils.py", line 133, in forward
    x = self.act(layer(x)) if i < self.num_layers - 1 else layer(x)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paul/.var/app/org.kde.kdenlive/data/kdenlive/venv-sam/lib/python3.11/site-packages/torch/nn/modules/activation.py", line 734, in forward
    return F.gelu(input, approximate=self.approximate)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.00 MiB. GPU 0 has a total capacity of 7.63 GiB of which 44.06 MiB is free. Process 105157 has 7.57 GiB memory in use. Of the allocated memory 7.26 GiB is allocated by PyTorch, and 155.32 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
---

EXPECTED RESULT
The mask should be generated for all frames of the selected zone.

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 6.2.5
KDE Frameworks Version: 6.10.0
Qt Version: 6.8.2
Kernel Version: 6.13.1-arch1-1 (64-bit)
Graphics Platform: Wayland
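A minimal workaround sketch, assuming the allocator hint from the error message can be exported before torch makes its first CUDA allocation (for example near the very top of sam-objectmask.py). This only mitigates fragmentation of memory PyTorch has already reserved; it cannot add VRAM beyond the card's 8 GB.

# Sketch only, not the Kdenlive script itself: export the allocator hint
# suggested by the OOM message before torch touches the GPU.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # the first CUDA allocation must happen after the variable is set

if torch.cuda.is_available():
    # Quick check of how much VRAM is actually free before the model loads.
    free, total = torch.cuda.mem_get_info()
    print(f"free VRAM: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")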
I can reproduce the crash, but I can't provide a log because Kdenlive freezes instead of crashing when run under gdb.
There is a memory issue with SAM2 when trying to process a video longer than a few seconds. How long is the zone you are trying to apply the mask to? Does it work if you try to create a mask for just 10-20 frames?
With a few frames it does work, but with something like 2 seconds I get a freeze, crash or error:

Resize Array, COLS: 1
NumPy Array: {0: array([[2239, 1293]])}
NumPy Array: {0: array([1])}
using device: cuda:0
Traceback (most recent call last):
  File "/usr/share/kdenlive/scripts/automask/sam-objectmask.py", line 104, in <module>
    sam2_model = build_sam2(model_cfg, sam2_checkpoint, device=device)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/sam2/build_sam.py", line 94, in build_sam2
    model = model.to(device)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/home/farid/.local/share/kdenlive/venv-sam/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1329, in convert
    return t.to(
        device,
        dtype if t.is_floating_point() or t.is_complex() else None,
        non_blocking,
    )
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 5.76 GiB of which 9.31 MiB is free. Including non-PyTorch memory, this process has 286.00 MiB memory in use. Process 32606 has 296.00 MiB memory in use. Process 32629 has 298.00 MiB memory in use. Process 32651 has 358.00 MiB memory in use. Process 32676 has 738.00 MiB memory in use. Process 32697 has 1020.00 MiB memory in use. Process 32715 has 1020.00 MiB memory in use. Process 32734 has 1.12 GiB memory in use. Process 32758 has 698.00 MiB memory in use. Of the allocated memory 188.77 MiB is allocated by PyTorch, and 5.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
(In reply to Jean-Baptiste Mardelle from comment #2)
> There is a memory issue with SAM2 when trying to process a video longer
> than a few seconds.
> How long is the zone you are trying to apply the mask to?

22 seconds.

> Does it work if you try to create a mask for just 10-20 frames?

Yes. I cut the video down to 5 seconds (125 frames) and it renders the mask with no problem.
I just pushed a rather large update to the object segmentation module. In Kdenlive Settings > Plugins > Object Detection there is now a checkbox "Offload video to CPU to save GPU Memory". It makes SAM2 keep the video frames in system RAM instead of VRAM, which should allow you to create longer masks. For me, on a GPU with 12 GB, I could create a mask of at most about 300 frames in Full HD; with the offload option I can go up to about 700 frames (on a system with 32 GB of RAM). Please check and let me know if it improves things for you.

I also improved user feedback during the process, and if the process crashes you should now be able to see a log.
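For reference, a minimal sketch of what such an offload option typically maps to in the upstream sam2 package's video predictor API, assuming it corresponds to the offload_video_to_cpu / offload_state_to_cpu arguments of init_state; the checkpoint, config, frame paths and click coordinates below are placeholders, not what Kdenlive actually uses.

import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Placeholder model files; Kdenlive manages its own checkpoint and config locations.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "sam2_hiera_large.pt", device="cuda")

# Keeping the decoded frames (and optionally the per-frame inference state)
# in system RAM instead of VRAM is what lets much longer zones fit on the GPU.
state = predictor.init_state(
    video_path="extracted_frames/",
    offload_video_to_cpu=True,
    offload_state_to_cpu=True,
)

# One positive click on the object in the first frame (example coordinates).
predictor.add_new_points_or_box(
    state, frame_idx=0, obj_id=0,
    points=np.array([[2239, 1293]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32),
)

with torch.inference_mode():
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # one boolean mask per tracked object

With the offload options enabled, frames are moved to the GPU one at a time during propagation, trading some speed for a much lower VRAM ceiling.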
Don't have a crash anymore after the changes. Thanks JB
(In reply to Jean-Baptiste Mardelle from comment #5)
> I just pushed a rather large update to the object segmentation module. In
> Kdenlive Settings > Plugins > Object Detection there is now a checkbox
> "Offload video to CPU to save GPU Memory". It makes SAM2 keep the video
> frames in system RAM instead of VRAM, which should allow you to create
> longer masks. For me, on a GPU with 12 GB, I could create a mask of at most
> about 300 frames in Full HD; with the offload option I can go up to about
> 700 frames (on a system with 32 GB of RAM). Please check and let me know if
> it improves things for you.
>
> I also improved user feedback during the process, and if the process
> crashes you should now be able to see a log.

Works for me too. Thanks!