Bug 456871 - When importing subtitles in ISO-8859-15 format, some characters are not displayed correctly
Status: RESOLVED FIXED
Alias: None
Product: kdenlive
Classification: Applications
Component: Rendering & Export
Version: 22.04.3
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: erjiang
Reported: 2022-07-18 16:59 UTC by Silas Henrique
Modified: 2022-07-20 22:14 UTC (History)

Attachments
screenshot of Kdenlive with caption displayed incorrectly (426.94 KB, image/png)
2022-07-18 16:59 UTC, Silas Henrique

Description Silas Henrique 2022-07-18 16:59:36 UTC
Created attachment 150711
screenshot of Kdenlive with caption displayed incorrectly

Some characters do not appear correctly in Kdenlive's "edit subtitle tool" functionality.

(Sorry if I selected the wrong component; I have no idea which one the subtitle feature belongs to.)

When importing a subtitle encoded in ISO-8859-15, an encoding that is very common for subtitles, some characters such as ã, í and á are not displayed correctly. A question mark appears in their place.

This is not a problem with the subtitle file itself: the same subtitle displays correctly in VLC Media Player.
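
A short Python illustration (not Kdenlive code) of where the question mark comes from, assuming the importer decodes every file as UTF-8, as the commit message in Comment 5 implies. In ISO-8859-15 each accented letter is a single byte of 0x80 or above, which is not a valid UTF-8 sequence on its own, so a lenient UTF-8 decoder substitutes the replacement character U+FFFD (rendered as �). The sample sentence is made up for the example.

```python
# Hypothetical Portuguese subtitle line, stored as ISO-8859-15 bytes.
text = "Não, está aqui."
raw = text.encode("iso-8859-15")

# "ã" and "á" become the single bytes 0xE3 and 0xE1. Neither is followed by a
# UTF-8 continuation byte, so a lenient UTF-8 decoder replaces each with U+FFFD.
mangled = raw.decode("utf-8", errors="replace")

# Decoding with the file's real encoding recovers the original text.
recovered = raw.decode("iso-8859-15")

print(mangled)    # N�o, est� aqui.
print(recovered)  # Não, está aqui.
```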

STEPS TO REPRODUCE
1. Click on Kdenlive's "edit subtitle tool" button to activate the subtitle track.
2. Click and drag onto the track any subtitle that is encoded in ISO-8859-15 and contains special characters (for example, a subtitle in Portuguese or Spanish).
3. Check in the Project Monitor whether the characters are displayed correctly or as question marks, as in the attached screenshot.

OBSERVED RESULT
Some characters are displayed as � (the Unicode replacement character).

EXPECTED RESULT
The characters are displayed correctly (ã, ê, é, á, ...).

SOFTWARE/OS VERSIONS
Windows: Not tested
macOS: Not tested
Linux: Arch Linux (rolling release)
KDE Plasma Version: 5.25.3
KDE Frameworks Version: 5.96.0
Qt Version: 5.15.5
Comment 1 Silas Henrique 2022-07-18 17:06:45 UTC
After converting the subtitle to UTF-8, it works normally and the characters are displayed correctly.

So one possible solution is to convert the file at import time, or for the program to detect which encoding the subtitle uses (as the Kate editor does).
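
The manual workaround above amounts to re-encoding the file. A minimal Python sketch, assuming the source encoding is already known (ISO-8859-15 here); `convert_to_utf8` is a hypothetical helper and not part of Kdenlive. It works on raw bytes so that the CRLF line endings common in .srt files pass through unchanged.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-15") -> None:
    """Re-encode a subtitle file from source_encoding to UTF-8.

    Reading and writing bytes (rather than text mode) keeps the original
    line endings exactly as they were.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Usage: write a tiny ISO-8859-15 cue to a temporary file and convert it.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "pt.srt")
    dst = Path(tmp, "pt.utf8.srt")
    src.write_bytes("1\r\n00:00:01,000 --> 00:00:02,000\r\nNão é?\r\n".encode("iso-8859-15"))
    convert_to_utf8(src, dst)
    print(dst.read_bytes().decode("utf-8"))
```

Converting in place only helps if the guessed source encoding is right, which is why the discussion below moves on to detection.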
Comment 2 erjiang 2022-07-19 05:27:41 UTC
Confirmed: non-UTF-8 files can get mangled on import. It looks like we could add kf5codecs as a dependency and use KEncodingProber: https://api.kde.org/frameworks/kcodecs/html/classKEncodingProber.html

Since guessing the encoding is not always reliable, maybe we could guess the encoding and then show it as an editable selection on the import dialog.
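
The guess-then-let-the-user-correct flow proposed above can be sketched in Python. KEncodingProber is a statistical prober from KDE Frameworks and is not reproduced here; this sketch swaps in a much simpler heuristic (UTF-8 byte order mark, then strict UTF-8, then a single-byte fallback) purely to show the flow. `guess_encoding`, `decode_subtitle` and the `override` parameter are hypothetical names, with `override` standing in for the editable selection in the import dialog.

```python
import codecs
from typing import Optional, Tuple


def guess_encoding(data: bytes, fallback: str = "iso-8859-15") -> str:
    """Very small encoding heuristic; a stand-in for a real prober."""
    # A UTF-8 byte order mark is an unambiguous signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-15 assigns a character to every byte value, so this
        # fallback always decodes, but it cannot tell whether it is right.
        return fallback


def decode_subtitle(data: bytes, override: Optional[str] = None) -> Tuple[str, str]:
    """Decode subtitle bytes; a user-chosen encoding takes precedence over the guess."""
    encoding = override or guess_encoding(data)
    return data.decode(encoding), encoding


sample = "Olá, você".encode("iso-8859-15")
print(decode_subtitle(sample))                     # ('Olá, você', 'iso-8859-15')
print(decode_subtitle(sample, override="cp1252"))  # the user's choice wins
```

Trying strict UTF-8 first works well in practice because Latin-script text in a single-byte encoding rarely forms valid multi-byte UTF-8 sequences by accident. The single-byte fallback has no way to detect that it picked the wrong code page, which is what the editable selection is for.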
Comment 3 Bug Janitor Service 2022-07-20 03:42:33 UTC
A possibly relevant merge request was started @ https://invent.kde.org/multimedia/kdenlive/-/merge_requests/328
Comment 4 Silas Henrique 2022-07-20 15:33:07 UTC
(In reply to Bug Janitor Service from comment #3)
> A possibly relevant merge request was started @
> https://invent.kde.org/multimedia/kdenlive/-/merge_requests/328

Amazing!
Comment 5 bionickatana 2022-07-20 22:14:33 UTC
Git commit 6880a0d21222706fb0b942d3a36e4ebb56696673 by Nathan Hinton, on behalf of Eric Jiang.
Committed on 20/07/2022 at 22:14.
Pushed by bionickatana into branch 'master'.

Guess subtitle encoding before importing

Since many subtitle files are not UTF-8, we need to guess the encoding
of the file before reading it. For example, SubRip's default encoding is
Windows-1252 (according to Wikipedia).

This also adds KF5 Codecs as a dependency in order to use KEncodingProber.

Future work could be done to allow the user to select the encoding in the import dialog. Currently there is no way to manually select the encoding if it's not guessed correctly, but this should at least be an improvement over only supporting UTF-8.

M  +1    -1    CMakeLists.txt
M  +1    -1    dev-docs/build.md
M  +40   -9    src/bin/model/subtitlemodel.cpp
M  +7    -1    src/bin/model/subtitlemodel.hpp
M  +8    -5    src/timeline2/view/timelinecontroller.cpp
A  +12   -0    tests/dataset/01-iso-8859-1.srt
M  +22   -2    tests/subtitlestest.cpp

https://invent.kde.org/multimedia/kdenlive/commit/6880a0d21222706fb0b942d3a36e4ebb56696673
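
The commit message cites Windows-1252 as SubRip's default, while this report is about ISO-8859-15. A quick Python check (illustrative, not part of the commit) shows that the two code pages agree on the letters from this report: in the 0xA0 to 0xFF range they differ in only eight positions, where ISO-8859-15 places €, Š, š, Ž, ž, Œ, œ and Ÿ. Below 0xA0 they diverge more, since Windows-1252 puts printable characters such as € at 0x80 to 0x9F where ISO-8859-15 has control codes.

```python
# Compare ISO-8859-15 and Windows-1252 over 0xA0-0xFF, the range holding the
# accented Latin letters. 0x80-0x9F is skipped because Python's cp1252 codec
# rejects the five bytes that Windows-1252 leaves undefined there.
differs = [
    b for b in range(0xA0, 0x100)
    if bytes([b]).decode("iso-8859-15") != bytes([b]).decode("cp1252")
]
print([hex(b) for b in differs])
# ['0xa4', '0xa6', '0xa8', '0xb4', '0xb8', '0xbc', '0xbd', '0xbe']

# Example: byte 0xA4 is the euro sign in ISO-8859-15 but the generic
# currency sign in Windows-1252, which encodes the euro as 0x80 instead.
print(bytes([0xA4]).decode("iso-8859-15"), bytes([0xA4]).decode("cp1252"))

# The letters from this report encode to the same bytes in both code pages.
letters = "ãíáêé"
print(letters.encode("iso-8859-15") == letters.encode("cp1252"))  # True
```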