Bug 419668 - Transliteration of ID3v1 tags to ASCII
Summary: Transliteration of ID3v1 tags to ASCII
Status: RESOLVED FIXED
Alias: None
Product: kid3
Classification: Applications
Component: general
Version: unspecified
Platform: Other Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Urs Fleisch
 
Reported: 2020-04-05 10:55 UTC by Martin Mareš
Modified: 2020-05-10 07:55 UTC

Attachments
QML script to transliterate contents of ID3v1 tags to ASCII (7.32 KB, text/x-qml)
2020-04-13 08:11 UTC, Urs Fleisch
An improved version of the script which handles all extended Latin characters (33.19 KB, text/x-qml)
2020-04-13 11:59 UTC, Martin Mareš
A script used to generate the transliteration tables (493 bytes, application/x-perl)
2020-04-13 11:59 UTC, Martin Mareš

Description Martin Mareš 2020-04-05 10:55:56 UTC
Hello!

First of all, I want to thank you for kid3, it's great and it saved hours of my time. With the following feature, it would be even greater :)

I have plenty of Czech music with non-ASCII characters in the names of songs and artists. These are represented faithfully in ID3v2 tags. But some music players I use still understand only ID3v1, and they usually assume some arbitrary encoding for characters with the high bit set.

Therefore I would like to have all ID3v1 tags in plain ASCII. However, kid3 (at least in version 3.7.0 in my Debian Buster) does not offer ASCII in the list of tag encodings. When I try to approximate ASCII by ISO-8859-1 and ask kid3 to convert ID3v2 tags to ID3v1, all characters which cannot be expressed in ISO-8859-1 are replaced by "?", even though the characters are just Latin letters with funny accents on top.

What I wish for is adding ASCII to the list of encodings and applying simple transliteration during charset conversion, similar to the "//TRANSLIT" mode of iconv.

I will be glad to help implement it, but I would appreciate some directions.
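
A rough illustration of the request, as a Python sketch rather than anything kid3 does internally: encoding a Czech title to ISO-8859-1 with replacement produces "?" for every letter outside that charset, whereas decomposing each letter and dropping the combining accents keeps the base letter. The function name `to_ascii` and the sample title are made up for illustration, and Unicode decomposition is only an approximation of iconv's "//TRANSLIT" mode (letters without a decomposition, such as "ß" or "ø", still end up as "?").

```python
import unicodedata


def to_ascii(text):
    """Approximate ASCII transliteration (hypothetical helper, not kid3 code)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is replaced by "?".
    return stripped.encode("ascii", errors="replace").decode("ascii")


title = "Příliš žluťoučký kůň"

# What the report describes: letters missing from ISO-8859-1 become "?".
lossy = title.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
print(lossy)

# What the report asks for: accents are dropped, base letters are kept.
print(to_ascii(title))  # Prilis zlutoucky kun
```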
Comment 1 Urs Fleisch 2020-04-05 17:23:39 UTC
I also try to have ID3v1 tags on all of my MP3 files and only have ASCII characters in them. The encodings available in the "Tags/Tag 1" tab of the settings (option "Text encoding") are those provided by QTextCodec, see https://doc.qt.io/qt-5/qtextcodec.html#details. ASCII is not listed there, but this should not be a problem: many of the supported encodings share the first half of their characters with ASCII, so you can take ISO-8859-1, for example. To find an appropriate encoding, you could use "Text Encoding ID3v1" from the file list context menu.

To reduce the characters in the chosen 8-bit encoding to ASCII, you will have to replace all characters of the second half with an appropriate ASCII sequence. I do not know Czech, but I sometimes have characters with German umlauts, which I replace using the "String replacement" table in the "Files" tab of the settings. To do so, I first generate the file names from the "Tag 1" tags. I then apply the file name format ("Tools/Apply Filename Format"), which uses the "String replacement" table mentioned above, since I want the file names to contain only ASCII characters too. Finally, I generate the tags from the file names.

This is a bit awkward, but you could instead use the "String replacement" table in the "Tags/All Tags" tab of the settings to define the required transliteration and then use "Tools/Apply Tag Format". The problem here is that it will also change "Tag 2". You can avoid this by checking only the "Track Number" row in the "Tag 2" table (having nothing selected means everything is selected), so that the transliteration is applied to "Tag 1" and to the "Track Number" of "Tag 2", which will not change anything since it contains only digits.
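
The table-based replacement described above can be sketched in Python as follows. This is only an illustration of the idea, not how Kid3 implements its "String replacement" table; the mapping entries and the name `apply_replacements` are assumptions chosen for German umlauts.

```python
# Hypothetical replacement table: each non-ASCII character maps to an
# ASCII sequence, much like rows in a "String replacement" table.
REPLACEMENTS = {
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
}

# str.maketrans accepts single-character keys with replacement strings
# of any length, so one character can expand to several.
TABLE = str.maketrans(REPLACEMENTS)


def apply_replacements(text):
    """Replace every character listed in REPLACEMENTS; leave the rest alone."""
    return text.translate(TABLE)


print(apply_replacements("Grüße aus Köln"))  # Gruesse aus Koeln
```

Characters that are missing from the table pass through unchanged, so a table like this has to cover every non-ASCII character that actually occurs in the tags.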
Comment 2 Urs Fleisch 2020-04-13 08:11:59 UTC
Created attachment 127486
QML script to transliterate contents of ID3v1 tags to ASCII
Comment 3 Urs Fleisch 2020-04-13 08:14:31 UTC
I have attached a QML script to transliterate the contents of the ID3v1 tag to ASCII. Just store it somewhere, then go to the "User Actions" settings in Kid3 and add it with the name "Tag 1 to ASCII" and the command "@qml /path/to/Tag1ToAscii.qml". You can then invoke it from the file list context menu or assign it to a keyboard shortcut in the settings.
Comment 4 Urs Fleisch 2020-04-13 11:20:45 UTC
Git commit 211dfa1371737460873b4a9887b92f1ef083658f by Urs Fleisch.
Committed on 13/04/2020 at 11:18.
Pushed by ufleisch into branch 'devel'.

Script to transliterate ID3v1 tags to ASCII

A  +175  -0    src/qml/script/Tag1ToAscii.qml     [License: LGPL]

https://invent.kde.org/kde/kid3/commit/211dfa1371737460873b4a9887b92f1ef083658f
Comment 5 Urs Fleisch 2020-04-13 11:26:00 UTC
Thanks for the report. I have fixed it in Git [cf8a234c]. For a workaround with the current version, use "Open Folder" instead of "Open".
Comment 6 Urs Fleisch 2020-04-13 11:27:56 UTC
Sorry, forget "Comment 5", it was meant for another bug report.
Comment 7 Martin Mareš 2020-04-13 11:59:27 UTC
Created attachment 127493
An improved version of the script which handles all extended Latin characters
Comment 8 Martin Mareš 2020-04-13 11:59:54 UTC
Created attachment 127494
A script used to generate the transliteration tables
Comment 9 Martin Mareš 2020-04-13 12:03:33 UTC
Thanks, the script solves my problems. I am attaching a version of the script with the transliteration table extended to all accented Latin characters, so it is no longer limited to ISO-8859-1. I am also attaching a simple Perl script I used to generate the table using iconv.
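
The attached Perl script is not reproduced here. As a rough sketch of how such a table could be generated, the following Python code walks a range of code points and records an ASCII replacement for each character that has one. Note the assumptions: it uses Unicode decomposition from the standard `unicodedata` module instead of iconv, the range U+00C0 to U+017F and the name `build_table` are chosen for illustration only, and the result will differ from an iconv-generated table for some characters.

```python
import unicodedata


def build_table(first=0x00C0, last=0x017F):
    """Map each character in [first, last] to an ASCII replacement, if any.

    Hypothetical helper: the default range is an assumption for this sketch.
    """
    table = {}
    for code_point in range(first, last + 1):
        ch = chr(code_point)
        # Decompose the character and drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        # Keep only entries that really changed and are pure ASCII.
        if base and base != ch and base.isascii():
            table[ch] = base
    return table


table = build_table()

# Print a few entries in a form that is easy to paste into a script.
for ch in "áčřžů":
    print(f'"{ch}": "{table[ch]}",')
```

Characters without a decomposition, such as "ß" or "ø", get no entry from this approach and would need hand-written replacements.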
Comment 10 Urs Fleisch 2020-04-14 07:36:43 UTC
Git commit c9e570d745b1ba1f541aa21e481bd195009cd704 by Urs Fleisch.
Committed on 14/04/2020 at 06:42.
Pushed by ufleisch into branch 'master'.

Script to transliterate ID3v1 tags to ASCII

M  +5    -0    doc/en/index.docbook
M  +4    -0    src/core/config/useractionsconfig.cpp
M  +1    -0    src/qml/CMakeLists.txt
A  +613  -0    src/qml/script/Tag1ToAscii.qml     [License: LGPL]

https://invent.kde.org/kde/kid3/commit/c9e570d745b1ba1f541aa21e481bd195009cd704
Comment 11 Urs Fleisch 2020-04-14 07:46:10 UTC
Thanks for your contribution. I have adapted it to replace all characters in these blocks and committed the script to Git.
Comment 12 Urs Fleisch 2020-05-10 07:55:29 UTC
Fixed in version 3.8.3.