Bug 222912 - Support manual charset recoding of tag values from the "Edit Track Details" dialog
Status: RESOLVED FIXED
Alias: None
Product: amarok
Classification: Applications
Component: general
Version: 2.2.2
Platform: Ubuntu Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Amarok Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-15 23:34 UTC by bugs-kde
Modified: 2010-01-18 17:05 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Pic (20.38 KB, image/png)
2010-01-18 17:05 UTC, Jeff Mitchell

Description bugs-kde 2010-01-15 23:34:45 UTC
Version:           2.2.0 (using KDE 4.3.4)
OS:                Linux
Installed from:    Ubuntu Packages

This is a follow-up to bug 200596.

Since tag charset autodetection has been shown to be unreliable, I propose a hybrid solution that would satisfy most non-Latin users.

Suppose that in the "Edit Track Details" dialog, on the "Tags" panel, there's a menu button placed next to each editable text box.

Clicking that button reveals a "recode from charset:" submenu.

The submenu lists all kinds of encodings (like in Firefox's "View"->"Character
Encoding" submenu).

Choosing an encoding performs a recode operation on the corresponding tag value.
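To illustrate what such a recode operation could amount to, here is a minimal Python sketch (not Amarok code). It assumes the garbled text was produced by decoding the raw tag bytes as Latin-1; how Amarok actually decodes tags is not stated here, and the function name `recode` is hypothetical.

```python
def recode(displayed: str, actual_charset: str, assumed_charset: str = "latin-1") -> str:
    """Undo a wrong decode: recover the raw bytes, then decode them again."""
    # Assumption: the player decoded the tag bytes with `assumed_charset`.
    raw = displayed.encode(assumed_charset)
    # Decode the same bytes with the charset the user picked from the menu.
    return raw.decode(actual_charset)


# A Cyrillic title stored as Windows-1251 but displayed as if it were Latin-1.
original = "Привет"
garbled = original.encode("cp1251").decode("latin-1")
fixed = recode(garbled, "cp1251")
```

If the user picks a charset that cannot represent the bytes, `decode` raises `UnicodeDecodeError`; a dialog would presumably keep the old value in that case.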

After that, the user can close the dialog using "Save & Close" or "Cancel", depending on whether they are
satisfied with the result.

This way, you get basic tag charset handling functionality (which is currently nonexistent in Amarok and in most open source media players, BTW).

Note that you can gradually add more intelligence based on that:

1) By reusing the charset detector code (I suppose), you can supply a list of
the most probable encodings based on the original byte string and promote
these encodings to the top level of the submenu, while all the other
encodings would be buried deeper in a Firefox-style "More Encodings"
sub-submenu.
2) I suppose you can plug in some neural network or a simple dynamic rules
engine that would learn the user's previous encoding choices and promote the
most probable encodings in the menu structure on subsequent invocations.
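Both ideas can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the candidate list, the `rank_encodings` name, and the letter-ratio score are all made up for the example, and the score is a crude stand-in for a real charset detector (single-byte charsets decode almost any input, so it mostly rules out impossible encodings). The `history` counter plays the role of the "learned" previous choices.

```python
from collections import Counter

# Hypothetical candidate list; a real menu would offer many more.
CANDIDATES = ["utf-8", "cp1251", "koi8-r", "cp1252", "iso-8859-2"]


def rank_encodings(raw: bytes, history, top: int = 3):
    """Split candidates into a promoted list and a "More Encodings" list."""
    def score(name):
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            return None  # these bytes cannot be this encoding at all
        letters = sum(ch.isalpha() for ch in text)
        # Crude plausibility (share of letters) plus past user choices.
        return letters / max(len(text), 1) + history.get(name, 0)

    scored = [(score(name), name) for name in CANDIDATES]
    valid = [(s, name) for s, name in scored if s is not None]
    valid.sort(key=lambda pair: pair[0], reverse=True)
    promoted = [name for _, name in valid[:top]]
    more = [name for name in CANDIDATES if name not in promoted]
    return promoted, more


raw_title = "Привет".encode("cp1251")
history = Counter({"cp1251": 2})  # the user picked Windows-1251 twice before
promoted, more = rank_encodings(raw_title, history)
```

Here the history is what lifts Windows-1251 to the top; without it, several single-byte charsets tie, which is exactly why the manual menu is still needed.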

Not being a Qt/KDE developer, I cannot assess how hard that would be, but it
sure sounds doable to me.
Comment 1 Myriam Schweingruber 2010-01-15 23:57:14 UTC
I am not sure this is still useful, since there have been a lot of changes since 2.2.0, which seems to be the version you are using.
I strongly suggest you upgrade to Amarok 2.2.2 and check again.
Comment 2 bugs-kde 2010-01-17 01:46:32 UTC
I'm running Amarok 2.2.2 and I have problems with Cyrillic tags in MP3s - e.g. try these:

http://olo.org.pl/files/Acropolis_Demo/
Comment 3 Myriam Schweingruber 2010-01-17 13:32:46 UTC
Well, the only thing that seems to be correctly encoded is the file name; the tags seem to use a different encoding. I can't read the name tags or the lyrics with eyed3, kid3, or easytag, and my whole system is UTF-8.
Please check that you are using the same encoding everywhere, preferably UTF-8 or UTF-16.
Comment 4 bugs-kde 2010-01-17 21:17:47 UTC
That is the problem this enhancement is intended to solve: not having to use a dedicated tool (like EasyTag) to recode the tags.

In this case, the tags are encoded using Windows-1251.

Knowing that, I'd like to be able to perform the operation from within Amarok.
Comment 5 bugs-kde 2010-01-17 21:28:01 UTC
Also, after recoding the tags to UTF-8 using EasyTag, they display fine for most MP3s, with the exception of the first one - for some reason Amarok 2.2.2 displays garbage in the title despite it being correct UTF-8 (e.g. the QuodLibet player displays the title correctly).

Here's that MP3 with tag recoded to UTF-8:
http://olo.org.pl/files/Acropolis_Demo/utf-8/
Comment 6 Myriam Schweingruber 2010-01-17 23:38:43 UTC
I can't reproduce this. I retagged the 4 tracks you linked to earlier to UTF-8 myself, using easytag; my whole system is in UTF-8. After an update and an Amarok restart, the tags show the characters correctly.
Using Amarok 2.2.3-git (the development build of a few minutes ago), Kubuntu 9.10, KDE SC 4.4 RC1.
As for your proposition: this should go to either a separate wish or to the mailing list amarok@kde.org, but keep in mind that Amarok is first of all a music player, it's highly unlikely it will become a mass tagger, there are enough tools for that available already.
Please check your LOCALE settings, those need to be *all* set to UTF-8.
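A quick way to run that check is sketched below in Python. It only looks at `LC_ALL`, `LC_CTYPE`, and `LANG`, the standard variables that select the character encoding; whether Amarok consults them for tag display is an assumption, and the helper name `non_utf8_locale_vars` is hypothetical.

```python
import os


def non_utf8_locale_vars(env=os.environ):
    """Return the locale variables that are set but do not name a UTF-8 locale."""
    suspicious = {}
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        # Accept both spellings, e.g. "pl_PL.UTF-8" and "en_US.utf8".
        normalized = value.lower().replace("-", "")
        if value and "utf8" not in normalized:
            suspicious[name] = value
    return suspicious


if __name__ == "__main__":
    print(non_utf8_locale_vars() or "all checked locale variables use UTF-8")
```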
Comment 7 Jeff Mitchell 2010-01-17 23:55:37 UTC
(In reply to comment #5)
> Also, after recoding the tags to UTF-8 using EasyTag, they display fine for
> most MP3s, with the exception of the first one - for some reason Amarok 2.2.2
> displays garbage in the title despite it being correct UTF-8 (e.g. the
> QuodLibet player displays the title correctly).

Make sure the charset detector is turned off. Settings -> Collection. It's off by default in 2.2.2 but if you toggled it on it could definitely cause that problem (which is why it's now off by default).
Comment 8 bugs-kde 2010-01-18 12:09:40 UTC
All locale env vars are set to UTF-8 variants AFAIR (cannot check right now, it's a home machine).

I've also specifically verified that the charset detector had been turned off before testing.

Also, the problem is with the one specific file - others are displayed correctly. That's why I've uploaded it to http://olo.org.pl/files/Acropolis_Demo/utf-8/ .
Comment 9 bugs-kde 2010-01-18 12:13:54 UTC
> As for your proposition: this should go to either a separate wish or to the
> mailing list amarok@kde.org,

This bug IS a separate wish. It even has Severity: wishlist.

I don't understand why it has been marked as RESOLVED/FIXED...

The encoding problems mentioned in comment #2 are purely a digression, and it seems that I should have kept them to myself, since they have caused the discussion to drift away from the actual subject.
Comment 10 Jeff Mitchell 2010-01-18 17:05:39 UTC
Created attachment 40009
Pic

But you're using the comments in #2 as proof of why this feature is needed, except that the problem is something local to your machine. For me, both the original file posted and the utf-8 versions have exactly the same result: what's in the picture attached.