Bug 82640

Summary: id3 tags incorrectly displayed and saved
Product: [Applications] kfile-plugins
Component: mp3
Reporter: Spiros Georgaras <sng>
Assignee: Scott Wheeler <wheeler>
Status: RESOLVED DUPLICATE
Severity: normal
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: unspecified
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: The tag-unbreaker.cpp as changed by me

Description Spiros Georgaras 2004-06-01 17:25:35 UTC
Version:           2.4.1 (using KDE 3.2.2, SuSE)
Compiler:          gcc version 3.3 20030226 (prerelease) (SuSE Linux)
OS:                Linux (i686) release 2.4.20-4GB-athlon

When the ID3 tag info is in a locale encoding (e.g. ISO-8859-7), the tags are no longer displayed correctly. I think this started with KDE 3.2; before that, everything was OK.

I have also found a similar bug report (bug 48604), where Scott Wheeler answers:

... I've even written a small utility to convert ID3v1 tags in the system locale to valid ID3v2 tags...
http://ktown.kde.org/~wheeler/files/tag-unbreaker.cpp

I have tried it, and it seems OK with Noatun, but then XMMS does not display the tag (no UTF-8 support?).
So that's not a solution...
Comment 1 Spiros Georgaras 2004-06-01 18:02:25 UTC
Hi again

I have just noticed that the above also applies to JuK and to Konqueror's info list view, so this is a general problem, not just a Noatun one.
Comment 2 Stefan Gehn 2004-06-01 18:30:33 UTC
Side note: if XMMS does not support UTF-8 in ID3v2 tags, then it's broken.
Comment 3 Scott Wheeler 2004-06-01 19:34:59 UTC
A few notes here:

*) TagLib does support overriding the ID3v1 text handling; it just doesn't do so by default since, well, I'm not going to intentionally default to incorrect behavior.  ;-)

*) The UTF-8 thing is a bug in XMMS; see: http://bugs.xmms.org/show_bug.cgi?id=1741

*) This is a duplicate of: 78428, 65636, 63531, 79589, 77710, 81400...and probably a few more.  ;-)

I've given the explanation enough times, though -- if you're interested, read one of those.  ;-)

*** This bug has been marked as a duplicate of 81400 ***
Comment 4 Spiros Georgaras 2004-06-02 23:46:56 UTC
I see this is a big mess...

I have read all the messages you pointed out, and honestly I would prefer being able to read the ISO-8859-7 (default locale) ID3v1 tags in Noatun, accepting the risk of not being able to read other encodings (e.g. ISO-8859-3) for which I probably don't have the right fonts installed anyway, rather than being unable to read ANY of the tags in the MP3s I already have on CDs (which I could read in KDE 3.1).

I would have to either re-burn all these CDs (no way) or stop using Noatun and every other KDE multimedia app that uses TagLib and start using XMMS, or, worse, go back to Windows just to listen to some music. Do you see my problem?

Anyway, I have found a workaround:
1. set the ID3v1 tag with id3ed (command line)
2. convert the tag to UTF-8 ID3v2 with your tag-unbreaker util (command line)
3. finally, load the file in Noatun and get the right results

Do you think this could be part of the GUI (say, the meta file info for MP3s)? It could have a locale selection box, and if the user selects a non-Latin-1 locale, the tag would be written as a UTF-8 ID3v2 tag.

What do you think?
Comment 5 Scott Wheeler 2004-06-03 00:05:52 UTC
Well, the ID3v1 tags should already be there -- in KDE 3.1, ID3v1 tags were all that KDE (i.e. Noatun) could read.

But basically what you've suggested is what I had already decided to implement; there's no reliable way to do it without asking the user.  I don't know if I'll do it on a per file basis or as a choice in KControl, but probably the latter.

And this will be just for reading; for writing in non-ISO-8859-1 locales it will probably just write ID3v2 using UTF-8.

And again -- this isn't a TagLib limitation; TagLib specifically supports overriding the default text handling, it just defaults to ISO-8859-1 since that's what ID3v1 tags are *supposed* to contain.  Unfortunately this has been broken in lots of places, and not even in any regular pattern.  So almost no matter what you choose, you'll have some files that aren't shown correctly, e.g. if you set ISO-8859-7 and have ISO-8859-1 files.  You may also have problems if you have some using CP 1253, which the Greek version of Windows defaults to.
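A minimal sketch of the point above, in plain C++ with no TagLib dependency: the same ID3v1 bytes turn into different text depending on which charset the reader assumes. The ISO-8859-7 table is deliberately partial (ASCII plus the Greek letter block 0xC0 to 0xFE); any other byte becomes U+FFFD. The function names are hypothetical. In TagLib itself the override hook is, as far as I know, `TagLib::ID3v1::StringHandler`, installed with `ID3v1::Tag::setStringHandler()`.

```cpp
#include <string>

// Append one Unicode code point as UTF-8 (BMP only, which is enough here).
void appendUtf8(std::string &out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// ISO-8859-1 (what ID3v1 is supposed to contain): byte value == code point.
std::string latin1ToUtf8(const std::string &bytes) {
    std::string out;
    for (unsigned char b : bytes) appendUtf8(out, b);
    return out;
}

// ISO-8859-7, PARTIAL sketch: ASCII passes through, and the Greek letter
// block 0xC0..0xFE maps to U+0390..U+03CE (byte + 0x2D0); 0xD2 is unassigned.
// Everything else is replaced with U+FFFD.
std::string greekToUtf8(const std::string &bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80)
            appendUtf8(out, b);
        else if (b >= 0xC0 && b <= 0xFE && b != 0xD2)
            appendUtf8(out, b + 0x2D0);
        else
            appendUtf8(out, 0xFFFD);
    }
    return out;
}
```

For example, the bytes C1 E8 DE ED E1 read as "Αθήνα" under ISO-8859-7 but as "ÁèÞíá" under ISO-8859-1, which is exactly the kind of garbling reported here.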

So, yes, it's a mess and we'll sort it out as best as we can for 3.3, but every solution is a bit ugly.  :-)
Comment 6 Spiros Georgaras 2004-06-03 09:24:35 UTC
Yes this is even better!!!
A global setting in KControl would be the best thing to do...

By the way, in KDE 3.1 I could even correctly read MP3s whose tags had been written in Windows (before I started using Linux), even though the Windows code page is CP 1253. Strange? I don't think there's much difference between ISO-8859-7 and CP 1253.

And yes, the whole thing is not a TagLib limitation - your tag-unbreaker would not work if it were...

>> So, yes, it's a mess and we'll sort it out as best as we can for 3.3, but every solution is a bit ugly.  :-) 
You could not be more correct here, but backwards compatibility is an important issue... :-)

So we'll just have to wait for KDE 3.3 :-)
Comment 7 Scott Wheeler 2004-06-03 09:35:23 UTC
> I don't think there's much difference between ISO-8859-7 and CP 1253.

No, there's not.  The differences are listed here:

http://www.cs.tut.fi/~jkorpela/unicode/greek.html

But still, if you ran across one of those characters, it would look like a bug. And some locales aren't so lucky. As I recall, the character sets used in Windows vs. Linux in Russia are very different.
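To show why even a small difference "looks like a bug", here is a sketch that decodes a few specific bytes under each code page. Only the listed positions are included (it is not a conversion table), the helper name is hypothetical, and the CP 1251 / KOI8-R pair is an assumption about which Windows and Linux charsets are meant for Russian.

```cpp
#include <map>
#include <string>

// Hypothetical helper: decode ONE byte under a named legacy code page.
// Only a handful of positions are listed; any other byte, or an unknown
// code page, yields U+FFFD. This is NOT a full conversion table.
char32_t decodeByte(const std::string &codepage, unsigned char byte) {
    static const std::map<std::string, std::map<int, char32_t>> kTables = {
        // Greek capital Alpha with tonos (U+0386) is 0xB6 in ISO-8859-7
        // but 0xA2 in CP 1253; each byte means something else in the other.
        {"ISO-8859-7", {{0xA2, 0x2019}, {0xB6, 0x0386}}},
        {"CP1253", {{0xA2, 0x0386}, {0xB6, 0x00B6}}},
        // Cyrillic: CP 1251 has capital A (U+0410) at 0xC0, while KOI8-R
        // has lowercase yu (U+044E) there and capital A at 0xE1.
        {"CP1251", {{0xC0, 0x0410}, {0xE1, 0x0431}}},
        {"KOI8-R", {{0xC0, 0x044E}, {0xE1, 0x0410}}},
    };
    auto page = kTables.find(codepage);
    if (page == kTables.end()) return 0xFFFD;
    auto entry = page->second.find(byte);
    return entry == page->second.end() ? char32_t{0xFFFD} : entry->second;
}
```

So a title starting with Ά written on Greek Windows (byte 0xA2) shows up as a right single quotation mark when read as ISO-8859-7, while the rest of the title may look fine.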

Comment 8 Spiros Georgaras 2004-06-03 12:41:36 UTC
Created attachment 6236
The tag-unbreaker.cpp as changed by me
Comment 9 Spiros Georgaras 2004-06-03 12:43:43 UTC
Yes Scott I have seen the differences in action... :-(

I have just found out that you are the developer of TagLib :-)

So I think I should tell you about another thing I've noticed:

Up until now I have been using ID3v1 for all my MP3s, and ID3v2 only for those with a title longer than 30 characters.
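For reference, the 30-character limit comes from the ID3v1 layout: a fixed 128-byte block at the end of the file ("TAG", then title, artist, album, year, comment and genre fields). The sketch below reads it with hypothetical names. The limit is really 30 bytes, which equals 30 characters only in single-byte charsets, and the fields are copied as raw bytes because ID3v1 carries no charset information at all, which is the root of this bug. The ID3v1.1 track-number variant is not handled specially.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// ID3v1 layout (128 bytes at the end of the file):
// "TAG" | title[30] | artist[30] | album[30] | year[4] | comment[30] | genre[1]
struct Id3v1 {
    std::string title, artist, album, year, comment;
    unsigned char genre;
};

// Copy a fixed-width field as raw bytes, stopping at the first NUL and
// trimming trailing space padding. No charset decoding is done here.
std::string field(const std::vector<unsigned char> &tag, std::size_t offset, std::size_t width) {
    std::string s;
    for (std::size_t i = offset; i < offset + width && tag[i] != 0; ++i)
        s += static_cast<char>(tag[i]);
    while (!s.empty() && s.back() == ' ') s.pop_back();
    return s;
}

std::optional<Id3v1> parseId3v1(const std::vector<unsigned char> &file) {
    if (file.size() < 128) return std::nullopt;
    std::vector<unsigned char> tag(file.end() - 128, file.end());
    if (tag[0] != 'T' || tag[1] != 'A' || tag[2] != 'G') return std::nullopt;
    return Id3v1{field(tag, 3, 30), field(tag, 33, 30), field(tag, 63, 30),
                 field(tag, 93, 4), field(tag, 97, 30), tag[127]};
}
```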

After this discussion, when I want to write an ID3v2 tag (title longer than 30 chars), I do this:

1. insert the tag with: id3v2 --TIT "What ever the title is for the file" file.mp3 (command line)
2. convert the tag to UTF-8 with your tag-unbreaker (I have changed it to convert ID3v2 --> ID3v2 UTF-8 instead of the original ID3v1 --> ID3v2 UTF-8 - I have attached the file)

In this case I noticed that the TIT2 field (maybe other fields too, I haven't checked) is truncated to 30 chars, as in ID3v1. Can this be changed so I can insert the whole title?

Maybe this is relevant, from id3v2.4.0-structure.txt, section 3.2 (Extended header):
====================================================
    r - Text fields size restrictions

       00   No restrictions
       01   No string is longer than 1024 characters.
       10   No string is longer than 128 characters.
       11   No string is longer than 30 characters.

       Note that nothing is said about how many bytes is used to
       represent those characters, since it is encoding dependent. If a
       text frame consists of more than one string, the sum of the
       strungs is restricted as stated.
======================================================
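A sketch of how a reader could decode that field, assuming the spec's layout of the restrictions byte as %ppqrrstt, where the two "rr" bits (bits 4 and 3) are the text field size restriction. The byte only exists when the tag has an extended header with the restrictions flag set, so whether it explains the truncation here is not established; `textFieldLimit` is a hypothetical helper.

```cpp
#include <cstdint>

// Decode the "rr" bits of an ID3v2.4 restrictions byte (%ppqrrstt).
// Returns the maximum characters per text field, or 0 for "no restriction".
int textFieldLimit(std::uint8_t restrictions) {
    switch ((restrictions >> 3) & 0x03) {
        case 0: return 0;      // 00: no restrictions
        case 1: return 1024;   // 01: no string longer than 1024 characters
        case 2: return 128;    // 10: no string longer than 128 characters
        default: return 30;    // 11: no string longer than 30 characters
    }
}
```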
Comment 10 Scott Wheeler 2004-06-03 14:28:46 UTC
Well, the tag-unbreaker is just for writing ID3v2 tags from ID3v1 tags; it ignores the ID3v2 tags created by the id3v2 tool.  I don't know if it's possible to write UTF-16 directly with the id3v2 tool, but I know that it only writes ID3v2.3 (instead of ID3v2.4, which TagLib uses), which doesn't support UTF-8 but does support UTF-16.
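To make the version difference concrete: every ID3v2 text frame (such as TIT2) starts with an encoding byte. ID3v2.3 defines only $00 (ISO-8859-1) and $01 (UTF-16 with BOM); ID3v2.4 adds $02 (UTF-16BE) and $03 (UTF-8). The sketch below builds a single ID3v2.4 TIT2 frame with UTF-8 text, following the spec's frame layout (4-byte ID, 4-byte syncsafe size, 2 flag bytes, then the body). In ID3v2.3 the frame size is a plain 32-bit integer instead. All names are hypothetical; this is not how TagLib or the id3v2 tool write frames, and it does not produce a complete tag (no tag header or padding).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Which text-encoding bytes each ID3v2 major version allows.
bool encodingAllowed(int majorVersion, std::uint8_t encoding) {
    if (majorVersion == 3) return encoding <= 0x01;  // ISO-8859-1, UTF-16
    if (majorVersion == 4) return encoding <= 0x03;  // + UTF-16BE, UTF-8
    return false;
}

// ID3v2.4 frame sizes are "syncsafe": 4 bytes carrying 7 bits each.
std::vector<std::uint8_t> syncsafe(std::uint32_t n) {
    return {static_cast<std::uint8_t>((n >> 21) & 0x7F),
            static_cast<std::uint8_t>((n >> 14) & 0x7F),
            static_cast<std::uint8_t>((n >> 7) & 0x7F),
            static_cast<std::uint8_t>(n & 0x7F)};
}

// Build one ID3v2.4 TIT2 frame; the title must already be UTF-8 encoded.
std::vector<std::uint8_t> makeTit2Utf8(const std::string &utf8Title) {
    std::vector<std::uint8_t> frame = {'T', 'I', 'T', '2'};
    auto bodySize = static_cast<std::uint32_t>(1 + utf8Title.size());  // encoding byte + text
    for (auto b : syncsafe(bodySize)) frame.push_back(b);
    frame.push_back(0x00);  // status flags
    frame.push_back(0x00);  // format flags
    frame.push_back(0x03);  // $03 = UTF-8, valid in ID3v2.4 only
    frame.insert(frame.end(), utf8Title.begin(), utf8Title.end());
    return frame;
}
```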

However you can directly write UTF-8 tags using the tagwriter in taglib/examples with this change:

-      TagLib::String value = argv[i + 1];
+      TagLib::String value(argv[i + 1], TagLib::String::UTF8);
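The one-line change matters because, as far as I know, the `TagLib::String` constructor taking a `const char *` defaults its `Type` argument to `Latin1`, so each byte of a UTF-8 command-line argument becomes a separate character. The stdlib-only sketch below (hypothetical helper names; a minimal UTF-8 decoder that assumes well-formed input) shows the difference on the UTF-8 bytes of "Αθήνα": 10 code points when read as Latin-1, 5 when read as UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Latin-1 interpretation: every byte becomes one code point (mojibake for UTF-8 input).
std::vector<char32_t> asLatin1(const std::string &bytes) {
    std::vector<char32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// UTF-8 interpretation: minimal decoder, assumes well-formed input (no validation).
std::vector<char32_t> asUtf8(const std::string &bytes) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = bytes[i];
        // Sequence length from the lead byte: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx.
        std::size_t len = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2 : (b >> 4) == 0xE ? 3 : 4;
        char32_t cp = len == 1 ? b : len == 2 ? (b & 0x1F) : len == 3 ? (b & 0x0F) : (b & 0x07);
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```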

Also JuK writes UTF-8 ID3v2 tags by default, so you could retag things with it.