Summary: | ID3 tags with ISO 8859-1 characters wrongly encoded | ||
---|---|---|---|
Product: | [Frameworks and Libraries] taglib | Reporter: | Karl Ove Hufthammer <karl> |
Component: | general | Assignee: | Scott Wheeler <wheeler> |
Status: | RESOLVED DOWNSTREAM | ||
Severity: | normal | CC: | 123kash, danilo.luvizotto, lalinsky, myriam, rdieter |
Priority: | NOR | ||
Version: | 1.8 | ||
Target Milestone: | --- | ||
Platform: | openSUSE | ||
OS: | Linux | ||
Attachments: | MP3 file with no ID3 tags; MP3 file with wrongly encoded ID3 tags; MP3 file with some wrongly and some correctly encoded ID3 tags | ||
Description
Karl Ove Hufthammer
2012-12-09 11:12:09 UTC
Created attachment 75742: MP3 file with no ID3 tags
Created attachment 75743: MP3 file with wrongly encoded ID3 tags
Created attachment 75744: MP3 file with some wrongly and some correctly encoded ID3 tags
Though the TargetMilestone is 2.7, this problem still exists in Amarok 2.7.0. I believe the importance of this bug is critical. A collection imported with correctly encoded tags will have them corrupted by Amarok when it adds its own custom tags, like "FMPS_Rating_Amarok_Score0.5", or album art. In other words, Amarok can corrupt the tags of entire collections (which can contain hundreds of files and are hard to fix). The user will not know the tags are corrupted because they will be correct in the SQL database but not in the file itself. If the collection is "fully rescanned", Amarok will re-read the tags from the files and show them corrupted to the user. By the way, very good bug description, Karl.

Guys, why do you insist on using an obsolete encoding system? One can very easily retag all files to Unicode with kid3 or easytag, and probably other mass taggers as well. This is not in fact an Amarok bug but a wish, because we do not support ISO encoding at all, so it is not implemented. And this is not a regression, since it never worked in Amarok 2.x, and not a testcase, as it is not a bug, sorry.

Gah, I should have read correctly, sorry. It is a bug, but still not a regression, as this never worked otherwise in Amarok 2.x. Danilo: no, this is not critical at all, please read up on the definition of a critical bug.

Myriam, you have misunderstood. This is not a wishlist; it’s a bug. I hoped my bug report was clear, as I even gave an easy-to-reproduce test case, but I’ll try to make it even clearer: Amarok incorrectly writes ID3 tags. Amarok writes all tags as UTF-8 (which is great), but says that they’re encoded as ISO 8859-1 *iff* they potentially could be represented as ISO 8859-1. In other words, the actual encoding and the text encoding description byte differ. This is clearly a bug. The solution is easy: correctly set the text encoding description byte to 03 when saving the files. We do not insist on using an obsolete encoding.
There is no option for this in Amarok, so I don’t understand your accusation. This is *not* a wish about support for any ISO encoding. I would think it wonderful if Amarok correctly saved ID3 tags only as UTF-8 (or UTF-16). Unfortunately, it sometimes doesn’t. This is a bug. Also, this is clearly a regression, since this bug didn’t occur in earlier Amarok versions (i.e., Amarok 2.5.x, I believe).

This could likely be a taglib issue, not an Amarok one. For those experiencing any "regression-like" behavior, did the version of taglib vary between working and non-working test environments? Also, it would help to mention which version of taglib you currently have installed.

I spent some hours trying to find this bug in the Amarok source code but found no problems. I agree it may be a taglib problem. I'll post the version I'm using as soon as I get home.

Rex, the version of taglib may very well have varied. I now use the latest official openSUSE versions of the packages, i.e., Amarok 2.6.0 and Taglib 1.8, and the bug is at least present in these versions.

(The following is not important for the actual bug:) BTW, the bug manifests itself in a slightly different form than described in my initial comment. When setting the tags to ‘TitleABCÆØÅ’, the wrong encoding description byte is used, but the tags are also *shown* ‘wrongly’ (or really correctly, as by the ID3 standard), as UTF-8 interpreted as ISO 8859-1 in the playlist, i.e. with garbled characters. So Amarok is now correctly *reading* the (wrongly encoded) file, making the bug easier to spot. (The reason for the different behaviour might be that the file is not in my collection, so the tags are read from the file instead of from the collection DB?)

I have also now tested this on a Kubuntu live CD with Amarok 2.4.0 and Taglib 1.6.3, and the bug was present even back then. (This is surprising, as I didn’t notice the problem before Amarok 2.6.0.)
BTW, here the behaviour was as in my initial report, i.e., the characters did *not* appear garbled in the playlist.

I have tried some googling. Could there perhaps be a missing TagLib::ID3v2::FrameFactory::instance()->setDefaultTextEncoding(TagLib::String::UTF8); in Amarok? Source: https://mail.gnome.org/archives/rhythmbox-devel/2006-June/msg00137.html (and others)

Looks like this bug was actually fixed in 2005 (https://bugs.kde.org/show_bug.cgi?id=111246), but the fix seems to have been lost in the meantime (in major code changes).

Today I spent some hours (again) trying to debug this. I also downloaded the latest taglib source code from http://taglib.github.com/releases/taglib-1.8.tar.gz . After analyzing the code of both Amarok and taglib I couldn't find anything wrong. So I compiled the taglib sources I had downloaded and installed the result. Now this bug doesn't manifest for me anymore. After some more research, I found out I was using taglib 1.8 from packman (my system is running openSUSE Tumbleweed). So I downloaded the sources packman used (packman.links2linux.org/downloadsource/362876/taglib-1.8-54.2.src.rpm) and found the problem: a patch named "taglib-1.8-ds-rusxmms-r2.patch", which packman applies to the taglib sources. So this bug is not an Amarok or taglib bug. It's a bug in the modified sources packman uses. Thank you everyone for your help!

One more comment: taglib from the original openSUSE repo also has this bug. Only self-compiled taglib works fine.

Good to know. Did you report this to openSUSE? If so, please provide a link to the bug here.

openSUSE Tumbleweed doesn't have a bug tracker, as it is a rolling release. I don't have a 12.2 installation, but 12.3 will be released 10 days from now and then my packages will be the same as that release. When that happens, I'll be able to re-test and report a bug to the 12.3 bug tracker.

Just setting the product right, waiting for a bug link, then.
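The fix proposed earlier in the thread (set the text encoding description byte to 03 whenever the text is saved as UTF-8) can be illustrated without TagLib. What follows is a minimal sketch using only the C++ standard library; it is not Amarok's or TagLib's code. The names makeTextFrameBody, looksMislabelled and demoFrameBody are hypothetical, and the detection is a heuristic that assumes the pattern reported here: UTF-8 text limited to characters that also exist in ISO 8859-1 (lead bytes 0xC2 or 0xC3), labelled with encoding byte 00.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ID3v2.4 text encoding description bytes used in this sketch.
constexpr unsigned char kLatin1 = 0x00; // ISO 8859-1
constexpr unsigned char kUtf8 = 0x03;   // UTF-8

// Hypothetical helper: build the body of a text frame from UTF-8 text,
// i.e. one encoding byte followed by the text itself. The byte matches
// the actual encoding, which is the fix suggested in the thread.
std::string makeTextFrameBody(const std::string &utf8Text)
{
    std::string body(1, static_cast<char>(kUtf8));
    body += utf8Text;
    return body;
}

// Hypothetical heuristic: true if the body is labelled ISO 8859-1 but its
// high bytes consist only of well-formed two-byte UTF-8 sequences for
// U+0080..U+00FF (lead byte 0xC2 or 0xC3 plus one continuation byte).
bool looksMislabelled(const std::string &body)
{
    if (body.empty() || static_cast<unsigned char>(body[0]) != kLatin1)
        return false;
    bool sawMultibyte = false;
    for (std::size_t i = 1; i < body.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(body[i]);
        if (c < 0x80)
            continue; // plain ASCII is identical in both encodings
        if ((c != 0xC2 && c != 0xC3) || i + 1 >= body.size())
            return false; // genuine ISO 8859-1 byte, or truncated sequence
        const unsigned char next = static_cast<unsigned char>(body[i + 1]);
        if ((next & 0xC0) != 0x80)
            return false; // not a UTF-8 continuation byte
        sawMultibyte = true;
        ++i; // skip the continuation byte
    }
    return sawMultibyte;
}

// Usage example: "TitleÆ" as UTF-8 is "Title" followed by C3 86.
void demoFrameBody()
{
    const std::string mislabelled("\0Title\xC3\x86", 8); // byte 00 + UTF-8 text
    assert(looksMislabelled(mislabelled));
    assert(!looksMislabelled(makeTextFrameBody("Title\xC3\x86")));
}
```

A real ISO 8859-1 string that happens to contain such byte pairs would be flagged too, so a check like this is only suitable for suggesting candidates for retagging, not for rewriting files automatically.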

openSUSE was for a while (not sure if they still are) applying the RusXMMS patches to TagLib, which cause this: http://lists.opensuse.org/opensuse-bugs/2012-11/msg00539.html

Unfortunately, they think the bug is fixed and continue to use that patch: https://bugzilla.novell.com/show_bug.cgi?id=780256
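
The garbling described in this thread comes from a standards-conforming reader trusting the encoding byte. The sketch below shows this for the title ‘TitleABCÆØÅ’: the same UTF-8 bytes read back intact when the byte is 03 and as garbled text when the byte is 00. It uses only the C++ standard library; latin1ToUtf8, decodeTextFrame and demoDecode are hypothetical names, and only the two encodings discussed here are handled.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Interpret bytes as ISO 8859-1 and return the same text as UTF-8.
// Each ISO 8859-1 byte equals its Unicode code point, so bytes >= 0x80
// expand to a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (const unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Decode the text of a frame according to its encoding description byte
// (00 = ISO 8859-1, 03 = UTF-8). The UTF-16 variants (01, 02) are left out.
std::string decodeTextFrame(unsigned char encodingByte, const std::string &text)
{
    switch (encodingByte) {
    case 0x00:
        return latin1ToUtf8(text);
    case 0x03:
        return text;
    default:
        throw std::invalid_argument("encoding not handled in this sketch");
    }
}

void demoDecode()
{
    // "TitleABCÆØÅ" encoded as UTF-8: ÆØÅ become C3 86, C3 98, C3 85.
    const std::string title = "TitleABC\xC3\x86\xC3\x98\xC3\x85";

    // Correct label: the text is returned unchanged.
    assert(decodeTextFrame(0x03, title) == title);

    // Wrong label: every byte >= 0x80 is treated as its own character,
    // so the 14-byte title grows to 20 bytes of garbled text.
    assert(decodeTextFrame(0x00, title).size() == 20);
}
```

This is also why the corruption can go unnoticed while the tags are served from the collection database: the garbling only appears once the file itself is read according to its encoding byte.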