Version: 1.4 (using KDE KDE 3.3.2) Installed from: Debian stable Packages OS: Linux If an ID3v2 tag already holds a UTF-8 string, toCString(true) performs an unnecessary conversion. This breaks applications that use toCString(true) or TStringToQString and expect a UTF-8 string. To reproduce: set a tag to, say, "ž" (2 bytes); toCString(true) will return "ž" (4 bytes), while toCString(false) will return the correct UTF-8 string. amarok, which uses TStringToQString, will show an invalid tag.
I don't quite understand what you're describing -- if an ID3v2 tag contains data in UTF-8 format then it is converted to UTF-16 when the tag is read. If you then use toCString(true) it is converted back to UTF-8 on the way out. Possibly what you're seeing is something that has UTF-8 data in the tag, but it's not appropriately marked as being UTF-8 (i.e. it's probably marked as using ISO-8859-1), and then the conversion functions don't work properly. If you send me one of the files that you're having problems with by email (using the filename 111232.mp3) I'll confirm.
Oh, also forgot to note that JuK uses the same conversion functions and I regularly have "extended" characters in my tags...
Created attachment 12306: Test code
Running this code (with an mp3 file as argument) results in:
    input='ž' (2 bytes)
    title(true)='ž' (4 bytes) // this, I think, is wrong
    title(false)='ž' (2 bytes)
I "stripped" the tested mp3 file with: id3v2 -D <mp3>. However, if the encoding information is somehow preserved, maybe taglib should overwrite it when setting the tag?
Your code is wrong though -- you're just using the default (implicit) constructor for a TagLib::String, which assumes that the data is encoded in ISO-8859-1. If you switch the line:
    fs.tag()->setTitle(utf8);
to:
    fs.tag()->setTitle(TagLib::String(utf8, TagLib::String::UTF8));
then it should work fine. Also note that if you're setting the information with the "id3v2" command line tool then it doesn't accept UTF-8 input. It uses id3lib, which is limited to ID3v2.3, which in turn is limited to ISO-8859-1 and UTF-16. In a nutshell you're writing invalid tags -- TagLib just gives them back to you that way. :-)
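To illustrate the explanation above: if UTF-8 bytes are handed over as though they were ISO-8859-1, each byte is treated as a separate character, and converting to UTF-8 on the way out encodes each of them again. The following is a minimal sketch in plain C++ that does not use TagLib; latin1ToUtf8 is a hypothetical helper written only for this illustration, not a TagLib function.

```cpp
#include <string>

// Hypothetical helper (not part of TagLib): re-encode a byte string as UTF-8
// on the assumption that every input byte is one ISO-8859-1 character.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII range: identical in ISO-8859-1 and UTF-8.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: always a two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Passing the two UTF-8 bytes of "ž" (0xC5 0xBE) to this helper yields the four bytes 0xC3 0x85 0xC2 0xBE, which matches the 4-byte title(true) result reported earlier in the thread.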
Created attachment 12307: Upd test
I've changed it, but the results now are:
    input='ž' (2 bytes)
    taglib str(true)='ž' (2 bytes)
    taglib str(false)='~' (1 bytes)
    title(true)='~' (1 bytes)
    title(false)='~' (1 bytes)
I'm actually only reading tags with taglib; however, amarok fails to set them, and it seems to suffer from the same UTF-8 problem. They use:
    #define strip( x ) TStringToQString( x ).stripWhiteSpace()
    m_title = strip( tag->title() );
to read a tag and:
    t->setTitle( QStringToTString( mb.title() ) );
to set it.
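One possible reading of the '~' in the output above: "ž" is U+017E, and keeping only the low 8 bits of that code point gives 0x7E, which is '~'. The sketch below shows just this arithmetic in plain C++. decodeUtf8 and narrowTo8Bit are hypothetical helpers for illustration, and the assumption that TagLib narrows characters this way when an 8-bit encoding is requested or written is not confirmed by this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 into code points. Only 1- and 2-byte
// sequences are handled (U+0000..U+07FF); any other byte becomes U+FFFD.
std::vector<unsigned int> decodeUtf8(const std::string &s)
{
    std::vector<unsigned int> codePoints;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char lead = s[i];
        if (lead < 0x80) {
            codePoints.push_back(lead);
        } else if ((lead & 0xE0) == 0xC0 && i + 1 < s.size()) {
            unsigned char trail = s[++i];
            codePoints.push_back(((lead & 0x1F) << 6) | (trail & 0x3F));
        } else {
            codePoints.push_back(0xFFFD);
        }
    }
    return codePoints;
}

// Hypothetical helper: squeeze each code point into one byte by discarding
// everything above the low 8 bits (lossy for anything beyond U+00FF).
std::string narrowTo8Bit(const std::vector<unsigned int> &codePoints)
{
    std::string out;
    for (unsigned int cp : codePoints)
        out += static_cast<char>(cp & 0xFF);
    return out;
}
```

With the input bytes 0xC5 0xBE, decodeUtf8 produces the single code point 0x17E, and narrowTo8Bit turns it into the one-byte string "~".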
Sorry, one additional missing line:
    TagLib::ID3v2::FrameFactory::instance()->setDefaultTextEncoding(TagLib::String::UTF8);
I tested with that and it works. In TagLib 2.0 I'll reconsider the default encoding (i.e. making it UTF-8), but at the time that 1.0 was written most consoles still defaulted to local encodings rather than UTF-8.
Thank you. Tested with amarok and it works fine.