Bug 77710 - i18n: ability to choose encoding of tag info
Summary: i18n: ability to choose encoding of tag info
Status: RESOLVED FIXED
Alias: None
Product: amarok
Classification: Applications
Component: general (show other bugs)
Version: unspecified
Platform: unspecified Other
: NOR wishlist
Target Milestone: ---
Assignee: Amarok Developers
URL:
Keywords:
: 80911 (view as bug list)
Depends on:
Blocks:
 
Reported: 2004-03-15 21:39 UTC by Nick Shaforostoff
Modified: 2006-06-11 12:32 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
utf8-1.cpp (1.25 KB, text/x-c++src)
2004-04-04 20:05 UTC, Scott Wheeler
Details
combobox for selecting tag charset (10.93 KB, patch)
2004-05-10 09:56 UTC, Nick Shaforostoff
Details
Patch for today CVS (15.60 KB, patch)
2004-07-31 19:23 UTC, Andrew Lavrinenko
Details
Script for encoding conversion. (2.99 KB, application/x-perl)
2005-02-03 17:58 UTC, Pavel Gurevich
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Shaforostoff 2004-03-15 21:39:27 UTC
Version:           0 (using KDE Continue)
Compiler:          amarok/general
OS:          X..........

Russian songs have id3tags in cp1251 charset (windows) de-facto.
But system locale is usually koi8-u (or even utf8).

So it would be good if user could choose encoding charset he needs

(maybe this feature should be added to taglib?)
Comment 1 Stanislav Karchebny 2004-03-24 06:33:34 UTC
This should be added to taglib. Amarok has nothing to do with tags encodings.
Comment 2 Scott Wheeler 2004-03-28 09:29:22 UTC
Well, and ID3v1, dumb as it is (and as broken as the implementation is, well, everywhere) is *supposed* to be ISO-8859-1 everywhere.  The problem is that there's no way to specify the encoding along with the file for MP3s.  Some things put UTF-8 in there, some put ISO-8859-1 in there (as they should -- even though I know that sucks).  Some put the current locale...

But that all breaks down as soon as you copy the files to a machine with a different locale or are running two OS'es with different locales set (this is common in many regions -- Linux defaults to one locale and Windows to another).

TagLib just implements ID3v1's brokenness; it's really a bug in ID3v1, not a bug in TagLib (though you can treat the string coming from TagLib as whatever locale you want to -- good luck figuring out what the right one is though).
Comment 3 Scott Wheeler 2004-04-04 16:51:53 UTC
CVS commit by wheeler: 

Add the ability to override the default ID3v1 string handling in TagLib by
adding a ID3v1::StringHandler class that has render and parse methods.

By default this still makes the (correct) assumption that ID3v1 tags contain
ISO-8859-1 data, but this makes it easy for applications to override this to
allow user specified codecs.

CCMAIL:78428@bugs.kde.org
CCMAIL:77710@bugs.kde.org
CCMAIL:amarok-devel@lists.sourceforge.net
CCMAIL:Shlomi Loubaton <loubaton.shlomi@012.net.il>
CCMAIL:Nir Misgav <misgav@ee.bgu.ac.il>


  M +34 -10    id3v1tag.cpp   1.23
  M +47 -0     id3v1tag.h   1.14



Comment 4 Scott Wheeler 2004-04-04 20:05:49 UTC
Ok, and now I'm attaching a small Qt-based program that shows how to override 
the default string encoding.  This one just uses utf8, but I think it's clear 
how one would proceed based on this...

I would also recommend by default (i.e. when ISO-8859-1 is selected) not using 
an external string handler at all since they extra lookups will likely slow 
things down...

- -Scott

On Sunday 04 April 2004 16:42, Scott Wheeler wrote:
> CVS commit by wheeler: 
> 
> Add the ability to override the default ID3v1 string handling in TagLib by
> adding a ID3v1::StringHandler class that has render and parse methods.
> 
> By default this still makes the (correct) assumption that ID3v1 tags contain
> ISO-8859-1 data, but this makes it easy for applications to override this to
> allow user specified codecs.
> 
> CCMAIL:78428@bugs.kde.org
> CCMAIL:77710@bugs.kde.org
> CCMAIL:amarok-devel@lists.sourceforge.net
> CCMAIL:Shlomi Loubaton <loubaton.shlomi@012.net.il>
> CCMAIL:Nir Misgav <misgav@ee.bgu.ac.il>
> 
> 
>   M +34 -10    id3v1tag.cpp   1.23
>   M +47 -0     id3v1tag.h   1.14
> 
> 
> 
> 



Created an attachment (id=5530)
utf8-1.cpp
Comment 5 Scott Wheeler 2004-05-04 02:44:09 UTC
Heh -- forgot to mark this as "done" earlier.  From the TagLib side of things this is done, so I'm marking it as such.  You may however want to reopen and reassign it to amarok.
Comment 6 Nick Shaforostoff 2004-05-10 09:56:58 UTC
Created attachment 5929 [details]
combobox for selecting tag charset

cd /path/to/amarok-cvs
patch -p1 < /path/to/amarok-tr-0.4.patch
make -f admin/Makefile.common cvs
./configure --prefix=`kde-config --prefix` --disable-debug && make && make
install
Comment 7 Nick Shaforostoff 2004-05-10 09:59:07 UTC
please, reassign this to amarok...
Comment 8 Adeodato Simó 2004-06-01 00:26:46 UTC
I think I'll take a look at this when I find some time. I do now understand the problem and I must say I'm quite concerned with l10n issues.

Have to ask on #amarok where to put the config option.
Comment 9 Adeodato Simó 2004-06-01 00:32:18 UTC
*** Bug 80911 has been marked as a duplicate of this bug. ***
Comment 10 Adeodato Simó 2004-06-01 00:34:24 UTC
Comment on attachment 5929 [details]
combobox for selecting tag charset

Note that the patch implementing the combobox seems to be from Andrew
Lavrinenko  <ggcand@aaanet.ru>. See #80911 for details.
Comment 11 Nick Shaforostoff 2004-06-01 11:02:22 UTC
At least, this patch was introduced to me by man with nikname 'Light'.
Anyways, the patch seems to be broken - there exist reference to Options6.h but there is no its implemenatation
Comment 12 Nick Shaforostoff 2004-06-20 14:31:09 UTC
here is patch for 1.0
http://www.redroot.ru/000/amarok-tr-0.5.patch
but it isnt complete.
at least there is no encoding handling on collection browser and
when i edit russian tag, it doesnt do reencoding.

and i dont know wether this is taglib bug or no
Comment 13 Andrew Lavrinenko 2004-07-31 19:21:42 UTC
Hello! Sorry for my long time absent, i have no time. I think, patch is compleat. Now it recode tag's in CollectionDB, correctly read AudioProperties, correctly save tag's if it editing, i move this option to "font" form. But exist one small trable, if file content both ID3v1 and ID3v2 tag's then amarok use only ID3v1. 
PS. You must include it to main realse:) It's very importent for non english speaking country! 
PPS. Sorry for my english:)
Comment 14 Andrew Lavrinenko 2004-07-31 19:23:19 UTC
Created attachment 6947 [details]
Patch for today CVS
Comment 15 Nick Shaforostoff 2004-07-31 20:42:29 UTC
got it but when i try to compile it, it gives me error

In file included from app.cpp:28:
id3v1stringhandler.h:16: error: base class `TagLib::ID3v1::StringHandler' has incomplete type
app.cpp: In member function `void App::applySettings(bool)':
app.cpp:341: error: 'class TagLib::ID3v1::Tag' has no member named 'setStringHandler'

taglib12, qt33
can somebody confirm this or its just suckness of my mdk10?
Comment 16 Max Howell 2004-07-31 21:44:10 UTC
andrew, thanks for the patch! I'll review and apply it either today or tomorrow :-)

And yes, we'll get it into 1.1 of course.
Comment 17 Andrew Lavrinenko 2004-08-01 12:28:27 UTC
2 Max Howell
BIG BIG Thenx!:)

2 Nick Shaforostoff
It's correctly work on taglib1.1 and taglib1.2

Sorry, i found small bag in my patch. Please replasce in id3v1stringhandler.cpp
ID3v1StringHandler::ID3v1StringHandler(int codecIndex)
       : m_codec(QTextCodec::codecForName("CP1251"))
{}

to

ID3v1StringHandler::ID3v1StringHandler(int codecIndex)
       : m_codec(QTextCodec::codecForIndex(codecIndex))
{}

Comment 18 Max Howell 2004-08-02 04:10:12 UTC
CVS commit by mhowell: 

* With much thanks to Andrew Lavrinenko and Mr Scott-TagLib-Wheeler, amaroK can now display ID3v1 tags in a user-specified 8-bit encoding. Please test (since I have not the facility) and suggest modifcations. Thanks.
* Note: I note with glee that KDElibs 3.3 supports XT using combo boxes and group boxes with checkboxes. So you need KDE 3.3 to activate this feature with the GUI. I figure 1.1 will likely require 3.3 anyway, if it doesn't I'll hack it backwards to work with 3.2.
* Also fixed in this commit: amaroK recognises session restored tracks again

CCMAIL: 77710-done@bugs.kde.org


  M +246 -194  Options2.ui   1.27
  M +28 -11    app.cpp   1.376
  M +0 -1      app.h   1.150
  M +24 -16    configdialog.cpp   1.39
  M +28 -16    enginecontroller.cpp   1.50
  M +5 -3      enginecontroller.h   1.24
  M +11 -1     metabundle.h   1.31



Comment 19 Max Howell 2004-08-02 04:11:46 UTC
Andrew, I didn't commit the mimetype detection stuff in MetaBundle. But have plans to at a future time. Thanks for the work!

Max
Comment 20 Andrew Lavrinenko 2004-08-02 22:27:13 UTC
I use mimetype detection only for override id3v2 to id3v1 tags becose almost all mp3 files content both id3v1 and id3v2 tags, but id3v2 not in unicode charset:( How to resolve this problem - i don't know.
Comment 21 Funda Wang 2004-12-10 13:23:35 UTC
It seems that amarok 1.2beta2 can recode id3v1 in Chinese perfectly. But for id3v2, the characters are just being known as iso-8859-1.

Maybe another dropdown list for id3v2?
Comment 22 Pavel Gurevich 2005-02-03 17:26:30 UTC
There still a problem with id3v1 and id3v2 character encodings. Unlike it appears in comment 13, id3v_2_ is used when both present and there is no option to alter this behavior.

Can you please add similar solution as you've done for id3v1 already? Thank you very much for a wonderful player.
Comment 23 Nick Shaforostoff 2005-02-03 17:43:24 UTC
we already discussed this
id3v2 must be in utf
(so then we need a simple tool to convert missing tags from cp1251 to utf8 and it might be a little button in 'edit metainfo' dialogue...)
Comment 24 Pavel Gurevich 2005-02-03 17:58:40 UTC
Created attachment 9410 [details]
Script for encoding conversion.

Description: Recodes id3v2 tag to/from any encoding iconv supports.
Depends: perl, iconv, id3v2.
Note: May be vulnerable for filenames containing "'", "`" etc.

For those who might need this. You may want to put it somewhere on your site or
include with player.
Comment 25 Funda Wang 2005-02-04 01:50:45 UTC
Beep media player has a option "Prefer id3v1 than id3v2". Maybe it is rather simple solution.
Comment 26 Max Howell 2005-02-04 15:28:05 UTC
Prefer ID3v1 over ID3v2 if the user specifically requests that mp3 ID3v1 tags get recoded. To me this doesn't seem like a solution that should work, but I can't argue with the number of requests for it, and the number of users who claim recode tags doesn't work because the id3v2 tag is preferred.
Please, please someone test this! I have nothing to test it with and cannot be sure that it'll work. (The common case still works though, that is tested)
BUG:77710
Comment 27 Frederick Alexander Thomssen 2005-02-04 22:19:06 UTC
Well, for me it doesn't work - when amarok tries to play an MP3 with ID3v1 (if I have selected the recode entry) amarok crashes - sorry, I have no debug infos - but when I turn off that recoding stuff, it works again

freddy
Comment 28 Max Howell 2005-02-05 16:59:36 UTC
CVS commit by mhowell: 

Don't crash if the user wants to recode tags
CCMAIL: 77710@bugs.kde.org


  M +33 -25    metabundle.cpp   1.44


--- kdeextragear-1/amarok/src/metabundle.cpp  #1.43:1.44
@@ -81,9 +81,8 @@ MetaBundle::MetaBundle( const PlaylistIt
 {
     if( m_url.protocol() == "file" )
-    {
         readTags( TagLib::AudioProperties::Accurate );
-    }
-    else { //is Stream
 
+    else {
+        // is a stream
         //FIXME not correct handling, say is ftp://file
         m_bitrate    = item->exactText( 10 ).left( 3 ).toInt();
@@ -151,36 +150,45 @@ MetaBundle::readTags( TagLib::AudioPrope
         return;
 
-    TagLib::FileRef f( QFile::encodeName( m_url.path() ), true, readStyle );
+    const QString path = m_url.path();
+    TagLib::FileRef fileref;
+    TagLib::Tag *tag = 0;
 
-    if( !f.isNull() && f.tag() )
+    if( AmarokConfig::recodeID3v1Tags() && path.endsWith( ".mp3", false ) )
     {
-        TagLib::Tag *tag = 0;
+        TagLib::MPEG::File *mpeg = new TagLib::MPEG::File( QFile::encodeName( path ), true, readStyle );
+        fileref = TagLib::FileRef( mpeg );
 
-        if( AmarokConfig::recodeID3v1Tags() && m_url.path().endsWith( ".mp3", false ) ) {
+        if( mpeg->isValid() )
             // we prefer ID3v1 over ID3v2 if recoding tags because
             // apparently this is what people who ignore ID3 standards want
-            TagLib::MPEG::File &mp3 = (TagLib::MPEG::File&)f;
-            tag = mp3.ID3v1Tag() ? (TagLib::Tag*)mp3.ID3v1Tag() : (TagLib::Tag*)mp3.ID3v2Tag();
+            tag = mpeg->ID3v1Tag() ? (TagLib::Tag*)mpeg->ID3v1Tag() : (TagLib::Tag*)mpeg->ID3v2Tag();
         }
-        else
-            tag = f.tag();
 
-        #define bing( x ) TStringToQString( x ).stripWhiteSpace()
-        m_title   = bing( tag->title() );
-        m_artist  = bing( tag->artist() );
-        m_album   = bing( tag->album() );
-        m_comment = bing( tag->comment() );
-        m_genre   = bing( tag->genre() );
+    else {
+        fileref = TagLib::FileRef( QFile::encodeName( path ), true, readStyle );
+
+        if( !fileref.isNull() )
+            tag = fileref.tag();
+    }
+
+    if( !fileref.isNull() && tag ) {
+        #define strip( x ) TStringToQString( x ).stripWhiteSpace()
+        m_title   = strip( tag->title() );
+        m_artist  = strip( tag->artist() );
+        m_album   = strip( tag->album() );
+        m_comment = strip( tag->comment() );
+        m_genre   = strip( tag->genre() );
         m_year    = tag->year() ? QString::number( tag->year() ) : QString();
         m_track   = tag->track() ? QString::number( tag->track() ) : QString();
-        #undef bing
-
-        init( f.audioProperties() );
+        #undef strip
 
         m_isValidMedia = true;
+
+        init( fileref.audioProperties() );
     }
-/*FIXME disabled for beta4 as it's simpler to not got 100 bug reports
-        else if( KMimeType::findByUrl( m_url )->is( "audio" ) )
-        init( KFileMetaInfo( m_url, QString::null, KFileMetaInfo::Everything ) );*/
+
+    //FIXME disabled for beta4 as it's simpler to not got 100 bug reports
+    //else if( KMimeType::findByUrl( m_url )->is( "audio" ) )
+    //    init( KFileMetaInfo( m_url, QString::null, KFileMetaInfo::Everything ) );
 }
 


Comment 29 Andrew Lavrinenko 2005-02-13 21:03:50 UTC
Thanks Max it's work fine, but some bug exist since adding to configdialog.cpp:95 
...
m_opt1->kcfg_RecodeEncoding->insertItem( QString::null );
...
Now the codec index not comply with they name in kcfg_RecodeEncoding
Maybe you delete this code or edit the applySettings in app.cpp like
...
if ( AmarokConfig::recodeID3v1Tags() )
TagLib::ID3v1::Tag::setStringHandler( new ID3v1StringHandler( AmarokConfig::recodeEncoding()-1 ) );
...
Comment 30 Max Howell 2005-02-13 21:37:45 UTC
Woah, good point. Sorry. I added the null string so people didn't panic that their tags were being recoded to something weird. And KConfigXT isn't flexible enough to let me set it to utf8 (or I couldn't make it work). Will fix asap.
Comment 31 Max Howell 2005-02-14 04:24:14 UTC
CVS commit by mhowell: 

* Extremely unfortunate bug that is the consequence of none of the regular developers using this feature at all. For some reason if the user recoded id3v1 tags using QTextCodec #0, it caused a crash. So I just disabled that codec and tested a random selection of the rest, seems to be fixed.
* This is also convenient since it means I can get round the KConfigXT-doesn't-like-ComboBoxes bug that has remained unfixed since KDE 3.2 *sigh*
BUG:95041
BUG:77710
CCMAIL:amarok-devel@lists.sf.net


  M +16 -6     app.cpp   1.478
  M +6 -4      configdialog.cpp   1.72


--- kdeextragear-1/amarok/src/app.cpp  #1.477:1.478
@@ -341,10 +341,8 @@ void App::initGlobalShortcuts()
 //here and save creating another header/source file combination
 
-class ID3v1StringHandler : public TagLib::ID3v1::StringHandler {
+class ID3v1StringHandler : public TagLib::ID3v1::StringHandler
+{
     QTextCodec *m_codec;
-public:
-    ID3v1StringHandler( int codecIndex )
-            : m_codec( QTextCodec::codecForIndex( codecIndex ) )
-    {}
+
     virtual TagLib::String parse( const TagLib::ByteVector &data ) const
     {
@@ -356,4 +355,12 @@ public:
         return TagLib::ByteVector( qcs, qcs.length() );
     }
+
+public:
+    ID3v1StringHandler( int codecIndex )
+            : m_codec( QTextCodec::codecForIndex( codecIndex ) )
+    {
+        debug() << "codec: " << m_codec << endl;
+        debug() << "codec-name: " << m_codec->name() << endl;
+    }
 };
 
@@ -417,5 +424,8 @@ void App::applySettings( bool firstTime 
     m_pTray->setShown( AmarokConfig::showTrayIcon() );
 
-    if ( AmarokConfig::recodeID3v1Tags() )
+
+    // we check > 0 because textCodecForIndex( 0 ) crashes amaroK for unknown
+    // reasons, also now we assign index 0 to "" in the config combobox
+    if( AmarokConfig::recodeID3v1Tags() && AmarokConfig::recodeEncoding() > 0 )
         TagLib::ID3v1::Tag::setStringHandler( new ID3v1StringHandler( AmarokConfig::recodeEncoding() ) );
 

--- kdeextragear-1/amarok/src/configdialog.cpp  #1.71:1.72
@@ -85,5 +85,5 @@ AmarokConfigDialog::AmarokConfigDialog( 
     KTrader::OfferList offers = PluginManager::query( "[X-KDE-amaroK-plugintype] == 'engine'" );
 
-    for ( KTrader::OfferList::ConstIterator it = offers.begin(); it != offers.end(); ++it ) {
+    for( KTrader::OfferList::ConstIterator it = offers.begin(); it != offers.end(); ++it ) {
         m_soundSystem->insertItem( (*it)->name() );
         // Save name properties in QMap for lookup
@@ -92,8 +92,10 @@ AmarokConfigDialog::AmarokConfigDialog( 
     }
 
-    // ID3v1 recoding locales
-    m_opt1->kcfg_RecodeEncoding->insertItem( QString::null );
+    // KConfigXT doesn't like Comboboxes, still, 2 versions after
+    // it was created. How shit. Hence we can't make recodeEncoding
+    // a string (which we need to because by index is not consistent)
     QTextCodec *codec;
-    for ( int i = 0; ( codec = QTextCodec::codecForIndex( i ) ); i++ )
+    m_opt1->kcfg_RecodeEncoding->insertItem( QString::null );
+    for( int i = 1; (codec = QTextCodec::codecForIndex( i )); i++ )
         m_opt1->kcfg_RecodeEncoding->insertItem( codec->name() );