Bug 89747 - Issues saving new tags with recoded 8bit encoding
Summary: Issues saving new tags with recoded 8bit encoding
Status: RESOLVED FIXED
Alias: None
Product: amarok
Classification: Applications
Component: general (show other bugs)
Version: 1.1-beta2
Platform: unspecified Linux
: NOR normal
Target Milestone: ---
Assignee: Amarok Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-18 10:55 UTC by mrudolf
Modified: 2006-06-11 12:32 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
This patch for encoding issue with tags (4.71 KB, patch)
2005-04-13 21:03 UTC, LiuCougar
Details
fix encoding when writing back to mp3 tags (495 bytes, patch)
2005-09-21 21:32 UTC, LiuCougar
Details
fix encoding when writing back to mp3 tags (revised) (460 bytes, patch)
2005-09-24 00:14 UTC, LiuCougar
Details

Note You need to log in before you can comment on or make changes to this bug.
Description mrudolf 2004-09-18 10:55:33 UTC
Version:           1.1-beta2 (using KDE 3.3.0, Gentoo)
Compiler:          gcc version 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)
OS:                Linux (i686) release 2.6.8-gentoo-r3

Non-latin1 encodings are broken again (this is something I reported before and was fixed for 1.0 series). Now it seems that latin1 encoding is forced again.
Comment 1 Mark Kretschmann 2004-09-18 11:11:13 UTC
Thing is, all of our developers use Latin1 encoding, so we never notice these problems. Where exactly does the issue surface?
Comment 2 mrudolf 2004-09-18 11:26:46 UTC
Mark Kretschmann,  sobota, 18 września 2004 11:11:

>------- Additional Comments From markey web de  2004-09-18 11:11 -------
>Thing is, all of our developers use Latin1 encoding, so we never notice
> these problems. Where exactly does the issue surface?
All the artists/albums/titles names are shown in latin1 encoding instead of 
UTF-8.

To see the problem (assuming that you have some ISO8859-2 or UTF-8 font 
installed, you may try to check if this e-mail displays following characters:
    ćóń 
(c-acute, o-acute, n-acute) differently in iso8859-2 and ascii encodings.

I discussed the problems with one of the Amarok developers in the past, maybe 
you'd be able to find that. I remember that previously the encoding problem 
was caused by applying some patch that was supposed to improve encxoding 
handling - but instead it broke it.

Comment 3 Max Howell 2004-09-18 14:00:49 UTC
Please specify where exactly the problem is. The playlist? The Collection browser? The systray? The playerwindow? All of them.

Please try emptying the playlist and readding the files from the filebrowser. Then the collection. That would be quite informative for me if you could do that.

Thanks.
Comment 4 Matteo Merli 2004-09-18 15:24:26 UTC
hi, 

here I have the same problem, i've looked the collection.db and strings are correct UTF-8 so i've seen the problem is that strings are not "casted" from utf-8   when extracted from db.. 

This one-liner solves the problem for me (not sure if it doesn't break other things)

Index: src/collectiondb.cpp
===================================================================
RCS file: /home/kde/kdeextragear-1/amarok/src/collectiondb.cpp,v
retrieving revision 1.137
diff -u -3 -p -r1.137 collectiondb.cpp
--- src/collectiondb.cpp        16 Sep 2004 19:57:37 -0000      1.137
+++ src/collectiondb.cpp        18 Sep 2004 13:15:20 -0000
@@ -608,7 +608,7 @@ CollectionDB::execSql( const QString& st
         //iterate over columns
         for ( int i = 0; i < number; i++ )
         {
-            if ( values ) *values << QString( (const char*)sqlite3_column_text( stmt, i ) );
+            if ( values ) *values << QString::fromUtf8( (const char*)sqlite3_column_text( stmt, i ) );
             if ( names )  *names  << QString( sqlite3_column_name( stmt, i ) );
         }
     }
Comment 5 Mark Kretschmann 2004-09-18 19:35:09 UTC
CVS commit by markey: 

FIX: Collection encoding broken for non-Latin1. (BR 89747).
Patch from <matteo.merli@gmail.com>.

Please test!

CCMAIL: 89747-done@bugs.kde.org


  M +1 -0      ChangeLog   1.329
  M +1 -1      src/collectiondb.cpp   1.139


--- kdeextragear-1/amarok/ChangeLog  #1.328:1.329
@@ -5,4 +5,5 @@
 
 VERSION 1.1:
+  FIX: Collection encoding broken for non-Latin1. (BR 89747)
   ADD: Popup-menu for cover images in Context-Browser.
   FIX: The first track to play is now random for random-mode.

--- kdeextragear-1/amarok/src/collectiondb.cpp  #1.138:1.139
@@ -609,5 +609,5 @@ CollectionDB::execSql( const QString& st
         for ( int i = 0; i < number; i++ )
         {
-            if ( values ) *values << QString( (const char*)sqlite3_column_text( stmt, i ) );
+            if ( values ) *values << QString::fromUtf8( (const char*) sqlite3_column_text( stmt, i ) );
             if ( names )  *names  << QString( sqlite3_column_name( stmt, i ) );
         }


Comment 6 Mark Kretschmann 2004-09-18 19:51:46 UTC
CVS commit by markey: 

Encode data in utf8 for database, and decode with utf8 again. Seems to work correctly.

CCMAIL: 89747@bugs.kde.org


  M +1 -1      collectiondb.cpp   1.140


--- kdeextragear-1/amarok/src/collectiondb.cpp  #1.139:1.140
@@ -578,5 +578,5 @@ CollectionDB::execSql( const QString& st
 
     //compile SQL program to virtual machine
-    error = sqlite3_prepare( m_db, statement.local8Bit(), statement.length(), &stmt, &tail );
+    error = sqlite3_prepare( m_db, statement.utf8(), statement.length(), &stmt, &tail );
 
     if ( error != SQLITE_OK )


Comment 7 mrudolf 2004-10-09 19:08:45 UTC
Still broken in 1.1. While OGG files with UTF-8 encodings inside are displayed correctly, MP3 are still broken. This is regardless of encoding of MP3 files (UTF-8/local charset) and value of encoding setting in Amarok (UTF8/local/disabled).
And of course, saving modified tags containing local charsets is still completely broken.
Comment 8 mrudolf 2004-10-12 11:50:40 UTC
Update to previous information: saving modified tags work correctly in OGG files, it is broken only in MP3.
Comment 9 Sonyu Chen 2004-11-14 11:44:36 UTC
I am using 1.1.1, and find the bug too.

The id3 tag recode works partly. Here are the items that work and not work.

* Work:
  # Collection Database
  # Home of Context Browser
  # Dropdown selectors in Meta Information.
  # Filename in Playlist.

* Not Work:
  # Current of Context browser
  # OSD
  # Everything in playlist, except the filename.
  # Everything in Meta Information, except the Dropdown selectors.

It seems that the underlying mechanism works, but the not-work parts do not display them correctly.  

Fix these things will be great, thank you^^
Comment 10 mrudolf 2004-11-14 11:46:43 UTC
Sonyu Chen,  niedziela, 14 listopada 2004 11:44:
> ------- I am using 1.1.1, and find the bug too.
>The id3 tag recode works partly. Here are the items that work and not work.
>* Work:
>  # Collection Database
>  # Home of Context Browser
>  # Dropdown selectors in Meta Information.
>  # Filename in Playlist.
>
>* Not Work:
>  # Current of Context browser
>  # OSD
>  # Everything in playlist, except the filename.
>  # Everything in Meta Information, except the Dropdown selectors.

You can check both MP3 and OGG, because OGG work for me, while MP3 do not.

Comment 11 Alex Kern 2004-12-12 19:37:30 UTC
Hi, trying amarok-1.2-beta2. Looks better, I can change meta data in mon Latin language and I does not lost it at first time. But after next "Rescan collection" I lost again all non Latin information!
Comment 12 Thiago Macieira 2004-12-13 00:08:41 UTC
MP3 ID3v1 tags are Latin1 only. There's no way to save non-Latin1 text. Could that be your problem?
Comment 13 mrudolf 2004-12-13 08:55:56 UTC
Thiago Macieira,  poniedziałek, 13 grudnia 2004 00:08:
>------- Additional Comments From thiago.macieira kdemail net  2004-12-13
> 00:08 ------- MP3 ID3v1 tags are Latin1 only. There's no way to save
> non-Latin1 text. 
There is no way to write UTF-8 text. But in pre-UTF times everything was 
Latin1-only, and you still can use non-Latin1 texts. Of course, those will be 
less portable, because you can't mix two charsets.

> Could that be your problem? 
Probably that is the cause. Anyway, I will suggest using local8Bit() instead 
of latin1() when writing mp3. As many players assume local charset, and 
other, including Amarok, allow choosing charset manually, I don't see the 
problem with such solution.

And current behavior - writing a text with all non-ascii characters replaced 
with random ascii equivalents, without any warning for user - is not too 
intuitive.

Comment 14 Alex Kern 2004-12-17 22:29:14 UTC
Yes, it's true, that ID3v1 supports only Latin1, but what is wrong to write ID3v2? Is it allowed to have both in single file? Or simple replace tag wit v2 if you try to write non Latin1 text?
Comment 15 Max Howell 2005-02-11 20:43:15 UTC
I'm assuming all this is fixed, pleas verify.
Comment 16 mrudolf 2005-02-12 01:18:43 UTC
Max Howell,  piątek, 11 lutego 2005 20:43:
>------- Additional Comments From max.howell methylblue com  2005-02-11 20:43
> ------- I'm assuming all this is fixed, pleas verify.
Sorry.

MP3 still don't work. Unless I don't know how to set it (I used ISO-8859-2 and 
got ? instead after re-loading mp3).
Comment 17 Max Howell 2005-02-12 02:04:44 UTC
Please explain what you mean. Also are you using CVS?
Comment 18 mrudolf 2005-02-12 10:25:24 UTC
Max Howell,  sobota, 12 lutego 2005 02:04:
>------- Additional Comments From max.howell methylblue com  2005-02-12 02:04
> ------- Please explain what you mean. 
I enter national (here: ISO8859-2) characters into MP3 tags. They are 
displayed properly in Amarok. Alas, they are not saved properly - if I remove 
MP3 file from playlist and re-add it, all national characters are replaced 
with '?'.

This is with both 'Recode' options enabled and character set set to ISO 
8859-2.


> Also are you using CVS? 
Yes, version from 12 Feb.

Comment 19 Max Howell 2005-02-12 14:35:48 UTC
Can you try opening the tags in another program like XMMS. I'd like to know if the tags are incorrect or the DB entry. Thanks.
Comment 20 mrudolf 2005-02-12 15:32:57 UTC
Max Howell,  sobota, 12 lutego 2005 14:35:
> ------- Can you try opening the tags in another program like XMMS. I'd like
> to know if the tags are incorrect or the DB entry. Thanks.
Looks like tags. It looks like there is a problem with both reading and 
writing.

Writing: all non-ascii characters are replaced with '?'
Reading: all non-ascii characters (if entered outside Amarok) are replaced 
with some other characters (probably from latin1 encoding).

This concerns only MP3, OGG are handled perfectly.
Comment 21 Jozef Riha 2005-02-14 07:34:17 UTC
> Can you try opening the tags in another program like XMMS. I'd like to know if the tags are incorrect or the DB entry. Thanks.

both are incorrect (doublechecked using id3info program). and at the same time when ogg is used both tags and db entries are ok.

cheers,

-- jose
Comment 22 Max Howell 2005-02-14 16:02:09 UTC
This is an appeal to someone who wants this feature to work. I cannot fix this. Whenever I try it doesn't work completely. If you want this feature to work you simply must provide us with a patch. Thanks.
Comment 23 mrudolf 2005-02-14 22:37:47 UTC
Max Howell,  poniedziałek, 14 lutego 2005 16:02:

> ------- This is an appeal to someone who wants this feature to work. I
> cannot fix this. Whenever I try it doesn't work completely. If you want
> this feature to work you simply must provide us with a patch. Thanks.
Of course, I will look into. Alas, I will be away for two weeks now, but I 
will investigate it after that.


Comment 24 Stefan Briesenick 2005-02-16 02:56:49 UTC
well, internal charset handling is UTF-8. It is really no problem to encode latin1 (or 'local' or whatever is stored inside the MP3) to UTF-8. So the displaying part should be easy.

Writing TAGs could be a problem, if you have chars which are not available in latin1 (or 'local'). But if this happens (the converter should inform you), you could display a warning. I don't think this is a big deal at all.

I'm using de_DE.UTF-8 as my locale.
Comment 25 Alexey A. Kiritchun 2005-02-18 14:39:15 UTC
This affects me badly, as I am Russian.

My 2 cents on proper solution:

- id3v1 should be stored and read in the charset specified in options, or in current locale if option is not set. To make it clear, when reading, convert from 'charset' to UTF, and when writing, from UTF to 'charset'. If the latter loses some info, warn user.
- There should be an option to disable reading and writing id3v1 altogether, and only use id3v2. This should solve most of the problems. 
- id3v2 tags should _always_ be written in Unicode. There is no reason not to do this, and this is the easiest and least error-prone thing to do (fewer special cases), I think.

I can try and test whatever patches you would provide. At first I thought that I would patch it myself, but, alas, work with tags is scattered all over the code (or so it seems), and I could not see any 'scheme' behind it. My C++ got somewhat rusty (after 3 years of Perl), so the task turned out too difficult for me.
Comment 26 LiuCougar 2005-04-13 21:03:06 UTC
Created attachment 10610 [details]
This patch for encoding issue with tags

This patch fixes all the issues discussed in this bug thread

this patch was modified from the patch here (in Chinese):
http://www.linuxfans.org/nuke/modules.php?name=Forums&file=viewtopic&t=112574&postdays=0&postorder=asc&start=0#4358746
Comment 27 jackliu9999 2005-04-14 01:35:30 UTC
- id3v1 should be stored and read in the charset specified in options, or in current locale if option is not set. To make it clear, when reading, convert from 'charset' to UTF, and when writing, from UTF to 'charset'. If the latter loses some info, warn user. 

NO!!!BUT china's national standardization is gb18030, not UTF-8, your are  wrong!!!! Please do not use his patch!!! Thank you.
Comment 28 jackliu9999 2005-04-14 01:49:25 UTC
Are you sure it can show right charactors in gb18030 local?
Comment 29 Stefan Briesenick 2005-04-14 02:31:51 UTC
@jackliu9999: I don't see your problem.

'charset' is *your* locale. UTF-8 is the internal encoding of almost every modern app nowadays. UTF-8 (which is just an encoding of Unicode) has *all* characters, also that in gb18030. So converting from gb18030 to UTF-8 is no problem at all. You lose nothing. And converting von UTF-8 back to gb18030 is also no problem if only chars from gb18030 are used. If not, you should be warned. That's what Alexey said and that's the *only* correct implementation.
Comment 30 Alexey A. Kiritchun 2005-04-14 22:15:32 UTC
I tested the the patch from liuspider (with 1.2.3, patch applied cleanly), and it is still wrong. Really, it makes matters worse. From the looks of it, it tryes to handle the conversion between utf and locale, and it does not do it right. When loading an mp3 with v1 tags in 8bit charset, all non-latin1 chars are truncated to '?' (hey, that's what one would expect from

-            #define strip( x ) TStringToQString( x ).stripWhiteSpace()
+            #define strip( x ) QString::fromLocal8Bit(TStringToQString( x ).latin1()).stripWhiteSpace();

so that's just plain wrong). If I edit the tag, then import the song again, it shows '?' mixed with some chars my fonts can't handle (I see squares), and the length doubles - a clear sign of improper handling of UTF<->8bit manipulation.

Release 1.2.2 (didn't test 1.2.3 before) at least was half-working and had 'compensating' errors, so I _could_ manage my Russian songs. See below for caveats.

What I need (and I guess a number of other people) is correct working of 'recode id3v1 data' setting, and that setting is different from locale. The reason for it is my hardware player that assumes cp1251 encoding for Cyrillic (that's what Windows uses), but my locale is, of course, ru_RU.UTF-8. 

Release 1.2.2 can read in cp1251 v1 tags, but if I edit them, my player shows garbage.

Could you please contact me here or by &lt;kaa at nightmail dot ru&gt; and outline the parts of amarok code that deal with tags? I do not really have the time to work on amarok, but I bet I'll have some ideas at least.
Comment 31 Jackey Yang 2005-04-17 23:23:44 UTC
Hi Guys, I am the original creater of liuspider's patch. First I want to clear something here:
1. UTF-8 is not default charset for windows users (default is GB18030), most of chinese mp3 are created by windows users. So use UTF-8 to solve this problem is totally wrong. Same problem happens to Japanese and Korean.
2. I am not quite understand why amarok has a charset setting, which supposes to be handled by Amarok automatically, not manually.
3. Read a CJK mp3 has two basic steps: read a mp3 with CJK filename and read CJK tags from the mp3.
4. Save a CJK mp3 has two steps too: write CJK tag into the mp3 and write a mp3 (with CJK filename) on the disk

Alexey A. Kiritchun, in order to have this patch really work for you. You have to shutoff your amarok and delete your local amarok config file.
Comment 32 LiuCougar 2005-04-19 01:08:43 UTC
to Alexey A. Kiritchun:
yes, as said by Fackey Yang, you have to disable all existing re-encoding feature in amarok, and you have to set the encoding of your locale to the one which your mp3's tags were encoded in.

I assume you can do this to start amarok:
LANG=ru_RU.cp1251 amarok
rebuild all of your collection, and you should be fine with all the tags of your mp3. (cp1251 may be wrong, it should be the unix-type corresponding one)

After this patched merged into CVS, I will submit another patch to overwrite your locale settings, so you do not need to append LANG=blabal before you start amarok.

I am Chinese user, and my locale is en_US.UTF-8, but my songs are encoded in GBK (or GB18030 if you like), so I have to start amarok as this:
LANG=zh_CN.GBK amarok

after you modify according to this comment, you should have no problem with your tags anymore.
Comment 33 Greg Meyer 2005-08-15 21:11:20 UTC
Can you update to 1.3 final and see if this is still happening?
Comment 34 Alexey A. Kiritchun 2005-09-19 21:21:43 UTC
To Greg Meyer:

1.3 final (I have a build from kde-redhat on redora) still has _issues_ with encodings. Try editing a tag to contain non-latin1 characters, save, clean the playlist, than play that file again. Amarok cannot even load the tags it has just saved himself. This is just not acceptible, especially since it is all though Unicode.

I now have some (not too much, though) free time and would be willing to try and fix this, if someone cares to guide me a little though the tag-handling code of amarok. At least provide me with the list of file to look at or keywords to grep for. I actually read the mail for the address listed in bugzilla; I can try to hang on IRC a couple hour a day (around 20:00 GMT+4).

See my view of the problem above (comment 25 for this bug). I also propose:

a) explicit disabling of id3v1, via user-visible option. That's what 99% of people, who now have problems, would like, I suppose. At least, that's what I would like. And there's a wish report for it, I believe.
b) explicit handling of id3v2 versions and encoding. With some sane defaults like 'v2.4 in UTF-16 with BOM' or maybe 'v2.3 UTF-16 BOM', to support older hardware MP3-players. 
c) 'prefer v1 to v2' and 'v1 encoding is not latin1' should be clearly separated options. They certainly are useful in different situations.
d) these should go onto a separate options pane like 'MP3 tags options', with a worning of "don't touch unless you have problems and know what you're doing".
e) Maybe (but in the distant future) fuller tag-editing capabilities, like 'strip v{1,2} tags', 'copy v1<->v2, per field', 'set tags from filename and vice versa', etc.
Comment 35 LiuCougar 2005-09-21 21:32:25 UTC
Created attachment 12649 [details]
fix encoding when writing back to mp3 tags

this should fix all the issues (hopefully)

and it is only one line fix. I'd like to commit it if you think it is ok.
Comment 36 LiuCougar 2005-09-22 03:35:18 UTC
More about the patch in Comment #35:
this patch rectify the following problem: when your mp3s use none-utf8 encoded meta data, and in the configure dialog, id3v1 tags Encoding is set to this locale encoding (such as GB18030/GBK/GB2312 used in China), while you can see the mp3 meta data correctly, amarok can not write back data to mp3s correctly: you will get rubbish chars if you try to do that.

this patch should make it possible to write back meta data encoded in the specified "id3v1 tags Encoding"
Comment 37 Stanislav Karchebny 2005-09-23 23:50:59 UTC
I tested the patch and it seems to work correctly, gets my approval.
Comment 38 LiuCougar 2005-09-24 00:14:33 UTC
Created attachment 12678 [details]
fix encoding when writing back to mp3 tags (revised)

This revised patch is better than the one in comment #35
Comment 39 LiuCougar 2005-09-24 00:27:21 UTC
SVN commit 463400 by liucougar:

fixed writing back locale encoded mp3 tags
BUG: 89747


 M  +1 -1      app.cpp  


--- trunk/extragear/multimedia/amarok/src/app.cpp #463399:463400
@@ -369,7 +369,7 @@
 
     virtual TagLib::ByteVector render( const TagLib::String &ts ) const
     {
-        const QCString qcs = m_codec->fromUnicode( ts.toCString( true ) );
+        const QCString qcs = m_codec->fromUnicode( TStringToQString(ts) );
         return TagLib::ByteVector( qcs, qcs.length() );
     }
 
Comment 40 Jozef Riha 2005-10-10 10:01:32 UTC
is the fix included in 1.3.3? if so it would be nice to see it in Changelog