Bug 82252 - non-ASCII (unicode) characters in CD-TEXT not handled correctly
Status: RESOLVED FIXED
Alias: None
Product: k3b
Classification: Applications
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Sebastian Trueg
Reported: 2004-05-26 16:05 UTC by Michał Kosmulski
Modified: 2006-11-22 21:25 UTC

Attachments
debug output (2.28 KB, text/plain)
2005-01-25 13:04 UTC, Andreas Pakulat
snapshot for non-working cd-text (60.61 KB, image/png)
2005-01-25 13:05 UTC, Andreas Pakulat
debug output, k3b not writing with automatic writing mode (4.35 KB, text/x-log)
2005-02-06 11:10 UTC, Andreas Pakulat

Description Michał Kosmulski 2004-05-26 16:05:54 UTC
Version:           0.11.10 (using KDE 3.2.1)
Installed from:    Compiled From Sources
Compiler:          gcc 3.3.3 or 3.3.2 
OS:                Linux

When I create an audio CD with CD-TEXT and the track titles or artist names contain non-ASCII characters (in my tests, Polish diacritics and German umlauts), what gets written to the CD is not what I entered in K3b: each non-ASCII character is replaced by two random-looking 'garbage' characters.

This looks like a UTF-8 to <some other encoding> conversion problem. 

Note that the title and artist name for the whole CD (not the individual tracks) are recorded correctly, and non-ASCII characters are OK there.

My locale is pl_PL.UTF-8, but I also had problems with non-ASCII characters in CD-TEXT when my locale was pl_PL (iso-8859-2 is the system encoding then). I checked what CD-TEXT actually got recorded on the CD using KsCD and cdrecord -toc.
Comment 1 Sebastian Trueg 2004-05-26 16:56:42 UTC
But CD-TEXT is actually ASCII-only, and German umlauts are ASCII and work here. As for Unicode, I probably should not allow it at all. K3b 0.12 (and current CVS) won't allow non-ASCII characters in CD-TEXT.
Comment 2 Michał Kosmulski 2004-05-27 10:24:00 UTC
Well, perhaps it depends on the locale and the characters are not converted correctly. Anyway, when I use umlauts in track descriptions, what gets recorded on the CD has two characters in place of each umlaut. So perhaps it is their UTF-8 representation, which should have been converted to ASCII but wasn't.
On the other hand, I can use Unicode characters in the CD artist and title and it works. I used Polish characters there (which are not in ASCII) and they were encoded correctly. So is ASCII used for track titles and Unicode for the disc title? That would be really strange.
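
The "two characters per umlaut" symptom is what one would expect if UTF-8 bytes are read back as a single-byte charset. A minimal Python sketch of that hypothesis follows. The sample title is made up, and Latin-1 is only assumed as the charset applied by the reader; nothing in this report confirms which charset is actually involved.

```python
# Hypothetical track title; any non-ASCII character behaves the same way.
title = "Schöner Tag"

# What a UTF-8 environment hands over: 'ö' becomes the two bytes C3 B6.
utf8_bytes = title.encode("utf-8")

# Assumption: the reader treats every byte as one Latin-1 character.
as_seen_by_reader = utf8_bytes.decode("latin-1")

# Character count, byte count, and character count after the misreading.
print(len(title), len(utf8_bytes), len(as_seen_by_reader))
print(as_seen_by_reader)
```

The misread string is one character longer than the original because the single umlaut turned into two unrelated Latin-1 characters.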
Comment 3 Sebastian Trueg 2004-05-27 10:50:26 UTC
CD-Text uses ASCII. Since k3b 0.11.x does not properly check this, you can also write other characters as CD-Text, which violates the standard but seems to work.
Comment 4 Andreas Pakulat 2004-12-10 13:09:04 UTC
k3b 0.11.17 is not able to work in UTF-8 environments. CD artist, track title and track artist are written to the CD in UTF-8 encoding. If I have non-ASCII characters like German umlauts in the CD title field (the title in the "burn-cd" dialog), k3b isn't able to prepare the audio tracks and doesn't even start to burn the CD.

The last thing is also true for the CVS version, so this really needs fixing!

Andreas
Comment 5 Sebastian Trueg 2004-12-10 17:30:31 UTC
German umlauts are ASCII.
Are you sure about the CVS version? Do you have some output? The console output is the most important.
Comment 6 Michał Kosmulski 2004-12-10 18:18:49 UTC
As a matter of fact, umlauts are in ISO-8859-1 / ANSI (an 8-bit charset) but not in ASCII, which is a 7-bit charset. If you use UTF-8, they are encoded using two bytes, just like Polish characters.
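
The byte counts mentioned above can be checked with Python's built-in codecs. This is only an illustrative sketch; the helper name `encoded_size` and the sample characters are chosen here for the example.

```python
def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

ENCODINGS = ("ascii", "iso-8859-1", "iso-8859-2", "utf-8")

# 'a' is plain ASCII, 'ö' is a German umlaut, 'ł' is a Polish letter.
for ch in ("a", "ö", "ł"):
    print(ch, {enc: encoded_size(ch, enc) for enc in ENCODINGS})
```

A result of `None` means the character does not exist in that charset at all, which is the case for both 'ö' and 'ł' in 7-bit ASCII.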
Comment 7 Andreas Pakulat 2005-01-25 13:03:03 UTC
I compiled CVS this night:
andreas@morpheus:~>k3b --version
Qt: 3.3.3
KDE: 3.3.2
K3b: 0.11.97

And removed the debian package. k3b still doesn't work in utf-8 environments.

1. If the mp3 file's ID3v2 tag is UTF-8 encoded, I get wrong characters. This is more a problem of taglib than of k3b.

2. latin1 (ISO-8859-1) encoded id3v2 tags are showing up correctly in k3b. 

3. Using a CD-Title (via properties->Cd-Text->title) that has German umlauts in it (all in a UTF-8 environment) makes k3b create horrific filenames for the wav files, and thus decoding of the mp3 files does not work. I attach the debug output and a snapshot showing k3b's window (don't get confused: I changed the language to en_US.UTF-8 so everybody understands the error messages; the same happens with de_DE.UTF-8).

I cannot verify the encoding of the CD-Text on the cd, as there seems to be no Unix-Tool to read CD-Text :-(
Comment 8 Andreas Pakulat 2005-01-25 13:04:30 UTC
Created attachment 9286
debug output

debug output from k3b having a german umlaut in the CD-Title of the CD-TEXT in
utf-8 environment
Comment 9 Andreas Pakulat 2005-01-25 13:05:59 UTC
Created attachment 9287
snapshot for non-working cd-text

Shows the k3b-burn-dialog when trying to burn a cd with CD-Title containing
german umlauts in utf-8 environment
Comment 10 Sebastian Trueg 2005-01-25 18:15:19 UTC
k3b reads cd-text. just use disk-info
Comment 11 Sebastian Trueg 2005-01-25 18:29:22 UTC
fixed number 3.
And please check the cd-text with k3b so I know if my code works (I could also switch my system to UTF-8... but well, lazy. ;)
Comment 12 Andreas Pakulat 2005-02-06 10:16:31 UTC
Checked with k3b from Jan-25th: CD-Text from id3-tags that were encoded latin1 is Ok. CD-Text from id3-tags that were encoded in UTF-8 is encoded UTF-8 too.
CD-Text edited within k3b is OK too (I didn't expect it to be wrong anyway)

Will retry with CVS from today...
Comment 13 Andreas Pakulat 2005-02-06 11:09:24 UTC
Current CVS doesn't show any changes at all: I'm not prevented from writing latin1 or UTF-8 encoded strings to the CD as CD-Text, and I still can't create a CD-Title with non-ASCII characters. What did change is that using automatic as the writing mode prevents me from burning the CD: cdrecord reports some errors and k3b also says something about "decoding audio titles". Setting the mode to DAO (which is what automatic selects too) works. I also noticed that when using drag'n'drop to add mp3s with umlauts, the title and artist are not displayed in the title list until I click somewhere...

Andreas

PS: I attach the debug output from k3b not burning with automatic writing mode.
Comment 14 Andreas Pakulat 2005-02-06 11:10:44 UTC
Created attachment 9441
debug output, k3b not writing with automatic writing mode
Comment 15 Sebastian Trueg 2005-02-06 11:26:31 UTC
> Sense Key: 0x3 Medium Error, deferred error, Segment 0
this does not seem like a k3b problem.

Regarding the CD-Text: you are saying that it makes a difference if you type in the cd-text with umlauts or if it's loaded from id3-tags?
Comment 16 Andreas Pakulat 2005-02-06 13:19:59 UTC
I did not check the cdrecord output when choosing DAO mode, so the error might occur also with that. Anyway, k3b cannot decode the mp3-files when I use automatic write mode, it works when choosing DAO mode. But that probably has nothing to do with the cd-text (as it also happens when there's only ASCII in the Text).

For the CD-Text: As I said, the problem with id3-tags is that they might be UTF-8 encoded, which is currently not supported with id3lib or taglib. Personally I'd like to have my id3-tags encoded in UTF-8 (I had them, but there were too many apps that wouldn't work), as xosd doesn't play nice with latin1-encoded id3-tags in a utf-8 environment.

So the problems I have with k3b: 
1. I cannot tell k3b that the id3-tags are encoded in UTF-8, so k3b uses taglib's standard QString method to get the tags, which returns the UTF-8 bytes interpreted as latin1 (i.e. 2 characters for 1 umlaut).

2. I cannot use any umlaut in the CD-Title, I mean the Title field on the CD-Text tab of the Properties dialog. I don't know what's going on there, but it seems that the files the mp3s are decoded to get strange names (IIRC doubly UTF-8 encoded, so 4 characters where there should be 1 umlaut). This might be related to me using umlauts in filenames.
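
The "2 characters" and "4 characters" counts in the two points above fit a single and a double round of UTF-8 bytes being misread as latin1. A rough Python sketch of that reading (an assumption about the cause, not taken from k3b's or taglib's code; both helper names are invented for this example):

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def undo_misread(text):
    """Reverse one misreading: recover the bytes and decode them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")

original = "ö"
once = misread_as_latin1(original)   # point 1: 2 characters for 1 umlaut
twice = misread_as_latin1(once)      # point 2: 4 characters for 1 umlaut

print(len(original), len(once), len(twice))

# Each misreading is reversible as long as no bytes were lost on the way.
restored = undo_misread(undo_misread(twice))
print(restored == original)
```

The reversal only works if the garbled string survived intact; once a filesystem or display layer drops or replaces bytes, the original can no longer be recovered this way.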

I can work around the first problem, by using latin1-encoded id3-tags, which makes xosd output miss the umlauts, but that's not the real problem.

I cannot work around the 2nd problem without using ae, ue or something similar for the umlauts.

If only ASCII is allowed in CD-Text, i.e. not even German umlauts, then k3b should give a warning when I type them, or convert them to ae, ue... IMHO. I think the latter would be best, as CD players that support CD-Text probably cannot display umlauts anyway (I checked only one, a car radio, which can't).
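
The two options suggested above (warn, or convert to ae, ue...) could look roughly like the following Python sketch. It is not what k3b does; the replacement table, the function names and the `?` fallback are all assumptions made for illustration.

```python
# Hypothetical replacement table for German characters; extend as needed.
REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def needs_warning(text):
    """True if text contains anything outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)

def to_cd_text_ascii(text, fallback="?"):
    """Transliterate known characters; replace other non-ASCII ones with fallback."""
    out = []
    for ch in text:
        if ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append(fallback)
    return "".join(out)

print(needs_warning("Schöner Tag"), to_cd_text_ascii("Schöner Tag"))
```

Characters without an entry in the table (Polish letters, for example) fall back to `?` in this sketch, so a real implementation would need a larger table or a warning for those cases.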

Andreas
Comment 17 Sebastian Trueg 2005-02-06 16:05:52 UTC
2. should already be fixed both in 0.11.20 and in cvs.

Regarding umlauts in CD-Text: not sure here. You are right, ASCII is 7-bit; I was always thinking of the extended 8-bit code, which does contain umlauts. I will search for more detailed info on this and maybe restrict CD-Text in K3b further.
Comment 18 Andreas Pakulat 2005-02-06 18:17:23 UTC
2. is not fixed in CVS. I just purged the k3b install and checked out kdeextragear again. Same problem: using an umlaut in the CD-Title makes k3b create a very weird filename in /tmp/kde-andreas: k3b_audio_SchÃÂöner0_01.inf (I hope this is readable, else I'll send it via mail).
Comment 19 Sebastian Trueg 2005-02-06 19:31:09 UTC
right. I messed 2 up instead of fixing it... I reverted that now and at least here it works... stupid char sets. There should be just one! ;)
Comment 20 Kai Ponte 2005-11-29 22:01:00 UTC
I got the same thing trying to add a German file to a DVD. 

Incorrectly encoded string (Herr Jesu hat ein Gärtchen.txt) encountered.
Possibly creating an invalid Joliet extension. Aborting.
Comment 21 Christoph Burger-Scheidlin 2006-09-05 20:16:15 UTC
Could you check if this bug still occurs with the recent version of k3b 
(0.12.17)?
Comment 22 Christoph Burger-Scheidlin 2006-11-22 21:25:21 UTC
pre 0.12 and no reply in over a month