Bug 268587

Summary:	Let K3b use intelligent transcription for non-Western European data
Product:	[Applications] k3b	Reporter:	Christopher Yeleighton <giecrilj>
Component:	Audio Project	Assignee:	K3b Bugs <k3b-bugs-null>
Status:	REPORTED ---
Severity:	wishlist	CC:	j.e.labarre, trueg
Priority:	NOR
Version First Reported In:	2.0.2
Target Milestone:	---
Platform:	openSUSE
OS:	Linux

Description Christopher Yeleighton 2011-03-15 21:50:35 UTC

Version:           2.0.2 (using KDE 4.6.0) 
OS:                Linux

I do not know whether CD-TEXT allows characters outside the ISO Latin-1 character set, as the specification is a trade secret of Philips, but K3b seems to require the texts to be in ISO Latin-1. K3b generates the CD-TEXT from the metadata embedded in the media files; any character in the metadata that falls outside ISO Latin-1 is replaced with an underscore. This is far from a perfect solution.
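The substitution described above can be reproduced in a few lines. This is only a sketch of the observed behaviour, not K3b's actual code; the function name `to_cdtext_latin1` is hypothetical.

```python
def to_cdtext_latin1(text: str) -> str:
    """Mimic the observed behaviour: keep Latin-1, replace the rest with '_'."""
    # Hypothetical helper, not taken from the K3b sources.
    # ISO Latin-1 covers exactly the code points U+0000..U+00FF.
    return "".join(ch if ord(ch) <= 0xFF else "_" for ch in text)


# Assumes the title is stored in precomposed (NFC) form, as tags usually are.
print(to_cdtext_latin1("Mi\u0105\u017cszo\u015b\u0107"))  # Mi__szo__
```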

Reproducible: Always

Steps to Reproduce:
  1. Tell K3b to create an audio project.
  2. Tell K3b to include a FLAC file whose title is e.g. "Miąższość".

Actual Results:  
  2. CD-TEXT becomes "Mi__szo__".

Expected Results:  
  2. Let CD-TEXT become "Miazszosc".

OS: Linux (x86_64) release 2.6.37.1-1.2-desktop
Compiler: gcc

Lynx solves a similar problem pretty well; it even transcribes Cyrillic characters to ASCII.
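One possible way to get the expected result for Latin-based letters is to strip diacritics via Unicode decomposition. The sketch below is an illustration under my own assumptions, not how Lynx or K3b do it: the function name `transliterate` and the `FALLBACK` table are hypothetical, and the table is deliberately tiny.

```python
import unicodedata

# Letters that have no Unicode decomposition need an explicit table.
# Illustrative only and far from complete.
FALLBACK = {"\u0142": "l", "\u0141": "L", "\u0111": "d", "\u0110": "D"}  # ł Ł đ Đ


def transliterate(text: str) -> str:
    """Map text to Latin-1, dropping diacritics where that yields a Latin-1 letter."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) <= 0xFF:
            out.append(ch)  # already representable in Latin-1
        elif ch in FALLBACK:
            out.append(FALLBACK[ch])
        else:
            # NFKD splits e.g. "ą" into "a" + COMBINING OGONEK; drop the marks.
            base = "".join(
                c for c in unicodedata.normalize("NFKD", ch)
                if not unicodedata.combining(c)
            )
            # Keep the base only if it fits Latin-1, else fall back to '_'.
            ok = base and all(ord(c) <= 0xFF for c in base)
            out.append(base if ok else "_")
    return "".join(out)


print(transliterate("Mi\u0105\u017cszo\u015b\u0107"))  # Miazszosc
```

Characters from scripts with no Latin base letter (Cyrillic, for example) still come out as underscores with this approach; transcribing them would need per-script tables.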

Comment 1 James E. LaBarre 2021-02-04 02:25:46 UTC

I don't know about CD-TEXT, but I do know that audio tags can be in other character sets; I have plenty of tracks with Japanese artists/titles/tags. Current filesystems seem to have no problem with them either (although I have occasionally seen mangled filenames from some OS or filesystem that couldn't handle them).

The problem with the example here is that it presumes there are Latin-1 characters that are near look-alikes. That is not the case for kanji/kana, Cyrillic, Korean, Arabic, or Chinese (I think there are at least two character sets there).
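To make the distinction concrete, here is a rough sketch that reports which non-Latin scripts occur in a tag, i.e. where a look-alike substitution cannot work and a real transcription table would be needed. It is a heuristic of my own for illustration: the function name `scripts_needing_tables` is hypothetical, and taking the first word of the Unicode character name as the "script" is only an approximation.

```python
import unicodedata


def scripts_needing_tables(text: str) -> set:
    """Return approximate script names for characters with no Latin base letter."""
    scripts = set()
    for ch in text:
        # Latin-1 characters and stray combining marks are not of interest here.
        if ord(ch) <= 0xFF or unicodedata.combining(ch):
            continue
        name = unicodedata.name(ch, "")
        # Heuristic: the first word of the character name, e.g. "CYRILLIC".
        script = name.split(" ", 1)[0] if name else "UNKNOWN"
        if script != "LATIN":
            scripts.add(script)
    return scripts


# "Москва 東京" (Cyrillic and kanji)
print(sorted(scripts_needing_tables("\u041c\u043e\u0441\u043a\u0432\u0430 \u6771\u4eac")))
# ['CJK', 'CYRILLIC']
```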

I bring this up because I just tried importing exactly these sorts of files into K3b 20.08.1 (the most recent version in Fedora 33), and it still replaces non-Latin-1 characters with underscores, which makes it unusable for writing an audio CD.