Bug 268587

Summary: Let K3b use intelligent transcription for non-Western European data
Product: [Applications] k3b
Reporter: Christopher Yeleighton <giecrilj>
Component: Audio Project
Assignee: k3b developers <k3b>
Status: REPORTED ---
Severity: wishlist
CC: j.e.labarre, trueg
Priority: NOR
Version: 2.0.2
Target Milestone: ---
Platform: openSUSE
OS: Linux
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Description Christopher Yeleighton 2011-03-15 21:50:35 UTC
Version:           2.0.2 (using KDE 4.6.0) 
OS:                Linux

I do not know whether CD-TEXT allows characters outside the ISO Latin-1 character set, as this is a trade secret of Philips, but it seems K3b requires the text to be in ISO Latin-1.  K3b generates the CD-TEXT from the metadata embedded in the media files; any character in the metadata that falls outside ISO Latin-1 is replaced with an underscore.  This is far from a perfect solution.

Reproducible: Always

Steps to Reproduce:
  1. Tell K3b to create an audio project.
  2. Tell K3b to include a FLAC file whose title is, e.g., "Miąższość".

Actual Results:  
  2. CD-TEXT becomes "Mi__szo__".

Expected Results:  
  2. CD-TEXT should become "Miazszosc".
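
The expected behaviour could be approximated as in the following minimal sketch. It is not K3b's code; the names `transliterate`, `_fits` and `FALLBACK` are hypothetical, and it assumes the target encoding is ISO Latin-1, as K3b appears to require. Characters that already fit the target encoding are kept, accented letters are reduced to their base letter via Unicode decomposition, and the underscore is used only as a last resort.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode
# decomposition (e.g. Polish "ł" is not "l" plus a combining mark).
FALLBACK = {"ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}


def _fits(text, encoding):
    """Return True if text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text, encoding="latin-1", placeholder="_"):
    """Map text to the target encoding, stripping diacritics where needed."""
    out = []
    # Compose first so that "a" + combining ogonek is handled as one letter.
    for ch in unicodedata.normalize("NFC", text):
        if _fits(ch, encoding):
            out.append(ch)                      # already representable
        elif ch in FALLBACK:
            out.append(FALLBACK[ch])            # hand-written mapping
        else:
            # NFKD splits "ą" into "a" + combining ogonek; drop the marks.
            base = "".join(
                c for c in unicodedata.normalize("NFKD", ch)
                if not unicodedata.combining(c)
            )
            if base and _fits(base, encoding):
                out.append(base)
            else:
                out.append(placeholder)         # no known transcription
    return "".join(out)


print(transliterate("Miąższość"))  # Miazszosc
```

Decomposition alone does not help for scripts such as Cyrillic, where this sketch still falls back to the placeholder; covering those would need a per-script transcription table.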

OS: Linux (x86_64) release 2.6.37.1-1.2-desktop
Compiler: gcc

A similar problem is solved by Lynx pretty well.  Lynx even transcribes Cyrillic characters to ASCII.

Comment 1 James E. LaBarre 2021-02-04 02:25:46 UTC
Don't know about CD-TEXT, but I know that audio tags can be in other character sets; I have plenty of tracks with Japanese artists/titles/tags.  Current filesystems seem to have no problem with them either (although I have occasionally seen mangled filenames from some OS/filesystem that couldn't handle them).

The problem with the example here is that it presumes there are Latin-1 characters that are near look-alikes.  This would not be the case with Kanji/kana, Cyrillic, Korean, Arabic, or Chinese (I think there are at least two character sets there).

I bring this up because I had just tried bringing exactly these sorts of files into K3b 20.08.1 (the most recent in Fedora 33), and it still replaces non-Latin-1 characters with underscores, which makes it unusable for writing an audio CD.