Bug 268587 - Let K3b use intelligent transcription for non-Western European data
Status: REPORTED
Alias: None
Product: k3b
Classification: Applications
Component: Audio Project
Version: 2.0.2
Platform: openSUSE Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: k3b developers
Reported: 2011-03-15 21:50 UTC by Christopher Yeleighton
Modified: 2021-02-04 02:25 UTC
CC List: 2 users

Description Christopher Yeleighton 2011-03-15 21:50:35 UTC
Version:           2.0.2 (using KDE 4.6.0) 
OS:                Linux

I do not know whether CD-TEXT allows characters outside the ISO Latin-1 character set, as this is a trade secret of Philips, but K3b seems to require the text to be in ISO Latin-1.  K3b generates the CD-TEXT from the metadata embedded in the media files; any character in the metadata that falls outside ISO Latin-1 is replaced with an underscore.  This is far from a perfect solution.
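
The observed behaviour can be modelled in a few lines of Python. This is only a sketch of what the replacement appears to do, not K3b's actual code; the function name is made up for illustration, and it assumes "ISO Latin-1" means the code points U+0000 to U+00FF.

```python
def to_latin1_with_underscores(text: str) -> str:
    """Hypothetical model of the observed behaviour: every character
    that does not fit in ISO Latin-1 (U+0000..U+00FF) becomes '_'."""
    return "".join(ch if ord(ch) <= 0xFF else "_" for ch in text)


if __name__ == "__main__":
    # The four Polish letters with diacritics are outside Latin-1.
    print(to_latin1_with_underscores("Miąższość"))  # Mi__szo__
```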

Reproducible: Always

Steps to Reproduce:
  1. Tell K3b to create an audio project.
  2. Tell K3b to include a FLAC file whose title is, e.g., "Miąższość".

Actual Results:  
  2. CD-TEXT becomes "Mi__szo__".

Expected Results:  
  2. CD-TEXT should become "Miazszosc".
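
One possible way to get the expected result, sketched in Python for brevity: decompose each character that does not fit in ISO Latin-1 with Unicode normalization (NFKD), drop the combining marks, and keep the base letter if it fits. The function name is hypothetical and this is not a proposal for the exact K3b implementation. Letters without a Unicode decomposition (for example "Ł") still end up as an underscore here and would need an explicit mapping table.

```python
import unicodedata


def transliterate_to_latin1(text: str) -> str:
    """Sketch: keep Latin-1 characters as they are, strip diacritics from
    the rest, and fall back to '_' when no Latin-1 base letter exists."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)  # already representable in ISO Latin-1
            continue
        # Split e.g. 'ą' into 'a' + COMBINING OGONEK, then drop the mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base and all(ord(c) <= 0xFF for c in base):
            out.append(base)
        else:
            out.append("_")  # no usable decomposition
    return "".join(out)


if __name__ == "__main__":
    print(transliterate_to_latin1("Miąższość"))  # Miazszosc
```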

OS: Linux (x86_64) release 2.6.37.1-1.2-desktop
Compiler: gcc

Lynx solves a similar problem pretty well; it even transcribes Cyrillic characters to ASCII.
Comment 1 James E. LaBarre 2021-02-04 02:25:46 UTC
I don't know about CD-TEXT, but I know that audio tags can be in other character sets; I have plenty of tracks with Japanese artists/titles/tags.  Current filesystems seem to have no problem with them either (although I have occasionally seen mangled filenames from some OS/filesystem that couldn't handle them).

The problem with the example here is that it presumes there are Latin-1 characters that are near look-alikes.  This would not be the case with Kanji/kana, Cyrillic, Korean, Arabic, or Chinese (I think there are at least two character sets there).
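
To illustrate the point, here is a small Python check (a sketch with a made-up function name, assuming "look-alike" means "the base letter left after stripping diacritics"). It lists the characters of a title for which that approach finds no Latin-1 letter. For the Polish example the list is empty; for Cyrillic or kana it contains every character, so those scripts would need a real transcription table instead.

```python
import unicodedata


def chars_without_latin1_fallback(text: str) -> list:
    """Return the characters that have no Latin-1 base letter even after
    Unicode decomposition (NFKD) and removal of combining marks."""
    missing = []
    for ch in text:
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if not base or any(ord(c) > 0xFF for c in base):
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for title in ("Miąższość", "Привет", "ありがとう"):
        print(title, chars_without_latin1_fallback(title))
```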

I bring this up because I just tried importing exactly these sorts of files into K3b 20.08.1 (the most recent version in Fedora 33), and it still replaces non-Latin-1 characters with underscores, which makes it unusable for writing an audio CD.