268587 – Let K3b use intelligent transcription for non-Western European data

Status:	REPORTED

Alias:	None

Product:	k3b
Classification:	Applications
Component:	Audio Project (other bugs)
Version First Reported In:	2.0.2
Platform:	openSUSE Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	K3b Bugs

Reported:	2011-03-15 21:50 UTC by Christopher Yeleighton
Modified:	2021-02-04 02:25 UTC
CC List:	2 users

Description Christopher Yeleighton 2011-03-15 21:50:35 UTC

Version:           2.0.2 (using KDE 4.6.0) 
OS:                Linux

I do not know whether CD-TEXT allows characters outside the ISO Latin-1 character set, as this is a trade secret of Philips, but K3b seems to require the text to be in ISO Latin-1.  K3b generates the CD-TEXT from the metadata embedded in the media files; any character in the metadata that falls outside ISO Latin-1 is replaced with an underscore.  This is far from a perfect solution.

Reproducible: Always

Steps to Reproduce:
  1. Tell K3b to create an audio project.
  2. Tell K3b to include a FLAC file whose title is e.g. "Miąższość".

Actual Results:  
  2. CD-TEXT becomes "Mi__szo__".

Expected Results:  
  2. CD-TEXT should become "Miazszosc".

OS: Linux (x86_64) release 2.6.37.1-1.2-desktop
Compiler: gcc

Lynx solves a similar problem pretty well; it even transcribes Cyrillic characters to ASCII.
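
The difference between the actual and expected results above can be sketched in a few lines of Python. This is only an illustration, not K3b's code: the function names are made up for this example, and the approach (Unicode decomposition followed by dropping the combining marks) is one assumed way to get "Miazszosc", not necessarily how Lynx does it.

```python
import unicodedata

def replace_outside_latin1(text: str, placeholder: str = "_") -> str:
    """Mimic the reported behaviour: one placeholder per character
    whose code point lies outside ISO Latin-1 (above U+00FF)."""
    return "".join(ch if ord(ch) <= 0xFF else placeholder for ch in text)

def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def transcribe_for_cd_text(text: str) -> str:
    """Keep Latin-1 characters as they are, strip diacritics from the
    rest, and fall back to the placeholder only when that does not help."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)
        else:
            out.append(replace_outside_latin1(strip_diacritics(ch)))
    return "".join(out)

# "Miąższość", written with escapes so the input is unambiguous
title = "Mi\u0105\u017cszo\u015b\u0107"
print(replace_outside_latin1(title))  # Mi__szo__
print(transcribe_for_cd_text(title))  # Miazszosc
```

Letters that have no Unicode decomposition, such as the Polish "ł", still end up as an underscore with this sketch, so a real implementation would probably need an extra lookup table for them.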

Comment 1 James E. LaBarre 2021-02-04 02:25:46 UTC

I don't know about CD-TEXT, but audio tags can certainly use other character sets; I have plenty of tracks with Japanese artists, titles, and tags.  Current filesystems seem to have no problem with them either (although I have occasionally seen filenames mangled by an OS or filesystem that couldn't handle them).

The problem with the example here is that it presumes there are Latin-1 characters that are near look-alikes.  This would not be the case with kanji/kana, Cyrillic, Korean, Arabic, or Chinese (I think there are at least two character sets there).

I bring this up because I just tried importing exactly these sorts of files into K3b 20.08.1 (the most recent version in Fedora 33), and it still replaces non-Latin-1 characters with underscores, which makes it unusable for writing an audio CD.
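
The point about scripts without Latin look-alikes can be illustrated with a table-driven sketch in Python. Everything here is an assumption made for the example: the table is hypothetical and deliberately tiny (just enough Cyrillic letters for one word), the function names are invented, and nothing reflects what K3b or Lynx actually do. It shows that a romanization table helps for scripts that have one, while characters with no entry still fall back to the underscore.

```python
# Hypothetical, deliberately incomplete romanization table.
CYRILLIC_TO_LATIN = {
    "\u041c": "M",  # Cyrillic capital EM
    "\u043e": "o",  # Cyrillic small O
    "\u0441": "s",  # Cyrillic small ES
    "\u043a": "k",  # Cyrillic small KA
    "\u0432": "v",  # Cyrillic small VE
    "\u0430": "a",  # Cyrillic small A
}

def transcribe(text, table=CYRILLIC_TO_LATIN, placeholder="_"):
    """Keep Latin-1 characters, romanize characters found in the table,
    and use the placeholder for everything else."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)
        elif ch in table:
            out.append(table[ch])
        else:
            out.append(placeholder)
    return "".join(out)

def untranscribed(text, table=CYRILLIC_TO_LATIN):
    """List the characters that would still become placeholders."""
    return [ch for ch in text if ord(ch) > 0xFF and ch not in table]

moscow = "\u041c\u043e\u0441\u043a\u0432\u0430"  # "Moskva" in Cyrillic
tokyo = "\u6771\u4eac"                           # "Tokyo" in kanji

print(transcribe(moscow))         # Moskva
print(transcribe(tokyo))          # __
print(len(untranscribed(tokyo)))  # 2
```

A helper like `untranscribed` could let a program warn the user before burning, instead of silently writing underscores to the disc.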