Bug 188187

Summary: Use UTF-8 charset in MySQL database instead of Latin_1
Product: [Frameworks and Libraries] Akonadi
Reporter: rene
Component: server
Assignee: Volker Krause <vkrause>
Status: RESOLVED INTENTIONAL
Severity: wishlist
CC: toma
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Gentoo Packages
OS: Linux
Latest Commit:
Version Fixed In:

Description rene 2009-03-26 19:48:22 UTC
Version:           akonadi-server-1.1.1/akonadi-4.2.1 (using KDE 4.2.1)
Compiler:          gcc version 4.1.2 (Gentoo 4.1.2 p1.3) 
OS:                Linux
Installed from:    Gentoo Packages

The (KDE-local) Akonadi MySQL tables use the Latin_1 charset.

This limits the use of KDE-PIM/Akonadi to West-European languages.

Please use UTF-8 with utf8_general_ci collation instead.


KDE is great!

Renne
Comment 1 Volker Krause 2009-03-28 16:03:53 UTC
Where do you see actual bugs caused by this and how do you trigger them?

We should have unit tests covering non-Latin-1 strings for all fields that can contain them, and those tests pass here.
Comment 2 rene 2009-03-30 19:22:11 UTC
I have not found an actual bug, but I noticed this when I tried to figure out the structure of the database. It is not a bug but a design error.

Adding unit tests for Latin_1 will not solve the problem. If you want to store, e.g., Asian characters in a Latin_1 database, you will get garbled characters anyway.

The Latin_1 character set is much smaller than Unicode, so you will inevitably lose information when converting Unicode text to Latin_1. Using Latin_1 in a backend makes KDE unusable for all non-Western-European users.
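The information loss can be illustrated with a small Python sketch. It uses Python's own codecs rather than MySQL, so MySQL's conversion rules (and whether it warns or rejects unencodable input) may differ in details. The helper names `roundtrip_latin1` and `roundtrip_utf8` and the sample strings are hypothetical.

```python
# Minimal sketch using Python's codecs, not MySQL: push text through
# Latin-1 (ISO-8859-1) and through UTF-8 and compare what comes back.
# Sample strings are illustrative only, not Akonadi data.

def roundtrip_latin1(text: str) -> str:
    """Encode to Latin-1, replacing unencodable characters with '?', then decode."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def roundtrip_utf8(text: str) -> str:
    """Encode to UTF-8 and decode again; lossless for ordinary text."""
    return text.encode("utf-8").decode("utf-8")


samples = ["Grüße", "日本語", "Ελληνικά"]
for s in samples:
    print(f"{s!r}: latin-1 -> {roundtrip_latin1(s)!r}, utf-8 -> {roundtrip_utf8(s)!r}")
```

The German sample survives both round trips because "ü" and "ß" are part of Latin-1; the Japanese and Greek samples come back from Latin-1 as strings of question marks, while UTF-8 preserves all three.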

Unicode was created to solve charset problems. It covers nearly all known characters on this planet (even hieroglyphs, mathematical and musical notation). UTF-32 stores every character in four bytes. Windows and Java use UTF-16, which stores most characters in two bytes but needs surrogate pairs (four bytes) for characters outside the Basic Multilingual Plane. Because of that, UNIX systems switched to UTF-8, a variable-length encoding (one to four bytes per character) of the same Unicode code points. The IETF requires all new protocols to support UTF-8 (RFC 2277).
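The size differences between the three encoding forms can be checked with a short Python sketch. The helper name `encoded_sizes` and the sample characters are hypothetical choices for illustration; the little-endian codec variants are used only so that Python does not prepend a byte-order mark.

```python
# Minimal sketch: bytes needed for one character in UTF-8, UTF-16 and UTF-32.
# Sample characters are arbitrary examples chosen for illustration.

def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Return (UTF-8, UTF-16, UTF-32) byte counts for a single character."""
    utf8 = len(ch.encode("utf-8"))
    # The "-le" codecs omit the byte-order mark Python would otherwise add.
    utf16 = len(ch.encode("utf-16-le"))
    utf32 = len(ch.encode("utf-32-le"))
    return utf8, utf16, utf32


samples = {
    "A": "ASCII letter",
    "é": "Latin-1 letter",
    "語": "CJK ideograph",
    "𝄞": "musical symbol G clef (outside the BMP)",
}

for ch, label in samples.items():
    u8, u16, u32 = encoded_sizes(ch)
    print(f"{label}: UTF-8={u8}, UTF-16={u16}, UTF-32={u32}")
```

UTF-8 grows from one to four bytes depending on the character, UTF-32 is always four, and UTF-16 needs a four-byte surrogate pair for the G clef, which lies outside the Basic Multilingual Plane.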

Because of that, KDE's programming guidelines should require the use of UTF-8 everywhere.

See http://en.wikipedia.org/wiki/UTF-8 for further information.
Comment 3 Volker Krause 2009-03-30 22:10:25 UTC
I know what Unicode and UTF-8 are. KDE already mandates their use for user-visible strings. And if you check the database schema closely, you will see that columns containing such data (such as CollectionTable.name) do in fact use UTF-8 encoding.

The remaining columns, however, contain internal data that cannot contain non-Latin1 characters (e.g. MIME types). The (slightly slower) UTF-8 encoding is therefore not needed there; Latin1 does the job just fine.
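Why the encoding choice is irrelevant for such internal columns can be sketched in Python: both Latin-1 and UTF-8 are supersets of ASCII, so a pure-ASCII string such as a MIME type is encoded to identical bytes either way. The helper `same_bytes` and the sample strings are hypothetical and are not taken from the Akonadi schema; the sketch says nothing about MySQL's performance characteristics.

```python
# Minimal sketch: compare the Latin-1 and UTF-8 encodings of a string.
# Example strings are illustrative, not a list from the Akonadi schema.

def same_bytes(text: str) -> bool:
    """True if Latin-1 and UTF-8 encode the text to identical bytes."""
    try:
        return text.encode("latin-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text cannot be represented in Latin-1 at all.
        return False


# Internal, ASCII-only identifiers such as MIME types.
for mime_type in ["message/rfc822", "text/directory", "inode/directory"]:
    print(mime_type, same_bytes(mime_type))

# A user-visible name, by contrast, encodes differently (or not at all).
for name in ["Posteingang für Jürgen", "受信トレイ"]:
    print(name, same_bytes(name))
```

The MIME types print `True`; the German name prints `False` because "ü" takes one byte in Latin-1 but two in UTF-8, and the Japanese name prints `False` because Latin-1 cannot represent it, which is why such user-visible fields need UTF-8 columns.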

So, unless there are real bugs, I would not want to change anything there and risk doing more harm than good.