Bug 188187

Summary:	Use UTF-8 charset in MySQL database instead of Latin_1
Product:	[Frameworks and Libraries] Akonadi	Reporter:	rene
Component:	server	Assignee:	Volker Krause <vkrause>
Status:	RESOLVED INTENTIONAL
Severity:	wishlist	CC:	toma
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Gentoo Packages
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description rene 2009-03-26 19:48:22 UTC

Version:           akonadi-server-1.1.1/akonadi-4.2.1 (using KDE 4.2.1)
Compiler:          gcc version 4.1.2 (Gentoo 4.1.2 p1.3) 
OS:                Linux
Installed from:    Gentoo Packages

The (KDE local) Akonadi MySQL tables use the Latin_1 charset.

This limits the use of KDE-PIM/Akonadi to Western European languages.

Please use UTF-8 with utf8_general_ci collation instead.


KDE is great!

Renne

Comment 1 Volker Krause 2009-03-28 16:03:53 UTC

Where do you see actual bugs caused by this and how do you trigger them?

We should already have unit tests covering non-Latin-1 strings for all fields that can contain them, and those tests pass here.

Comment 2 rene 2009-03-30 19:22:11 UTC

I have not found an actual bug, but I noticed this when I tried to figure out the structure of the database. It is not a bug but a design error.

Adding unit tests for Latin_1 will not solve the problem. If you try to store e.g. Asian characters in a Latin_1 database, you'll get garbled characters anyway.

The character set of Latin_1 is smaller than that of UTF-8/Unicode, so you will inevitably lose information when converting Unicode text to Latin_1. Using Latin_1 in a backend makes KDE unusable for all non-Western-European users.
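
The information loss described above can be illustrated with a minimal sketch using Python's built-in codecs. This only demonstrates the encoding limits themselves; it does not reproduce how MySQL or Akonadi handle such strings, which depends on column and connection settings. The helper name `roundtrip` and the sample strings are made up for illustration.

```python
def roundtrip(text: str, encoding: str) -> str:
    """Encode text, substituting '?' for unencodable characters, then decode it."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample strings: Western European, Japanese, Russian.
samples = ["Grüße", "日本語", "Привет"]

for s in samples:
    latin1 = roundtrip(s, "latin-1")
    utf8 = roundtrip(s, "utf-8")
    print(f"{s!r}: latin-1 -> {latin1!r}, utf-8 -> {utf8!r}")
```

Only the Western European sample survives the Latin-1 round trip unchanged; the other two come back as question marks, while UTF-8 preserves all three.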

Unicode was created to solve charset problems. Windoze and Java use UTF-16, which needs surrogate pairs for characters outside the Basic Multilingual Plane. UTF-32 covers the entire Unicode range (including hieroglyphs, mathematical and musical notation), but needs four bytes for each character. Because of that, UNIX systems have switched to UTF-8, a variable-length encoding that covers the same range as UTF-32. The IETF requires all new protocols to support UTF-8 (RFC 2277).

Because of that, KDE should require the global use of UTF-8 in its programming guidelines.

See http://en.wikipedia.org/wiki/UTF-8 for further information.
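
The size trade-off between these encodings can be checked with a short sketch. The helper name `encoded_sizes` and the chosen sample characters are illustrative assumptions; the little-endian codec variants are used so that no byte-order mark is counted.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# ASCII letter, accented Latin letter, CJK ideograph, musical symbol (U+1D11E).
for ch in ("A", "é", "語", "\U0001D11E"):
    print(f"U+{ord(ch):04X}: {encoded_sizes(ch)}")
```

UTF-32 always needs four bytes, UTF-8 ranges from one to four, and UTF-16 needs two bytes inside the Basic Multilingual Plane and four outside it.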

Comment 3 Volker Krause 2009-03-30 22:10:25 UTC

I know what Unicode and UTF-8 are. KDE actually mandates their use for user-visible strings. And if you check the database schema closely, you will see that columns containing such data (such as CollectionTable.name) do in fact use UTF-8 encoding.

The remaining columns, however, contain internal data which cannot contain Unicode (e.g. mimetypes). Using the (slightly slower) UTF-8 encoding is thus not needed there; Latin1 does the job just fine.

So, unless there are real bugs, I would not want to change anything there, at the risk of doing more harm than good.
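
As a rough illustration of why the encoding choice does not matter for ASCII-only internal data, the following sketch compares the bytes produced by both encodings. It says nothing about MySQL's performance; the helper name `same_bytes` and the collection name "Entwürfe" are hypothetical examples.

```python
def same_bytes(text: str) -> bool:
    """True if Latin-1 and UTF-8 produce identical bytes for this text."""
    return text.encode("latin-1") == text.encode("utf-8")


mimetype = "message/rfc822"   # ASCII-only internal identifier
collection_name = "Entwürfe"  # hypothetical user-visible name with a non-ASCII letter

print(mimetype, same_bytes(mimetype))
print(collection_name, same_bytes(collection_name))
```

For the ASCII-only mimetype both encodings yield the same bytes, so nothing is lost by storing it as Latin1. The user-visible name encodes differently, which is the kind of column where UTF-8 is needed.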