Summary: | Use UTF-8 charset in MySQL database instead of Latin_1 | ||
---|---|---|---|
Product: | [Frameworks and Libraries] Akonadi | Reporter: | rene |
Component: | server | Assignee: | Volker Krause <vkrause> |
Status: | RESOLVED INTENTIONAL | ||
Severity: | wishlist | CC: | toma |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Gentoo Packages | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: |
Description
rene
2009-03-26 19:48:22 UTC
Where do you see actual bugs caused by this and how do you trigger them? We should have unit tests that cover non-latin1 strings for all fields that can contain them, which work here.

I have not found an actual bug but realized this when I tried to figure out the structure of the database. It is not a bug but a design error. Adding unit tests for Latin_1 will not solve the problem. If you want to store e.g. Asian characters in a Latin_1 database you'll get garbled characters anyway. The character set of Latin_1 is much smaller than Unicode, so you will lose information whenever Unicode text is transformed to Latin_1. Using Latin_1 in a backend makes KDE unusable for all users outside Western Europe.

Unicode was created to solve charset problems. Windoze and Java use UTF-16, which needs surrogate pairs for characters outside the Basic Multilingual Plane (including some Asian scripts). UTF-32 covers nearly all known characters on this planet (even hieroglyphs, mathematical and musical notation), but needs four bytes for each character. Because of that, UNIX systems switched to UTF-8, which is a variable-length encoding of the same Unicode code points. The IETF requires all new protocols to support UTF-8 (RFC 2277). Because of that, KDE should require the global use of UTF-8 in its programming guidelines. See http://en.wikipedia.org/wiki/UTF-8 for further information.

I know what Unicode and UTF-8 are. KDE actually mandates the use of them for user-visible strings. And if you check the database schema closely you will see that columns containing such data (such as CollectionTable.name) do in fact use UTF-8 encoding. The remaining columns, however, contain internal data which cannot contain non-Latin1 characters (e.g. mimetypes). Using the (slightly slower) UTF-8 encoding is thus not needed there; Latin1 does the job just fine. So, unless there are real bugs, I would rather not change anything there and risk doing more harm than good.
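
The trade-off discussed in this thread can be illustrated with a small Python sketch. It is not taken from Akonadi; the helper names `fits_latin1` and `utf8_length` and the sample strings are made up for illustration. It also assumes that Python's `latin-1` codec (ISO-8859-1) is a close enough stand-in for MySQL's `latin1` character set, which is actually nearer to Windows-1252.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def utf8_length(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical sample values: an internal identifier (a mimetype) and two
# user-visible names, one Western European and one Japanese.
samples = ["text/plain", "Grüße", "日本語"]

for value in samples:
    print(
        f"{value!r}: fits Latin-1 = {fits_latin1(value)}, "
        f"characters = {len(value)}, UTF-8 bytes = {utf8_length(value)}"
    )
```

Pure ASCII data such as a mimetype encodes to the same bytes in both encodings, which is why a Latin1 column loses nothing for it. User-visible names are different: the Japanese sample cannot be encoded as Latin-1 at all, which is the case the UTF-8 columns are there to cover.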