Summary: | Non-ASCII strings in Media Library: lowercase/uppercase conversion not working | ||
---|---|---|---|
Product: | [Applications] amarok | Reporter: | Thomas Herzog <thomas.herzog> |
Component: | general | Assignee: | Amarok Developers <amarok-bugs-dist> |
Status: | RESOLVED INTENTIONAL | ||
Severity: | normal | CC: | jiha.kde.bugzilla, stanley_87 |
Priority: | NOR | ||
Version: | 1.3.6 | ||
Target Milestone: | --- | ||
Platform: | unspecified | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Thomas Herzog
2005-11-16 00:24:24 UTC
Can confirm this here: Amarok should realize that these two letters are just uppercase and lowercase, but it treats them as if they were different letters. System: Suse Linux 10.0, KDE 3.5rc1 In other words, amaroK should use QString::localeAwareCompare. Na, not so sure, if this string goes directly in a DB query, it could also be the DB's fault...which could maybe circumvented "somehow".. Right, you have to tell the DB which encoding to use. MySQL has a lot of combinations of encoding and sorting possibilities. What is your DB configured to? I don't remember if the setting can be done per database or table. OK, seems like I found it: A collation can be defined to be used in SELECT statements: http://dev.mysql.com/doc/refman/4.1/en/case-sensitivity.html It seems, that amaroK should make the collation configurable which should be used when querying MySQL. Thus, one should be able to set a collation in the options, which will then be used by amaroK via the COLLATE option in SQL. Available collations are: mysql> SHOW COLLATION; +----------------------+----------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +----------------------+----------+-----+---------+----------+---------+ | big5_chinese_ci | big5 | 1 | Yes | Yes | 1 | | big5_bin | big5 | 84 | | Yes | 1 | | dec8_swedish_ci | dec8 | 3 | Yes | | 0 | | dec8_bin | dec8 | 69 | | | 0 | | cp850_general_ci | cp850 | 4 | Yes | | 0 | | cp850_bin | cp850 | 80 | | | 0 | | hp8_english_ci | hp8 | 6 | Yes | | 0 | | hp8_bin | hp8 | 72 | | | 0 | | koi8r_general_ci | koi8r | 7 | Yes | | 0 | | koi8r_bin | koi8r | 74 | | | 0 | | latin1_german1_ci | latin1 | 5 | | | 0 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | | 0 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | | 0 | | latin1_general_cs | latin1 | 49 | | | 0 | | latin1_spanish_ci | latin1 | 94 | | | 0 | | latin2_czech_cs | latin2 | 2 | | Yes | 4 | | latin2_general_ci | latin2 | 9 | Yes | | 0 | | latin2_hungarian_ci | latin2 | 21 | | | 0 | | latin2_croatian_ci | latin2 | 27 | | | 0 | | latin2_bin | latin2 | 77 | | | 0 | | swe7_swedish_ci | swe7 | 10 | Yes | | 0 | | swe7_bin | swe7 | 82 | | | 0 | | ascii_general_ci | ascii | 11 | Yes | | 0 | | ascii_bin | ascii | 65 | | | 0 | | ujis_japanese_ci | ujis | 12 | Yes | Yes | 1 | | ujis_bin | ujis | 91 | | Yes | 1 | | sjis_japanese_ci | sjis | 13 | Yes | Yes | 1 | | sjis_bin | sjis | 88 | | Yes | 1 | | hebrew_general_ci | hebrew | 16 | Yes | | 0 | | hebrew_bin | hebrew | 71 | | | 0 | | tis620_thai_ci | tis620 | 18 | Yes | Yes | 4 | | tis620_bin | tis620 | 89 | | Yes | 1 | | euckr_korean_ci | euckr | 19 | Yes | Yes | 1 | | euckr_bin | euckr | 85 | | Yes | 1 | | koi8u_general_ci | koi8u | 22 | Yes | | 0 | | koi8u_bin | koi8u | 75 | | | 0 | | gb2312_chinese_ci | gb2312 | 24 | Yes | Yes | 1 | | gb2312_bin | gb2312 | 86 | | Yes | 1 | | greek_general_ci | greek | 25 | Yes | | 0 | | greek_bin | greek | 70 | | | 0 | | cp1250_general_ci | cp1250 | 26 | Yes | | 0 | | cp1250_czech_cs | cp1250 | 34 | | Yes | 2 | | cp1250_croatian_ci | cp1250 | 44 | | | 0 | | cp1250_bin | cp1250 | 66 | | | 0 | | gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | | gbk_bin | gbk | 87 | | Yes | 1 | | latin5_turkish_ci | latin5 | 30 | Yes | | 0 | | latin5_bin | latin5 | 78 | | | 0 | | armscii8_general_ci | armscii8 | 32 | Yes | | 0 | | armscii8_bin | armscii8 | 64 | | | 0 | | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | | utf8_unicode_ci | utf8 | 192 | | Yes | 8 | | utf8_icelandic_ci | utf8 | 193 | | Yes | 8 | | utf8_latvian_ci | utf8 | 194 | | Yes | 8 | | utf8_romanian_ci | utf8 | 195 | | Yes | 8 | | utf8_slovenian_ci | utf8 | 196 | | Yes | 8 | | utf8_polish_ci | utf8 | 197 | | Yes | 8 | | utf8_estonian_ci | utf8 | 198 | | Yes | 8 | | utf8_spanish_ci | utf8 | 199 | | Yes | 8 | | utf8_swedish_ci | utf8 | 200 | | Yes | 8 | | utf8_turkish_ci | utf8 | 201 | | Yes | 8 | | utf8_czech_ci | utf8 | 202 | | Yes | 8 | | utf8_danish_ci | utf8 | 203 | | Yes | 8 | | utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 | | utf8_slovak_ci | utf8 | 205 | | Yes | 8 | | utf8_spanish2_ci | utf8 | 206 | | Yes | 8 | | utf8_roman_ci | utf8 | 207 | | Yes | 8 | | utf8_persian_ci | utf8 | 208 | | Yes | 8 | | ucs2_general_ci | ucs2 | 35 | Yes | Yes | 1 | | ucs2_bin | ucs2 | 90 | | Yes | 1 | | ucs2_unicode_ci | ucs2 | 128 | | Yes | 8 | | ucs2_icelandic_ci | ucs2 | 129 | | Yes | 8 | | ucs2_latvian_ci | ucs2 | 130 | | Yes | 8 | | ucs2_romanian_ci | ucs2 | 131 | | Yes | 8 | | ucs2_slovenian_ci | ucs2 | 132 | | Yes | 8 | | ucs2_polish_ci | ucs2 | 133 | | Yes | 8 | | ucs2_estonian_ci | ucs2 | 134 | | Yes | 8 | | ucs2_spanish_ci | ucs2 | 135 | | Yes | 8 | | ucs2_swedish_ci | ucs2 | 136 | | Yes | 8 | | ucs2_turkish_ci | ucs2 | 137 | | Yes | 8 | | ucs2_czech_ci | ucs2 | 138 | | Yes | 8 | | ucs2_danish_ci | ucs2 | 139 | | Yes | 8 | | ucs2_lithuanian_ci | ucs2 | 140 | | Yes | 8 | | ucs2_slovak_ci | ucs2 | 141 | | Yes | 8 | | ucs2_spanish2_ci | ucs2 | 142 | | Yes | 8 | | ucs2_roman_ci | ucs2 | 143 | | Yes | 8 | | ucs2_persian_ci | ucs2 | 144 | | Yes | 8 | | cp866_general_ci | cp866 | 36 | Yes | | 0 | | cp866_bin | cp866 | 68 | | | 0 | | keybcs2_general_ci | keybcs2 | 37 | Yes | | 0 | | keybcs2_bin | keybcs2 | 73 | | | 0 | | macce_general_ci | macce | 38 | Yes | | 0 | | macce_bin | macce | 43 | | | 0 | | macroman_general_ci | macroman | 39 | Yes | | 0 | | macroman_bin | macroman | 53 | | | 0 | | cp852_general_ci | cp852 | 40 | Yes | | 0 | | cp852_bin | cp852 | 81 | | | 0 | | latin7_estonian_cs | latin7 | 20 | | | 0 | | latin7_general_ci | latin7 | 41 | Yes | | 0 | | latin7_general_cs | latin7 | 42 | | | 0 | | latin7_bin | latin7 | 79 | | | 0 | | cp1251_bulgarian_ci | cp1251 | 14 | | | 0 | | cp1251_ukrainian_ci | cp1251 | 23 | | | 0 | | cp1251_bin | cp1251 | 50 | | | 0 | | cp1251_general_ci | cp1251 | 51 | Yes | | 0 | | cp1251_general_cs | cp1251 | 52 | | | 0 | | cp1256_general_ci | cp1256 | 57 | Yes | | 0 | | cp1256_bin | cp1256 | 67 | | | 0 | | cp1257_lithuanian_ci | cp1257 | 29 | | | 0 | | cp1257_bin | cp1257 | 58 | | | 0 | | cp1257_general_ci | cp1257 | 59 | Yes | | 0 | | binary | binary | 63 | Yes | Yes | 1 | | geostd8_general_ci | geostd8 | 92 | Yes | | 0 | | geostd8_bin | geostd8 | 93 | | | 0 | | cp932_japanese_ci | cp932 | 95 | Yes | Yes | 1 | | cp932_bin | cp932 | 96 | | Yes | 1 | +----------------------+----------+-----+---------+----------+---------+ Note the collations "_ci", which are meant to be case-insensitive. So I think, if amaroK saves as UTF8, the collations utf8_XY_ci and the collations latin1_XY_ci should be at least selectable. This defines sorting values for all characters of the respective language, thus ü and Ü and ä and Ä and ö and Ö should have the same value -> will be both returned by the LIKE operator... Would be great, if the devs could include such a configuration fpr MySQL. It should also apply to PostgreSQL as well! So, detailed analysis follows: ################################## mysql> select * from artist; +----+---------------------+ | id | name | +----+---------------------+ | 1 | James Newton Howard | | 2 | Depeche Mode | | 3 | Die Ärzte | | 4 | H.I.M | | 5 | Iron Maiden | | 6 | Klaus Badelt | | 7 | Nickelback | | 8 | Rammstein | | 9 | Seeed | | 10 | The Bloodhound Gang | | 11 | The Rolling Stones | +----+---------------------+ 11 rows in set (0.03 sec) mysql> select * from artist where name LIKE '%ä%'; Empty set (0.00 sec) mysql> select * from artist where name LIKE '%Ä%'; +----+------------+ | id | name | +----+------------+ | 3 | Die Ärzte | +----+------------+ 1 row in set (0.00 sec) ############### But reselecting and defining a COLLATION simply does not work correctly. Maybe some friggin MySQL bug *shame on them* But there is maybe a simple solution to this: - save a correctly lowercased expression in the database for the string fields? - do a SELECT with both uppercase and lowercase expressions (will fail, if there are MIXED Cases for different characters in the string) Or has someone been able to get its MySQL doing the above query with "ä" and "Ä" correctly? Jesus, amaroK could encode strings e.g. in HTML encoding? So encode the "Umlaute" with Ä or ä Any DB should be able to use these values in strings without Problems. Don't know, if this imposes a restriction in available chars for asian languages? HTML entities won't solve here. *** Bug 115684 has been marked as a duplicate of this bug. *** According to bug 104610, the problem with the MySQL backend is solved. For other backends, I think you should file wishlists with the backend authors to have the set up proper case insensitive lookups. Hm, the "bug solved" on Bug 104610 is from 09/23/2005 I can reproduce the Bug on amaroK 1.3.6 release 1, dating 10/08/2005... This hardly qualifies as "fixed" ?!?! This bug still exists using the Sqlite engine, is this a bug with the way AmaroK handles Sqlite, or is Sqlite just broken with regards to this issue? Unfortunately this is a SQLite issue. From sqlite.org, and note the section in parentheses marked "a bug": "The LIKE operator is not case sensitive and will match upper case characters on one side against lower case characters on the other. (A bug: SQLite only understands upper/lower case for 7-bit Latin characters. Hence the LIKE operator is case sensitive for 8-bit iso8859 characters or UTF-8 characters. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.)." This means that unless items are stored (incorrectly) as all upper or lowercase in the database, SQLite cannot handle the type of behavior desired and reported in this bug. If this is fixed in SQLite it can be fixed in amaroK; until then you'll need to use something else (i.e. MySQL) if you want the correct behavior. Also, you can pester the SQLite guys to fix it. |