Bug 101211

Summary: All KDE Applications Fail With The Turkish Language When Changing the Case of a Turkish (i/ı) Character
Product: [Unmaintained] kdelibs    Reporter: S.Çağlar ONUR <caglar>
Component: qt    Assignee: Thiago Macieira <thiago>
Status: RESOLVED LATER
Severity: normal    CC: baris, ismail
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   

Description S.Çağlar ONUR 2005-03-10 01:37:48 UTC
Version:            (using KDE 3.3.2)
Installed from:    Gentoo Packages
Compiler:          gcc (GCC) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1) 
OS:                Linux

As described on the http://www.i18nguy.com/unicode/turkish-i18n.html web page, Turkish has four i's: dotted and dotless, each in lower and upper case. Converting lowercase dotted "i" to upper case must give uppercase dotted "İ", converting lowercase dotless "ı" must give uppercase dotless "I", and vice versa.

For example, in KWrite, if you type "türkiye", select it and change it to upper case (Ctrl-U), the result is "TÜRKIYE", which is wrong. Selecting "TÜRKIYE" and changing it to lower case (Ctrl-Shift-U) then gives "türkiye", which is also wrong (it should be "türkıye").

"türkiye" / Ctrl-U / "TÜRKIYE" 
"TÜRKIYE" / Ctrl-Shift-U / "türkiye"
"TÜRKİYE" / Ctrl-Shift-U / "türkiye" --> ToLower is working

Another example:

"barış" / Ctrl-U / "BARIŞ" --> ToUpper is working
"BARIŞ" / Ctrl-Shift-U / "bariş" --> ToLower is not working
"BARİŞ" / Ctrl-Shift-U / "bariş" --> ToLower is working
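The wrong results above are exactly what a plain, locale-independent Unicode case mapping produces. As a rough illustration only (Python's str.upper()/str.lower() stand in for a mapping that takes no locale into account; this is not the Qt code, and default_case is a hypothetical helper), the failing conversions can be reproduced:

```python
def default_case(text: str, op: str) -> str:
    """Apply Python's default, locale-independent Unicode case mapping."""
    return text.upper() if op == "upper" else text.lower()


# Four conversions from the examples above (those not involving
# lowercasing U+0130, whose default mapping differs between libraries).
for text, op in [("türkiye", "upper"), ("TÜRKIYE", "lower"),
                 ("barış", "upper"), ("BARIŞ", "lower")]:
    print(f"{text} -> {op} -> {default_case(text, op)}")
```

This prints TÜRKIYE, türkiye, BARIŞ and bariş: the same results KWrite gives, because without a locale the mapping always takes dotted i to I and I to dotted i.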

The expected behaviour is as follows:

#1
"türkiye" / Ctrl-U / "TÜRKİYE"
"türkıye" / Ctrl-U / "TÜRKIYE"
"TÜRKIYE" / Ctrl-Shift-U / "türkıye"
"TÜRKİYE" / Ctrl-Shift-U / "türkiye"

#2
"barış" / Ctrl-U / "BARIŞ"
"bariş" / Ctrl-U / "BARİŞ"
"BARIŞ" / Ctrl-Shift-U / "barış" 
"BARİŞ" / Ctrl-Shift-U / "bariş" 
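A minimal sketch of conversions that satisfy the expected behaviour, assuming the text is known to be Turkish. The helpers tr_upper/tr_lower are hypothetical (not a Qt or KDE API): they handle the four i's explicitly and leave every other letter to the default mapping.

```python
def tr_upper(text: str) -> str:
    """Uppercase text using Turkish rules for i (hypothetical helper)."""
    # Dotted i must become dotted İ (U+0130). Dotless ı (U+0131) already
    # uppercases to plain I under the default mapping, and İ maps to itself.
    return text.replace("i", "\u0130").upper()


def tr_lower(text: str) -> str:
    """Lowercase text using Turkish rules for I (hypothetical helper)."""
    # Replace both capitals before the default mapping runs; otherwise
    # I would become dotted i and İ would become i + U+0307.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


cases = [
    (tr_upper, "türkiye", "TÜRKİYE"),
    (tr_upper, "türkıye", "TÜRKIYE"),
    (tr_lower, "TÜRKIYE", "türkıye"),
    (tr_lower, "TÜRKİYE", "türkiye"),
    (tr_upper, "barış", "BARIŞ"),
    (tr_upper, "bariş", "BARİŞ"),
    (tr_lower, "BARIŞ", "barış"),
    (tr_lower, "BARİŞ", "bariş"),
]
for func, given, expected in cases:
    assert func(given) == expected, (func.__name__, given)
```

The special-casing is a pre-pass rather than a new table, so only the two letters that differ from the default Unicode rules need attention.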

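So only converting dotless "ı" to upper case and dotted "İ" to lower case works; the other two conversions (dotted "i" to upper case, dotless "I" to lower case) give the wrong letter.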

I looked at the source code and found that the problem lies in QChar's upper()/lower() functions in Qt. They are defined as inline functions in qt-VERSION/src/tools/qunicodetables_p.h, lines 107 to 143.

If I'm not mistaken, QString::upper() also uses QChar::upper(), so it behaves incorrectly as well.
Comment 1 S.Çağlar ONUR 2005-03-10 01:39:28 UTC
By the way, all system locales are set to "tr_TR.UTF-8" and KDE's language is set to Turkish.
Comment 2 Thiago Macieira 2005-03-10 03:13:13 UTC
Does Qt provide any locale-dependent uppercasing function?

If not, then it is a Qt bug and should be reported to them. We should not be the ones to include the uppercasing tables.
Comment 3 Anders Lund 2005-03-10 09:20:32 UTC
Neither QString nor QChar provides any case transformation methods other than upper() and lower().

Where can we go from here?
Comment 4 Anders Lund 2005-03-10 10:12:17 UTC
Reported to Trolltech.
Comment 5 Baris Metin 2005-03-10 18:53:59 UTC
Thank you very much for your quick response.

In fact, the upper() and lower() functions are locale-dependent, but they can't handle case conversions that change the byte count.

gawk (GNU AWK) had the same problem, which we fixed. The bug reports are in Turkish, but the patch may help in understanding the problem: (http://bugs.uludag.org.tr/attachment.cgi?id=13&action=view)

This is a serious problem for the Turkish and Azerbaijani locales. It makes most parts of these applications unusable in these locales.

Case conversions that change the byte count are documented in the glibc documentation, but very few programs implement them correctly :(
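To illustrate the byte-count change in UTF-8 (the encoding of the tr_TR.UTF-8 locale from Comment 1): the ASCII letters i and I take one byte each, while İ (U+0130) and ı (U+0131) take two. This sketch only measures encoded lengths; it is not the glibc or gawk code, and utf8_len is a hypothetical helper.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# (lowercase, correct Turkish uppercase) pairs.
for lower, upper in [("i", "\u0130"), ("\u0131", "I")]:
    print(f"U+{ord(lower):04X}: {utf8_len(lower)} byte(s) -> "
          f"U+{ord(upper):04X}: {utf8_len(upper)} byte(s)")

# A byte-wise, ASCII-only conversion (like toupper() applied to each byte in
# the "C" locale) leaves the two-byte "ü" untouched and maps i to plain I.
print("türkiye".encode("utf-8").upper().decode("utf-8"))
```

The loop shows i -> İ growing from 1 to 2 bytes and ı -> I shrinking from 2 to 1, so a conversion that rewrites a char buffer in place, one byte at a time, cannot produce the right result.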

Is there anything we can do to get this bug fixed more quickly?
Comment 6 Anders Lund 2005-03-10 18:59:10 UTC
On Thursday 10 March 2005 18:54, Baris Metin wrote:
> Is there anything we can do to get this bug fixed more quickly?


Working with Trolltech is better than working here. I reported the issue to
them, and I hope they take it seriously and follow up on it. But I can't know.

-anders
Comment 7 Thiago Macieira 2005-03-11 03:09:06 UTC
I'm sorry, but your assessment is not correct.

upper() and lower() should be locale-dependent, but they are not. They use the standard rules from the Unicode tables and do not take the language-specific exceptions into account. I have just read the code in qunicodetables_p.h to confirm that.

There's also no problem with byte count, since all characters are stored as 2-byte UTF-16 code units. Nothing inside the Basic Multilingual Plane can case-fold to outside it. But you are right: uppercasing, lowercasing and titlecasing are *string* operations, not *character* operations. The QChar::upper() & family functions should not be used.
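Both points can be checked quickly. In UTF-16, each of the four Turkish i's is a single 2-byte code unit, so the byte count does not change there. Case mapping can still change the number of characters, though, which a function taking and returning one QChar cannot express. Python is used only for illustration (utf16_units is a hypothetical helper); its str.upper()/str.lower() apply Unicode's full case mappings.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# Each of the four Turkish i's is a single UTF-16 code unit (all are in the BMP).
for ch in ("i", "\u0131", "I", "\u0130"):
    print(f"U+{ord(ch):04X}: {utf16_units(ch)} code unit(s)")

# Full Unicode case mappings can still turn one character into several,
# which a per-character upper()/lower() returning one character cannot do.
assert "\u00df".upper() == "SS"          # German sharp s
assert "\u0130".lower() == "i\u0307"     # İ under default (non-Turkish) rules
```

The second assertion also shows why Turkish needs language-specific rules on top of the default ones: without them, İ lowercases to i plus a combining dot rather than to a plain i.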

The issue here lies with Qt. Has anyone reported this to them? If not, I will do so. If you have, please point them to this bug report.
Comment 8 Thiago Macieira 2005-03-11 03:33:05 UTC
Assigning to null isn't nice.
Comment 9 Baris Metin 2005-03-11 08:36:14 UTC
You are right, Thiago. I think my assessment holds only for QCString, which calls toupper()/tolower() directly (unless Qt overloads these functions).

We have reported the problem, and I think Anders did too.
Comment 10 Anders Lund 2005-03-11 10:53:37 UTC
On Friday 11 March 2005 03:09, Thiago Macieira wrote:
> The issue here lies with Qt. Has anyone reported this to them? If not, I
> will do so. If you have, please point them to this bug report

I repeat, I have mailed qt-bugs about this.

-anders
Comment 11 Thiago Macieira 2005-03-11 11:38:25 UTC
OK, please record here the Qt issue numbers that Trolltech assigns to you.

I have also confirmed that Qt4 suffers from the same problem.

I am closing the bug report as LATER because there's nothing we can do right now (that doesn't mean we can't keep discussing it).
Comment 12 Ismail Donmez 2005-09-10 16:07:57 UTC
Fixed in Qt4 (tested with Qt 4.0.1) using QByteArray, which replaced QCString.