Bug 140005 - ajax in UTF-8 doesn't parse cyrillic encoding properly (google example)
Status: RESOLVED WORKSFORME
Alias: None
Product: konqueror
Classification: Applications
Component: khtml ecma
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konqueror Developers
Reported: 2007-01-13 05:22 UTC by Anton
Modified: 2007-09-18 18:51 UTC

Description Anton 2007-01-13 05:22:59 UTC
Version:            (using KDE 3.5.5)
Installed from:    Gentoo Packages
Compiler:          gcc version 4.1.1 (Gentoo 4.1.1-r3) CFLAGS="-O2 -march=pentium4m -pipe -msse2 -mfpmath=sse"
OS:                Linux

Subj.
A JavaScript alert box showing the received AJAX data should make a very simple testcase.
This is different from bug 130234 because here the server sends a proper UTF-8 encoding in the HTML header, yet the text is still not displayed properly.

I experience it with the Google gadgets available on the personalized Google page http://www.google.com/ig (both BugTraq and Gmail, see screenshot)
http://img145.imageshack.us/img145/3959/konqgoogleiglt7.png 

and also with my own application.
Let me know if you need a real testcase.
Comment 1 Igor 2007-09-15 09:30:54 UTC
I see this behaviour in Zooomr when I try to name photos, so I can confirm this bug (ArchLinux, Slackware). As far as I can remember, though, it works in Kubuntu.
Comment 2 Maksim Orlovich 2007-09-15 17:46:58 UTC
First of all, you want to be using at least 3.5.6, since that fixes problems with unicode support in regular expressions (and you need to make sure your libpcre has utf-8 support --- unfortunately it's possible for its support to be missing at runtime, even ...).  That's quite likely the difference. 

The bugtraq.ru widget works for me, so I can't confirm. Of course, if you have a testcase, that would let me be sure...

Comment 3 Anton 2007-09-16 01:03:09 UTC
I have tried KDE 3.5.7 and the problem is gone.
Igor, please try the latest version too.
Comment 4 Igor 2007-09-18 07:49:28 UTC
I have the latest stable version, 3.5.7. BugTraq.ru and GMail work for me too, but Zooomr doesn't work properly. After refreshing the page, however, the text is in the proper encoding and I can read it.
Here is a screenshot: http://img451.imageshack.us/img451/9082/snapshot1ez6.png
Comment 5 Anton 2007-09-18 09:21:29 UTC
Can you also describe the steps before refreshing the page, so we can reproduce it and create a test case?
Do I need to have a Zooomr account? Do I need to upload a picture?
Comment 6 Igor 2007-09-18 09:27:19 UTC
You need to register on Zooomr.com, then upload a picture, then simply give it a name or tag (in Russian), and you will see this behaviour. Then refresh the page and you will see the proper text.
Comment 7 Anton 2007-09-18 10:08:09 UTC
OK, I managed to reproduce it. Here is the JSON response from the server:

HTTP/1.0 200 OK
Connection: keep-alive
Status: 200 OK, 200 OK
Content-Language: en
x-zmr-token: 238256
Vary: Accept-Language, Cookie
server: ZAPI/0.9r3, lighttpd/1.4.18
date: Tue, 18 Sep 2007 07:54:37 GMT, Tue, 18 Sep 2007 07:54:37 GMT
Content-Type: text/javascript
Content-length: 1364
Keep-Alive: timeout=30, max=100

{"photo": {"sizemax": 16, "description": {"_content": "тест ТЕСТ ТЕСТ"}, <the rest of the long Unicode response skipped>

As you can see, the webserver does not return encoding specs like ";charset=UTF-8" in the content-type header.
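
For illustration, here is a minimal sketch of how a charset parameter could be pulled out of a Content-Type value. It uses only the C++17 standard library and is not KHTML code; the function name charsetFromContentType is hypothetical. It ignores spaces around "=" and quoted parameter values containing ";", so treat it as a simplification. For the header above ("text/javascript") it reports that no charset was given.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not part of KHTML): returns the lower-cased charset
// parameter of a Content-Type header value, or std::nullopt if none is given.
std::optional<std::string> charsetFromContentType(const std::string& value)
{
    // Parameter names are case-insensitive, so work on a lower-cased copy.
    std::string lower(value);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Walk over the ';'-separated parameters that follow the media type.
    std::size_t pos = lower.find(';');
    while (pos != std::string::npos) {
        const std::size_t start = lower.find_first_not_of(" \t", pos + 1);
        if (start == std::string::npos)
            break;
        const std::size_t end = lower.find(';', start);
        const std::string param = lower.substr(
            start, end == std::string::npos ? std::string::npos : end - start);

        if (param.compare(0, 8, "charset=") == 0) {
            std::string cs = param.substr(8);
            // Drop trailing whitespace, then optional surrounding quotes.
            cs.erase(cs.find_last_not_of(" \t") + 1);
            if (cs.size() >= 2 && cs.front() == '"' && cs.back() == '"')
                cs = cs.substr(1, cs.size() - 2);
            if (cs.empty())
                return std::nullopt;
            return cs;
        }
        pos = end;
    }
    return std::nullopt;
}
```

A caller would then have to decide what to do with the std::nullopt case, which is exactly the open question in this report.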

Maksim, shouldn't it be UTF-8 by default if not specified?..

Comment 8 Maksim Orlovich 2007-09-18 16:11:26 UTC
Seems like it. Except I can't even figure out how the heck it sets it for xml by default. Encoding detection is icky.
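
As a rough illustration of one common detection heuristic (an assumption for discussion, not what KHTML's Decoder does): when no charset is declared, a body that is well-formed UTF-8 and contains non-ASCII bytes is very unlikely to be a legacy 8-bit encoding. The sketch below, with the hypothetical name looksLikeUtf8, only checks well-formedness using the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Pure ASCII also passes, so this is a hint, not proof of the encoding.
bool looksLikeUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;      // continuation bytes expected after `lead`
        unsigned long cp = 0;       // code point being decoded
        unsigned long minCp = 0;    // smallest code point allowed for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;              // stray continuation or invalid lead byte

        if (extra > n - i - 1)
            return false;               // sequence is cut off at the end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;           // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

The word "тест" encoded as UTF-8 passes this check, while the same word encoded as windows-1251 does not, because its bytes do not form valid multi-byte sequences.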
Comment 9 Igor 2007-09-18 16:57:53 UTC
So is it a bug in konqueror or in the server?
Comment 10 Anton 2007-09-18 17:05:37 UTC
I was involved in fixing another ajax encoding bug and know what you mean. But I guess it's time to fix the FIXME lines :)

kdelibs-3.5.7/khtml/ecma/xmlhttprequest.cpp:

    decoder = new Decoder;
    if (!encoding.isNull())
      decoder->setEncoding(encoding.latin1(), Decoder::EncodingFromHTTPHeader);
    else {
      // FIXME: Inherit the default encoding from the parent document?
    }

Not 100% sure if it's the right place. My konqueror's encoding setting is "system's default" and the system uses UTF-8.

Igor: I'm not sure exactly. I think both, because the server should tell your browser what the encoding is, and the browser should fall back to the default from settings->fonts->default encoding.
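
To make the fallback order suggested here concrete, a small standard-library sketch follows. Everything in it is hypothetical: chooseEncoding and its parameters are not KHTML names, the final "UTF-8" default is an assumption taken from the question raised earlier in this report, and the rules a browser should really follow may differ.

```cpp
#include <string>

// Hypothetical sketch of the fallback order discussed in this report.
// An empty string means "not specified" for each input.
std::string chooseEncoding(const std::string& headerCharset,
                           const std::string& parentDocumentEncoding,
                           const std::string& userDefaultEncoding)
{
    // 1. An explicit charset from the HTTP Content-Type header wins.
    if (!headerCharset.empty())
        return headerCharset;
    // 2. Otherwise inherit from the document that issued the request
    //    (the idea in the FIXME comment).
    if (!parentDocumentEncoding.empty())
        return parentDocumentEncoding;
    // 3. Otherwise use the user's configured default encoding.
    if (!userDefaultEncoding.empty())
        return userDefaultEncoding;
    // 4. Assumed last resort: UTF-8.
    return "UTF-8";
}
```

With the Zooomr response, the first argument would be empty, so the result would depend entirely on which of the remaining fallbacks the browser chooses to implement.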
Comment 11 Maksim Orlovich 2007-09-18 17:29:50 UTC
I am not sure the comment is correct, since the proposed XMLHttpRequest spec gives a whole honking algorithm for determining things, that's dependent on content-type and everything. Lots of stuff in that module needs cleanup badly.. :(
Comment 12 Anton 2007-09-18 18:51:48 UTC
Hm... since my original issue has been fixed, I'm marking this as "Resolved".

The issue with Zooomr is a duplicate of bug http://bugs.kde.org/show_bug.cgi?id=130234.
You are right about the major cleanup, and I'll leave it with you. Thanks for the help ;-)

PS, FYI: I reported a similar encoding bug about 2 years ago in http://bugs.kde.org/show_bug.cgi?id=110768, and that's why we have the extra detection code (mess) over there.