Bug 157108 - Accented characters are not passed correctly to the search engine
Summary: Accented characters are not passed correctly to the search engine
Status: RESOLVED FIXED
Alias: None
Product: konqueror
Classification: Applications
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konqueror Developers
URL:
Keywords:
Duplicates: 157267
Depends on:
Blocks:
 
Reported: 2008-02-01 23:19 UTC by András Manţia
Modified: 2008-02-11 22:13 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Description András Manţia 2008-02-01 23:19:56 UTC
Version:            (using Devel)
Installed from:    Compiled sources

Set Google as the default search engine, type a text like "néma" into the location bar. Instead of searching for "néma", the search will be for "n   ma".
The correct translated URL would be (taken from KDE3.5):
http://www.google.com/search?client=safari&rls=en-us&q=n%C3%A9ma&ie=UTF-8&oe=UTF-8

The incorrect URL generated by KDE4 is:
http://www.google.com/search?q=n%EF%BF%BDma&ie=UTF-8&oe=UTF-8
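The two URLs above differ only in how é is encoded. Below is a minimal sketch in standard C++ (no Qt; the names hexDigit and percentEncode are hypothetical, not KDE or Qt API) that percent-encodes a byte string assumed to already be UTF-8. It illustrates that %C3%A9 is simply é written as its UTF-8 bytes, whereas %EF%BF%BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character, so the é had already been lost before the URL was built.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a value 0..15 to an uppercase hex digit.
static char hexDigit(unsigned v) {
    assert(v < 16);  // precondition: a single nibble
    return "0123456789ABCDEF"[v];
}

// Percent-encodes a byte string that is assumed to be UTF-8 already.
// Only RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) are left as-is;
// every other byte, including each byte of a multi-byte UTF-8 sequence,
// becomes %XX.
std::string percentEncode(const std::string& utf8) {
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                                c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hexDigit(c >> 4);    // high nibble
            out += hexDigit(c & 0x0F);  // low nibble
        }
    }
    return out;
}
```

For example, percentEncode("n\xC3\xA9ma") yields "n%C3%A9ma" (the KDE 3.5 query), and percentEncode("n\xEF\xBF\xBDma") yields "n%EF%BF%BDma" (the KDE4 query).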
Comment 1 FiNeX 2008-02-02 12:12:41 UTC
What version/revision did you use?
Comment 2 András Manţia 2008-02-02 12:58:28 UTC
Yesterday's trunk, r769715 for Konqueror.
Comment 3 András Manţia 2008-02-09 20:33:47 UTC
*** Bug 157267 has been marked as a duplicate of this bug. ***
Comment 4 András Manţia 2008-02-09 22:00:15 UTC
Fixed in trunk, backport will follow.
Comment 5 András Manţia 2008-02-10 00:18:15 UTC
SVN commit 773034 by amantia:

Revert r772964, because it breaks the unit test and I couldn't figure out why.
Unfortunately, I need to reopen the bug.
CCBUG: 157108

 M  +1 -1      kuriikwsfiltereng.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=773034
Comment 6 András Manţia 2008-02-10 00:25:18 UTC
Just a note about why I reopened:
- If the text is entered in Konqueror, KRunner, etc., the "query" in line 423 of kuriikwsfiltereng.cpp is encoded in UTF-16, so, for example, é appears as \351.
Passing this string to QUrl::fromPercentEncoding as Latin-1 generates the wrongly encoded URI; passing it as UTF-8 makes it work.
- If kurifiltertest.cpp is used, it tests the filtering with the string "gg:é". In this case, however, the "query" at the same line is \303 \251, which is the UTF-8 encoding of é. QUrl::fromPercentEncoding generates a wrongly encoded string if this is passed as UTF-8, but the correct one if passed as Latin-1 (basically, this string does not need to be changed).

Maybe it is just too late, but I cannot see why the "query" differs when it comes from the line edit and when it is passed from code.
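The first case above can be reproduced without Qt. This sketch models a QString as a std::u32string of code points and uses hypothetical stand-ins named after QString::toLatin1() and QString::toUtf8(), plus a simplified UTF-8 decoder that substitutes U+FFFD for malformed input. That decoder is an assumption about roughly what QUrl::fromPercentEncoding does after percent-decoding (the Qt docs say it "converted from UTF-8 to unicode"), not Qt's actual code. Converting é with toLatin1() yields the single byte 0xE9, which is not valid UTF-8 on its own and therefore decodes to U+FFFD; converting with toUtf8() yields 0xC3 0xA9, which round-trips.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for QString::toLatin1(): code points below U+0100 become one byte
// each; anything else becomes '?'.
std::string toLatin1(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s)
        out += static_cast<char>(cp < 0x100 ? cp : U'?');
    return out;
}

// Stand-in for QString::toUtf8(): encodes each code point as 1 to 4 UTF-8 bytes.
std::string toUtf8(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s) {
        assert(cp <= 0x10FFFF);  // precondition: a valid Unicode code point
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Simplified UTF-8 decoder: any byte that does not start a complete,
// well-formed sequence becomes U+FFFD (overlong forms and surrogates
// are not rejected in this sketch).
std::u32string fromUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        bool ok = len != 0 && i + len <= bytes.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            ok = (c & 0xC0) == 0x80;  // continuation bytes look like 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) {
            out += cp;
            i += len;
        } else {
            out += U'\uFFFD';  // replacement character; resync after one byte
            ++i;
        }
    }
    return out;
}
```

Feeding the U+FFFD result back through toUtf8() and percent-encoding it gives exactly the n%EF%BF%BDma seen in the bug report.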
Comment 7 András Manţia 2008-02-10 00:26:57 UTC
Thiago, if you understand this, please help us. :)
Comment 8 Thiago Macieira 2008-02-10 08:58:43 UTC
The test definition is correct:
   gg:é -> http://www.google.com/search?q=%C3%A9&ie=UTF-8&oe=UTF-8

But the implementation is obviously wrong (I debugged Konqueror all the way to it; any call to toLatin1() or fromLatin1() on user input is wrong by definition).

Your fix is correct, which means there's something wrong with the test somewhere.
Comment 9 András Manţia 2008-02-10 09:09:04 UTC
Thanks Thiago, now I just have to understand why the test is wrong. :)

While debugging this, I found the following in kurl.h:
  /**
   * Decode %-style encoding and convert from local encoding to unicode.
   * Reverse of encode_string()
   * @param str String to decode (can be QString()).
   *
   * @deprecated use QUrl::fromPercentEncoding(encodedURL) instead, but
   * note that it takes a QByteArray and not a QString. Which makes sense since
   * everything is 7 bit (ascii) when being percent-encoded.
   *
   */
  static KDE_DEPRECATED QString decode_string(const QString &str) {
      return QUrl::fromPercentEncoding( str.toLatin1() );
  }


And I'm still not sure whether this is correct, because the Qt docs say:
 
QString QUrl::fromPercentEncoding ( const QByteArray & input ) [static]
Returns a decoded copy of input. input is first decoded from percent encoding, then converted from UTF-8 to unicode.

As I read it, fromPercentEncoding decodes %-encoding and converts from UTF-8 to Unicode, while the KUrl::decode_string documentation says it deals with 7-bit characters.
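Going by the Qt documentation quoted above, fromPercentEncoding works in two stages: percent-decoding to raw bytes, then interpreting those bytes as UTF-8. A minimal sketch of the first stage in standard C++ (hexValue and percentDecode are hypothetical names, not the Qt implementation) makes the 7-bit remark in kurl.h concrete: a fully percent-encoded string is pure ASCII, so the toLatin1() call in decode_string is lossless for it, and non-ASCII characters only reach the decoder when the input was not percent-encoded in the first place.

```cpp
#include <cstddef>
#include <string>

// Value of a hexadecimal digit, or -1 if c is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Replaces each well-formed "%XX" escape with the byte 0xXX and copies every
// other byte unchanged. The result is raw bytes; a separate UTF-8 decoding
// step would follow to obtain text.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];  // ordinary byte or malformed escape: copy as-is
    }
    return out;
}
```

For instance, percentDecode("n%C3%A9ma") returns the bytes n, 0xC3, 0xA9, m, a, which only become "néma" once they are read as UTF-8.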
Comment 10 Thiago Macieira 2008-02-10 09:17:44 UTC
The answer is very simple: the autotest is broken.
Comment 11 András Manţia 2008-02-10 19:17:19 UTC
SVN commit 773275 by amantia:

Commit the fix again.

BUG: 157108

 M  +1 -1      kuriikwsfiltereng.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=773275
Comment 12 Thiago Macieira 2008-02-10 19:54:26 UTC
SVN commit 773299 by thiago:

Fix test: QString::sprintf is the wrong encoding.

CCBUG:157108


 M  +2 -2      kurifiltertest.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=773299
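The second case in comment 6, where the autotest's query held \303 \251, can be sketched the same way. Assuming that the test's QString::sprintf call ended up treating the UTF-8 bytes of "gg:é" as Latin-1 (the commit message only says sprintf "is the wrong encoding"), the string contains two characters, U+00C3 and U+00A9, instead of é. The hypothetical stand-ins below (again using std::u32string in place of QString) show why that would mask the bug: toLatin1() happens to turn those two characters back into the UTF-8 bytes of é, while the corrected toUtf8() call encodes them a second time.

```cpp
#include <cassert>
#include <string>

// Stand-in for QString::fromLatin1(): every byte becomes one code point.
std::u32string fromLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes)
        out += static_cast<char32_t>(b);
    return out;
}

// Stand-in for QString::toLatin1(): code points below U+0100 become one byte
// each; anything else becomes '?'.
std::string toLatin1(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s)
        out += static_cast<char>(cp < 0x100 ? cp : U'?');
    return out;
}

// Stand-in for QString::toUtf8(), limited to code points below U+0800,
// which is enough for Latin-1 input.
std::string toUtf8(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s) {
        assert(cp < 0x800);  // this sketch only emits 1- and 2-byte sequences
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Under that assumption, toUtf8(fromLatin1("\xC3\xA9")) produces the bytes C3 83 C2 A9, which would percent-encode as %C3%83%C2%A9 ("Ã©") rather than the expected %C3%A9. That would explain why the correct fix in kuriikwsfiltereng.cpp failed the unit test until kurifiltertest.cpp was corrected.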
Comment 13 András Manţia 2008-02-11 22:13:13 UTC
SVN commit 773823 by amantia:

Backport fix for "Accented characters are not passed correctly to the search engine".

CCBUG: 157108

 M  +1 -1      kuriikwsfiltereng.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=773823