Bug 135852 - KURL::decode_string problem with encoded UTF-8 sequences
Status: RESOLVED INTENTIONAL
Alias: None
Product: kdelibs
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stephan Kulow
Reported: 2006-10-18 01:34 UTC by Kevin Krammer
Modified: 2006-10-24 14:36 UTC

Attachments
Testcase (798 bytes, text/plain)
2006-10-22 18:19 UTC, Kevin Krammer

Description Kevin Krammer 2006-10-18 01:34:37 UTC
Version:            (using KDE Devel)
Installed from:    Compiled sources
OS:                Linux

There is a bug in KURL's decoding algorithm in the 3.5 branch. It does not correctly handle escaped UTF-8 sequences.

It can be triggered like this:

$ kfmclient exec "mailto:test@foo?subject=%C3%9Cberraschung"

The subject should start with an uppercase German umlaut character (Ü); if I am not mistaken, the HTML entity would be &Uuml;

I tracked it down to KApplication::invokeMailer, where KURL::decode_string is called on the content of the subject "query"

QUrl::decode works correctly (Qt3.3.6 from Debian/unstable packages)
Comment 1 Kevin Krammer 2006-10-18 01:59:46 UTC
Most of the encoding/decoding code seems to have been contributed by Waldo, CC'ing in case he has an idea where it might happen
Comment 2 Kevin Krammer 2006-10-22 18:19:46 UTC
Created attachment 18226
Testcase

Be sure to save as UTF-8 encoded to make fromUtf8() return the correct string.
Comment 3 Kevin Krammer 2006-10-22 18:26:27 UTC
Seems to be a matter of the encoding hint.

The input of my original test case comes from xdg-email, which can encode its input to UTF-8 before turning it into a mailto URI (option --utf8)

Comment 4 Stephan Kulow 2006-10-23 11:52:28 UTC
David might have more info about this problem
Comment 5 David Faure 2006-10-23 12:30:02 UTC
kfmclient exec "mailto:test@foo?subject=%C3%9Cberraschung" works fine for me on a utf8 system (LANG="en_US.UTF-8"). And the testcase works too, but this only shows that UTF8 systems are no good for debugging such problems, since local8bit/utf8 confusions don't trigger problems there.

Anyway, let's see what would happen on a non-utf8 system. The decode_string in KApplication::invokeMailer doesn't know how the url was escaped [we would need to pass the mib enum to invokeMailer, but we wouldn't know how to set it when the url comes from the command line anyway], and local8bit is assumed instead. And then the other invokeMailer line 2389 uses encode_string again, assuming local8bit again (which matches the way kmail will receive the arguments, too).
I think you're simply supposed to give invokeMailer a url that was created using the locale, not using utf8. I don't see a fix for this, there's no way to know if a url was encoded using utf8 or the locale.

For kde4 the use of encodings in kurl/qurl has been removed, it's all utf8 there, so the problem is actually gone [or rather, reversed - urls encoded with the locale won't work anymore, but the goal is to phase those out].
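
To illustrate the mismatch described above, here is a minimal Python sketch. It is not KURL code: urllib.parse.unquote stands in for decode_string, and ISO-8859-1 is only an assumed example of a non-UTF-8 locale encoding.

```python
from urllib.parse import unquote

escaped = "%C3%9Cberraschung"

# Decoded as UTF-8, the two bytes C3 9C form the single character U+00DC.
as_utf8 = unquote(escaped, encoding="utf-8")

# Decoded as ISO-8859-1 (standing in for a non-UTF-8 locale),
# each of the two bytes becomes a character of its own.
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
print(len(as_utf8), len(as_latin1))
```

The same escaped bytes give a string that is one character longer under the Latin-1 assumption, and nothing in the URL itself says which reading was intended.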
Comment 6 Kevin Krammer 2006-10-23 12:46:09 UTC
Thank you for the explanation

I just got confused because QUrl::decode (Qt3) worked, but I guess it just assumes UTF-8 source encoding.

I'll check how gnome-open/exo-open handle local8bit vs. utf8 in encoded URLs.
Comment 7 Kevin Krammer 2006-10-23 13:44:45 UTC
Ok, that gnome-open, exo-open statement was stupid, obviously they are also just passing on the URI to the mail application.

Anyway:
Thunderbird handles the UTF-8 encoded URI but fails at the local8bit one

Evolution handles both correctly

KMail just parses the recipient and "loses" everything else. I am going to check if there is a bug report about this (tried with KMail already running and with KMail not running, in case the UniqueApp communication causes the problem)

The question is, how does Evolution manage to get both variants working?
Comment 8 David Faure 2006-10-23 15:12:45 UTC
On Monday, October 23, 2006 01:44:47 PM Kevin Krammer wrote:
> The question is, how does Evolution manage to get both variants working?

Content-based heuristics to determine the codec, I can't think of any other way.
Very fragile. Ask e.g. Thiago ;)
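
A rough Python sketch of what such a content-based heuristic could look like. This is a guess at the general technique, not Evolution's actual code; guess_decode and the ISO-8859-1 fallback are hypothetical choices made for the example.

```python
from urllib.parse import unquote_to_bytes


def guess_decode(escaped: str, fallback: str = "iso-8859-1") -> str:
    """Try strict UTF-8 first, fall back to an assumed locale encoding."""
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# UTF-8 escaped input: C3 9C is a valid UTF-8 sequence.
utf8_subject = guess_decode("%C3%9Cberraschung")

# Latin-1 escaped input: a lone DC byte is not valid UTF-8, so the fallback applies.
latin1_subject = guess_decode("%DCberraschung")

# The fragile part: Latin-1 text whose bytes happen to form valid UTF-8
# is silently read as UTF-8.
ambiguous = guess_decode("%C3%9C")
```

Both variants decode to the same subject here, but the last call shows why this stays a guess: the bytes C3 9C are also perfectly valid Latin-1 text.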
Comment 9 Kevin Krammer 2006-10-23 15:30:49 UTC
ok, "guessing" :)

How about adding an encoding hint to the query

mailto:test@foo?subject=%C3%9Cberraschung&encoding=utf8
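
A minimal Python sketch of how a receiver might honor such a hint. The "encoding" query parameter is only the proposal made in this comment, not an existing standard; mailto_fields and the ISO-8859-1 default are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def mailto_fields(uri: str, default_encoding: str = "iso-8859-1") -> dict:
    """Decode mailto query fields using an optional 'encoding' hint."""
    query = urlsplit(uri).query
    # First pass: the hint value is plain ASCII, so any ASCII-compatible
    # codec is good enough to find it.
    hint = dict(parse_qsl(query, encoding="iso-8859-1")).get(
        "encoding", default_encoding
    )
    # Second pass: decode all fields with the hinted codec.
    fields = dict(parse_qsl(query, encoding=hint))
    fields.pop("encoding", None)
    return fields


hinted = mailto_fields("mailto:test@foo?subject=%C3%9Cberraschung&encoding=utf8")
unhinted = mailto_fields("mailto:test@foo?subject=%DCberraschung")
```

With the hint the UTF-8 escaped subject decodes correctly, and without it the assumed default applies, which mirrors the local8bit assumption in KDE 3.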

Comment 10 David Faure 2006-10-23 15:44:43 UTC
On Monday, October 23, 2006 03:30:50 PM Kevin Krammer wrote:
> mailto:test@foo?subject=%C3%9Cberraschung&encoding=utf8

And webpages are going to follow kde's new standard? :-)

I thought it was either
- apps call invokeMailer and can then build the url correctly (i.e. using KURL, which means locale encoding in kde3)
- webpages use mailto urls and we can't know the encoding.
Now I see, I guess you also want to support scripts and stuff... But I'm very much against
adding an option in kde3 that we know already we won't be able to support in kde4...
Comment 11 Kevin Krammer 2006-10-23 19:58:46 UTC
Yes, the idea was to add this option in xdg-email when its caller uses the --utf8 option of that script.

I guess the usual case will be calls with local encoding, but I imagine that ISV applications might want to "play safe" and encode to UTF-8 before calling xdg-email, and then KDE will fail.

Unfortunately the other mailers likely used by GNOME or XFCE will especially handle UTF-8 correctly (see last comment) and only KDE will appear broken :(

Hmm, how about specifying the encoding as an additional argument to kfmclient

kfmclient exec mailto:utf8encodeduri "uri/mailto;utf8"

Comment 12 David Faure 2006-10-23 20:03:32 UTC
On Monday, October 23, 2006 07:58:47 PM Kevin Krammer wrote:
> kfmclient exec mailto:utf8encodeduri "uri/mailto;utf8"

Well or kfmclient --utf8 exec mailto:foo
but you'd have to check if you can implement it without touching kdecore (preferably).
Which probably means decoding and reencoding the url, if locale!=8bit.
Comment 13 Kevin Krammer 2006-10-23 20:55:31 UTC
I would have preferred an additional parameter to keep calling compatibility with older kfmclient versions, but kfmclient
a) does a trader query for the given MIME type
and
b) doesn't understand the ";charset=encoding" part

kmailservice unfortunately also fails when given a second argument :(

Independent from the transportation of the encoding hint, I see two options on how to implement it:

- add a KApplication::invokeMailer method that also takes an encoding hint
- copy the code from KApplication::invokeMailer(const KURL&) to kmailservice and apply the encoding hint there.

I guess the second option is preferable.

Recoding is IMHO out of the question since the UTF8 encoded text might not be encodable in local8Bit
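
A small Python sketch of that recoding problem. ISO-8859-1 is an assumed stand-in for local8Bit, and the sample subject is made up for the example.

```python
# "Überraschung" followed by three CJK characters, all representable in UTF-8.
subject = "\u00dcberraschung \u65e5\u672c\u8a9e"
utf8_bytes = subject.encode("utf-8")

# Strict recoding to the assumed locale encoding fails outright.
try:
    subject.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# Lossy recoding "works" but replaces every unrepresentable character with "?".
lossy = subject.encode("iso-8859-1", errors="replace")
print(representable, lossy)
```

The umlaut survives the round trip, but the remaining characters have no Latin-1 representation and are lost.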
Comment 14 Kevin Krammer 2006-10-23 22:09:29 UTC
Hmm, another idea:

Assuming that a KDE application can have a different locale setting than KDE itself if its environment is different:
if we set the environment to a UTF8 locale before calling kmailservice, KURL::decode_string would decode to utf8, right?

Comment 15 Kevin Krammer 2006-10-23 22:54:31 UTC
$ LC_ALL=C.UTF-8 kmailservice mailto:test@foo?subject=%C3%9Cberraschung

seems to work :)

I am closing this as WONTFIX

Reference of related Portland bug if interested:
https://bugs.freedesktop.org/show_bug.cgi?id=8740
Comment 16 David Faure 2006-10-23 23:12:45 UTC
> $ LC_ALL=C.UTF-8 kmailservice mailto:test@foo?subject=%C3%9Cberraschung

Excellent solution, I'm impressed ;)

"UTF8 encoded text might not be encodable in local8Bit" still applies though. kmailservice
still has to convert it to local8bit before calling kmail, which expects local8bit input. I'm quite
sure that nothing fixes the case where the characters are actually not representable in the
current locale [well, kmail would have to take a url as input instead of -subject <subject>].
Comment 17 Kevin Krammer 2006-10-23 23:26:24 UTC
> Excellent solution, I'm impressed ;)


Thanks :)

> "UTF8 encoded text might not be encodable in local8Bit" still applies
> though. kmailservice still has to convert it to local8bit before calling
> kmail, which expects local8bit input. I'm quite sure that nothing fixes the
> case where the characters are actually not representable in the current
> locale [well, kmail would have to take a url as input instead of -subject
> <subject>].


Well, theoretically KMail can take a URL as a commandline option, however it
cannot handle the query part correctly

See http://bugs.kde.org/show_bug.cgi?id=136183

I am wondering if kmailservice could use some DCOP API of kmail to pass the 
strings in unicode?
Comment 18 David Faure 2006-10-24 10:21:24 UTC
On Tue Oct 24 2006, Kevin Krammer wrote:
> I am wondering if kmailservice could use some DCOP API of kmail to pass the 
> strings in unicode?


Sounds complicated, given that kmail might or might not be running... and given
that kmail might not be the selected mail client, even.
It's a half solution given that kmailservice is a wrapper around invokeMailer so fixing
it in invokeMailer would be better anyway.
But yeah the real problem is that if kmail isn't running, the only solution is "kmail <arguments>"
since "launching kmail first and then using dcop" means getting a kmail mainwindow, which isn't wanted.
Comment 19 Kevin Krammer 2006-10-24 14:36:35 UTC
On Tuesday 24 October 2006 10:21, David Faure wrote:

> Sounds complicated, given that kmail might or might not be running... and
> given that kmail might not be the selected mail client, even.


Only in the case when kmail is the selected client, i.e. when it calls the 
kmail executable now

> It's a half solution given that kmailservice is a wrapper around
> invokeMailer so fixing it in invokeMailer would be better anyway.


You mean always (re-)encoding to UTF-8 and modifying KMail so it understands 
this?

> But yeah the real problem is that if kmail isn't running, the only solution
> is "kmail <arguments>" since "launching kmail first and then using dcop"
> means getting a kmail mainwindow, which isn't wanted.


Good point. I just thought about DCOP because it can transport the QString 
representation, thus evading the recoding problem.