Summary: | KURL::decode_string problem with encoded UTF-8 sequences | ||
---|---|---|---|
Product: | [Frameworks and Libraries] kdelibs | Reporter: | Kevin Krammer <krammer> |
Component: | general | Assignee: | Stephan Kulow <coolo> |
Status: | RESOLVED INTENTIONAL | ||
Severity: | normal | CC: | bastian, faure |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Attachments: | Testcase |
Description
Kevin Krammer
2006-10-18 01:34:37 UTC
Most of the encoding/decoding code seems to have been contributed by Waldo; CC'ing him in case he has an idea where it might happen.

Created attachment 18226
Testcase
Be sure to save as UTF-8 encoded to make fromUtf8() return the correct string.
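
As a rough illustration of the problem named in the summary (this is not the attached testcase, which is not reproduced here), the sketch below percent-decodes the subject used later in this thread twice, once as UTF-8 and once as a single-byte locale encoding. Python's `urllib.parse.unquote` only stands in for `KURL::decode_string`, and ISO-8859-1 is an assumed example of a non-UTF-8 locale.

```python
from urllib.parse import unquote

# Subject value from the mailto URL discussed in this report.
encoded_subject = "%C3%9Cberraschung"

# Decoding the escaped bytes as UTF-8 yields the intended text.
as_utf8 = unquote(encoded_subject, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 (assumed stand-in for local8bit)
# turns the two-byte UTF-8 sequence into two unrelated characters.
as_latin1 = unquote(encoded_subject, encoding="iso-8859-1")

print(as_utf8)          # Überraschung
print(repr(as_latin1))  # 'Ã\x9cberraschung'
```

On a UTF-8 system both interpretations coincide, so the mismatch only shows up when the locale encoding differs from the encoding used to build the URL.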
Seems to be a matter of the encoding hint. The input of my original test case comes from xdg-email, which can encode its input to UTF-8 before turning it into a mailto URI (option --utf8).

David might have more info about this problem.

kfmclient exec "mailto:test@foo?subject=%C3%9Cberraschung" works fine for me on a utf8 system (LANG="en_US.UTF-8"). And the testcase works too, but this only shows that UTF8 systems are no good for debugging such problems, since local8bit/utf8 confusions don't trigger problems there.

Anyway, let's see what would happen on a non-utf8 system. The decode_string in KApplication::invokeMailer doesn't know how the url was escaped [we would need to pass the mib enum to invokeMailer, but we wouldn't know how to set it when the url comes from the command line anyway], and local8bit is assumed instead. And then the other invokeMailer, line 2389, uses encode_string again, assuming local8bit again (which matches the way kmail will receive the arguments, too).

I think you're simply supposed to give invokeMailer a url that was created using the locale, not using utf8. I don't see a fix for this; there's no way to know if a url was encoded using utf8 or the locale. For kde4 the use of encodings in kurl/qurl has been removed, it's all utf8 there, so the problem is actually gone [or rather, reversed - urls encoded with the locale won't work anymore, but the goal is to phase those out].

Thank you for the explanation. I just got confused because QUrl::decode (Qt3) worked, but I guess it just assumes UTF-8 source encoding. I'll check how gnome-open/exo-open handle local8bit vs. utf8 in encoded URLs.

Ok, that gnome-open, exo-open statement was stupid, obviously they are also just passing on the URI to the mail application. Anyway:
- Thunderbird handles the UTF-8 encoded URI but fails at the local8bit one
- Evolution handles both correctly
- KMail just parses the recipient and "loses" everything else.
I am going to check if there is a bug report about this (tried with KMail already running and with KMail not running in case the UniqueApp communication causes the problem).

The question is, how does Evolution manage to get both variants working?

On Monday, October 23, 2006 01:44:47 PM Kevin Krammer wrote:
> The question is, how does Evolution manage to get both variants working?
Content-based heuristics to determine the codec, I can't think of any other way.
Very fragile. Ask e.g. Thiago ;)
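
A minimal sketch of what such a content-based heuristic could look like, assuming the simplest possible rule: accept the bytes as UTF-8 if they form valid UTF-8, otherwise fall back to a single-byte encoding. This is a guess for illustration, not Evolution's actual implementation; `guess_decode` and the ISO-8859-1 fallback are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def guess_decode(encoded, fallback="iso-8859-1"):
    """Percent-decode `encoded` and guess the codec from the byte content.

    Returns a (text, codec) tuple. The fallback codec is an assumption
    standing in for the user's locale encoding.
    """
    raw = unquote_to_bytes(encoded)
    try:
        # Valid UTF-8 is taken at face value.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Anything else is treated as the single-byte fallback encoding.
        return raw.decode(fallback), fallback


print(guess_decode("%C3%9Cberraschung"))  # UTF-8 encoded input
print(guess_decode("%DCberraschung"))     # ISO-8859-1 encoded input
```

The fragility is visible in the first branch: locale-encoded text whose bytes happen to form valid UTF-8 is silently decoded as UTF-8, and nothing in the input can tell the two cases apart.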
ok, "guessing" :)

How about adding an encoding hint to the query:
mailto:test@foo?subject=%C3%9Cberraschung&encoding=utf8

On Monday, October 23, 2006 03:30:50 PM Kevin Krammer wrote:
> mailto:test@foo?subject=%C3%9Cberraschung&encoding=utf8
And webpages are going to follow kde's new standard? :-)
I thought it was either
- apps call invokeMailer and can then build the url correctly (i.e. using KURL, which means locale encoding in kde3)
- webpages use mailto urls and we can't know the encoding.
Now I see, I guess you also want to support scripts and stuff... But I'm very much against
adding an option in kde3 that we know already we won't be able to support in kde4...
Yes, the idea was to add this option in xdg-email when its caller uses the --utf8 option of that script. I guess the usual case will be calls with local encoding, but I imagine that ISV applications might want to "play safe" and encode to UTF-8 before calling xdg-email, and then KDE will fail. Unfortunately the other mailers likely used by GNOME or XFCE handle UTF-8 in particular correctly (see last comment) and only KDE will appear broken :(

Hmm, how about specifying the encoding as an additional argument to kfmclient:
kfmclient exec mailto:utf8encodeduri "uri/mailto;utf8"

On Monday, October 23, 2006 07:58:47 PM Kevin Krammer wrote:
> kfmclient exec mailto:utf8encodeduri "uri/mailto;utf8"
Well or kfmclient --utf8 exec mailto:foo
but you'd have to check if you can implement it without touching kdecore (preferably).
Which probably means decoding and reencoding the url, if locale!=8bit.
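
The decode-and-re-encode step mentioned above can be sketched as follows for a single URL component. This is only an illustration using Python's `urllib.parse`, not kfmclient code; `recode_percent_encoded` is a hypothetical helper and ISO-8859-1 is an assumed locale encoding.

```python
from urllib.parse import quote, unquote


def recode_percent_encoded(encoded, source="utf-8", target="iso-8859-1"):
    """Re-escape one percent-encoded URL component in another encoding.

    Decodes the escapes as `source`, then escapes the text again using
    `target`. Both steps are strict, so invalid input or text that the
    target encoding cannot represent raises an exception.
    """
    text = unquote(encoded, encoding=source, errors="strict")
    return quote(text, safe="", encoding=target, errors="strict")


print(recode_percent_encoded("%C3%9Cberraschung"))  # %DCberraschung
```

The helper is meant for one component such as the subject value; applied to a whole mailto URL with `safe=""` it would also escape delimiters like `@`, `?` and `=`.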
I would have preferred an additional parameter for keeping calling compatibility with older kfmclient versions, but kfmclient a) does a trader query for the given MIME type and b) doesn't understand the ";charset=encoding" part. kmailservice unfortunately also fails when given a second argument :(

Independent of how the encoding hint is transported, I see two options for implementing it:
- add a KApplication::invokeMailer method that also takes an encoding hint
- copy the code from KApplication::invokeMailer(const KURL&) to kmailservice and apply the encoding hint there.
I guess the second option is preferable. Recoding is IMHO out of the question since the UTF8 encoded text might not be encodable in local8Bit.

Hmm, another idea: Assuming that a KDE application can have a different locale setting than KDE itself if its environment is different: if we set the environment to a UTF8 locale before calling kmailservice, KURL::decode_string would decode to utf8, right?

$ LC_ALL=C.UTF-8 kmailservice mailto:test@foo?subject=%C3%9Cberraschung
seems to work :)

I am closing this as WONTFIX.

Reference of related Portland bug if interested: https://bugs.freedesktop.org/show_bug.cgi?id=8740

> $ LC_ALL=C.UTF-8 kmailservice mailto:test@foo?subject=%C3%9Cberraschung
Excellent solution, I'm impressed ;)
"UTF8 encoded text might not be encodable in local8Bit" still applies though. kmailservice
still has to convert it to local8bit before calling kmail, which expects local8bit input. I'm quite
sure that nothing fixes the case where the characters are actually not representable in the
current locale [well, kmail would have to take a url as input instead of -subject <subject>].
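
The remaining limitation can be checked mechanically: some text simply has no representation in a given 8-bit encoding. A small sketch, assuming ISO-8859-1 as the locale encoding; `representable_in` is a hypothetical helper and the Japanese sample string is only an example.

```python
def representable_in(text, codec):
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# The subject from this report fits into ISO-8859-1 ...
print(representable_in("Überraschung", "iso-8859-1"))  # True
# ... but text outside that repertoire does not, whatever the transport.
print(representable_in("日本語", "iso-8859-1"))          # False
```

Whenever such a check fails, no amount of recoding helps as long as the receiving program only accepts locale-encoded arguments.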
> Excellent solution, I'm impressed ;)
Thanks :)
> "UTF8 encoded text might not be encodable in local8Bit" still applies
> though. kmailservice still has to convert it to local8bit before calling
> kmail, which expects local8bit input. I'm quite sure that nothing fixes the
> case where the characters are actually not representable in the current
> locale [well, kmail would have to take a url as input instead of -subject
> <subject>].
Well, theoretically KMail can take a URL as a command-line option; however, it cannot handle the query part correctly.
See http://bugs.kde.org/show_bug.cgi?id=136183

I am wondering if kmailservice could use some DCOP API of kmail to pass the strings in unicode?

On Tue Oct 24 2006, Kevin Krammer wrote:
> I am wondering if kmailservice could use some DCOP API of kmail to pass the
> strings in unicode?
Sounds complicated, given that kmail might or might not be running... and given
that kmail might not be the selected mail client, even.
It's a half solution given that kmailservice is a wrapper around invokeMailer so fixing
it in invokeMailer would be better anyway.
But yeah the real problem is that if kmail isn't running, the only solution is "kmail <arguments>"
since "launching kmail first and then using dcop" means getting a kmail mainwindow, which isn't wanted.
On Tuesday 24 October 2006 10:21, David Faure wrote:
> Sounds complicated, given that kmail might or might not be running... and
> given that kmail might not be the selected mail client, even.
Only in the case when kmail is the selected client, i.e. when it calls the kmail executable now.
> It's a half solution given that kmailservice is a wrapper around
> invokeMailer so fixing it in invokeMailer would be better anyway.
You mean always (re-)encoding to UTF-8 and modifying KMail so it understands this?
> But yeah the real problem is that if kmail isn't running, the only solution
> is "kmail <arguments>" since "launching kmail first and then using dcop"
> means getting a kmail mainwindow, which isn't wanted.
Good point. I just thought about DCOP because it can transport the QString representation, thus evading the recoding problem.