Bug 98788 - Possible solution to IDN domain spoofing/phishing
Alias: None
Product: kdelibs
Classification: Unclassified
Component: general
Version: 3.3.2
Platform: Gentoo Packages Linux
Importance: NOR normal with 86 votes
Target Milestone: ---
Assignee: security
Duplicates: 99358 100081
Depends on:
Reported: 2005-02-07 17:03 UTC by David Vogt
Modified: 2016-04-09 12:06 UTC (History)

Easiest fix possible (1.03 KB, patch)
2005-02-19 01:41 UTC, Thiago Macieira
kresolver_idn_patch.patch (2.42 KB, patch)
2005-03-03 13:03 UTC, Waldo Bastian

Description David Vogt 2005-02-07 17:03:22 UTC
Version:           3.3.2 (using KDE KDE 3.3.2)
Installed from:    Gentoo Packages
Compiler:          doesn't matter 
OS:                Linux

Over the last few days, some news sites have run articles about possible phishing/spoofing of domain names using the IDN naming scheme. My idea against this is to use some sort of filter that would check for a possible phishing "attack" in the following way:
 o See if there are mixed character sets - for example Latin (ASCII) AND Cyrillic, etc. My reasoning is that if somebody has a domain name in ASCII characters, he/she won't use Cyrillic characters in it, since the name is most probably intended for people who read and write ASCII. A single foreign character is very suspicious IMHO (https://www.pаypal.com/ for example). Normal users can't decide if this URL is a fake - but for a machine, it's pretty easy. Such filters can be set up for (almost) every character set.
 o Possibly check that against a global black/whitelist
 o Show the user a message box telling him/her about it, with a "never ask me about this domain again" button - similar to the cookie warning in Konqueror.
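
A minimal sketch of the mixed-character-set check from the first bullet, in plain C++ without Qt. The names Script, scriptOf and isMixedScript are hypothetical, and the code point ranges are a rough approximation of Unicode blocks rather than the real Unicode Script property, so treat this as an illustration of the idea only.

```cpp
#include <set>
#include <string>

// Coarse script buckets. The ranges are a simplification chosen for this
// sketch, not the full Unicode Script property.
enum class Script { Latin, Greek, Cyrillic, Other };

// Hypothetical helper: classify a code point by its Unicode block.
Script scriptOf(char32_t c) {
    if (c <= 0x024F) return Script::Latin;      // ASCII through Latin Extended-B
    if (c >= 0x0370 && c <= 0x03FF) return Script::Greek;
    if (c >= 0x0400 && c <= 0x04FF) return Script::Cyrillic;
    return Script::Other;
}

// Hypothetical helper: true if one label (the text between two dots) uses
// letters from more than one bucket. Digits and '-' are ignored.
bool isMixedScript(const std::u32string& label) {
    std::set<Script> seen;
    for (char32_t c : label) {
        if (c == U'-' || (c >= U'0' && c <= U'9')) continue;
        seen.insert(scriptOf(c));
    }
    return seen.size() > 1;
}
```

With these assumptions, the label "pаypal" (with a Cyrillic а) is reported as mixed while "paypal" and "möller" are not. A check this simple would also report a deliberately mixed name such as "πinthesky", so it can only ever justify a warning, not a block.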
Comment 1 Jim Higson 2005-02-07 18:02:11 UTC
For anyone who comes across this who doesn't follow slashdot, you can see an example of this attack here:


I do not know if mixed sets are in themselves suspicious. For example, I might want a maths site called πinthesky.com, using Greek and ASCII, which is quite valid. Also, since Roman chars are easier to type on a lot of keyboards and not everyone realises that chars that look the same are not necessarily treated the same, it is quite natural for people typing in non-ASCII char sets to use the odd bit of ASCII. I think a lot of legitimate companies will cover all bases by registering as many look-alikes as they can, and we shouldn't introduce checking that makes the user uncomfortable when visiting perfectly legitimate sites. Allowing the user to turn the warnings off is a step in the right direction, but it will be difficult to come up with an explanation of IDN attacks that a layperson would understand well enough to be confident ticking that box.

If warnings are given, I think we need to be more selective. Does a list of homographs exist (or can one be made)? If so, we could flag only those URLs for which:

* mixed sets are used
* the domain is such that swapping some combination of char for homographs can produce a string with chars from just one charset.

This would trip up https://www.pаypal.com/ because the Cyrillic а is a homograph of the ASCII a, so switching it yields a string with just ASCII chars.

This kind of search shouldn't take long to do. Although it is exponential in string length, the strings are likely to be very small.
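
A minimal sketch of that two-part test, restricted to the simplest case where the target charset is ASCII. That restriction makes the search linear instead of exponential. The look-alike table is a tiny hand-picked sample, and the names lookalikes and looksLikeAsciiSpoof are hypothetical.

```cpp
#include <map>
#include <string>

// Tiny, hand-picked table of Cyrillic letters that resemble ASCII letters.
// Illustrative only; a usable homograph list would be far larger.
const std::map<char32_t, char32_t>& lookalikes() {
    static const std::map<char32_t, char32_t> table = {
        {U'\u0430', U'a'}, {U'\u0435', U'e'}, {U'\u043E', U'o'},
        {U'\u0440', U'p'}, {U'\u0441', U'c'}, {U'\u0443', U'y'},
        {U'\u0445', U'x'},
    };
    return table;
}

// Hypothetical helper: true when the label mixes ASCII with non-ASCII
// characters AND every non-ASCII character has an ASCII look-alike.
// On success, `skeleton` receives the pure-ASCII string being imitated.
bool looksLikeAsciiSpoof(const std::u32string& label, std::u32string& skeleton) {
    bool sawAscii = false;
    bool sawOther = false;
    std::u32string out;
    for (char32_t c : label) {
        if (c < 0x80) {
            sawAscii = true;
            out += c;
            continue;
        }
        sawOther = true;
        const auto it = lookalikes().find(c);
        if (it == lookalikes().end()) return false;  // cannot collapse to ASCII
        out += it->second;
    }
    if (!(sawAscii && sawOther)) return false;       // not a mixed label
    skeleton = out;
    return true;
}
```

Under these assumptions "pаypal" is flagged with the skeleton "paypal", while "πinthesky" is not, because π has no ASCII look-alike in the table.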
Comment 2 Thiago Macieira 2005-02-07 18:11:23 UTC
This is a WONTFIX.

The code is correct, we followed every standard.

At most, I'd recommend displaying (IDN) or an icon in the Location bar.
Comment 3 Thiago Macieira 2005-02-07 18:22:56 UTC
Oh, by the way, this is a KDE-wide issue. It is nowhere specific to Konqueror. You can have the same problem in KMail, for instance.

When/if someone comes up with a solution that isn't a WONTFIX, test other programs that display URLs as well.
Comment 4 Waldo Bastian 2005-02-07 18:30:21 UTC
Just because we follow a standard doesn't mean that there isn't a problem.

I think the problem is real, I just don't see an easy solution. We should be looking for one though.

A good suggestion that I read was to keep track of the certificates 
of previously visited secure sites (and perhaps allow the user to give it an 
additional marking) so that when the user visits the site again, he could get 
an indication that it is still the same site.

Nevertheless, when the user gets a warning "you haven't visited this site 
before" he will be alerted but he will still need to have a way to detect 
that what appears to be "www.paypal.com" is actually a different site named "www.pаypal.com". Otherwise he may dismiss it as a bogus 
Comment 5 George Staikos 2005-02-07 19:06:35 UTC
  Furthermore most people turn off that warning as one of the first things 
they do when they start visiting secure sites.

Comment 6 Maksim Orlovich 2005-02-07 19:16:01 UTC
A bogus site might also not use SSL at all.  In fact, that seems likely, since providing a legit certificate that doesn't trigger a browser warning would require the attacker to identify himself. And I don't think we can warn on every random website -- that would be toooo annoying

Comment 7 Allan Sandfeld 2005-02-07 19:30:02 UTC
I think one of the unicode normalization forms solves this problem.
Comment 8 Thiago Macieira 2005-02-07 21:03:50 UTC
IDNA already does Unicode NFKC and a bit more (like micro "µ" -> mu "μ"). And it doesn't solve the problem.

Cyrillic letters are never turned into their Latin look-alikes. Imagine what would happen if Cyrillic Capital Letter Er (Р, U+0420) blended with Greek Capital Letter Rho (Ρ, U+03A1) and with Latin Capital Letter P (P, U+0050).

As I've said on IRC, this is a registrar problem. They should be the ones preventing similar-looking domain names from being registered in the first place.
Comment 9 Jason Keirstead 2005-02-08 13:38:44 UTC
Couldn't KHTML just have an option to make IDN links using a charset different from your own locale visually distinctive?

For example, make the www.pаypal.com link orange or yellow or something.

Then you still have the full IDN support, but at least the user *knows* they are looking at a link with characters in it that are not of their own locale, and it may be a spoof.
Comment 10 Thiago Macieira 2005-02-08 14:10:49 UTC
I don't think it is a good solution.

For the locale problem, I am in UTF-8. All characters are representable in my locale.

Suppose the Cyrillic site www.МОСКВА.com exists. And someone phishes with the mixed Cyrillic and Greek site www.ΜΟСΚΒΑ.com (only С is Cyrillic; all the rest is Greek). How do you tell the difference? There's not even mixed ASCII/non-ASCII there.

Also, I'm not sure you can change colours in every situation. Think of links shown in KMail or Kopete, for instance.
Comment 11 Thiago Macieira 2005-02-08 14:16:05 UTC
The site I gave as an example was a bad one, but you get the idea. I forgot about case folding when I wrote in capitals. When you hover over those links, the lowercase form is shown, so you can readily tell the difference.
Comment 12 Magnus Kessler 2005-02-08 17:57:03 UTC
An interesting proposal was put forward a few days ago by Mozilla's Gervase Markham. He proposes colour-coding of URLs. See http://weblogs.mozillazine.org/gerv/archives/007359.html for details.
Comment 13 Tommi Tervo 2005-02-14 15:20:14 UTC
*** Bug 99358 has been marked as a duplicate of this bug. ***
Comment 14 Thiago Macieira 2005-02-16 03:11:24 UTC
Before someone does it, I'm going to do it myself:

Mozilla decided to turn off IDN support by default. 

KDE has no such setting to turn on or off at will. I could write one, but I don't like the idea.

Frankly, I'd wait and see what Safari and/or Opera people choose to do.
Comment 15 Paul Hilton 2005-02-17 21:28:22 UTC
This is a serious issue related to secure sites, like Paypal.

To say WONTFIX because the code meets all the appropriate standards is not
a good service to users. At the very least allow IDN to be turned off, and
turn it off by default until a better solution is implemented.

Your average user will not be impressed by 'code meets all the appropriate standards' when he/she has fallen victim to a phishing scam and lost a bundle of money.

If the browser displays the padlock, an https URL and reads as the user thinks it should then it is hard to see what else the user can do. Saying that they should type the URL manually is wishful thinking, especially when the link has loads of cryptic stuff on the end of it to direct you to a place on the site, for example the way ebay listings do.

I don't think that this is a 'wait and see what happens' issue. I would imagine that this WILL be quoted by Microsoft as indicating that Open Source software is insecure, and your average Joe Public will agree.
Comment 16 Waldo Bastian 2005-02-19 01:02:05 UTC
Since it's unlikely that we will be able to find a good solution for this in the near future, I suggest that we make IDN support configurable and turn it OFF by default for KDE 3.4.
Comment 17 Thiago Macieira 2005-02-19 01:41:50 UTC
Created attachment 9711 [details]
Easiest fix possible

This is the easiest solution possible, while still allowing configurability and
setting IDN off by default.

The patch should apply in KDE 3.3 and HEAD.
Comment 18 Stephan Kulow 2005-02-19 09:37:06 UTC
yes. combine that with something in kcminit setting a kdeinit variable though. Setting an environment variable to configure a KDE feature is not good enough.
Comment 19 Thiago Macieira 2005-02-19 16:02:26 UTC
I thought of an env var because we're dealing with very low-level stuff here. And because it was easier.

In any event, I've thought of a possibility to have half-IDN support: just disable the ToUnicode conversion. That way, the www.pаypal.com URL would show up as www.xn--pypal-4ve.com but would *still* work.

I am sure no one would mistake "xn--pypal-4ve" for "paypal". But supposing one *wanted* to get to that site, he'd still be able to. This doesn't stop people from writing such URLs -- or having them shown in KMail, Kopete, etc. -- but once you browse to the site, you will notice it's not the right one.

Now, this would be a violation of RFC 3490. It explicitly says not to show the ACE form, except for debugging.
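
For reference, a minimal sketch of how the ACE form of a single label is produced. It follows the Punycode encoding procedure of RFC 3492 and prepends the "xn--" prefix. The function name toAceLabel is hypothetical. The sketch skips the nameprep step that a real ToASCII performs first and omits the overflow checks the RFC requires, so it is not a substitute for libidn.

```cpp
#include <cstdint>
#include <string>

namespace {
// Parameter values fixed by RFC 3492, section 5.
const std::uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38,
                    kDamp = 700, kInitialBias = 72, kInitialN = 128;

// Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'.
char digitChar(std::uint32_t d) {
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Bias adaptation, RFC 3492 section 6.1.
std::uint32_t adapt(std::uint32_t delta, std::uint32_t numPoints, bool first) {
    delta = first ? delta / kDamp : delta / 2;
    delta += delta / numPoints;
    std::uint32_t k = 0;
    while (delta > ((kBase - kTMin) * kTMax) / 2) {
        delta /= kBase - kTMin;
        k += kBase;
    }
    return k + (kBase - kTMin + 1) * delta / (delta + kSkew);
}
}  // namespace

// Hypothetical helper: encode one label. Pure-ASCII labels pass through.
// No nameprep and no overflow detection (simplifications of this sketch).
std::string toAceLabel(const std::u32string& label) {
    std::string out;
    for (char32_t c : label)
        if (c < 0x80) out += static_cast<char>(c);   // copy basic code points
    if (out.size() == label.size()) return out;
    const std::uint32_t b = static_cast<std::uint32_t>(out.size());
    std::uint32_t h = b;
    if (b > 0) out += '-';                           // delimiter
    std::uint32_t n = kInitialN, delta = 0, bias = kInitialBias;
    while (h < label.size()) {
        std::uint32_t m = 0xFFFFFFFFu;               // smallest code point >= n
        for (char32_t c : label)
            if (c >= n && c < m) m = c;
        delta += (m - n) * (h + 1);
        n = m;
        for (char32_t c : label) {
            if (c < n) ++delta;
            if (c == n) {
                std::uint32_t q = delta;             // emit delta as a variable-length integer
                for (std::uint32_t k = kBase;; k += kBase) {
                    const std::uint32_t t =
                        k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
                    if (q < t) break;
                    out += digitChar(t + (q - t) % (kBase - t));
                    q = (q - t) / (kBase - t);
                }
                out += digitChar(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return "xn--" + out;
}
```

Calling toAceLabel on "pаypal" (with the Cyrillic а, U+0430) yields "xn--pypal-4ve", the form quoted above.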
Comment 20 Thiago Macieira 2005-02-19 16:10:32 UTC
Ok, new idea: don't disable the code in any way.

Make Konqueror show an icon "IDN" in the status bar when the ACE (ASCII Compatible Encoding) form doesn't match the Unicode form, and show *both* forms in the Location bar.

When you browse to the problematic site, you would see:
http://www.pаypal.com/  (www.xn--pypal-4ve.com)

This would also happen:
http://www.オンライン.or.jp (www.xn--eckl3qmbc.or.jp)

It wouldn't solve the problem of someone sending email to IDN'ed domains, unless KMail did something similar.
Comment 21 George Staikos 2005-02-19 17:25:46 UTC
On Saturday 19 February 2005 10:10, Thiago Macieira wrote:
> Ok, new idea: don't disable the code in any way.
> Make Konqueror show an icon "IDN" in the status bar when the ACE (ASCII
> Compatible Encoding) form doesn't match the Unicode form, and show *both*
> forms in the Location bar.
> When you browse to the problematic site, you would see:
> http://www.pаypal.com/  (www.xn--pypal-4ve.com)
> This would also happen:
> http://www.オンライン.or.jp (www.xn--eckl3qmbc.or.jp)
> It wouldn't solve the problem of someone sending email to IDN'ed domains,
> unless KMail did something similar.

    Right, every application would need this, including ones we didn't 

Comment 22 Thiago Macieira 2005-02-19 17:33:55 UTC
Mozilla people have taken this approach:

https://bugzilla.mozilla.org/show_bug.cgi?id=282270 - short-term solution, for the next releases, until the bug is properly fixed. This would equate to my idea of disabling ToUnicode or IDNA completely.

https://bugzilla.mozilla.org/show_bug.cgi?id=279099 - long-term solution: find a way to detect phishing and scams.

The 279099 bug has this on comment 135: "Most domain registrars have been correctly implementing the guidelines for avoiding IDN-related spoofing problems. [...] Unfortunately, there are a few rather large exceptions to this - .com being one. So, the suggestion is to have a blacklist of those TLDs, and display the IDN in raw punycode form throughout the UI until such time as the registrars get their act together."

(punycode = the encoding used by ToASCII in order to produce the ACE form)

So, in essence, we would:
1) keep IDNA enabled
2) disable ToUnicode for domains with length(TLD) > 2, plus exceptions like .cc or .nu
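
A minimal sketch of rule 2, assuming the host name is already available in its ACE form. The names tldOf and mayShowUnicode are hypothetical, and the exception set holds only the two TLDs named above.

```cpp
#include <set>
#include <string>

// Hypothetical helper: the part after the last dot ("com" for "www.example.com").
std::string tldOf(const std::string& host) {
    const std::string::size_type dot = host.rfind('.');
    return dot == std::string::npos ? host : host.substr(dot + 1);
}

// Hypothetical helper: true if ToUnicode may be applied for display.
// TLDs longer than two letters are refused, as are the listed exceptions.
bool mayShowUnicode(const std::string& aceHost) {
    static const std::set<std::string> exceptions = {"cc", "nu"};
    const std::string tld = tldOf(aceHost);
    return tld.size() <= 2 && exceptions.count(tld) == 0;
}
```

With this rule www.xn--pypal-4ve.com stays in ACE form, while www.xn--eckl3qmbc.or.jp would be shown in Unicode.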

They are also working on other approaches.
Comment 23 Peter Thomassen 2005-02-22 20:42:51 UTC
I think the solution from #19 is the best one. We could just convert IDNs to their ASCII representation and show this. When opening such a site, a warning could show up indicating that an IDN conversion has taken place -- this will make the average user look into the address bar, and if there's xn--pypal-4ve.com instead of paypal.com, he will run away :-)

Additionally, there should be an option to disable the conversion (for standards compliance) and/or the warning. It should not be accessible directly from the warning window, to prevent people from doing things without thinking (so put this into Options | Web behaviour. Is this right? In German "Web-Verhalten").

This doesn't necessitate changes in other programs because clicking on a prepared link in KMail etc. shows up Konqueror's warning.
Comment 24 Thiago Macieira 2005-02-23 01:59:12 UTC
As a short-term solution, I agree with you: that's the easiest solution.

But other programs need similar support: writing an email to webmaster@paypal.com or webmaster@pаypal.com (second one contains Cyrillic A) should be distinguishable in KMail as well.
Comment 25 Daniel Teske 2005-02-23 14:44:46 UTC
*** Bug 100081 has been marked as a duplicate of this bug. ***
Comment 26 Peter Thomassen 2005-02-23 20:30:14 UTC
I also consider this idea a long-term solution since I can't see a way to solve this in all cases. Perhaps we could provide a library to check e-mail addresses and URIs, throwing a warning if necessary and desired (--> kdeadmin), but implementing this would be the third party developers' choice.

Even if there was (is?) a layer every URI or address is passed through (dunno, perhaps for syntactical tests), producing messages in general would be a bad idea (consider batch processing etc.).
Comment 27 Thiago Macieira 2005-02-25 03:55:30 UTC
More information:

Unicode Technical Report #36 - Security Considerations for the Implementation of Unicode and Related Technology: http://www.unicode.org/reports/tr36/tr36-2.html

ICANN Internationalised Domain Names: http://www.icann.org/topics/idn.html
(now contains a forum for discussing the homograph concerns)
Comment 28 Waldo Bastian 2005-02-25 13:00:22 UTC
CVS commit by waba: 

Disable IDN by default.
Patch by Thiago
CCBUG: 98788

  M +6 -0      kresolver.cpp   1.43

--- kdelibs/kdecore/network/kresolver.cpp  #1.42:1.43
@@ -33,4 +33,5 @@
 #include <arpa/inet.h>
 #include <netinet/in.h>
+#include <stdlib.h>
 // Qt includes
@@ -870,4 +871,7 @@ static QString ToUnicode(const QString& 
 QCString KResolver::domainToAscii(const QString& unicodeDomain)
+  if (getenv("KDE_USE_IDN") == 0L)
+    return unicodeDomain.latin1();
   QCString retval;
   // RFC 3490, section 4 describes the operation:
@@ -909,4 +913,6 @@ QString KResolver::domainToUnicode(const
   if (asciiDomain.isEmpty())
     return asciiDomain;
+  if (getenv("KDE_USE_IDN") == 0L)
+    return asciiDomain;;
   QString retval;

Comment 29 Thiago Macieira 2005-02-25 15:08:27 UTC
This should probably be applied to 3.3 as well. KDE 3.2 has a different code-base, and IDN can be disabled by removing the soft-dependency libidn.
Comment 30 Stephan Binner 2005-02-26 15:44:59 UTC
Opera's solution from http://www.opera.com/windows/changelogs/800b2/ :

* Added whitelist of safe top-level domains for IDN. 
  - TLDs are considered safe if they have implemented anti-homographic character policies or otherwise limited the available set of characters to prevent spoofing.
  - Current whitelist contains: :no:jp:de:se:kr:tw:cn:at:dk:ch:li:
  - List is in opera6.ini and is updated automatically in the Opera version check.
  - Domain names from other top-level domains that contain characters outside Latin 1 will be displayed in punycode, an encoding syntax designed for use with IDNA, specified in RFC3492.
Comment 31 Waldo Bastian 2005-03-03 13:03:16 UTC
Created attachment 9945 [details]
kresolver_idn_patch.patch
Patch to allow TLD whitelisting for IDN purposes.
Comment 32 Thiago Macieira 2005-03-03 13:11:20 UTC
I'd say commit, but, IMO, no need to retag for 3.4.0.
Comment 33 Stephan Kulow 2005-03-03 13:21:12 UTC
On Thursday 03 March 2005 13:11, Thiago Macieira wrote:
> ------- Additional Comments From thiago kde org  2005-03-03 13:11 -------
> I'd say commit, but, IMO, no need to retag for 3.4.0.
I would like a working döner.de on 3.4.0 ;)

Greetings, Stephan

Comment 34 Waldo Bastian 2005-03-03 14:11:52 UTC
I think there may be an additional problem with our IDN support.
According to https://bugzilla.mozilla.org/show_bug.cgi?id=279099#c14
KSSL verifies the certificate against the Unicode form instead of the puny encoded form.

I suspect that KSSL handles this incorrectly.

That is something that should be verified and fixed.
Comment 35 George Staikos 2005-03-03 15:56:22 UTC
On Thursday 03 March 2005 08:11, Waldo Bastian wrote:
> I think there may be an additional problem with our IDN support.
> According to https://bugzilla.mozilla.org/show_bug.cgi?id=279099#c14
> KSSL verifies the certificate against the Unicode form instead of the puny
> encoded form.
> I suspect that KSSL handles this incorrectly.
> That is something that should be verified and fixed.

   Yes it's the case.  ksslpeerinfo.cc has the relevant code.

Comment 36 Waldo Bastian 2005-03-04 12:53:14 UTC
CVS commit by waba: 

Match certificate based on the punycode version of the hostname
CCBUG: 98788

  M +7 -0      ksslpeerinfo.cc   1.47

--- kdelibs/kio/kssl/ksslpeerinfo.cc  #1.46:1.47
@@ -31,4 +31,7 @@
 #include <kextsock.h>
 #include <netsupp.h>
+#ifndef Q_WS_WIN //TODO kresolver not ported
+#include "kresolver.h"
 #include "ksslx509map.h"
@@ -60,5 +63,9 @@ void KSSLPeerInfo::setPeerHost(QString r
+#ifdef Q_WS_WIN //TODO kresolver not ported
         d->peerHost = d->peerHost.lower();
+        d->peerHost = QString::fromLatin1(KNetwork::KResolver::domainToAscii(d->peerHost.lower()));

Comment 37 Teemu Rytilahti 2005-03-05 20:35:39 UTC
Hmm, how about adding fi (Finland) TLD to the whitelist?
Comment 38 Waldo Bastian 2005-03-06 00:04:23 UTC
Please provide conclusive evidence that the registrar for .fi has an anti-homographic character policy in place.
Comment 39 Michael Möller 2005-03-24 11:20:22 UTC
Is there an option to add user-defined whitelist entries? (Is the whitelist hardcoded or defined via a configuration file?)
It might be hard to implement, but whenever the user "types" an IDN, this cannot be a spoofing attack, correct? Do you see a possibility to add such support (e.g. auto-adding typed IDN URLs to the whitelist)?
I would like to see browsing to "www.möller.de.vu" working again... :-D
Comment 40 Waldo Bastian 2005-03-24 11:33:22 UTC
You can specify a colon-separated list of TLDs in the KDE_USE_IDN environment variable.
Make sure to set that before starting KDE, it's not enough to set it for konqueror only.
The default is "at:ch:cn:de:dk:kr:jp:li:no:se:tw"
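
A minimal sketch of how such a colon-separated list could be read and applied. This is not the kresolver code: the names parseTldList, idnTlds and idnAllowedFor are hypothetical, and falling back to the default list when the variable is unset is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: split "at:ch:de" into {"at", "ch", "de"}.
// Empty items are dropped.
std::set<std::string> parseTldList(const std::string& value) {
    std::set<std::string> tlds;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ':'))
        if (!item.empty()) tlds.insert(item);
    return tlds;
}

// Hypothetical helper: read KDE_USE_IDN, assuming the default list quoted
// in this comment applies when the variable is not set.
std::set<std::string> idnTlds() {
    const char* env = std::getenv("KDE_USE_IDN");
    return parseTldList(env ? env : "at:ch:cn:de:dk:kr:jp:li:no:se:tw");
}

// Hypothetical helper: true if the host's TLD is on the list.
bool idnAllowedFor(const std::string& host, const std::set<std::string>& tlds) {
    const std::string::size_type dot = host.rfind('.');
    const std::string tld = dot == std::string::npos ? host : host.substr(dot + 1);
    return tlds.count(tld) > 0;
}
```

With the default list a .de host passes, while a host under .de.vu does not, because only the last label (vu) is compared.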
Comment 41 Michael Möller 2005-03-24 17:37:01 UTC
Thanks for the hint (I should have reloaded this page once in a while - I looked in the source code (kresolver) and found it myself)
So I wonder if there is a reason to compare TLD's, only? What about treating the idnDomains list as a list of allowed suffixes?
Later on this list could become configurable by some control center module (e.g., like the cookie allow list, JavaScript configuration, ...), and perhaps if I type an IDN domain that does not match the list into Konqueror's location bar, I would be asked if I want it on the list. This is because I'm not feeling very comfortable with allowing every ".vu" domain to use Unicode characters.
I think this would be a convenient and clear way of handling this (and innovative, too). What do usability experts think about it?

Comment 42 Peter Thomassen 2005-03-24 18:10:36 UTC
Does it really make sense to create a black-/whitelist, which actually is intended to prevent questions? I still think that this problem is not solvable in a general way. In blacklisted TLDs, there also would be good sites, taking away the freedom of choice.

What about comment #23?
Comment 43 Michael Möller 2005-03-24 18:25:55 UTC
comment #23 is what Firefox does. For users with URL's including only a few non-ASCII characters this might be OK - but I would prefer to see "www.möller.de.vu" instead of "http://www.xn--mller-jua.de.vu/" - please notice that this has only one non-ASCII character. 
But what about, e.g, Japanese websites. Assuming I was a Japanese native speaker and type in the Japanese URL I'm looking for it would be a pain for me to see the ASCII encoded URL. For cookies, JavaScript, etc. we (or I) accept white-/blacklists so why not for IDN's?
Comment 44 Peter Thomassen 2005-03-24 18:40:47 UTC
Referring to comment #43.

Because black-/whitelists introduce additional questions which I thought could be avoided -- we will see.

You spoke about a Japanese native speaker ... If she adds .jp to her whitelist, any protection is gone. And she _will_ add .jp. Is this good?

But this brought me to another idea: AFAIK all the Unicode characters belong to a subcharset or an "area", i.e. 0x... to 0x... is Cyrillic, 0x... to 0x... is Chinese, another range is Latin. What about whitelists for those ranges? A German user could specify that all Latin characters are ok, a Japanese one could allow her ones ... These ranges are "good", the others are "bad".

Protection (warning window, blocking) could take place for domains which consist of at least one bad range (i.e. mixed Cyrillic and US ASCII, see paypal, or pure Latin if I'm a Japanese speaker). An exception to this is _pure_ US ASCII, which always should be allowed.
Comment 45 Thiago Macieira 2005-03-24 23:34:28 UTC
Tuning down the severity now. The issue is not critical anymore because we have prevented the phishing attack, even if the solution can still be greatly improved upon.

My opinion is that we should implement a solution with:
1) whitelisting of TLDs known to be safe -- those that have implemented rules that restrict the characters allowed, such as .de (allows only ä, ö and ü aside from the normal ASCII ones)

2) blacklisting the TLDs known to be unsafe: .com, .net, .org, .biz, etc.

3) on top of 1 & 2, implement per-language list of valid characters outside the ASCII range

4) create a list of blacklisted characters (Unicode codepoints that look like /, for instance)

The algorithm would be like this:
- if the domain is ASCII-only, never mind it
- verify the #4 list. If there is any such forbidden character, refuse to use IDN and don't warn the user.
- verify the #3 list. If any characters fall outside the language rules, warn the user.
- verify the #2 list. If the domain is explicitly blacklisted, warn the user.
- verify the #1 list. If the domain isn't explicitly whitelisted, warn the user with the option to not show the warning again.
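
A minimal sketch of this decision order, to make the precedence of the four lists concrete. The type and function names (Verdict, IdnPolicy, checkDomain) are hypothetical, the lists are passed in by the caller, and the TLD is assumed to have been extracted already.

```cpp
#include <set>
#include <string>

// Possible outcomes of the check (names are assumptions of this sketch).
enum class Verdict { Accept, Refuse, Warn, WarnDismissible };

// The four lists from the proposal, supplied by the caller.
struct IdnPolicy {
    std::set<std::string> safeTlds;     // list #1: whitelisted TLDs
    std::set<std::string> unsafeTlds;   // list #2: blacklisted TLDs
    std::set<char32_t> languageChars;   // list #3: non-ASCII chars valid for the user's language
    std::set<char32_t> forbiddenChars;  // list #4: e.g. look-alikes of '/'
};

Verdict checkDomain(const std::u32string& domain, const std::string& tld,
                    const IdnPolicy& policy) {
    bool asciiOnly = true;
    for (char32_t c : domain)
        if (c >= 0x80) { asciiOnly = false; break; }
    if (asciiOnly) return Verdict::Accept;          // ASCII-only: never mind it

    for (char32_t c : domain)                       // list #4: refuse silently
        if (policy.forbiddenChars.count(c)) return Verdict::Refuse;

    for (char32_t c : domain)                       // list #3: outside language rules
        if (c >= 0x80 && !policy.languageChars.count(c)) return Verdict::Warn;

    if (policy.unsafeTlds.count(tld)) return Verdict::Warn;               // list #2
    if (!policy.safeTlds.count(tld)) return Verdict::WarnDismissible;     // list #1
    return Verdict::Accept;
}
```

Assuming a German policy whose list #3 holds ä, ö and ü and whose list #1 holds "de", möller.de is accepted and möller.de.vu produces the dismissible warning.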

libkdecore would provide a method of checking those, but not the warnings (since that would be in libkdeui). Applications like Konqueror and KMail should provide the proper warnings when necessary.

So a German user would not see a warning if he went to möller.de, but a Portuguese-speaking one would, since "ö" doesn't occur in the Portuguese language.

For those domains that match the language rules, but aren't explicitly whitelisted or blacklisted, we should provide a warning that has the "do not show this again" option. That means a German-speaking user would see a warning for "möller.de.vu", but could turn that off for the site or globally.

This is just an idea. It has to be refined before implemented.
Comment 46 Peter Thomassen 2005-03-25 20:51:59 UTC
Referring to comment #45.

| 1) whitelisting of TLDs known to be safe -- those that have implemented rules that restrict the characters allowed, such as .de (allows only ä, ö and ü aside from the normal ASCII ones) 

This list may be outdated one day which is bad when a registry eases restrictions.

| 2) blacklisting the TLDs known to be unsafe: .com, .net, .org, .biz, etc. 
| 3) on top of 1 & 2, implement per-language list of valid characters outside the ASCII range 

Hm. I think a domain of these unsafe TLDs can be considered safe if it only consists of characters that are valid concerning the per-language list. So why introduce the blacklist which, like #1, may be outdated one day?

| 4) create a list of blacklisted characters (Unicode codepoints that look like /, for instance) 

Even though I deem this a good point, I would not implement it since some IDNs would not be accessible otherwise (even without a warning). Because those characters actually should not be in a per-language list (#3), #3 would trigger a warning anyway.

Isn't the per-language list enough?
Comment 47 Thiago Macieira 2005-03-28 06:10:39 UTC
I am in agreement with some other developers who think the IDN specs are broken in that they allow punctuation characters that look like /. It is different if a language has a character that happens to look like /: tough luck, but we can't restrict that.

I think that per-language restriction isn't enough. I am a Portuguese-language speaker, which means the i-acute (í) character is allowed for me -- that means I can reasonably be expected to notice it. However, it is also true that this character in particular is very easily mistaken for the normal i, which can be used to create phishing sites like íntel.com. All you have to do is have a smallish font.

The same is also true for Turkish speakers and the dotless i (ı). Try going to mıcrosoft.com -- it exists.

So, the bottom line is: unless the strict registration rules are enforced, per-language isn't enough security. Hence the need for a white-listing of domains, and/or a blacklisting of others.
Comment 48 Peter Thomassen 2005-03-28 09:30:52 UTC
Ok, we should do blacklisting. Whitelisting, hm. Take .de as an example which allows ä, ö, ü -- this is also covered by per-language list, but white lists can become dangerous some day:

At the moment, there is a discussion to give the administration of .net to DENIC which is also the .de registry. Imagine bringing .net onto the whitelist now; in some years this can change like today, and we have a problem. We should not disable other tests by whitelisting.
Comment 49 Thiago Macieira 2005-03-28 13:19:33 UTC
No, we should not disable other tests. Your suggestion makes sense.
Comment 50 Peter Thomassen 2005-03-28 15:39:45 UTC
Then, why do we need whitelists at all? Other tests, especially per-language list checks, are also performed and either confirm the whitelist (whitelisted IDN with local characters) or trigger a warning (whitelisted TLD, but foreign characters), overriding the whitelist decision. This makes whitelists useless.

BTW, I have thought again about per-language whitelists. They should not only contain characters outside the ASCII range, but all characters valid for an IDN of that language. That is, the German list should contain [a-z]äöü, but the Cyrillic one must _not_ contain ASCII [a-z] since there aren't any ASCII characters in the Cyrillic charset. If we didn't do that (--> allowing mixtures), Cyrillic users would still be affected by the paypal.com problem. Furthermore, a mixture of characters from different per-language lists should not be allowed by the same reason. --> An IDN must be monolingual to not trigger a warning.

We have come to the agreement that blacklists are necessary because similar characters could be used to mislead people (comment #42). But this probably is true for all IDN-enabled TLDs. Can't microsöft.de as easily be mistaken for Microsoft as mícrosoft.pt? Or, in general, isn't it often possible to register similar-looking domains? Additionally, this is also possible without IDNs (consider intel.com and inte1.com), so all TLDs are unsafe and would have to be blacklisted, making things even worse than before. Originally, blacklisting was mentioned to forbid identical-looking domains (which is not necessary because an IDN containing only characters from one per-language list is safe anyway). Taking everything into account, I don't consider blacklists to be helpful.

In short, do the following if a domain is not pure ASCII:
1) Check if all characters are in the same per-language list. If not, trigger a warning.

This rule also applies to punctuation homographs like / unless they are in a per-language list, and then they should be allowed (I don't believe that this will take place). Thus, the idea of rule #4 from comment #45 is included.

(And, if my considerations above should be wrong:
2) Check if the TLD is blacklisted because it is known to be unsafe. If true, trigger a warning.)

These are, as everything, only my thoughts that have to be discussed.
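
A minimal sketch of rule 1, assuming two sample per-language lists: a German one with [a-z]äöü and a Russian one with no ASCII letters at all. The names LanguageLists, sampleLists and isMonolingual are hypothetical, and the list contents are illustrative rather than authoritative.

```cpp
#include <map>
#include <string>

// Language code -> all letters valid for an IDN label in that language.
using LanguageLists = std::map<std::string, std::u32string>;

// Hypothetical sample data for this sketch.
LanguageLists sampleLists() {
    LanguageLists lists;
    lists["de"] = U"abcdefghijklmnopqrstuvwxyz\u00E4\u00F6\u00FC";
    std::u32string ru;
    for (char32_t c = 0x0430; c <= 0x044F; ++c) ru += c;  // Cyrillic small a..ya
    ru += static_cast<char32_t>(0x0451);                  // Cyrillic small io
    lists["ru"] = ru;                                     // deliberately no ASCII letters
    return lists;
}

// Hypothetical helper: true if a single list contains every letter of the
// label. Digits and '-' are ignored. A false result means "trigger a warning".
bool isMonolingual(const std::u32string& label, const LanguageLists& lists) {
    for (const auto& entry : lists) {
        bool allFound = true;
        for (char32_t c : label) {
            if (c == U'-' || (c >= U'0' && c <= U'9')) continue;
            if (entry.second.find(c) == std::u32string::npos) {
                allFound = false;
                break;
            }
        }
        if (allFound) return true;
    }
    return false;
}
```

With these lists "möller" and "москва" pass, while "pаypal" with a Cyrillic а fails because no single list covers all of its letters.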
Comment 51 Thiago Macieira 2005-03-28 16:09:11 UTC
You have raised a point I had forgotten about: the one label, one script rule -- what you proposed in comment #44.

So, a set of letter and letter-like glyphs glued together must belong all to the same script, or a warning is triggered. That way, Russian-speakers will still be warned if a Cyrillic A (a valid character for their script) is found in the middle of a Latin-based label, such as the paypal case.

Now, having said that, it is possible to accomplish that with the language list: the label must be either all ASCII, or fall within the language rules. So, for Greek speakers, it must be either be entirely written in ASCII, or entirely written in Greek. 

This will generate warnings for sites like www.the-α-site.com, for everyone. We could relax the rule to a "one section, one script", if we wanted to.
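
A minimal sketch contrasting the strict "one label, one script" rule with the relaxed "one section, one script" rule, where sections are the hyphen-separated parts of a label. The names are hypothetical and the script ranges are a rough approximation of Unicode blocks.

```cpp
#include <string>

// Coarse script buckets (an approximation chosen for this sketch).
enum class Script { Latin, Greek, Cyrillic, Other };

Script scriptOf(char32_t c) {
    if (c <= 0x024F) return Script::Latin;      // ASCII through Latin Extended-B
    if (c >= 0x0370 && c <= 0x03FF) return Script::Greek;
    if (c >= 0x0400 && c <= 0x04FF) return Script::Cyrillic;
    return Script::Other;
}

// Strict rule: every letter of the text belongs to the same script.
// Digits and '-' are ignored.
bool singleScript(const std::u32string& text) {
    bool first = true;
    Script expected = Script::Other;
    for (char32_t c : text) {
        if (c == U'-' || (c >= U'0' && c <= U'9')) continue;
        const Script current = scriptOf(c);
        if (first) {
            expected = current;
            first = false;
        } else if (current != expected) {
            return false;
        }
    }
    return true;
}

// Relaxed rule: each hyphen-separated section must be single-script,
// but different sections may use different scripts.
bool singleScriptPerSection(const std::u32string& label) {
    std::u32string::size_type start = 0;
    while (true) {
        const std::u32string::size_type end = label.find(U'-', start);
        const std::u32string section =
            end == std::u32string::npos ? label.substr(start)
                                        : label.substr(start, end - start);
        if (!singleScript(section)) return false;
        if (end == std::u32string::npos) return true;
        start = end + 1;
    }
}
```

Under these assumptions "the-α-site" fails the strict rule but passes the relaxed one, and "pаypal" with a Cyrillic а fails both.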

I think we have reached a point where we can start discussing implementation. First thing is: how do we detect the language? Given that we're talking about a KDE-wide setting, this cannot be a Konqueror config (in fact, Konqueror sends its Accept-Language header based on the global config).

Any opinions on how configurable this must be? On one extreme, we can do it all without any configuration options.
Comment 52 Allan Sandfeld 2005-03-28 16:56:52 UTC
Rather than per language, shouldn't we use a per alphabet list. Such that all latin-based characters can be mixed, and all cyrillic characters can be mixed, etc?
Comment 53 Peter Thomassen 2005-03-28 21:46:32 UTC
Referring to both comment #51 and comment #52:

Good idea, but I think charset-based character checks are better because German speakers (ISO-8859-1, Latin-1) usually don't use Celtic characters (ISO-8859-14, Latin-8) and vice versa, even though both charsets are Latin-based; there shouldn't be any need to mix charsets up. In this case, we really could avoid confusion because of an accent.

Section-wise charset mixing is good, but imagine h-p.com (Hewlett-Packard) is registered again using another charset for one or both characters. See below.

- Checkbox to enable IDN protection and show the other options (activated by default).

- Select list to activate one or more charsets, preventing attacks on domain names that can be imitated using a single charset. By default, only enable the charset according to the localization used. Since pure ASCII always is allowed, it is not included in the charset list. UTF-8 isn't either, because it would disable IDN protection.

- Radio boxes to allow mixture of charsets
  * never (default, this is most secure)
  * section-wise
  * level-wise (subdomain-wise)

- Maybe a checkbox to enable either only letters (default), or the whole charset (including punctuation and special symbols). Although this actually is a registry task, we shouldn't trust them ... they can change.

If the last option is not implemented (allowing the whole charset), checks are simple: Just try to convert from UTF-8 to one of the good charsets. If this fails, trigger a warning.
Comment 54 Peter Thomassen 2005-05-31 16:25:40 UTC
Are we making progress on this issue? The last comment is two months old ...
Comment 55 Thiago Macieira 2005-06-01 13:12:13 UTC
There is no bug because the code in question has been disabled for zones with questionable policy. ccTLDs that implement verification are allowed to have IDN.

We will re-enable IDNs for those domains in the near future, as soon as IETF publishes its final recommendation. Other browsers are doing the same thing for the moment, and will implement the same solution so as to avoid interoperability problems.
Comment 56 Teemu Rytilahti 2005-09-27 19:42:08 UTC
Referring to #38. If what Thiago says in #55 about what other browsers are doing at the moment is true, then the .fi domain could be added to the whitelist, as it works fine with Firefox.
Comment 57 Teemu Rytilahti 2005-09-27 21:49:26 UTC
Oh, and after starting discussion about how to get umlauts available for .fi I got these urls from a guy on the same channel. Here's the policy for .fi (maintained by Ficora): http://www.ficora.fi/englanti/internet/IDN.htm and here's the list of what Mozilla uses: http://www.mozilla.org/projects/security/tld-idn-policy-list.html -- Could we use the same?
Comment 58 Christoph Feck 2011-07-26 15:25:23 UTC
Dawit, I remember there were some commits that could potentially address this, but I cannot say for sure. Could you check the status of this bug, and maybe reassign or resolve it? Thanks.
Comment 59 Dawit Alemayehu 2011-07-27 00:35:55 UTC
(In reply to comment #58)
> Dawit, I remember there were some commits that could potentially address this,
> but I cannot say for sure. Could you check the status of this bug, and maybe
> reassign or resolve it? Thanks.

The spoofing fix that was applied recently is not related to this problem. That one only dealt with the username component of a URL being used to confuse the user about the site he/she is visiting.