Bug 70363 - Some common web pages load very slowly in konqueror
Status: RESOLVED FIXED
Alias: None
Product: kio
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: David Faure
Duplicates: 85503 86857 87307 87690 87732 87895 87994 88732 89340 90587 93729 96298 104468 135488
Reported: 2003-12-14 03:20 UTC by colesen
Modified: 2006-11-28 09:58 UTC

Attachments
original and modified html file as requested by Bhaduri (6.46 KB, application/x-tbz)
2004-01-29 06:18 UTC, colesen
resolver-blacklist.diff (5.17 KB, text/x-diff)
2005-01-25 17:23 UTC, Thiago Macieira

Description colesen 2003-12-14 03:20:56 UTC
Version:            (using KDE KDE 3.1.4)
Installed from:    SuSE RPMs
OS:          Linux

Some common web pages load very slowly in Konqueror on SuSE 9, for example:

http://www.reuters.com/
http://www.informit.com/
http://www.economist.com/
http://www.theregister.co.uk/

CPU utilization does not go up, nor does the system become unresponsive, contrary to what is reported in, for example, #60316. Other pages, such as news.google.com, load instantly.
In addition to stock SuSE 9 with KDE-3.1.4, I've tested with KDE-3.1.94 and linux-2.6.0-test11 on stock SuSE 9, but the problem persists. These web sites are relatively common, and since Konqueror is a great product I hope a fix or workaround can be found.
Comment 1 Tom Collins 2003-12-14 07:30:55 UTC
Pages load quickly for me, CVS 12-13-2003 build.
Comment 2 colesen 2003-12-14 07:37:55 UTC
RE: Tom Collins
What OS are you using?
Comment 3 Tom Collins 2003-12-14 18:00:34 UTC
Linux, kernel 2.6.0-test11 on Gentoo.  Sources were compiled
12-13-2003, with -Os (gcc 3.2.3).  Might I suggest you make
a minimal testcase that shows the problem?  theregister seems
like the best one to start with - do a View->View Document Source,
then save the file locally.  Isolate the problem code by slowly 
stripping out hunks of the HTML until the problem goes away.
It's a slow, tedious process, but I've done it a number of 
times and been successful in getting bugs fixed.
Comment 4 Dawit Alemayehu 2003-12-20 15:59:12 UTC
Cannot duplicate this with current CVS on Gentoo with Linux 2.4.23-ck1 either.
All those pages load quickly for me. I also do not ever remember having problems with register.co.uk even in 3.1.x releases. Anyways, this report does not really belong in this module...
Comment 5 colesen 2004-01-04 02:15:42 UTC
I took Tom Collins up on his suggestion.
www.theregister.co.uk has this statement

<SCRIPT LANGUAGE="JavaScript">ord=Math.random()*10000000000000000;document.write('<SCRIPT LANGUAGE="JavaScript" SRC="http://ad.uk.doubleclick.net/adj/theregister.co.uk/regindex;area=regindex;pos=1;ptile=1;sz=150x100;ord=' + ord + '?" ><\/SCRIPT>');</SCRIPT>

I observed that if I changed that to

<SCRIPT LANGUAGE="JavaScript">document.write('<SCRIPT LANGUAGE="JavaScript" SRC="http://ad.uk.doubleclick.net/adj/theregister.co.uk/regindex;area=regindex;pos=1;ptile=1;sz=150x100;ord=Math.random()*10000000000000000;ord=' + ord + '?" ><\/SCRIPT>');</SCRIPT>

e.g. just move the first ord= further down then the problem disappears completely.
Don't know if it is helpful. I haven't worked on any of the other slow links (yet).
Comment 6 Sashmit Bhaduri 2004-01-28 16:33:48 UTC
> e.g. just move the first ord= further down then the problem disappears completely. 

Can you attach a simplified version of that page with and without that fix?
Comment 7 colesen 2004-01-29 06:18:46 UTC
Created attachment 4412
original and modified html file as requested by Bhaduri

I still have the problem - now on SuSE 3.1.95.
Comment 8 colesen 2004-03-09 10:13:10 UTC
Fedora-development and SuSE9, both with Qt3.3.1 and KDE-3.2, exhibit the problem. If I turn off JavaScript under Tools->HTML Settings and click the reload button, the problem disappears.
I wish for widespread use of Konqueror, but I think this problem is in the way. Is changing the default to JavaScript off a reasonable workaround? Some pages will launch a popup asking to have JavaScript turned on, but I don't know if all pages that require JavaScript will do that.
Comment 9 Eugene Weiss 2004-04-20 23:02:12 UTC
I can add two more sites for this bug:
http://news.independent.co.uk/ and http://www.sfgate.com/

But... the problem isn't unique to Konqueror; I find the same behavior with Linux and Mozilla.  In Windows, both IE and Mozilla do not have this problem.  Interestingly, IE running on Wine in Linux does not have this problem either, so if it's an OS problem, it's not on a very deep level.
Comment 10 colesen 2004-04-21 07:04:05 UTC
RE:Comment #9
Your links are not slow to load using the latest Opera (with JavaScript enabled) on Linux.
http://www.theregister.co.uk/ is not slow anymore. It has changed in the meantime.
www.washingtonpost.com articles are often on news.google.com (US) and also slow to load - for example right now
http://www.washingtonpost.com/wp-dyn/articles/A28720-2004Apr20.html.
Comment 11 Waldo Bastian 2004-04-21 12:33:51 UTC
Resolving ad.uk.doubleclick.net takes about forever here.
Comment 12 Stephan Binner 2004-04-21 13:01:50 UTC
As someone who has doubleclick.net addresses in his /etc/hosts: I heard it's some misconfiguration related to IPv6? wget seems to use the same lookup method as Konqueror (slow), while nslookup/ping are fast.
Comment 13 Waldo Bastian 2004-04-21 14:03:15 UTC
CVS commit by waba: 

Make it possible to disable IPv6 by setting $KDE_NO_IPV6
CCMAIL: 70363@bugs.kde.org


  M +4 -0      kextsock.cpp   1.71
  M +2 -0      netsupp.cpp   1.41


--- kdelibs/kdecore/netsupp.cpp  #1.40:1.41
@@ -187,4 +187,6 @@ static int check_ipv6_stack()
   return 2;                     // how can we check?
 # else
+  if (getenv("KDE_NO_IPV6"))
+     return 2;
   int fd = ::socket(AF_INET6, SOCK_STREAM, 0);
   if (fd == -1)

--- kdelibs/kdecore/kextsock.cpp  #1.70:1.71
@@ -148,4 +148,8 @@ static bool process_flags(int flags, int
     (flags & KExtendedSocket::canonName ? QResolver::CanonName : 0) |
     (flags & KExtendedSocket::noResolve ? QResolver::NoResolve : 0);
+
+  if (getenv("KDE_NO_IPV6"))
+    familyMask &= ~QResolver::IPv6Family;
+
   return true;
 }
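
For readers without the kdelibs tree at hand, the effect of the patch can be approximated by the following standalone C sketch. It is only an illustration, not the kdelibs code: `lookup_family` is a hypothetical helper, and mapping the patch's family mask onto `AF_INET`/`AF_UNSPEC` is an assumption made to keep the example small.

```c
/* Ask for the POSIX declarations unless the build already chose a level. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of kdelibs: choose the address family
 * to request from the resolver. If KDE_NO_IPV6 is set in the
 * environment (to any value, as in the patch), only IPv4 is requested,
 * so no AAAA query needs to be sent. */
int lookup_family(void)
{
    if (getenv("KDE_NO_IPV6") != NULL)
        return AF_INET;    /* IPv4 only */
    return AF_UNSPEC;      /* let the resolver try both families */
}
```

The returned value would be used as `ai_family` in the hints passed to the resolver. Note that the check is for presence only, so `KDE_NO_IPV6=false` disables IPv6 just as `KDE_NO_IPV6=true` does.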


Comment 14 Stephan Binner 2004-04-24 10:45:00 UTC
Please add documentation to at least http://wiki.kdenews.org/tiki-index.php?page=Environment+Variables :-)
Comment 15 Thiago Macieira 2004-05-01 20:41:58 UTC
The change in netsupp.cpp won't do anything because it isn't used anymore in HEAD.

Should I add a similar change to the new socket classes?

Please take note I STRONGLY oppose this modification.
Comment 16 Waldo Bastian 2004-05-03 14:33:00 UTC
Thiago: It helps the discussion if you also explain why you oppose it.
Comment 17 Thiago Macieira 2004-05-04 03:04:54 UTC
Main reason: Because it's no bug in our code. It's a bug in THEIR code, in their servers. They simply don't reply for a DNS query for an AAAA record.

Personal reason: disable IPv6? I have been striving to get IPv6 support working for the past 4 years. I don't like disabling it, but I can accept it if people don't want it on their machines.
Comment 18 colesen 2004-05-11 10:18:43 UTC
Got an e-mail from bugs.kde.org to update status. I retested and, not surprisingly considering the above comments, the problem is still there as of KDE-3.2.2. I don't know if more examples are helpful, but here are a few of possibly many from http://www.hotsheet.com/ - a sheet of popular links - that all load at normal speed in Windows:
http://www.nasdaq.com
https://us.etrade.com/e/t/home
http://www.schwab.com
http://www.smartmoney.com
http://www.hoovers.com

What are the implications of turning off IPv6? Is this something that the user should be able to do on the fly - similar to turning off JavaScript - and therefore accessible from the GUI rather than or in addition to the envvar?
Leaving the problem as unresolved.
Comment 19 Stephan Kulow 2004-05-11 15:52:42 UTC
Does enabling host caching in /etc/nscd.conf fix this in some way?
Comment 20 Thiago Macieira 2004-05-11 18:48:45 UTC
I don't see how it would, but please test.

There is no answer to cache, so every time a lookup is done, the server must be contacted again. And again it will not answer.
Comment 21 colesen 2004-05-15 00:13:59 UTC
Just for the record - I've just installed SuSE-9.1 as a new install, updated it with everything that SuSE has to offer (YOU and KDE) but nothing else, haven't made any configuration changes and the problem is still there.
Comment 22 Thiago Macieira 2004-05-15 01:57:10 UTC
Upgrading or configuring won't make the problem go away because it's not a problem in our code. It's in someone else's DNS server.

What we can do is write a workaround like the one Waldo made, which completely disables IPv6 lookups. Of course, this has no effect for people like me who do use IPv6 lookups.

To activate the workaround, add this to your ~/.xinitrc:
	export KDE_NO_IPV6=true

I'm tempted to close this bug with WONTFIX or FIXED.
Comment 23 colesen 2004-05-15 05:21:16 UTC
For completeness and future reference - in case this report is about to be wrapped up: 
1. Why does rearranging the html as documented in comment #5 and #7 (apparently) fix the problem?
2. What are the drawbacks, if any, of applying the workaround?
Comment 24 Thiago Macieira 2004-05-15 06:05:06 UTC
The "solution" in comment #5 isn't a solution. It tricks the Konqueror cache by not changing the URL, which is exactly what the JavaScript does. It places a random number in the URL to prevent a cached copy from being loaded.

This is not a solution nor even a workaround, since we can't change other people's pages.

The workaround works by disabling IPv6 lookups. So, if you turn it off, you won't reach any IPv6 sites (say, http://www.kame.net won't have a dancing turtle). Most people don't have IPv6 addresses, so this wouldn't affect them at all.
Comment 25 Gogs 2004-07-09 11:13:22 UTC
I was going to file a new bug report, but I don't know if this is related. I'm having (and have always had) problems with www.thescotsman.co.uk and www.autotrader.co.uk amongst others. Like the other posters, if I disable javascript the pages load immediately, but if not then it can take 40 - 50 seconds in some cases before I can see anything. Strangely enough, I don't seem to have any problems with theregister.co.uk at all....

I'm using cvs from 02/07 with kernel 2.6.7 compiled with gcc 3.4.1
Comment 26 colesen 2004-07-09 11:41:55 UTC
They both load at normal rate here (SuSE9.1).
In the .profile I have
export KDE_NO_IPV6=true
forget www.theregister.co.uk - the layout changed.
Comment 27 Stephan Kulow 2004-08-09 18:42:26 UTC
*** Bug 86857 has been marked as a duplicate of this bug. ***
Comment 28 Corey 2004-08-09 19:39:10 UTC
I just filed a report that was marked a duplicate of this bug; and can 
verify that after using the KDE_NO_IPV6=true fix, I no longer experienced
the slow loading issue.

I also agree that this seems to be a horrible workaround, for obvious reasons.


( Note to gentoo users: put the 'export KDE_NO_IPV6=true' line into 
/etc/X11/Sessions/kde-<version> )

Comment 29 Germain Garand 2004-08-17 04:52:58 UTC
*** Bug 87307 has been marked as a duplicate of this bug. ***
Comment 30 Germain Garand 2004-08-17 05:03:30 UTC
see also: http://bugzilla.mozilla.org/show_bug.cgi?id=68796

They went for a blacklist, plus a config option disabling IPv6 entirely (defaulting to true on some browser/platform combinations).
Comment 31 Thiago Macieira 2004-08-22 00:10:27 UTC
I like the idea of a blacklist. Maybe we could share it with the Mozilla guys.

Just as a note, if we do implement a blacklist, it'll be for all resolution. The given domain name would be completely blocked, not just for IPv6 resolution, but everything. Maybe if we start hurting them where it hurts most (their pockets), they'll fix the problem.
Comment 32 Maksim Orlovich 2004-08-22 17:34:49 UTC
*** Bug 85503 has been marked as a duplicate of this bug. ***
Comment 33 Germain Garand 2004-08-24 17:18:06 UTC
*** Bug 87895 has been marked as a duplicate of this bug. ***
Comment 34 Filip Vancoillie 2004-08-25 22:41:31 UTC
I think it's strange that by changing to kde 3.3 suddenly the behaviour 
appeared while in bug 70363 it's all about earlier versions, and I 
never noticed (although I updated with every 3.x.x release).

I also noticed the change of behaviour on three sites at the same time 
which would be quite a coincidence (servers changing or so..)  The 
sites are www.standaard.be www.luchtvaartnieuws.nl and www.ta.nl

I can not test the workaround for bug 70363 though because i'm on 
another computer and within a week I depart for at least 3 months to 
latin-America.

One other thing I can tell is that i'm using a 'block-list' of hosts 
in /etc/hosts  Maybe that is useful info..
Still I think it's strange that a website loads so slow just because of 
one link.  The text can appear before this one image arrives no?  But 
i'm not an expert.


wanted to mention that kde is great, good work guys!
bye
filip

Comment 35 Wilco Greven 2004-08-28 22:48:57 UTC
*** Bug 87994 has been marked as a duplicate of this bug. ***
Comment 36 Arnaud Burlet 2004-08-30 02:07:06 UTC
I switched to kde 3.3 from kde 3.2.3 today, and I immediately noticed many slowdowns using kde apps:

- Kopete froze for 1 minute when going online.
- Kmail was very slow checking mails.
- Konqueror needed almost 1 minute to load several web sites usually loaded within seconds under kde 3.2.3.

The same sites worked fine with mozilla under all versions of kde.

I then set KDE_NO_IPV6 and everything was fine again.

kernel 2.4.24 not compiled with ipv6 support, gentoo box behind an adsl netopia router...

how may I help ?

PS: woow, 3.3 : what a release, thanks guys !
Comment 37 Thiago Macieira 2004-08-30 02:30:01 UTC
Arnaud: I think I know what your problem is. The new version of the code does not verify if you have IPv6 in your computer or not: it'll simply do the lookup.

The slowness must be caused by your DNS server being slow in answering IPv6 queries. Or else your system may have IPv6 and it's misconfigured.

In any event, that's not the point in this bug report.
Comment 38 Germain Garand 2004-09-03 18:31:16 UTC
*** Bug 88732 has been marked as a duplicate of this bug. ***
Comment 39 Maksim Orlovich 2004-09-12 15:53:28 UTC
*** Bug 89340 has been marked as a duplicate of this bug. ***
Comment 40 Maksim Orlovich 2004-09-12 15:54:59 UTC
*** Bug 87732 has been marked as a duplicate of this bug. ***
Comment 41 Adam Wiggins 2004-11-16 11:56:54 UTC
This turned out to be the same problem I was having with paypal.com, etrade.com, and a few other sites.  I am running Fedora Core 3.

Interestingly I get the exact same problem (with the same fix, setting the environment var) on both my home and work computers.  They are on different ISPs and therefore have different nameservers.  Whatever this problem with DNS servers is, it's extremely pervasive.
Comment 42 Tommi Tervo 2004-11-16 12:10:55 UTC
*** Bug 90587 has been marked as a duplicate of this bug. ***
Comment 43 Thiago Macieira 2004-11-16 15:52:44 UTC
@Comment #41: the problem is not in your ISP's nameserver, but in the site's nameserver. So everyone, anywhere in the world, should be experiencing the same problems.

See comment #30 and comment #31 for possible workarounds, not yet implemented in KDE code.
Comment 44 Charles Samuels 2004-11-22 18:28:38 UTC
*** Bug 93729 has been marked as a duplicate of this bug. ***
Comment 45 Charles Samuels 2004-11-22 18:53:46 UTC
Please see my report 93729.  It could be a race condition because the threads are spawned, but exit (apparently) before it gets to the wait at the top of my bt.  I obviously don't know how this code is supposed to work, though.
Comment 46 Charles Samuels 2004-11-22 19:03:06 UTC
another nice workaround is adding this to your /etc/hosts:
::1 ad.doubleclick.net
Comment 47 Gregory Stark 2004-12-11 01:45:42 UTC
I have problems related to this bug as well.

I have some questions about why it works the way it does:

1) Why do IPv6 resolver queries if there's no IPv6 interface on which to communicate with any records you got back? It's a waste of network traffic, any data you get back will be useless anyways.

2) I think the logic of your resolver is broken. I have IPv4 entries in /etc/hosts for doubleclick and for my own servers, so there shouldn't be any network queries for these addresses. My /etc/host.conf says hosts,bind, so if in the search for an address for a name there's a match in /etc/hosts it ought to return that and stop. Not do two independent searches, one for IPv6 and one for IPv4. There's no way to indicate in /etc/hosts that there is no IPv6 record at all, so there's no way to create correct host names in /etc/hosts and prevent DNS queries on the name.

3) Your resolver logic seems to spam the DNS servers with hundreds of queries for the same address. You're not even caching the records for the duration of a single operation. This is terribly unfriendly to DNS servers. I'm surprised admins haven't cried bloody murder for it already. I have a suspicion this is the root of my problems as I suspect the DNS server is rate limiting responses when it gets a ton of queries from the same source.

Your use of a custom in-house resolver makes debugging this a royal pain. What function names should I set breakpoints on to catch these resolver requests? It seems your resolver bypasses the system resolver entirely? I don't see the standard gethostbyname() or getaddrinfo() getting invoked at all.
Comment 48 Thiago Macieira 2004-12-11 02:23:44 UTC
> 1) Why do IPv6 resolver queries if there's no IPv6 interface on which to
> communicate with any records you got back? It's a waste of network traffic,
> any data you get back will be useless anyways. 

Agreed. But since KNetworkInterface is not yet implemented, we don't know that you don't have any IPv6 addresses. Therefore, we send the request.

When we have that class implemented, I'll add the code to test for IPv6 addresses and, if none are present, avoid sending IPv6 queries.

> 2) I think the logic of your resolver is broken. I have IPv4 entries
> in /etc/hosts for doubleclick and for my own servers, there shouldn't be any
> network queries for these addresses.

Agreed, but that's not how it works. The problem is in glibc, not in our code.

> Not do two independent searches, one for IPv6 and one for IPv4.

Blame glibc. We can't tell if the address came from hosts or not. And even if we ask glibc to do the whole resolving, it will still send IPv6 DNS queries, even if the IPv4 address is in /etc/hosts.

And quite frankly, I *agree* with that behaviour. If you want to turn off IPv6 addresses, add IPv6 records to /etc/hosts. Records don't have to come from the same source to be valid.

> There's no way to indicate in /etc/hosts that there is no IPv6 record at
> all, so there's no way to create correct host names in /etc/hosts and
> prevent DNS queries on the name.

Yes, there is. How do you tell it there is no IPv4 address? Same answer for IPv6: an invalid address, not routed. In other words, comment #46, the one right above your post. Have you read it?

> 3) Your resolver logic seems to spam the DNS servers with hundreds of
> queries for the same address. You're not even caching the records for the
> duration of a single operation. This is terribly unfriendly to DNS servers.
> I'm surprised admins haven't cried bloody murder for it already. I have a
> suspicion this is the root of my problems as I suspect the DNS server is
> rate limiting responses when it gets a ton of queries from the same source. 

Again, blame glibc and *all* the resolver implementations out there (except QDns, actually). And I might add that the glibc libresolv code comes from BIND, the DNS server daemon. We DO NOT cache, never have. Most programs out there DO NOT cache and never have. And probably never will either. This is not a KDE-only problem -- if a problem at all.

We will probably cache addresses to an extent. That might help mitigate the problem by turning down the number of queries. But we don't want to reimplement the DNS protocol, nor do we want to assume that everyone uses DNS exclusively -- and your example of /etc/hosts shows that most people don't. So we can't cache the DNS TTL.

If you want to cache, you install lwresd, nscd or a local caching named. Those programs are designed to do DNS and caching. KDE and kdelibs aren't. Those programs will work for your whole system or even your whole network. Kdelibs can't do that -- even a kded module will only go as far as KDE-wide.

In fact, I highly recommend lwresd.
Comment 49 Gregory Stark 2004-12-11 06:21:28 UTC
> Yes, there is. How do you tell it there is no IPv4 address? Same answer for 
> IPv6: an invalid address, not routed. In other words, comment #46, the one
> right above your post. Have you read it? 

But I don't _want_ an invalid address. There's a perfectly valid IPv4 address.

Let me put it another way. There are two databases. One database has higher priority than the other. We want to look up our key in the first database and return what we find there. If there's nothing then we want to look up our key in the second database and return what we find there.

I've added an entry into the first database. That's what should be returned. Not some amalgam of records from the first database plus records from the second database. And since there's no such thing as "negative" records I can add to the first database there's no way for me to prevent this mixing.

The whole point of having the two databases -- at least in this instance -- is to be able to avoid lookups on the second database if there are local customizations. Doing two independent lookups means there's no way to have certain types of local customizations (namely IPv4 records) that avoid the lookups in the DNS.

You say it's glibc's fault, but I think you're wrong. I just tested the following program:
<pre>
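#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>

/* Resolve argv[1] with a single standard getaddrinfo() call. */
int main(int argc, char *argv[]) {
  struct addrinfo *ai = NULL;
  int err;

  if (argc < 2)
    return 2;
  err = getaddrinfo(argv[1], NULL, NULL, &ai);
  if (err != 0) {
    fprintf(stderr, "%s\n", gai_strerror(err));
    return 1;
  }
  freeaddrinfo(ai);
  return 0;
}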
</pre>

And passed it names that are in my /etc/hosts with IPv4 addresses but no IPv6 addresses and they do _not_ cause AAAA queries, nor any DNS queries at all in fact. It behaves exactly as I'm suggesting is the correct behaviour.

Of course using the resolver at a lower level you can cause it to perform such queries but then I think you're using it wrong. You're performing two independent queries, one for IPv6 addresses and one for IPv4 addresses. Which has different semantics than doing one standard getaddrinfo() query.

And no, adding caching is not an acceptable work-around for broken software doing unnecessary queries. You're absolutely right that software should not cache DNS information, that's the responsibility of the DNS server (and if you insist, abominations like nscd). However neither should applications resolve the same name repeatedly *in a single operation*. Applications should resolve the name once, and then use the information for the extent of a single operation. That's not caching, that's just sane coding. Doing otherwise leads to strange inconsistent behaviour. I understand though from what I read above why it's not so easy given the modular kio design.
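
For anyone who wants to reproduce the comparison, the two lookup styles can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the kdelibs resolver: `count_addresses` is a hypothetical helper that only wraps the standard `getaddrinfo()` call.

```c
/* Ask for the POSIX declarations unless the build already chose a level. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: resolve `host` for one address family and return
 * the number of addresses found, or -1 if getaddrinfo() fails.
 * `flags` is passed through as ai_flags (for example AI_NUMERICHOST). */
int count_addresses(const char *host, int family, int flags)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;  /* avoid one entry per socket type */
    hints.ai_flags = flags;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}
```

Here `count_addresses(name, AF_UNSPEC, 0)` corresponds to the single standard query, while calling it once with `AF_INET` and once with `AF_INET6` corresponds to the two independent per-family queries. Running each variant under tcpdump should show which DNS queries it actually sends on a given system.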
Comment 50 Gregory Stark 2004-12-11 06:40:49 UTC
Consider the case of a host name with IPv6 records in DNS, where I want to override those records with an IPv4 record and no IPv6 record. In the status quo there is _no_ way to do this.

The IPv6 record from DNS has higher priority than the IPv4 record from /etc/hosts. I think that's bogus.
Comment 51 David Faure 2004-12-11 12:25:23 UTC
AFAIK Waldo is planning on implementing caching of DNS results (maybe for KDE 4), see
his thread on kde-core-devel.

Comment 52 Thiago Macieira 2004-12-11 14:43:17 UTC
> Let me put it another way. There are two databases. One database has higher
> priority than the other. We want to look up our key in the first database
> and return what we find there. If there's nothing then we want to look up
> our key in the second database and return what we find there. 

That's more or less how it works. Except it's IPv6 that is the database with higher priority and IPv4 the one with lower, not hosts and dns.

Besides, we can't change that because it's simply not possible to tell glibc to look up only in hosts or in dns.

> I've added an entry into the first database. That's what should be returned.
> Not some amalgam of records from the first database plus records from the
> second database.

Unfortunately, that's how it works.

I've tested your program -- with a few modifications -- and you're right. glibc no longer sends IPv6 queries if a host is found in /etc/hosts. That's new behaviour, because it didn't use to work like that. And we're also talking about glibc here, but there are other resolver codes out there that I didn't test with.

The whole reason for doing two independent queries is to speed things up. The two queries are done in parallel, meaning the result should be returned at the same time. Using getaddrinfo to look up both addresses -- and it's possible with the current code, with a one-liner change -- would make the queries sequential and double the timeout (from one minute to two or more).

Unfortunately, that's the trade-off: parallel lookups against that bogus behaviour. I fully agree with you that files should override dns at any time. But we simply can't do that and have parallel lookups and have portable code. Our best bet is 2 out of those 3.

Now, as for this bug report, the only improvement it can see is for the IPv6 lookups to be tied to the presence of an IPv6 address in the machine. That feature will be implemented in the future, but mind you it will *still* not solve this bug, because it's not a bug in our code.

If you want to continue discussing improvements to the resolver code, please mail me directly. I welcome any inputs.
Comment 53 Gregory Stark 2004-12-11 18:57:04 UTC
> The whole reason for doing two independent queries is to speed things up. The
> two queries are done in parallel, meaning the result should be returned at 
> the same time. Using getaddrinfo to look up both addresses -- and it's 
> possible with the current code, with a one-liner change -- would make the 
> queries sequential and double the timeout (from one minute to two or more). 
 
> Unfortunately, that's the trade-off: parallel lookups against that bogus
> behaviour. I fully agree with you that files should override dns at any time. 
> But we simply can't do that and have parallel lookups and have portable code.
> Our best bet is 2 out of those 3. 

But you're choosing performance over correctness.

And I think the performance gain is questionable. A big part of my problem with this behaviour is precisely the performance impact of all those unnecessary DNS queries. And it is a lot of DNS queries, run tcpdump and you'll see hundreds of DNS queries for a single page. Even for things in /etc/hosts.

I think you're much better off lobbying the glibc folk to implement getaddrinfo with your suggested parallel query implementation than trying to roll your own resolver.
Comment 54 Gregory Stark 2004-12-11 19:01:20 UTC
I'm also skeptical that you're still looking to solve the problem with the redundant lookups using caching. Caching DNS results locally in the application is a bad practice. It's hard to get right and it's already handled by the DNS server. It's not a good work-around for excessive resolutions.

Consider what happens if two browser windows both try to load pages on a particular web site. The second browser window shouldn't use the cached information from the first one, it should do an independent lookup. The data may in fact have changed in the moment between the two lookups.

However all the activity to complete the operation in the first browser window should happen with the same resolver response. It shouldn't repeatedly resolve the same name for a single page rendering. That's just inefficient and in that case you don't _want_ to reflect DNS changes that happened in the middle of the operation. That would only make things weird and inconsistent if the new host serves different content than the old host.

This is different from caching. Caching bad; looking up information once per operation good.
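
As an illustration only, "once per operation" could look like the following C sketch. `struct operation`, `operation_addrs` and `operation_finish` are hypothetical names invented for this example, not kio or kdelibs APIs, and the sketch ignores threading.

```c
/* Ask for the POSIX declarations unless the build already chose a level. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical per-operation context (for example, one page load). */
struct operation {
    const char *host;        /* name this operation talks to */
    struct addrinfo *addrs;  /* result of the one lookup, or NULL */
    int resolved;            /* nonzero once the lookup was attempted */
    int lookups;             /* how many times the resolver was called */
};

/* Return the addresses for op->host. getaddrinfo() is called only on
 * the first request; later requests in the same operation reuse the
 * stored result (including a failed one, which stays NULL). */
struct addrinfo *operation_addrs(struct operation *op)
{
    if (!op->resolved) {
        struct addrinfo hints;

        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        op->resolved = 1;
        op->lookups++;
        if (getaddrinfo(op->host, NULL, &hints, &op->addrs) != 0)
            op->addrs = NULL;
    }
    return op->addrs;
}

/* Drop the result when the operation ends, so that the next operation
 * performs its own fresh lookup instead of reusing old data. */
void operation_finish(struct operation *op)
{
    if (op->addrs != NULL)
        freeaddrinfo(op->addrs);
    op->addrs = NULL;
    op->resolved = 0;
}
```

Nothing outlives the operation here, which is the distinction being drawn: a second browser window would start with its own `struct operation` and do its own lookup.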
Comment 55 Gregory Stark 2004-12-12 03:43:54 UTC
From personal email there still seems to be a bit of confusion over what's going on here.

What's going on is that you've gone to some effort to reimplement getaddrinfo's IPv6, IPv4 lookups yourself using separate lookups in order to implement a performance hack. You should always be suspicious when you find yourself expending substantial energy reimplementing standard system functions. In this case notice that this bug report on this performance hack was originally triggered by *performance* problems it caused.

In reimplementing this standard function you've caused the following consequences:

1) You got the semantics wrong.

2) Your applications will behave inconsistently compared to the rest of the system. When every application rewrites some basic system function it makes the system harder to manage and use.

3) For people with no IPv6 interfaces your implementation is *slower*. getaddrinfo properly doesn't query for network addresses for address families for which no interfaces are present.

4) For people with IPv4 hosts in /etc/hosts entries your system has security implications. It leaks information about the internal network usage to outside networks in the form of DNS traffic.

Considering that the original motivation was a performance hack, and the result for me is to make it *slower* and *buggy*, what I ask for is this:

You say it's a one-line change to make KDE use the standard getaddrinfo interface? Then please, make that at the very least an option, call it KDE_USE_STANDARD_RESOLVER or whatever. Better yet, make the standard function the default and add an option KDE_BROKEN_RESOLVER_PERFORMANCE_HACK to enable the parallel lookups.
Comment 56 Thiago Macieira 2004-12-12 06:07:32 UTC
> What's going on is that you've gone to some effort to reimplement
> getaddrinfo's IPv6, IPv4 lookups yourself using separate lookups in order to
> implement a performance hack.

We haven't reimplemented anything. We just call getaddrinfo twice.

> In this case notice that this bug report on this performance hack was
> originally triggered by *performance* problems it caused.

No. The problem in this bug report has been present long before this code was written. It used to happen when we called getaddrinfo once, normally, just like you want us to. The problem is that the buggy DNS servers do not reply for an AAAA query.

> 1) You got the semantics wrong.

I don't agree. I can't agree. How can calling getaddrinfo be wrong?

> 2) Your applications will behave inconsistently compared to the rest of the
> system.

Agreed. The bug is in glibc.

> 3) For people with no IPv6 interfaces your implementation is *slower*.

No. It is faster because the two lookups are done in parallel. The standard getaddrinfo call does them in sequence, so it is slower.

> getaddrinfo properly doesn't query for network addresses for address
> families for which no interfaces are present.

It has to. The fact that there are no IPv6 addresses on any interface doesn't mean IPv6 isn't supported. By simply doing socket(AF_INET6, SOCK_STREAM, 0) as a result of a getaddrinfo(3) returned value, an IPv6 address may be created and the network be set up.

Therefore, it's WRONG not to return IPv6 addresses based on interface configuration, unless asked to. There is, however, a non-standard, glibc-exclusive and, therefore, unportable switch to tell getaddrinfo to do that. It is not active by default, and it is very doubtful KDE will ever use that.

> 4) For people with IPv4 hosts in /etc/hosts entries your system has security
> implications. It leaks information about the internal network usage to
> outside networks in the form of DNS traffic.

glibc bug. Tell their maintainers about that.

> You say it's a one-line change to make KDE use the standard getaddrinfo
> interface?

It's a one-line change to make it use a sequential, one call lookup. Yes.

> Better yet, make the standard function the default and add an option
> KDE_BROKEN_RESOLVER_PERFORMANCE_HACK to enable it.

I will not, because I don't agree this is the best solution. I think the best solution is in the form of a simple patch to libnss_files. Not a one-liner, but probably very simple.

Let me summarise: the bug here is that some sites are running DNS servers that simply *drop* AAAA queries. The DNS resolution will be slow, no matter if the lookup is done sequentially or in parallel. The only solutions for that are: blacklisting those domains, or adding bogus entries in /etc/hosts. Mind you this affects Mozilla as well.

The rest is a *different* problem. Please either open another bug report, or talk to me in private.
Comment 57 Thiago Macieira 2004-12-12 06:13:21 UTC
About the flag to getaddrinfo, I was wrong. It is standardised: AI_ADDRCONFIG. It was introduced in RFC 3493, but was not present in RFC 2553.

But, as I said, it's not on by default. So getaddrinfo, by default, does IPv6 queries even if your computer doesn't have any IPv6 interfaces. Our old code had been written before RFC 3493, so there was no AI_ADDRCONFIG at the time.
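
As an illustration of opting in to that flag, the following Python sketch passes AI_ADDRCONFIG in the lookup hints when the platform defines it and falls back to no flag otherwise. The helper names `addrconfig_flags` and `resolve`, and the injectable `lookup` parameter, are assumptions made for this example and are not taken from KDE.

```python
import socket


def addrconfig_flags():
    """Return AI_ADDRCONFIG where the platform defines it, else 0 (no flag)."""
    return getattr(socket, "AI_ADDRCONFIG", 0)


def resolve(host, port, lookup=socket.getaddrinfo):
    """Single lookup across all families, restricted by AI_ADDRCONFIG if present.

    With the flag set, the resolver is asked to return only address families
    for which the local system has an address configured. Without it, the
    default behaviour (query every family) applies.
    """
    return lookup(
        host,
        port,
        family=socket.AF_UNSPEC,
        type=socket.SOCK_STREAM,
        flags=addrconfig_flags(),
    )
```

Calling `resolve("www.kde.org", 80)` would perform a real lookup; the flag only changes which families are queried, not how long an unanswered query takes.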
Comment 58 Germain Garand 2004-12-12 19:29:50 UTC
By the way, it seems matters have recently got better, at least for doubleclick.net.
AAAA query times are now decent on country-specific subdomains (ad.{uk,fr,de..}).
ad.doubleclick.net itself is still bogus.
Comment 59 Filip Vancoillie 2004-12-12 20:39:14 UTC
I'm the one who filed the bug report originally.  I don't have Linux 
running now because I'm doing an internship in another country.
So my comments are based on what I read in the mails for this bug 
and on some basic knowledge I have about networks, DNS, ...

1/ My question is that for loading one web page, DNS resolving should 
be necessary only once?  (Let's not consider links on a web page to 
things like images that are to be found on another domain)  
If dns is resolved more than once for www.luchtvaartnieuws.nl let's 
say, in case of a slow DNS-server response, this could slow down 
loading sites significantly, no?  This issue has nothing to do with the 
process of resolving itself...  It's only that the resolving happens 
several times, each time getting just the same DNS address.  Also the 
fact that it is resolving twice in parallel isn't very relevant...  

2/ Also I don't understand why a slow response of ad.doubleclick.net 
should make a website that includes a link to ad.doubleclick.net be 
loading slowly..  If any kind of query to ad.doubleclick.net is slow, 
the original site still can be loading in the meanwhile.

Maybe my first comment is the thing Thiago is suggesting to contact him 
on privately.  I cannot file a bug report because of lack of 
understanding the issue.  I suppose Gregory can do this better.
Anyway, forgive me the comments if I just don't understand it...  I 
don't know what an AAAA query means anyway.
I'm only trying to help make things clearer.  


chao
filip
(kde is a great project!)

Comment 60 Thiago Macieira 2004-12-12 21:21:47 UTC
> 1/ My question is that for loading one web page, DNS resolving should 
> be necessary only once?

Yes. But the internal KDE implementation may cause the resolution to be attempted 5 times, maybe more. That happens because we've got separate processes doing the HTTP connections. We have plans to improve that.

> 2/ Also I don't understand why a slow response of ad.doubleclick.net 
> should make a website that includes a link to ad.doubleclick.net be 
> loading slowly..  If any kind of query to ad.doubleclick.net is slow, 
> the original site still can be loading in the meanwhile.

There's a limit to the number of connections attempted simultaneously. So, yes, the rest of the site is loading in the background, but some things are held back. Besides, the ads are generally at the top of the page, and while there's no response for them, the site can't render, even if all the rest has been loaded.
Comment 61 Thiago Macieira 2005-01-04 16:28:09 UTC
*** Bug 96298 has been marked as a duplicate of this bug. ***
Comment 62 Marcel Partap 2005-01-12 13:16:11 UTC
The Register loads fast here (CVS); the others are slow, but that seems to be due to network performance. khtml can't render what's not there yet, and once it's there, rendering happens lightning fast as always ;==)
Comment 63 Thiago Macieira 2005-01-25 17:23:12 UTC
Here's the patch that adds the blacklist.



Created an attachment (id=9292)
resolver-blacklist.diff
Comment 64 Thiago Macieira 2005-01-25 18:58:05 UTC
CVS commit by thiago: 

Commit the fixes for buggy IPv6 DNS servers.
- adds a test for IPv6 support on the running host and will disable
  direct IPv6 lookups if IPv6 not supported
- tell getaddrinfo to use AI_ADDRCONFIG, if available
  (i.e., same behaviour)
- adds support for hostname blacklisting. The file will be
  $KDEDIRS:$KDEHOME/share/config/ipv6blacklist, but it's a *full*
  blacklist, not just IPv6

We're now collecting hostnames served by buggy DNS servers. Please
mail me directly (thiago.macieira@kdemail.net) with those addresses.

Future improvement: make hasIPv6 cache its result, in a thread-safe
manner. Requires compiler thread-safe initialisation of static
variables.

BUG:70363
CCMAIL:kde-core-devel@kde.org


  M +107 -6    kresolverstandardworkers.cpp   1.15
  M +19 -0     kresolverstandardworkers_p.h   1.8
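
The host IPv6 test and the "cache its result, in a thread-safe manner" improvement mentioned in this commit message can be pictured with the Python sketch below. How the committed C++ code actually performs the test is not shown in this report; the sketch assumes that attempting to create an AF_INET6 socket is a sufficient probe, and `probe_ipv6`, `CachedFlag` and `has_ipv6` are hypothetical names.

```python
import socket
import threading


def probe_ipv6():
    """Assumed probe: IPv6 is usable if an AF_INET6 socket can be created."""
    if not socket.has_ipv6:  # Python was built without IPv6 support
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0):
            return True
    except OSError:
        return False


class CachedFlag:
    """Evaluate a boolean probe at most once, safely across threads."""

    def __init__(self, probe):
        self._probe = probe
        self._lock = threading.Lock()
        self._value = None  # None means "not probed yet"

    def get(self):
        # The lock ensures concurrent callers never run the probe twice.
        with self._lock:
            if self._value is None:
                self._value = bool(self._probe())
            return self._value


has_ipv6 = CachedFlag(probe_ipv6)
```

A resolver could consult `has_ipv6.get()` before issuing a direct AAAA lookup and skip it when the result is false.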



Comment 65 colesen 2005-01-26 06:21:45 UTC
What about KDE_NO_IPV6? I think SuSE 9.2 with the latest KDE has KDE_NO_IPV6=true by default.
Comment 66 Thiago Macieira 2005-01-26 10:29:49 UTC
That is no solution. It's a horrible workaround that kills the symptoms in most cases. It's also the antithesis of what I've been striving to do. See comment #17.

As for what SuSE is doing, KDE has no say in it. Our default environment doesn't have that switch. If SuSE's has, there's nothing we can do.
Comment 67 Filip Vancoillie 2005-02-17 23:06:04 UTC
A small question on the solution you implemented.  When a server changes
to support IPv6, it should be removed from the blacklist..?  Or is this
never going to cause inconvenience..  Or am I not understanding it.  :-)

chao
Filip

Comment 68 Thiago Macieira 2005-02-18 05:12:08 UTC
This is to be used ONLY when sites use buggy DNS servers, which is the major cause of slowness in webpage loading. As soon as they start using decent servers, they can be removed.

Being in the list has no effect otherwise.
Comment 69 Thiago Macieira 2005-04-24 18:02:45 UTC
*** Bug 104468 has been marked as a duplicate of this bug. ***
Comment 70 Thiago Macieira 2005-08-02 04:47:24 UTC
SVN commit 442249 by thiago:

Importing the first ipv6blacklist file. All the domains listed in this
file will do only IPv4 resolution. This is a workaround for broken DNS
servers out there and wouldn't be needed if admins were all competent
and used proper software.

When adding a domain to this file, please note the bug report that
indicated the brokenness. Also, all domains in this file must be
rechecked periodically.

doubleclick.net: Bug #70363, also blacklisted in Firefox
linebourse.fr: Bug #109984
banquepopulaire.fr: Bug #109984

CCBUG:70363
BUG:109984


 M  +3 -0      Makefile.am  
 A             ipv6blacklist  


--- branches/KDE/3.5/kdelibs/kdecore/network/Makefile.am #442248:442249
@@ -50,5 +50,8 @@
 	ksocketbuffer_p.h \
 	syssocket.h
 
+configdir = $(kde_confdir)
+config_DATA = ipv6blacklist
+
 # let automoc handle all of the meta source files (moc)
 METASOURCES = AUTO
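
The behaviour this commit message describes (domains in the ipv6blacklist file get IPv4-only resolution) could look roughly like the Python sketch below. The actual file format and matching rules are not shown in this report, so the sketch assumes one domain per line with optional '#' comments and assumes that subdomains of a listed domain are also covered; all function names are hypothetical.

```python
import socket

# The three domains named in the commit message; the file layout is assumed.
SAMPLE = "doubleclick.net\nlinebourse.fr\nbanquepopulaire.fr\n"


def parse_blacklist(text):
    """Parse one domain per line; blank lines and '#' comments are skipped."""
    domains = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower().rstrip(".")
        if entry:
            domains.add(entry)
    return domains


def is_blacklisted(host, domains):
    """True if host equals a listed domain or is a subdomain of one (assumed rule)."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def families_for(host, domains):
    """Blacklisted hosts get IPv4-only resolution; everything else gets both."""
    if is_blacklisted(host, domains):
        return [socket.AF_INET]
    return [socket.AF_INET, socket.AF_INET6]
```

For example, `families_for("ad.doubleclick.net", parse_blacklist(SAMPLE))` yields only `socket.AF_INET`, so no AAAA query would be sent for that host.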
Comment 71 Tommi Tervo 2005-10-03 15:22:28 UTC
*** Bug 87690 has been marked as a duplicate of this bug. ***
Comment 72 Tommi Tervo 2006-11-28 09:58:27 UTC
*** Bug 135488 has been marked as a duplicate of this bug. ***