Bug 68894 - Cache failed connections
Summary: Cache failed connections
Status: RESOLVED FIXED
Alias: None
Product: kio
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: LO wishlist
Target Milestone: ---
Assignee: Thiago Macieira
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-11-23 23:45 UTC by Hasso Tepper
Modified: 2018-04-23 18:27 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Hasso Tepper 2003-11-23 23:45:42 UTC
Version:            (using KDE Devel)
Installed from:    Compiled sources
OS:          Linux

There are hosts out there which have both IPv4 and IPv6 addresses and both A and AAAA entries in DNS. Mozilla's behavior: attempt IPv6 first and, if that fails, fall back to IPv4. It also remembers that IPv6 failed and doesn't attempt it again during the current session. Konqueror tries to connect to the IPv6 address only. If that fails, it gives an error message (timeout) and doesn't fall back to IPv4.

Example: http://www.yzis.org. It has both A and AAAA entries, but there is no webserver listening on the IPv6 address (not sure if it's temporary or not).
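A minimal sketch of the fallback policy attributed to Mozilla above, written in Python for illustration only. It assumes a hypothetical `try_connect(address)` callable that returns True on success, represents addresses as hypothetical `("inet6", ...)` / `("inet", ...)` pairs, and assumes the remembered IPv6 failure is a single session-wide flag. It is not Mozilla's or Konqueror's actual code.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical representation: (family, address), e.g. ("inet6", "2001:db8::1")
Address = Tuple[str, str]


class SessionFallback:
    """Try IPv6 first, fall back to IPv4, and remember an IPv6 failure for the session."""

    def __init__(self, try_connect: Callable[[Address], bool]) -> None:
        self.try_connect = try_connect   # hypothetical connect primitive
        self.ipv6_failed = False         # session-wide memory (assumption)

    def connect(self, addresses: Iterable[Address]) -> Optional[Address]:
        # IPv6 entries first; sorted() is stable, so order within a family is kept.
        ordered = sorted(addresses, key=lambda a: 0 if a[0] == "inet6" else 1)
        for addr in ordered:
            if addr[0] == "inet6" and self.ipv6_failed:
                continue                 # IPv6 already failed this session: skip it
            if self.try_connect(addr):
                return addr
            if addr[0] == "inet6":
                self.ipv6_failed = True  # remember the failure for later connects
        return None                      # every address failed
```

On the second `connect()` call after an IPv6 failure, only the IPv4 addresses are attempted, which is the "doesn't attempt it any more" part of the description.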
Comment 1 Thiago Macieira 2003-11-24 20:42:23 UTC
The current behaviour is as you want it to be, even though we don't cache failed IPv6 addresses.

The problem with the timeout is that it's set to 20-30 seconds for HTTP, while one single failed connect(2) attempt takes 60 seconds to time out. Therefore, it will never get to the second or following entries. And since IPv6 always comes before IPv4, if NONE of the IPv6 addresses reply, then the site is as good as dead.
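The timeout arithmetic can be checked with a tiny model, sketched here under the assumption that every address is silent, so each connect attempt hangs for the full per-attempt timeout. The function name is hypothetical; it counts how many addresses a sequential connect loop gets to start before the overall (e.g. HTTP) timeout expires.

```python
def addresses_attempted(overall_timeout: float,
                        per_attempt_timeout: float,
                        n_addresses: int) -> int:
    """Number of addresses a sequential connect loop starts before the overall
    timeout, assuming each attempt hangs until its own timeout (no reply)."""
    if per_attempt_timeout <= 0:
        raise ValueError("per-attempt timeout must be positive")
    attempted = 0
    elapsed = 0.0
    while attempted < n_addresses and elapsed < overall_timeout:
        attempted += 1                   # start the next address...
        elapsed += per_attempt_timeout   # ...which hangs until it times out
    return attempted
```

With the numbers from this comment (30 s overall, 60 s per silent connect(2)), `addresses_attempted(30, 60, 2)` is 1: only the first (IPv6) address is ever tried. Splitting the budget, e.g. a 15 s per-attempt limit for two addresses, lets both be tried.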

Also note here that I'm not talking about webservers not running. I'm talking about addresses that don't reply at all, such as this one. This is a very serious admin fault, because he placed an invalid IP in his DNS pool.

I've just tried Mozilla 1.4 and Konqueror on that site: Konqueror bailed out after 30 seconds, but Mozilla spent two or three minutes doing nothing, and only then displayed the site. Konqueror can behave the same way if you increase your (global!) timeout values to more than 3 minutes. The only problem there is that multiple kio_http processes will be launched, meaning that the time for the page to load would be around 6 minutes.

As I said, this is a very serious admin fault.
Comment 2 Mickael Marchand 2003-11-25 01:41:24 UTC
yzis.org should be back on ipv6 now.
That was temporary, following a hard drive crash and a reinstallation of the box.

As far as I understand IPv6, Thiago is fully right... but I did not think of it before ;)

cheers,
Mik
Comment 3 Thiago Macieira 2003-11-25 02:56:16 UTC
Yep, I can contact it now.

I'm leaving this report open as a wishlist item for the future. It might be interesting to cache failed connections for a while in the io-master (so that all slaves of one application benefit from it).
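A hypothetical sketch of such a cache: a negative cache with a time-to-live, keyed by (host, port, address). Where it would live (a single instance in the application-side io-master, consulted by every slave) is an assumption taken from the wish above, and the class and method names are illustrative only, not KIO API. The clock is injectable so expiry can be tested.

```python
import time
from typing import Callable, Dict, Hashable


class FailedConnectionCache:
    """Remember recently failed connection targets for ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expiry: Dict[Hashable, float] = {}  # key -> time the entry expires

    def mark_failed(self, key: Hashable) -> None:
        """Record a failure, e.g. key = (host, port, address)."""
        self._expiry[key] = self.clock() + self.ttl

    def recently_failed(self, key: Hashable) -> bool:
        """True while the failure is fresh; expired entries are dropped."""
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[key]  # stale: allow the address to be retried
            return False
        return True
```

A slave would skip any address for which `recently_failed()` is true and call `mark_failed()` after a refused or timed-out attempt. The TTL keeps a temporary outage (like the yzis.org one) from being remembered forever.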

We've also had a problem with round-robin FTP servers whose contents were not entirely identical. That's obviously a site bug, but it might be interesting as well.
Comment 4 Hasso Tepper 2003-11-28 09:04:34 UTC
Subject: Re:  Fallback to IPv4 if IPv6 fails

> The current behaviour is as you want it to be, even though
> we don't cache failed IPv6 addresses.

I'm not sure.

> The problem with the timeout is that it's set to 20-30 seconds for
> HTTP, while one single failed connect(2) attempt takes 60 seconds
> to time out. Therefore, it will never get to the second or following
> entries. And since IPv6 always comes before IPv4, if NONE of the
> IPv6 addresses reply, then the site is as good as dead.

Should we change default timeouts?

And there is more than that. What happens if we get host/port 
unreachable or something like this from IPv6? Mozilla tries IPv4 after 
that, Konq doesn't. It's easy to test by putting a bogus IPv6 address 
for slashdot.org, for example, in the /etc/hosts file.

I searched for documentation on how applications should behave in such 
situations and found RFC 3338, section 5.5.

And more on this topic. Even if the behavior stays the same as it is, we 
should give the user more feedback about what's going on. At the moment 
it's easy to get confused: "why does it work from this machine but not 
from another one?" The error messages give you no information about what 
might be wrong.

Comment 5 Thiago Macieira 2003-11-28 13:11:19 UTC
> And there is more than that. What happens if we get host/port 
> unreachable or something like this from IPv6? Mozilla tries IPv4 after 
> that, Konq doesn't.  
 
Yes it does. Haven't you ever tried?

ftp://localhost on Konqueror generates:
kdecore (KSocket): Starting connect to localhost|21: have 0 local entries and 2 remote
kdecore (KSocket): Trying to connect to [::1]:21
kdecore (KSocket): Socket 8 did not connect: Connection refused
kdecore (KSocket): Trying to connect to 127.0.0.1:21
[then it connects]

http://norway.local.lan from Konqueror:
kdecore (KSocket): Trying to connect to [fec0::8000:200:21ff:fe69:43a7]:80
kdecore (KSocket): Socket 9 did not connect: Connection refused
kdecore (KSocket): Trying to connect to 172.26.0.3:80
[then it connects]

By the way, http://localhost does:
kdecore (KSocket): Starting connect to localhost|80: have 0 local entries and 3 remote
kdecore (KSocket): Trying to connect to [::1]:80
[then it connects]

> It's easy to test by putting a bogus IPv6 address 
> for slashdot.org, for example, in the /etc/hosts file.

Yeah, sure. Put a bogus IPv4 address in there and let's see what happens. It's no different. And as I said, it doesn't work because the connection timeout expires before any other addresses are tried.

> Even if the behavior stays the same as it is, we 
> should give the user more feedback about what's going on.

That much I agree with you.

I'll provide a connectingTo(QSocketAddress) signal from QClientSocketBase. That signal will have to be connected to a method in KIO::SlaveBase which would tell the application what it is doing.
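The signal-and-slot idea can be sketched in Python, assuming a minimal hand-rolled `Signal` class in place of Qt's, a hypothetical `ClientSocket` that emits `connecting_to` before each attempt, and a hypothetical status slot standing in for whatever KIO::SlaveBase method would relay the text to the application. None of these names are the real KDE classes.

```python
from typing import Callable, List, Optional, Sequence


class Signal:
    """Minimal stand-in for a Qt-style signal: a list of connected slots."""

    def __init__(self) -> None:
        self._slots: List[Callable[[str], None]] = []

    def connect(self, slot: Callable[[str], None]) -> None:
        self._slots.append(slot)

    def emit(self, value: str) -> None:
        for slot in self._slots:
            slot(value)


class ClientSocket:
    """Hypothetical client socket that announces every address it tries."""

    def __init__(self, try_connect: Callable[[str], bool]) -> None:
        self.connecting_to = Signal()      # plays the role of connectingTo(...)
        self._try_connect = try_connect    # hypothetical connect primitive

    def connect(self, addresses: Sequence[str]) -> Optional[str]:
        for addr in addresses:
            self.connecting_to.emit(addr)  # tell listeners before the attempt
            if self._try_connect(addr):
                return addr
        return None
```

Connecting a slot such as `lambda a: messages.append(f"Connecting to {a}")` would let the user see "[::1]:21" followed by "127.0.0.1:21" in the ftp://localhost case above, instead of only a final error.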

That's for KDE 3.3.
Comment 6 Hasso Tepper 2003-11-28 15:02:39 UTC
Hmmm. Now I see your point, but the problem is that it DOESN'T (and never did) work for me. I added a bogus entry for slashdot.org to my /etc/hosts file:

kdecore(KSocket): Trying to connect to Inet6 2009::1 port 80
kdecore(KSocket): Socket 7 did not connect: Network is unreachable
kdecore(KSocket): Failed to connect
kio (KRun): ERROR: 0x834c980 ERROR 23 Ühenduse loomine serverisse www.slashdot.org ebaonnestus
[Estonian for: "Establishing a connection to server www.slashdot.org failed"]

It's HEAD compiled about week ago. Any ideas?
Comment 7 Thiago Macieira 2003-11-28 22:14:24 UTC
That's the name resolution code. It's not my fault. Bug the glibc people then.

Anyway, the new resolver code does this:
QGetHostByNameWorker::run() for [slashdot.org]:80: wantV6 = 1, wantV4 = 1
ResolveThread::run(): started threaded gethostbyname for slashdot.org (af = 2)
ResolveThread::run(): started threaded gethostbyname for slashdot.org (af = 10)
ResolveThread::run(): gethostbyname for slashdot.org (af = 10) returned: 0
ResolveThread::run(): gethostbyname for slashdot.org (af = 2) returned: 0
QStandardWorker::processResults: adding 66.35.250.150:80
QStandardWorker::processResults: adding [2009::1]:80
[...]
kdecore (KSocket): Starting connect to slashdot.org|80: have 0 local entries and 2 remote
kdecore (KSocket): Trying to connect to [2009::1]:80
kdecore (KSocket): Socket 9 did not connect: Network is unreachable
kdecore (KSocket): Trying to connect to 66.35.250.150:80
[connected]
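The resolver log shows two lookups running in parallel threads, one per address family (on Linux, af = 2 is AF_INET and af = 10 is AF_INET6), whose results are merged and then tried IPv6-first. A rough Python sketch of that shape, using `socket.getaddrinfo` restricted to one family per thread as a stand-in for the threaded gethostbyname calls; this is an illustration, not the KDE resolver code.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def _lookup(host: str, port: int, family: int) -> List[Tuple[int, tuple]]:
    """One blocking lookup limited to a single address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # no records for this family (or lookup failed): add nothing
    return [(fam, sockaddr) for fam, _type, _proto, _canon, sockaddr in infos]


def resolve_both(host: str, port: int) -> List[Tuple[int, tuple]]:
    """Resolve IPv4 and IPv6 in parallel threads, then merge IPv6-first."""
    families = [socket.AF_INET, socket.AF_INET6]
    with ThreadPoolExecutor(max_workers=len(families)) as pool:
        futures = [pool.submit(_lookup, host, port, fam) for fam in families]
        results = [entry for f in futures for entry in f.result()]
    # Stable sort: IPv6 entries first, original order kept within each family.
    results.sort(key=lambda entry: 0 if entry[0] == socket.AF_INET6 else 1)
    return results
```

A failing lookup for one family simply contributes no entries, so a host with only A records (or only AAAA records) still resolves.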

It's not HEAD. It's the a branch, for KDE 3.3.

And, as I said, it's the exact same behaviour as an invalid IPv4 address. The socket connection code doesn't even know it's trying an IPv6 address. I made the code completely agnostic to the IP version or even the Address Family.
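The same family-agnostic idea is what Python's `socket.create_connection()` does: it walks the list returned by getaddrinfo and tries each entry in turn. An explicit version of that loop is sketched below; the function name and the 5-second per-attempt default are assumptions for illustration. Nothing in it branches on IPv4 versus IPv6, and each attempt gets its own timeout so a silent first address cannot consume the whole budget.

```python
import socket
from typing import Optional


def connect_any(host: str, port: int,
                per_attempt_timeout: float = 5.0) -> socket.socket:
    """Try every resolved address in order, whatever its family,
    and return the first socket that connects."""
    last_error: Optional[OSError] = None
    for family, type_, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, type_, proto)
        sock.settimeout(per_attempt_timeout)  # bound each attempt separately
        try:
            sock.connect(sockaddr)
            sock.settimeout(None)             # back to blocking mode for the caller
            return sock
        except OSError as exc:                # refused, unreachable, timed out...
            last_error = exc
            sock.close()
    raise last_error or OSError(f"no addresses found for {host!r}")
```

With "localhost" resolving to [::1] and 127.0.0.1 and a server listening only on IPv4, the first attempt is refused and the second succeeds, matching the ftp://localhost log above.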
Comment 8 Thiago Macieira 2004-08-11 00:48:41 UTC
The code I mentioned in comment #7 has since been included in KDE HEAD and was released in KDE 3.3 RC2. You may want to re-test this bug.

As for the caching of non-responsive or negative-responding sites in the iomaster, see Bug #63088.
Comment 9 Thiago Macieira 2005-03-04 03:37:39 UTC
*** Bug 100777 has been marked as a duplicate of this bug. ***