Bug 162600 - KIO::TCPSlaveBase::connectToHost nearly always fails. makes konqueror and friends completely unusable
Status: RESOLVED FIXED
Alias: None
Product: kio
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR major
Target Milestone: ---
Assignee: Thiago Macieira
Duplicates: 155157 166366 168619 168921 171230 176576
Reported: 2008-05-25 15:45 UTC by Armin Berres
Modified: 2011-05-20 03:45 UTC

Attachments
Separated host lookup and connectToHost in TCPSlaveBase. (1.15 KB, patch)
2008-05-26 21:30 UTC, Roland Harnau
Separated host lookup and connectToHost in TCPSlaveBase. (1.15 KB, patch)
2008-05-26 22:35 UTC, Roland Harnau
Separated host lookup and connectToHost in TCPSlaveBase. (1.15 KB, patch)
2008-05-26 22:35 UTC, Roland Harnau
Traffic dump (29.11 KB, text/plain)
2008-07-12 20:43 UTC, Thomas McGuire
complete verbose dump (25.93 KB, text/plain)
2008-07-13 18:22 UTC, Robin Knapp
dump in pcap format (2.44 KB, application/octet-stream)
2008-07-13 18:32 UTC, Robin Knapp
Traffic dump with tcpdump while trying to connect to www.test.de with Konqueror 4 (1.59 KB, application/octet-stream)
2008-07-19 12:25 UTC, Thomas McGuire
Traffic dump with tcpdump while trying to connect to www.test.de with Konqueror 3 (1.14 KB, application/octet-stream)
2008-07-19 12:25 UTC, Thomas McGuire
pcap trace of the FB 7170, fw 29.04.59, AAAA query (1.44 KB, application/octet-stream)
2008-10-23 14:44 UTC, Matthias Raffelsieper
DNS with FritzBox as DNS-Proxy (110.57 KB, application/octet-stream)
2008-10-23 19:58 UTC, Thomas Schuetz
Direct DNS request to a DNS server on the internet (282.35 KB, application/octet-stream)
2008-10-23 19:59 UTC, Thomas Schuetz
DNS request with IPv6 disabled (302.30 KB, application/octet-stream)
2008-10-23 20:00 UTC, Thomas Schuetz

Description Armin Berres 2008-05-25 15:45:20 UTC
Version:            (using Devel)
Installed from:    Compiled sources
OS:                Linux

Konqueror, akregator and ktorrent nearly always fail to connect to the remote server when using an FQDN. When using an IP address instead, things work better.

With "KDE_FORK_SLAVES=true" I always see a similar error message.
Here is an example from konqueror:
###########
kio_http(9761)/kio_http_debug HTTPProtocol::get: "http://heise.de/"                                                                                           
kio_http(9761)/kio_http_debug HTTPProtocol::checkRequestUrl: "http://heise.de/"                                                                               
kio_http(9761)/kio_http_debug HTTPProtocol::resetSessionSettings: Using proxy: false URL:  "" Realm:  ""                                                      
kio_http(9761)/kio_http_debug HTTPProtocol::resetSessionSettings: Enable Persistent Proxy Connection:  false                                                  
kio_http(9761)/kio_http_debug HTTPProtocol::resetSessionSettings: Window Id = "65011713"                                                                      
kio_http(9761)/kio_http_debug HTTPProtocol::resetSessionSettings: ssl_was_in_use = ""                                                                         
kio_http(9761)/kio_http_debug HTTPProtocol::retrieveContent:                                                                                                  
kio_http(9761)/kio_http_debug HTTPProtocol::retrieveHeader:                                                                                                   
kio_http(9761)/kio_http_debug HTTPProtocol::httpOpen:                                                                                                         
kio_http(9761)/kio_http_debug HTTPProtocol::isOffline: networkstatus <unreachable>                                                                            
kio_http(9761)/kio_http_debug HTTPProtocol::httpCheckConnection: Keep Alive: true First: false                                                                
kio_http(9761)/kio_http_debug HTTPProtocol::httpOpen: Calling checkCachedAuthentication                                                                       
kio_http(9761)/kio (kioslave) KIO::SlaveBase::checkCachedAuthentication: window = 65011713 url = KUrl("http://heise.de/")                                     
kio_http(9761) HTTPProtocol::httpOpen: ============ Sending Header:                                                                                           
kio_http(9761) HTTPProtocol::httpOpen: "GET / HTTP/1.1"                                                                                                       
kio_http(9761) HTTPProtocol::httpOpen: "Connection: Keep-Alive"                                                                                               
kio_http(9761) HTTPProtocol::httpOpen: "User-Agent: Mozilla/5.0 (compatible; Konqueror/4.0; Linux) KHTML/4.0.74 (like Gecko)"                                 
kio_http(9761) HTTPProtocol::httpOpen: "Accept: text/html, image/jpeg, image/png, text/*, image/*, */*"                                                       
kio_http(9761) HTTPProtocol::httpOpen: "Accept-Encoding: x-gzip, x-deflate, gzip, deflate"                                                                    
kio_http(9761) HTTPProtocol::httpOpen: "Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5"                                                                          
kio_http(9761) HTTPProtocol::httpOpen: "Accept-Language: en-US, en"                                                                                           
kio_http(9761) HTTPProtocol::httpOpen: "Host: heise.de"                                                                                                       
kio_http(9761)/kio_http_debug HTTPProtocol::httpOpenConnection:                                                                                               
kio_http(9761)/kssl KIO::TCPSlaveBase::disconnectFromHost:                                                                                                    
kio_http(9761)/kssl KIO::TCPSlaveBase::connectToHost: before connectToHost: Socket error is 0 , Socket state is 0                                             
kio_http(9761)/kssl KIO::TCPSlaveBase::connectToHost: after connectToHost: Socket error is 0 , Socket state is 1                                              
konqueror(9740)/kdeui (KMainWindow) KMainWindow::saveMainWindowSettings: KMainWindow::saveMainWindowSettings  "Profile"                                       
kio_http(9761)/kssl KIO::TCPSlaveBase::connectToHost: after waitForConnected: Socket error is 6 , Socket state is 0 , waitForConnected returned  false        
kio_http(9761)/kio_http_debug HTTPProtocol::httpOpen: Couldn't connect, oopsie!                                                                               
kio_http(9761)/kio_http_debug HTTPProtocol::httpClose:                                                                                                        
kio_http(9761)/kio_http_debug HTTPProtocol::httpClose: keep alive ( 60 )
###########


And here is one from ktorrent:
###########
kio_http(7590)/kio_http_debug HTTPProtocol::reparseConfiguration:                                                                                             
kio_http(7590)/kio_http_debug HTTPProtocol::setHost: Hostname is now: "tracker.opensuse.org" ( "tracker.opensuse.org" )                                       
kio_http(7590)/kio_http_debug HTTPProtocol::get: "http://tracker.opensuse.org:6969/announce?peer_id=-KT31B2-aAQczbJuj2Za&port=6881&uploaded=0&downloaded=0&left=688128000&compact=1&numwant=100&key=398488384&event=started&info_hash=%96!%e9%f2%15%cdd%88%af%23%0fT%fa%1d%22H6%08%d2%03"                                   
kio_http(7590)/kio_http_debug HTTPProtocol::checkRequestUrl: "http://tracker.opensuse.org:6969/announce?peer_id=-KT31B2-aAQczbJuj2Za&port=6881&uploaded=0&downloaded=0&left=688128000&compact=1&numwant=100&key=398488384&event=started&info_hash=%96!%e9%f2%15%cdd%88%af%23%0fT%fa%1d%22H6%08%d2%03"                       
kio_http(7590)/kio_http_debug HTTPProtocol::resetSessionSettings: Using proxy: false URL:  "" Realm:  ""                                                      
kio_http(7590)/kio_http_debug HTTPProtocol::resetSessionSettings: Enable Persistent Proxy Connection:  false                                                  
kio_http(7590)/kio_http_debug HTTPProtocol::resetSessionSettings: Window Id = ""                                                                              
kio_http(7590)/kio_http_debug HTTPProtocol::resetSessionSettings: ssl_was_in_use = ""                                                                         
kio_http(7590)/kio_http_debug HTTPProtocol::retrieveContent:                                                                                                  
kio_http(7590)/kio_http_debug HTTPProtocol::retrieveHeader:                                                                                                   
kio_http(7590)/kio_http_debug HTTPProtocol::httpOpen:                                                                                                         
kio_http(7590)/kio_http_debug HTTPProtocol::isOffline: networkstatus <unreachable>                                                                            
kio_http(7590)/kio_http_debug HTTPProtocol::httpCheckConnection: Keep Alive: true First: false                                                                
kio_http(7590)/kio_http_debug HTTPProtocol::httpOpen: Calling checkCachedAuthentication                                                                       
kio_http(7590)/kio (kioslave) KIO::SlaveBase::checkCachedAuthentication: window = 0 url = KUrl("http://tracker.opensuse.org:6969/announce?peer_id=-KT31B2-aAQczbJuj2Za&port=6881&uploaded=0&downloaded=0&left=688128000&compact=1&numwant=100&key=398488384&event=started&info_hash=%96!%e9%f2%15%cdd%88%af%23%0fT%fa%1d%22H6%08%d2%03")
kio_http(7590) HTTPProtocol::httpOpen: ============ Sending Header:
kio_http(7590) HTTPProtocol::httpOpen: "GET /announce?peer_id=-KT31B2-aAQczbJuj2Za&port=6881&uploaded=0&downloaded=0&left=688128000&compact=1&numwant=100&key=398488384&event=started&info_hash=%96!%e9%f2%15%cdd%88%af%23%0fT%fa%1d%22H6%08%d2%03 HTTP/1.1"
kio_http(7590) HTTPProtocol::httpOpen: "Connection: Keep-Alive"
kio_http(7590) HTTPProtocol::httpOpen: "User-Agent: KTorrent/3.1beta2"
kio_http(7590) HTTPProtocol::httpOpen: "Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2"
kio_http(7590) HTTPProtocol::httpOpen: "Accept-Encoding: x-gzip, x-deflate, gzip, deflate"
kio_http(7590) HTTPProtocol::httpOpen: "Host: tracker.opensuse.org:6969"
kio_http(7590)/kio_http_debug HTTPProtocol::httpOpenConnection:
kio_http(7590)/kssl KIO::TCPSlaveBase::disconnectFromHost:
kio_http(7590)/kssl KIO::TCPSlaveBase::connectToHost: before connectToHost: Socket error is 6 , Socket state is 0
kio_http(7590)/kssl KIO::TCPSlaveBase::connectToHost: after connectToHost: Socket error is 6 , Socket state is 1
kio_http(7590)/kssl KIO::TCPSlaveBase::connectToHost: after waitForConnected: Socket error is 6 , Socket state is 0 , waitForConnected returned  false
kio_http(7590)/kio_http_debug HTTPProtocol::httpOpen: Couldn't connect, oopsie!
###########

If you need more information or want me to debug something just ask.
Comment 1 Thibaut Cousin 2008-05-25 23:39:03 UTC
Is this related to the "Unknown error" that happens in Konqueror and KGet and was mentioned in bugreport #154774?

The conclusion in this bugreport is that the problem isn't in Konqueror, but deeper.
Comment 2 Armin Berres 2008-05-25 23:45:45 UTC
If I understood #kde-devel correctly this should be the same bug.
Does ktorrent work for you?

And what happens if you set "KDE_FORK_SLAVES=true"? Do you also find "kio_http(9761)/kssl KIO::TCPSlaveBase::connectToHost: after waitForConnected: Socket error is 6 , Socket state is 0 , waitForConnected returned  false" in the debug output?
Comment 3 Armin Berres 2008-05-26 00:56:18 UTC
So, Modestas Vainius found the problem in my wireshark logs:
Kio sends two DNS requests at nearly the same time. The router gets confused by this and answers with *one* reply on the port of the first request, carrying the transaction ID of the second packet. The resulting packet is invalid and is then discarded by glibc.

Now we got the reason, the question is just if there's a way to work around kio (or something in qt, or...) making two requests in a row.
Another issue is that kio makes a new dns request for each query and doesn't share the resolved IP between several kio instances. But that's another story...
Comment 4 Armin Berres 2008-05-26 02:50:35 UTC
So, here you can find the debug output of the Qt demo browser and konqueror with qtnetwork debugging information enabled: http://alioth.debian.org/~trigger-guest/the_broken_dns_case/
Seems as if they are both doing the same, but the qt browser succeeds, while konqueror fails. 
Who sees the important difference? I don't right now.
Comment 5 Armin Berres 2008-05-26 02:54:43 UTC
Oh, one thing: konqueror also can't load trolltech.com and the qt browser can load heise.de...
Comment 6 Christoph Feck 2008-05-26 03:09:09 UTC
The difference between the KDE and Qt logs from comment #4 is that KDE uses waitForConnected(), while Qt/WebKit uses some hostFound() slot. A quick peek into Qt reveals that waitForConnected() aborts the current DNS lookup and starts another one (cf. file qabstractsocket.cpp, line 1464 from qt-4.4.0 release)
Comment 7 Jure Repinc 2008-05-26 10:01:14 UTC
*** This bug has been confirmed by popular vote. ***
Comment 8 Roland Harnau 2008-05-26 21:30:44 UTC
Created attachment 24956 [details]
Separated host lookup and connectToHost in TCPSlaveBase.

waitForConnected()  indeed aborts a running asynchronous DNS lookup initiated
by connectToHost(QString host,...), looks up the host itself and eventually
runs into a Qt bug. The single QHostInfoAgent instance is apparently not
thread-safe due to improper locking of QHostInfoAgent::queries, but is
nevertheless accessed from two different threads: The one initiated by
connectToHost(..), and the main thread, when waitForConnected() tries to abort
the lookup.

Attached is a patch for TCPSlaveBase::connectToHost. The host lookup is done
via QHostInfo::fromName(), and then KTcpSocket::connectToHost(QHostAddress,..)
is called instead of KTcpSocket::connectToHost(QString host,..).
Comment 9 Thiago Macieira 2008-05-26 21:41:12 UTC
Please give me the details of the QHostInfo issues. Or, better yet, report them to qt-bugs@trolltech.com. I'll take care of fixing it there.
Comment 10 Christoph Feck 2008-05-26 21:51:47 UTC
There may be a small problem with the patch: It simply uses the first resolved IP, and never tries the others (if any). Maybe just a foreach over the list of returned IPs, until it finds an IP for which waitForConnected() does not return failure? Maybe try them all at once, and pick the first one that works? Maybe the lookup returns both the IPv6 and IPv4 addresses, and selecting the wrong one causes a further delay.

I cannot test the patch, as I do not run KDE from compiled sources. Does it work around the Fritz router problem? Armin? Daniel?
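
For illustration, the "foreach over the list of returned IPs" idea could look roughly like the sketch below. This is not the attached patch and not KIO code: connectToFirstReachable is a hypothetical helper, and the tryConnect callback stands in for "connect the socket to this address and wait for the result" (in KIO that would presumably be connectToHost(QHostAddress,..) followed by waitForConnected()).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not KIO code): walk the resolved addresses in order
// and return the index of the first one that tryConnect() reports as
// connected, or -1 if none of them worked.
int connectToFirstReachable(
    const std::vector<std::string>& addresses,
    const std::function<bool(const std::string&)>& tryConnect)
{
    assert(tryConnect && "a connect callback is required");
    for (std::size_t i = 0; i < addresses.size(); ++i) {
        if (tryConnect(addresses[i])) {
            return static_cast<int>(i);  // stop at the first address that works
        }
    }
    return -1;  // every candidate failed, or the list was empty
}
```

With a resolver that returns an unreachable IPv6 address first, the loop simply falls through to the next entry instead of giving up, at the cost of one failed connection attempt.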
Comment 11 Armin Berres 2008-05-26 21:54:30 UTC
I will test the patch in a minute.
Isn't it the job of the glibc to give you a random IP? I don't remember the details, but I thought this is the case.
The only problem is the ipv6 lookup btw. If this doesn't succeed an ipv4 lookup will happen afterwards.
Comment 12 Roland Harnau 2008-05-26 22:35:18 UTC
Created attachment 24959 [details]
Separated host lookup and connectToHost in TCPSlaveBase.

Comment 13 Roland Harnau 2008-05-26 22:35:45 UTC
Created attachment 24960 [details]
Separated host lookup and connectToHost in TCPSlaveBase.

Comment 14 Harri Porten 2008-05-27 00:37:40 UTC
Regarding the getaddrinfo() usage in qhostinfo_unix.cpp:

Recommendation to always set AI_ADDRCONFIG: http://people.redhat.com/drepper/linux-rfc3484.html
Patch applied to APR: http://marc.info/?l=apr-dev&m=105836879006735&w=2

But also note this not always having been effective: http://sources.redhat.com/ml/glibc-bugs/2007-06/msg00014.html
Comment 15 Armin Berres 2008-05-27 00:48:29 UTC
So people, we got news from #kde-devel.
It would be cool if someone could try the following workaround for crappy crappy routers:
Find 'hints.ai_family = PF_UNSPEC;' in src/network/kernel/qhostinfo_unix.cpp of your Qt copy and add the following in the next line: 'hints.ai_flags = AI_ADDRCONFIG;'.
Afterwards rebuild libQtNetwork. Then disable ipv6 for your box (e.g. add 'alias net-pf-10 off' in /etc/modprobe.d/aliases on Debian systems) and restart your computer.
Now Qt shouldn't emit any ipv6 DNS lookups and your router shouldn't go crazy anymore.
thiago wanted me to try this, but didn't comment so far if this will go upstream (but I guess there are no reasons not to add it).

But anyway kio shouldn't make two lookups at once and cache dns requests... ;)
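
For anyone who wants to see the effect of the flag outside of Qt, here is a minimal standalone sketch of the same hints setup. The function names makeLookupHints and lookupHost are made up for this example and do not appear in qhostinfo_unix.cpp; only the two hints assignments are taken from the workaround above.

```cpp
#include <cassert>
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Build getaddrinfo() hints as described in the workaround: any address
// family, but only families that are actually configured on this machine.
addrinfo makeLookupHints()
{
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_UNSPEC;     // the line already present in Qt
    hints.ai_flags = AI_ADDRCONFIG;  // the line added by the workaround
    return hints;
}

// Resolve `host` with those hints. Returns the getaddrinfo() status code;
// on success the caller must release *result with freeaddrinfo().
int lookupHost(const char* host, addrinfo** result)
{
    assert(host != nullptr && result != nullptr);
    const addrinfo hints = makeLookupHints();
    return getaddrinfo(host, nullptr, &hints, result);
}
```

With AI_ADDRCONFIG set, the resolver is supposed to skip AAAA queries when the machine has no IPv6 address configured, which is why IPv6 has to be disabled for the workaround to have any effect.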
Comment 16 Thiago Macieira 2008-05-27 11:48:15 UTC
I'm applying that fix to Qt 4.4.1.
Comment 17 Roland Harnau 2008-05-27 14:31:55 UTC
After rereading run() and fromName() in QHostInfoAgent the supposed Qt bug dissolves and is reduced to the fact that a call to waitForConnected() almost always fails in aborting a host lookup query triggered by connectToHost(), so that fromName() and eventually getaddrinfo is called at the same time from different threads and blocks under certain circumstances. 

The proposed hints.ai_flags = AI_ADDRCONFIG (or even hints.ai_family = AF_INET) solves the problem only partially, because parallel calls to getaddrinfo with a nonexistent name like "mumbel.grumbel" still block for a long time. So I think it is still better if TCPSlaveBase "serializes" calls to getaddrinfo, either by waiting for the connected() signal or by separating host lookup and connectToHost. After all, the Qt Demo Browser works fine with
hints.ai_family = AF_UNSPEC. The weak spot is its habit to post duplicates on bugs.kde.org ...
Comment 18 Thiago Macieira 2008-07-09 22:42:44 UTC
SVN commit 830140 by thiago:

Make IOSlaves based on TCPSlaveBase request DNS resolution via the
application. And make the application cache results for 5 minutes.

This should avoid the DNS request storm that happens when loading
webpages. Whereas this is completely normal and has been done for
years, apparently we're doing something different now that causes some
cheap routers to lock up or fail to respond.

Those defective routers should be replaced, but while they aren't, we
introduce a cache.

Patch by Roland Harneau <truthandprogress@googlemail.com>

BUG:162600
CCMAIL:<truthandprogress@googlemail.com>


 M  +1 -0      CMakeLists.txt  
 M  +2 -1      kio/global.h  
 A             kio/hostinfo.cpp   [License: LGPL]
 A             kio/hostinfo_p.h   [License: LGPL]
 M  +33 -0     kio/slavebase.cpp  
 M  +11 -0     kio/slavebase.h  
 M  +15 -0     kio/slaveinterface.cpp  
 M  +5 -2      kio/slaveinterface.h  
 M  +3 -1      kio/slaveinterface_p.h  
 M  +29 -17    kio/tcpslavebase.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=830140
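
To make the "cache results for 5 minutes" part concrete, a time-limited host cache can be sketched as below. This is an independent illustration, not the code from kio/hostinfo.cpp: the class name HostCache, its methods and the use of plain strings for addresses are all assumptions made for the example. The current time is passed in explicitly so the expiry logic is easy to follow.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache (not the KIO implementation): remembers the addresses
// of a host name for a fixed time-to-live, 5 minutes by default.
class HostCache {
public:
    using Clock = std::chrono::steady_clock;

    explicit HostCache(std::chrono::seconds ttl = std::chrono::minutes(5))
        : m_ttl(ttl)
    {
        assert(ttl.count() > 0);
    }

    void insert(const std::string& host, std::vector<std::string> addresses,
                Clock::time_point now)
    {
        m_entries[host] = Entry{std::move(addresses), now + m_ttl};
    }

    // Returns true and fills `out` on a fresh hit. Expired entries are
    // dropped so that the next request triggers a real lookup again.
    bool lookup(const std::string& host, Clock::time_point now,
                std::vector<std::string>& out)
    {
        auto it = m_entries.find(host);
        if (it == m_entries.end()) {
            return false;
        }
        if (now >= it->second.expires) {
            m_entries.erase(it);
            return false;
        }
        out = it->second.addresses;
        return true;
    }

private:
    struct Entry {
        std::vector<std::string> addresses;
        Clock::time_point expires;
    };

    std::chrono::seconds m_ttl;
    std::map<std::string, Entry> m_entries;
};
```

A cache like this only helps for repeated requests to the same host (images, stylesheets and so on from one page); the first visit to a new host still needs a real lookup.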
Comment 19 usa 2008-07-11 23:10:35 UTC
I'd like to reopen this bug.
It's unacceptable.

Every other browser works with the fritz boxes. Only konqueror 4 is ..
forget it.
Comment 20 Thiago Macieira 2008-07-12 00:45:59 UTC
Reopen? Why? Have you tested the patch I posted, from Roland? If it doesn't work, let us know!

Mind you that there's a big, recent DNS vulnerability found this week. This will require *most* DNS servers in the world to be replaced. You may want to start doing it now.
Comment 21 Thomas McGuire 2008-07-12 16:35:54 UTC
> Reopen? Why? Have you tested the patch I posted, from Roland? If it doesn't work, let us know!
Well, it didn't seem to work for me. I updated kdelibs, but still Konqueror was unusable for browsing because hostname resolving took so much time.

In the meantime, I've configured the DNS of my system to bypass my router and go directly to the DNS servers of my ISP, which fixes the problem.

If you want to, I can provide network traffic dumps with my old configuration.
Comment 22 Thiago Macieira 2008-07-12 16:45:32 UTC
If you have traffic dumps with the patch applied (KDE trunk, to-be-4.2, it's not in the 4.1 line *yet*), it will probably be useful.
Comment 23 Thomas McGuire 2008-07-12 20:37:40 UTC
*** Bug 155157 has been marked as a duplicate of this bug. ***
Comment 24 Thomas McGuire 2008-07-12 20:38:32 UTC
Reopening, as I can reproduce it on trunk in r831382.
Comment 25 Thomas McGuire 2008-07-12 20:43:54 UTC
Created attachment 26063 [details]
Traffic dump

This is a traffic dump of Konqueror trying to connect to www.test.de.
I only captured UDP port 53.
My kdelibs is trunk r831382, so it has the "DNS caching" already applied
(although I don't really see how caching helps, since 80% of the time you visit
a new webpage anyway)
Comment 26 Thomas McGuire 2008-07-13 12:36:11 UTC
*** Bug 166366 has been marked as a duplicate of this bug. ***
Comment 27 Robin Knapp 2008-07-13 18:20:59 UTC
I have a Fritz!Box too and am suffering from this issue.

I just did a tcpdump and am wondering whether KDE or the router behaves incorrectly.

I'll attach the whole dump, here's a short partial summary:

Time       Src     Dst     Src P.  Dst P.  IP ID   DNS Trans ID

(
15.013162  client  router  27138   53      0x5dcb  0x8f09
15.014707  client  router  6241    53      0x5dcc  0xf83d
15.054842  router  client  53      27138   0x0284  0xf83d
)
20.013119  client  router  24440   53      0x7154  0x0279
20.016952  client  router  13523   53      0x7155  0xdbe4
20.051192  router  client  53      24440   0x0285  0xdbe4
25.009072  client  router  24440   53      0x7155  0x0279
25.020018  client  router  13523   53      0x7156  0xdbe4
25.057100  router  client  53      24440   0x0286  0xdbe4

So you can see that the client sends its request twice from port 24440 with different IP Identifiers, but it also sends requests from port 13523 (maybe the other kio slave running in parallel?) with the same IP Identifier.

Looking at the reply, the router sends it, for example, to source port 24440 but uses the DNS transaction ID of the request that came from port 13523.

Maybe the router gets confused because both source ports use the same IP Identifier?
I'm not a networking expert, but something is wrong here...
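
Why such a reply is useless to the client can be shown with a small model of a resolver's bookkeeping. This is a simplified sketch, not glibc code: the class OutstandingQueries and the function replayTrace are made up for illustration, and the only data used are the ports and transaction IDs from the 20-second exchange in the summary above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model (not glibc code): every outstanding query is remembered
// by the local UDP port it was sent from.
class OutstandingQueries {
public:
    void sent(std::uint16_t localPort, std::uint16_t transactionId)
    {
        m_byPort[localPort] = transactionId;
    }

    // A reply is accepted only if it arrives on a port that has a pending
    // query and carries that query's transaction ID.
    bool accepts(std::uint16_t localPort, std::uint16_t transactionId) const
    {
        auto it = m_byPort.find(localPort);
        return it != m_byPort.end() && it->second == transactionId;
    }

private:
    std::map<std::uint16_t, std::uint16_t> m_byPort;
};

// Replays the exchange at 20.0 s from the summary above.
void replayTrace()
{
    OutstandingQueries queries;
    queries.sent(24440, 0x0279);
    queries.sent(13523, 0xdbe4);

    // What the router sent: ID of the second query, port of the first one.
    assert(!queries.accepts(24440, 0xdbe4));
    // What a matching reply to the second query would look like.
    assert(queries.accepts(13523, 0xdbe4));
}
```

In this model neither query ever gets an acceptable answer, so both lookups can only time out and retry, which would fit the 5-second gaps in the trace.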
Comment 28 Robin Knapp 2008-07-13 18:22:45 UTC
Created attachment 26086 [details]
complete verbose dump
Comment 29 Robin Knapp 2008-07-13 18:32:55 UTC
Created attachment 26087 [details]
dump in pcap format
Comment 30 Robin Knapp 2008-07-13 19:13:59 UTC
Me again ;)

I could reproduce the long delays without KDE as follows:

1. get the example getaddrinfo C program from http://www.logix.cz/michal/devel/various/getaddrinfo.c
(I found this in google)
2. compile
gcc -o getaddrinfo getaddrinfo.c
3. call it in parallel
./getaddrinfo google.com & ./getaddrinfo google.com

It takes 20 seconds to perform this lookup. Using two different hostnames returns immediately.

So it's definitely a bug in the router.
It sends the reply to the UDP port of process 1 but uses the DNS transaction ID of process 2. That surely doesn't work.

I've contacted AVM support with a detailed description. Let's see what they'll tell me (I hope they come up with a solution and not a stupid standard answer).
I have an older model (7050) which may not be supported any longer, so they might just say "buy a newer model".


Thanks for the dns cache which will make surfing faster and reduce DNS traffic, but this really seems to be a bug in the AVM Fritz Boxes.
Comment 31 Thiago Macieira 2008-07-13 19:39:56 UTC
Indeed, Robin, your router is replying to port 24440 the request of ID 0xdbe4 (56292). But that request came from port 13523.

That is definitely a router bug.

Also note that the entirety of the DNS handling is done by glibc. We have no control over source ports or request IDs.

I can't see anything in Thomas's log.
Comment 32 Alex 2008-07-13 20:01:07 UTC
I also own a Fritz!Box (7270) and have these issues but I can reproduce the problem in the vpn at the university so I don't think it depends especially on the fritz!box...

I will try to capture a log like Thomas' one.
Comment 33 Thiago Macieira 2008-07-13 20:08:31 UTC
A capture like Thomas's is much more difficult to read. Please attach a pcap file. (tcpdump -w /tmp/capture.pcap port 53)
Comment 34 Pierre Schmitz 2008-07-13 20:24:14 UTC
I have got the same issue with a 7170. But I use pdnsd as a local DNS cache; so I would assume that a built-in cache wouldn't help that much here.
Comment 35 Armin Berres 2008-07-13 20:41:21 UTC
Uhm, am I wrong, or did you all just rediscover what was already mentioned in comment #3?
The DNS storm should be gone with this patch, but the two DNS lookups, one of which gets terminated (see comment #8), are still there. This double DNS lookup at the same time is what kills these freaking routers.
Comment 36 Armin Berres 2008-07-13 20:56:10 UTC
WRT comment #34: Pdnsd can't help here, because there is nothing it could cache. The ipv6 lookup fails, so it has nothing in the cache and will forward all and every request directly to the router.
Comment 37 Thiago Macieira 2008-07-13 21:56:59 UTC
The DNS cache patch makes the double lookup also disappear. As you can see in http://websvn.kde.org/trunk/KDE/kdelibs/kio/kio/tcpslavebase.cpp?r1=830140&r2=830139&pathrev=830140, we no longer call connectToHost with a hostname, but with each of the IPs looked up.
Comment 38 Robin Knapp 2008-07-13 22:34:33 UTC
@#35, well I tried to find out if KDE might do something wrong which may confuse the router. I found the duplicate IP Identification and thought this might have caused the confusion. But the getaddrinfo procedure I described does not produce this duplicate IP Identification but still triggers the bug in the router. So it's verified that the router really is buggy.

I'd like to test the dns cache patch but don't have time to compile KDE from svn. I'll have to wait for some updated opensuse RPMs...
Comment 39 Roland Harnau 2008-07-13 22:43:50 UTC
The patch prevents parallel resolutions of the same name (with glibc's getaddrinfo(), a "lookup" in the typical case of an existing A record but no AAAA record consists of two AAAA requests and one A request), but allows concurrent resolutions of different names. My router (AVM Fritz!Box WLAN 3030) can handle the latter case.
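
One way to get "no parallel resolutions of the same name, but concurrent resolutions of different names" is to let later requests for a name wait for the lookup that is already running. The sketch below only illustrates that idea and makes no claim about how the committed code achieves it; LookupCoalescer and its methods are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not the KIO implementation): requests for a name that
// is already being resolved are queued behind the running lookup.
class LookupCoalescer {
public:
    using Callback = std::function<void(const std::vector<std::string>&)>;

    // Registers interest in `host`. Returns true if the caller has to start
    // a real lookup, false if one for the same name is already in flight.
    bool request(const std::string& host, Callback callback)
    {
        assert(callback);
        std::vector<Callback>& waiters = m_pending[host];
        const bool startLookup = waiters.empty();
        waiters.push_back(std::move(callback));
        return startLookup;
    }

    // Hands the result of a finished lookup to everyone who asked for it.
    void finished(const std::string& host,
                  const std::vector<std::string>& addresses)
    {
        auto it = m_pending.find(host);
        if (it == m_pending.end()) {
            return;
        }
        std::vector<Callback> waiters = std::move(it->second);
        m_pending.erase(it);
        for (const Callback& waiter : waiters) {
            waiter(addresses);
        }
    }

private:
    std::map<std::string, std::vector<Callback>> m_pending;
};
```

Lookups for different names do not block each other here, which matches the case my router can handle.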

To prevent concurrent lookups in general try the following:  
In kdelibs/kio/kio/slaveinterface.cpp  insert the line

#include <QtNetwork/QHostInfo>

and change line  343 from

HostInfo::lookupHost(hostName, this, SLOT(slotHostInfo(QHostInfo)));

to

QHostInfo::lookupHost(hostName, this, SLOT(slotHostInfo(QHostInfo)));

      
Comment 40 usa 2008-07-19 10:29:09 UTC
fritz boxes are working.

No browser shows any problems with them.
The problem is konqueror, kio.. kde.


Btw, the Fritz boxes are possibly the most popular routers in Germany.
Comment 41 Thomas McGuire 2008-07-19 12:25:03 UTC
Created attachment 26256 [details]
Traffic dump with tcpdump while trying to connect to www.test.de with Konqueror 4
Comment 42 Thomas McGuire 2008-07-19 12:25:45 UTC
Created attachment 26257 [details]
Traffic dump with tcpdump while trying to connect to www.test.de with Konqueror 3
Comment 43 Thomas McGuire 2008-07-19 12:29:13 UTC
> To prevent concurrent lookups in general try the following:
> In kdelibs/kio/kio/slaveinterface.cpp  insert the line 
> [SNIP]
This didn't make a difference for me. Hostname lookups are still slow with current kdelibs from trunk, with or without that change.

For the record, my router is a D-Link DSL-G664T.

Note that the webkit demo browser from qt-copy also has the same problem, and KDE3 doesn't have this problem at all.
Seems to be a change in Qt which triggered this new behavior of host lookups.
Comment 44 Harri Porten 2008-07-19 14:39:51 UTC
Regarding router brands: I am using a Netgear WGR614v5. Given the variety of models being used I am careful not to blame the problem on a router bug. It must rather be the software on my system, or maybe my ISP's DNS, which is said to be bad.

But things have changed for me suddenly: since yesterday my connection problems are gone! What did I do? Unfortunately too many things at the same time. I did the first "svn up" of kdelibs in days and got lots of new packages through "apt-get upgrade" from Debian testing. Previously I had already applied the qhostinfo_unix.cpp patch and turned on IPv6 support, which seemed to improve things a little bit. And I believe I had already been using the DNS cache patch but didn't notice a difference (could be wrong).
Comment 45 Thiago Macieira 2008-07-19 15:26:17 UTC
@Thomas: your system is still sending IPv6 requests. Either you have IPv6 addresses in your machine or you haven't patched Qt to not send the requests. Aside from that, there's nothing wrong with your trace.

The entire exchange of your Konqueror paste is less than 30 seconds. There's absolutely nothing wrong with it, neither from Konqueror's side nor from your router.

It does lookup twice, however.

@comment 40 (usa): please see comment 30.
Comment 46 Roland Harnau 2008-07-20 01:59:47 UTC
@Thomas: the suggested change from KIO::HostInfo::lookupHost to QHostInfo::lookupHost should exactly mimic the behavior of Qt's WebKit showcase regarding DNS requests. But if the latter fails this is of course pointless. Your Konqui4 traffic dump is somewhat puzzling, e.g. No. 1 and 3 come from the same port (1116) but request the resolution of different names (www.test.de and www.test.de.site), which is not typical for getaddrinfo on glibc-based Linux. Are you behind a firewall? Anyway, you should follow Thiago's suggestion, i.e. patch Qt according to #15 and disable IPv6 in your system.

@Harri: The AVM Fritzbox routers are definitely buggy, they can't handle parallel requests to resolve the same name, and Thomas' D-Link seems to have problems with IPv6 queries in general. Apparently every vendor has a unique way to implement crappiness.

       


               
Comment 47 Thomas McGuire 2008-07-20 20:02:34 UTC
Patching qt-copy as described in comment #15 fixed this issue for me.

Thiago and Roland: Sorry for stealing your time. I thought the KIO patch alone would fix the problem and didn't see comment #15 before.
Comment 48 Pierre Schmitz 2008-07-22 20:51:15 UTC
Just for your information: The patch referenced by Comment #18 From Thiago Macieira will introduce problems with ssl authentication. See Bug #167166
Comment 49 Kevin Kofler 2008-08-09 07:17:46 UTC
Any chance this (and this: http://websvn.kde.org/?view=rev&revision=832072 which fixes the regression) can be backported for 4.1.1?
Comment 50 Thiago Macieira 2008-08-09 09:28:43 UTC
Not yet. It's not without issues, so we have to work on it a little bit more.
Comment 51 Thomas McGuire 2008-08-16 17:01:25 UTC
*** Bug 168921 has been marked as a duplicate of this bug. ***
Comment 52 Thomas Schuetz 2008-08-30 10:00:49 UTC
The bug still exists in kde 4.1.1 in the normal arch-packages (not kdemod).
Comment 53 Thiago Macieira 2008-08-30 10:27:40 UTC
The bug is not closed. You don't have to tell us it still exists: we're not claiming otherwise.
Comment 54 Becheru Petru-Ioan 2008-09-10 16:33:29 UTC
I have installed privoxy (default settings) on my home PC and changed Konqueror's settings to use it. It seems to make it connect to websites faster. Still slower than Firefox.
Comment 55 Moritz Moeller-Herrmann 2008-09-11 21:44:57 UTC
The bug is still present in KDE-4.1.1 (amd64, precompiled debian experimental packages). I am using a fritzbox. No program besides KDE4 is affected.
Comment 56 Thiago Macieira 2008-09-11 22:10:31 UTC
Comment 55: this bug is still open, so yeah we are aware that it's still present.
Comment 57 Thiago Macieira 2008-09-11 22:22:14 UTC
This just occurred to me:

Has anyone contacted the manufacturer of the defective hardware and reported the issue?

If no one does, they'll never know they have a bug to fix.
Comment 58 Pavel Zheltobryukhov 2008-09-15 15:45:48 UTC
I sent a problem description to ASUS Technical Support, because I have this issue with my ASUS 6020. I got an answer in my native language, so I'll try to reproduce the answer here in English:

"Hello! Thank you for contacting the ASUS Technical Support Service.

The ASUS Company doesn't support the Linux OS. There is no such problem on computers where the Windows OS is installed."

A very strange answer, because the ASUS 6020 is an embedded Linux device, AFAIK.
I think the ASUS Russian support team is not competent in these questions, so they gave me this stupid answer.

Can anyone get more out of them?

Comment 59 Kevin Kofler 2008-09-15 16:29:30 UTC
Maybe try to politely remind them they're selling GNU/Linux machines themselves, in particular some editions of the eeePC...
Comment 60 Pavel Zheltobryukhov 2008-09-16 19:48:59 UTC
I wrote a second request to ASUS, this time in English. The answer came from the Russian office again. Of course, in Russian. I told them about Linux on the ASUS eeePC 7xx/9xx. The answer was (translated from Russian again):

"Due to the large number of Linux distros, the ASUS Company supports only those users who use a Linux distro developed by ASUS and installed on a computer manufactured by ASUS."

The Indians' problems don't concern the sheriff, do they?

I won't be buying any ASUS notebook in the near or far future, and the eeePC Linux doesn't include KDE4, so I don't know how I could contact ASUS support again.

Has anybody installed KDE4 on Windows? Is this bug present on that OS?

Comment 61 Thomas McGuire 2008-09-18 18:23:25 UTC
*** Bug 171230 has been marked as a duplicate of this bug. ***
Comment 62 Sam 2008-09-23 02:39:41 UTC
Come on guys. Be realistic!
It can't be a problem of the hardware. The fact that EVERYTHING, except some KDE apps, is working shows that it has to be a problem of kde or qt.

Windows: Everything works
Firefox: No problems
Konqueror in KDE3.x: No problems

By the way: Konqueror in KDE4 was always very, very slow on my PC. But for a couple of weeks now it hasn't even connected to any website.
No idea what it could be. Tell me what additional info you need.
KMail is only sporadically connecting to the POP server, too.
The same goes for KHotNewStuff.
Comment 63 Kevin Kofler 2008-09-23 02:41:55 UTC
KDE 4 with other hardware works fine too. It's only the combination which doesn't work, and all evidence points to the hardware being at fault.
Comment 64 Sam 2008-09-23 03:04:13 UTC
Ok, I see. It could be the combination. But I think it is the software rather than the hardware. But ok... You are the real experts. But even if it really is a hardware problem... We can sit and wait for centuries for AVM to fix that bug (I bet they never will). Or change the way the KDE4 apps are accessing the internet. I'm sure there is a proper way, because all the other apps can do it, even with the bug in the hardware.
Comment 65 Thiago Macieira 2008-09-23 03:20:15 UTC
If you know what we should change, tell us. We don't know yet (or we'd have fixed this bug a long time ago).
Comment 66 Thiago Macieira 2008-09-23 03:27:32 UTC
Ok, cranky me. Please ignore comment #65. Read instead as follows:

We understand that the hardware getting fixed is a long-shot. We are willing to change the KDE code to make it not trigger the bug on the faulty hardware. However, we don't know yet what we're doing that is making the router go nuts.

So we also don't know what we should change to resolve the issue. That's why this bug is still open. Once we do know what we should change, we will change it and close the report.
Comment 67 Thomas McGuire 2008-09-23 19:42:56 UTC
> If you know what we should change, tell us. We don't know yet (or we'd have fixed this bug a long time ago).

Does modifying Qt as described in comment #15 not help everyone here? Have those who still comment here tried that? It at least works like a charm for me.
Comment 68 Thiago Macieira 2008-09-23 21:31:56 UTC
The change from comment #15 was made permanent to Qt 4.4.1 and 4.4.2. Since people are still complaining, I believe the bug hasn't been fixed.
Comment 69 Sam 2008-09-24 12:29:51 UTC
I'm sorry. I got you wrong. I thought you don't want to change the code since it's a bug of the hardware, because the bug is there for such a long time.

Do you think the other things mentioned in comment #62 belong to the same bug?
Comment 70 Thomas McGuire 2008-09-24 12:40:12 UTC
> Do you think the other things mentioned in comment #62 belong to the same bug?

Yes, seems to be exactly the same problem.
Comment 71 Sam 2008-09-24 13:33:06 UTC
*** Bug 168619 has been marked as a duplicate of this bug. ***
Comment 72 Roger Larsson 2008-09-25 02:51:54 UTC
1) I have read the RFC in question RFC1035
   http://tools.ietf.org/html/rfc1035

4.1.1. Header section format
- - -
ID              A 16 bit identifier assigned by the program that
                generates any kind of query.  This identifier is copied
                the corresponding reply and can be used by the requester
                to match up replies to outstanding queries.

I think the router problem is that it uses ID (maybe together with IP) to track where the future answer should be forwarded.

2) glibc/kernel or... fills in this field. But not randomly enough: two programs/threads that open a port and send this request at the same time often get the same IDs! (seeded in the same way, with the time?)
- on my dual core it is almost 100% of the time. [but I do not get the timeouts, as I use my internet provider's DNS directly]

If this analysis is correct, fixing all routers is not possible. It will then also be difficult to fix in KDE alone. (But letting one program do all queries should help, caching or not.) Standard tools like bind might be configurable to do this, adding one working process in front of the non-working router.
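
To make the hypothesis above concrete, here is a minimal Python sketch. `build_query` packs the RFC 1035 header with the 16-bit ID in the first two bytes. `IdOnlyForwarder` is a purely hypothetical proxy that tracks pending queries by ID alone; it is an assumption for illustration, not the actual router firmware logic, and the ports and ID used in the usage note are made up.

```python
import struct


def build_query(txid: int, name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RFC 1035, section 4.1). qtype 1 = A, 28 = AAAA."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def query_id(packet: bytes) -> int:
    """Return the 16-bit transaction ID stored in the first two header bytes."""
    return struct.unpack("!H", packet[:2])[0]


class IdOnlyForwarder:
    """Hypothetical proxy that remembers the client port per transaction ID only."""

    def __init__(self):
        self.pending = {}  # txid -> client source port

    def forward(self, client_port: int, packet: bytes) -> None:
        # A second query with the same ID silently overwrites the first entry.
        self.pending[query_id(packet)] = client_port

    def reply_port(self, reply: bytes):
        # Returns the port the reply is sent to, or None if nothing is pending.
        return self.pending.pop(query_id(reply), None)
```

If two clients send a query with the same ID from ports 40000 and 40001, such a table keeps only port 40001, so the first client would never see its reply and would run into a timeout.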
Comment 73 m.wege 2008-09-25 20:24:47 UTC
Hi, I just stumbled on this bug because I was reading userbase. I do not have the problem, since KDE4 is not (yet) running on my system. But I would really recommend that someone who can explain the problem in detail contacts AVM, the company behind the Fritz!Box routers. They have very good and competent phone and email support. The company is known for supporting Linux, and they provide frequent updates and even test builds for router improvements. So if the problem is on their side, I guess they will help to fix it soon.
Comment 74 Thomas Schuetz 2008-09-27 21:39:15 UTC
Okay, I did it, I contacted the AVM-support and talked to them about this problem. If I get any news from them I will inform you
Comment 75 Thomas Schuetz 2008-10-09 12:50:26 UTC
I need your help! The AVM-support asks if the problem exists with actual fritzboxes (7170, 7270) and the actual firmware (xx.04.59) Could somebody test this and post the result? Thanks!
Comment 76 Kevin Kofler 2008-10-09 15:14:46 UTC
Just to clarify: you mean "current" ("aktuell" in German).
"actual" in English means "eigentlich" or "tatsächlich", that's not what was meant here.
Comment 77 Sebastian Koerner 2008-10-09 22:18:21 UTC
To Thomas Schuetz:
There is a related error "Unable to check multiple pop3 accounts at the s.."
http://bugs.kde.org/show_bug.cgi?id=166366

which definitely occurs on
FRITZ!Box Fon WLAN 7141 (UI) 	Firmware-Version 40.04.59

while a local "bind" DNS server works fine.
So the answer is: Yes. Error occurs on .04.59..

Comment 78 Tobias Leupold 2008-10-12 22:16:39 UTC
Hi :-) I have read this after having installed KDE 4.1.2 on Gentoo. Some time ago, I also answered Bug #154774, which depends on this issue (and the problem still exists, as already mentioned above).

I think it's okay you guys don't want to produce some crappy code to make things work with the Fritz-Boxes when it's a Fritz-Box bug. But a lot of people (including me) do use those routers (I have a FRITZ!Box WLAN 3030). And even if the AVM guys fix the problem, I bet, this will only happen with recent versions. There has been no firmware update for my Router for two years or so and not everyone wants to buy a new router because Konqueror won't work – especially because every single other browser does work fine. And if this also affects KMail, Akregator, Kopete, etc. (I haven't tested it), this _will_ be a reason simply not to use KDE 4 for me and many others, as it's just not useable.

I hope there will be some solution for this, as KDE 4 is really great work.
Comment 79 Matthias Raffelsieper 2008-10-13 20:45:02 UTC
Answer to Comment #75:
Yes, this problem exists on my fritz box 7170 and firmware version 29.04.59.

If required, I can provide tcpdumps that show how the fritz box mixes up the DNS transaction IDs and the ports, which my provider's DNS server doesn't.
Comment 80 Alex 2008-10-19 12:37:57 UTC
The problem exists on FritzBox 7270 with firmware 54.04.63-12365, too.

The FritzBoxes are Linux-powered and I have full ssh access to mine, so is there anything I can do on the machine itself?
I'm no network-expert so could someone explain to me the meaning of "bug" in this case? I mean is it a misconfigured DNS-daemon on the box or a buggy kernel or something completely different?
Comment 81 Thomas Schuetz 2008-10-23 10:43:17 UTC
I exchanged a lot of mail with AVM, the company behind the fritzboxes. They told me that the problem is with IPv6 DNS requests. The fritzboxes can't handle them, so they just pass them through to the external DNS server. If the external server has no IPv6 address or can't handle IPv6 requests, it only replies that it got the request. The fritzboxes pass these replies through, too; you can see them with wireshark.
You get those replies with the fritzbox as DNS or with a direct DNS server in the internet, but konqueror doesn't react if they come from the fritzbox and runs into a timeout instead of doing an IPv4 request.
AVM said it must be KDE, because they give the correct DNS answer; they just pass them through. I don't know enough about those things, so: is he correct, or is there something else I could tell him?
(Thiago: I tried to talk to you about it, but it was difficult to be at the computer at the same time and I had connectivity-problems)
Comment 82 Thiago Macieira 2008-10-23 14:19:03 UTC
Do you have that wireshark trace? If so, please attach the packet capture file to this bug report.

Also, please try turning IPv6 off in your machine. If you don't have an IPv6 connection, you probably don't want it on at all. We have code to avoid sending IPv6 requests if IPv6 isn't active. See if that solves the problem.
Comment 83 Matthias Raffelsieper 2008-10-23 14:44:20 UTC
Created attachment 28093 [details]
pcap trace of the FB 7170, fw 29.04.59, AAAA query

I attached the trace from my Fritz!Box 7170 with firmware 29.04.59. This is konqueror doing a request to www.heise.de, where I only kept the first few requests that show the problem. The following requests had the same issue from time to time. Note that my NIC does Checksum offloading, since the packets do get into the internet... ;-)

Packets 7, 8, and 9 never get answered correctly. The response in packet 11 however goes to port 49059 (which was used in packet 7) with transaction ID 0xd9d5, which is the ID from packet 9 (port 50730).
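
The mismatch described above can be expressed as a small check. In this Python sketch, a stub resolver accepts a reply only if it arrives on the port whose outstanding query carries the same transaction ID. The two ports and the ID 0xd9d5 are taken from the description of the trace; the ID assumed for packet 7 (0x1111) is a made-up placeholder, since the comment does not state it, and `accept_reply` is a hypothetical helper, not glibc's actual resolver code.

```python
def accept_reply(outstanding: dict, dst_port: int, txid: int) -> bool:
    """Accept a reply only if dst_port has an outstanding query with this ID."""
    return outstanding.get(dst_port) == txid


# source port -> transaction ID of the query sent from that port
outstanding = {
    49059: 0x1111,  # packet 7; this ID is a placeholder, not from the trace
    50730: 0xD9D5,  # packet 9
}

# Packet 11: delivered to port 49059 but carrying packet 9's ID.
mixed_reply_ok = accept_reply(outstanding, 49059, 0xD9D5)
# What a correctly routed reply to packet 9 would look like.
correct_reply_ok = accept_reply(outstanding, 50730, 0xD9D5)
```

Under these assumptions the mixed reply is rejected on port 49059, and port 50730 never receives anything, so both lookups end up waiting for a timeout.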
Comment 84 Alex 2008-10-23 19:54:11 UTC
I can confirm that disabling IPv6 in the kernel completely solves the problem (at least for me) :)
Comment 85 Thomas Schuetz 2008-10-23 19:58:28 UTC
Created attachment 28100 [details]
DNS with FritzBox as DNS-Proxy
Comment 86 Thomas Schuetz 2008-10-23 19:59:38 UTC
Created attachment 28101 [details]
Direct DNS-request to an DNS-Server in the internet
Comment 87 Thomas Schuetz 2008-10-23 20:00:12 UTC
Created attachment 28102 [details]
DNS-request with IPv6 disabled
Comment 88 Thomas Schuetz 2008-10-23 20:02:43 UTC
Okay, I added three wireshark-files, the normal behavior with fritzbox as a DNS-Server, direct request to an DNS-server in the internet and a request with disabled ipv6. Disabling IPv6 seems to solve the problem.
Comment 89 Alex 2008-10-24 10:41:15 UTC
I can confirm that disabling IPv6 fixes the problem, which was likely to be discovered :)
Comment 90 Thomas Schuetz 2008-10-25 15:23:25 UTC
Okay, I got another answer from AVM. The devs there think the same thing that is mentioned in comment #4. The 2 requests follow each other too quickly, and the fritzboxes only answer one of them because the other one is recognized as a "retransmit". They are looking into whether it is possible to optimize the behaviour of the fritzboxes.
Comment 91 Armin Berres 2008-10-25 17:22:54 UTC
What's really bad is that the Fritzboxes send one mixed answer for both packets. So neither of them gets a proper answer.
It was comment #3 FWIW ;-)
Comment 92 m.wege 2008-11-08 17:20:24 UTC
So how about providing a) a GUI option for deactivating IPv6 and b) an error message leading to this config? Then this temporary "fix" could be part of KDE 4.2 and normal users could deal with it easily.
Comment 93 Armin Berres 2008-11-08 18:37:15 UTC
Maybe I'm mistaken, but it's not that easy to deactivate IPv6 from within KDE. You have to tell the kernel not to load the respective module.
And another problem: how can KDE detect that it is running into exactly this problem? There are various other reasons why DNS lookups could take some time. But maybe one should document this issue in some prominent place.
Comment 94 David Faure 2008-12-03 22:43:53 UTC
*** Bug 176576 has been marked as a duplicate of this bug. ***
Comment 95 Thomas Schuetz 2008-12-21 13:57:32 UTC
I got a mail from AVM they closed my bugreport there. They will not change the behaviour of their FritzBoxes. They say it is a retransmit if a second DNS-request is sent during the answer of the first one. Firefox could handle it that there is only one answer for two requests, they say, that is enough in their opinion, because the DNS-request is answered.

Original mail text (translated from German):
Finally: there will be no change in this regard in the FRITZ!Box Fon WLAN 7050.
The FRITZ!Box treats it as a retransmit when a second DNS request arrives
from the LAN while a DNS answer is coming in from the Internet.
Firefox copes with getting an answer to only one of two requests.
In our opinion that should be sufficient. The DNS information is then
available, after all.
Comment 96 usa 2008-12-22 15:59:21 UTC
(In reply to comment #95)
> I got a mail from AVM they closed my bugreport there. They will not change the
> behaviour of their FritzBoxes. 

Of course, it's a KDE-only bug.
Fritz boxes are solid workers.
Comment 97 Robin Knapp 2009-01-07 13:00:17 UTC
(In reply to comment #96)
> Of course, it's a KDE-only bug.
> Fritz boxes are solid workers.

No, it also happens outside KDE, for example running: "getent kde.org & getent kde.org" is enough to trigger this bug.

(In reply to comment #95)
> Firefox could handle it
> that there is only one answer for two requests, they say, that is enough in
> their opinion, because the DNS-request is answered.

They don't seem to understand that the fritzbox mixes data from both requests; see comment #3, which is definitely a bug (imho)
Comment 98 Robin Knapp 2009-01-07 13:03:29 UTC
(In reply to comment #97)
> (In reply to comment #96)
> > Of course, it's a KDE-only bug.
> > Fritz boxes are solid workers.
> 
> No, it also happens outside KDE, for example running: "getent kde.org & getent
> kde.org" is enough to trigger this bug.
> 

Sorry for the traffic, meant "getent hosts kde.org & getent hosts kde.org"
Comment 99 Daniel Winter 2009-01-13 00:23:57 UTC
Well, it seems that some Mac OS X update triggers the same problem in Safari and others.

AVM was informed of a DNS problem in their boxes by the German computer magazine c't, and AVM fixed the DNS issue. To me it seems very likely that it is the same bug in those boxes that is causing this bug (although AVM always said, when asked about this bug, that it is not a bug). Now that Macs are hit, they fixed it. A firmware update should be out soon.

Here is the (German) article about it:

http://www.heise.de/newsticker/Fritz-Box-bremst-Mac--/meldung/121555

If someone can test the new firmware and see if it solves the issue, please report.
Comment 100 Robin Knapp 2009-01-13 20:18:23 UTC
(In reply to comment #99)

> There the (german) article about it:
> 
> http://www.heise.de/newsticker/Fritz-Box-bremst-Mac--/meldung/121555
> 
> If someone can test the new firmeware and see if it solves the issue, please 
> report.

Thanks for this information, I overlooked it (I'm a regular reader of heise.de)

Unfortunately, I have a FritzBox 7050 and doubt that there will be an update for it...
Comment 101 Karsten König 2009-01-14 15:58:14 UTC
According to forum posts this is fixed in the Fritz!Box beta firmware release:

http://www.avm.de/de/Service/Service-Portale/Service-Portal/Labor/labor.php
(Only for 7270)

Could someone with 7270 please test status with the beta firmware?
Comment 102 m.wege 2009-01-17 18:46:41 UTC
@Robin #100: I would recommend to write to support of AVM. They are very friendly people. They don't support older boxes with new features. But if there is a known bug, I believe they provide an update. Just be friendly, describe the problem and refer to the heise report and this bug.
Comment 103 m.wege 2009-01-22 15:35:10 UTC
AVM support wrote to me that they are working on a bug fix firmware. This will be published on their site. So it would be nice, if anyone who notices a new firmware there puts a note here.
Seems like this bug can be closed, since it is not a problem of KDE anymore.
Comment 104 Armin Berres 2009-01-22 15:41:41 UTC
(In reply to comment #103)
> Seems like this bug can be closed, since it is not a problem of KDE anymore.

From my point of view, KDE should try not to fire a DNS storm at the servers anyway.
Comment 105 Roland Harnau 2009-01-28 20:10:08 UTC
(In reply to comment #104)
> (In reply to comment #103)
> > Seems like this bug can be closed, since it is not a problem of KDE anymore.
> 
> From my point of view KDE should try anyway not to fire an DNS storm at the
> servers.

It doesn't since 4.2. This bug is not even a problem for KDE anymore and can really be closed. Thiago?
Comment 106 Andreas Hartmetz 2009-02-02 00:28:31 UTC
As stated by Roland Harnau, this one should be fixed in 4.2 and trunk.
Comment 107 m.wege 2009-02-04 12:28:12 UTC
There is a new firmware @ www.avm.de/labor . It does not say if the bugfix is in it, but I assume this is the case. I cannot try it myself at the moment, but maybe others can, and if the bug still exists, they can give this feedback to AVM.
Comment 108 m.wege 2009-02-10 11:23:50 UTC
I have installed the new firmware and it seems like the bug is fixed there, but I am not 100% sure. Just when I wrote everything is fine, it took half a minute to display this bug page. But this may be a server problem. Now people with older versions of the fritz box should ask AVM to release bug fixed firmwares for them too.
Comment 109 Karsten König 2009-02-10 11:45:46 UTC
You can test it with:
'host test.de & host test.de &'

If it is not fixed it will spawn
';; Warning: ID mismatch: expected ID Y, got X'
(X != Y obviously)

Would be happy if it is fixed and ask AVM for a backport to 7050 as the bug has been reported while 7050 was still supported.

And btw., there is also a new 7170 beta firmware, so you guys can be happy now too, I hope =)
Comment 110 m.wege 2009-02-10 11:51:26 UTC
It works. BTW tested with the 7170 firmware.
Comment 111 Jochen 2009-02-10 19:50:05 UTC
Why are there two dns queries for the same hostname in parallel? Which process is responsible for this? Is there no local dns cache?
You can blame AVM for this bug in their routers, but this situation is really unwanted. You should avoid this by using a local DNS cache. Even f$%&%/() Windows has such a cache!
Comment 112 Thiago Macieira 2009-02-10 20:28:29 UTC
Local caching is not always enabled. Don't blame us.
Comment 113 Jaime Torres 2009-02-22 11:44:34 UTC
*** Bug 171230 has been marked as a duplicate of this bug. ***
Comment 114 Dawit Alemayehu 2011-04-19 23:47:55 UTC
I am re-opening this bug because the fix committed to TCPSlaveBase in comment #18 as a "workaround for broken routers" makes no sense for several reasons.

#1. The single most important problem with the "workaround fix" is that it causes TCPSlaveBase to do a DNS lookup of request host names even when using a proxy server. This causes bug reports such as https://bugs.kde.org/show_bug.cgi?id=207550.

This is also the reason why tunneled proxy connections (HTTPS over an HTTP proxy using CONNECT) that originate from KDE-based applications always use the IP address instead of the hostname in the CONNECT request, which is also not desired behavior.

#2. There are many DNS queries that take place before a request even makes it to the TCPSlaveBase level. See my response to the bug report mentioned above to understand what else might cause DNS queries. Hint: It is not KIO.

#3. How is it that TCPSlaveBase gets a "workaround fix" when QAbstractSocket aborts a name lookup that was started by another one of its own member functions, connectToHost? What TCPSlaveBase::connectToHost used to do is exactly the same thing KSocketFactory::synchronousConnectToHost does now! I just do not see why the issue is worked around in one location and not the other. That means that if this bug was truly caused by what was described in comment #8, then kio_ftp should be a victim of this bug today, since it uses KSocketFactory::synchronousConnectToHost.

Anyhow, this needs to be fixed another way. If waitForConnected should not be called before connectToHost has completed its host lookup, then we need to find a workaround for that specific issue, but not by making TCPSlaveBase perform a name lookup.
Comment 115 Thiago Macieira 2011-04-20 03:47:27 UTC
This bug can no longer be tested, since the broken routers that were the source of the report have since got firmware upgrades (when Safari started suffering from the same problem).

Also, I believe that the sending-of-IP-addresses-in-proxy problem was fixed several years ago too.
Comment 116 Dawit Alemayehu 2011-04-20 05:30:50 UTC
(In reply to comment #115)
> This bug can no longer be tested, since the broken routers that were the source
> of the report have since got firmware upgrades (when Safari started suffering
> from the same problem).

Well, that may be true, but unfortunately the ramifications of the workaround that was committed to TCPSlaveBase are still around, wreaking havoc today.

> Also, I believe that the sending-of-IP-addresses-in-proxy problem was fixed
> several years ago too.

It might be fixed in Qt's networking classes, but because of the aforementioned workaround commit, the sending-of-IP-address-in-proxy problem is alive and well in KDE. But don't take my word for it: set up an HTTPS proxy in KDE pointing to a proxy server, and browse to an SSL site. Look at the log file of the proxy server and you will clearly see that the resulting CONNECT message contains an IP address and not a host name.

And then there is the matter of using QAbstractSocket::waitForConnected. I still do not comprehend why it aborts a host name lookup in progress just to turn around and do the same lookup in a blocking mode. Perhaps that is done to make that function a synchronous function at the cost of duplicate host name lookups?

Anyhow, that is not its only problem. Unlike the QAbstractSocket::connectToHostImplementation function, waitForConnected seems not to even bother with optimizing for the case where the supplied host name is actually an IP address. Instead it seems to perform a blind blocking lookup. Of course that causes an unnecessary reverse lookup.
Comment 117 Dawit Alemayehu 2011-04-20 05:33:34 UTC
(In reply to comment #115)
> This bug can no longer be tested, since the broken routers that were the source
> of the report have since got firmware upgrades (when Safari started suffering
> from the same problem).

Well, that may be true, but unfortunately the ramifications of the workaround that was committed to TCPSlaveBase are still around, wreaking havoc today.

> Also, I believe that the sending-of-IP-addresses-in-proxy problem was fixed
> several years ago too.

It might be fixed in Qt's networking classes, but because of the aforementioned workaround commit, the sending-of-IP-address-in-proxy problem is alive and well in KDE. But don't take my word for it: set up an HTTPS proxy in KDE pointing to a proxy server, and browse to an SSL site. Look at the log file of the proxy server and you will clearly see that the resulting CONNECT message contains an IP address and not a host name.

And then there is the matter of using QAbstractSocket::waitForConnected. I still do not comprehend why it aborts a host name lookup in progress just to turn around and do the same lookup in a blocking mode. Was that done to make the function a synchronous function? Anyhow, that is not its only problem. Unlike the QAbstractSocket::connectToHostImplementation function, waitForConnected seems not to even bother with optimizing for the case where the supplied host name is actually an IP address. Instead it seems to blindly perform a blocking lookup. Of course that results in a very unnecessary reverse lookup.
Comment 118 Thiago Macieira 2011-04-20 12:40:13 UTC
(In reply to comment #117)
> > Also, I believe that the sending-of-IP-addresses-in-proxy problem was fixed
> > several years ago too.
> 
> It might be fixed in Qt's networking classes, but because the aforementioned
> workaround commit the sending-of-IP-address-in-proxy is alive and well in KDE.

If that's the case, then the issue has regressed. I am 100% sure that I tried this with kio_http at one point and it worked. If that's the case, then it's also very likely that the very fix for this bug is the cause.

> And then there is the matter of using QAbstractSocket::waitForConnected. I
> still do not comprehend why it aborts a host name lookup in progress just to
> turn around and do the same lookup in a blocking mode. Was that done to make
> the function a synchronous function ? 

Yes. It needs to be fully synchronous and there's no QHostInfo::waitForFinished. So the only way of ensuring that the results get in without starting a nested event loop is to cancel the lookup and restart it.

With *any* sane caching DNS server, this makes absolutely no difference. The problem is when you get insane and braindead servers, like the Fritzboxes had.
Comment 119 Kevin Kofler 2011-04-20 16:53:12 UTC
> This bug can no longer be tested, since the broken routers that were the source
> of the report have since got firmware upgrades (when Safari started suffering
> from the same problem).

Then I suggest we just drop the workaround.
Comment 120 Thiago Macieira 2011-04-20 18:39:32 UTC
(In reply to comment #119)
> > This bug can no longer be tested, since the broken routers that were the source
> > of the report have since got firmware upgrades (when Safari started suffering
> > from the same problem).
> 
> Then I suggest we just drop the workaround.

Note that the workaround does introduce some interesting functionality. It provides some level of DNS pinning.
Comment 121 Dawit Alemayehu 2011-04-20 21:11:42 UTC
(In reply to comment #120)
> (In reply to comment #119)
> > > This bug can no longer be tested, since the broken routers that were the source
> > > of the report have since got firmware upgrades (when Safari started suffering
> > > from the same problem).
> > 
> > Then I suggest we just drop the workaround.
> 
> Note that the workaround does introduce some interesting functionality. It
> provides some level of DNS pinning.

But that is happening at the wrong location. If such functionality is interesting, then it should happen at the socket level, be it KTcpSocket or Q*Socket. TCPSlaveBase is too high level for performing any sort of name lookups for such purposes IMO.
Comment 122 Thiago Macieira 2011-04-20 21:32:57 UTC
(In reply to comment #121)
> (In reply to comment #120)
> > Note that the workaround does introduce some interesting functionality. It
> > provides some level of DNS pinning.
> 
> But that is happening at the wrong location. If such functionality is
> interesting, then it should happen at the socket level, be it KTcpSocket or
> Q*Socket. TCPSlaveBase is too high level for performing any sort of name
> lookups for such purposes IMO.

I disagree. In fact, I would even say that TCPSlaveBase is still not high enough.

DNS pinning should happen from the application/use layer. That is, from the HTML engine: all loads from a given address in the same page should come from the same RRset, even if the DNS result would have changed.

The socket level doesn't know what other sockets are in use, so it doesn't know how it should apply the pinning.

That said, QHostInfo does implement a 5-minute cache these days, so the same level of pinning that this workaround afforded will be kept. What it won't keep is the pinning across slaves: two kio_http launched for the same address will still do two DNS queries and could end up with different results.
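
As a rough illustration of the pinning effect of a time-limited cache, here is a minimal Python sketch. `PinningResolver`, its `lookup` callable and the injectable `clock` are hypothetical names introduced only for this example; this is not how QHostInfo is implemented, only the 5-minute (300 s) lifetime is taken from the comment above.

```python
import time


class PinningResolver:
    """Cache lookups per host name for a fixed time (hypothetical sketch)."""

    def __init__(self, lookup, ttl=300.0, clock=time.monotonic):
        self._lookup = lookup   # callable: host name -> list of addresses
        self._ttl = ttl         # 300 s mirrors the 5-minute figure above
        self._clock = clock     # injectable so the expiry can be tested
        self._cache = {}        # host name -> (expiry time, addresses)

    def resolve(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]     # pinned: reuse the earlier answer
        addresses = self._lookup(host)
        self._cache[host] = (now + self._ttl, addresses)
        return addresses
```

Because each process would hold its own `PinningResolver` instance, two separate processes resolving the same name can still end up with different answers, which is the cross-slave limitation described above.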
Comment 123 Dawit Alemayehu 2011-04-20 22:20:47 UTC
(In reply to comment #122)
> (In reply to comment #121)
> > (In reply to comment #120)
> > > Note that the workaround does introduce some interesting functionality. It
> > > provides some level of DNS pinning.
> > 
> > But that is happening at the wrong location. If such functionality is
> > interesting, then it should happen at the socket level, be it KTcpSocket or
> > Q*Socket. TCPSlaveBase is too high level for performing any sort of name
> > lookups for such purposes IMO.
> 
> I disagree. In fact, I would even say that TCPSlaveBase is still not high
> enough.
> 
> DNS pinning should happen from the application/use layer. That is, from the
> HTML engine: all loads from a given address in the same page should come from
> the same RRset, even if the DNS result would have changed.

Well, there is already such a feature in both KHTML & KWebkitPart under the auspices of DNS prefetching. Granted, that information is not shared with or used by the socket class, and I am unsure whether or not the engines automatically use the prefetched IP address if that functionality is enabled.

However, to me the whole idea of DNS pinning and/or prefetching will only work as intended when the entire stack shares the same DNS caching mechanism, much like the third-party DNS caches available on Linux. That way everything, not just KDE applications, gets to benefit from using the cached

> The socket level doesn't know what other sockets are in use, so it doesn't know
> how it should apply the pinning.
> 
> That said, QHostInfo does implement a 5-minute cache these days, so the same
> level of pinning that this workaround afforded will be kept. What it won't keep
> is the pinning across slaves: two kio_http launched for the same address will
> still do two DNS queries and could end up with different results.

But that is currently happening at the cost of the user's privacy and/or security when using proxies. I say that because, unlike other places where lookups occur, this lookup cannot be disabled.
Comment 124 Dawit Alemayehu 2011-04-20 23:59:50 UTC
(In reply to comment #118)
> (In reply to comment #117)
> > > Also, I believe that the sending-of-IP-addresses-in-proxy problem was fixed
> > > several years ago too.
> >
> > It might be fixed in Qt's networking classes, but because the aforementioned
> > workaround commit the sending-of-IP-address-in-proxy is alive and well in KDE.
>
> If that's the case, then the issue has regressed. I am 100% sure that I tried
> this with kio_http at one point and it worked. If that's the case, then it's
> also very likely that the very fix for this bug is the cause.

It is hard for me to see where the regression could have occurred. The code that was committed as a workaround does exactly what it was intended to do: resolve the host name and use the IP address when connecting to the server. That is very obvious from looking at TCPSlaveBase::connectToHost. As a result, in an https over http proxy connection (aka CONNECT), the IP address will be used when constructing the CONNECT header because that is the only thing the Q*Socket classes have. So as far as I can tell, there is no regression there, only the side effect of the workaround.
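
To make the side effect concrete, here is a minimal sketch of how a CONNECT request ends up carrying whatever string the socket layer was given. This is not the kio_http or Qt implementation; `buildConnectRequest` is a hypothetical helper, and it only handles host names and IPv4 literals (an IPv6 literal would additionally need brackets).

```cpp
#include <string>

// Hypothetical helper: builds the start of an HTTP CONNECT request for a
// proxy tunnel. `target` is whatever the socket was told to connect to, so
// if the caller resolved the name first, the proxy only ever sees the IP.
std::string buildConnectRequest(const std::string& target, unsigned short port) {
    const std::string authority = target + ":" + std::to_string(port);
    return "CONNECT " + authority + " HTTP/1.1\r\n"
           "Host: " + authority + "\r\n\r\n";
}
```

Called with "www.example.org" the proxy receives the host name; called with an address such as "192.0.2.10", which is what happens after an up-front lookup, the host name never reaches the proxy.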

> > And then there is the matter of using QAbstractSocket::waitForConnected. I
> > still do not comprehend why it aborts a host name lookup in progress just to
> > turn around and do the same lookup in a blocking mode. Was that done to make
> > the function a synchronous function ?
>
> Yes. It needs to be fully synchronous and there's no
> QHostInfo::waitForFinished. So the only way of ensuring that the results get in
> without starting a nested event loop is to cancel the lookup and restart it.

I figured as much. However, the question is what the side effects or negative impact of using a local event loop would be in the case of TCPSlaveBase. In other words, what potential problems would be encountered if one were to add the code below between the calls to d->socket.connectToHost and d->socket.waitForConnected:

       if (d->socket.state() == KTcpSocket::HostLookupState) {
           QEventLoop loop;
           QTimer timer;
           int elapsedTime = 0;
           timer.setInterval(500);
           timer.setSingleShot(true);
           QObject::connect (&timer, SIGNAL(timeout()), &loop, SLOT(quit()));
           Q_FOREVER {
              timer.start();
              loop.exec();
              if (d->socket.state() != KTcpSocket::HostLookupState || elapsedTime >= timeout)
                break;
              elapsedTime += timer.interval();
           }
       }

> With *any* sane caching DNS server, this makes absolutely no difference. The
> problem is when you get insane and braindead servers, like the Fritzboxes had.

Though I agree that what the Fritzboxes did was "insane", that very same argument could be leveled against what waitForConnected does. I do not think any developer that uses this API expects waitForConnected to do what it currently does. Perhaps a single note or some kind of heads up in the API documentation would have informed developers about such unexpected behavior. I can literally give you an example of where unexpected behavior of a function is the cause of password caching bug in KDE.

Anyhow, it would be nice if waitForConnected gets fixed so that it does not do a reverse lookup when the supplied host name is actually an IP address already.
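
As a rough sketch of the kind of guard meant here: before starting any lookup, check whether the string already parses as an address literal. The example uses plain POSIX `inet_pton` as a stand-in for a Qt-level check; `isIpLiteral` is a hypothetical helper name, and the code assumes a POSIX system.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>

// Hypothetical helper: returns true if `host` is already an IPv4 or IPv6
// address literal, in which case no name lookup of any kind is needed.
bool isIpLiteral(const std::string& host) {
    in_addr v4;
    in6_addr v6;
    return inet_pton(AF_INET, host.c_str(), &v4) == 1
        || inet_pton(AF_INET6, host.c_str(), &v6) == 1;
}
```

A caller would skip the lookup step entirely when `isIpLiteral(host)` is true and go straight to connecting.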
Comment 125 Thiago Macieira 2011-04-21 00:43:09 UTC
(In reply to comment #124)
> (In reply to comment #118)
> > Yes. It needs to be fully synchronous and there's no
> > QHostInfo::waitForFinished. So the only way of ensuring that the results get in
> > without starting a nested event loop is to cancel the lookup and restart it.
> 
> I figured as much. However, the question is what would the side effect or
> negative impact of using a local event loop be in case of TCPSlaveBase ? IOW,
> what potential problems would be encountered if one were to add the code below
> in between the calling d->socket.connectToHost and d->socket.waitForConnected:
> 
>        if (d->socket.state() == KTcpSocket::HostLookupState) {
>            QEventLoop loop;

Let's stop here. I said "without starting a nested event loop" and the line above exists only to do exactly what I said mustn't be done.

Ask any proficient Qt developer and they'll tell you that nested event loops are evil and must be avoided. Having code like socket connections spin the event loop is really unexpected. Moreover, introducing an event loop where there was none is also potentially catastrophic.

> > With *any* sane caching DNS server, this makes absolutely no difference. The
> > problem is when you get insane and braindead servers, like the Fritzboxes had.
> 
> Though I agree that what the Fritzboxes did was "insane", that very same
> argument could be leveled against what waitForConnected does. I do not think
> any developer that uses this API expects waitForConnected to do what it
> currently does. Perhaps a single note or some kind of heads up in the API
> documentation would have informed developers about such unexpected behavior. I
> can literally give you an example of where unexpected behavior of a function is
> the cause of password caching bug in KDE.

I'm sorry, which behaviour? The behaviour of attempting a name lookup again? That's an implementation detail and completely irrelevant for the discussion, except for that it triggered a bug in the fritzboxes.

The bug was clearly in the fritzboxes, not in Qt code. That is not up for discussion.

Implementation details are just that. Application developers don't have to know them and they should never rely on them, for they may change. In fact, I think that the DNS caching functionality present in Qt 4.7 has changed this behaviour in many ways, including the fact that it may not execute a second query at all if the first one is running in a different thread.

> Anyhow, it would be nice if waitForConnected gets fixed so that it does not do
> a reverse lookup when the supplied host name is actually an IP address already.

Why? What's the consequence?
Comment 126 Dawit Alemayehu 2011-04-21 02:32:32 UTC
On Wed, Apr 20, 2011 at 6:43 PM, Thiago Macieira <thiago@kde.org> wrote:
> https://bugs.kde.org/show_bug.cgi?id=162600
>
> --- Comment #125 from Thiago Macieira <thiago kde org>  2011-04-21 00:43:09 ---
> (In reply to comment #124)
>> (In reply to comment #118)
>> > Yes. It needs to be fully synchronous and there's no
>> > QHostInfo::waitForFinished. So the only way of ensuring that the results get in
>> > without starting a nested event loop is to cancel the lookup and restart it.
>>
>> I figured as much. However, the question is what would the side effect or
>> negative impact of using a local event loop be in case of TCPSlaveBase ? IOW,
>> what potential problems would be encountered if one were to add the code below
>> in between the calling d->socket.connectToHost and d->socket.waitForConnected:
>>
>>        if (d->socket.state() == KTcpSocket::HostLookupState) {
>>            QEventLoop loop;
>
> Let's stop here. I said "without starting a nested event loop" and the line
> above exists only to do exactly what I said mustn't be done.

I know what you said. That is exactly why I asked whether doing this
would be detrimental to TCPSlaveBase. I was simply curious how adding
such a local loop would impact the use case of TCPSlaveBase which is
neither thread safe nor re-entrant. Neither does it connect to any
signals or emit signals itself. IOW it is completely isolated unto
itself. Regardless, it was a hypothetical question that required a
specific answer as to why it would be bad. I know nested event loops
in general are dangerous.

> Ask any proficient Qt developer and they'll tell you that nested event loops
> are evil and must be avoided. Having code like socket connections spin the
> event loop is really unexpected. Moreover, introducing an event loop where
> there was none is also potentially catastrophic.

>> > With *any* sane caching DNS server, this makes absolutely no difference. The
>> > problem is when you get insane and braindead servers, like the Fritzboxes had.
>>
>> Though I agree that what the Fritzboxes did was "insane", that very same
>> argument could be leveled against what waitForConnected does. I do not think
>> any developer that uses this API expects waitForConnected to do what it
>> currently does. Perhaps a single note or some kind of heads up in the API
>> documentation would have informed developers about such unexpected behavior. I
>> can literally give you an example of where unexpected behavior of a function is
>> the cause of password caching bug in KDE.
>
> I'm sorry, which behaviour? The behaviour of attempting a name lookup again?
> That's an implementation detail and completely irrelevant for the discussion,
> except for that it triggered a bug in the fritzboxes.
>
> The bug was clearly in the fritzboxes, not in Qt code. That is not up for
> discussion.
>
> Implementation details are just that. Application developers don't have to know
> them and they should never rely on them, for they may change. In fact, I think
> that the DNS caching functionality present in Qt 4.7 has changed this behaviour
> in many ways, including the fact that it may not execute a second query at all
> if the first one is running in a different thread.

Let me give you an example of why I have issues with implementation
details. SlaveBase::openPasswordDialog in KDE, which I originally
wrote in the KDE 2.x or early KDE 3.x days, was changed in KDE 3.1 to
automatically cache the password if the user checks the "Remember
password" checkbox and clicks OK in the dialog. Unfortunately, that
was not the purpose of openPasswordDialog. It was designed to
simply prompt the user and return the result. Then any ioslave that
wanted to save the result would manually call another function called
SlaveBase::cacheAuthentication. There was a reason for this madness.
You do not want to cache an incorrect password; so until the ioslave
can successfully log in, the password should not be cached at all.
Voilà, the unexpected behavior change in the implementation caused
countless bug reports that have yet to be addressed. In my many years
of software development, I have seen unexpected behaviors that were
chalked up to "implementation details" cause many such hidden
headaches and bugs.
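
The intended split between prompting and caching can be sketched
as follows. This is not the KIO API: Credentials, AuthCache and
loginAndMaybeCache are hypothetical names, and the prompt and the
login attempt are passed in as callbacks so that the sketch stays
self-contained.

```cpp
#include <functional>
#include <map>
#include <string>

struct Credentials {
    std::string user;
    std::string password;
};

// Hypothetical stand-in for a password cache keyed by URL.
class AuthCache {
public:
    void store(const std::string& url, const Credentials& c) { m_entries[url] = c; }
    bool contains(const std::string& url) const { return m_entries.count(url) != 0; }

private:
    std::map<std::string, Credentials> m_entries;
};

// Prompting only asks the user; caching is a separate step that happens
// only after the login has actually succeeded with those credentials.
bool loginAndMaybeCache(const std::string& url,
                        const std::function<bool(Credentials&)>& prompt,
                        const std::function<bool(const Credentials&)>& tryLogin,
                        AuthCache& cache) {
    Credentials creds;
    if (!prompt(creds)) {
        return false;  // user cancelled the dialog
    }
    if (!tryLogin(creds)) {
        return false;  // wrong password: nothing is cached
    }
    cache.store(url, creds);  // verified, so it is safe to remember
    return true;
}
```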

As a result I personally frown upon functions that purport to do
something, but behind the scenes do something unexpected simply
because that is an implementation detail, no matter how valid the
reason behind it might be. Anyhow, this really does not matter since
that is not the issue at hand.

>> Anyhow, it would be nice if waitForConnected gets fixed so that it does not do
>> a reverse lookup when the supplied host name is actually an IP address already.
>
> Why? What's the consequence?

Hmm... let me turn around and ask you the same question. What is the
point of doing a reverse name lookup at this point? Especially since
the QAbstractSocket::connectToHostImplementation function in that same
class seems to specifically protect against this by using QHostAddress
to avoid looking up an IP address. Does
QAbstractSocket::waitForConnected need to look up the host name
associated with a given IP address?

Regards,
Dawit A.
Comment 127 Thiago Macieira 2011-04-21 02:42:25 UTC
(In reply to comment #126)
> Hmm... let me turn around and ask you the same question. What is the
> point of doing a reverse name lookup at this point ? Specially since
> the QAbstractSocket::connectToHostImplementation function in that same
> class seems to specifically protect against this by using QHostAddress
> to avoid looking up an ip address. Does
> QAbstractSocket::waitForConnected need to look up the host name
> associated with a given ip address ?

Please file a Qt bug report about this. I thought you meant the behaviour that it might execute two name lookups.
Comment 128 Dawit Alemayehu 2011-04-21 03:58:15 UTC
(In reply to comment #127)
> (In reply to comment #126)
> > Hmm... let me turn around and ask you the same question. What is the
> > point of doing a reverse name lookup at this point ? Specially since
> > the QAbstractSocket::connectToHostImplementation function in that same
> > class seems to specifically protect against this by using QHostAddress
> > to avoid looking up an ip address. Does
> > QAbstractSocket::waitForConnected need to look up the host name
> > associated with a given ip address ?
> 
> Please file a Qt bug report about this. I thought you meant the behaviour that
> it might execute two name lookups.

http://bugreports.qt.nokia.com/browse/QTBUG-18881
Comment 129 Dawit Alemayehu 2011-05-14 16:42:14 UTC
I am going to commit a patch that reverts the commit in comment #18. You can view the revert patch at https://git.reviewboard.kde.org/r/101338/.
Comment 130 Dawit Alemayehu 2011-05-20 03:45:13 UTC
Git commit 65aabc8c6df6d25fc35d06ad880ecdc9a2e43291 by Dawit Alemayehu.
Committed on 01/05/2011 at 17:46.
Pushed by adawit into branch 'master'.

Avoid resolving host names in TCPSlaveBase::connectToHost.

This basically reverts commit 79c4ed8a7c7fe18f4c1d02d5faba5e7a412f57ae which
was a workaround for bugs in hardware that was caused by QAbstractSocket's
potential propensity to perform multiple look ups when connectToHost and
waitForConnected are called successively.

BUG: 207550
BUG: 162600
REVIEW: 101338

M  +13   -29   kio/kio/tcpslavebase.cpp     

http://commits.kde.org/kdelibs/65aabc8c6df6d25fc35d06ad880ecdc9a2e43291