Version: 3.3.1 (using KDE KDE 3.3.1)
Installed from: Debian testing/unstable Packages
OS: Linux

For example, I just had to reload five times to get this page. I kept getting "Unknown host bugs.kde.org". On the fifth reload it worked fine. It seems to happen in bursts. Once it starts happening, it happens for a whole bunch of images on a page, or for a page repeatedly every time I reload, until the problem subsides for a while.

I suspect the problem is related to how konqueror spams the name servers with a whole ton of queries for every image on the page. It doesn't cache results even for a single page rendering. I suspect the name server is rate-limiting responses to a single requestor, either because of a bug in bind or as a defense against DDOS.

But I'm having trouble debugging this. I attached to konqueror with gdb and set breakpoints on gethostbyname and getaddrinfo, expecting to be able to catch the failure and see what happened. But the breakpoint never triggered. How is konqueror managing to use the resolver without calling any resolver functions?
Having a similar problem even for pages with not so many images (e.g. www.google.de). Trying the same page with Firefox retrieves the page without any problem every time, and doing a "host" or "nslookup" gives no errors at all. Trying the same with konqueror gives me "An error occurred while loading http://www.google.com/: Unknown host www.google.de" about every 2-4 times. Same machine, same user, same time, same everything. This is with 3.3.2 Level "a" from the SuSE RPMs.
I can't reproduce.
Please reopen if you can reproduce with KDE 3.4 beta 2.
*** Bug 99254 has been marked as a duplicate of this bug. ***
I see the same thing both in KDE 3.3.2 and KDE 3.4. It especially happens when a page reloads itself (we have a network monitoring system, which refreshes the page every 30 seconds). If you create a page which reloads itself every 30 seconds you should see the error after a while. It seems random how quickly the bug shows up, but it usually happens a couple of times a day.
Is it possible to tell me what the DNS traffic was at the time the page failed to reload? I think the problem is in your DNS server and/or Internet connection.
I can set something up, but I won't have time for it until next week. Also, it can take more than an hour for the problem to arise. I can gather the data, but I can't see any way to correlate the data with the time at which the browser fails. Any ideas?

By the way, I know that there is no AAAA record for the domain (I have read some reports where the cause was a missing IPv6 network. This isn't the case here). However, normally when I encounter it, it happens with an internal DNS server on the LAN, which isn't loaded. When it happens, I can press reload and the page appears, so it doesn't seem to come in bursts as it does for the original poster (I missed that when I wrote the first time).

The problem started around version 3.3 (I think), and we have seen it on two different Gentoo installations, installed through emerge, and one Debian Woody installation, created with konstruct. It happens for different pages, but having a page which refreshes every 30 seconds seems to make it happen more often. A page like the one it usually happens on can be seen here: http://www.emdrupborg.dk/sysorb/index.cgi?path=1.1&tld=Connectivity&username=viewer&passwd=viewer&server=localhost:3241 But it also happens for other pages, even some without images.
I don't need a traffic dump (tcpdump -w). A copy & paste of tcpdump's normal output will suffice (tcpdump -pn port 53). So, leave it running in the background. When the problem occurs, copy & paste the last screenful or so; that should be enough to indicate why the resolution failed. If I am right, you will see unanswered queries. There was a bug in glibc that caused DNS failures, but it should only affect people using DNS servers reached by IPv6.
Let me give a bit more info too. This problem is difficult because of its transient nature. The replies so far indicate it's not distro-specific (for me, all of SuSE 8.2, 9.1, 9.2). I decided to concentrate on DNS resolving issues, although I am also pointing a finger at KDE, because mozilla has never shown this total failure for me on these (I assume) resolver problems.

In my case I seriously doubt a problem with the net connection - I have an extremely reliable cable connection. It is possible that the ISP's name server was overloaded and randomly responded with "no data". That there was a response is clear: if there hadn't been a response, konqueror would have sat idling until timeout, but the no-domain error came as soon as I clicked the link. It's also clear that the error "domain doesn't exist" itself is utter bollocks.

I tried 3 different ISPs' name servers, and observed konqueror failures with all of them. Sorry, but I find that hard to believe. I then set up a caching DNS on the local workstation (bind 9, 60 seconds work in yast). No difference, regardless of which ISP's name server I forwarded to. I then deleted all name servers in the bind9 config, forcing resolution through root servers. No difference. Problems with external name servers? Uhhhhm, I don't think so.

Summary: konqueror dies with bogus name resolution failures. The KDE developers must understand that THIS PROBLEM IS CAUSED ENTIRELY ON THE LOCAL WORKSTATION. STOP BLAMING OUTSIDE NAME SERVERS.

I then disabled ipv6 in the kernel (not so easy, as most instructions are wrong for kernel 2.6, and SuSEfirewall2 in its default setting forces loading of ipv6 modules, which are impossible to unload once loaded). Much better, but I still got errors. Back to the locally caching bind9 forwarding to ISP name server. I've seen no problems since.
My conclusion: Either KDE/konqueror doesn't work with ipv6 (the claimed fix is bogus) and it's still not working properly with ipv4 either, or else the problem is something else but for some reason it shows up more often (but not only) in ipv6. It also seems to be restricted to KDE. Is there any debugging I could do, given the above situation? HTH, Volker
Reopening the bug report. I am now convinced it's a local error. But please understand the situation: we use the standard name-resolution calls, the very ones Mozilla uses. So, in theory, either both should work, or both should fail. The only difference is that we do two calls at once, simultaneously (in threads), while Mozilla sends the two queries in series, one after the other. So, again, in theory, we should even be faster by tens to hundreds of milliseconds, under normal circumstances. Just to be sure: in your /etc/resolv.conf, have you ever had an IPv6 nameserver (i.e., nameserver ::1, or a similar line)? However, if that were the problem, you'd be having issues in Mozilla as well. Are you using KDE 3.4.0?
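To illustrate the difference between the two strategies described above, here is a minimal sketch, not the actual KResolver or Mozilla code. It assumes the "two calls" are one IPv4 and one IPv6 lookup through getaddrinfo(); the names countAddresses, resolveSerial, resolveParallel and FamilyCounts are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <future>
#include <netdb.h>
#include <sys/socket.h>

// Hypothetical helper: count the addresses getaddrinfo() returns for one family.
std::size_t countAddresses(const char* host, int family) {
    addrinfo hints{};
    hints.ai_family = family;         // AF_INET or AF_INET6
    hints.ai_socktype = SOCK_STREAM;  // one entry per address, not one per socket type
    addrinfo* res = nullptr;
    if (getaddrinfo(host, nullptr, &hints, &res) != 0)
        return 0;                     // any resolver error counts as "no addresses"
    std::size_t n = 0;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next)
        ++n;
    freeaddrinfo(res);
    return n;
}

struct FamilyCounts {
    std::size_t v4;
    std::size_t v6;
};

// Serial strategy: the second lookup starts only after the first has returned.
FamilyCounts resolveSerial(const char* host) {
    std::size_t v4 = countAddresses(host, AF_INET);
    std::size_t v6 = countAddresses(host, AF_INET6);
    return {v4, v6};
}

// Parallel strategy: each lookup runs on its own thread and the caller
// blocks until both have finished.
FamilyCounts resolveParallel(const char* host) {
    auto v4 = std::async(std::launch::async, countAddresses, host, AF_INET);
    auto v6 = std::async(std::launch::async, countAddresses, host, AF_INET6);
    return {v4.get(), v6.get()};
}
```

Both functions should return the same counts for the same host; the parallel one only saves wall-clock time, at the price of needing correct synchronisation between the threads.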
Sorry for not saying, thought I'd started this report. I have the current updates for SuSE 9.2, which are KDE 3.3.0. I don't use the KDE packages from supplementary. The problem has been the same with earlier versions of SuSE and KDE, I think going back to 8.2 / KDE 3.1.1.

My /etc/resolv.conf:

nameserver 127.0.0.1
search site some.other.nz

I never had ::1 in there - perhaps I should have had. No ISP in New Zealand offers ipv6 so it's not much use to anyone here and I tend to ignore it.
Can anyone reproduce this at will? Or at least, after some trying, can get it to happen? I cannot solve the problem if I can't find its source. An strace could help me.
I have the same problem here on different systems with different DNS servers. After a reload the site works fine, but it's poor usability and I think it's a serious bug. Other browsers work fine.
I know it is a big problem, but I can't solve it if I can't find it. I've said it already. KDE resolves all hosts properly for me.
*** Bug 89613 has been marked as a duplicate of this bug. ***
please everyone: try to stop nscd and see if it stays reproducible
> please everyone: try to stop nscd and see if it stays reproducible Been there, tried that. Still getting the same resolver errors. Wouldn't other browsers go through nscd too? If so, those other browsers don't show resolver problems. I doubt it has to do with nscd. Volker
Can someone who can reproduce this problem try this:

killall kio_http
strace -o /tmp/kdeinit.trace -f -p <kdeinit's PID>

Then make the problem show and send us the trace file. Just for the heck of it: can you also try to run "kdeinit" and see if the problem disappears?
CVS commit by thiago:

Fixing the random resolver failures in the code. It was a local error after all, so I apologise for being hard on the bug reporters. You know how developers are protective of their own code :-)

Many thanks to the patient bug reporters and to Coolo for his analysis of the problem.

BUG:94703

The reason this bug happened was quite insidious. It was related to some events occurring in a very particular order in different threads, that's why it appeared to be random.

- the lookups are started (KResolver::start())
- KResolver::wait() is called on the master thread
- the lookups finish on the auxiliary threads
- the resolver code detects the auxiliary lookups being done and processes the results (KResolverManager::doNotifying()), thereby waking up all threads on KResolver::wait()
- the master thread is woken up now
- here's the catch: while the master thread is waking up, the manager thread has started processing the final results (KResolverManager::handleFinishedItem()) and sets status to KResolver::Success
- the master thread thinks the resolving is done and emits the finished(...) signal with an empty KResolverResult list!
- after that, the manager thread collects the auxiliary results, builds the main results and emits the signal again, but it's too late, since an error will have already been reported

After understanding the error, I am actually surprised it hasn't happened more often, least of all with me. I am betting it's the different threading implementations that cause the different behaviour, or the fact that people were using dual-processor or dual-core systems (which can do threading better than my single-core CPU).
M +3 -10 kresolvermanager.cpp 1.35

--- kdelibs/kdecore/network/kresolvermanager.cpp #1.34:1.35
@@ -413,11 +413,5 @@ void KResolverManager::releaseData(KReso
   if (data->obj)
     {
-      if (data->nRequests > 0)
-        // PostProcessing means "we're done with our blocking stuff, but we're waiting
-        // for some child request to finish"
         data->obj->status = KResolver::PostProcessing;
-      else
-        // this may change after post-processing
-        data->obj->status = data->worker->results.isEmpty() ? KResolver::Failed : KResolver::Success;
     }

@@ -484,5 +478,5 @@ bool KResolverManager::handleFinishedIte
       // this one has finished
       if (curr->obj)
-        curr->obj->status = KResolver::Success; // this may change after the post-processing
+        curr->obj->status = KResolver::PostProcessing; // post-processing is run in doNotifying()

       if (curr->requestor)
@@ -531,6 +525,5 @@ KResolverWorkerBase* KResolverManager::f
           // good, this one says it can process
           if (worker->m_finished)
-            p->status = !worker->results.isEmpty() ?
-              KResolver::Success : KResolver::Failed;
+            p->status = KResolver::PostProcessing;
           else
             p->status = KResolver::Queued;
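The ordering problem described in the commit message can be reproduced in a small standalone model. This is only a sketch under stated assumptions: it uses std::mutex and std::condition_variable rather than the Qt primitives kdelibs uses, and Lookup, managerFinish, waitForResults and runScenario are hypothetical names, not KResolver API. The waiterReturned flag exists only to force the bad interleaving deterministically instead of leaving it to chance.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

enum class Status { InProgress, PostProcessing, Success, Failed };

struct Lookup {
    std::mutex m;
    std::condition_variable cv;
    Status status = Status::InProgress;
    std::vector<std::string> results;
    bool waiterReturned = false;  // demo hook, not part of any real resolver
};

// Plays the manager thread. `intermediate` is the status published before
// the final results exist: Success models the old code, PostProcessing the fix.
void managerFinish(Lookup& l, Status intermediate) {
    {
        std::lock_guard<std::mutex> lock(l.m);
        l.status = intermediate;
    }
    l.cv.notify_all();  // wakes every thread blocked in waitForResults()

    {
        std::unique_lock<std::mutex> lock(l.m);
        if (intermediate == Status::Success) {
            // Force the bad interleaving: let the waiter run before the
            // results are collected.
            l.cv.wait(lock, [&] { return l.waiterReturned; });
        }
        l.results = {"192.0.2.1"};  // documentation address, placeholder result
        l.status = Status::Success;
    }
    l.cv.notify_all();
}

// Plays the master thread: blocks until a final status is visible, then
// reports how many results it could see at that moment.
std::size_t waitForResults(Lookup& l) {
    std::unique_lock<std::mutex> lock(l.m);
    l.cv.wait(lock, [&] {
        return l.status == Status::Success || l.status == Status::Failed;
    });
    std::size_t seen = l.results.size();
    l.waiterReturned = true;
    lock.unlock();
    l.cv.notify_all();
    return seen;
}

std::size_t runScenario(Status intermediate) {
    Lookup l;
    std::thread manager(managerFinish, std::ref(l), intermediate);
    std::size_t seen = waitForResults(l);
    manager.join();
    return seen;
}
```

In this model, runScenario(Status::Success) lets the waiter observe a "successful" lookup with an empty result list, which matches the spurious "Unknown host" error, while runScenario(Status::PostProcessing) keeps the waiter blocked until the results have actually been collected.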
CVS commit by thiago:

Backporting the fix for the "random resolver failure" problem to KDE 3.4.x.

BACKPORT:1.34:1.35
CCBUG:94703

M +3 -10 kresolvermanager.cpp 1.34.2.1

--- kdelibs/kdecore/network/kresolvermanager.cpp #1.34:1.34.2.1
@@ -413,11 +413,5 @@ void KResolverManager::releaseData(KReso
   if (data->obj)
     {
-      if (data->nRequests > 0)
-        // PostProcessing means "we're done with our blocking stuff, but we're waiting
-        // for some child request to finish"
         data->obj->status = KResolver::PostProcessing;
-      else
-        // this may change after post-processing
-        data->obj->status = data->worker->results.isEmpty() ? KResolver::Failed : KResolver::Success;
     }

@@ -484,5 +478,5 @@ bool KResolverManager::handleFinishedIte
       // this one has finished
       if (curr->obj)
-        curr->obj->status = KResolver::Success; // this may change after the post-processing
+        curr->obj->status = KResolver::PostProcessing; // post-processing is run in doNotifying()

       if (curr->requestor)
@@ -531,6 +525,5 @@ KResolverWorkerBase* KResolverManager::f
           // good, this one says it can process
           if (worker->m_finished)
-            p->status = !worker->results.isEmpty() ?
-              KResolver::Success : KResolver::Failed;
+            p->status = KResolver::PostProcessing;
           else
             p->status = KResolver::Queued;