Summary: | Konqueror goes 100% CPU on some pages (possibly plug-in related) | ||
---|---|---|---|
Product: | [Applications] konqueror | Reporter: | P. Varet <p.varet> |
Component: | general | Assignee: | Konqueror Developers <konq-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | adawit, johannes.hirte, maksim, squan, stefan.fleckenstein |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: |
Traceback from a kill -11 during the infinite loop.
Scheduler bug that can potentially cause 100% CPU usage
Proposed fix for scheduler infinite loop...
Description
P. Varet
2010-01-30 11:55:42 UTC
Created attachment 40379 [details]
Traceback from a kill -11 during the infinite loop.
Whew, didn't take long.
Apparently the issue may have to do with KIO scheduling.
That code was introduced in r1071917. adawit?

*** Bug 224827 has been marked as a duplicate of this bug. ***

Tommi suggested that bug 224827 is a duplicate of this one. Since I reported 100% reproducibility there when visiting imdb.com, I straced that. It turns out that Konqueror is frantically reading fd 7, alternating with a poll on a set of several other descriptors. Here is the part that gets repeated endlessly:

read(7, 0x8072198, 4096) = -1 EAGAIN (Resource temporarily unavailable)
read(7, 0x8072198, 4096) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=7, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=20, events=POLLIN}, {fd=13, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}, {fd=19, events=POLLIN}, {fd=27, events=POLLIN}, {fd=28, events=POLLIN}, {fd=38, events=POLLIN}, {fd=30, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=25, events=POLLIN}, {fd=26, events=POLLIN}], 19, 0) = 0 (Timeout)

If I can believe my strace output, file descriptor 7 is a connection to the X server:

socket(PF_FILE, SOCK_STREAM, 0) = 7
connect(7, {sa_family=AF_FILE, path=@"/tmp/.X11-unix/X0"}, 20) = 0
getpeername(7, {sa_family=AF_FILE, path=@"/tmp/.X11-unix/X0"...}, [20]) = 0

Also, Konqueror's stdout (luckily KDE betas are verbose about that) shows something interesting:

nspluginviewer(30985)/nspluginviewer (Qt/Xt) main: 4 - create XtEvents and GlibEvents
nspluginviewer(30985)/nspluginviewer (Qt/Xt) main: 5 - dbus requestName
nspluginviewer(30985)/nspluginviewer (Qt/Xt) main: 6 - new NSPluginViewer
Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)

*** Bug 225256 has been marked as a duplicate of this bug. ***

SVN commit 1084737 by orlovich:

The connection limit is clearly broken --- once we hit it, the scheduler spins at 100% CPU (startTimer(0)?) and doesn't appear to actually make any progress, either.
So, disable this for 4.4.0 (if I made it) as a workaround. Might want to consider using the scheduler from trunk for 4.4.1...
CCMAIL: adawit@kde.org
BUG: 224857

M +2 -1 scheduler.cpp

WebSVN link: http://websvn.kde.org/?view=rev&revision=1084737

> So, disable this for 4.4.0 (if I made it) as a workaround
Dear users, this is not meant as instructions for applying a workaround yourself. It means that a change has been applied to kdelibs as a quick fix for the 4.4 release.
Re-opened... The problem needs to be fixed correctly, if not for 4.4.0 then for 4.4.1. Trading one workaround for another is not a solution.

Can any of you who run into this problem please share the specs of your machine? I believe the issue stems from the problems identified above by several people, but I cannot reproduce it at all. Perhaps the problem only becomes apparent on a faster or slower machine? I have already identified the culprit in this code that could potentially cause this issue. However, I am unable to reproduce the problems described in this ticket or in the ones marked as duplicates of it.

It may make more sense to backport the trunk scheduler once it is proven stable than to spend more time on this one.

(In reply to comment #9)
> It may make more sense to backport the trunk scheduler once proven stable than
> to spend more time on this one.

That may be true in the long term. However, the fix you provided is not a solution for anyone using kdewebkit. It causes an explosion of slaves being launched for simple requests, and I mean hundreds or possibly thousands of ioslaves launched just by visiting a few sites in several tabs. So this problem has to be addressed properly right away. Anyhow, I already have a patch for the problem and will post it here for people to test. Hopefully this time someone can provide feedback on whether this patch fixes the problem...

Created attachment 40519 [details]
Scheduler bug that can potentially cause 100% CPU usage
This patch should address the high CPU usage reported in this bug. I need feedback from those who run into this issue on whether the proposed patch fixes it.
Thanks...
That's not a fix, and it doesn't address the jobs not actually being dispatched.

(In reply to comment #12)
> That's not a fix, and it doesn't address the jobs not actually being
> dispatched.

It is... just not the whole fix, or even the most important one! I only found that out after posting the patch, by testing with the limits in my http(s) protocol files set to very, very low values (4 max and 2 max per host).

I made the timer change because firing the timer as soon as events are done (slaveTimer.start(0)) means recursively executing startStep without any pause. Imagine what happens if you have lots of queued-up jobs. Adding a simple 1 sec. delay there makes sense, especially since that point will never be reached if a single job is scheduled properly.

Anyhow, that is obviously not the cause of the actual bug. The cause is, as you mentioned, that the queued-up jobs are never dispatched and the scheduler ends up in an infinite loop. That happens because jobs handled through startJobDirect are incorrectly counted when checkLimits is called to enforce the limits. Since startJobDirect neither cares about nor obeys the limits from the protocol files, the number of active slaves can easily exceed the maximum specified there.

Wait, it gets even worse... When and how you encounter the bug can be even more complicated. Which sites you visit, in what sequence, how long those sites keep connections alive, and the timeout used to reclaim an ioslave all play a role in whether or not you will see this bug!

Anyhow, I already have a fix for the above problem as well and will post it once I am satisfied with the results...

Comment on attachment 40519 [details]
Scheduler bug that can potentially cause 100% CPU usage
Obsoleted by fix iteration #2...
Created attachment 40520 [details]
Proposed fix for scheduler infinite loop...
Tested with both konqueror+khtml and konqueror+kwebkitpart, plus very, very small values for the maximum number of ioslaves. This patch should address the infinite loop. Please test and provide feedback...
(In reply to comment #15)
> Created an attachment (id=40520) [details]
> Proposed fix for scheduler infinite loop...
>
> Tested with both konqueror+khtml and konqueror+kwebkitpart plus very very small
> values for the maximum limits of ioslaves. The infinite loop should be
> addressed with this patch. Please test and provide feedback...

It seems this patch has fixed it for me.

> please tell share the spec of your machine ?

I see this on two out of two machines running 4.4 RC2 from the openSUSE 11.2 KDE repositories: an older notebook and a dual-core PC. As already said, it is highly reproducible on imdb.com (if not on the home page, then by following any link on it).

> very very small values for the maximum limits of ioslaves

How can this be configured?

(In reply to comment #17)
> > please tell share the spec of your machine ?
> I experience that on two out of two machines with 4.4 RC2 from opensuse 11.2
> KDE repositories: an elder notebook and a dual core PC.
> As already said with high reproducibility for imdb.com (if not on the home page
> then by following any page link).
>
> > very very small values for the maximum limits of ioslaves
> How can this be configured?

You can change the "maxSlaveInstances" and "maxSlaveInstancesPerHost" parameters in the http{s}.protocol files. Simply search your system for those files and change the values of these two parameters. That said, my suggestion is to update to the final 4.4.0 release. It contains a temporary workaround, and the next bug-fix release, 4.4.1, will have the correct fix for this issue...

SVN commit 1088091 by adawit:

Fixed bug# 224857 correctly and removed the temporary workaround for 4.4.1.
BUG: 224857

M +29 -18 scheduler.cpp

WebSVN link: http://websvn.kde.org/?view=rev&revision=1088091