Bug 175967

Summary: Error in gzread while loading from cache gives infinite loop in kio_http
Product: [Unmaintained] kio
Reporter: Brendon Higgins <brendon>
Component: http
Assignee: kdelibs bugs <kdelibs-bugs>
Status: RESOLVED FIXED
Severity: normal
CC: annma, auxsvr, bugs.kde.j
Priority: NOR    
Version: 4.1   
Target Milestone: ---   
Platform: Debian testing   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: Patch to gracefully handle corrupt cache

Description Brendon Higgins 2008-11-24 10:25:29 UTC
Version:            (using KDE 4.1.3)
OS:                Linux
Installed from:    Debian testing/unstable Packages

Symptom: Occasionally (frequent enough to be annoying) I find runaway kio_http processes devouring an entire CPU core.

Investigation: I attached gdb to one such process and discovered that it was looping, presumably infinitely, in the while loop beginning at line 4244 of kdelibs/kioslave/http/http.cpp (KDE 4.1.3 version). Tellingly, nbytes is -1, i.e., gzread is failing.

Looking at the documentation I've found, it seems that gzeof does not return 1 on error. Numerous descriptions of the function on the net indicate that it does; however, note that http://www.zlib.net/manual.html#gzeof (the authoritative source, I would expect) only specifies that the function returns 1 if EOF has previously been detected, otherwise 0, which says nothing about errors.

Solution: I notice that the code within this section of http.cpp that checks gzerror has been commented out. Maybe someone thought it was superfluous (given the erroneous documentation I've found, this is possible), or perhaps the error handling is incomplete; I don't know. Either this check ought to be used, or nbytes ought to be checked for negativity.
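
The loop shape described above, and the proposed nbytes check, can be sketched without zlib roughly as follows. This is not the kio_http code: FakeStream, fakeRead, fakeEof and drain are hypothetical stand-ins for the gzFile handle, gzread, gzeof and the read loop, and they only mimic the behaviour reported here (the read call returns -1 on error while the EOF check keeps returning 0).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a gzFile: serves `data`, then reaches EOF, or,
// if `corruptAt` is reached first, fails every subsequent read.
struct FakeStream {
    std::string data;
    std::size_t pos = 0;
    std::size_t corruptAt = std::string::npos;  // offset where decoding "fails"
    bool eof = false;
};

// Mimics gzread: number of bytes read, or -1 on error.
int fakeRead(FakeStream &s, char *buf, unsigned len) {
    if (s.pos >= s.corruptAt) return -1;  // the error persists on every call
    std::size_t limit = std::min(s.data.size(), s.corruptAt);
    std::size_t n = std::min<std::size_t>(len, limit - s.pos);
    std::memcpy(buf, s.data.data() + s.pos, n);
    s.pos += n;
    if (s.pos == s.data.size()) s.eof = true;
    return static_cast<int>(n);
}

// Mimics gzeof: 1 only once end-of-file was seen, 0 otherwise (also on error).
int fakeEof(const FakeStream &s) { return s.eof ? 1 : 0; }

// Drains the stream into `out`; returns false if a read error occurred.
// Without the `nbytes < 0` branch this loop never exits on a corrupt stream.
bool drain(FakeStream &s, std::string &out) {
    char buf[8];
    while (!fakeEof(s)) {
        int nbytes = fakeRead(s, buf, sizeof buf);
        if (nbytes < 0) return false;
        out.append(buf, static_cast<std::size_t>(nbytes));
    }
    return true;
}
```

With corruptAt left at its default, drain returns true after the last chunk; with corruptAt set inside the data, it returns false as soon as the read call reports -1 instead of spinning.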

Pondering what actually might trigger this, I wonder: Is the cache thread safe? I mean, what would happen if I were to open two tabs of the same site, such that one tab starts writing to the cache, but the other tab attempts to read it before it's written completely? It may be a symptom of a bigger problem.
Comment 1 auxsvr 2009-03-25 10:52:39 UTC
I have a similar problem here (KDE 4.2.1): kio_http processes occasionally loop, i.e. they keep consuming bandwidth and (probably) prevent Konqueror from exiting. I currently have one kio_http process looping after accessing blogspot.com, apparently with content compression. Here's the trace near the loop:

#0  HTTPProtocol::readDelimitedText (this=0xbf838f20,
    buf=0xbf818cac "HTTP/1.1 401 Malformed security token e=AOG8GaCE4gJfXFLXNMwCtW%2BVdgGFIddFcvp88UUbYTO3xE2U8fizRsFNT3FnAVvTuQ%2BaDJtsoVrA%2BreZiOwa3u5K8QtfhuVvcVd%2B6PHxDDr9QiPx7n%2Bg422gCBSvyWVUq5WXJN9tkKW3Mjsw1M9OBV"..., idx=0xbf818c80, end=131072, numNewlines=1)
    at /usr/src/debug/kdelibs-4.2.1/kioslave/http/http.cpp:1918
#1  0xb5ca7abf in HTTPProtocol::readResponseHeader (this=0xbf838f20) at /usr/src/debug/kdelibs-4.2.1/kioslave/http/http.cpp:2626
#2  0xb5cafa41 in HTTPProtocol::proceedUntilResponseHeader (this=0xbf838f20) at /usr/src/debug/kdelibs-4.2.1/kioslave/http/http.cpp:568
#3  0xb5cb05c9 in HTTPProtocol::proceedUntilResponseContent (this=0xbf838f20, dataInternal=false)
    at /usr/src/debug/kdelibs-4.2.1/kioslave/http/http.cpp:536
#4  0xb5cb1416 in HTTPProtocol::special (this=0xbf838f20, data=@0xbf838ee0) at /usr/src/debug/kdelibs-4.2.1/kioslave/http/http.cpp:3859
#5  0xb77b5766 in KIO::SlaveBase::dispatch (this=0xbf838f28, command=77, data=@0xbf838ee0)
    at /usr/src/debug/kdelibs-4.2.1/kio/kio/slavebase.cpp:1159
#6  0xb77b0624 in KIO::SlaveBase::dispatchLoop (this=0xbf838f28) at /usr/src/debug/kdelibs-4.2.1/kio/kio/slavebase.cpp:282
#7  0xb5ca150b in kdemain (argc=4, argv=0x8079760) at /usr/src/debug/kdelibs-4.2.1/kioslave/http/http.cpp:110
#8  0x0804dda9 in launch (argc=4, _name=0x8078dcc "kio_http", args=0x8078e42 "", cwd=0x0, envc=0, envs=0x8078e47 "", reset_env=false,
    tty=0x0, avoid_loops=false, startup_id_str=0x8050c9f "0") at /usr/src/debug/kdelibs-4.2.1/kinit/kinit.cpp:692
#9  0x0804e4ed in handle_launcher_request (sock=7) at /usr/src/debug/kdelibs-4.2.1/kinit/kinit.cpp:1273
#10 0x0804e9f4 in handle_requests (waitForPid=0) at /usr/src/debug/kdelibs-4.2.1/kinit/kinit.cpp:1466
#11 0x0804f614 in main (argc=2, argv=0xbf839734, envp=0xbf839740) at /usr/src/debug/kdelibs-4.2.1/kinit/kinit.cpp:1951
Comment 2 auxsvr 2009-03-25 12:00:03 UTC
Some more information: http://linuxhaters.blogspot.com/ triggers this easily (1 or 2 kio_http processes left consuming ~2 kB/s). I also have Wireshark logs that demonstrate this, should anyone ask for them.
Comment 3 Anne-Marie Mahfouf 2009-03-25 13:02:09 UTC
So what you are saying is: after closing some URL, a few kio_http processes are left and keep connecting to the URL.
This is reproducible when opening http://linuxhaters.blogspot.com/ and then closing it. The kio_http processes then consume bandwidth. They do not consume CPU as the initial report says. So is it related to the initial report or not?
Comment 4 auxsvr 2009-03-26 09:26:00 UTC
They don't consume 100% CPU time, so it might be a different problem (I didn't notice this at first, sorry).
Comment 5 Anne-Marie Mahfouf 2009-03-26 19:58:52 UTC
Seems to be a duplicate of https://bugs.kde.org/show_bug.cgi?id=182778
Comment 6 Maksim Orlovich 2009-04-24 20:29:47 UTC
Comment #1, #2 are bug 187753, unrelated.
Comment 7 Brendon Higgins 2009-04-25 02:39:39 UTC
Maksim says: Comment #1, #2 are bug 187753, unrelated.

...which you've just closed my duplicate bug report for. Hehe, small world (of bugs). :-) Thank you.
Comment 8 Brendon Higgins 2009-06-29 11:36:41 UTC
Something triggered this bug for me again, recently, and it's REALLY giving me the irrits (since the scheduler is being useless in Debian's Linux 2.6.26 image). My suspicion now is that some file in the cache has gotten corrupted, so every time I load a webpage that uses that file from the cache the process goes haywire.

I went digging a little further into gzeof. This is the relevant code from gzio.c:
    if (s == NULL || s->mode != 'r') return 0;
    if (s->z_eof) return 1;
    return s->z_err == Z_STREAM_END;
Note that this function only ever returns 0 or 1, and only returns 1 if the error is end-of-stream, no other error. The kio_http code is plainly wrong to assume that any error will be exposed by gzeof, and it's obvious that if there is a decompression error, kio_http *will go into an infinite loop*.
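
To make the return values concrete, the three quoted lines can be transcribed into a small standalone model. StreamState and modelEof are hypothetical names, and the struct keeps only the three fields the quoted code reads; the constants are local copies of the zlib.h status values, so nothing here links against zlib itself.

```cpp
// Local copies of zlib status codes (values as defined in zlib.h).
constexpr int kOk = 0;          // Z_OK
constexpr int kStreamEnd = 1;   // Z_STREAM_END
constexpr int kDataError = -3;  // Z_DATA_ERROR

// Hypothetical reduction of the stream struct to the fields gzeof inspects.
struct StreamState {
    char mode;  // 'r' for a stream opened for reading
    int z_eof;  // set once end of the input file was hit
    int z_err;  // last status code
};

// Same logic as the gzio.c excerpt quoted above.
int modelEof(const StreamState *s) {
    if (s == nullptr || s->mode != 'r') return 0;
    if (s->z_eof) return 1;
    return s->z_err == kStreamEnd;
}
```

In this model a stream whose z_err is kDataError yields 0, exactly like a healthy stream that simply has more data to read, so a loop conditioned only on the EOF check cannot tell the two apart.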

The offending part of http.cpp has shifted up to line 4172 since I submitted this bug.
Comment 9 Brendon Higgins 2009-07-05 04:46:06 UTC
Created attachment 35051
Patch to gracefully handle corrupt cache

I modified the code as in the attached patch, and I haven't had a problem since. Someone who has a better overall understanding of this code could suggest a more appropriate way to handle a corrupt cache file, perhaps falling back to retrieving the original file as usual.
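
The fallback idea mentioned above could look roughly like the following sketch. It is not taken from the attached patch or from kio_http: Loader and loadWithFallback are hypothetical names, and the two loaders are assumed to be simple callbacks that fill a buffer and report success or failure.

```cpp
#include <functional>
#include <string>

// Hypothetical loader signature: fills `out` and returns true on success.
using Loader = std::function<bool(std::string &out)>;

// Tries the cache first; if the cache read reports failure (for example a
// decompression error), drops any partial output and fetches the original.
bool loadWithFallback(const Loader &fromCache, const Loader &fromNetwork,
                      std::string &out) {
    out.clear();
    if (fromCache(out)) return true;
    out.clear();  // discard partially decoded bytes from the corrupt entry
    return fromNetwork(out);
}
```

This assumes the whole response is buffered before it is handed on. In a slave that streams data as it reads, some bytes may already have been forwarded before the error shows up, so a real fallback would probably need more care than this.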

I'm frankly surprised that code that deterministically goes into an infinite loop on a file error has not been sanitised for this long.
Comment 10 Michael Pyne 2009-07-05 05:50:52 UTC
SVN commit 991487 by mpyne:

Handle the case where gzread() returns <0 to avoid an infinite loop with corrupted compressed data
in kio_http.  Patch graciously provided by Brendon Higgins (though touched up by myself).

This will be in KDE 4.4, I will backport to KDE 4.3.

CCBUG:175967


 M  +14 -2     http.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=991487
Comment 11 Michael Pyne 2009-07-05 05:53:02 UTC
SVN commit 991488 by mpyne:

Backport fix for bug 175967 to KDE 4.3.  This handles the case where gzread() returns <0,
which can occur for corrupted compressed data, and would cause an infinite loop.

Patch provided by Brendon Higgins, modified slightly by myself.

BUG:175967


 M  +14 -2     http.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=991488
Comment 12 Michael Pyne 2009-07-05 05:57:23 UTC
*** Bug 182778 has been marked as a duplicate of this bug. ***