Bug 193597 - kio_smtp hangs, waiting to acquire lock; sending mail is then impossible
Status: CLOSED NOT A BUG
Alias: None
Product: kmail
Classification: Unmaintained
Component: general
Version: unspecified
Platform: FreeBSD Ports FreeBSD
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
Reported: 2009-05-22 02:46 UTC by Gareth McCaughan
Modified: 2011-11-11 09:12 UTC


Description Gareth McCaughan 2009-05-22 02:46:25 UTC
Version:            (using KDE 4.2.2)
Compiler:          gcc 4.2.1 
OS:                FreeBSD
Installed from:    FreeBSD Ports

I am running a local SMTP server; kmail is configured to talk to it to send mail. This has been working fine, but for no obvious reason the following things have started happening at some point in the last day or so.

1. Attempts to send mail fail, leaving the message in my outbox. kmail spins for about 20s after I click the "send" button, not responding to mouse clicks or repainting its windows; then it creates a window but doesn't draw anything in it; then about 25s later it fills in the window with: "Sending failed: Unable to create SMTP job. The message will stay in the 'outbox' folder [...] The following transport was used: Unnamed". (Plus, since this is not the first time it failed, the offer to either continue with other messages in my outbox or give up.)

After each of the two long pauses just mentioned, one copy of the following message is written to, I suppose, kmail's stderr: 'kmail(1280): couldn't create slave: "Cannot talk to klauncher: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."' 1280 is, indeed, the PID of my kmail process.

2. I have a kio_smtp process sitting in the state that ps on my machine calls "L": "a process that is waiting to acquire a lock". Attempts to kill it fail. There is only one such process; trying to send a message doesn't create another. It never seems to leave the "L" state. So far as I can tell, it isn't doing anything other than waiting idly.

Exiting kmail doesn't make the kio_smtp process go away. (kmail itself really does, though.) The kio_smtp process remains unkillable.

I also have a klauncher process in state "L". It too is unkillable. This and the kio_smtp process are the only two in that state. Each waiting for a lock held by the other?
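For anyone wanting to check for the same symptom, here is a minimal sketch that filters `ps` output for processes whose primary state letter is "L". The helper names (`parse_ps_states`, `lock_waiters`, `list_lock_waiters`) are made up for illustration, and the sketch assumes a `ps` that accepts the `pid`, `state` and `comm` output keywords.

```python
import subprocess


def parse_ps_states(ps_output):
    """Parse `ps -axo pid,state,comm` output into (pid, state, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, state, command = parts
            rows.append((int(pid), state, command.strip()))
    return rows


def lock_waiters(rows):
    """Keep only rows whose primary (first) state letter is "L".

    Only the first character is checked, because later characters in the
    state column are additional flags rather than the run state itself.
    """
    return [row for row in rows if row[1].startswith("L")]


def list_lock_waiters():
    """Run ps on the local machine and return the processes in state "L"."""
    # Assumption: this ps supports the pid/state/comm keywords for -o.
    out = subprocess.run(
        ["ps", "-axo", "pid,state,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return lock_waiters(parse_ps_states(out))
```

Calling `list_lock_waiters()` on an affected machine would be expected to return the kio_smtp and klauncher entries and nothing else.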

I haven't made any changes to my system's hardware or OS, or to the way kmail is configured, or to my MTA, since long before the trouble began. (In particular, I have sent and received plenty of mail since the last time my configuration changed.)

Doing other things with kmail doesn't seem problematic. Incoming mail is processed OK. The SMTP server itself appears to be fine. I can talk to it on port 25 and send mail with it.

I'm running FreeBSD 7.1 on a dual-core Intel processor.
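The "each waiting for a lock held by the other" guess above describes a classic lock-ordering deadlock. The following is only a sketch of that general pattern using two Python threads. It is not a diagnosis of what kio_smtp and klauncher actually do. The thread labels are borrowed from the report purely as names, and the timeout is added so the demonstration ends instead of hanging.

```python
import threading


def worker(first, second, barrier, results, name):
    """Take `first`, then try to take `second` while the peer does the opposite."""
    with first:
        barrier.wait()  # wait until both threads hold their first lock
        # A real deadlock would block here forever; the timeout lets us observe it.
        got_second = second.acquire(timeout=0.2)
        results[name] = got_second
        barrier.wait()  # keep holding `first` until the peer has given up too
        if got_second:
            second.release()


def demo():
    """Run two threads that acquire the same two locks in opposite order."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(target=worker,
                         args=(lock_a, lock_b, barrier, results, "kio_smtp")),
        threading.Thread(target=worker,
                         args=(lock_b, lock_a, barrier, results, "klauncher")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Neither thread obtains its second lock, because each one is held by the other for the whole attempt. Without the timeout, both would wait indefinitely.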
Comment 1 Thomas McGuire 2009-05-22 12:31:02 UTC
KMail communicates with the kio_smtp process when sending mail, and it seems that this communication fails, possibly because kio_smtp got stuck.

All of that works for me here on Linux. I have to admit I don't have a clue where to look; the app<->slave communication is low-level kdelibs stuff.
You might try to get a backtrace of kio_smtp at the point where it hangs, with gdb.
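One way to script that suggestion is sketched below. It assumes a gdb recent enough to support the `-batch` and `-ex` options (older releases may not), and the function names are hypothetical. The timeout is there because attaching to a stuck process may itself block.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb invocation that dumps all thread backtraces."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, timeout=30):
    """Attach to `pid`, print backtraces, and return gdb's standard output.

    Raises subprocess.TimeoutExpired if gdb does not finish within `timeout`
    seconds, for example because the target never responds to the attach.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

Usage would be `print(capture_backtrace(pid))` with the PID of the hung kio_smtp process, run as a user permitted to attach to it.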
Comment 2 Gareth McCaughan 2009-05-23 03:41:08 UTC
Unfortunately, attempting to attach gdb (version 6.1.1, in case it matters) to either the kio_smtp process or the klauncher process doesn't do anything useful for me. I've tried two ways. Firstly, just with -p [process ID]. That gives me this internal error:

/usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1443: internal-error: legacy_fetch_link_map_offsets called without legacy link_map support enabled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

which I get again if I ask it not to quit. After that, gdb just sits there doing nothing. (I'm guessing that it's trying to talk to the process I've asked it to debug, which isn't responding because it's too busy waiting for a lock it's never going to get.)

If instead I provide gdb with a path to the kdeinit4 executable (that's the right thing for attaching to kio_smtp, yesno?) I don't get the errors; instead gdb goes straight to sitting there ignoring me.

(In both cases, gdb is not killable with ctrl-C, but I can e.g. ctrl-Z it and then kill the gdb process.)

My gdb-fu is pretty weak and I may very well be missing something simple. Is there some cleverer way I could get a backtrace? Any point poking at the process's /proc entry in search of its stack, or anything?
Comment 3 Gareth McCaughan 2009-06-19 03:13:20 UTC
It looks like (1) someone else has had this problem and (2) it's a FreeBSD kernel bug.

There's a thread in the freebsd-hackers mailing list, entitled "How best to debug locking/scheduler problems", where just today John Baldwin posted a patch that allegedly fixes the problem. The message-ID is <h1bmmg$s7t$1@FreeBSD.cs.nctu.edu.tw>.

I haven't yet tried applying the patch and seeing whether the problem goes away. Since it's very intermittent -- it's happened to me twice, with an interval of a few weeks -- it may be difficult to tell, but I'll apply the patch and report back here if the problem recurs.
Comment 4 Björn Ruberg 2010-02-28 01:14:54 UTC
Any update on this?
Comment 5 Gareth McCaughan 2010-02-28 01:35:26 UTC
Well, it hasn't recurred. Whether that has anything to do with the patch, I don't know. I've upgraded both OS and KDE at least once since the original report, so of course either of those might be partly responsible for the absence of recurrences. I'm not inclined to anti-patch my OS in the hope of provoking the problem again :-).

I have no objection to this bug's being marked resolved (presumably as INVALID since there's reason to think the bug wasn't in KDE) and/or closed. I don't know what the KDE project's conventions for this are; if the Right Thing is for me (as reporter) to do that, let me know.
Comment 6 Björn Ruberg 2010-02-28 01:47:26 UTC
Okay, I'm closing this. If you find out that this is a kde problem, just reopen.
Comment 7 Gareth McCaughan 2010-02-28 03:06:04 UTC
OK. Resolved -> closed on the assumption that that bit should be done by the reporter.
Comment 8 Samu Voutilainen 2011-11-11 09:12:47 UTC
Sorry for commenting on such an old ticket, but this just happened here.

I'm on Linux, and it might be that some updates left an internal API/ABI mismatched, causing such problems. So I guess it's not a real problem. I'm just leaving a note for others who happen to search for information about this problem.