Bug 322200 - limit restarts of crashing resources
Summary: limit restarts of crashing resources
Status: RESOLVED UNMAINTAINED
Alias: None
Product: Akonadi
Classification: Frameworks and Libraries
Component: server
Version: 1.9.2
Platform: Debian unstable Linux
Priority: NOR
Severity: normal
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-10 15:23 UTC by Martin Steigerwald
Modified: 2017-01-07 21:58 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Martin Steigerwald 2013-07-10 15:23:38 UTC
In bug #322199 I saw the akonadi_imap resource crashing and being restarted all the time.

It spammed our perdition proxy like mad. The mail log has grown to 509 MB now, with:


rproxy:~# grep "Error reading authentication information" /var/log/mail.log | grep "wsip" | wc -l
286409
rproxy:~# grep "Error reading authentication information" /var/log/mail.log | grep "notebookip" | wc -l
1800668

(Yes, that's no joke.)


Reproducible: Always

Steps to Reproduce:
Follow the scenario from bug #322199.
Actual Results:
The Akonadi IMAP resource seems to be crashing and being restarted indefinitely.

ms@mango:~/.local/share/akonadi> cat akonadi_control.error.old
ProcessControl: Application /usr/bin/akonadi_imap_resource stopped unexpectedly ( "Process crashed" )


Expected Results:
Restarts of crashing processes are limited. I think the resource shouldn't crash at all, so I would limit it to the first crash; in any case I wouldn't accept more than 3 or 5 crashes before giving up. A crash is a bug.

Akonadi 1.9.2-2 with KDEPIM 4.10.4 on Debian Sid.
Comment 1 Martin Steigerwald 2013-07-10 15:24:46 UTC
Raising to major, as this can fill up mail server logs quickly.
Comment 2 Christophe Marin 2013-07-10 17:06:02 UTC
This is not a 'major' bug.
Comment 3 Martin Steigerwald 2013-08-13 11:55:47 UTC
Christophe, I consider filling up the logs of a mail proxy server to the point where it exhausts the available disk space a major bug. This breaks infrastructure that is not just there for one or two mail clients.

Now one can argue why perdition logs this at all. Well, each entry is just a new connection attempt, and in my opinion it makes sense to log it. I consider making hundreds of thousands of new connections to the same mail server in a relatively short timeframe a major bug.

I.e. it doesn't just load the local machine, but also the mail server, up to the point that it is close to a denial-of-service attack. Even a distributed one in this case, because it only happens when two or more Akonadi IMAP clients access the same server.

So the software does not only crash. It also insists on restarting itself after crashing, again and again and again. The software knows that it is broken, yet insists on keeping itself running like this.

If that is no major bug, I do not know what is. Seriously.


So what is your reasoning for this not to be a major bug?
Comment 4 Christophe Marin 2013-08-14 14:55:29 UTC
(In reply to comment #3)
> Christophe, I perceive filling up the logs of a mail proxy server upto the
> point that it would fill the available disk space of it a major bug. This
> breaks infrastructure that is not just there for that one or two mail
> clients.

A major bug for what you call a "mail proxy server", maybe. In case of a real DoS attack, it could behave strangely.

> 
> So what is your reasoning for this not to be a major bug?

There's already a crash counter:
in server/control/processcontrol.cpp: static const int s_maxCrashCount = 2;
Comment 5 Christophe Marin 2013-08-14 15:07:28 UTC
Your issue most likely comes from your proxy configuration or from what it redirects to (e.g. a mail server that doesn't accept concurrent connections).
Comment 6 Martin Steigerwald 2013-08-14 15:18:06 UTC
Christophe: No.

No. For testing I have two Icedove mail clients configured, using the exact same perdition proxy server. *They work just fine.*

Aside from that, ten or more co-workers also use this proxy server.

> There's already a crash counter: in server/control/processcontrol.cpp: static const int
> s_maxCrashCount = 2;

Well, then that obviously doesn't work here.

Processes get restarted over and over again, client port numbers change, and a new connection is attempted each time.

Christophe, I have seen what I have seen, and I would pretty much bet that I am capable of telling whether processes crash and get restarted or not. They do. This bug is real.
Comment 7 Daniel Vrátil 2013-08-28 13:59:44 UTC
There is a timer that will reset the crash counter to zero after one minute without a crash, so theoretically the resource can indeed be restarted indefinitely.
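
For illustration, here is a minimal C++ sketch of that logic (this is not the actual ProcessControl code; the class name and members are made up for the example):

// Minimal sketch, not the actual Akonadi ProcessControl code: a crash
// counter that is cleared after one minute without a crash. A resource
// that crashes slightly less often than the limit within that window is
// restarted forever.
#include <chrono>
#include <iostream>

class RestartGuard {
public:
    // Returns true if a restart is still allowed for a crash happening at 'now'.
    bool allowRestart(std::chrono::steady_clock::time_point now) {
        // Reset the counter if the previous crash is more than a minute old.
        if (m_crashCount > 0 && now - m_lastCrash > std::chrono::minutes(1))
            m_crashCount = 0;
        m_lastCrash = now;
        ++m_crashCount;
        return m_crashCount <= s_maxCrashCount;
    }

private:
    static constexpr int s_maxCrashCount = 2;  // same limit as quoted above
    int m_crashCount = 0;
    std::chrono::steady_clock::time_point m_lastCrash{};
};

int main() {
    RestartGuard guard;
    auto t = std::chrono::steady_clock::now();
    // A resource crashing once every 61 seconds: the counter is reset
    // before every crash, so every restart is allowed.
    for (int i = 1; i <= 5; ++i) {
        std::cout << "crash " << i << ": restart allowed = " << std::boolalpha
                  << guard.allowRestart(t) << '\n';
        t += std::chrono::seconds(61);
    }
}

Under that logic a resource that crashes only about once a minute is restarted without end, while something crashing several times per second should be stopped after the second crash within a minute.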

Martin, can you please try to provide a backtrace of the crash?
Comment 8 Martin Steigerwald 2013-08-28 14:13:10 UTC
Well, trying this again spams our perdition log again. I think I can do this for a short time, but I need clear instructions on how it works as I don't want to play around with our infrastructure needlessly.

Are there any instructions available? Basically I need to attach gdb to akonadi_imap_resource?

Are you sure that the one-minute timer actually works? What I have seen here was far more than one restart per minute.

rproxy:~# grep "Error reading authentication information" /var/log/mail.log | grep "wsip" | wc -l
286409
rproxy:~# grep "Error reading authentication information" /var/log/mail.log | grep "notebookip" | wc -l
1800668

This happened within a few hours. And with a one-minute reset timer it can only be restarted three or four times a minute, if I understand the crash count excerpt from Christophe correctly, thus at most around 240 times an hour.

Here is an excerpt from the log:

Jul 10 16:29:25 rproxy perdition.imaps[15127]: Fatal Error reading authentication information from client clientip:51242->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15129]: Connect:  clientip:51244->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15128]: Fatal Error reading authentication information from client clientip:51243->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15130]: Connect:  clientip:51245->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15129]: Fatal Error reading authentication information from client clientip:51244->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15131]: Connect:  clientip:51246->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15130]: Fatal Error reading authentication information from client clientip:51245->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15132]: Connect:  clientip:51247->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15131]: Fatal Error reading authentication information from client clientip:51246->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15133]: Connect:  clientip:51248->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15132]: Fatal Error reading authentication information from client clientip:51247->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15134]: Connect:  clientip:51249->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15133]: Fatal Error reading authentication information from client clientip:51248->serverip:993: Exiting child
Jul 10 16:29:25 rproxy perdition.imaps[15135]: Connect:  clientip:51250->serverip:993
Jul 10 16:29:25 rproxy perdition.imaps[15134]: Fatal Error reading authentication information from client clientip:51249->serverip:993: Exiting child

That's about 8 attempts within *one* second.
Comment 9 Daniel Vrátil 2013-08-28 14:26:55 UTC
Are you 100% sure that the IMAP resource actually crashes, and that it's not just trying to reconnect upon SSL failure?

See https://bugs.kde.org/show_bug.cgi?id=316840
Comment 10 Martin Steigerwald 2013-08-28 14:34:03 UTC
No, not 100%.

I took this as a hint:

ms@mango:~/.local/share/akonadi> cat akonadi_control.error.old
ProcessControl: Application /usr/bin/akonadi_imap_resource stopped unexpectedly ( "Process crashed" )

I admit that, aside from the above, I have no proof of any restarting, as I did not monitor the PIDs. Sorry, Christophe, I may have been too bold about this, as I got the impression you wanted to imply that the bug is with the proxy server. I can only prove the changing port numbers.

I wonder whether there is an easy way to limit the connection attempts to make testing this again a bit safer.

Is there a way to start the akonadi_imap resource on the command line and see what happens?
Comment 11 Martin Steigerwald 2013-08-28 14:49:09 UTC
The thing is, it only seemed to happen here when *two* instances of KMail access the server. Well, I may try running one instance and see whether I see any reconnects.
Comment 12 Martin Steigerwald 2013-08-28 15:20:31 UTC
Okay, I tested again with one client. Worked.

With two clients. Worked too.

But well, last time it only happened after a while. Maybe something else triggers it. That's the thing: I think it works, forget about it, and if it still happens, it will at some point start to spam the server again. Is it possible to rate-limit connection attempts, to be on the safe side for this case?
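
Just to illustrate what I mean by rate limiting, here is a hypothetical sketch (this is not Akonadi code; the function name and the delay values are invented): an exponential backoff between restart attempts would cap the reconnect rate even when the crash counter gets reset.

// Hypothetical sketch only: back off exponentially after consecutive
// crashes, capped at five minutes, instead of restarting immediately.
#include <algorithm>
#include <chrono>
#include <iostream>

std::chrono::seconds restartDelay(int consecutiveCrashes) {
    const auto base = std::chrono::seconds(5);
    const auto maxDelay = std::chrono::minutes(5);
    // Double the delay for each consecutive crash (bounded shift).
    auto delay = base * (1 << std::min(consecutiveCrashes, 6));
    return std::min<std::chrono::seconds>(delay, maxDelay);
}

int main() {
    for (int crashes = 0; crashes <= 8; ++crashes)
        std::cout << crashes << " consecutive crash(es) -> wait "
                  << restartDelay(crashes).count() << "s before restarting\n";
}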

I captured Akonadi IMAP logs on both the workstation and the laptop. If you are interested in some excerpts, please ask. The logs contain confidential information, so I don't want to share them in full.

I am back in the office on Wednesday next week.

Thanks,
Martin
Comment 13 ancow 2013-11-10 13:10:06 UTC
I'd like to mention that I've had akonadi mail resources crash and restart several times per second in KDE 4.10 (either .4 or .5, I don't remember exactly). The crash counter was definitely not working, as the crashes caused popup notifications and were therefore easy to observe.

I haven't had the crashes in several months, so I don't exactly know whether the problem persists.
Comment 14 Denis Kurz 2016-09-24 20:40:58 UTC
This bug has only been reported for versions older than KDEPIM 4.14 (at most akonadi-1.3). Can anyone tell if this bug is still present?

If no one confirms this bug for a recent version of Akonadi (part of KDE Applications 15.08 or later), it gets closed in about three months.
Comment 15 Denis Kurz 2017-01-07 21:58:25 UTC
As announced in my last comment, I am closing this bug. If you encounter it again in a recent version (at least 5.0, a.k.a. KDE Applications 15.08), please open a new one unless it already exists. Thank you for all your input.