Bug 212585 - frequent error message: there was an unrecoverable error in the protocol
Summary: frequent error message: there was an unrecoverable error in the protocol
Status: RESOLVED FIXED
Alias: None
Product: kopete
Classification: Applications
Component: Jabber Plugin (show other bugs)
Version: unspecified
Platform: Ubuntu Linux
: NOR normal
Target Milestone: ---
Assignee: Kopete Developers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-01 02:38 UTC by doc.evans
Modified: 2011-03-18 00:00 UTC (History)
11 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Proposed patch (1.13 KB, patch)
2010-05-05 14:36 UTC, Stephan Krause
Details

Note You need to log in before you can comment on or make changes to this bug.
Description doc.evans 2009-11-01 02:38:10 UTC
Version:            (using KDE 4.3.2)
OS:                Linux
Installed from:    Ubuntu Packages

Every few minutes, kopete 0.80.2 is giving me the error message "there was an unrecoverable error in the protocol" when using jabber. No other version has ever done this, nor does any other jabber client.

The message seems to be harmless -- kopete seems to work just fine even after reporting this error -- but it's very annoying because it just pops up in the middle of the screen for a few seconds every few minutes and then disappears (it seems to bypass the regular KDE4 notification system). 

Either there should be some way to configure kopete so that it doesn't report this message, or it simply shouldn't report it at all, since it seems to be harmless.
Comment 1 Bob 2009-11-17 09:54:37 UTC
I have the same problem using Kopete 0.80.2 on KDE 4.3.3.
MSN protocol works fine but jabber pops these messages every few minutes. I am disconnected, then reconnected.
Everything is working well with other IM client.
Comment 2 doc.evans 2009-11-17 16:09:32 UTC
I thought at first that this might be a compatibility problem with the server since it's an old server, but I've now tried with several servers, including Google's, and this message occurs with all of them.

And as Comment #1 says, this message is followed by a disconnect and then a reconnect.

The disconnect/reconnect annoys people on my buddy list as much or more than the "unrecoverable error" message annoys me.

There definitely seems to be some sort of bug in the protocol code that is causing this to happen.
Comment 3 pouet 2009-11-17 17:16:38 UTC
Hello,
I have the same problem under KDE 4.3.2 with kopete 0.80.2 Ubuntu 9.10.
Comment 4 Roman Jarosz 2009-11-17 17:43:56 UTC
doc.evans could you post here the settings for google server (without login/password), and also how long it takes to disconnect?
Comment 5 doc.evans 2009-11-17 18:44:51 UTC
(In reply to comment #4)
> doc.evans could you post here the settings for google server (without
> login/password), and also how long it takes to disconnect?

I don't understand the question. What settings?

As far as I can tell it behaves exactly the same as every other XMPP server: Kopete gives the protocol error message and disconnects/reconnects at least about once per hour. Sometimes much more frequently.
Comment 6 Roman Jarosz 2009-11-17 19:59:40 UTC
Server settings like address, port, if ssl encryption is on, etc., basically all the settings on Connection tab in Account properties.

Btw. I use jabber.cz and never saw this bug.
Comment 7 doc.evans 2009-11-18 00:24:38 UTC
Allow plain text authentication is checked.
Server: gmail.com
Port: 5222

All the options in "Notifications" are checked.

Basically, I simply added the account and didn't change any settings myself.
Comment 8 Rolf Offermanns 2009-11-23 18:21:27 UTC
Same here. Openfire 3.6.4 in company intranet. Other clients (Gajim, Pidgin) do not show this behaviour. Does Kopete have a "log all pakets" option (w/o recompiling)?
Comment 9 Iacopo 2009-12-10 22:18:34 UTC
Same here
Both on kde 4.3.2 and 4.3.4 in Debian Sid (amd64).
I tried on internal jabber 1.4 server; i tried  a fresh install of jabber server without significant benefit.
I'll try external jabber servers in the next days
Comment 10 Curtis Smith 2009-12-15 15:53:49 UTC
Our Jabber server would disconnect me regularly. I tracked down one place I think it's occurring: kopetesockettimeoutwatcher.cpp SocketTimeoutWatcher::ackTimeoutCheck line 98. The info structure looked like this when it disconnected:

info = {tcpi_state = 1 '\001', tcpi_ca_state = 0 '\000',
  tcpi_retransmits = 0 '\000', tcpi_probes = 0 '\000',
  tcpi_backoff = 0 '\000', tcpi_options = 7 '\a', tcpi_snd_wscale = 7 '\a',
  tcpi_rcv_wscale = 6 '\006', tcpi_rto = 210000, tcpi_ato = 40000,
  tcpi_snd_mss = 1448, tcpi_rcv_mss = 1301, tcpi_unacked = 1, tcpi_sacked = 0,
  tcpi_lost = 0, tcpi_retrans = 0, tcpi_fackets = 0, tcpi_last_data_sent = 10,
  tcpi_last_ack_sent = 0, tcpi_last_data_recv = 188710,
  tcpi_last_ack_recv = 55000, tcpi_pmtu = 1500, tcpi_rcv_ssthresh = 72280,
  tcpi_rtt = 18750, tcpi_rttvar = 10000, tcpi_snd_ssthresh = 2147483647,
  tcpi_snd_cwnd = 3, tcpi_advmss = 1448, tcpi_reordering = 3,
  tcpi_rcv_rtt = 18750, tcpi_rcv_space = 38926, tcpi_total_retrans = 0}
Comment 11 Serhiy Kachanuk 2009-12-17 10:59:12 UTC
Yes, i think this is related to carrier lost detection and i also have this. at work with internal and exeternal servers, approx. every 30 minutes, and at home, but at approx. 1.5 hour interval
Comment 12 Laura Frasca 2009-12-28 18:39:56 UTC
*** This bug has been confirmed by popular vote. ***
Comment 13 Laura Frasca 2009-12-28 18:56:38 UTC
I am experiencing this problem as well.  I'm not sure exactly how frequent the error is (I haven't been tracking it precisely), but it's probably once or twice an hour.  I'm always reconnected successfully in about 10 seconds, but the messages are annoying.

Software:
Kopete package version 4.3.2-1 on Debian squeeze (testing), i386 architecture

I'm using my company's internal OpenFire server, just installed about a week ago so I assume it's the latest version of OpenFire (3.6.4).  Like other commenters, I have not seen this sort of disconnection error in other clients (Pidgin or Spark).

Account connection settings:
"Use protocol encryption (SSL)" is checked
"Allow plain-text password authentication" is checked
"Override default server information" is unchecked (defaults are the internal IM server hostname and port 5223).
Comment 14 Thomas Caswell 2010-01-08 18:19:34 UTC
I see this problem with uchicago's internal jabber server(couple of times an hour), but not with the google server (or at least much less frequently with the google server).  I am typically reconnected in less than a second. I haven't a clue how to debug this but am willing to do tests people want to see.

account settings for uchicago server:
"use protocol encryption (ssl)" is checked (won't connect otherwise)
"allow plain text password" is checked (won't connect otherwise)
using default hostname and port.

Version 0.80.2
Using KDE 4.3.2 (KDE 4.3.2)
from main kubuntu repositories,
Comment 15 Stéphane Berthelot 2010-02-04 14:32:43 UTC
I can confirm this bug on Slackware 13.0 / Kopete 0.80.2 (kde 4.3.4) / openfire 3.6.4

Why not use kernel per-connection tcp keepalive feature with setting a sufficiently low value ? ie.
TCP_KEEPIDLE 10
TCP_KEEPINTVL 1
TCP_KEEPCNT 5
which would cause disconnection on 15 sec with no answer.

It seems preferable to me to let the kernel do the job :) (disconnect) rather that trying to manipulate its internal structures. On very fast networks or very congested it may lead to false positives...

I'm on GigE at work and it disconnects more frequently that at home (DSL connection to the same openfire server)

This howto may be useful http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#codeneeding

If some people confirm me that this idea is good enough, I'll write and test a patch.
Comment 16 Tobias Wächter 2010-03-12 12:02:50 UTC
I can confirm that this bug seems to occur especially on fast internet connections. On my university network kopete disconnects every few minutes, while it doesn't happen at home (DSL connection).
Comment 17 Stephan Krause 2010-05-05 14:36:36 UTC
Created attachment 43267 [details]
Proposed patch

I can confirm this bug relates to the sockettimeoutwatcher. There seem to be three problems in the timeout check that, combined, lead to a false timeout

[From kopete/libkopete/kopetesockettimeoutwatcher.cpp]
if ( info.tcpi_last_ack_recv >= info.tcpi_last_data_sent && (info.tcpi_last_ack_recv - info.tcpi_last_data_sent) > mTimeoutThreshold )

1. (info.tcpi_last_ack_recv - info.tcpi_last_data_sent) is pretty arbitrary. If the current ACK is not yet received, we check the time difference between the ACK for the packet before and the current packet (which, on an idle connection, is aprox 55sek). A much more useful check would be "Was the last send more than mTimeoutThreshold ago?" (i.e. just info.tcpi_last_data_sent > mTimeoutThreshold)

2. On a fast network (i.e. gigabit LAN and a local jabber server) due to the limited accuracy of the times in the tcp_info struct the ack arrives with the same timestamp als last_send, i.e. tcpi_last_ack_recv == tcpi_last_data_send. So the check is triggered even if the ACK is already received.

3. If I understand the code correctly, the SocketTimeoutWatcher is supposed to check mTimeoutThreshold after the last send if there was an ACK. However, I noticed that ackTimeoutCheck was sometimes called almost imediately after the sending of a packet, so that the ACK was not yet arrived and 1. triggered

The proposed fix is attached (it fixes 1. and 2., so the bug is not triggered any more). I do not know why 3. happens, but if 1 and 2 are fixed, 3. is not exactly pressing.

To be honest, I think the cleenest solution would be  to use the tcp keepalive as proposed by Stéphane Berthelot. But this is a quick fix without changing the timeout architecture.
Comment 18 doc.evans 2010-05-05 15:36:24 UTC
I thought I had commented on Stéphane's post long ago, but apparently I didn't. His proposed solution is definitely the right one. 

I do quite a bit of socket programming and he is absolutely right that the TCP stack in the kernel should be the entity that is doing this job -- that's what these features in the stack are for, and trying to duplicate them in user space with running into problems in different network situations is (as this situation demonstrates) a nightmare.
Comment 19 Lamarque V. Souza 2011-03-17 23:48:40 UTC
Is this problem still happening?
Comment 20 Stéphane Berthelot 2011-03-17 23:58:02 UTC
I didn't have this problem for some time. I'm now on kde 4.6.1 and didn't have this for at least 2 months.

Sorry I didn't post a comment in-between to say it seems/was fixed.

Thanks anyway for fixing that.
Comment 21 Lamarque V. Souza 2011-03-18 00:00:51 UTC
Ok, so closing this bug.