Bug 297198

Summary: Violation of rfc4155: kmail2 does not use ctime format for first from line when storing in mbox format
Product: [Applications] kmail2 Reporter: Thomas Arend <thomas>
Component: commands and actions Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED FIXED    
Severity: grave CC: kollix, montel, thomas
Priority: NOR    
Version: 4.8.3   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In: 4.13
Attachments: zipped file of a single exported message
Message saved from kmail Version 1.13.6
Message saved from kmail Version 4.8.3

Description Thomas Arend 2012-03-31 17:16:44 UTC
Created attachment 70041
zipped file of a single exported message

Since 4.7, the SpamAssassin command "sa-learn --spam --mbox" cannot learn spam messages that kmail stores in mbox format. sa-learn always reports:
  Learned tokens from 0 message(s) (0 message(s) examined)

sa-learn still works on older mbox files.

I noticed some differences between former and current mbox exports. There are now two empty lines between consecutive messages. Removing the empty lines did not solve the problem.

The same error occurs in 4.8.0 and 4.8.1.
Comment 1 Laurent Montel 2012-04-06 07:59:30 UTC
I confirm it.
Comment 2 Thomas Arend 2012-05-15 20:11:28 UTC
Created attachment 71117
Message saved from kmail Version 1.13.6

This message was received and saved with KMail Version 1.13.6 under KDE 4.6.00 (4.6.0) "release 6".

sa-learn --spam --mbox reports
Learned tokens from 1 message(s) (1 message(s) examined)

Which is what we expect!
Comment 3 Thomas Arend 2012-05-15 20:15:25 UTC
Created attachment 71118
Message saved from kmail Version 4.8.3

This message was received and stored with kmail2 Version 4.8.3

sa-learn --spam --mbox gives
Learned tokens from 0 message(s) (0 message(s) examined)

Which we do not expect. 

Both sa-learn checks were run on the same computer with SpamAssassin 3.3.1!
Comment 4 Thomas Arend 2012-05-15 20:47:34 UTC
There is a simple difference in the messages which causes the problem.

kmail1 used the following format of the "From" line:
  From thomas@arend-rhb.de Tue May 15 22:01:41 2012

kmail2 used the format:
  From thomas@arend-rhb.de Tue, 15 May 2012 22:01:41 +0200

If I change the "From"-line to the older format, sa-learn can learn the messages!

The kmail1 format is not in accordance with RFC 822, whereas the kmail2 format is. Going by that standard, the problem lies with SpamAssassin.
Comment 5 Thomas Arend 2012-05-16 20:06:07 UTC
See bug report to SpamAssassin #6703 (https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6703)

See http://tools.ietf.org/html/rfc4155 which states that "A comprehensive description of mbox database files on UNIX-like systems can be found at http://qmail.org./man/man5/mbox.html, which should be treated as mostly authoritative ..."

man 5 mbox specifies that the timestamp of the From_ line should be in ctime format.

Therefore I propose switching back to the old timestamp format of kmail (Version 1.x).
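
The proposed format can be illustrated with a minimal Python sketch. It assumes the timestamp is available as an RFC 822-style date string with an explicit UTC offset, converts it to UTC as RFC 4155 describes, and uses fixed English day and month names so the result does not depend on the locale. The function name `from_line` is hypothetical; this is not kmail's code.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Fixed English abbreviations, so the output never depends on the locale.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def from_line(sender: str, rfc822_date: str) -> str:
    """Build an mbox "From " separator line with a ctime-style UTC timestamp."""
    # Assumes the date string carries an explicit offset such as +0200.
    dt = parsedate_to_datetime(rfc822_date).astimezone(timezone.utc)
    # ctime pads the day of the month with a space to width 2.
    stamp = "%s %s %2d %02d:%02d:%02d %d" % (
        DAYS[dt.weekday()], MONTHS[dt.month - 1], dt.day,
        dt.hour, dt.minute, dt.second, dt.year)
    return "From %s %s" % (sender, stamp)


print(from_line("thomas@arend-rhb.de", "Tue, 15 May 2012 22:01:41 +0200"))
```

For the kmail2 timestamp quoted in comment 4 this prints "From thomas@arend-rhb.de Tue May 15 20:01:41 2012". The time differs from the kmail1 line quoted there because the sketch converts to UTC.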
Comment 6 Thomas Arend 2012-05-16 21:11:19 UTC
Comment from Mark Martinec 2012-05-16 13:42:01 UTC from https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6703#c6

> When saving messages with kmail 1 the From Line has following format
> which is not iaw RFC 822:
>   From info@ende-18-06.com Fri Jun 17 16:03:07 2011
> With kmail 2 the format is changed to the format which is iaw RFC 822
>   From thomas@arend-rhb.de Tue, 15 May 2012 22:01:41 +0200
> which is not parsed correctly by sa-learn. sa-learn --spam reports: [...]

Oh, no, not yet another incompatible mbox format!!!

> Proposed to change the behavior in a way that the old malformed From lines
> and the new correct ones are parsed.

It is the other way around, the new one differs from everybody else.

The format of the mbox file (along with its separator From_ lines)
is *not* governed by RFC 822 or its successors. There is no formal
standard for an mbox format, the RFC 4155 comes closest:
  http://tools.ietf.org/html/rfc4155
See also a Wikipedia article:
  http://en.wikipedia.org/wiki/Mbox


RFC 4155 says:

| a timestamp indicating the UTC date and time when the message
| was originally received, conformant with the syntax of the
| traditional UNIX 'ctime' output sans timezone (note that the
| use of UTC precludes the need for a timezone indicator);

This matches qmail docs:
  http://qmail.org/qmail-manual-html/man5/mbox.html
and matches Postfix and sendmail's local delivery agent.


To accommodate the new incompatible format it seems that the
two regexps in ArchiveIterator.pm need to be extended,
or just relaxed. Not sure if the date would still
be correctly parsed.

Best would be to persuade kmail folks to back off the change!
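
To illustrate why a parser that expects the traditional layout rejects the new lines, here is a minimal Python sketch of a strict ctime-style check. The pattern and the name `is_ctime_from_line` are assumptions made for illustration; this is not the regexp from ArchiveIterator.pm.

```python
import re

# Strict pattern: "From <sender> <Day> <Mon> <dd> <hh:mm:ss> <yyyy>" with
# English abbreviations and no timezone, as in traditional ctime output.
CTIME_FROM = re.compile(
    r"^From \S+ "
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d:\d\d \d{4}$"
)


def is_ctime_from_line(line: str) -> bool:
    """Return True if the line looks like a traditional mbox separator."""
    return CTIME_FROM.match(line) is not None


# The two lines quoted above: the kmail 1 line passes, the kmail 2 line fails.
print(is_ctime_from_line("From info@ende-18-06.com Fri Jun 17 16:03:07 2011"))
print(is_ctime_from_line("From thomas@arend-rhb.de Tue, 15 May 2012 22:01:41 +0200"))
```

The kmail 2 line fails at the comma after the weekday, so a reader built around a check like this never sees a message boundary in the file.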
Comment 7 Martin Koller 2012-07-28 22:52:24 UTC
confirmed in comment #1
Comment 8 Thomas Arend 2012-09-15 02:19:48 UTC
The bug is still present in 4.9.1.
Comment 9 Thomas Arend 2012-09-15 02:30:47 UTC
I changed the name of the bug to make clear that this is a kmail2 issue.

The mbox format kmail2 uses is incompatible with the kmail1 mbox format and violates RFC 4155.
Comment 10 Thomas Arend 2012-10-27 22:33:29 UTC
kmail 4.9.2

When a message has an unknown date (for example, spam messages that store the address in the date field), the From_ line uses the local, nationalized date-time format:

From user@example.com So. Okt 28 00:25:18 2012

SpamAssassin cannot detect messages in such an mbox.
Comment 11 Thomas Arend 2012-12-09 16:41:25 UTC
The error is also present in kmail2 4.9.4.
Comment 12 Thomas Arend 2013-06-26 19:46:40 UTC
Nice try at fixing this bug, but you did it totally wrong and screwed things up more than ever:

The From_ line now looks as follows:

     From thomas@example.com Mi. Jun 26 21:26:17 2013

Why did you change the weekday to German? And is the month now in English?
Comment 13 Thomas Arend 2013-06-26 19:47:24 UTC
I forgot to mention the version: 4.10.4
Comment 14 Martin Koller 2014-03-22 10:53:54 UTC
patch added https://git.reviewboard.kde.org/r/116975/
Comment 15 Martin Koller 2014-03-22 12:16:41 UTC
Git commit d1379fda35c9809913d5ab5432461ad5a8fe92d8 by Martin Koller.
Committed on 22/03/2014 at 12:14.
Pushed by mkoller into branch 'KDE/4.13'.

write "From " delimiter line  with correct dateTime format

Fix writing dateTime field in the "From " delimiter line according
to RFC4155
FIXED-IN: 4.13
REVIEW: 116975

M  +6    -2    kmbox/mbox_p.cpp

http://commits.kde.org/kdepimlibs/d1379fda35c9809913d5ab5432461ad5a8fe92d8