Bug 73602

Summary: Message body lines starting with "From " are incorrectly parsed as message separators in mbox folders
Product: [Unmaintained] kmail Reporter: A. Pfaller <apfaller>
Component: general Assignee: kdepim bugs <kdepim-bugs>
Status: RESOLVED FIXED    
Severity: major    
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Attachments: Sample mbox file showing the problem.
Stricter sanity checking for mbox message separators.
Proposed patch

Description A. Pfaller 2004-01-27 14:20:21 UTC
Version:            (using KDE 3.2.0)
Installed from:    SuSE RPMs
Compiler:          gcc-3.3.1 Suse 9.0 default compiler (gcc-3.3.1-24)
OS:          Linux

Sample :

..cut..
From MAINTAINERS:



SOFTWARE RAID (Multiple Disks) SUPPORT

P:      Ingo Molnar

..cut..

The "From MAINTAINERS:" line in this sample is incorrectly identified as the start of a new message.

While inspecting the mbox file with a hex editor, I also
noticed that line separators are always "0d0a" instead of a
single "0a" when the original mail was retrieved from a POP
account. Mail received locally always has only "0a". Mail
received with kmail from KDE 3.1.5 also uses only a single
"0a", regardless of origin. Since this may be relevant for
the "From" detection code, I have included the relevant part
as a hexdump below.

01eb:f700 53 74 61 74 65 3a 20 20 0a 58 2d 4b 4d 61 69 6c State: .X-KMail
01eb:f710 2d 4d 44 4e 2d 53 65 6e 74 3a 20 0a 0a 46 72 6f -MDN-Sent: ..Fro
01eb:f720 6d 20 4d 41 49 4e 54 41 49 4e 45 52 53 3a 0d 0a m MAINTAINERS:..
01eb:f730 0d 0a 53 4f 46 54 57 41 52 45 20 52 41 49 44 20 ..SOFTWARE RAID 
01eb:f740 28 4d 75 6c 74 69 70 6c 65 20 44 69 73 6b 73 29 (Multiple Disks)
01eb:f750 20 53 55 50 50 4f 52 54 0d 0a 50 3a 20 20 20 20  SUPPORT..P:    
01eb:f760 20 20 49 6e 67 6f 20 4d 6f 6c 6e 61 72 0d 0a 4d   Ingo Molnar..M


Andreas
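
The mixed line endings described above can also be counted without a hex editor. The following is only a minimal sketch: the function name `count_line_endings` is made up for illustration, and the sample bytes are transcribed from the hexdump, starting at "X-KMail-MDN-Sent".

```python
from typing import Tuple


def count_line_endings(data: bytes) -> Tuple[int, int]:
    """Return (crlf_count, bare_lf_count) for a chunk of mbox bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get LFs without a CR.
    bare_lf = data.count(b"\n") - crlf
    return crlf, bare_lf


# Bytes transcribed from the hexdump in the description.
sample = (
    b"X-KMail-MDN-Sent: \n\n"
    b"From MAINTAINERS:\r\n\r\n"
    b"SOFTWARE RAID (Multiple Disks) SUPPORT\r\n"
    b"P:      Ingo Molnar\r\n"
)

print(count_line_endings(sample))  # (4, 2)
```

The headers written by kmail end in a bare "0a", while the body lines that came from the POP server end in "0d0a", which matches what the hexdump shows.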
Comment 1 Ingo Klöcker 2004-01-27 14:33:31 UTC
Any occurrences of "^From_" inside a message have to be escaped. See for example http://www.qmail.org/qmail-manual-html/man5/mbox.html:
"HOW A MESSAGE IS DELIVERED
     Here is how a program appends a message to an mbox file.

     It first creates a From_ line given the  message's  envelope
     sender  and  the  current  date.   If the envelope sender is
     empty (i.e., if this is a bounce message), the program  uses
     MAILER-DAEMON  instead.   If  the  envelope  sender contains
     spaces, tabs, or newlines, the program  replaces  them  with
     hyphens.

     The program then copies the message, applying >From  quoting
     to  each  line.   >From  quoting  ensures that the resulting
     lines are not From_ lines:  the program prepends a > to  any
     From_ line, >From_ line, >>From_ line, >>>From_ line, etc.

     Finally the program appends a blank line to the message.  If
     the  last  line of the message was a partial line, it writes
     two newlines; otherwise it writes one.

HOW A MESSAGE IS READ
     A reader scans through an mbox file looking for From_ lines.
     Any From_ line marks the beginning of a message.  The reader
     should not attempt to take advantage of the fact that  every
     From_ line (past the beginning of the file) is preceded by a
     blank line.

     Once the reader finds a message,  it  extracts  a  (possibly
     corrupted)  envelope  sender  and  delivery  date out of the
     From_ line.  It then reads until the next From_ line or  end
     of  file,  whichever  comes  first.  It strips off the final
     blank line and deletes  the  quoting  of  >From_  lines  and
     >>From_ lines and so on.  The result is an RFC 822 message."

So KMail behaves exactly as it should.
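
The ">From quoting" rule from the quoted manual page can be sketched roughly as follows. This is an illustration of the rule only, not KMail's code; the function names are hypothetical and the sketch assumes "\n"-separated lines.

```python
import re

# A From_ line, or one that has already been quoted any number of times.
_QUOTABLE = re.compile(r"^>*From ")
# A line carrying at least one level of >From quoting.
_QUOTED = re.compile(r"^>+From ")


def quote_from_lines(body: str) -> str:
    """Writer side: prepend '>' to every From_, >From_, >>From_ ... line."""
    lines = body.split("\n")
    return "\n".join(">" + ln if _QUOTABLE.match(ln) else ln for ln in lines)


def unquote_from_lines(body: str) -> str:
    """Reader side: remove exactly one level of '>' quoting."""
    lines = body.split("\n")
    return "\n".join(ln[1:] if _QUOTED.match(ln) else ln for ln in lines)


body = "Hello\nFrom MAINTAINERS:\n>From an older quote\nFrom: not a From_ line\n"
stored = quote_from_lines(body)
print(stored)
print(unquote_from_lines(stored) == body)  # True
```

After quoting, no body line starts with "From ", so a reader that treats every From_ line as a message boundary cannot split the message in the wrong place.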
Comment 2 A. Pfaller 2004-01-27 14:34:09 UTC
Created attachment 4379 [details]
Sample mbox file showing the problem.
Comment 3 A. Pfaller 2004-01-27 15:06:54 UTC
If you look at kmfoldermbox.cpp you will see that kmail takes a
conservative approach to this by using the regex

#define MSG_SEPERATOR_REGEX "^From .*..:...*$"

as an additional check for the message separator (after first
trying a simple "From " string match).

As I have noticed since upgrading to kmail 3.2, there seem to
be quite a few broken mailers around (I have many more of these
problematic messages from different senders). Since I never
noticed this with the older kmail versions, I think this is
a regression.

I am not a regex guru, but could the changed line-end handling
cause more matches? ":...*$" might match ":\r\n", which
was never used as a line terminator in older versions of kmail.

Thanks,
Andreas

PS: Is there a reason for not simply using "\n" like before?
Some editors seem to have problems with a mixture of different
EOL conventions in the same file.

Comment 4 A. Pfaller 2004-01-27 17:57:42 UTC
Created attachment 4381 [details]
Stricter sanity checking for mbox message separators.
Comment 5 Ingo Klöcker 2004-01-27 18:32:53 UTC
":...*$" does indeed match ":\r\n". The '\r\n' line ending problem (when downloading via POP3) will be fixed in KDE 3.2.1 (it's already fixed in cvs).

OTOH, the stricter regexp you propose should be okay. We'll consider using it.

BTW, did you experience the problem when you tried to import an mbox with the problematic message or did you simply download the problematic message via POP3?
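
The regex behaviour confirmed in this comment can be reproduced with a small sketch. The pattern string is the one quoted in Comment 3. Everything else is an assumption made for illustration: the helper name, the use of Python's `re` module with `re.DOTALL` (so that "." also matches "\r" and "\n", as the thread says it does in kmail), and the example sender address.

```python
import re

# Pattern copied from Comment 3. re.DOTALL is an assumption that models a
# regex engine in which "." matches "\r" and "\n" as well.
OLD_SEPARATOR = re.compile(r"^From .*..:...*$", re.DOTALL)


def looks_like_separator(line: str) -> bool:
    """Return True if the old pattern accepts the line as a separator."""
    return OLD_SEPARATOR.match(line) is not None


# Body line stored with CRLF (mail fetched via POP3 in KDE 3.2.0).
print(looks_like_separator("From MAINTAINERS:\r\n"))  # True
# The same body line stored with a bare LF, as older versions did.
print(looks_like_separator("From MAINTAINERS:\n"))  # False
# A From_ line with a hypothetical sender and a time stamp.
print(looks_like_separator("From sender@example.org Tue Jan 27 14:20:21 2004\n"))  # True
```

With a bare "\n" only one character follows the colon, so ":.." cannot match; the extra "\r" is what makes the body line pass the check. The stricter pattern from attachment 4381 is not quoted in this thread, so it is not reproduced here.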
Comment 6 A. Pfaller 2004-01-27 19:20:39 UTC
Subject: Re: Message body lines starting with "From " are incorrectly parsed as message separators in mbox folders

POP3. I did not notice anything unusual during the few days I have
been using kmail 3.2 until today. Usually my mbox files only grow, but
today I did some housekeeping (deleting and moving messages), and
upon the next restart of kmail I started noticing problems (I have
auto compaction enabled). I tried to solve that by deleting all of
kmail's index files, but after kmail finished rebuilding the index
files many more problems appeared.

PS: I have rebuilt my kmail with the stricter regex and restored a
few mbox files from my backup, and now everything seems to be back
to normal.

Thanks,
Andreas

Comment 7 A. Pfaller 2004-01-27 23:02:48 UTC
Subject: Re: Message body lines starting with "From " are incorrectly parsed as message separators in mbox folders

On Tuesday 27 January 2004 18:32, Ingo Klöcker wrote:

> BTW, did you experience the problem when you tried to import an mbox 
> with the problematic message or did you simply download the
> problematic message via POP3?  

Sorry, I am slow today; I only just understood the reason for your question.
I just verified by grepping through my mbox files that mail received
with KDE 3.2 kmail via POP3 does NOT have the required ">From quoting".
Mail received with older kmail versions is stored correctly.
The same problem affects mail collected from /var/spool/mail (I checked
the content before kmail collected it, and the quoting was present at
that time).

This is the real bug. The regex patch just fixes the symptoms.

Andreas

Comment 8 A. Pfaller 2004-01-30 15:22:59 UTC
Created attachment 4434 [details]
Proposed patch

Actually use the result of the call to "escapeFrom( msgText )".
Stricter sanity checking of "From " mbox separator lines.
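
The class of bug named in the first line of this patch description (a quoting call whose result is never used) can be illustrated with a hypothetical sketch. None of this is KMail's code: `escape_from` merely stands in for `escapeFrom`, whose real signature is not shown in this thread, and `store_buggy` / `store_fixed` are invented names.

```python
import re


def escape_from(text: str) -> str:
    """Hypothetical stand-in for escapeFrom(): apply >From quoting."""
    return re.sub(r"^(>*From )", r">\1", text, flags=re.MULTILINE)


def store_buggy(msg_text: str) -> str:
    """Return the text that would be appended to the mbox (buggy variant)."""
    escape_from(msg_text)  # result computed, then thrown away
    return msg_text  # unquoted text reaches the mbox


def store_fixed(msg_text: str) -> str:
    """Return the text that would be appended to the mbox (fixed variant)."""
    msg_text = escape_from(msg_text)  # keep the quoted text
    return msg_text


msg = "Hi\nFrom MAINTAINERS:\n"
print(repr(store_buggy(msg)))  # 'Hi\nFrom MAINTAINERS:\n'
print(repr(store_fixed(msg)))  # 'Hi\n>From MAINTAINERS:\n'
```

In the buggy variant the body line still starts with "From " when it is written, so the separator check is the only thing standing between it and a wrongly split message.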
Comment 9 Ingo Klöcker 2004-01-31 14:25:51 UTC
Doh! Thanks for catching this stupid bug. I've applied your patches. They will be in KDE 3.2.1.