Summary: | Message body lines starting with "From " are incorrectly parsed as message separators in mbox folders | ||
---|---|---|---|
Product: | [Unmaintained] kmail | Reporter: | A. Pfaller <apfaller> |
Component: | general | Assignee: | kdepim bugs <kdepim-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | major | ||
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | openSUSE | ||
OS: | Linux | ||
Attachments: |
Sample mbox file showing the problem.
Stricter sanity checking for mbox message separators.
Proposed patch |
Description
A. Pfaller
2004-01-27 14:20:21 UTC
Any occurrences of "^From_" inside a message have to be escaped. See for example http://www.qmail.org/qmail-manual-html/man5/mbox.html:

> HOW A MESSAGE IS DELIVERED
>
> Here is how a program appends a message to an mbox file. It first creates a From_ line given the message's envelope sender and the current date. If the envelope sender is empty (i.e., if this is a bounce message), the program uses MAILER-DAEMON instead. If the envelope sender contains spaces, tabs, or newlines, the program replaces them with hyphens.
>
> The program then copies the message, applying >From quoting to each line. >From quoting ensures that the resulting lines are not From_ lines: the program prepends a > to any From_ line, >From_ line, >>From_ line, >>>From_ line, etc.
>
> Finally the program appends a blank line to the message. If the last line of the message was a partial line, it writes two newlines; otherwise it writes one.
>
> HOW A MESSAGE IS READ
>
> A reader scans through an mbox file looking for From_ lines. Any From_ line marks the beginning of a message. The reader should not attempt to take advantage of the fact that every From_ line (past the beginning of the file) is preceded by a blank line.
>
> Once the reader finds a message, it extracts a (possibly corrupted) envelope sender and delivery date out of the From_ line. It then reads until the next From_ line or end of file, whichever comes first. It strips off the final blank line and deletes the quoting of >From_ lines and >>From_ lines and so on. The result is an RFC 822 message.

So KMail behaves exactly as it should.

Created attachment 4379 [details]
Sample mbox file showing the problem.
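The ">From quoting" rules quoted from the mbox man page above can be sketched in a few lines. This is only an illustration in Python, not KMail's code; the function names `quote_from_lines`, `unquote_from_lines` and `split_mbox` are hypothetical, and the reader is deliberately as naive as the man page describes (every line starting with "From " begins a new message).

```python
import re
from typing import List

# A From_ line, or an already-quoted one (>From_, >>From_, ...).
_NEEDS_QUOTING = re.compile(r"^>*From ")
# A line that carries at least one level of >From quoting.
_IS_QUOTED = re.compile(r"^>+From ")


def quote_from_lines(body: str) -> str:
    """Writer side: prepend '>' to every line that would look like a From_ line."""
    lines = body.split("\n")
    return "\n".join(">" + ln if _NEEDS_QUOTING.match(ln) else ln for ln in lines)


def unquote_from_lines(body: str) -> str:
    """Reader side: strip exactly one level of '>' quoting."""
    lines = body.split("\n")
    return "\n".join(ln[1:] if _IS_QUOTED.match(ln) else ln for ln in lines)


def split_mbox(text: str) -> List[str]:
    """Naive reader: any line starting with 'From ' begins a new message."""
    messages: List[List[str]] = []
    for line in text.split("\n"):
        if line.startswith("From "):
            messages.append([line])
        elif messages:
            messages[-1].append(line)
    return ["\n".join(m) for m in messages]


envelope = "From apfaller@example.org Tue Jan 27 14:20:21 2004\n"
body = "Hi,\nFrom what I can tell it works.\n>From an older reply\n"

# Stored without quoting, the body line is mistaken for a second message.
print(len(split_mbox(envelope + body)))
# Stored with quoting, the folder contains the single message that was delivered.
print(len(split_mbox(envelope + quote_from_lines(body))))
```

Quoting and unquoting are inverses of each other, which is why a body line that already starts with ">From " has to gain a second ">" when it is written.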
If you look at kmfoldermbox.cpp you will see that kmail takes a conservative approach to this: after a simple "From " string match, it applies the regex

#define MSG_SEPERATOR_REGEX "^From .*..:...*$"

as an additional check for the message separator.

As I have noticed after upgrading to the 3.2 kmail, there seem to be quite a few broken mailers around (I have many more of these problematic messages from different senders). Since I never noticed this with the older kmail versions, I think this is a regression.

I am not a regex guru, but could the changed line-end handling perhaps cause more matches? ":...*$" might match ":\r\n", which was never used as a line terminator in older versions of kmail.

Thanks, Andreas

PS: Is there a reason for not simply using "\n" like before? Some editors seem to have problems with a mixture of different EOL conventions in the same file.

Created attachment 4381 [details]
Stricter sanity checking for mbox message separators.
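The effect of the line-ending change on the separator check can be reproduced with a small experiment. The sketch below is Python, not KMail's code, and rests on two assumptions: `re.DOTALL` stands in for a matcher in which "." may also match "\r" and "\n", and `STRICT` is a hypothetical stricter pattern written for this illustration (the attached patch's actual regex is not shown in this report).

```python
import re

# The pattern quoted from kmfoldermbox.cpp. re.DOTALL is an assumption that
# emulates a matcher in which '.' may also match '\r' and '\n'.
LOOSE = re.compile(r"^From .*..:...*$", re.DOTALL)

# Hypothetical stricter pattern (NOT the attached patch): require two digits
# on each side of the colon, as in the hh:mm part of a From_ line's date.
STRICT = re.compile(r"^From .*[0-9][0-9]:[0-9][0-9]")


def looks_like_separator(pattern, line):
    """Return True if the line would be accepted as a message separator."""
    return pattern.match(line) is not None


envelope = "From apfaller@example.org Tue Jan 27 14:20:21 2004\n"
body_lf = "From the minutes, item 12:\n"      # body line stored with "\n"
body_crlf = "From the minutes, item 12:\r\n"  # same line stored with "\r\n"

for name, line in [("envelope", envelope), ("LF body", body_lf), ("CRLF body", body_crlf)]:
    print(name, looks_like_separator(LOOSE, line), looks_like_separator(STRICT, line))
```

With "\n" endings only one character follows the colon, so the two dots after ":" cannot match; with "\r\n" they can, and the body line is accepted as a separator. A stricter pattern narrows the window but remains a heuristic: a body line such as "From 10:30 to 11:00 ..." would still pass, so it cannot replace proper >From quoting.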
":...*$" does indeed match ":\r\n". The '\r\n' line ending problem (when downloading via POP3) will be fixed in KDE 3.2.1 (it's already fixed in cvs).

OTOH, the stricter regexp you propose should be okay. We'll consider using it.

BTW, did you experience the problem when you tried to import an mbox with the problematic message or did you simply download the problematic message via POP3?

Subject: Re: Message body lines starting with "From " are incorrectly parsed as message separators in mbox folders

POP3. I did not notice anything unusual for the few days I have been using the 3.2 kmail until today. Usually my mbox files only grow, but today I did some housekeeping (deletion and moving) of messages, and upon the next restart of kmail I started noticing problems (I have auto compaction enabled). I tried to solve that by deleting all of kmail's index files, but after kmail finished rebuilding the index files many more problems appeared.

PS: I have rebuilt my kmail with the stricter regex and restored a few mbox files from my backup, and now everything seems to be back to normal.

Thanks, Andreas

Subject: Re: Message body lines starting with "From " are incorrectly parsed as message separators in mbox folders
On Tuesday 27 January 2004 18:32, Ingo Klöcker wrote:
> BTW, did you experience the problem when you tried to import an mbox
> with the problematic message or did you simply download the
> problematic message via POP3?
Sorry I am slow today, I just understood the reason for your question.
I just verified by grepping through my mbox files that mails received
with KDE 3.2 kmail via POP3 do NOT have the required ">From quoting".
Mail received with older kmail versions is stored correctly.
The same applies for mail received via /var/spool/mail (I checked
the content before kmail collected it and the quoting was present at
that time).
This is the real bug. The regex patch just fixes the symptoms.
Andreas
Created attachment 4434 [details]
Proposed patch
Actually use the result of the call to "escapeFrom( msgText )".
Stricter sanity checking of "From " mbox separator lines.
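The first item of the patch description names a common class of bug: a function returns an escaped copy, and the caller keeps storing the original. The following is a Python analogy of that mistake, not KMail's C++ code; `escape_from`, `store_message_buggy` and `store_message_fixed` are hypothetical names, and the real signature of `escapeFrom` is not shown in this report.

```python
import re


def escape_from(text: str) -> str:
    """Return a copy of text with '>' prepended to every From_-like line."""
    return re.sub(r"^(>*From )", r">\1", text, flags=re.MULTILINE)


def store_message_buggy(folder: list, msg_text: str) -> None:
    escape_from(msg_text)  # result is discarded, msg_text is unchanged
    folder.append(msg_text)


def store_message_fixed(folder: list, msg_text: str) -> None:
    msg_text = escape_from(msg_text)  # actually use the escaped copy
    folder.append(msg_text)


message = "Hello\nFrom now on the report is weekly.\n"

buggy_folder, fixed_folder = [], []
store_message_buggy(buggy_folder, message)
store_message_fixed(fixed_folder, message)

print(repr(buggy_folder[0]))
print(repr(fixed_folder[0]))
```

The buggy variant runs without any error, which is why a defect of this kind only shows up later, when the folder is re-read and the unescaped body line is taken for a separator.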
Doh! Thanks for catching this stupid bug. I've applied your patches. They will be in KDE 3.2.1.