Version:           (using KDE KDE 3.2.0)
Installed from:    SuSE RPMs
Compiler:          gcc-3.3.1 Suse 9.0 default compiler (gcc-3.3.1-24)
OS:                Linux

Sample:

..cut..
From MAINTAINERS:

SOFTWARE RAID (Multiple Disks) SUPPORT
P:      Ingo Molnar
..cut..

The line "From MAINTAINERS:" in this message body is incorrectly identified as the start of a new message.

While inspecting the mbox file with a hex editor I also noticed that line separators are always "0d0a" instead of a single "0a" when the original mail was retrieved from a POP account. Mail received locally always has only "0a". Mails received with kmail of KDE 3.1.5 also use only a single "0a", regardless of origin.

Since this may be relevant for the "From" detection code, I have included the relevant part as a hexdump below.

  01eb:f700  53 74 61 74 65 3a 20 20 0a 58 2d 4b 4d 61 69 6c  State:  .X-KMail
  01eb:f710  2d 4d 44 4e 2d 53 65 6e 74 3a 20 0a 0a 46 72 6f  -MDN-Sent: ..Fro
  01eb:f720  6d 20 4d 41 49 4e 54 41 49 4e 45 52 53 3a 0d 0a  m MAINTAINERS:..
  01eb:f730  0d 0a 53 4f 46 54 57 41 52 45 20 52 41 49 44 20  ..SOFTWARE RAID 
  01eb:f740  28 4d 75 6c 74 69 70 6c 65 20 44 69 73 6b 73 29  (Multiple Disks)
  01eb:f750  20 53 55 50 50 4f 52 54 0d 0a 50 3a 20 20 20 20   SUPPORT..P:    
  01eb:f760  20 20 49 6e 67 6f 20 4d 6f 6c 6e 61 72 0d 0a 4d    Ingo Molnar..M

Andreas
Any occurrences of "^From_" inside a message have to be escaped. See for example http://www.qmail.org/qmail-manual-html/man5/mbox.html:

"HOW A MESSAGE IS DELIVERED

Here is how a program appends a message to an mbox file.

It first creates a From_ line given the message's envelope sender and the current date. If the envelope sender is empty (i.e., if this is a bounce message), the program uses MAILER-DAEMON instead. If the envelope sender contains spaces, tabs, or newlines, the program replaces them with hyphens.

The program then copies the message, applying >From quoting to each line. >From quoting ensures that the resulting lines are not From_ lines: the program prepends a > to any From_ line, >From_ line, >>From_ line, >>>From_ line, etc.

Finally the program appends a blank line to the message. If the last line of the message was a partial line, it writes two newlines; otherwise it writes one.

HOW A MESSAGE IS READ

A reader scans through an mbox file looking for From_ lines. Any From_ line marks the beginning of a message. The reader should not attempt to take advantage of the fact that every From_ line (past the beginning of the file) is preceded by a blank line.

Once the reader finds a message, it extracts a (possibly corrupted) envelope sender and delivery date out of the From_ line. It then reads until the next From_ line or end of file, whichever comes first. It strips off the final blank line and deletes the quoting of >From_ lines and >>From_ lines and so on. The result is an RFC 822 message."

So KMail behaves exactly as it should.
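The quoting rule described in the quoted man page can be sketched in a few lines. This is only an illustration of the rule as stated there, not KMail's code; the function names `quote_from` and `unquote_from` are made up for the example, and "From_" is read as the five characters "From " at the start of a line.

```python
import re

# A line needs quoting if it starts with "From ", ">From ", ">>From ", ...
_NEEDS_QUOTING = re.compile(rb"^>*From ")
# A line is quoted if it starts with at least one ">" followed by "From ".
_IS_QUOTED = re.compile(rb"^>+From ")


def quote_from(body: bytes) -> bytes:
    """Writer side: prepend one '>' to every From_ line and quoted From_ line."""
    out = []
    for line in body.splitlines(keepends=True):
        if _NEEDS_QUOTING.match(line):
            line = b">" + line
        out.append(line)
    return b"".join(out)


def unquote_from(body: bytes) -> bytes:
    """Reader side: remove exactly one '>' from every quoted From_ line."""
    out = []
    for line in body.splitlines(keepends=True):
        if _IS_QUOTED.match(line):
            line = line[1:]
        out.append(line)
    return b"".join(out)
```

Because the writer quotes already-quoted lines as well, the reader can undo the quoting exactly: `unquote_from(quote_from(body))` gives back the original body, and no line of the quoted body starts with "From ".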
Created attachment 4379: Sample mbox file showing the problem.
If you look at kmfoldermbox.cpp you will see that kmail takes a conservative approach to this by using the regex

#define MSG_SEPERATOR_REGEX "^From .*..:...*$"

to check for the message separator as an additional check (after trying a simple "From " string match).

As I have noticed after upgrading to the 3.2 kmail, there seem to be quite a few broken mailers around (I have many more of these problematic messages from different senders). Since I never noticed this with the older kmail versions, I think this is a regression.

I am not a regex guru, but would the changed line-end handling perhaps cause more matches? ":...*$" might match ":\r\n", which was never used as a line terminator in older versions of kmail.

Thanks, Andreas

PS: Is there a reason for not simply using "\n" like before? Some editors seem to have problems with a mixture of different EOL conventions in the same file.
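The question about ":...*$" and ":\r\n" can be checked with a small experiment. This is a sketch in Python rather than the POSIX regex API, so two things are assumptions: the line is matched with its terminator still attached, and `re.DOTALL` is used so that "." also matches "\n", which approximates a POSIX regex compiled without REG_NEWLINE. The helper name `looks_like_separator` and the example separator line are made up.

```python
import re

# Pattern copied from the MSG_SEPERATOR_REGEX define quoted in this report.
# re.DOTALL is an assumption (see the text above): "." may match "\n".
LOOSE = re.compile(r"^From .*..:...*$", re.DOTALL)


def looks_like_separator(line: str) -> bool:
    """Return True if the loose pattern accepts this line as a separator."""
    return LOOSE.match(line) is not None


# Body line from the sample, once with LF and once with CRLF line ending.
lf_line = "From MAINTAINERS:\n"
crlf_line = "From MAINTAINERS:\r\n"
# Made-up example of a genuine From_ line with a hh:mm:ss timestamp.
real_separator = "From sender@example.org Thu Jan  1 00:00:00 2004\n"

print(looks_like_separator(lf_line))         # only "\n" follows the colon
print(looks_like_separator(crlf_line))       # "\r\n" satisfies the ".." after ":"
print(looks_like_separator(real_separator))
```

Under these assumptions the pattern requires at least two characters after a colon. With an LF ending only one character follows the colon in "From MAINTAINERS:", so the line is rejected; with a CRLF ending the two terminator bytes fill that slot and the body line is accepted as a separator.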
Created attachment 4381: Stricter sanity checking for mbox message separators.
":...*$" does indeed match ":\r\n". The '\r\n' line ending problem (when downloading via POP3) will be fixed in KDE 3.2.1 (it's already fixed in cvs). OTOH, the stricter regexp you propose should be okay. We'll consider using it. BTW, did you experience the problem when you tried to import an mbox with the problematic message or did you simply download the problematic message via POP3?
Subject: Re: Message body lines starting with "From " are incorrectly parsed as message separator in mbox folders

POP3.

I did not notice anything unusual during the few days I have been using the 3.2 kmail until today. Usually my mbox files only grow, but today I did some housekeeping (deleting and moving messages), and upon the next restart of kmail I started noticing problems (I have auto compaction enabled). I tried to solve that by deleting all of kmail's index files, but after kmail finished rebuilding the index files many more problems appeared.

PS: I have rebuilt my kmail with the stricter regex and restored a few mbox files from my backup, and now everything seems to be back to normal.

Thanks, Andreas
Subject: Re: Message body lines starting with "From " are incorrectly parsed as message separator in mbox folders

On Tuesday 27 January 2004 18:32, Ingo Klöcker wrote:
> BTW, did you experience the problem when you tried to import an mbox
> with the problematic message or did you simply download the
> problematic message via POP3?

Sorry, I am slow today; I just understood the reason for your question.

I just verified by grepping through my mbox files that mails received with KDE 3.2 kmail via POP3 do NOT have the required ">From quoting". Mail received with older kmail versions is stored correctly. The same applies to mail received via /var/spool/mail (I checked the content before kmail collected it, and the quoting was present at that time).

This is the real bug. The regex patch just fixes the symptoms.

Andreas
Created attachment 4434: Proposed patch

Actually use the result of the call to "escapeFrom( msgText )". Stricter sanity checking of "From " mbox separator lines.
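The patch itself is not reproduced in this thread, so the following is only a guess at the kind of change its description implies, written in Python for illustration. `escape_from` is a hypothetical stand-in for escapeFrom(), `store_buggy` and `store_fixed` are made-up names, and the `STRICT` pattern is one possible stricter check (digits on both sides of the colon), not necessarily the one in the attachment.

```python
import re


def escape_from(text: str) -> str:
    """Hypothetical stand-in for escapeFrom(): return a >From-quoted copy."""
    return re.sub(r"^(>*From )", r">\1", text, flags=re.MULTILINE)


def store_buggy(msg_text: str) -> str:
    # The quoted copy is computed but thrown away, so the text that gets
    # stored still contains unquoted "From " lines.
    escape_from(msg_text)
    return msg_text


def store_fixed(msg_text: str) -> str:
    # Actually use the result of the call.
    msg_text = escape_from(msg_text)
    return msg_text


# Assumed stricter separator check: require a hh:mm-like pair of digits
# around the colon instead of any two characters.
STRICT = re.compile(r"^From .*[0-9][0-9]:[0-9][0-9]")
```

With a check like `STRICT`, the body line "From MAINTAINERS:" is rejected regardless of its line ending, while a From_ line carrying a timestamp is still accepted. That only hides the symptom, though; the stored message is correct only once the quoted text is the one that gets written.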
Doh! Thanks for catching this stupid bug. I've applied your patches. They will be in KDE 3.2.1.