| Summary: | mbox import displays many useless "duplicate subject" errors, making import of large mailboxes painful | | |
|---|---|---|---|
| Product: | [Applications] kmail | Reporter: | Daniel Burrows <dburrows> |
| Component: | general | Assignee: | kdepim bugs <kdepim-bugs> |
| Status: | RESOLVED DUPLICATE | | |
| Severity: | normal | | |
| Priority: | NOR | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Platform: | Debian testing | | |
| OS: | Linux | | |
| Latest Commit: | | Version Fixed In: | |
Description
Daniel Burrows
2004-08-29 15:19:38 UTC
This is happening to me right now. My suggestion is to add a button that does an "OK for all", or a checkbox such as "Ignore these warnings" / "Do not show this warning again", or a "Cancel Import" button. I'm importing about 200MB of emails and I have clicked lots of times!

Same problem as above. If you import an mbox with a number of duplicate emails, you can do nothing but sit there for a long time hitting <enter>.

This is so fundamental that it actually prevents me from migrating from Mozilla. I need to import a 500MB mbox file, and there is no way I'll just sit there and click "OK". Can anyone explain what this warning is about? What is the sense of it? Couldn't it just be removed?

After a bit of code viewing: The error occurs when both the *subject* AND the *time* a message was sent are equal. In this case, the second message is discarded.

kmkernel.cpp
int KMKernel::dcopAddMessage(const QString & foldername,const KURL & msgUrl)
around line 686 (return code -4)

Most cases occurred when I got e-mails where I was on the list of recipients twice: once in the To: field and once in Cc:, so the loss of the second message is not terrible. However, there still were two messages in the original mailbox and I feel there should still be two after importing.

My concern is, however: couldn't there be two messages with the same subject and send time that are *not* equal? I can think of the following scenario: Someone sends a message to a bunch of people. Then this person sends a personal comment to one of those people (you) and does this by re-editing the original message. Now let's assume these messages are sent at the same time, for example by queuing them in a mail program and then sending them at exactly the same time. Now the second message would be lost... I guess this is not as unlikely as it sounds. Or think of automatically created mails, e.g. from cron jobs or whatever. There may well be two messages sent with the same subject and time, but different content.
A fix would be to compare the actual message bodies in case subject and time are equal, something like changing

if ( k == -1 ) {

to something like this (in kmkernel.cpp):

if ( k == -1 || (msg->___body___ == msgbase[k]->___body___) ) {

I don't know exactly how this is best done (msg->asDwString() ???)

Ooops, should be

if ( k == -1 || (msg->___body___ != msgbase[k]->___body___) ) {

*** This bug has been marked as a duplicate of 83311 ***

As of today this bug is marked resolved. An upgrade of KMail using the current testing branch does not repair the problem. I still get the error when trying to import from Pegasus.

Kmail 1.7.1
Linux 2.4.27--k7
compiler gcc (Debian :3.3.5-6)

I have googled, posted in the newsgroups and on linuxquestions.com. It was suggested I post here as it may not be resolved. Sorry if this is the wrong way to go about things.

This bug is fixed in KDE 3.4Beta2 and in the current CVS (fixed at: Tue Feb 1 02:12:55 2005 UTC)! Please test this with Beta2 or a CVS snapshot.

note: we use now the Message-ID or if there is no Message-ID the subject line + dateStr. This should be unique in nearly 100%.

see the code comment:

// NEW COMMENT from Danny Kukawka (danny.kukawka@web.de):
// subject line + the date is only unique if the following
// return a correct unique value:
// time_t DT = mb->date();
// QString dt = ctime(&DT);
// But if the datestring in the Header isn't RFC conform
// subject line + the date isn't unique.
//
// The only uique headerfield is the Message-ID. In some
// cases this could be empty. I then I use the
// subject line + dateStr .

If you still have this problem, please tell me where I can get an mbox file for testing.

On Wednesday 23 February 2005 00:18, Danny Kukawka wrote:
> note: we use now the Message-ID or if there is no Message-ID the subject
> line + dateStr. This should be unique in nearly 100%.
Just thought I'd mention that this breaks with duplicates, which are quite
common in the real world. I've had quite a few fun evenings tracking down
threading problems with duplicate messages.
Till
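
For readers following the discussion, here is a minimal sketch of the key selection described in the fix note: use the Message-ID when present, otherwise fall back to subject line + date string. It is not the KMail code; the `MailHeader` struct and the `duplicateKey` function are hypothetical names invented for illustration, and the key prefixes are an assumption to keep the two kinds of key from colliding.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a parsed message header
// (not a KMail type).
struct MailHeader {
    std::string messageId;  // Message-ID header, may be empty
    std::string subject;    // Subject header
    std::string dateStr;    // raw Date header text
};

// Build the key used to detect duplicates: the Message-ID when it is
// present, otherwise subject line + date string.
std::string duplicateKey(const MailHeader& h) {
    if (!h.messageId.empty())
        return "id:" + h.messageId;
    return "sd:" + h.subject + '\n' + h.dateStr;
}
```

With such a key, two copies of one message (for example, received once via To: and once via Cc:) share a Message-ID and therefore map to the same key, which is the real-world duplicate case mentioned above.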
But since this commit/fix you have two ways to import mails into KMail:

1.) remove duplicate messages
2.) import without removing/searching for duplicate messages

If you want to import all messages from your mbox into KMail, use the second way (this is the default). If you want to remove duplicates, you must toggle the checkbox in the dialog; you may then lose some messages (and you get a report of how many messages weren't imported). Does this not solve your problem?

(I think it is not a good idea to check the mail body, because of performance.)
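
The two import modes can be sketched roughly as follows. This is an illustration only, not the importer's actual code: `ImportResult` and `importMessages` are hypothetical names, and the input is assumed to be one precomputed duplicate key per message (for example a Message-ID), in mbox order.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical summary of one import run.
struct ImportResult {
    std::size_t imported = 0;  // messages added to the folder
    std::size_t skipped = 0;   // messages dropped as duplicates
};

// keys: one duplicate key per message, in mbox order (assumed input).
// removeDuplicates == false imports everything (the default mode);
// removeDuplicates == true skips any message whose key was already seen
// and counts it, so the total can be reported once at the end.
ImportResult importMessages(const std::vector<std::string>& keys,
                            bool removeDuplicates) {
    ImportResult result;
    std::unordered_set<std::string> seen;
    for (const std::string& key : keys) {
        // insert() returns {iterator, false} when the key already exists.
        if (removeDuplicates && !seen.insert(key).second) {
            ++result.skipped;
            continue;
        }
        ++result.imported;
    }
    return result;
}
```

Counting skipped messages and reporting the total once avoids the per-message dialog that the original report complains about.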