Bug 88383 - mbox import displays many useless "duplicate subject" errors, making import of large mailboxes painful
Status: RESOLVED DUPLICATE of bug 83311
Alias: None
Product: kmail
Classification: Applications
Component: general
Version: unspecified
Platform: Debian testing Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-29 15:19 UTC by Daniel Burrows
Modified: 2007-09-14 12:17 UTC
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Daniel Burrows 2004-08-29 15:19:38 UTC
Version:            (using KDE 3.3.0)
Installed from:    Debian testing/unstable Packages
Compiler:          gcc 3.3 
OS:                Linux

The following is the (edited) text of Debian bug report #268748.  It doesn't appear that the Debian maintainer forwards bug reports upstream, so I'm reporting it here as well.



When importing several large mbox files into kmail via the import tool, I got this message dozens of times:

 "Duplicate message subject error when adding message to folder MBOX-debbugs in KMail"

  Now, last I checked, it was still legal to send and receive distinct messages with the same subject.  In fact, this is encouraged when sending messages to mailing lists.  So, if the dialog means what it says, I don't think it should appear at all.  If that's *not* what the dialog is talking about, which I think might be the case based on the fact that it parsed my mailing list folders without complaining, it needs to be rewritten so that it actually says what it means.  Perhaps it's complaining about duplicated messages, not subjects?

  Much worse, the dialog is displayed *every* *single* *time* this condition is encountered, and the program stops processing the current mailbox until the user clicks OK.  I don't think I have to spell out the problem with this, but I will anyway: when importing a mailbox that has a lot of duplicated messages, the dialog appears dozens of times, forcing the user to click on the message over and over.  The fact that the message seems to be utterly useless (it seems to be meaningless -- as I noted above, the obvious interpretation doesn't make sense -- and even if I try to force some sense out of it there's nothing I can do about it) just adds insult to injury.

If the message must be displayed, it should either be displayed in the
status area of the import window (the one that says "Importing messages
from blahblahblah..."), displayed as a pop-up dialog once and never
displayed again, or *at least* displayed with a button you can push to
prevent it from popping up again.
Comment 1 Vasco Pinheiro 2004-09-30 17:51:43 UTC
This is happening to me right now. My suggestion is to have a button that does an "OK for all", or a checkbox for "Ignore these warnings" / "Do not show this warning again", or a "Cancel Import" option.

 I'm importing about 200MB of Emails and I have clicked lots of times!

Comment 2 lexual 2004-11-18 05:45:57 UTC
Same problem as above. If you import an mbox with a number of duplicate emails, you can do nothing but sit there for a long time hitting <enter>.
Comment 3 Pascal Niklaus 2005-01-28 09:11:55 UTC
This is so fundamental that it actually prevents me from migrating from Mozilla. I need to import a 500MB mbox file, and there is no way I'll just sit there and click "OK". 

Can anyone explain what this warning is about? What is the sense of it? Couldn't it just be removed?
Comment 4 Pascal Niklaus 2005-01-28 15:19:20 UTC
After a bit of code viewing:

The error occurs when both the *subject* AND the *time* a message was sent are equal. In this case, the second message is discarded. 

kmkernel.cpp
   int KMKernel::dcopAddMessage(const QString & foldername,const KURL & msgUrl)
      around line 686 (return code -4)

Most cases occurred when I got e-mails where I was twice on the list of recipients: once in the To: field and once Cc:, so the loss of the second message is not terrible.

However, there still were two messages in the original mailbox and I feel they should stay two after importing.

My concern is, however, couldn't there be two messages with the same subject and send time that are *not* equal? I can think of the following scenario:

Someone sends a message to a bunch of people. Then this person sends a personal comment to one of those people (you) and does this by re-editing the original message. Now let's assume these messages are sent at the same time, for example by queuing them in a mail program and then sending them at exactly the same time. The second message would then be lost... I guess this is not as unlikely as it sounds.

Or think of automatically created mails, e.g. from cron jobs or whatever. There may well be two messages sent with the same subject and time, but different content.

A fix would be to compare the actual message bodies in case subject and time are equal, something like changing

      if ( k == -1 ) {

to something like this (in kmkernel.cpp):

      if ( k == -1 || (msg->___body___ == msgbase[k]->___body___) ) {

I don't know how this is best done (msg->asDwString() ???)

Comment 5 Pascal Niklaus 2005-01-28 15:20:40 UTC
Ooops, should be

if ( k == -1 || (msg->___body___ != msgbase[k]->___body___) ) { 
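The corrected condition can be sketched in standalone form. This is only an illustration under stated assumptions: `Message` and `shouldAdd` are hypothetical names, not KMail's actual classes or API, and unlike the single index `k` in the snippet above, the sketch checks every stored message that shares subject and date.

```cpp
#include <ctime>
#include <string>
#include <vector>

// Hypothetical, simplified message record; this is NOT KMail's KMMessage.
struct Message {
    std::string subject;
    std::time_t date;
    std::string body;
};

// Returns true if `msg` should be added to `folder`.
// A message is rejected only when a stored message matches it in
// subject, date AND body, i.e. when it is a true duplicate.
bool shouldAdd(const std::vector<Message>& folder, const Message& msg) {
    for (const Message& existing : folder) {
        if (existing.subject == msg.subject &&
            existing.date == msg.date &&
            existing.body == msg.body) {
            return false;  // same subject, same time, same body
        }
    }
    return true;  // no match, or the matches differ in body
}
```

With this check, two messages sent at the same second with the same subject but different bodies are both kept, at the cost of a body comparison whenever subject and date collide.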
 
Comment 6 Danny Kukawka 2005-02-03 10:53:26 UTC

*** This bug has been marked as a duplicate of 83311 ***
Comment 7 Steve 2005-02-22 22:32:54 UTC
As of today this bug is marked resolved. An upgrade of Kmail using the current testing branch does not repair the problem. I still get the error when trying to import from pegasus.

Kmail 1.7.1
Linux 2.4.27--k7
compiler gcc  (Debian :3.3.5-6)

I have googled, posted in the newsgroups and on linuxquestions.com. It was suggested I post here as it may not be resolved. Sorry if this is the wrong way to go about things.
Comment 8 Danny Kukawka 2005-02-23 00:18:24 UTC
This bug is fixed in KDE 3.4 Beta2 and in the current CVS (fixed at: Tue Feb 1 02:12:55 2005 UTC)! Please test this with Beta2 or a CVS snapshot.

Note: we now use the Message-ID or, if there is no Message-ID, the subject line + dateStr. This should be unique in nearly 100% of cases.

see the code comment:
// NEW COMMENT from Danny Kukawka (danny.kukawka@web.de):
// subject line + the date is only unique if the following
// return a correct unique value:
//      time_t  DT = mb->date();
//      QString dt = ctime(&DT);
// But if the date string in the header isn't RFC-conformant,
// subject line + the date isn't unique.
//
// The only unique header field is the Message-ID. In some
// cases this could be empty. In that case I use the
// subject line + dateStr.
If you still have this problem, please tell me where I can get an mbox file for testing.
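The key selection described in that code comment might look roughly like the following. It is a minimal sketch, not the code that was committed: `MessageHeaders` and `duplicateKey` are hypothetical names, and the "id:" / "sd:" prefixes are an added assumption that keeps the two kinds of key from colliding.

```cpp
#include <ctime>
#include <string>

// Hypothetical header record; field names are illustrative only.
struct MessageHeaders {
    std::string messageId;  // empty when the Message-ID header is missing
    std::string subject;
    std::time_t date;
};

// Builds the key used to detect duplicates: the Message-ID when present,
// otherwise the subject line plus the ctime() rendering of the date.
std::string duplicateKey(const MessageHeaders& h) {
    if (!h.messageId.empty()) {
        return "id:" + h.messageId;
    }
    // std::ctime returns a pointer to a static buffer (not thread-safe)
    // and may return a null pointer for an unrepresentable time.
    const char* dateStr = std::ctime(&h.date);
    return "sd:" + h.subject + "\n" + (dateStr ? dateStr : "");
}
```

As the code comment notes, the fallback key is only as reliable as the parsed date: if the Date header cannot be parsed, many unrelated messages with the same subject may end up with the same key.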
Comment 9 Till Adam 2005-02-23 08:01:41 UTC
On Wednesday 23 February 2005 00:18, Danny Kukawka wrote:

> note: we use now the Message-ID or if there is no Message-ID the subject
> line + dateStr. This should be unique in nearly 100%.

Just thought I'd mention that this breaks with duplicates, which are quite 
common in the real world. I've had quite a few fun evenings tracking down 
threading problems with duplicate messages.

Till

Comment 10 Danny Kukawka 2005-02-23 12:11:08 UTC
But since this commit/fix you have two ways to import mail into KMail:
1.) remove duplicate messages
2.) import without searching for or removing duplicate messages

If you want to import all messages from your mbox into KMail, use the second way (this is the default). If you want to remove duplicates, you must toggle the checkbox in the dialog; you may then lose some messages (and you are told how many messages weren't imported).

Does this not solve your problem? (I don't think it is a good idea to check the mail body, because of performance.)
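
The two import modes, together with a single summary count in place of one dialog per message, can be sketched as follows. This is an illustration only: `ImportResult` and `importMessages` are hypothetical names, and each message is assumed to have already been reduced to one duplicate key (Message-ID, or subject + date).

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Summary reported once at the end of the import (hypothetical type).
struct ImportResult {
    std::size_t imported = 0;
    std::size_t skipped = 0;  // duplicates that were not imported
};

// `keys` holds one precomputed duplicate key per message in the mbox.
// With removeDuplicates == false (the default mode described above),
// every message is imported and no key lookup is performed.
ImportResult importMessages(const std::vector<std::string>& keys,
                            bool removeDuplicates) {
    ImportResult result;
    std::unordered_set<std::string> seen;
    for (const std::string& key : keys) {
        // insert().second is false when the key was already present.
        if (removeDuplicates && !seen.insert(key).second) {
            ++result.skipped;  // count it instead of interrupting the user
            continue;
        }
        ++result.imported;
    }
    return result;
}
```

Counting skipped messages and reporting the total once avoids the blocking per-message dialog that the original report complains about.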