Bug 113100 - Incorrect mime headers in attachment of files with russian names
Status: RESOLVED FIXED
Alias: None
Product: kmail
Classification: Unclassified
Component: general
Version: unspecified
Platform: Slackware Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdepim bugs
Reported: 2005-09-22 22:43 UTC by Oleg
Modified: 2007-09-14 12:17 UTC
Description Oleg 2005-09-22 22:43:07 UTC
Version:           (using KDE 3.4.2)
Installed from:    Slackware Packages
OS:                Linux

I use KDE 3.4.2 and KMail 1.8.2 with a Russian KOI8-R locale.

When I attach a file with Russian characters in its name, I get the following MIME headers:

 Content-Type: application/msword;
  name*=koi8-r''%F0%D2%C1%D7%C9%CC%C1%20%CB%C1%CE%C1%CC%C1%2019%2E09%2Edoc
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment;
      filename*="koi8-r''%F0%D2%C1%D7%C9%CC%C1%20%CB%C1%CE%C1%CC%C1%2019%2E09%2Edoc"

But when I send the same file with Thunderbird, the MIME headers of the attachment are:

 Content-Type: application/octet-stream;
  name="=?KOI8-R?Q?=F0=D2=C1=D7=C9=CC=C1_=CB=C1=CE=C1=CC=C1_19=2E09=2Edoc?="
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment;
  filename="=?KOI8-R?Q?=F0=D2=C1=D7=C9=CC=C1_=CB=C1=CE=C1=CC=C1_19=2E09=2Edoc?="

Are the parameter names name*= and filename*= correct?
I don't think so.

--
Oleg
Comment 1 Reinhold Kainhofer 2006-12-21 01:15:12 UTC
Yes, name*= and filename*= are extensions to the name= and filename= parameters. These extensions are defined in RFC 2231.
Up to and including KDE 3.5.5, KMail does not properly handle these name* and filename* parameters (which are produced by Thunderbird, for example). I have a patch here that makes KMail behave correctly with those file names, but I first have to get clearance from the KMail maintainers to apply it for KDE 3.5.6.

Cheers,
Reinhold
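
To make the extended parameter form concrete, here is a minimal sketch in standard C++ (not the Qt 3 code KMail uses) of how a single RFC 2231 extended value of the form charset'language'data can be split and percent-decoded. The names ExtendedValue, percentDecode and parseExtendedValue are hypothetical and chosen for illustration only. The sketch stops at raw bytes and does not convert them from the named charset to Unicode.

```cpp
#include <string>

// Result of splitting an RFC 2231 extended value: charset'language'data.
// Hypothetical type, for illustration only.
struct ExtendedValue {
    std::string charset;   // e.g. "koi8-r"
    std::string language;  // often empty
    std::string bytes;     // percent-decoded raw bytes, still in `charset`
};

// Convert one hex digit to its numeric value, or -1 if it is not a hex digit.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replace each %XX escape with the byte it denotes.
// Malformed escapes are copied through unchanged.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexDigit(in[i + 1]);
            int lo = hexDigit(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Split "charset'language'data" at the first two single quotes.
// Returns false if the two quotes are missing.
bool parseExtendedValue(const std::string& value, ExtendedValue& result) {
    std::size_t q1 = value.find('\'');
    if (q1 == std::string::npos) return false;
    std::size_t q2 = value.find('\'', q1 + 1);
    if (q2 == std::string::npos) return false;
    result.charset = value.substr(0, q1);
    result.language = value.substr(q1 + 1, q2 - q1 - 1);
    result.bytes = percentDecode(value.substr(q2 + 1));
    return true;
}
```

Applied to the value from the original report, koi8-r''%F0%D2...%2Edoc, this yields the charset "koi8-r", an empty language, and 24 raw bytes ending in "19.09.doc". A real decoder would still have to convert those bytes from KOI8-R to Unicode.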
Comment 2 Reinhold Kainhofer 2007-01-10 16:07:07 UTC
SVN commit 622052 by kainhofe:

Make RFC 2231-encoded attachment names work. Patch approved by Ingo (the issues he had were corrected).

RFC 2231 defines an enhanced encoding for attachment filenames, and Thunderbird
apparently implemented this encoding. RFC 2231 allows one field to be split 
across multiple numbered entries of the form fieldname*0=...; 
fieldname*1=...; fieldname*2=...; or fieldname*0*=...; fieldname*1*=...; fieldname*2*=...; 
All these entries first need to be concatenated to form the full value of the field. 

Here's a real-life example:

--------------060807060608070200030605
Content-Type: application/vnd.ms-excel;
 name*0*=ISO-8859-15''%41%46%42%D6%20%42%65%73%65%74%7A%75%6E%67%73%6C%69;
 name*1*=%73%74%65%20%53%74%61%6E%64%20%32%30%30%36%2D%31%32%2D%31%39%2E;
 name*2*=%78%6C%73
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename*0*=ISO-8859-15''%41%46%42%D6%20%42%65%73%65%74%7A%75%6E%67%73%6C;
 filename*1*=%69%73%74%65%20%53%74%61%6E%64%20%32%30%30%36%2D%31%32%2D%31;
 filename*2*=%39%2E%78%6C%73


Without this patch, KMail shows only the last segment, %39%2E%78%6C%73, as the 
file name in both the message preview pane and the MIME tree. 

With this patch, KMail correctly shows the proper filename.
The patch adds one static method to collect all parts of rfc 2231-encoded 
params into one single string. That method is then used in two different 
places for the name and the filename props.
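
The concatenation step can be sketched in standard C++ as follows. This is not the method added by the patch, which uses QRegExp on a QCString (see the diff below). It is a simplified illustration that splits the parameter list at semicolons and joins the numbered segments in order. The name collectSegments is hypothetical. The sketch assumes that values contain no semicolons, matches attribute names case-sensitively, and, like the patch, does not distinguish encoded (trailing *) from unencoded segments.

```cpp
#include <cctype>
#include <map>
#include <string>

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Collect field*=, field*0=, field*0*=, field*1=, ... from a parameter list
// and join the segment values in numeric order. The result is still
// percent-encoded. Hypothetical helper, not the KMail method.
std::string collectSegments(const std::string& params, const std::string& field) {
    std::map<int, std::string> segments;  // segment number -> raw value
    std::size_t pos = 0;
    while (pos <= params.size()) {
        std::size_t semi = params.find(';', pos);
        if (semi == std::string::npos) semi = params.size();
        std::string item = trim(params.substr(pos, semi - pos));
        pos = semi + 1;

        std::size_t eq = item.find('=');
        if (eq == std::string::npos) continue;  // e.g. "inline" or a MIME type
        std::string attr = trim(item.substr(0, eq));
        std::string value = trim(item.substr(eq + 1));

        // The attribute must be exactly: field '*' [digits] ['*']
        if (attr.size() <= field.size() ||
            attr.compare(0, field.size(), field) != 0 ||
            attr[field.size()] != '*')
            continue;
        std::size_t i = field.size() + 1;
        int index = 0;  // a bare field*= counts as segment 0
        while (i < attr.size() && std::isdigit(static_cast<unsigned char>(attr[i])))
            index = index * 10 + (attr[i++] - '0');
        if (i < attr.size() && attr[i] == '*') ++i;
        if (i != attr.size()) continue;

        // Drop surrounding double quotes, if any.
        if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
            value = value.substr(1, value.size() - 2);
        segments[index] = value;
    }

    std::string joined;
    for (const auto& entry : segments) joined += entry.second;
    return joined;
}
```

Called with the Content-Disposition parameters from the example above and the field name "filename", this returns the three filename*N* values joined into one string that starts with ISO-8859-15'', rather than just the last segment. That string can then be passed to an RFC 2231 decoder.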



One minor problem remains, though: As the mime library does not have support 
for rfc2231 encoded attachments, the message is not shown with the attachment 
icon in the message list. 


BUG:108091
BUG:113100




 M  +10 -2     kmmessage.cpp  
 M  +36 -0     kmmsgbase.cpp  
 M  +5 -0      kmmsgbase.h  
 M  +30 -27    kmmsgpart.cpp  


--- branches/KDE/3.5/kdepim/kmail/kmmessage.cpp #622051:622052
@@ -2866,6 +2866,15 @@
 
 void applyHeadersToMessagePart( DwHeaders& headers, KMMessagePart* aPart )
 {
+  // TODO: Instead of manually implementing RFC2231 header encoding (i.e. 
+  //       possibly multiple values given as paramname*0=..; paramname*1=..;... 
+  //       or as paramname*0*=..; paramname*1*=..;..., which should be
+  //       concatenated), use a generic method to decode the header, using RFC
+  //       2047 or 2231, or whatever future RFC might be appropriate!
+  //       Right now, some fields are decoded, while others are not. E.g.
+  //       Content-Disposition is not decoded here, rather only on demand in
+  //       KMMsgPart::fileName; Name however is decoded here and stored as a 
+  //       decoded String in KMMsgPart...
   // Content-type
   QCString additionalCTypeParams;
   if (headers.HasContentType())
@@ -2880,8 +2889,7 @@
       if (!qstricmp(param->Attribute().c_str(), "charset"))
         aPart->setCharset(QCString(param->Value().c_str()).lower());
       else if (!qstrnicmp(param->Attribute().c_str(), "name*", 5))
-        aPart->setName(KMMsgBase::decodeRFC2231String(
-              param->Value().c_str()));
+        aPart->setName(KMMsgBase::decodeRFC2231String(KMMsgBase::extractRFC2231HeaderField( param->Value().c_str(), "name" )));
       else {
         additionalCTypeParams += ';';
         additionalCTypeParams += param->AsString().c_str();
--- branches/KDE/3.5/kdepim/kmail/kmmsgbase.cpp #622051:622052
@@ -948,6 +948,42 @@
   return codec->toUnicode( st );
 }
 
+QCString KMMsgBase::extractRFC2231HeaderField( const QCString &aStr, const QCString &field )
+{
+  int n=-1;
+  QCString str;
+  bool found = false;
+  while ( n<=0 || found ) {
+    QString pattern( field );
+    pattern += "[*]"; // match a literal * after the fieldname, as defined by RFC 2231
+    if ( n>=0 ) { // If n<0, check for fieldname*=..., otherwise for fieldname*n=
+      pattern += QString::number(n) + "[*]?";
+    }
+    pattern += "=";
+    
+    QRegExp fnamePart( pattern, FALSE );
+    int startPart = fnamePart.search( aStr );
+    int endPart;
+    found = ( startPart >= 0 );
+    if ( found ) {
+      startPart += fnamePart.matchedLength();
+      // Quoted values end at the ending quote
+      if ( aStr[startPart] == '"' ) {
+        startPart++; // the double quote isn't part of the filename
+        endPart = aStr.find('"', startPart) - 1;
+      }
+      else {
+        endPart = aStr.find(';', startPart) - 1;
+      }
+      if (endPart < 0)
+        endPart = 32767;
+      str += aStr.mid( startPart, endPart-startPart+1).stripWhiteSpace();
+    }
+    n++;
+  }
+  return str;
+}
+
 QString KMMsgBase::base64EncodedMD5( const QString & s, bool utf8 ) {
   if (s.stripWhiteSpace().isEmpty()) return "";
   if ( utf8 )
--- branches/KDE/3.5/kdepim/kmail/kmmsgbase.h #622051:622052
@@ -355,6 +355,11 @@
 
   /** Decode given string as described in RFC2231 */
   static QString decodeRFC2231String(const QCString& aStr);
+  /** Extract a given param from the RFC2231-encoded header field, in particular
+      concatenate possibly multiple entries, which are given as paramname*0=..;
+      paramname*1=..; ... or paramname*0*=..; paramname*1*=..; ... and return 
+      their value as one string. That string will still be encoded */
+  static QCString extractRFC2231HeaderField( const QCString &aStr, const QCString &field );
 
   /** Calculate the base64 encoded md5sum (sans the trailing equal
       signs). If @p utf8 is false, uses QString::latin1() to calculate
--- branches/KDE/3.5/kdepim/kmail/kmmsgpart.cpp #622051:622052
@@ -504,41 +504,44 @@
 //-----------------------------------------------------------------------------
 QString KMMessagePart::fileName(void) const
 {
-  bool bRFC2231encoded = false;
-
-  // search the start of the filename
-  int startOfFilename = mContentDisposition.find("filename*=", 0, FALSE);
-  if (startOfFilename >= 0) {
-    bRFC2231encoded = true;
-    startOfFilename += 10;
-  }
-  else {
-    startOfFilename = mContentDisposition.find("filename=", 0, FALSE);
+  QCString str;
+  
+  // Allow for multiple filename*0, filename*1, ... params (defined by RFC 2231) 
+  // in the Content-Disposition
+  if ( mContentDisposition.contains( "filename*", FALSE ) ) {
+  
+    // It's RFC 2231 encoded, so extract the file name with the 2231 method
+    str = KMMsgBase::extractRFC2231HeaderField( mContentDisposition, "filename" );
+    return KMMsgBase::decodeRFC2231String(str);
+  
+  } else {
+    
+    // Standard RFC 2047-encoded
+    // search the start of the filename
+    int startOfFilename = mContentDisposition.find("filename=", 0, FALSE);
     if (startOfFilename < 0)
       return QString::null;
     startOfFilename += 9;
-  }
 
-  // search the end of the filename
-  int endOfFilename;
-  if ( '"' == mContentDisposition[startOfFilename] ) {
-    startOfFilename++; // the double quote isn't part of the filename
-    endOfFilename = mContentDisposition.find('"', startOfFilename) - 1;
-  }
-  else {
-    endOfFilename = mContentDisposition.find(';', startOfFilename) - 1;
-  }
-  if (endOfFilename < 0)
-    endOfFilename = 32767;
+    // search the end of the filename
+    int endOfFilename;
+    if ( '"' == mContentDisposition[startOfFilename] ) {
+      startOfFilename++; // the double quote isn't part of the filename
+      endOfFilename = mContentDisposition.find('"', startOfFilename) - 1;
+    }
+    else {
+      endOfFilename = mContentDisposition.find(';', startOfFilename) - 1;
+    }
+    if (endOfFilename < 0)
+      endOfFilename = 32767;
 
-  const QCString str = mContentDisposition.mid(startOfFilename,
+    const QCString str = mContentDisposition.mid(startOfFilename,
                                 endOfFilename-startOfFilename+1)
                            .stripWhiteSpace();
-
-  if (bRFC2231encoded)
-    return KMMsgBase::decodeRFC2231String(str);
-  else
     return KMMsgBase::decodeRFC2047String(str, charset());
+  }
+  
+  return QString::null;
 }