Bug 98790

Summary: Import of vCard 2.1 with CHARSET set to UTF-8 is garbled
Product: [Unmaintained] kab3
Reporter: Urs Roesch <urs>
Component: general
Assignee: Tobias Koenig <tokoe>
Status: RESOLVED UNMAINTAINED
Severity: normal
CC: guitar, michal, tuju, wstephenson
Priority: NOR
Version: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Attachments: Sample Japanese vCard in UTF-8

Description Urs Roesch 2005-02-07 18:06:08 UTC
Version:            (using KDE 3.3.2)
Installed from:    Gentoo Packages
Compiler:          gcc 3.3.4 20040623 
OS:                Linux

A contact was sent as a vCard file (version 2.1) via Bluetooth from a Nokia 6630 (actually a Vodafone 702NK) to the target host. Importing the file into kaddressbook via File -> Import -> Import vCard garbles the Japanese characters and renders/saves them as "?" (question marks) in the kabc file.

To confirm that data corruption is not occurring during Bluetooth transmission, the file was sent back to the phone in various other Japanese encodings such as EUC-JP, SJIS and ISO-2022-JP. None of them had data corruption issues.
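
A quick way to rule out damage in transit, independent of kaddressbook, is to check that the received bytes decode strictly as UTF-8. The Python sketch below is only an illustration: `is_valid_utf8` is a hypothetical helper, not part of any KDE tool, and the property line is an illustrative value rather than the bytes of the attached file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative vCard 2.1 property line, encoded the way a phone might send it.
intact = "N;CHARSET=UTF-8:溌剌;元気".encode("utf-8")
# Simulate transfer damage by cutting the last multi-byte character short.
damaged = intact[:-1]

print(is_valid_utf8(intact), is_valid_utf8(damaged))
```

A file that passes this check can still be imported wrongly, which is the situation described in this report.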

Sample vCard file used for import:
 
BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8:溌剌;元気
SOUND;X-IRMC-N;CHARSET=UTF-8:ﾊﾂﾗﾂ;ｹﾞﾝｷ
TEL;VOICE:0000000000
TEL;CELL:000000000
EMAIL:unknown@no-domain.jp
TITLE;CHARSET=UTF-8:元気百倍
ORG;CHARSET=UTF-8:溌剌株式会社
END:VCARD

Is being transformed to the below in std.vcf:

BEGIN:VCARD
EMAIL:unknown@no-domain.jp
N:??;??;;;
ORG:??????
TEL;TYPE=VOICE:0000000000
TEL;TYPE=CELL:000000000
TITLE:????
UID:xVPHXC6HKV
VERSION:3.0
END:VCARD
Comment 1 Urs Roesch 2005-02-07 18:17:31 UTC
Created attachment 9471 [details]
Sample Japanese vCard in UTF-8

The copy & paste example in the bug description will not work correctly; use the
attached file for testing and confirmation instead.
Comment 2 Anatoly Ershov 2005-02-25 20:11:28 UTC
Same with my kaddressbook
Kaddressbook Version: 3.3.1 (KDE 3.3.2, (3.1))
Operating System: Linux (i686) release 2.6.10-p4-skas3-v7

Namely, a vCard v2.1 received from a Siemens CX65 with UTF-8-encoded Russian characters gets imported with only '?' signs in all such text fields *except* for FN. In the dialog, the text is displayed fine in the preview pane (the text shown there is what later becomes that same FN).

Anatoly Ershov
ershov <> nice.ru
Comment 3 Sebastian Piechocki 2005-05-17 23:30:55 UTC
Same with my kaddressbook
Kaddressbook Version: 3.3.2 (KDE 3.3.2)
Operating System: Linux (i686) 2.6.11.6 (Debian testing)

Importing a vCard v2.1 from a Siemens SX1 produces some Japanese characters in the imported contact. My source vCard file is UTF-8 encoded. I am using the Polish language.
Comment 4 Bruno Gabuzomeu 2005-08-27 11:31:29 UTC
The same happens when importing v3.0 cards:
the special characters are corrupted when importing from a local *.vcf file, no matter whether the file is in the UTF-8 or ISO-8859-2 charset.

Nevertheless, adding the *.vcf file as an additional address book solves the problem: it correctly imports the special characters, no matter whether the file is in the UTF-8 or ISO-8859-2 charset!
Comment 5 dexen 2006-09-06 18:01:16 UTC
Same for me, when loading vCard over GroupDav, with fields in charset:UTF-8.

As far as I can trace this, it is a result of the patch from http://bugs.kde.org/show_bug.cgi?id=72380

The two calls to ``QString::fromUtf8( value.ascii() )'' seem to be superfluous. It appears to me that the data is getting decoded (through some UTF-8-to-internal-QString-format conversion) twice. Or, in case UTF-8 is the internal format of QString, once. Either way, one time too many.

Reason for my belief: when I used UTF-8-encoded data but removed the ``charset:utf-8'' bit, it showed up just fine.
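
The suspected pattern can be illustrated outside Qt. The Python sketch below is an analogy only, not the kabc code: `lossy_round_trip` is a hypothetical name, and it assumes that the 8-bit conversion behaves like a Latin-1 conversion that substitutes '?' for every character it cannot represent.

```python
def lossy_round_trip(text: str) -> str:
    """Mimic the suspected pattern: text that is already Unicode is squeezed
    through an 8-bit string and then decoded as UTF-8 a second time."""
    # Assumption: the 8-bit step acts like Latin-1 with '?' as the
    # replacement for anything outside that character set.
    eight_bit = text.encode("latin-1", errors="replace")
    # Decoding those bytes as UTF-8 cannot bring the lost characters back.
    return eight_bit.decode("utf-8", errors="replace")


print(lossy_round_trip("元気"))   # Japanese text, as in the original report
print(lossy_round_trip("Иван"))   # Cyrillic text, as in comment 2
print(lossy_round_trip("Urs"))    # plain ASCII survives unchanged
```

Under this assumption, every non-Latin-1 character collapses to '?' before the second decode even runs, which would match the symptom reported for Japanese and Russian text.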
Comment 6 Jiri Navratil 2007-07-22 12:31:39 UTC
My troubles are related to this bug.

KAddressBook 3.5.6 uses UTF-8 for vCards. Unfortunately, the ;CHARSET=UTF-8 option is not stored in the relevant fields. This is not in line with the vCard 2.1 specification and, more importantly, it prevents sharing vCards containing UTF-8 characters with other applications and devices, because these characters are wrong after the transfer.

The solution is to use the ;CHARSET=UTF-8 option when storing fields with UTF-8 encoding, and to read fields with ;CHARSET=UTF-8 correctly.
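
As a rough illustration of the writing half of this proposal, the Python sketch below tags a property with ;CHARSET=UTF-8 only when its value contains non-ASCII characters. `format_property` is a hypothetical helper, not KAddressBook code, and it deliberately ignores line folding and the ENCODING parameter.

```python
def format_property(name: str, value: str) -> bytes:
    """Serialize one vCard 2.1 property line, tagging non-ASCII values."""
    if value.isascii():
        line = f"{name}:{value}"
    else:
        # Declare the charset so that other readers know how to decode it.
        line = f"{name};CHARSET=UTF-8:{value}"
    return line.encode("utf-8") + b"\r\n"


print(format_property("TEL;CELL", "000000000"))
print(format_property("TITLE", "元気百倍"))
```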
Comment 7 Philipp Sternberg 2007-11-05 14:26:51 UTC
Yeah, can confirm this for 3.5.7
Comment 8 Martin Koller 2008-01-05 12:40:59 UTC
*** Bug 98165 has been marked as a duplicate of this bug. ***
Comment 9 Martin Koller 2008-01-06 23:02:25 UTC
SVN commit 758100 by mkoller:

BUG: 98790
Correctly handle CHARSET attribute in vcard parsing
and handle given string as plain C-byte array


 M  +2 -2      vcardformatplugin.cpp  
 M  +1 -1      vcardparser/testread.cpp  
 M  +25 -11    vcardparser/vcardparser.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=758100
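
The idea named in the commit message, keeping the value as raw bytes until the CHARSET attribute is known, can be sketched as follows. This Python code is not the code from the commit: `decode_property` and its `default_charset` parameter are hypothetical, and the sketch handles only a single unfolded line without ENCODING support.

```python
def decode_property(raw_line: bytes, default_charset: str = "utf-8"):
    """Split one unfolded vCard property line and decode its value.

    Returns a (name, remaining_params, value) tuple.
    """
    head, _, raw_value = raw_line.rstrip(b"\r\n").partition(b":")
    # Assumption: property names and parameters are plain ASCII.
    name, *params = head.decode("ascii").split(";")
    charset = default_charset
    remaining = []
    for param in params:
        key, _, val = param.partition("=")
        if key.upper() == "CHARSET" and val:
            charset = val  # decode the value with the declared charset
        else:
            remaining.append(param)
    # The value stays a byte string until this single decode step.
    return name, remaining, raw_value.decode(charset)


print(decode_property("N;CHARSET=UTF-8:溌剌;元気\r\n".encode("utf-8")))
print(decode_property(b"TEL;CELL:000000000\r\n"))
```

Decoding exactly once, from bytes, avoids the lossy intermediate 8-bit string discussed in comment 5.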
Comment 10 Tobias Koenig 2009-08-05 16:22:30 UTC
The development of the old KAddressBook will be discontinued for KDE 4.4.
Since the new application has the same name but a completely new code base, we are closing all bug reports against the old version and ask the submitters to resend their reports against the new product.