98790 – import of vcard 2.1 with CHARSET set to UTF-8 are garbled

Status:	RESOLVED UNMAINTAINED

Alias:	None

Product:	kab3
Classification:	Unmaintained
Component:	general
Version:	unspecified
Platform:	Gentoo Packages Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Tobias Koenig

URL:
Keywords:

Duplicates (1):	98165
Depends on:
Blocks:

Reported:	2005-02-07 18:06 UTC by Urs Roesch
Modified:	2009-08-05 16:22 UTC
CC List:	4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
Sample Japanese vCard in UTF-8 (266 bytes, application/octet-stream) 2005-02-07 18:17 UTC, Urs Roesch

Description Urs Roesch 2005-02-07 18:06:08 UTC

Version:            (using KDE KDE 3.3.2)
Installed from:    Gentoo Packages
Compiler:          gcc 3.3.4 20040623 
OS:                Linux

A contact was sent as a vCard file (version 2.1) via Bluetooth from a Nokia 6630 (actually a Vodafone 702NK) to the target host. Importing the file into kaddressbook via File -> Import -> Import vCard garbles the Japanese characters and renders/saves them as "?" (question marks) in the kabc file.

To confirm that the data corruption is not occurring during the Bluetooth transmission, the file was sent back to the phone in various other Japanese encodings such as EUC-JP, SJIS and ISO-2022-JP. None of them had data corruption issues.

Sample vCard file used for import:
 
BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8:æº~Lå~I~L;å~E~Cæ°~W
SOUND;X-IRMC-N;CHARSET=UTF-8:ï¾~Jï¾~Bï¾~Wï¾~B;ï½¹ï¾~^ï¾~]ï½·
TEL;VOICE:0000000000
TEL;CELL:000000000
EMAIL:unknown@no-domain.jp
TITLE;CHARSET=UTF-8:å~E~Cæ°~Wç~Y¾å~@~M
ORG;CHARSET=UTF-8:æº~Lå~I~Læ| ªå¼~Oä¼~Zç¤¾
END:VCARD

Is being transformed to the below in std.vcf:

BEGIN:VCARD
EMAIL:unknown@no-domain.jp
N:??;??;;;
ORG:??????
TEL;TYPE=VOICE:0000000000
TEL;TYPE=CELL:000000000
TITLE:????
UID:xVPHXC6HKV
VERSION:3.0
END:VCARD

Comment 1 Urs Roesch 2005-02-07 18:17:31 UTC

Created attachment 9471 [details]
Sample Japanese vCard in UTF-8

The copy & paste example in the bug description will not work correctly; use the
attached file for testing and confirmation instead.

Comment 2 Anatoly Ershov 2005-02-25 20:11:28 UTC

Same with my kaddressbook
Kaddressbook Version: 3.3.1 (KDE 3.3.2, (3.1))
Operating System: Linux (i686) release 2.6.10-p4-skas3-v7

Namely, a vCard v2.1 received from a Siemens CX65 with UTF-8-encoded Russian characters gets imported with only '?' signs in all such text fields *except* for FN. In the dialog, the text is displayed correctly in the preview pane (the text shown there is what later becomes that same FN).

Anatoly Ershov
ershov <> nice.ru

Comment 3 Sebastian Piechocki 2005-05-17 23:30:55 UTC

Same with my kaddressbook
Kaddressbook Version: 3.3.2 (KDE 3.3.2)
Operating System: Linux (i686) 2.6.11.6 (Debian testing)

Importing a vCard v2.1 from a Siemens SX1 produces some Japanese characters in the imported contact. My source vCard file is UTF-8 encoded. I am using the Polish language.

Comment 4 Bruno Gabuzomeu 2005-08-27 11:31:29 UTC

The same happens when importing v3.0 cards:
the special characters are corrupted when importing from a local *.vcf file, no matter whether the file is in the UTF-8 or the ISO-8859-2 charset.

Nevertheless, adding the *.vcf file as an additional addressbook solves the problem: it correctly imports the special characters, no matter whether the file is in the UTF-8 or the ISO-8859-2 charset!

Comment 5 dexen 2006-09-06 18:01:16 UTC

Same for me, when loading vCard over GroupDav, with fields in charset:UTF-8.

As far as I can trace this, it is a result of the patch from http://bugs.kde.org/show_bug.cgi?id=72380

The two calls to ``QString::fromUtf8( value.ascii() )'' seem to be superfluous. It looks to me as if the data is being decoded (from UTF-8 to QString's internal format) twice, or, if UTF-8 is QString's internal format, once. Either way, that is one time too many.

The reason for my belief: when I used UTF-8-encoded data but removed the ``charset:utf-8'' bit, it showed up just fine.
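
The double-decoding theory can be illustrated with a small Python model. This is only a sketch, not the KDE code: it assumes the value already holds decoded Unicode text and that ``ascii()'' behaves like a Latin-1 conversion that substitutes "?" for every character it cannot represent. The name decode_twice and the sample string are made up for illustration.

```python
def decode_twice(text: str) -> str:
    """Model of QString::fromUtf8(value.ascii()) on text that is already decoded.

    Assumption: ascii() narrows to Latin-1 and replaces unrepresentable
    characters with '?', so the information is lost before fromUtf8 runs.
    """
    # Step 1: narrow the Unicode text to single bytes (lossy for CJK, Cyrillic, ...).
    narrowed = text.encode("latin-1", errors="replace")
    # Step 2: interpret those bytes as UTF-8 a second time.
    return narrowed.decode("utf-8", errors="replace")


already_decoded = "日本語"  # hypothetical sample, not the reporter's contact
print(decode_twice(already_decoded))  # ???
print(decode_twice("plain ASCII"))    # plain ASCII
```

Under these assumptions each non-Latin character turns into exactly one "?", which matches the pattern in the std.vcf output from the description, while pure ASCII fields survive unchanged.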

Comment 6 Jiri Navratil 2007-07-22 12:31:39 UTC

My troubles are related to this bug.

KAddressBook 3.5.6 uses UTF-8 for vCards. Unfortunately, the ;CHARSET=UTF-8 option is not stored in the relevant fields. This is not in line with the vCard 2.1 specification and, more importantly, it prevents sharing vCards containing UTF-8 characters with other applications and devices, because these characters come out wrong after the transfer.

The solution is to write the ;CHARSET=UTF-8 option when storing fields with UTF-8 encoding, and to correctly read fields that carry ;CHARSET=UTF-8.
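
The writing half of that suggestion might look roughly like the following Python sketch. It is not KAddressBook code: format_field is a hypothetical helper, and it mirrors the phone-generated sample in the description by adding only CHARSET=UTF-8 (no ENCODING parameter) to fields that contain non-ASCII characters.

```python
def format_field(name: str, value: str) -> bytes:
    """Serialize one vCard 2.1 property line as bytes (hypothetical helper)."""
    # Pure ASCII values need no charset declaration.
    if value.isascii():
        header = name
    else:
        # Declare the charset so other readers know how to decode the bytes.
        header = f"{name};CHARSET=UTF-8"
    # vCard lines are terminated by CRLF.
    return f"{header}:{value}\r\n".encode("utf-8")


print(format_field("EMAIL", "unknown@no-domain.jp"))
print(format_field("TITLE", "日本語").decode("utf-8"), end="")
```

The point of deciding per field is that ASCII-only properties stay byte-for-byte identical to what is written today, so only the fields that are currently ambiguous change.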

Comment 7 Philipp Sternberg 2007-11-05 14:26:51 UTC

Yeah, I can confirm this for 3.5.7.

Comment 8 Martin Koller 2008-01-05 12:40:59 UTC

*** Bug 98165 has been marked as a duplicate of this bug. ***

Comment 9 Martin Koller 2008-01-06 23:02:25 UTC

SVN commit 758100 by mkoller:

BUG: 98790
Correctly handle CHARSET attribute in vcard parsing
and handle given string as plain C-byte array


 M  +2 -2      vcardformatplugin.cpp  
 M  +1 -1      vcardparser/testread.cpp  
 M  +25 -11    vcardparser/vcardparser.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=758100
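
The idea named in the commit message (treat the value as a plain byte array and decode it according to CHARSET) can be sketched in Python as below. This is an illustration under assumptions, not the code from the commit: parse_line and its fallback_charset parameter are hypothetical, and the UTF-8 fallback for lines without CHARSET is an assumption based on comment 6.

```python
def parse_line(raw: bytes, fallback_charset: str = "utf-8"):
    """Split one vCard property line and decode its value exactly once.

    Returns (name, options, value). The value stays as bytes until the
    CHARSET parameter is known, so nothing is lost along the way.
    """
    head, _, value = raw.rstrip(b"\r\n").partition(b":")
    # Property name and parameters are ASCII by construction.
    name, *params = head.decode("ascii").split(";")
    options = {}
    for param in params:
        key, _, val = param.partition("=")
        options[key.upper()] = val  # bare parameters such as VOICE get ""
    charset = options.get("CHARSET", fallback_charset)
    return name, options, value.decode(charset, errors="replace")


name, options, value = parse_line("N;CHARSET=UTF-8:日本;語\r\n".encode("utf-8"))
print(name, options, value)
```

Because the bytes are decoded only after the parameters have been read, the same function handles an ISO-8859-2 field as well as a UTF-8 one.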

Comment 10 Tobias Koenig 2009-08-05 16:22:30 UTC

The development of the old KAddressBook will be discontinued for KDE 4.4.
Since the new application has the same name but a completely new code base, we are closing all bug reports against the old version and ask the submitters to resend their reports against the new product.