271859 – vCard logical lines folded in the middle of utf-8 character

Status:	RESOLVED FIXED

Alias:	None

Product:	kdepimlibs
Classification:	Applications
Component:	kabc
Version:	4.5
Platform:	Ubuntu Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	kdepim bugs

URL:
Keywords:

Duplicates (1):	320196
Depends on:
Blocks:

Reported:	2011-04-27 18:11 UTC by Andrey Bondarenko
Modified:	2014-03-22 11:02 UTC
CC List:	3 users

See Also:
Latest Commit:	http://commits.kde.org/kdepimlibs/63bbded8f55f2c539e0ec5942b362cd26fc77a46
Version Fixed In:	4.13

Attachments
a test vcard (properly folded) (260 bytes, text/plain) 2011-04-27 18:11 UTC, Andrey Bondarenko

Description Andrey Bondarenko 2011-04-27 18:11:29 UTC

Created attachment 59372
a test vcard (properly folded)

Version:           4.5 (using KDE 4.5.5) 
OS:                Linux

If contact information is saved as a vCard 3.0 (RFC 2425 / 2426) file with a multibyte encoding (UTF-8 in my case), lines can be folded in the middle of a multibyte character: some bytes of the character are left on one line and the rest are put on the next line. This makes the saved vCard difficult to read with other libraries (python-vobject, for example).

I suppose this is a bug because RFC 2425 states the following in "5.8.1. Line delimiting and folding":

  A logical line MAY be continued on the next physical line anywhere
   between two characters by inserting a CRLF immediately

So splitting a line in the middle of a character does not comply with RFC 2425.

The cause of the bug is in kabc/vcardparser/vcardparser.cpp, in the function createVCard, lines 291-301 (git revision 5ca796151e8fbf0e8b84574c9640a77af49c2c50). Folding is done at the byte level, after the Unicode characters have been encoded as bytes.
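
To make the failure mode concrete, here is a minimal sketch of how a split character could be detected in exported data. It is not code from kabc; the names isUtf8Continuation and hasSplitCharacter are hypothetical. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so a physical line whose content (after the folding whitespace) begins with such a byte must have been cut inside a character.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isUtf8Continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical checker, not part of kabc: returns true if any fold
// (a line break followed by a single space or tab) is immediately
// followed by a continuation byte, i.e. the fold landed inside a
// multibyte character.
bool hasSplitCharacter(const std::string &folded)
{
    for (std::size_t i = 0; i + 2 < folded.size(); ++i) {
        if (folded[i] != '\n') {
            continue;
        }
        const char ws = folded[i + 1];
        if (ws != ' ' && ws != '\t') {
            continue; // not a folded continuation line
        }
        if (isUtf8Continuation(static_cast<unsigned char>(folded[i + 2]))) {
            return true;
        }
    }
    return false;
}
```

A check like this only flags folds that begin with a stray continuation byte; it does not validate the rest of the UTF-8 data.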



Reproducible: Always

Steps to Reproduce:
Set the locale to a UTF-8 based one. Create a contact in kaddressbook with a long note (70+ characters) in a language whose characters are encoded as multiple bytes. Export the contact as vCard 3.0. Try to load the vCard in some other program.

Actual Results:  
A vCard exported by kaddressbook (look for broken characters):

BEGIN:VCARD
FN:test
N:test;;;;
NOTE:Длинный комментарий на русском языке\,
  чтобы в результате получилась строка бо�
 �ее 70 символов.
UID:MeYEG83HLw
VERSION:3.0
END:VCARD


Expected Results:  
A properly folded vCard:

BEGIN:VCARD
FN:test
N:test;;;;
NOTE:Длинный комментарий на русском яз
 ыке\, чтобы в результате получилась с
 трока более 70 символов.
UID:MeYEG83HLw
VERSION:3.0
END:VCARD

Comment 1 Martin Koller 2014-03-20 19:04:48 UTC

*** Bug 320196 has been marked as a duplicate of this bug. ***

Comment 2 Martin Koller 2014-03-22 11:02:52 UTC

Git commit 63bbded8f55f2c539e0ec5942b362cd26fc77a46 by Martin Koller.
Committed on 22/03/2014 at 10:59.
Pushed by mkoller into branch 'KDE/4.13'.

avoid splitting UTF-8 encoded character in the middle of encoded bytes

The file format spec in RFC 6350 says:
http://tools.ietf.org/html/rfc6350#section-3.2
Line Delimiting and Folding
"Multi-octet characters MUST remain contiguous."
This patch avoids splitting an UTF-8 encoded character in the middle
of the encoded bytes
FIXED-IN: 4.13
REVIEW: 116933

M  +2    -0    kabc/vcardparser/testroundtrip.qrc
A  +14   -0    kabc/vcardparser/tests/vcard9.vcf
A  +14   -0    kabc/vcardparser/tests/vcard9.vcf.ref
M  +41   -3    kabc/vcardparser/vcardparser.cpp

http://commits.kde.org/kdepimlibs/63bbded8f55f2c539e0ec5942b362cd26fc77a46
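
For readers who want to see the idea behind the fix, the following is a minimal sketch of folding that keeps multi-octet characters contiguous. It is not the code from the commit above: foldLine and its maxBytes parameter are hypothetical, it uses std::string rather than the Qt types used in kabc, and the default width of 75 octets is an assumption taken from the folding recommendation in RFC 6350 section 3.2. The technique is to pick a byte-level break position and then move it backwards while it points at a UTF-8 continuation byte (10xxxxxx).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the kabc implementation: fold one logical line
// (UTF-8 encoded, without line break) so that no physical line is longer
// than maxBytes bytes, excluding the CRLF, and no multibyte character is
// split across two physical lines.
std::string foldLine(const std::string &line, std::size_t maxBytes = 75)
{
    if (maxBytes < 5) {
        maxBytes = 5; // room for the leading space plus one 4-byte character
    }

    std::string out;
    std::size_t pos = 0;
    bool first = true;

    while (pos < line.size()) {
        // Continuation lines start with one space that counts toward the width.
        const std::size_t budget = first ? maxBytes : maxBytes - 1;
        std::size_t end = pos + budget;

        if (end >= line.size()) {
            end = line.size();
        } else {
            // Move the break back until it sits on the first byte of a character.
            while (end > pos &&
                   (static_cast<unsigned char>(line[end]) & 0xC0) == 0x80) {
                --end;
            }
            if (end == pos) {
                // Malformed input (only continuation bytes): fall back to a
                // plain byte split so the loop always makes progress.
                end = pos + budget;
            }
        }

        if (!first) {
            out += "\r\n ";
        }
        out.append(line, pos, end - pos);
        pos = end;
        first = false;
    }
    return out;
}
```

With this approach a physical line may end up a few bytes shorter than the maximum width, because the break is only ever moved backwards, never forwards.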