Bug 266141 - KTar class have broken UTF-8 support for multibyte characters (longlink)
Summary: KTar class have broken UTF-8 support for multibyte characters (longlink)
Status: RESOLVED FIXED
Alias: None
Product: kio
Classification: Frameworks and Libraries
Component: tar (show other bugs)
Version: 4.4
Platform: Debian testing Linux
: NOR normal
Target Milestone: ---
Assignee: Mario Bensi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-12 16:43 UTC by Ibragimov Rinat
Modified: 2018-08-28 21:50 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
fix for KTar's handling of UTF-8 multibyte file names (1.58 KB, patch)
2011-02-12 16:43 UTC, Ibragimov Rinat
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ibragimov Rinat 2011-02-12 16:43:35 UTC
Created attachment 57190 [details]
fix for KTar's handling of UTF-8 multibyte file names

Version:           4.4 (using KDE 4.4.5) 
OS:                Linux

I tried to create tar by KBackup program and found
truncated names of my files in .tar. KBackup uses
KTar class for writing tar files and that class
have broken UTF-8 support, it seems.

First, tar archives have to use "longlink trick" to 
store names longer than 100 bytes. KTar class has 
functions implementing longlink, but they check name 
length in _characters_, not in bytes. For non-ASCII 
characters in UTF-8 length of string in bytes and 
length in characters do not match. In my case file 
had character-length less than 100 and byte-length 
greater than 100, so name simply truncated. Such 
behavior can be observed on non-ASCII UTF-8 or any 
other multibyte encoding. If file name is very long,
resulting .tar may become unreadable.

Second, calculation of 'chksum' field of tar header also
broken: 'buffer' array defined as char, a signed number,
while in tar sources chksum obtained as sum of unsigned
values (actually there is the trick for (unsigned char)
emulate, converting to integer and then logical and with
0xFF). May be bad checksum was reason for unreadable .tar.

Reproducible: Always

Steps to Reproduce:
1. compile and run

// gcc -o qw qw.cpp -I/usr/include/qt4 -lQtCore -lkio
#include <ktar.h>
#include <stdio.h>

int main(void) {
   
    KTar archive(QString("testfile.tar"), "application/x-tar");
    QString	filename=QString::fromUtf8("Раз два три четыре пять, вышел зайчик погу.txt");
    archive.open(QIODevice::WriteOnly);
    archive.prepareWriting(filename, QString("user"), QString("group"), 0);
    archive.finishWriting(0);
    archive.close();
    return 0;
}

2. try to run `tar tf testfile.tar'.


Actual Results:  
tar complains about wrong headers

Expected Results:  
tar should display name of file:
Раз два три четыре пять, вышел зайчик погу.txt


tar itself creates readable archives.
Comment 1 Julian Steinmann 2018-08-05 12:17:46 UTC
Hey Ibragimov, does this issue still exist for you? The program you provided was written for Qt 4, which is no longer supported. If the problem does still exist, we'd love to merge your patch assuming that the patch does solve the bug.
Comment 2 Ibragimov Rinat 2018-08-07 11:31:18 UTC
(In reply to Julian Schraner from comment #1)
> Hey Ibragimov, does this issue still exist for you? The program you provided
> was written for Qt 4, which is no longer supported. If the problem does
> still exist, we'd love to merge your patch assuming that the patch does
> solve the bug.

Well, back then I also reported the issue to Debian bug tracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=612675. The patch I created was not complete, but fortunately it was updated and subsequently included into the package patchlist. It's still included in kde4libs (https://salsa.debian.org/qt-kde-team/kde/kde4libs/blob/master/debian/patches/ktar_longlink_length_in_bytes.diff). I then moved away from the software that used KTar, and then forgot about the issue completely.

Unfortunately, the issue is still there, but now it's in the KF5Archive component. As for the repro, here is the code:

-----------------------
// g++ -fPIC -o qw qw.cc `pkg-config --cflags --libs Qt5Core` -I/usr/include/KF5/KArchive -lKF5Archive

#include <ktar.h>
#include <stdio.h>

int main() {

    KTar archive(QString("testfile.tar"), "application/x-tar");
    QString filename=QString::fromUtf8("Раз два три четыре пять, вышел зайчик погу.txt");
    archive.open(QIODevice::WriteOnly);
    archive.prepareWriting(filename, QString("user"), QString("group"), 0);
    archive.finishWriting(0);
    archive.close();
    return 0;
}

-----------------------

As before, after program creates "testfile.tar", "tar tf testfile.tar" fails:

$ tar tf testfile.tar 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors


I don't know if the patch can be applied as is to the current code. Haven't tried that yet.
Comment 3 Ibragimov Rinat 2018-08-07 16:01:03 UTC
(In reply to Julian Schraner from comment #1)

I've tested it against version of karchive from git repository, and still found the issue reproducible. In the process, I've added https://phabricator.kde.org/D14674 with the patch. Could you take a look?
Comment 4 Albert Astals Cid 2018-08-28 21:50:40 UTC
Git commit 524805b3ea2ebfc0a7c4a3f2fb89dc8371a66e5c by Albert Astals Cid, on behalf of Rinat Ibragimov.
Committed on 28/08/2018 at 21:50.
Pushed by aacid into branch 'master'.

handle non-ASCII encodings of file names in tar archives

Summary:

As UTF-8 may use multiple bytes per character, it's required to measure encoded file name length in bytes, not just in characters. It's possible to accidentally truncate file name if its name in characters is shorter that 100, but in bytes — longer than 100. Also, checksum calculation must be performed on unsigned bytes. Otherwise bytes in range 0x80-0xff when promoted to int become negative.

Reviewers: dfaure, kossebau

Reviewed By: dfaure

Subscribers: xyquadrat, broulik, cfeck, ibragimovrinat, kde-frameworks-devel

Tags: #frameworks

Differential Revision: https://phabricator.kde.org/D14674

M  +49   -0    autotests/karchivetest.cpp
M  +2    -0    autotests/karchivetest.h
A  +0    -0    autotests/tar_non_ascii_file_name.tar.gz
M  +10   -10   src/ktar.cpp

https://commits.kde.org/karchive/524805b3ea2ebfc0a7c4a3f2fb89dc8371a66e5c