Created attachment 57190 [details] fix for KTar's handling of UTF-8 multibyte file names Version: 4.4 (using KDE 4.4.5) OS: Linux I tried to create tar by KBackup program and found truncated names of my files in .tar. KBackup uses KTar class for writing tar files and that class have broken UTF-8 support, it seems. First, tar archives have to use "longlink trick" to store names longer than 100 bytes. KTar class has functions implementing longlink, but they check name length in _characters_, not in bytes. For non-ASCII characters in UTF-8 length of string in bytes and length in characters do not match. In my case file had character-length less than 100 and byte-length greater than 100, so name simply truncated. Such behavior can be observed on non-ASCII UTF-8 or any other multibyte encoding. If file name is very long, resulting .tar may become unreadable. Second, calculation of 'chksum' field of tar header also broken: 'buffer' array defined as char, a signed number, while in tar sources chksum obtained as sum of unsigned values (actually there is the trick for (unsigned char) emulate, converting to integer and then logical and with 0xFF). May be bad checksum was reason for unreadable .tar. Reproducible: Always Steps to Reproduce: 1. compile and run // gcc -o qw qw.cpp -I/usr/include/qt4 -lQtCore -lkio #include <ktar.h> #include <stdio.h> int main(void) { KTar archive(QString("testfile.tar"), "application/x-tar"); QString filename=QString::fromUtf8("Раз два три четыре пять, вышел зайчик погу.txt"); archive.open(QIODevice::WriteOnly); archive.prepareWriting(filename, QString("user"), QString("group"), 0); archive.finishWriting(0); archive.close(); return 0; } 2. try to run `tar tf testfile.tar'. Actual Results: tar complains about wrong headers Expected Results: tar should display name of file: Раз два три четыре пять, вышел зайчик погу.txt tar itself creates readable archives.
Hey Ibragimov, does this issue still exist for you? The program you provided was written for Qt 4, which is no longer supported. If the problem does still exist, we'd love to merge your patch assuming that the patch does solve the bug.
(In reply to Julian Schraner from comment #1) > Hey Ibragimov, does this issue still exist for you? The program you provided > was written for Qt 4, which is no longer supported. If the problem does > still exist, we'd love to merge your patch assuming that the patch does > solve the bug. Well, back then I also reported the issue to Debian bug tracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=612675. The patch I created was not complete, but fortunately it was updated and subsequently included into the package patchlist. It's still included in kde4libs (https://salsa.debian.org/qt-kde-team/kde/kde4libs/blob/master/debian/patches/ktar_longlink_length_in_bytes.diff). I then moved away from the software that used KTar, and then forgot about the issue completely. Unfortunately, the issue is still there, but now it's in the KF5Archive component. As for the repro, here is the code: ----------------------- // g++ -fPIC -o qw qw.cc `pkg-config --cflags --libs Qt5Core` -I/usr/include/KF5/KArchive -lKF5Archive #include <ktar.h> #include <stdio.h> int main() { KTar archive(QString("testfile.tar"), "application/x-tar"); QString filename=QString::fromUtf8("Раз два три четыре пять, вышел зайчик погу.txt"); archive.open(QIODevice::WriteOnly); archive.prepareWriting(filename, QString("user"), QString("group"), 0); archive.finishWriting(0); archive.close(); return 0; } ----------------------- As before, after program creates "testfile.tar", "tar tf testfile.tar" fails: $ tar tf testfile.tar tar: This does not look like a tar archive tar: Skipping to next header tar: Exiting with failure status due to previous errors I don't know if the patch can be applied as is to the current code. Haven't tried that yet.
(In reply to Julian Schraner from comment #1) I've tested it against version of karchive from git repository, and still found the issue reproducible. In the process, I've added https://phabricator.kde.org/D14674 with the patch. Could you take a look?
Git commit 524805b3ea2ebfc0a7c4a3f2fb89dc8371a66e5c by Albert Astals Cid, on behalf of Rinat Ibragimov. Committed on 28/08/2018 at 21:50. Pushed by aacid into branch 'master'. handle non-ASCII encodings of file names in tar archives Summary: As UTF-8 may use multiple bytes per character, it's required to measure encoded file name length in bytes, not just in characters. It's possible to accidentally truncate file name if its name in characters is shorter that 100, but in bytes — longer than 100. Also, checksum calculation must be performed on unsigned bytes. Otherwise bytes in range 0x80-0xff when promoted to int become negative. Reviewers: dfaure, kossebau Reviewed By: dfaure Subscribers: xyquadrat, broulik, cfeck, ibragimovrinat, kde-frameworks-devel Tags: #frameworks Differential Revision: https://phabricator.kde.org/D14674 M +49 -0 autotests/karchivetest.cpp M +2 -0 autotests/karchivetest.h A +0 -0 autotests/tar_non_ascii_file_name.tar.gz M +10 -10 src/ktar.cpp https://commits.kde.org/karchive/524805b3ea2ebfc0a7c4a3f2fb89dc8371a66e5c