Version: 2.13 (using KDE 4.3.0)
Installed from: SuSE RPMs

For some zip archives, the names of the files inside are shown as ?????????.pdf and so on. Ark can extract the files to the filesystem, but only the whole archive at once; it cannot extract individual files. For the same reason it cannot open a file in the viewer: it does not understand the file names. I do not know whether zip archives store information about the filename encoding (I only started wondering when I saw the correct filenames in .rar archives). If they do not, it would be nice to be able to set a default charset to decode from. For example, I only work with archives created on Windows.
Can you please attach a zip file that presents this problem?
Created attachment 36428 Zip archive with wrong Cyrillic filename encoding. The file is rather large, but it is taken from production use.
I was able to preview both files and extract them separately without any trouble. Does using the command-line zipinfo and unzip tools work for you?
No, it does not. Both unzip and zipinfo show a lot of question marks instead of the filename characters.
They both work fine here as well. Are you using UTF-8 as your system encoding?
Yes, the default character set is utf8
----8<----------------------------------------------
pavel@pavel:~/tmp> unzip ReportPacket_DBV90821CJ.zip
Archive: ReportPacket_DBV90821CJ.zip
inflating: ����ի��� ����� (ꡡ�������).pdf
inflating: ����ի��� ����� (����������).pdf
pavel@pavel:~/tmp> env | grep -i lang
NLS_LANG=Russian.UTF8
LANG=ru_RU.UTF-8
LANGUAGE=
pavel@pavel:~/tmp>
----8<----------------------------------------------
Unrar behaviour is correct.
Can you paste the output of 'unzip -v'?
This one is from SuSE distribution:
----8<---------------------------------------------------------------------
pavel:/tmp/unzip60 # unzip -v
UnZip 5.52 of 28 February 2005, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc 4.3.2 [gcc-4_3-branch revision 141291] for Unix (Linux ELF) on Dec 3 2008.
UnZip special compilation options:
    COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
    SET_DIR_ATTRIB
    TIMESTAMP
    USE_EF_UT_TIME
    USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
    USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
    VMS_TEXT_CONV
    [decryption, version 2.9 of 05 May 2000]
UnZip and ZipInfo environment options:
    UNZIP: [none]
    UNZIPOPT: [none]
    ZIPINFO: [none]
    ZIPINFOOPT: [none]
----8<---------------------------------------------------------------------
And the second one is compiled from source:
----8<---------------------------------------------------------------------
pavel:/tmp/unzip60 # /usr/local/bin/unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc 4.3.2 [gcc-4_3-branch revision 141291] for Unix (Linux ELF) on Aug 25 2009.
UnZip special compilation options:
    COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
    SET_DIR_ATTRIB
    SYMLINKS (symbolic links supported, if RTL and file system permit)
    TIMESTAMP
    UNIXBACKUP
    USE_EF_UT_TIME
    USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
    USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
    UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
    MBCS-support (multibyte character support, MB_CUR_MAX = 6)
    LARGE_FILE_SUPPORT (large files over 2 GiB supported)
    ZIP64_SUPPORT (archives using Zip64 for large files supported)
    USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.5, 10-Dec-2007)
    VMS_TEXT_CONV
    [decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
    UNZIP: [none]
    UNZIPOPT: [none]
    ZIPINFO: [none]
    ZIPINFOOPT: [none]
----8<---------------------------------------------------------------------
Apparently your system-wide unzip doesn't support UTF-8, whereas the version you've compiled yourself does. If you use your compiled version, does it list your files correctly?
The 6.0 version of unzip does not show the filenames correctly and does not name the extracted files correctly. I have then compiled unzip 6.0 with the patch from AltLinux http://sisyphus.ru/srpm/Sisyphus/unzip/patches. Now unzip and zipinfo still show wrong filenames (question marks):
----8<--------------------------------------------------
pavel@pavel:~/tmp> unzip ReportPacket_DBV90821CJ.zip
Archive: ReportPacket_DBV90821CJ.zip
inflating: ?????????????????? ?????????? (????????????????????).pdf
inflating: ?????????????????? ?????????? (????????????????????).pdf
----8<--------------------------------------------------
But the filenames of the extracted files are now correct. Ark still does not see the filenames in the archive: it now sees only one file, and again it can only extract everything at once. The filenames are correct after extraction, though.
Can you try to compile unzip only with the iconv patch? Mine only has this patch, so we could try to get our programs as similar as possible. Ark only calls unzip, zip and zipinfo, so whatever problem there is, it's in those programs.
Where were those zip files created? Was it on a UTF-8 system or on something using another encoding (such as Windows)?
The zip archive was created on Windows. As I understand it, this is the main problem here. I've compiled unzip with the iconv patch, as I described in comment #10. I shall try a few more things to make zipinfo display the filenames in UTF-8.
Do you know the exact encoding used on the Windows system that created this file? I'm thinking of closing this bug as UPSTREAM - I could reproduce it with infozip 5.52 on FreeBSD. It seems to be a bug in infozip, not KDE.
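For anyone trying to answer the encoding question, here is a minimal sketch, not a fix for Ark or infozip. The zip format marks UTF-8 file names with general-purpose flag bit 11 (0x800), and Python's zipfile module decodes unflagged names as CP437. The sketch reverses that decoding and tries another code page. The default of cp866 is only an assumption (the thread never confirms the real encoding), and the helper names fix_name and list_names are made up for this example.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the file name is stored as UTF-8


def fix_name(name, flag_bits, encoding="cp866"):
    """Re-decode a member name that zipfile decoded as CP437.

    `encoding` is a guess at the code page of the machine that created
    the archive; cp866 is an assumption, not something the archive states.
    """
    if flag_bits & UTF8_FLAG:
        return name  # the archive says the name is UTF-8: leave it alone
    try:
        # Recover the raw bytes, then decode them with the guessed code page.
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # the guess does not fit: keep what zipfile produced


def list_names(archive, encoding="cp866"):
    """Return the member names of `archive`, re-decoded where needed."""
    with zipfile.ZipFile(archive) as zf:
        return [fix_name(i.filename, i.flag_bits, encoding) for i in zf.infolist()]
```

Calling list_names("ReportPacket_DBV90821CJ.zip") with a few candidate encodings (for example "cp866" and "cp1251") and looking at which result is readable is one way to find out what the Windows machine used.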
This works for me if I use unzip 5.52 with the patch unzip-5.52-alt-natspec.patch. As I showed in comment #10, this does not work correctly with unzip 6.0. There is a discussion of this topic on the info-zip site: http://www.info-zip.org/board/board.pl?m-1248086794/ Now Ark shows every Cyrillic letter correctly. I do not know whether Ark should correct the wrong encoding coming from an unpatched unzip: it is unzip's misfeature, but Ark ought to work with the unpatched version out of the box. What do you think?
Hi there, sorry for taking so long to answer. I've read the discussion on the board you indicated, and I think it's better to close this report as UPSTREAM, which means it's actually a problem in a program we depend on. In theory, everyone should be using UTF-8 as their encoding, but as always Windows tends to make things difficult for everyone. I don't think it's easily possible to write a workaround in Ark (and I tend to think it's even better *not* to write one), as we only parse the terminal output infozip produces. Until the infozip team fixes this issue (or a better alternative program or library appears), I'd recommend either applying one of those patches you've mentioned or using convmv, which is a program that converts filenames from one encoding to another. Thanks for all the effort you've put into this report, and sorry there's no better alternative ;)
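To illustrate what "converting filenames from one encoding to another" means, here is a rough Python sketch of the idea. It is not convmv itself and does not use its options. It assumes the names on disk are raw bytes in the source code page (cp866 is again only a guess), it treats any name that already decodes as UTF-8 as correct, which is a heuristic, and it only looks at one directory level. The function name convert_names is made up.

```python
import os


def convert_names(directory, source_encoding="cp866"):
    """Rename entries of `directory` whose byte names are not valid UTF-8.

    Returns the new names. `source_encoding` is an assumption about how
    the names were originally encoded.
    """
    renamed = []
    root = os.fsencode(directory)
    for raw in os.listdir(root):  # a bytes path yields bytes names
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (this includes plain ASCII)
        except UnicodeDecodeError:
            pass
        fixed = raw.decode(source_encoding).encode("utf-8")
        os.rename(os.path.join(root, raw), os.path.join(root, fixed))
        renamed.append(fixed.decode("utf-8"))
    return renamed
```

Running it on a copy of the extracted directory first is safer, since a wrong guess for source_encoding produces names that are valid UTF-8 but still unreadable.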
> I'd recommend either applying one of those patches you've mentioned or using convmv

I have to install a working version for ordinary users who are afraid of the console, so this option does not fit. Is there some way to configure how Ark interacts with its console backends? Some way to configure console archivers as plugins to Ark? If such an interface exists, I could configure it to change the encoding of filenames on the fly for viewing, and to rename the files automatically after extraction.
(In reply to comment #17)
> Is there some way to configure how Ark interacts with its console backends?
> Some way to configure console archivers as plugins to Ark?
>
> If such an interface exists, I could configure to change encoding of filenames
> on-the-fly for viewing and renaming it automatically after extraction.

Well, the base for the code is in kerfuffle/cliinterface.cpp, and the zip plugin is in plugins/clizip. However, I've created a small Qt-only program that uses a QProcess and sets LANG, LANGUAGE and LC_ALL to ru_RU.CP1251 (although I'm not sure of the exact encoding you're using on Windows), and it didn't work either. If you succeed in changing Ark, please let me know.

> > I'd recommend either applying one those patches you've mentioned or using convmv
>
> I am to install workable version for common users, that are afraid of console,
> so this variant does not fit.

For the situation you describe, it looks easier to patch unzip and install the modified package on the users' machines. Is it feasible?
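For anyone who wants to repeat the locale experiment without Qt, a rough Python equivalent might look like the sketch below. It is not the Qt program mentioned above. It only runs a command with LANG, LANGUAGE and LC_ALL overridden and returns the raw output bytes. The function name run_with_locale is made up, and since the QProcess version reportedly did not help, this is for reproducing the test rather than a fix.

```python
import os
import subprocess
import sys


def run_with_locale(cmd, locale="ru_RU.CP1251"):
    """Run `cmd` with the locale variables overridden; return stdout as bytes."""
    env = dict(os.environ)
    env.update(LANG=locale, LANGUAGE=locale, LC_ALL=locale)
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=False)
    return result.stdout


if __name__ == "__main__":
    # Usage: python run_with_locale.py archive.zip
    # Writes the raw bytes so the terminal, not Python, interprets them.
    sys.stdout.buffer.write(run_with_locale(["unzip", "-l", sys.argv[1]]))
```

Returning bytes rather than text is deliberate: decoding the output in Python would hide exactly the encoding problem being investigated.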
I have the same problem. It shows up when I try to unzip archives that were made on Windows. You should fix this error or, as a first step, make it possible to rename the zipped files before extracting, as File-Roller does (that is how I work around such problems). Thanks.