Bug 204984

Summary: Zip archives do not show Cyrillic file names
Product: [Applications] ark Reporter: Pavel Baranchikov <maednoldor>
Component: general Assignee: Harald Hvaal <metellius>
Status: RESOLVED UPSTREAM    
Severity: normal CC: mypavel, rakuco
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: openSUSE   
OS: Unspecified   
Latest Commit: Version Fixed In:
Attachments: Zip archive with wrong Cyrillic filename encoding

Description Pavel Baranchikov 2009-08-24 16:44:49 UTC
Version:           2.13 (using KDE 4.3.0)
Installed from:    SuSE RPMs

If a zip archive contains files with Cyrillic names, those names are shown as ?????????.pdf and so on. Ark can extract files onto the filesystem, but only the whole archive at once.

Ark cannot extract individual files, only all of them. For the same reason it cannot open files in a viewer: it does not understand the file names.

I do not know whether zip archives store information about the filename encoding (I was surprised to see the correct filenames in .rar archives!). If they do not, it would be nice to be able to set a default charset to decode from. For example, I only work with archives created on Windows.
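For context: the ZIP specification (PKWARE APPNOTE) defines general-purpose flag bit 11 (0x800), which marks an entry's name as UTF-8. Older Windows archivers typically leave it unset and store names in the system's OEM code page, which is CP866 on Russian Windows. The sketch below, assuming Python's standard zipfile module and CP866 as the user's chosen default charset, shows what "decode with a default charset unless the archive says UTF-8" could look like. recover_name and list_names are hypothetical helpers, not Ark code.

```python
import zipfile

# ZIP general-purpose bit 11: "entry name is UTF-8" (PKWARE APPNOTE).
UTF8_FLAG = 0x800


def recover_name(info: zipfile.ZipInfo, fallback: str = "cp866") -> str:
    """Return the entry name, decoding unflagged names with `fallback`.

    Assumption: unflagged names were written in `fallback` (CP866 here,
    the OEM code page of Russian Windows).
    """
    if info.flag_bits & UTF8_FLAG:
        # The archiver declared UTF-8, so zipfile already decoded it correctly.
        return info.filename
    # Without the flag, zipfile decodes names as CP437; encoding back to
    # CP437 recovers the original raw bytes, which we re-decode.
    raw = info.filename.encode("cp437")
    return raw.decode(fallback)


def list_names(path: str, fallback: str = "cp866") -> list:
    """List all entry names in the archive at `path` with recovered names."""
    with zipfile.ZipFile(path) as zf:
        return [recover_name(info, fallback) for info in zf.infolist()]
```

Python's CP437 codec maps all 256 byte values, so the encode-back step is lossless; the only real guess is which code page to decode with.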
Comment 1 Raphael Kubo da Costa 2009-08-24 17:32:51 UTC
Can you please attach a zip file that presents this problem?
Comment 2 Pavel Baranchikov 2009-08-25 05:22:16 UTC
Created attachment 36428
Zip archive with wrong Cyrillic filename encoding

The file is rather large, but it is taken from real production use.
Comment 3 Raphael Kubo da Costa 2009-08-25 05:48:41 UTC
I was able to preview both files and extract them separately without any trouble.

Does using the command-line zipinfo and unzip tools work for you?
Comment 4 Pavel Baranchikov 2009-08-25 10:08:36 UTC
No, it does not. Both unzip and zipinfo show lots of question marks instead of the filename characters.
Comment 5 Raphael Kubo da Costa 2009-08-25 14:30:55 UTC
They both work fine here as well. Are you using UTF-8 as your system encoding?
Comment 6 Pavel Baranchikov 2009-08-25 19:46:35 UTC
Yes, the default character set is UTF-8.

----8<----------------------------------------------
pavel@pavel:~/tmp> unzip ReportPacket_DBV90821CJ.zip
Archive:  ReportPacket_DBV90821CJ.zip
  inflating: ����ի��� ����� (ꡡ�������).pdf
  inflating: ����ի��� ����� (����������).pdf
pavel@pavel:~/tmp> env | grep -i lang
NLS_LANG=Russian.UTF8
LANG=ru_RU.UTF-8
LANGUAGE=
pavel@pavel:~/tmp>
----8<----------------------------------------------

Unrar behaviour is correct.
Comment 7 Raphael Kubo da Costa 2009-08-25 20:21:12 UTC
Can you paste the output of 'unzip -v'?
Comment 8 Pavel Baranchikov 2009-08-25 20:45:56 UTC
This one is from the SuSE distribution:

----8<---------------------------------------------------------------------
pavel:/tmp/unzip60 # unzip -v
UnZip 5.52 of 28 February 2005, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 4.3.2 [gcc-4_3-branch revision 141291] for Unix (Linux ELF) on Dec  3 2008.

UnZip special compilation options:
        COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
        SET_DIR_ATTRIB
        TIMESTAMP
        USE_EF_UT_TIME
        USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
        USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
        VMS_TEXT_CONV
        [decryption, version 2.9 of 05 May 2000]

UnZip and ZipInfo environment options:
           UNZIP:  [none]
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]
----8<---------------------------------------------------------------------

And the second one is compiled from source:

----8<---------------------------------------------------------------------
pavel:/tmp/unzip60 # /usr/local/bin/unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 4.3.2 [gcc-4_3-branch revision 141291] for Unix (Linux ELF) on Aug 25 2009.

UnZip special compilation options:
        COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
        SET_DIR_ATTRIB
        SYMLINKS (symbolic links supported, if RTL and file system permit)
        TIMESTAMP
        UNIXBACKUP
        USE_EF_UT_TIME
        USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
        USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
        UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
        MBCS-support (multibyte character support, MB_CUR_MAX = 6)
        LARGE_FILE_SUPPORT (large files over 2 GiB supported)
        ZIP64_SUPPORT (archives using Zip64 for large files supported)
        USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.5, 10-Dec-2007)
        VMS_TEXT_CONV
        [decryption, version 2.11 of 05 Jan 2007]

UnZip and ZipInfo environment options:
           UNZIP:  [none]
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]
----8<---------------------------------------------------------------------
Comment 9 Raphael Kubo da Costa 2009-08-25 20:54:55 UTC
Apparently your system-wide unzip doesn't support UTF-8, whereas the version you've compiled yourself does. If you use your compiled version, does it list your files correctly?
Comment 10 Pavel Baranchikov 2009-08-25 21:55:26 UTC
The 6.0 version of unzip does not show the files correctly either, and it does not name the extracted files correctly.

I have compiled unzip 6.0 with the patch from AltLinux: http://sisyphus.ru/srpm/Sisyphus/unzip/patches.

Now unzip and zipinfo show wrong filenames (question marks):

----8<--------------------------------------------------
pavel@pavel:~/tmp> unzip ReportPacket_DBV90821CJ.zip
Archive:  ReportPacket_DBV90821CJ.zip
  inflating: ?????????????????? ?????????? (????????????????????).pdf
  inflating: ?????????????????? ?????????? (????????????????????).pdf
----8<--------------------------------------------------

But the filenames of extracted files are correct now.

Ark still does not see the filenames in the archive. Now it sees only one file in the archive, and again it can only extract all files at once. But now the filenames are correct after extraction.
Comment 11 Raphael Kubo da Costa 2009-08-25 22:08:08 UTC
Can you try to compile unzip only with the iconv patch? Mine only has this patch, so we could try to get our programs as similar as possible.

Ark only calls unzip, zip and zipinfo, so whatever problem there is, it's in those programs.
Comment 12 Raphael Kubo da Costa 2009-09-14 03:23:27 UTC
Where were those zip files created? Was it on a UTF-8 system or on something using another encoding (such as Windows)?
Comment 13 Pavel Baranchikov 2009-09-14 06:27:23 UTC
The zip archive was created on Windows. As I understand, this is the main problem here.

I've compiled unzip with the iconv patch, as I described in comment #10. I will try a few more things to force zipinfo to show filenames in UTF-8.
Comment 14 Raphael Kubo da Costa 2009-09-14 06:33:07 UTC
Do you know the exact encoding used on the Windows system that created this file? I'm thinking of closing this bug as UPSTREAM - I could reproduce it with infozip 5.52 on FreeBSD. It seems to be a bug in infozip, not KDE.
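Since the exact code page is unknown, one rough heuristic, assuming the names are Russian and were written either in the DOS/OEM code page (CP866) or in the Windows ANSI code page (CP1251), is to decode the raw name bytes both ways and keep whichever yields more Russian letters. guess_encoding is a hypothetical helper and only a guess; it can pick wrongly for very short or mixed-language names.

```python
# Russian alphabet, lower and upper case; other Cyrillic letters
# (e.g. Serbian or Ukrainian ones) deliberately do not count.
RUSSIAN = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
              "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ")


def russian_score(text: str) -> int:
    """Count characters of `text` that are Russian letters."""
    return sum(ch in RUSSIAN for ch in text)


def guess_encoding(raw: bytes, candidates=("cp866", "cp1251")) -> str:
    """Pick the candidate code page whose decoding of `raw` looks most Russian."""
    best, best_score = candidates[0], -1
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # some bytes are unassigned in CP1251, e.g. 0x98
        score = russian_score(text)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = enc, score
    return best
```

For a pure-ASCII name every candidate scores zero and the first one (CP866) is returned, which is harmless because both code pages agree on ASCII.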
Comment 15 Pavel Baranchikov 2009-09-14 16:25:07 UTC
This works for me if I use unzip 5.52 with the patch unzip-5.52-alt-natspec.patch. As I showed in comment #10, this does not work correctly with unzip 6.0.

There is a discussion on this topic on the Info-ZIP site: http://www.info-zip.org/board/board.pl?m-1248086794/

Now Ark shows every Cyrillic letter correctly.

I do not know whether Ark should correct the wrong encoding produced by an unpatched unzip. This is a misfeature of unzip, but Ark should also work with the unpatched version, so that it works out of the box. I am not sure. What do you think?
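A workaround on Ark's side would mean reading the backend's output as raw bytes instead of decoding it with the locale. The sketch below assumes `zipinfo -1` (one name per line) and that unzip passes the name bytes through unchanged, as the unzip 5.52 output in comment #6 appears to show. If the characters have already been replaced with literal '?', as in the unzip 6.0 output in comment #10, the information is gone and no decoding can bring it back. decode_listing and list_archive are hypothetical names, and CP866 as the fallback is an assumption.

```python
import subprocess


def decode_listing(raw: bytes, fallback: str = "cp866") -> list:
    """Split one-name-per-line output and decode each line separately.

    Lines that are valid UTF-8 are kept as UTF-8; anything else is assumed
    to be in the legacy `fallback` code page used by the archiver.
    """
    names = []
    for line in raw.splitlines():
        try:
            names.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            names.append(line.decode(fallback))
    return names


def list_archive(path: str, fallback: str = "cp866") -> list:
    """Run `zipinfo -1` and decode its stdout ourselves, bypassing the locale."""
    result = subprocess.run(["zipinfo", "-1", path],
                            capture_output=True, check=True)
    return decode_listing(result.stdout, fallback)
```

Decoding per line rather than per stream lets archives that mix UTF-8-flagged and legacy entries list correctly.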
Comment 16 Raphael Kubo da Costa 2009-09-19 21:53:23 UTC
Hi there, sorry for taking so long to answer.

I've read the discussion on the board you indicated, and I think it's better to close this report as UPSTREAM, which means it's actually a problem on a program we depend on.

In theory, everyone should be using UTF8 as their encoding, but as always Windows tends to make things difficult for everyone.

I don't think it's easily possible to write a workaround in Ark (and I tend to think it's even better *not* to write one), as we only parse the terminal output infozip produces.

Until the infozip team fixes this issue (or a better alternative program or library appears), I'd recommend either applying one of those patches you've mentioned or using convmv, a program that converts filenames from one encoding to another.
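convmv is the tool to use here; as an illustration of what it does, a minimal Python sketch of the same idea could walk the extracted tree and re-encode each name from CP866 to UTF-8, assuming the extracted names contain raw CP866 bytes. Names that are already valid UTF-8 are skipped, as convmv does, and like convmv (which only prints its plan unless given --notest) the sketch defaults to a dry run. convert_name and convert_tree are hypothetical helpers.

```python
import os


def convert_name(raw: bytes, src: str = "cp866", dst: str = "utf-8") -> bytes:
    """Re-encode one filename from `src` to `dst`.

    Heuristic: if the name already decodes as `dst`, assume it is correct
    and leave it alone.
    """
    try:
        raw.decode(dst)
        return raw
    except UnicodeDecodeError:
        return raw.decode(src).encode(dst)


def convert_tree(root: str, src: str = "cp866", dst: str = "utf-8",
                 dry_run: bool = True) -> list:
    """Collect (old, new) byte paths under `root`; rename them unless dry_run."""
    changes = []
    # Walk with bytes paths so undecodable names reach us untouched, and go
    # bottom-up so a directory is renamed only after its contents.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            new = convert_name(name, src, dst)
            if new != name:
                old_path = os.path.join(dirpath, name)
                new_path = os.path.join(dirpath, new)
                changes.append((old_path, new_path))
                if not dry_run:
                    os.rename(old_path, new_path)
    return changes
```

The "already UTF-8" check can misfire on rare CP866 byte sequences that happen to form valid UTF-8, so reviewing the dry-run list before renaming is worthwhile.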

Thanks for all the effort you've put into this report, and sorry there's no better alternative ;)
Comment 17 Pavel Baranchikov 2009-09-20 19:38:48 UTC
>I'd recommend either applying one those patches you've mentioned or using convmv

I need to install a working version for ordinary users who are afraid of the console, so this option does not fit.

Is there some way to configure how Ark interacts with its console backends? Some way to configure console archivers as plugins to Ark?

If such an interface exists, I could configure it to convert the filename encoding on the fly for viewing, and to rename the files automatically after extraction.
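Outside Ark, the "fix the names during extraction" idea can be sketched with Python's zipfile by writing each entry under a re-decoded name instead of renaming afterwards. This assumes that entries without the ZIP UTF-8 flag (bit 11) were written in CP866; fixed_name, safe_target and extract_fixed are hypothetical helpers. safe_target refuses entries whose names would land outside the destination directory.

```python
import os
import shutil
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: entry name is UTF-8


def fixed_name(info: zipfile.ZipInfo, fallback: str = "cp866") -> str:
    """Entry name, re-decoded with `fallback` when not flagged as UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    # zipfile decoded the raw bytes as CP437; undo that and use `fallback`.
    return info.filename.encode("cp437").decode(fallback)


def safe_target(dest: str, name: str) -> str:
    """Join `name` onto `dest`, refusing absolute paths and '..' escapes."""
    root = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(dest, name))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"refusing to extract outside {dest!r}: {name!r}")
    return target


def extract_fixed(path: str, dest: str, fallback: str = "cp866") -> list:
    """Extract every entry of `path` into `dest` under its corrected name."""
    written = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            target = safe_target(dest, fixed_name(info, fallback))
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

Decoding at extraction time avoids a second pass over the filesystem and never creates files with broken names in the first place.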
Comment 18 Raphael Kubo da Costa 2009-09-20 21:23:43 UTC
(In reply to comment #17)
> Is there some way to configure how Ark interacts with its console backends?
> Some way to configure console archivers as plugins to Ark?
> 
> If such an interface exists, I could configure to change encoding of filenames
> on-the-fly for viewring and renaming it automatically after extraction.
Well, the base for the code is in kerfuffle/cliinterface.cpp, and the zip plugin is in plugins/clizip.

However, I've created a small Qt-only program that uses a QProcess and sets LANG, LANGUAGE and LC_ALL to ru_RU.CP1251 (although I'm not sure of the exact encoding you're using on Windows), and it didn't work either. If you succeed in changing Ark, please let me know.

> >I'd recommend either applying one those patches you've mentioned or using convmv
> 
> I am to install workable version for common users, that are afraid of console,
> so this variant does not fit.
For the situation you describe, it looks easier to patch unzip and install the modified package on the users' machines. Is it feasible?
Comment 19 Pavel 2009-12-15 17:16:45 UTC
I have the same problem. It shows up when I try to unzip archives that were made on Windows. Please fix this bug, or, at least as a first step, make it possible to rename the files in the archive before extracting them, as File-Roller does (that is how I work around such problems). Thanks.