When a zip file contains files with Chinese characters in their filenames, the preview shows question marks in place of the filenames, and the extracted filenames are wrong too.

Reproducible: Always

Steps to Reproduce:
1. Open a zip file containing files with Chinese filenames
2. View it in the preview
3. Extract it

Actual Results:
The filenames are neither shown nor extracted correctly.

Expected Results:
The filenames should be shown and extracted correctly.
Created attachment 82348 [details]
screenshot of ark

As you can see, the title of the zip file is correct, but its contents are not.
Could you please upload a sample archive?
Created attachment 82385 [details]
sample zip file whose Chinese filenames ark couldn't display correctly
*** Bug 312478 has been marked as a duplicate of this bug. ***
In plugins/libarchive/libarchivehandler.cpp, in emitEntryFromArchiveEntry(), archive_entry_pathname_w() returns nothing for a Big5 filename, while archive_entry_pathname() returns the original (Big5) filename:

ark(18998) LibArchiveInterface::emitEntryFromArchiveEntry: 2: 0x0
ark(18998) LibArchiveInterface::emitEntryFromArchiveEntry: 2.1: ¶}©ñ¦¡¥¥x¨t²Î³nÅéºûÅ@³Ò°È©e¥~/

where "2:" prints archive_entry_pathname_w(aentry) and "2.1:" prints archive_entry_pathname(aentry). The filename is therefore empty, and ArchiveModel prints the following debug message:

ark(18998) ArchiveModel::newEntry: Weird, received empty entry (no filename) - skipping

This was tested on the latest git version.
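The behaviour described above can be reproduced outside Ark. The following is only a rough Python analogue, not libarchive's code: `to_wide` is a hypothetical helper standing in for a locale-based wide-character conversion, and the sample name is chosen for illustration.

```python
def to_wide(raw: bytes, locale_charset: str = "utf-8"):
    """Rough analogue of a locale-based wide conversion.

    Returns None when the bytes are not valid in the locale's charset,
    which mirrors the empty result seen for the wide pathname.
    """
    try:
        return raw.decode(locale_charset)
    except UnicodeDecodeError:
        return None


# Sample Big5-encoded name (an assumption, not taken from the attachment).
big5_name = "開放式".encode("big5")

wide = to_wide(big5_name)             # None: the bytes are not valid UTF-8
narrow = big5_name.decode("latin-1")  # the raw bytes survive, but read as mojibake
```

The narrow string keeps every byte, which is why it shows up in the log as garbled Latin-1 text rather than as an empty value.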
The problem is in libarchive. When my environment locale is zh_TW.UTF-8 and the zip file contains Big5 filenames, mbstowcs() fails with EILSEQ because the bytes are not valid in the locale's encoding. Would it be possible to add a fallback encoding option to Ark, so that when it fails to get a filename from the archive (the "Weird ..." message above), it could retry with the fallback encoding?
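A minimal sketch of the retry idea proposed here, written in Python for illustration only. `decode_entry_name` and its `fallback` parameter are hypothetical names, not an Ark or libarchive API.

```python
def decode_entry_name(raw: bytes, fallback: str = "big5"):
    """Decode an archive entry name, retrying with a fallback encoding.

    UTF-8 is tried first (the locale charset in the report); the
    user-configured fallback is used only when that fails.
    Returns None if neither encoding can decode the bytes.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(fallback)
    except UnicodeDecodeError:
        return None
```

Trying UTF-8 first means archives with valid UTF-8 names are unaffected by the fallback setting; the fallback only applies to names that would otherwise be skipped.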
Hi Franklin. Could you upload a test archive for the libarchive plugin (e.g. a .tar.gz)? Could you also check whether Ark 15.12 with a Chinese locale can extract zip files (e.g. the one you already attached here)?
I can confirm the issue; I see the same problem with Japanese filenames. Do you need screenshots or anything else to fix it?
(In reply to 佐藤 from comment #8)
> I can confirm the issue; I see the same problem with Japanese filenames. Do
> you need screenshots or anything else to fix it?

Yes, screenshots and test archives please. It would be awesome if you could attach both .zip and .tar.gz test archives.
Created attachment 103555 [details]
japanese zip file - ark can not read the character

I added a ZIP file that contains Japanese filenames. Ark cannot display the characters correctly. If you extract the ZIP file, you also get wrong filenames.
Created attachment 103556 [details]
I also added a screenshot
(In reply to 佐藤 from comment #10)
> Created attachment 103555 [details]
> japanese zip file - ark can not read the character
>
> I added a ZIP file that contains Japanese filenames. Ark cannot display the
> characters correctly. If you extract the ZIP file, you also get wrong
> filenames.

Thanks! Can you also add a tar.gz file?
I believe that the problem is related to the filename encoding. In Franklin Weng's case, the zip (attachment 82385 [details]) can be extracted with `unar -e Big5 test.zip`, and in R. Sato's case (attachment 103555 [details]), `unar -e Shift_JIS nenngajyou-data.zip`.
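The same recovery can be sketched with Python's standard `zipfile` module, which decodes entry names as cp437 when the zip UTF-8 flag (general-purpose bit 11) is not set. Re-encoding to cp437 therefore gives back the raw bytes, which can then be decoded with the encoding the user names. `recover_name` and `list_names` are hypothetical helpers for illustration, not part of Ark or unar.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recover_name(info: zipfile.ZipInfo, encoding: str) -> str:
    """Return the entry name, re-decoded with a user-supplied legacy encoding."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded correctly as UTF-8
    try:
        # zipfile decoded the raw bytes as cp437; undo that, then redecode.
        return info.filename.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return info.filename  # leave the name untouched if the guess is wrong


def list_names(archive, encoding: str):
    """List entry names of a zip file (path or file object) using `encoding`."""
    with zipfile.ZipFile(archive) as zf:
        return [recover_name(info, encoding) for info in zf.infolist()]
```

For example, `list_names("test.zip", "big5")` would be the rough counterpart of the `unar -e Big5 test.zip` listing, assuming the archive really uses Big5.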
(In reply to qdzcuypq from comment #13)
> I believe that the problem is related to the filename encoding.
>
> In Franklin Weng's case, the zip (attachment 82385 [details]) can be
> extracted with `unar -e Big5 test.zip`, and in R. Sato's case (attachment
> 103555 [details]), `unar -e Shift_JIS nenngajyou-data.zip`.

It has been, from the very beginning. Windows still seems to use legacy encodings in some cases, and files generated by WinZip are usually problematic. I used to run 7-Zip under Wine, which could extract files with Chinese names successfully, but in recent years there are more and more files that 7-Zip fails to extract.
I recently wrote patches for p7zip and unzip that detect the OEM charset based on the system locale. That is exactly what the built-in Windows zip encoder does.

https://sourceforge.net/p/infozip/patches/29/
https://sourceforge.net/p/p7zip/bugs/187/

To get correct filenames, you just need to install the patched p7zip and set your system locale correctly. Alternatively, use something like

alias 7z='LC_ALL=el_GR.UTF-8 7z'

if you prefer to open archives using a locale different from the system one.

Alkis Georgopoulos is planning to package the patched p7zip as .deb packages and upload them to a PPA: https://github.com/mate-desktop/engrampa/issues/5#issuecomment-648410042
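The locale-based detection can be sketched roughly as follows. This is an illustration in Python, not the code of the patches linked above; the table is a small hypothetical excerpt pairing locales with the Windows OEM code pages commonly associated with them, and `oem_codec_for_locale` is an invented name.

```python
# Hypothetical excerpt: locale -> Python codec name for the matching
# Windows OEM code page. A real table would cover many more locales.
OEM_CODEPAGES = {
    "zh_TW": "cp950",  # Traditional Chinese (Big5)
    "zh_CN": "cp936",  # Simplified Chinese (GBK)
    "ja": "cp932",     # Japanese (Shift_JIS)
    "ko": "cp949",     # Korean
    "el": "cp737",     # Greek
    "ru": "cp866",     # Russian
}


def oem_codec_for_locale(locale_name: str, default: str = "cp437") -> str:
    """Map a locale such as 'zh_TW.UTF-8' to a legacy OEM codec name."""
    base = locale_name.split(".")[0].split("@")[0]  # "zh_TW.UTF-8" -> "zh_TW"
    language = base.split("_")[0]                   # "zh_TW" -> "zh"
    return OEM_CODEPAGES.get(base) or OEM_CODEPAGES.get(language, default)
```

With this table, a locale of `el_GR.UTF-8` (as in the alias above) selects `cp737`, and an unknown locale falls back to `cp437`.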
*** Bug 439392 has been marked as a duplicate of this bug. ***
Update on this issue: I played a bit with encoding probing using both KEncodingProber and ICU. The biggest issue with this approach is that filenames are usually very short, so the prober does not have enough data to properly guess the correct encoding. One possible solution could be the following: we add KEncodingProber support in the libzip plugin (Ark's default plugin for zip files). If KEncodingProber detects one or more non-unicode encodings, Ark would show a notification to the user asking if they want to attempt to fix garbled filenames, if any. If the user confirms, the libzip plugin would then reload the archive and convert the filenames from the detected encoding to the standard UTF-16 encoding used by Qt. This "opt-in" step is required because if we do it automatically we could break the normal workflow for valid zip archives that only contain UTF-8 filenames (since again, the probing is not precise and could detect a wrong encoding for a valid UTF-8 filename).
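The opt-in flow can be sketched as follows, in Python for illustration. Note that this swaps in strict trial decoding against a fixed candidate list; it is not KEncodingProber or ICU, which use statistical detection. `CANDIDATES`, `plausible_encodings` and `should_offer_fix` are hypothetical names.

```python
CANDIDATES = ("big5", "shift_jis", "gbk", "euc_kr")  # illustrative list only


def plausible_encodings(raw: bytes, candidates=CANDIDATES):
    """Return the legacy encodings that decode `raw` without error.

    Valid UTF-8 names return an empty list, so they never trigger
    the notification.
    """
    try:
        raw.decode("utf-8")
        return []
    except UnicodeDecodeError:
        pass
    plausible = []
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        plausible.append(enc)
    return plausible


def should_offer_fix(raw_names) -> bool:
    """True if any entry name is not UTF-8 but fits a legacy encoding."""
    return any(plausible_encodings(name) for name in raw_names)
```

A short name may decode cleanly under more than one candidate, which is the ambiguity described above; that is why the sketch only decides whether to ask the user and leaves the final choice of encoding to them.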
Actually, there is bug #378904, which tracks the same issue and has more information. Let's keep the discussion in a single place.

*** This bug has been marked as a duplicate of bug 378904 ***