Bug 240727 - Ark shouldn't extract archives to invalid encoding filenames
Summary: Ark shouldn't extract archives to invalid encoding filenames
Status: RESOLVED FIXED
Alias: None
Product: ark
Classification: Applications
Component: general
Version: 2.15
Platform: Debian testing Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Ragnar Thomsen
URL:
Keywords:
Duplicates: 220513 251206 266158 276210 304426 308596 319712 322100 323098 329573 345519 349577 355814
Depends on:
Blocks:
 
Reported: 2010-06-04 15:13 UTC by Con Kolivas
Modified: 2015-11-25 16:14 UTC
22 users

See Also:
Latest Commit:
Version Fixed In: 15.12.0
Sentry Crash Report:


Attachments
Prioritizes p7zip over unzip & unrar (2.12 KB, patch)
2010-07-10 13:56 UTC, Christian Muehlhaeuser

Description Con Kolivas 2010-06-04 15:13:17 UTC
Version:           unspecified (using KDE 4.4.3) 
OS:                Linux

The much-maligned bug 165044 (https://bugs.kde.org/show_bug.cgi?id=165044), which affects filenames that are not UTF-8 encoded, means that any file with an invalid encoding cannot be accessed, modified, renamed, or otherwise handled under KDE 4 because of a Qt limitation. When extracting an archive whose filenames are not UTF-8 encoded (such as a zip file created with a Japanese encoding), Ark extracts files with invalid names, producing a directory and/or files that no standard KDE application can then access. Presumably this is because it uses (un)zip as the extraction backend. I suggest using an extraction tool that automatically converts filenames to valid UTF-8, even if the resulting filename is different. 7z does this.

Reproducible: Always

Steps to Reproduce:
Extract a zip archive whose filenames are not UTF-8 encoded.

Actual Results:  
Creates files and directories that are visible but inaccessible in Dolphin or Konqueror.

Expected Results:  
Files and directories should be accessible, even if the filenames are now different.
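A minimal Python sketch of the conversion the reporter asks for, assuming the archive stores filenames as raw bytes in a legacy code page (CP932/Shift-JIS here, with a made-up sample name). It shows why such bytes are rejected as UTF-8 and how re-decoding them with the right codec yields a valid UTF-8 name. The helper names are hypothetical; this is not how 7z or Ark actually implement it.

```python
# Hypothetical helpers; the sample name and the CP932 assumption are illustrative.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw filename bytes decode as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8_name(raw: bytes, legacy_encoding: str) -> str:
    """Return a valid Unicode name for raw filename bytes.

    Valid UTF-8 is kept as-is; anything else is re-decoded with the
    legacy codec, and undecodable bytes become U+FFFD, so the result can
    always be written back out as UTF-8.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode(legacy_encoding, errors="replace")


raw_name = "資料.txt".encode("cp932")   # bytes as a Japanese Windows tool stores them
print(is_valid_utf8(raw_name))          # False: unusable as-is on a UTF-8 desktop
fixed = to_utf8_name(raw_name, "cp932")
print(fixed)                            # 資料.txt
```

The sketch trusts UTF-8 first, so the fallback only fires for names that would otherwise be broken. The hard part in practice is knowing which legacy codec to use; here the caller must supply it.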
Comment 1 Raphael Kubo da Costa 2010-06-12 20:41:48 UTC
Hmm. We already have a 7zip plugin, which is currently used only for 7zip files. There's a wish to let the user choose which backend to use for each archive format, but for now we could try giving the 7zip plugin a higher priority than the infozip one for zip files so that it is used by default instead.

What do you guys think?
Comment 2 Con Kolivas 2010-06-13 05:11:04 UTC
If 7z supports all the standard archive formats without the encoding problems, why bother using multiple backends at all by default?
Comment 3 Raphael Kubo da Costa 2010-06-20 08:35:29 UTC
One possible downside I see here, though, is that 7zip apparently cannot handle multi-volume archives.
Comment 4 Christian Muehlhaeuser 2010-07-10 13:56:59 UTC
Created attachment 48742 [details]
Prioritizes p7zip over unzip & unrar
Comment 5 Christian Muehlhaeuser 2010-07-10 13:59:02 UTC
What this patch does:

- Fixes cli7z plugin to support rar files.
- Gives cli7z a higher priority than zip / rar plugins.

Why this is good:
- p7zip doesn't create invalid filenames which we can't handle in KDE.
- p7zip is way faster than unzip / unrar.

What still needs to be done:
- Detect if p7zip and the p7zip-rar plugin are installed and default to unrar / unzip otherwise.
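The plan in Comment 5 (prefer p7zip, but fall back to unzip/unrar when p7zip is missing) amounts to choosing the highest-priority plugin whose tool is installed. A Python sketch under stated assumptions: the plugin names, executables, and priority numbers below are hypothetical rather than Ark's real values, and checking only for the `7z` executable ignores whether the separate p7zip-rar module is present.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Plugin:
    name: str         # hypothetical plugin identifier
    executable: str   # command-line tool the plugin drives
    priority: int     # higher wins; values are illustrative
    mimetypes: tuple  # archive types the plugin claims


PLUGINS = (
    Plugin("clizip", "unzip", 100, ("application/zip",)),
    Plugin("clirar", "unrar", 100, ("application/x-rar",)),
    Plugin("cli7z", "7z", 200, ("application/zip", "application/x-rar",
                                "application/x-7z-compressed")),
)


def pick_plugin(mimetype: str,
                plugins=PLUGINS,
                which: Callable[[str], Optional[str]] = shutil.which) -> Optional[Plugin]:
    """Return the highest-priority plugin that handles `mimetype` and is installed."""
    usable = [p for p in plugins
              if mimetype in p.mimetypes and which(p.executable)]
    return max(usable, key=lambda p: p.priority, default=None)


# Usage with a fake lookup so the result does not depend on this machine.
only_unzip = lambda exe: "/usr/bin/unzip" if exe == "unzip" else None
print(pick_plugin("application/zip", which=only_unzip).name)   # clizip
```

Injecting `which` keeps the selection logic testable; with the real `shutil.which`, the choice follows whatever tools are on `PATH`.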
Comment 6 Florian Reinhard 2010-08-18 10:15:55 UTC
This still happens in 4.5 with .zip files.
Comment 7 Raphael Kubo da Costa 2010-09-19 02:51:07 UTC
*** Bug 251206 has been marked as a duplicate of this bug. ***
Comment 8 Jakub 2010-10-16 20:06:56 UTC
The same problem occurs with Polish characters: ą, ę, ó, ł, ś, ź, ż, ć, ń. Ark either doesn't extract a file, or extracts it as an inaccessible file that cannot be removed. Version 2.15.
Comment 9 Raphael Kubo da Costa 2010-12-08 02:19:31 UTC
Changing the default assignee in the currently open Ark bug reports to me.
Comment 10 Theofilos Intzoglou 2011-09-17 15:00:25 UTC
The problem exists in zip files created with old versions of WinZip, PKZIP, or Info-ZIP for Windows. There are many patches for the latest version of Info-ZIP's unzip utility that add two new switches (-I and -O) for specifying the character set to use. It seems that one of those patches has been included in the latest beta version (v6.1beta). It would be nice if Ark could use those flags to specify the correct character set for extraction and listing.
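Roughly what the -O switch does can be approximated with Python's standard `zipfile` module. Member names whose UTF-8 flag (general-purpose bit 11 in the zip format) is clear are decoded by `zipfile` as CP437, so re-encoding them as CP437 recovers the raw bytes, which can then be decoded with a user-chosen charset. This is a sketch, not the unzip patch or Ark code: `list_names` is a hypothetical helper, and the demo fakes a CP932 archive by patching bytes. On Python 3.11+, `zipfile.ZipFile(..., metadata_encoding=...)` performs this decoding directly.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: filename is stored as UTF-8


def list_names(data: bytes, charset: str) -> list:
    """List member names, decoding non-UTF-8 names with `charset`.

    A rough analogue of unzip's patched -O option; hypothetical helper.
    """
    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_NAME_FLAG:
                names.append(info.filename)  # already decoded as UTF-8
            else:
                # zipfile decoded the raw bytes as CP437; undo that
                # losslessly, then decode with the requested charset.
                raw = info.filename.encode("cp437")
                names.append(raw.decode(charset, errors="replace"))
    return names


# Simulate an archive from a legacy Japanese Windows tool: write an ASCII
# placeholder name, then swap in CP932 bytes of the same length.
legacy = "資料.txt".encode("cp932")
placeholder = b"A" * len(legacy)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(placeholder.decode("ascii"), "data")
data = buf.getvalue().replace(placeholder, legacy)

names = list_names(data, "cp932")
print(names)   # ['資料.txt']
```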
Comment 11 Raphael Kubo da Costa 2011-09-17 15:44:19 UTC
*** Bug 266158 has been marked as a duplicate of this bug. ***
Comment 13 Raphael Kubo da Costa 2011-09-17 15:46:19 UTC
*** Bug 276210 has been marked as a duplicate of this bug. ***
Comment 14 Raphael Kubo da Costa 2012-08-02 13:33:03 UTC
*** Bug 304426 has been marked as a duplicate of this bug. ***
Comment 15 Raphael Kubo da Costa 2012-10-18 12:26:43 UTC
*** Bug 308596 has been marked as a duplicate of this bug. ***
Comment 16 Frédéric Bron 2013-05-06 04:10:10 UTC
I do not think this issue is due to zip/unzip: the same problem occurs with valid UTF-8 filenames.
Steps to reproduce:
touch abcdé.txt
zip foo.zip abcdé.txt
LC_ALL=fr_FR.UTF-8 ark foo.zip
-> the file appears as abcd?.txt, and double-clicking it to open it fails with the error message:
"Impossible de charger le fichier /tmp/kde-fred/arkDVVZUI//abcd?.txt car il n'a pas été possible de lire depuis celui-ci. Vérifiez si vous avez les droits d'accès à ce fichier." which translates to "Unable to load the file /tmp/kde-fred/arkDVVZUI//abcd?.txt because it could not be read. Check whether you have permission to access this file."

The funny thing is that with the C locale, it works better:
LC_ALL=C ark foo.zip
-> the file appears as abcd#U00e9.txt and is extracted under that name. Double-clicking it opens it without problems.
The Unicode code point of é is U+00E9, but Ark should not convert the name to abcd#U00e9.txt. Whatever the locale, the name should contain "e acute" (é).
Hope this helps to solve this bug.
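The arithmetic behind Comment 16 can be checked in Python: é is U+00E9, stored in UTF-8 as the two bytes C3 A9, and the C-locale name replaces it with the code point spelled out. The `escape_non_ascii`/`unescape` functions are hypothetical illustrations of the `#Uxxxx` form seen in the report; the report does not say which tool produces it, and the sketch only models the 4-hex-digit form shown there.

```python
import re


def escape_non_ascii(name: str) -> str:
    """Hypothetical model of the '#Uxxxx' names seen under LC_ALL=C."""
    # Only the 4-hex-digit form from the report is modelled here.
    return "".join(c if ord(c) < 0x80 else f"#U{ord(c):04x}" for c in name)


def unescape(name: str) -> str:
    """Turn '#Uxxxx' sequences back into the characters they encode."""
    return re.sub(r"#U([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), name)


print(hex(ord("é")))                  # 0xe9
print("é".encode("utf-8"))            # b'\xc3\xa9'
print(escape_non_ascii("abcdé.txt"))  # abcd#U00e9.txt
print(unescape("abcd#U00e9.txt"))     # abcdé.txt
```

The escaped name is recoverable, which is the reporter's point: nothing is lost by keeping the real "é" instead of the escape.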
Comment 17 Leonardo 2013-05-16 08:44:01 UTC
Ark also fails to extract files that cannot be written to disk because of filesystem limitations (and probably also permissions; I didn't check). This wouldn't be much of a problem if Ark didn't fail silently. I've lost data because I deleted the original zip files after extracting them: several files were lost because of unusual characters in their names. It wasn't a big problem, since I had the data resent to me, but that might not always be possible, and it makes Ark an unreliable program, especially in mixed environments. You should never be in doubt about whether an operation succeeded.

Being able to rename such files from within Ark would also be very nice.
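Comment 17's core complaint is that failures are silent. A Python sketch of the alternative, using the standard `zipfile` module rather than Ark's backends: extract member by member, catch `OSError` for each one, and return the names that could not be written so the caller can warn the user before the archive is deleted. `extract_all_reporting` is a hypothetical helper, and the demo forces a failure by pointing the destination at a regular file.

```python
import os
import tempfile
import zipfile


def extract_all_reporting(archive_path: str, dest: str) -> list:
    """Extract every member; return (name, error) for each one that failed."""
    failures = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            try:
                zf.extract(info, dest)
            except OSError as exc:  # e.g. a name the filesystem rejects
                failures.append((info.filename, str(exc)))
    return failures


# Usage: build a tiny archive, then extract it twice.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("notes.txt", "hello")

    ok = extract_all_reporting(archive, os.path.join(tmp, "out"))

    # Force a failure: the destination is a regular file, not a directory.
    blocker = os.path.join(tmp, "not-a-dir")
    open(blocker, "w").close()
    failed = extract_all_reporting(archive, blocker)

print(ok)                            # []
print([name for name, _ in failed])  # ['notes.txt']
```

Continuing past a failed member keeps the rest of the archive usable, while the returned list makes the partial failure impossible to miss.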
Comment 18 Raphael Kubo da Costa 2014-04-05 20:02:21 UTC
*** Bug 323098 has been marked as a duplicate of this bug. ***
Comment 19 Ragnar Thomsen 2015-09-24 13:55:29 UTC
*** Bug 345519 has been marked as a duplicate of this bug. ***
Comment 20 Ragnar Thomsen 2015-09-24 13:56:51 UTC
*** Bug 349577 has been marked as a duplicate of this bug. ***
Comment 21 Ragnar Thomsen 2015-09-24 14:07:10 UTC
*** Bug 329573 has been marked as a duplicate of this bug. ***
Comment 22 Daniel Duris 2015-09-24 14:17:24 UTC
Since this is an abandoned piece of software that is no longer developed, I recommend that everyone encountering this issue download and use PeaZip. It has a better and nicer UI, too. And it works!
Comment 23 Ragnar Thomsen 2015-09-24 16:57:26 UTC
Ark has actually seen quite a lot of development this year. I recommend you to try a recent release. We are working on fixing this bug for the 15.12 release in December.
Comment 24 Elvis Angelaccio 2015-09-25 10:36:00 UTC
(In reply to Dan Duris from comment #22)
> Since this is an abandoned piece of software that is no longer developed, I
> recommend that everyone encountering this issue download and use PeaZip. It
> has a better and nicer UI, too. And it works!

And it's also not shipped by any major distribution. 

Please check the facts before making any claims. Ark is not abandoned; it's actively maintained.
Comment 25 Daniel Duris 2015-09-25 11:27:58 UTC
Sorry to insult your feelings, but I haven't seen an upgrade to Ark in a few years, not to mention the basic usability issues of the utmost importance: UTF-8 handling of characters.

BTW, you know as well as anyone that UTF-8 has been around for 15+ years, so there is no excuse for Ark not handling it yet.

It's basically unusable for any work with UTF-8 right now, and given UTF-8's prominence, let's just make it short: it's unusable.
Comment 26 Elvis Angelaccio 2015-09-25 13:01:19 UTC
(In reply to Dan Duris from comment #25)
> Sorry to insult your feelings, but I haven't seen an upgrade to Ark in a few
> years, not to mention the basic usability issues of the utmost importance:
> UTF-8 handling of characters.
> 
> BTW, you know as well as anyone that UTF-8 has been around for 15+ years, so
> there is no excuse for Ark not handling it yet.
> 
> It's basically unusable for any work with UTF-8 right now, and given UTF-8's
> prominence, let's just make it short: it's unusable.

You're talking about one specific plugin (zip) among many others which have good UTF-8 support. The fact that UTF-8 is broken in the zip plugin does not imply that Ark is no longer developed. In fact, for the next Ark release we are trying to switch to the p7zip plugin for zip archives, in order to solve this and other related bugs.

I'm sorry to hear that Ark cannot help you with the work you do, but really, you should blame the info-zip developers more than us, for this specific issue.
Comment 27 Martin Tlustos 2015-09-25 14:05:52 UTC
Sorry to remind you all, but this is for bug reports, not for discussions. If you want to attack/defend/anything else not purely bug related, please do so somewhere else. It will help the maintainers and all who have subscribed to this bug.

Thank you.
Comment 28 Elvis Angelaccio 2015-10-03 15:56:36 UTC
Git commit 50916c0ff9fd6731fc2effa26551e7f92b595eba by Elvis Angelaccio.
Committed on 03/10/2015 at 15:56.
Pushed by elvisangelaccio into branch 'master'.

Set cli7zip as the default plugin for zip archives

This increases the priority of the cli7zip plugin, so that p7zip is now the
default backend used for zip archives handling.
This change was required to address the Unicode issues of the clizip plugin,
which resulted in a lot of bugs. While not perfect, p7zip handles filenames
with non-ASCII characters better; in particular, it never extracts files
that cannot be opened or even deleted by Dolphin.

Distributions that ship a patched version of Ark to address the same problem
(e.g. by setting libarchive as the default plugin for zips, as Debian/Kubuntu do)
should drop their downstream patches, since they should no longer be necessary.
FIXED-IN: 15.12.0
REVIEW: 125505

M  +1    -1    plugins/cli7zplugin/kerfuffle_cli7z.desktop.cmake

http://commits.kde.org/ark/50916c0ff9fd6731fc2effa26551e7f92b595eba
Comment 29 Elvis Angelaccio 2015-10-03 15:57:42 UTC
*** Bug 319712 has been marked as a duplicate of this bug. ***
Comment 30 Elvis Angelaccio 2015-10-03 15:58:38 UTC
*** Bug 322100 has been marked as a duplicate of this bug. ***
Comment 31 Elvis Angelaccio 2015-10-25 22:38:03 UTC
*** Bug 220513 has been marked as a duplicate of this bug. ***
Comment 32 Elvis Angelaccio 2015-11-25 16:14:01 UTC
*** Bug 355814 has been marked as a duplicate of this bug. ***