Bug 292935 - Ark should offer an option to ignore common metadata files
Summary: Ark should offer an option to ignore common metadata files
Status: CONFIRMED
Alias: None
Product: ark
Classification: Applications
Component: general
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: Elvis Angelaccio
URL:
Keywords: junior-jobs
Depends on:
Blocks:
 
Reported: 2012-01-31 09:33 UTC by Andrew Udvare
Modified: 2021-03-11 15:05 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Example zip file from OS X (49.23 KB, application/zip)
2012-01-31 09:33 UTC, Andrew Udvare

Description Andrew Udvare 2012-01-31 09:33:17 UTC
Created attachment 68364
Example zip file from OS X

Version:           unspecified (using KDE 4.7.4) 
OS:                Linux

In OS X, if you 'Compress' a directory, it creates a zip file with the contents plus an OS X metadata directory named __MACOSX. Inside that directory are files with the same names as the originals, but prefixed with ._

Example:
Compress newdir containing file a.h (from Finder)

Makes a zip file with:
newdir/a.h
__MACOSX/newdir/._a.h
__MACOSX/newdir/._.DS_Store
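
For illustration, a minimal Python sketch of how such an archive looks to a tool reading it. It rebuilds the three entries above in memory with the standard zipfile module and splits them into real content and Finder metadata. The helper name split_entries and the placeholder file contents are assumptions made for this example; none of this is Ark code.

```python
import io
import zipfile


def split_entries(names):
    """Split archive member names into (content, finder_metadata).

    An entry counts as Finder metadata when its first path component
    is __MACOSX, which is the layout described in this report.
    """
    content, metadata = [], []
    for name in names:
        top = name.split("/", 1)[0]
        (metadata if top == "__MACOSX" else content).append(name)
    return content, metadata


# Recreate the example archive in memory (file contents are placeholders).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("newdir/a.h", "/* a.h */\n")
    zf.writestr("__MACOSX/newdir/._a.h", b"")
    zf.writestr("__MACOSX/newdir/._.DS_Store", b"")

# Read it back the way an extractor would see it.
with zipfile.ZipFile(buf) as zf:
    content, metadata = split_entries(zf.namelist())

print(content)
print(metadata)
```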

Reproducible: Always

Steps to Reproduce:
1. Get a file zipped on OS X using Finder
2. Right-click the file in Konqueror and use Extract here.

Actual Results:  
Gets the contents as expected but also extracts the useless __MACOSX directory.

Expected Results:  
Should extract only the contents expected.

I think 99% of people on Linux and other OSes consider this directory junk, and I also think 99% of people are not going to explicitly create a __MACOSX directory in their zip file. I would not have a problem if Ark simply did not extract the __MACOSX directory when it spots one in a zip file, without checking the contents.

A more thorough check is also acceptable and, I think, the correct solution. Basically, inspect the __MACOSX directory for directory names that exactly match those of the real contents, for file names that are prefixed with ._, and for a ._.DS_Store file within each directory.
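
A rough Python sketch of that thorough check, working only on member names. The function name looks_like_finder_metadata and the require_ds_store switch are hypothetical, and the ._.DS_Store requirement is applied only to directories under __MACOSX that actually hold files, which is an assumption of this sketch.

```python
import posixpath

PREFIX = "__MACOSX/"


def _parent_dirs(path):
    """Yield every ancestor directory of `path`, nearest first."""
    parent = posixpath.dirname(path.rstrip("/"))
    while parent:
        yield parent
        parent = posixpath.dirname(parent)


def looks_like_finder_metadata(names, require_ds_store=True):
    """Return True if the __MACOSX tree in `names` passes the check above."""
    real_dirs = set()  # directories of the real (non-metadata) contents
    meta_dirs = {}     # mirrored directory -> basenames of files inside it
    for name in names:
        if name.startswith(PREFIX):
            rel = name[len(PREFIX):]
            if not rel or rel.endswith("/"):
                continue  # directory entries carry no extra information
            meta_dirs.setdefault(posixpath.dirname(rel), []).append(
                posixpath.basename(rel)
            )
        else:
            if name.endswith("/"):
                real_dirs.add(name.rstrip("/"))
            real_dirs.update(_parent_dirs(name))

    if not meta_dirs:
        return False
    for directory, basenames in meta_dirs.items():
        # 1. The directory name must mirror a real directory.
        if directory and directory not in real_dirs:
            return False
        # 2. Every file must carry the "._" prefix.
        if not all(b.startswith("._") for b in basenames):
            return False
        # 3. Optionally insist on a ._.DS_Store in the directory.
        if require_ds_store and "._.DS_Store" not in basenames:
            return False
    return True
```

With the example archive above, the function returns True; a __MACOSX directory holding an ordinary file such as notes.txt would fail step 2 and be extracted as usual.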

I think this directory is ONLY used by the Extracting functionality built into Finder on OS X. As such, I think Ark on OS X should not need to deal with it either. There do not seem to be any consequences for ignoring this data on OS X.
Comment 1 Raphael Kubo da Costa 2012-02-01 02:41:36 UTC
Thanks for the report. I'm setting the severity to "wishlist" because the current behavior isn't really buggy.

I'm still unsure whether this is a valid bug report or not -- technically speaking, these directories are part of the zip archive like any other directory, and the fact that they exist is just a consequence of the way the archive was created (and the way Apple's program works). Plus, one might question whether the .DS_Store files should be extracted or not.

OTOH, I do see the point in implementing this kind of behavior to make Ark more user-friendly -- perhaps asking the user whether to extract these files.

Thoughts?
Comment 2 Andrew Udvare 2012-02-01 03:37:51 UTC
My idea is that Ark should have a setting like 'Do not extract common metadata/cache files'. This should be on by default.

Maybe arkrc can have a section for meta data files to ignore so advanced users can edit the list without having to modify Ark's source.

[IgnoreExactMatch]
Thumbs.db
.DS_Store
__MACOSX
.directory

This also means Ark would never have to actually test the contents of these files/directories should one appear in an archive that isn't actually what these normally are (especially since Thumbs.db could definitely mean a thumbnail database to something completely different).
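
To make the idea concrete, here is a small Python sketch of exact-name matching driven by such a section. It takes the section layout literally from the snippet above and parses it with Python's configparser; how Ark would actually store the list in arkrc is not specified in this report, and the names load_ignore_names, is_ignored and filter_members are hypothetical.

```python
import configparser

# The proposed section, copied from the comment above.
ARKRC_SNIPPET = """\
[IgnoreExactMatch]
Thumbs.db
.DS_Store
__MACOSX
.directory
"""


def load_ignore_names(text):
    """Read the value-less keys of [IgnoreExactMatch] as a set of names."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.optionxform = str  # keep case: ".DS_Store" must not become ".ds_store"
    parser.read_string(text)
    return set(parser["IgnoreExactMatch"])


def is_ignored(member_name, ignore):
    """True if any path component of the member is exactly in `ignore`."""
    return any(part in ignore for part in member_name.split("/") if part)


def filter_members(names, ignore):
    """Keep only the members that should be extracted."""
    return [n for n in names if not is_ignored(n, ignore)]


ignore = load_ignore_names(ARKRC_SNIPPET)
members = [
    "newdir/a.h",
    "newdir/.DS_Store",
    "__MACOSX/newdir/._a.h",
    "photos/Thumbs.db",
    "photos/Thumbs.db.bak",
]
print(filter_members(members, ignore))
```

The match is on names only, so nothing is ever opened or inspected; a file such as Thumbs.db.bak is kept because it is not an exact match.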
Comment 3 Raphael Kubo da Costa 2012-02-01 12:39:38 UTC
(In reply to comment #2)
> My idea is that Ark should have a setting like 'Do not extract common
> metadata/cache files'. This should be on by default.

This sounds like a nice idea indeed. I'm retitling the bug to make it represent this discussion better. Thanks.
Comment 4 Justin Zobel 2021-03-09 05:23:05 UTC
Thank you for the bug report.

As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists.

If this bug is no longer persisting or relevant please change the status to resolved.
Comment 5 Jonathan Wakely 2021-03-11 14:54:53 UTC
I was just about to create a new wishlist report for exactly this feature. Yes, the problem still exists (using Ark version 20.08.1).
Comment 6 Jonathan Wakely 2021-03-11 15:03:58 UTC
(In reply to Andrew Udvare from comment #2)
> Maybe arkrc can have a section for meta data files to ignore so advanced
> users can edit the list without having to modify Ark's source.
> 
> [IgnoreExactMatch]
> Thumbs.db
> .DS_Store
> __MACOSX
> .directory

That customization feature might be nice, but might cause issues for people who don't realize the feature exists and don't know where to make the customization.

I think having the feature off by default is probably safest, but there could be a global configuration to turn it on, and the "Extract" dialog could show a checkbox for it with the other Options, to allow it to be turned on for individual extract operations (rather than globally).
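
As a sketch of what "off by default, enabled per extraction" could mean in practice, here is a Python stand-in using the standard zipfile module. The ignore_metadata parameter plays the role of the proposed checkbox; its name, and the extract helper itself, are made up for this example and say nothing about Ark's real API.

```python
import io
import os
import tempfile
import zipfile


def extract(archive, dest, ignore_metadata=False):
    """Extract `archive` (a zipfile.ZipFile) into `dest`.

    ignore_metadata stands in for the proposed checkbox: it is off by
    default, and when on, top-level __MACOSX entries are skipped.
    Returns the list of member names that were extracted.
    """
    members = archive.namelist()
    if ignore_metadata:
        members = [m for m in members if m.split("/", 1)[0] != "__MACOSX"]
    archive.extractall(dest, members=members)
    return members


# Small in-memory archive with placeholder contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("newdir/a.h", "/* a.h */\n")
    zf.writestr("__MACOSX/newdir/._a.h", b"")

with tempfile.TemporaryDirectory() as dest, zipfile.ZipFile(buf) as zf:
    extracted = extract(zf, dest, ignore_metadata=True)
    skipped_dir_exists = os.path.exists(os.path.join(dest, "__MACOSX"))

print(extracted, skipped_dir_exists)
```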

If it's easily turned on/off it's probably sufficient to just ignore __MACOSX and .DS_Store folders at the top level of the archive. If they occur below the top level, they might be there intentionally (if they'd been created automatically on MacOS then they'd also be present at the top level).

> This also means Ark would never have to actually test the contents of these
> files/directories should one appear in an archive that isn't actually what
> these normally are (especially since Thumbs.db could definitely mean a
> thumbnail database to something completely different).

If it's easily enabled/disabled in the Extract dialog I think testing the contents of the metadata dirs probably isn't necessary. If extra checks are desirable though, I suggest the heuristic should be that a folder called __MACOSX or .DS_Store is not extracted if everything below it is either a folder, or an AppleDouble file (i.e. one beginning with "._"). I don't think it's necessary to verify that there's a one-to-one mapping from those AppleDouble files to the actual files elsewhere in the archive. If somebody wants to extract them, they can just uncheck the "Ignore metadata junk" option.
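
A minimal Python sketch of that heuristic, assuming member names use "/" separators and directory entries end with "/". The names should_skip_folder and METADATA_FOLDER_NAMES are hypothetical. As suggested, it does not try to map AppleDouble files back to real files.

```python
METADATA_FOLDER_NAMES = {"__MACOSX", ".DS_Store"}


def should_skip_folder(folder, names):
    """Return True if `folder` should not be extracted.

    The folder must be called __MACOSX or .DS_Store, and everything
    below it must be either a folder or an AppleDouble file, i.e. a
    file whose name begins with "._". An empty folder also qualifies.
    """
    folder = folder.rstrip("/")
    if folder.rsplit("/", 1)[-1] not in METADATA_FOLDER_NAMES:
        return False
    prefix = folder + "/"
    for name in names:
        if not name.startswith(prefix) or name == prefix:
            continue  # not below this folder, or the folder entry itself
        if name.endswith("/"):
            continue  # a sub-folder is always acceptable
        if not name.rsplit("/", 1)[-1].startswith("._"):
            return False  # an ordinary file: keep the whole folder
    return True
```

For an archive containing newdir/a.h, __MACOSX/newdir/._a.h and __MACOSX/newdir/._.DS_Store, should_skip_folder("__MACOSX", names) is True; adding a member like __MACOSX/readme.txt flips it to False.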
Comment 7 Jonathan Wakely 2021-03-11 15:05:21 UTC
(In reply to Jonathan Wakely from comment #6)
> If it's easily turned on/off it's probably sufficient to just ignore
> __MACOSX and .DS_Store folders at the top level of the archive. If they
> occur below the top level, they might be there intentionally (if they'd been
> created automatically on MacOS then they'd also be present at the top level).

Ah no, that won't work for .DS_Store as that occurs at any level. But the __MACOSX folder should only be at the top-level.
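
Put as a Python sketch, the corrected rule could look like this: __MACOSX counts only as the first path component, while .DS_Store counts at any depth. The function name is_metadata_junk is hypothetical.

```python
def is_metadata_junk(member_name):
    """Apply the location rule: __MACOSX at top level, .DS_Store anywhere."""
    parts = [p for p in member_name.split("/") if p]
    if not parts:
        return False
    if parts[0] == "__MACOSX":
        return True  # only a top-level __MACOSX is treated as junk
    return ".DS_Store" in parts  # .DS_Store can occur at any level
```

So newdir/sub/.DS_Store is skipped, while project/__MACOSX/notes.txt is kept because the folder is not at the top level and may be there intentionally.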