Bug 121641 - Ark cannot extract files when archive name and file names are iso-8859-1 encoded (system default is utf-8).
Status: RESOLVED FIXED
Alias: None
Product: ark
Classification: Applications
Component: general
Version: 2.5.2
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Harald Hvaal
Reported: 2006-02-09 12:19 UTC by Stéphane Gourichon
Modified: 2009-06-07 06:05 UTC
Attachments
Error Log (sorry I did not manage to copy the text directly) (38.19 KB, image/png)
2009-01-10 22:29 UTC, Fnx

Description Stéphane Gourichon 2006-02-09 12:19:50 UTC
Version:           2.5.2 (using KDE 3.4.3, Kubuntu Package 4:3.4.3-0ubuntu2 )
Compiler:          Target: i486-linux-gnu
OS:                Linux (i686) release 2.6.12-10-686-smp

I received by e-mail a zip containing files, some of which have iso-8859-1 encoded file names. My system is Ubuntu 5.10 "Breezy badger" (which has UTF-8 as default encoding) with all regular updates applied. All core apps work in UTF-8 flawlessly. All files created and manipulated in KDE apps have UTF-8 encoded names.

The major problem is that ark cannot extract (e.g. using "show" context menu) files with iso-8859-1 names, but it works with files with us-ascii names in the same archive. A closer look shows that ark behavior is inconsistent in handling characters.

How to reproduce :
-have a KDE system where UTF-8 is the default encoding (e.g. Ubuntu 5.10 Breezy Badger or tune your existing system to use a UTF-8 based locale setting)
-make a zip archive that contains a file name with at least one iso-latin-1 character (any character with a diacritic should do the trick, e.g. "é", e with acute accent). The easiest way is to make it on a system based on latin1 encoding.
-click on the archive file in Konqueror to open it in ark
-the file names are correctly displayed in ark window, showing that at this time ark assumes iso-8859-1 encoding on file names in zip archive
-right-click on a file with accent in its name, choose "display"

Expected behavior :
-a suitable viewer appears (in my case, a pdf viewer). (This correct behavior happens when clicking on one of the files in the archive that has no accent in its name.)

Observed behavior :
-a dialog pops up saying "problem when extracting, use 'detail' button..."
-"detail" button unfolds a box, here is a relevant extract :

> Archive:  /home/sgourichon/Desktop/Encadrés9pdfCorrigésdeDémographie.zip

The line above suggests that in this display ark assumed iso-8859-1 encoding on the archive zip file name, too, which is wrong: the name of the archive file is utf-8 encoded, and the rest of the KDE apps (even the window title of ark) handle it well. This is not the cause of the real problem, though.

>   94237  Defl:N    84112  11%  01-28-05 18:37  e90fba57  Loi de pesanteur démographique Corr.pdf

The line above suggests that ark assumed iso-8859-1 encoding on the compressed file name, which is correct in this case.

> caution: filename not matched:  Loi de pesanteur démographique Corr.pdf

The line above suggests that ark was inconsistent. Instead of asking the zip extractor for the exact string it received from it (whatever the encoding used), for some reason the zip extracting part was asked for a file name encoded in UTF-8, which doesn't exist in the archive.

This is the cause for the failure.
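
The mismatch described above can be reproduced outside ark. The following is a minimal sketch, assuming the archive stores the name as iso-8859-1 bytes and that the front end decodes it for display and then re-encodes it as UTF-8 before asking for extraction. The variable names are hypothetical and this is not ark's actual code.

```python
# Assumption: the name is stored in the archive as ISO-8859-1 bytes.
stored = "Loi de pesanteur démographique Corr.pdf".encode("iso-8859-1")

# Decoding for display works, which is why the listing looks correct.
display = stored.decode("iso-8859-1")

# Re-encoding the display string as UTF-8 yields different bytes,
# because "é" is one byte in ISO-8859-1 but two bytes in UTF-8.
requested = display.encode("utf-8")

print(stored == requested)          # False: the extractor finds no match
print(len(stored), len(requested))  # the UTF-8 form is one byte longer
```

Because the two byte strings differ, a byte-wise lookup of the requested name in the archive fails, which matches the "filename not matched" message.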

Suggested fix (sorry I cannot check in the source currently) :
When ark receives the listing of the file names stored in the zip (from the zip extracting kioslave, I presume), keep them in their unmodified, pristine condition as binary "keys" for retrieving files (not as QString or anything with locale-dependent magic mangling). If you have to make an assumption about their encoding for display, store the converted string in another, locale-dependent string. That way, when the user asks for a file by clicking on a line, the binary "key" sent to the zip extracting module is exactly the file name as it appeared in the listing, and not something assumed to be iso-8859-1 and then converted to UTF-8, which makes it meaningless because no UTF-8 encoded file name matches in the archive.
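
The suggestion above could look roughly like the sketch below. It is written in Python only for illustration; `ArchiveEntry`, `make_entry` and `extraction_key` are hypothetical names, and the iso-8859-1 default is an assumption taken from this report, not something ark is known to use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveEntry:
    raw_name: bytes     # exact bytes from the archive listing, never modified
    display_name: str   # best-effort decoding, used only for the UI


def make_entry(raw_name: bytes, assumed_encoding: str = "iso-8859-1") -> ArchiveEntry:
    """Build an entry; the encoding guess affects the display string only."""
    display = raw_name.decode(assumed_encoding, errors="replace")
    return ArchiveEntry(raw_name=raw_name, display_name=display)


def extraction_key(entry: ArchiveEntry) -> bytes:
    """Return what should be handed to the extractor: the untouched bytes."""
    return entry.raw_name


entry = make_entry(b"T\xe9l\xe9thon.doc")
print(entry.display_name)            # shown to the user
print(extraction_key(entry))         # passed to the extractor unchanged
```

With this separation, a wrong guess about the encoding can only garble the displayed name; it cannot break extraction.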

Please note that bug #61200 also involves character encoding issues, but I filed a different bug because the Chinese user did not report that he could not extract files (my problem) but that the names were incorrect in the display (a different problem, which did not occur in my case).

Thank you for KDE, I hope this bug report can help.
Comment 1 Nikolay Pavlov 2007-06-18 06:02:14 UTC
I want to add that this is not a backend problem. For example, I can easily extract Russian .rar files with the rar console tool or the GTK xarchiver frontend, but I see wrong Cyrillic names in ark.
Comment 2 Oscar Megia Lopez 2008-03-10 23:32:57 UTC
I have the same problem with filenames that contain Spanish characters (accents, ñ, etc.). When I try to extract the rar file, Ark shows an error:

Error - Ark

The extract operation fails

RAR 3.60 Copyright ...

And a file listing of the rar file ending with "no files to extract"

The filename that I try to extract contains the character º, which Ark shows as §. If I try to extract with the unrar console tool, it works fine. rar extracts the file without errors, but the filename shows ? in place of the º character when you run ls.
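
One possible explanation for the º / § mix-up, offered as an assumption rather than a confirmed diagnosis: if the name is stored in a DOS code page such as CP437, the byte for º is the same byte that iso-8859-1 maps to §. The ? shown by ls would likewise be consistent with the extracted name containing a byte that is not valid UTF-8. A small sketch of both effects:

```python
# Assumption: the archive stores the name in a DOS code page (CP437 here).
stored_byte = "º".encode("cp437")

# Reading that same byte as ISO-8859-1 gives a different character.
shown_by_frontend = stored_byte.decode("iso-8859-1")
print(stored_byte, shown_by_frontend)

# Assumption: the extracted name ends up with the ISO-8859-1 byte for "º".
# On its own, that byte is not valid UTF-8, so a UTF-8 terminal cannot
# display it and falls back to a placeholder.
latin1_byte = "º".encode("iso-8859-1")
in_utf8_terminal = latin1_byte.decode("utf-8", errors="replace")
print(latin1_byte, in_utf8_terminal)
```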

The easiest fix would be to copy the way unrar handles the charset.

Regards
Oscar
Comment 3 Fnx 2009-01-10 22:29:44 UTC
Created attachment 30112
Error Log (sorry I did not manage to copy the text directly)

I confirm that this bug is still present in Kubuntu Hardy 8.04 (KDE 3.5.10).
The zip file was most likely created on Windows.

Extraction of the files Téléthon.doc and tract pension complète.pdf fails because of the accents in the file names.
Comment 4 Raphael Kubo da Costa 2009-04-08 03:16:44 UTC
Can you please confirm whether it still happens with KDE 4.2?
Comment 5 Raphael Kubo da Costa 2009-06-07 06:05:25 UTC
There has been no answer to my previous request, and this bug seems to relate only to KDE3. This has probably been fixed in KDE4 (especially 4.3+). Closing.