| Summary: | Incorrect text symbols when seeing non ASCII file names inside ZIP file | ||
|---|---|---|---|
| Product: | [Applications] krusader | Reporter: | Rafael Linux User <rafael.linux.user> |
| Component: | krarc | Assignee: | Krusader Bugs Distribution List <krusader-bugs-null> |
| Status: | RESOLVED INTENTIONAL | ||
| Severity: | normal | CC: | alex.bikadorov, krusader-bugs-null, rafael.linux.user |
| Priority: | NOR | ||
| Version First Reported In: | 2.5.0 | ||
| Target Milestone: | --- | ||
| Platform: | openSUSE | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: | Characters non ASCII not showed correctly | ||
|
Description
Rafael Linux User
2017-03-14 16:00:52 UTC
Created attachment 104563 [details]
Characters non ASCII not showed correctly
Please upload the test zip to some service that does not require a Google account, e.g. here.

(In reply to Alex Bikadorov from comment #2)
> Please upload the test zip to some service that does not require a Google
> account, e.g. here.

Sorry, I tried, but it's 9 MiB in size, so I'm looking for another way to share it, but it will be late.

You could also create a simple ZIP file with the same (broken) names, but 0 byte files. We don't need the 9 MiB data :)

Archive file (10 MiB) at http://www.filedropper.com/fullhdcalibracionhd

(In reply to Christoph Feck from comment #4)
> You could also create a simple ZIP file with the same (broken) names, but 0
> byte files. We don't need the 9 MiB data :)

Sorry, I think from my own computer I could recreate that issue. In fact, when I tried to reduce the size (uncompressing and deleting the PDF file inside) and then zipped it from Krusader again, I noticed I had TWO ARCHIVE FILENAMES IDENTICAL. So I investigated a little more and created a bug (I think it is a bug) in the openSUSE Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1029568
That's why I prefer to send you the original archive. My apologies.

About having two archives with the same name: this is not a bug. It looks like the files have the same name but the characters are actually (damn)
...different. This is due to the encoding with UTF-8.
Try this command in a shell:
> LC_ALL=C ls -1b
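
For readers who want to check this without a shell, the same comparison can be sketched in Python. This is only an illustration and not part of the original report: `octal_escape` is a hypothetical helper that roughly approximates what `ls -b` prints, and the word `Calibración` is an assumed sample name. The sketch shows that a precomposed "ó" and an "o" followed by a combining accent are different strings with different UTF-8 bytes, and that Unicode normalization makes them compare equal.

```python
import unicodedata


def octal_escape(name: str) -> str:
    """Approximate `LC_ALL=C ls -b`: escape spaces and show non-ASCII bytes as \\ooo."""
    out = []
    for byte in name.encode("utf-8"):
        if byte == 0x20:
            out.append("\\ ")            # space is backslash-escaped
        elif byte == 0x5C:
            out.append("\\\\")           # a literal backslash is doubled
        elif 0x21 <= byte <= 0x7E:
            out.append(chr(byte))        # printable ASCII is shown as-is
        else:
            out.append(f"\\{byte:03o}")  # everything else as a 3-digit octal escape
    return "".join(out)


# Assumed sample names: same visible text, different code points.
composed = "Calibraci\u00f3n"      # U+00F3 LATIN SMALL LETTER O WITH ACUTE
decomposed = "Calibracio\u0301n"   # "o" followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)      # False: the strings differ
print(octal_escape(composed))      # Calibraci\303\263n
print(octal_escape(decomposed))    # Calibracio\314\201n

# After normalizing both to the same form (NFC here) they compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A file manager that wanted to flag such look-alike names could compare the normalized forms while still storing the original bytes; whether that is desirable is a design question the thread leaves open.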
(In reply to Alex Bikadorov from comment #8)
> (damn)
>
> ...different. This is due to the encoding with UTF-8.
> Try this command in a shell:
> > LC_ALL=C ls -1b

Thank you for sharing your knowledge, I hadn't tried LC_ALL or the "b" parameter of the "ls" command ;)
Indeed, the result is:
Full\ HD\ \302\251Calibracio\314\201n\ HD.zip
Full\ HD\ \302\251Calibraci\303\263n\ HD.zip
But I think there should be some OS filesystem rule to avoid this problem. Maybe it is a GDrive error (because I can see something weird in the filename when GDrive shows the filename in the browser page ...). But I don't want to give you a headache about that, because it is not related and I'm sure you are very, very busy. THANK YOU FOR YOUR GOOD WORK ;)

The problem is the UTF-8 encoding that allows the same shown letter to have different encodings, see http://stackoverflow.com/a/6153713/6286694.
If different platforms (operating systems) decide different a encoding we can't

To come back to the filenames *inside* the archive: The zip was created on another OS with an encoding that is not portable (but I couldn't find out which one). You will see the same characters when running "unzip -l" and the krarc protocol is doing exactly this: running the archive tool and parsing the output.
So, if unzip can't handle this, krarc can't either.

The zip:/ protocol seems to work differently but it will probably be impossible to fix this in krarc without changing the entire code.

The archive is actually to blame, so this is a "wontfix" for me. Of course, somebody else can spend more time on this and reopen if wanted.

more to read:
http://unix.stackexchange.com/a/252000
https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/
https://github.com/rubyzip/rubyzip/wiki/Files-with-non-ascii-filenames

Hi Alex
The articles you sent me are very interesting and show the complexity of the case. Anyway, IMHO, in these cases it's preferable to show a "weird" character (like the openSUSE terminal does) instead of letting the user think that "all is right".
I mean, I think Plasma should not visually represent "o'" like "ó", because they really are NOT the same character and can't be obtained through natural typing on a keyboard. In fact, "áéíóú" "ÁÉÍÓÚ" (in our case "ó") are obtained by typing first "'" and then "o". So, why is Plasma showing "\314\201n\" like "\303\263n\" (the second one is a "typable" character, the first one is not)?
This will cause problems even in Krusader or a KDE terminal, because both characters are shown as equal, but they are not.
Anyway, in fact, Krusader is showing those characters as a pair "+?" or "-?", instead of how they are shown by a Linux terminal ("?"). The result is that Krusader doesn't let me extract any file from the ZIP archive :( so I think the status should be "SHOULDFIX" ;)
Thank you
(In reply to Alex Bikadorov from comment #10)
> The problem is the UTF-8 encoding that allows the same shown letter to have
> different encodings, see http://stackoverflow.com/a/6153713/6286694.
> If different platforms (operating systems) decide different a encoding we
> can't
>
> To come back to the filenames *inside* the archive: The zip was created on
> another OS with an encoding that is not portable (but i couldn't find out
> which one). You will see the same characters when running "unzip -l" and the
> krarc protocol is doing exactly this: running the archive tool and parsing
> the output.
> So, if unzip can't handle this, krarc can't do this, too.
>
> The zip:/ protocol seems to work differently but it will probably impossible
> to fix this in krarc without changing the entire code.
>
> The archive is actually to blame, so this is a "wontfix" for me. Of course,
> somebody else can spend more time on this and reopen if wanted.
wait, you are mixing something up.

> I mean, I think Plasma should not represent visually "o'" like "ó", because
> really they are NOT the same character and can't get obtained thru a natural
> typing on a keyboard. In fact, "áéíóú" "ÁÉÍÓÚ" (in our case "ó") are
> obtained typing first "'" and after "o". So, why Plasma is showing
> "\314\201n\" like "\303\263n\" (this second one is a "typable" character,
> the first one not)?.
> This will bring problems even in Krusader o a KDE terminal, cause both
> characters are showed like equals, but they are not.

This has nothing to do with zip archives but only with filename representation in UTF-8.
The character "ó" can have multiple encodings in UTF-8, namely "\303\263" and "o\314\201". The first one is a single character
>U+00F3 ó 0303 0263 LATIN SMALL LETTER O WITH ACUTE
and the second is the accent
>U+0301 ́ 0314 0201 COMBINING ACUTE ACCENT
combined with a preceding "o".
Both are valid representations of "ó"; one application/library uses the first, another the second one.
Again: There is nothing wrong here. (And you should close the bug report for openSUSE.)

> Anyway, in fact, Krusader is showing that characters like a pair "+?" or
> "-?", instead like they are showed by a Linux terminal ("?"). Result is that
> Krusader don't let me extract any file of the ZIP archive :( so I think
> the status should be "SHOULDFIX" ;)

This is another issue, about the filename encoding IN a zip archive. The point is that the archive was created with an invalid, non-standard, non-portable charset (not UTF-8).
The KIO zip:/ protocol uses its own library (KArchive/KZip) and can compensate for this. But Krusader uses the unzip tool. If unzip cannot correctly read the archive, Krusader can't either.

You can also create a correct archive with the very same filenames and everything works correctly. This proves that there is no bug here. (You can blame the creator of the archive.)

I wish I could blame him, but there is no way to contact him.
Anyway, this is the first time in a year that it has happened again. And, as you explained, it is always related to zip archives (I HATE them). As you suggested, I will close the openSUSE bug right after this message. Anyway, thank you for ALL THE DETAILED information about this issue. Regards ;) |
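
The point made in the thread about file names *inside* the archive can be illustrated with a small Python sketch. It rests on an assumption that the thread does not confirm: that the writer stored UTF-8 bytes without setting the ZIP "language encoding" flag (general purpose bit 11, `0x800`), so that a reader falling back to CP437 shows garbage. The charset of the real archive was never identified, so `recover_name` and its `guess` parameter are hypothetical and only show the general idea of undoing a wrong decode.

```python
UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def recover_name(shown: str, flag_bits: int, guess: str = "utf-8") -> str:
    """Undo a CP437 fallback decode if the raw bytes are valid in the guessed charset."""
    if flag_bits & UTF8_FLAG:
        return shown                     # already decoded as UTF-8, nothing to undo
    try:
        raw = shown.encode("cp437")      # back to the bytes stored in the archive
        return raw.decode(guess)         # reinterpret them with the guessed charset
    except (UnicodeEncodeError, UnicodeDecodeError):
        return shown                     # the guess did not fit; keep the shown name


# Simulate a writer that stored UTF-8 bytes but left the flag unset (assumption),
# and a reader that therefore decoded the bytes as CP437.
stored = "Calibraci\u00f3n.pdf".encode("utf-8")
shown = stored.decode("cp437")

print(shown)                             # mojibake instead of "ó"
print(recover_name(shown, flag_bits=0))  # Calibración.pdf
```

With Python's `zipfile` module the two inputs would come from `ZipInfo.filename` and `ZipInfo.flag_bits`. A tool such as krarc, which parses the text output of `unzip`, has no access to the raw bytes or the flag, which is consistent with the explanation given in the thread.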