When I open a ZIP file containing filenames with non-ASCII characters in Krusader, it shows a wrong character instead of the right one. The KDE archive tool "Ark" shows those filenames correctly. If I extract the files, the names are finally created correctly.

Example ZIP: https://goo.gl/g6pt8D. Its filenames contain non-ASCII characters, which Krusader shows as a question mark inside a diamond.

├── 01.- Recorte & Nitidez.jpg
├── 02.- Brillo.jpg
├── 03.- Contraste.jpg
├── 04.- Escala de Colores.jpg
├── 05.- Barras de Color de Calibración HD.jpg
├── 06.- Barras de Color de Calibración HD (Negro).jpg
├── 07.- Patrones de Verificación
│   ├── 01.- Croma 4-4-4 & 4-2-2.png
│   ├── 02.- Prueba de degradado.jpg
│   ├── 03.- 0-100%.jpg
│   ├── 04.- Widscreen.jpg
│   ├── 05.- Academia.jpg
│   ├── 06.- Panavisión.jpg
│   └── 07.- 4-3.jpg
├── 08.- Patrones Avanzados
│   ├── 01.- Patron de Color & Tinte.jpg
│   └── 02.- 75% Para televisores con C.M.S. activado.jpg
└── Información sobre Calibración HD.pdf
Created attachment 104563: Non-ASCII characters not shown correctly
Please upload the test zip to some service that does not require a Google account, e.g. here.
(In reply to Alex Bikadorov from comment #2)
> Please upload the test zip to some service that does not require a Google
> account, e.g. here.

Sorry, I tried, but the file is 9 MiB, so I'm looking for another way to share it. It may take a while.
You could also create a simple ZIP file with the same (broken) names, but 0 byte files. We don't need the 9 MiB data :)
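A minimal sketch of the suggestion above, assuming Python is at hand: it builds a ZIP that holds only 0-byte entries with the given names. `SAMPLE_NAMES` and `build_empty_zip` are illustrative names, not part of Krusader or any tool mentioned here. Note that Python's zipfile stores non-ASCII names as UTF-8 and marks them as such, so the result is a well-formed archive and may not reproduce the broken display of the original file.

```python
import io
import zipfile

# Illustrative subset of the names listed in the report.
SAMPLE_NAMES = [
    "05.- Barras de Color de Calibraci\u00f3n HD.jpg",
    "07.- Patrones de Verificaci\u00f3n/06.- Panavisi\u00f3n.jpg",
    "Informaci\u00f3n sobre Calibraci\u00f3n HD.pdf",
]


def build_empty_zip(names):
    """Return the bytes of a ZIP archive with one 0-byte entry per name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name in names:
            # An empty payload keeps the archive tiny; only the names matter.
            archive.writestr(name, b"")
    return buffer.getvalue()


archive_bytes = build_empty_zip(SAMPLE_NAMES)
print(len(archive_bytes), "bytes")
```

Writing `archive_bytes` to a file gives a small attachment that can be opened in Krusader and Ark for comparison.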
Archive file (10MiB) at http://www.filedropper.com/fullhdcalibracionhd
(In reply to Christoph Feck from comment #4)
> You could also create a simple ZIP file with the same (broken) names, but 0
> byte files. We don't need the 9 MiB data :)

Sorry, I am not sure I can recreate that issue from my own computer. In fact, when I tried to reduce the size (uncompressing the archive and deleting the PDF file inside) and then zipped it again from Krusader, I noticed I had TWO ARCHIVES WITH IDENTICAL FILENAMES. So I investigated a little more and filed a bug (I think it is a bug) in the OpenSUSE bug tracker: https://bugzilla.opensuse.org/show_bug.cgi?id=1029568

That's why I prefer to send you the original archive. My apologies.
About having two archives with the same name: this is not a bug. It looks like the files have the same name but the characters are actually
(damn)

...different. This is due to the encoding with UTF-8.
Try this command in a shell:

> LC_ALL=C ls -1b
(In reply to Alex Bikadorov from comment #8)
> (damn)
>
> ...different. This is due to the encoding with UTF-8.
> Try this command in a shell:
>
> LC_ALL=C ls -1b

Thank you for sharing your knowledge. I hadn't tried LC_ALL or the "b" option of the "ls" command ;)

Indeed, the result is:

Full\ HD\ \302\251Calibracio\314\201n\ HD.zip
Full\ HD\ \302\251Calibraci\303\263n\ HD.zip

But I think there should be some OS filesystem rule to avoid this problem. Maybe it is a GDrive error (because I can see something weird in the filename when GDrive shows it in the browser page ...).

But I don't want to give you a headache about that, because it is not related and I'm sure you are very, very busy.

THANK YOU FOR YOUR GOOD WORK ;)
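For readers without the two files at hand, the `ls -1b` output can be approximated with a small sketch. `escape_like_ls_b` is a hypothetical helper that only imitates the escaping of spaces, backslashes and non-ASCII bytes; it is not a reimplementation of `ls`.

```python
import unicodedata


def escape_like_ls_b(name: str) -> str:
    """Escape spaces and non-printable/non-ASCII bytes, roughly like `LC_ALL=C ls -b`."""
    parts = []
    for byte in name.encode("utf-8"):
        if byte == 0x20:
            parts.append("\\ ")           # space becomes backslash-space
        elif byte == 0x5C:
            parts.append("\\\\")          # a literal backslash is doubled
        elif 0x21 <= byte <= 0x7E:
            parts.append(chr(byte))       # printable ASCII stays as-is
        else:
            parts.append("\\%03o" % byte)  # everything else as 3-digit octal
    return "".join(parts)


# The same visible word in composed (NFC) and decomposed (NFD) form.
composed = unicodedata.normalize("NFC", "Calibraci\u00f3n")
decomposed = unicodedata.normalize("NFD", "Calibraci\u00f3n")

print(escape_like_ls_b(composed))
print(escape_like_ls_b(decomposed))
print(composed == decomposed)
```

The two strings render identically in most fonts but compare as different, which is why two files with seemingly the same name can sit in one directory.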
The problem is that Unicode (stored here as UTF-8) allows the same displayed letter to have different encodings, see http://stackoverflow.com/a/6153713/6286694.
If different platforms (operating systems) decide on a different encoding we can't

To come back to the filenames *inside* the archive: The zip was created on another OS with an encoding that is not portable (but I couldn't find out which one). You will see the same characters when running "unzip -l", and the krarc protocol is doing exactly this: running the archive tool and parsing the output.
So, if unzip can't handle this, krarc can't either.

The zip:/ protocol seems to work differently, but it will probably be impossible to fix this in krarc without changing the entire code.

The archive is actually to blame, so this is a "wontfix" for me. Of course, somebody else can spend more time on this and reopen if wanted.
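Whether an archive declares its names as UTF-8 can be checked without unzip. The ZIP format reserves bit 11 of each entry's general purpose bit flag for this; when the bit is not set, the name encoding is whatever the creating tool used, and Python's zipfile falls back to CP437 when decoding. The sketch below uses Python's zipfile, not krarc or unzip; `name_encoding_report` is a hypothetical helper and the demo archive is built in memory.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # bit 11 of the general purpose bit flag


def name_encoding_report(archive_bytes: bytes):
    """Return (filename, declares_utf8) pairs for every entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [
            (info.filename, bool(info.flag_bits & UTF8_NAME_FLAG))
            for info in archive.infolist()
        ]


# Demo archive: one pure ASCII name and one non-ASCII name, both empty files.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("02.- Brillo.jpg", b"")
    archive.writestr("06.- Panavisi\u00f3n.jpg", b"")

for filename, declares_utf8 in name_encoding_report(demo.getvalue()):
    print(declares_utf8, filename)
```

An entry with non-ASCII bytes in its name and no UTF-8 flag is the kind of archive described above: each tool has to guess the charset, and the guesses can differ.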
More to read:
http://unix.stackexchange.com/a/252000
https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/
https://github.com/rubyzip/rubyzip/wiki/Files-with-non-ascii-filenames
Hi Alex,

The articles you sent me are very interesting and show the complexity of the case.

Anyway, IMHO, in these cases it is preferable to show a "weird" character (like the OpenSUSE terminal does) instead of letting the user think that "all is right". I mean, I think Plasma should not visually represent "o'" as "ó", because they are really NOT the same character and the first cannot be obtained through natural typing on a keyboard. In fact, "áéíóú" and "ÁÉÍÓÚ" (in our case "ó") are obtained by typing "'" first and then "o". So why is Plasma showing "\314\201n\" like "\303\263n\" (the second one is a "typable" character, the first one is not)?
This will cause problems even in Krusader or a KDE terminal, because both characters are shown as equal, but they are not.

Anyway, Krusader is in fact showing those characters as a pair, "+?" or "-?", instead of the way a Linux terminal shows them ("?"). The result is that Krusader doesn't let me extract any file from the ZIP archive :( so I think the status should be "SHOULDFIX" ;)

Thank you

(In reply to Alex Bikadorov from comment #10)
> The problem is the UTF-8 encoding that allows the same shown letter to have
> different encodings, see http://stackoverflow.com/a/6153713/6286694.
> If different platforms (operating systems) decide different a encoding we
> can't
>
> To come back to the filenames *inside* the archive: The zip was created on
> another OS with an encoding that is not portable (but i couldn't find out
> which one). You will see the same characters when running "unzip -l" and the
> krarc protocol is doing exactly this: running the archive tool and parsing
> the output.
> So, if unzip can't handle this, krarc can't do this, too.
>
> The zip:/ protocol seems to work differently but it will probably impossible
> to fix this in krarc without changing the entire code.
>
> The archive is actually to blame, so this is a "wontfix" for me. Of course,
> somebody else can spend more time on this and reopen if wanted.
Wait, you are mixing something up.

> I mean, I think Plasma should not represent visually "o'" like "ó", because
> really they are NOT the same character and can't get obtained thru a natural
> typing on a keyboard. In fact, "áéíóú" "ÁÉÍÓÚ" (in our case "ó") are
> obtained typing first "'" and after "o". So, why Plasma is showing
> "\314\201n\" like "\303\263n\" (this second one is a "typable" character,
> the first one not)?.
> This will bring problems even in Krusader o a KDE terminal, cause both
> characters are showed like equals, but they are not.

This has nothing to do with zip archives; it is only about how filenames are represented in Unicode (encoded here as UTF-8). The character "ó" can be represented in more than one way, namely as "\303\263" or as "o" followed by "\314\201". The first one is a single character

> U+00F3 ó 0303 0263 LATIN SMALL LETTER O WITH ACUTE

and the second is the accent

> U+0301 ́ 0314 0201 COMBINING ACUTE ACCENT

which gives the same displayed character when it follows a plain "o". Both are valid representations of "ó"; one application or library uses the first, another uses the second. Again: there is nothing wrong here. (And you should close the bug report for OpenSUSE.)

> Anyway, in fact, Krusader is showing that characters like a pair "+?" or
> "-?", instead like they are showed by a Linux terminal ("?"). Result is that
> Krusader don't let me extract any file of the ZIP archive :( so I think
> the status should be "SHOULDFIX" ;)

This is a separate issue, about the filename encoding INSIDE a zip archive. The point is that the archive was created with an invalid, non-standard, non-portable charset (not UTF-8). The KIO zip:/ protocol uses its own library (KArchive/KZip) and can compensate for this. But Krusader uses the unzip tool. If unzip cannot read the archive correctly, Krusader can't either.
You can also create a correct archive with the very same filenames, and everything works correctly. This shows that there is no bug here. You can blame the creator of the archive.
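The two representations of "ó" can be inspected and compared with Python's standard unicodedata module. This is only an illustrative sketch: `describe` and `same_after_normalization` are hypothetical helper names, and nothing here reflects how Plasma or Krusader handle filenames internally.

```python
import unicodedata


def describe(text: str):
    """List the code point and Unicode name of every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00f3"   # single code point
combining = "o\u0301"    # plain "o" followed by the combining accent

print(describe(precomposed))
print(describe(combining))
print(precomposed == combining)                          # raw comparison
print(same_after_normalization(precomposed, combining))  # normalized comparison
```

A tool that wants to treat both spellings of a name as equal has to normalize before comparing; a filesystem that stores names as plain byte sequences does not do this on its own.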
I wish I could blame him, but I have no way to contact him.

Anyway, this is the first time in a year that it has happened again. And, as you explained, it is always related to zip archives (I HATE them).

As you suggested, I will close the OpenSUSE bug right after this message.

Anyway, thank you for ALL THE DETAILED information about this issue. Regards ;)