Bug 393901 - can't unzip password protected files with password not in UTF-8 encoding
Status: CONFIRMED
Alias: None
Product: ark
Classification: Applications
Component: plugins
Version: unspecified
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Elvis Angelaccio
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-06 09:29 UTC by Jakub Holý
Modified: 2022-12-04 12:23 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
test files to reproduce (1.19 KB, application/octet-stream)
2019-02-13 01:47 UTC, Jakub Holý

Description Jakub Holý 2018-05-06 09:29:20 UTC
I have a zip file, that has been packed on Windows and is password protected. Unfortunately, the password has some local characters from cp1250 charset - like ž, ř, ů.

When I try to unpack the zip, Ark asks me for password correctly, but there is no way I can enter the non-UTF8 characters into the field.

Can you add an encoding selector for the password?
Comment 1 Elvis Angelaccio 2018-05-06 09:37:23 UTC
(In reply to Jakub Holý from comment #0)
> I have a zip file, that has been packed on Windows and is password
> protected. Unfortunately, the password has some local characters from cp1250
> charset - like ž, ř, ů.
> 
> When I try to unpack the zip, Ark asks me for password correctly, but there
> is no way I can enter the non-UTF8 characters into the field.

How so? Are you saying that you cannot even _type_ those characters into the password dialog? I just tried and I was able to paste the ů character just fine.
Comment 2 Jakub Holý 2018-05-06 10:29:19 UTC
> How so? Are you saying that you cannot even _type_ those characters into the
> password dialog? I just tried and I was able to paste the ů character just
> fine.

I cannot type the cp1250-encoded ů: the dialog just enters the Unicode ů (U+016F, which is 0xC5 0xAF in UTF-8), not the single cp1250 byte 0xF9.
Comment 3 Christoph Feck 2018-05-06 11:50:23 UTC
Are you able to extract the file using 7z or zip from the command line?
Comment 4 Jakub Holý 2018-05-06 11:54:58 UTC
Yes, but it's a little complicated.

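I have to write the password to a file, convert it with `recode utf8..cp1250 pass.txt`, and then pass the converted bytes to unzip with `unzip -P "$(cat pass.txt)" zipfile.zip` (`-P` expects the password itself, not a file name).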

It's quite difficult, and impossible for my mum :-)
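
The same workaround can be sketched in Python: encode the typed password to the archive's legacy code page and hand the raw bytes to the extractor. This is only a minimal sketch, assuming the archive uses traditional ZipCrypto encryption (the only kind Python's standard `zipfile` module can decrypt). The helper names `encode_password` and `extract_with_encoded_password` are hypothetical and not part of any tool mentioned in this report.

```python
import zipfile


def encode_password(password: str, encoding: str = "cp1250") -> bytes:
    """Return the password as raw bytes in the given legacy encoding."""
    return password.encode(encoding)


def extract_with_encoded_password(zip_path: str, password: str,
                                  encoding: str = "cp1250",
                                  dest: str = ".") -> None:
    """Extract zip_path into dest, using the password re-encoded to `encoding`."""
    with zipfile.ZipFile(zip_path) as archive:
        # zipfile expects the password as bytes and does no conversion itself.
        archive.extractall(path=dest, pwd=encode_password(password, encoding))
```

A call such as `extract_with_encoded_password("zipfile.zip", typed_password)` would then replace the `recode` and `unzip` steps, with `typed_password` being whatever the user entered as ordinary Unicode text.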
Comment 5 Viorel-Cătălin Răpițeanu 2019-02-12 22:41:29 UTC
I was unable to reproduce this using the latest version of Ark. Can you retest this using the latest versions?
Comment 6 Jakub Holý 2019-02-13 01:47:28 UTC
Created attachment 118030
test files to reproduce

I have created an example :-)

The attached zip (a single archive, so I would not have to upload 4 files) contains a few files.

hello.txt - the file I zipped
pass_utf8.txt - contains the password in UTF-8, which I can enter quite easily on any Czech keyboard or by copy/paste; the following file was encrypted with it
hello_utf8.zip - zip file encrypted with the password "žena" ("woman" in Czech)

You can try this and you should be able to extract the zip just fine.

pass_cp1250.txt - also "žena" password, but converted to cp1250
hello_cp1250.zip - encrypted with the password in cp1250 encoding

If you try to open this, there is no way you can write the cp1250 password into the password prompt.

(Examine both passwords in a hex editor: they really are different, and the UTF-8 password is 1 byte longer.)
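
For readers without a hex editor, the difference can be reproduced with a few lines of Python. This sketch encodes the string "žena" directly instead of reading the attached files, so it assumes the files contain exactly the password and nothing else (for example, no trailing newline).

```python
password = "žena"

utf8_bytes = password.encode("utf-8")
cp1250_bytes = password.encode("cp1250")

print(utf8_bytes.hex(" "))    # c5 be 65 6e 61
print(cp1250_bytes.hex(" "))  # 9e 65 6e 61
print(len(utf8_bytes) - len(cp1250_bytes))  # 1
```

Only the first character differs: ž takes two bytes in UTF-8 and one byte in cp1250, while "ena" is plain ASCII in both.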


So what I would like to have:
where: password prompt dialogue window
what: something like the encoding selectbox here https://i.imgur.com/1cwfiCr.png
Comment 7 Viorel-Cătălin Răpițeanu 2019-02-13 12:54:56 UTC
I finally understood the scenario. Thanks for the provided zip-file.
This indeed looks like an issue.

It's kind of odd that not even Kate detects the encoding (cp1250) when first opening the file.
Comment 8 Jakub Holý 2019-02-13 17:55:53 UTC
There is no way for Kate to know, as there is no BOM.
It might be cp1250 as well as cp1252 or cp1251 or any other single byte encoding.
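
A small Python illustration of that ambiguity, using the byte 0xF9 from Comment 2: the same byte is a valid but different character in each of the three code pages named above, so the bytes alone cannot tell an editor which one was intended.

```python
raw = b"\xf9"  # a single byte with no BOM or other encoding marker

# Decode the same byte under three single-byte Windows code pages.
decoded = {encoding: raw.decode(encoding)
           for encoding in ("cp1250", "cp1251", "cp1252")}

for encoding, text in decoded.items():
    print(encoding, text)
# cp1250 ů
# cp1251 щ
# cp1252 ù
```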
Comment 9 2wxsy58236r3 2020-07-05 07:06:01 UTC
Bug reporter, would you please try The Unarchiver (unar)? I can unzip the file by `unar -E windows-1250 hello_cp1250.zip`.

Just type the password in UTF-8 and I believe that `unar` will convert it to the specified encoding.
Comment 10 Jakub Holý 2020-07-05 07:27:44 UTC
Yes, `unar` works, as does the previously mentioned `recode`+`unzip` approach.

But this is still not a GUI solution :-(

Thanks anyway
Comment 11 2wxsy58236r3 2020-08-31 09:39:28 UTC
A libzip developer says that libzip just takes the password bytes as given: it does not check the encoding and makes no assumptions about it. [1]

Hope this helps the Ark developers.

[1] https://github.com/nih-at/libzip/issues/207#issuecomment-683630284
Comment 12 Elvis Angelaccio 2022-12-03 11:49:24 UTC
I played a bit with the test file (thanks for that!).

If we convert the UTF-16 password provided by the ark password dialog to the windows-1250 encoding using QTextCodec, the archive is extracted just fine (since libzip just uses the raw bytes and doesn't care about the encoding). 

The problem is: how do we ask the user which encoding they want to use for the password? We can't really do it in the password dialog, because ark uses the general-purpose KPasswordDialog provided by kwidgetsaddons.

The easiest thing could be to add a dropdown menu in the ark settings to configure which encoding to use for passwords.

There is the additional problem that QTextCodec is gone in Qt6 and the replacement doesn't yet have feature parity: https://phabricator.kde.org/T14154

But since we need QTextCodec for Kate, I don't think it would be too bad if we keep using it in Ark too.
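
The conversion step discussed in this comment can be illustrated outside Qt. The following Python sketch is not Ark's code and does not use QTextCodec. It assumes, as a variation on the proposed setting, that the user configures an ordered list of encodings rather than a single one, and it produces the distinct byte strings an extractor could try in turn. The function name `candidate_password_bytes` is hypothetical.

```python
from typing import Iterable, List


def candidate_password_bytes(password: str,
                             encodings: Iterable[str]) -> List[bytes]:
    """Encode the password in each encoding, skipping failures and duplicates."""
    candidates: List[bytes] = []
    for encoding in encodings:
        try:
            encoded = password.encode(encoding)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the password
        if encoded not in candidates:
            candidates.append(encoded)
    return candidates
```

For "žena" with the list `["utf-8", "cp1250", "cp1252", "ascii"]`, this yields two candidates: the UTF-8 bytes and the single-byte form shared by cp1250 and cp1252. ASCII is skipped because it cannot represent ž. Since libzip uses the raw bytes as given, each candidate could be passed to it unchanged.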