Bug 403557 - UTF-8 characters are not saved correctly using the implicit path configuration module
Summary: UTF-8 characters are not saved correctly using the implicit path configuratio...
Status: RESOLVED FIXED
Alias: None
Product: systemsettings
Classification: Applications
Component: kcm_desktoppath (show other bugs)
Version: 5.15.0
Platform: Arch Linux Linux
: NOR normal
Target Milestone: ---
Assignee: Unassigned bugs mailing-list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-24 13:19 UTC by Viorel-Cătălin Răpițeanu
Modified: 2019-02-20 16:19 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In: 5.56
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Viorel-Cătălin Răpițeanu 2019-01-24 13:19:07 UTC
SUMMARY
UTF-8 characters are not saved correctly using the implicit path configuration module.

STEPS TO REPRODUCE
1. Change the implicit path for a type of files (like Downloads) to contain special characters.

OBSERVED RESULT
1. The path chosen 'Téléchargements' is saved as encoded: 'T\xc3\xa9l\xc3\xa9chargements'


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.14.90
KDE Plasma Version: 5.14.90
KDE Frameworks Version: 5.54.0
Qt Version: 5.12.0-3
Comment 1 Viorel-Cătălin Răpițeanu 2019-01-24 13:21:36 UTC
Video of the described issue:
https://drive.google.com/open?id=1OnMYnry8tabB-JqoOuCPb53R_XCNEYAU
Comment 2 Christophe Marin 2019-01-24 13:45:35 UTC
Confirmed.
Comment 3 Viorel-Cătălin Răpițeanu 2019-02-16 01:05:21 UTC
D17651 is the cause of this regression.
Comment 4 Albert Astals Cid 2019-02-16 23:57:10 UTC
Jos can you shed some light here?
Comment 5 Jos van den Oever 2019-02-17 11:23:56 UTC
The information for folder preferences is written to $HOME/.config/user-dirs.dirs

I've tried to recreate the bug by renaming my Downloads folder to Téléchargements. This gave me
```
XDG_DOWNLOAD_DIR="$HOME/Téléchargements"
``
in $HOME/.config/user-dirs.dirs
Reopening the settings dialog again showed me Téléchargements.

My environment has LANG=en_US.UTF-8.
KDE Frameworks 5.49.0 which is from before D17651.

I've tried asking LXR which code writes user-dirs.dirs.
https://lxr.kde.org/search?_filestring=&_string=user-dirs.dirs
but I'm not getting a reply.
Comment 7 Albert Astals Cid 2019-02-17 16:57:42 UTC
i think the problem here is that you're escaping a string that people don't expect to be escaped, so when Qt reads it back it breaks.

Why did you decided to escape anything bigger than 127 though? that's perfectly valid utf-8, isn't it?
Comment 8 Jos van den Oever 2019-02-17 17:51:10 UTC
I've just written a test to recreate the issue.

void KConfigTest::testQStringUtf8()
{
    QTemporaryFile file;
    QVERIFY(file.open());
    KConfig config(file.fileName(), KConfig::SimpleConfig);
    KConfigGroup general(&config, "General");
    const QString value("Téléchargements");
    general.writeEntry("Utf8", value);
    config.sync();
    file.flush();
    file.close();
    QFile readFile(file.fileName());
    QVERIFY(readFile.open(QFile::ReadOnly));
    // check that reading works
    KConfig config2(file.fileName(), KConfig::SimpleConfig);
    KConfigGroup general2(&config2, "General");
    QCOMPARE(value, general2.readEntry("Utf8", QByteArray()));
}

This passes. The é is escaped in the file. This is not needed. Any value above 126 is non-printable or valid UTF8 or another binary value.
Not escaping valid UTF8 sequences is an improvement. But that is not the issue here.

"Téléchargements" is escaped on saving. That is fine. But it should be unescaped when loading. This happens in the test, but not in plasma-desktop/kcms/desktoppaths/globalpaths.cpp.
Comment 9 Albert Astals Cid 2019-02-17 17:57:35 UTC
(In reply to Jos van den Oever from comment #8)
> I've just written a test to recreate the issue.
> 
> void KConfigTest::testQStringUtf8()
> {
>     QTemporaryFile file;
>     QVERIFY(file.open());
>     KConfig config(file.fileName(), KConfig::SimpleConfig);
>     KConfigGroup general(&config, "General");
>     const QString value("Téléchargements");
>     general.writeEntry("Utf8", value);
>     config.sync();
>     file.flush();
>     file.close();
>     QFile readFile(file.fileName());
>     QVERIFY(readFile.open(QFile::ReadOnly));
>     // check that reading works
>     KConfig config2(file.fileName(), KConfig::SimpleConfig);
>     KConfigGroup general2(&config2, "General");
>     QCOMPARE(value, general2.readEntry("Utf8", QByteArray()));
> }
> 
> This passes. The é is escaped in the file. This is not needed. Any value
> above 126 is non-printable or valid UTF8 or another binary value.
> Not escaping valid UTF8 sequences is an improvement. But that is not the
> issue here.
> 
> "Téléchargements" is escaped on saving. That is fine. But it should be
> unescaped when loading. This happens in the test, but not in
> plasma-desktop/kcms/desktoppaths/globalpaths.cpp.

I guess because the loading is done through
QUrl::fromLocalFile(QStandardPaths::writableLocation(QStandardPaths::DocumentsLocation));
Comment 10 Viorel-Cătălin Răpițeanu 2019-02-17 18:04:51 UTC
(In reply to Jos van den Oever from comment #8)
> "Téléchargements" is escaped on saving. That is fine. But it should be
> unescaped when loading. This happens in the test, but not in
> plasma-desktop/kcms/desktoppaths/globalpaths.cpp.

The problem is that any application(not just globalpaths) should load that directory path unescaped. In our case, that doesn't happen with firefox (that's just an application I've seen this happening).
Comment 11 Jos van den Oever 2019-02-17 18:20:59 UTC
globalpaths.cpp should not use KConfig to write user-dirs.dirs

KConfig escapes bytes >= 127. Improving that so that it does not escape UTF8 would be nice, but it would only solve this issue for users with UTF8 locales.
I think that `user-dirs.dirs` is written in the locale of the user.

https://code.woboq.org/qt5/qtbase/src/corelib/io/qstandardpaths_unix.cpp.html#177

This does not mention what the encoding of user-dirs.dirs is.
http://manpages.ubuntu.com/manpages/cosmic/man5/user-dirs.dirs.5.html
Comment 12 Jos van den Oever 2019-02-17 19:16:02 UTC
xdg-user-dirs documentation says: "This file is in a shell format, so its easy to access from a shell script." 

https://www.freedesktop.org/wiki/Software/xdg-user-dirs/

The tool xdg-user-dirs does not assume an encoding:
https://cgit.freedesktop.org/xdg/xdg-user-dirs/tree/xdg-user-dir-lookup.c#n84
Comment 13 Jos van den Oever 2019-02-17 22:43:29 UTC
Even though KConfig should not be used by globalpaths.cpp, KConfig can be improved.

https://phabricator.kde.org/D19107 does that.

But note that while the bug described here will probably go away for the reporters setup, the bug in globalpaths.cpp is not really solved.
Comment 14 David Faure 2019-02-18 21:44:03 UTC
Aren't all Unix distributions using utf-8 locales these days?
Comment 15 Jos van den Oever 2019-02-19 13:51:22 UTC
Most distributions do.

The first answer on this question affirms that assuming UTF-8 is generally safe.

https://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
Comment 16 Jos van den Oever 2019-02-20 16:19:22 UTC
Git commit 2cdcd4f30666fd1095ab7cf31361e404db871075 by Jos van den Oever.
Committed on 20/02/2019 at 16:19.
Pushed by vandenoever into branch 'master'.

Write valid UTF8 characters without escaping.

Summary:
commit 6a18528 introduced escaping of bytes >= 127 to ensure that
KConfig files are valid UTF8.
The simplistic approach with a cutoff results in many escaped bytes
where it is not required. Especially non-western configuration files
would have many escapes.

This commit fixes that by only escaping bytes that are not valid UTF8.
FIXED-IN: 5.56

Test Plan: ninja && ninja test

Reviewers: dfaure, arichardson, apol, #frameworks, thiago

Subscribers: rapiteanu, kde-frameworks-devel

Tags: #frameworks

Differential Revision: https://phabricator.kde.org/D19107

M  +52   -2    autotests/kconfigtest.cpp
M  +2    -0    autotests/kconfigtest.h
M  +113  -9    src/core/kconfigini.cpp

https://commits.kde.org/kconfig/2cdcd4f30666fd1095ab7cf31361e404db871075