SUMMARY UTF-8 characters are not saved correctly using the implicit path configuration module. STEPS TO REPRODUCE 1. Change the implicit path for a type of files (like Downloads) to contain special characters. OBSERVED RESULT 1. The path chosen 'Téléchargements' is saved as encoded: 'T\xc3\xa9l\xc3\xa9chargements' SOFTWARE/OS VERSIONS Linux/KDE Plasma: 5.14.90 KDE Plasma Version: 5.14.90 KDE Frameworks Version: 5.54.0 Qt Version: 5.12.0-3
Video of the described issue: https://drive.google.com/open?id=1OnMYnry8tabB-JqoOuCPb53R_XCNEYAU
Confirmed.
D17651 is the cause of this regression.
Jos can you shed some light here?
The information for folder preferences is written to $HOME/.config/user-dirs.dirs I've tried to recreate the bug by renaming my Downloads folder to Téléchargements. This gave me ``` XDG_DOWNLOAD_DIR="$HOME/Téléchargements" `` in $HOME/.config/user-dirs.dirs Reopening the settings dialog again showed me Téléchargements. My environment has LANG=en_US.UTF-8. KDE Frameworks 5.49.0 which is from before D17651. I've tried asking LXR which code writes user-dirs.dirs. https://lxr.kde.org/search?_filestring=&_string=user-dirs.dirs but I'm not getting a reply.
Got a reply now. https://lxr.kde.org/source/kde/workspace/plasma-desktop/kcms/desktoppaths/globalpaths.cpp
i think the problem here is that you're escaping a string that people don't expect to be escaped, so when Qt reads it back it breaks. Why did you decided to escape anything bigger than 127 though? that's perfectly valid utf-8, isn't it?
I've just written a test to recreate the issue. void KConfigTest::testQStringUtf8() { QTemporaryFile file; QVERIFY(file.open()); KConfig config(file.fileName(), KConfig::SimpleConfig); KConfigGroup general(&config, "General"); const QString value("Téléchargements"); general.writeEntry("Utf8", value); config.sync(); file.flush(); file.close(); QFile readFile(file.fileName()); QVERIFY(readFile.open(QFile::ReadOnly)); // check that reading works KConfig config2(file.fileName(), KConfig::SimpleConfig); KConfigGroup general2(&config2, "General"); QCOMPARE(value, general2.readEntry("Utf8", QByteArray())); } This passes. The é is escaped in the file. This is not needed. Any value above 126 is non-printable or valid UTF8 or another binary value. Not escaping valid UTF8 sequences is an improvement. But that is not the issue here. "Téléchargements" is escaped on saving. That is fine. But it should be unescaped when loading. This happens in the test, but not in plasma-desktop/kcms/desktoppaths/globalpaths.cpp.
(In reply to Jos van den Oever from comment #8) > I've just written a test to recreate the issue. > > void KConfigTest::testQStringUtf8() > { > QTemporaryFile file; > QVERIFY(file.open()); > KConfig config(file.fileName(), KConfig::SimpleConfig); > KConfigGroup general(&config, "General"); > const QString value("Téléchargements"); > general.writeEntry("Utf8", value); > config.sync(); > file.flush(); > file.close(); > QFile readFile(file.fileName()); > QVERIFY(readFile.open(QFile::ReadOnly)); > // check that reading works > KConfig config2(file.fileName(), KConfig::SimpleConfig); > KConfigGroup general2(&config2, "General"); > QCOMPARE(value, general2.readEntry("Utf8", QByteArray())); > } > > This passes. The é is escaped in the file. This is not needed. Any value > above 126 is non-printable or valid UTF8 or another binary value. > Not escaping valid UTF8 sequences is an improvement. But that is not the > issue here. > > "Téléchargements" is escaped on saving. That is fine. But it should be > unescaped when loading. This happens in the test, but not in > plasma-desktop/kcms/desktoppaths/globalpaths.cpp. I guess because the loading is done through QUrl::fromLocalFile(QStandardPaths::writableLocation(QStandardPaths::DocumentsLocation));
(In reply to Jos van den Oever from comment #8) > "Téléchargements" is escaped on saving. That is fine. But it should be > unescaped when loading. This happens in the test, but not in > plasma-desktop/kcms/desktoppaths/globalpaths.cpp. The problem is that any application(not just globalpaths) should load that directory path unescaped. In our case, that doesn't happen with firefox (that's just an application I've seen this happening).
globalpaths.cpp should not use KConfig to write user-dirs.dirs KConfig escapes bytes >= 127. Improving that so that it does not escape UTF8 would be nice, but it would only solve this issue for users with UTF8 locales. I think that `user-dirs.dirs` is written in the locale of the user. https://code.woboq.org/qt5/qtbase/src/corelib/io/qstandardpaths_unix.cpp.html#177 This does not mention what the encoding of user-dirs.dirs is. http://manpages.ubuntu.com/manpages/cosmic/man5/user-dirs.dirs.5.html
xdg-user-dirs documentation says: "This file is in a shell format, so its easy to access from a shell script." https://www.freedesktop.org/wiki/Software/xdg-user-dirs/ The tool xdg-user-dirs does not assume an encoding: https://cgit.freedesktop.org/xdg/xdg-user-dirs/tree/xdg-user-dir-lookup.c#n84
Even though KConfig should not be used by globalpaths.cpp, KConfig can be improved. https://phabricator.kde.org/D19107 does that. But note that while the bug described here will probably go away for the reporters setup, the bug in globalpaths.cpp is not really solved.
Aren't all Unix distributions using utf-8 locales these days?
Most distributions do. The first answer on this question affirms that assuming UTF-8 is generally safe. https://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux
Git commit 2cdcd4f30666fd1095ab7cf31361e404db871075 by Jos van den Oever. Committed on 20/02/2019 at 16:19. Pushed by vandenoever into branch 'master'. Write valid UTF8 characters without escaping. Summary: commit 6a18528 introduced escaping of bytes >= 127 to ensure that KConfig files are valid UTF8. The simplistic approach with a cutoff results in many escaped bytes where it is not required. Especially non-western configuration files would have many escapes. This commit fixes that by only escaping bytes that are not valid UTF8. FIXED-IN: 5.56 Test Plan: ninja && ninja test Reviewers: dfaure, arichardson, apol, #frameworks, thiago Subscribers: rapiteanu, kde-frameworks-devel Tags: #frameworks Differential Revision: https://phabricator.kde.org/D19107 M +52 -2 autotests/kconfigtest.cpp M +2 -0 autotests/kconfigtest.h M +113 -9 src/core/kconfigini.cpp https://commits.kde.org/kconfig/2cdcd4f30666fd1095ab7cf31361e404db871075