Bug 432406 - Cannot open file with non-ascii name when LC_ALL is not set
Summary: Cannot open file with non-ascii name when LC_ALL is not set
Status: RESOLVED FIXED
Alias: None
Product: dolphin
Classification: Applications
Component: general (show other bugs)
Version: 20.08.3
Platform: Gentoo Packages Linux
: NOR normal
Target Milestone: ---
Assignee: tusooa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-02 04:58 UTC by tusooa
Modified: 2021-02-05 21:07 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In: 5.79


Attachments
Backtrace with /usr/bin/dolphin and system kio (3.57 KB, text/plain)
2021-02-04 12:59 UTC, tusooa
Details
Backtrace with the dolphin and kio I compiled in Debug mode (3.91 KB, text/plain)
2021-02-04 13:00 UTC, tusooa
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tusooa 2021-02-02 04:58:39 UTC
SUMMARY

Trying to open a file whose absolute path contains non-ascii will prompt that the file does not exist. Setting LC_ALL to a utf-8 locale seems to be a workaround.

STEPS TO REPRODUCE
1. `touch 测试`
2. `locale`
3. Try to open that file in dolphin
4. (unset LC_CTYPE;export LANG=en_CA.UTF-8;dolphin) and try again
5. (unset LC_CTYPE;export LC_ALL=en_CA.UTF-8;dolphin) and try again


OBSERVED RESULT

2. 
] locale
LANG=en_CA.UTF-8
LC_CTYPE=zh_CN.UTF-8
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=

3,4: Pops up a dialog with "Unable to run the command specified. The file or folder /home/tusooa/测试 does not exist."

5: Pops up a dialog to choose an application to open that file. After choosing one (/usr/bin/emacsclient), the file is successfully opened.

EXPECTED RESULT

3,4 should behave like 5.

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 5.20.5
(available in About System)
KDE Plasma Version: 5.20.5
KDE Frameworks Version: 5.77.0
Qt Version: 5.15.2

ADDITIONAL INFORMATION

This only applies to opening files, not directories.

The file can be trashed properly.

It seems to matter only when an external program is needed to open the file.

Konqueror seems to also suffer from this bug. However, the LC_ALL trick will do nothing to it. Also it does not happen when you are opening an html file in Konqueror.

I am not sure whether this is a Dolphin and Konqueror bug, or this is a kio bug. If my judgement is wrong, please move it to the right product.
Comment 1 tusooa 2021-02-03 13:58:24 UTC
With no LC_ALL=en_CA.UTF-8,

codec is  "US-ASCII" mibEnum= 3
locale is "en_CA" ,  "en-CA"

With LC_ALL=en_CA.UTF-8 the codec is set to UTF-8. Not sure why.
Comment 2 tusooa 2021-02-04 00:51:20 UTC
QTextCodec gets the charset from ucnv_getDefaultName() which will return "US-ASCII" if setlocale(LC_ALL, "") is not called before that. 

https://bugreports.qt.io/browse/QTBUG-57522 describes how this case may happen. And it seems the Qt person refused to fix it. This may be the cause of this bug.
Comment 3 tusooa 2021-02-04 12:59:41 UTC
Created attachment 135420 [details]
Backtrace with /usr/bin/dolphin and system kio
Comment 4 tusooa 2021-02-04 13:00:15 UTC
Created attachment 135421 [details]
Backtrace with the dolphin and kio I compiled in Debug mode
Comment 5 tusooa 2021-02-04 13:00:55 UTC
I used the following code to override ucnv_getDefaultName() to call std::terminate() and thus allow us to get a backtrace:

```
#include <exception>

extern "C" const char *ucnv_getDefaultName(void)
{
    std::terminate();
    return "";
}
```

Compile it with `g++ ucnv-override.cpp -shared -fPIC -o ucnv-override.so` and set LD_PRELOAD=/path/to/ucnv-override.so , we get two backtraces: one with system dolphin and kio, another with those I compiled with debugging symbols.
Comment 6 tusooa 2021-02-04 13:14:04 UTC
Replacing the 

static QLoggingCategory category("kf.kio.widgets.kdirmodel", QtInfoMsg);

in kdirmodel.cpp with

Q_LOGGING_CATEGORY(category, "kf.kio.widgets.kdirmodel", QtInfoMsg);

resolves this bug.

From https://doc.qt.io/qt-5/qloggingcategory.html#Q_LOGGING_CATEGORY-1 :
"The implicitly-defined QLoggingCategory object is created on first use, in a thread-safe manner."

The original way to explicitly define a static QLoggingCategory will lead to a call to ucnv_getDefaultName() before QApplication constructor (where setlocale() is called), thus making QTextCodec::codecForLocale() misbehave. Here we replace the explicit definition with the Q_LOGGING_CATEGORY macro, and thus avoid this problem.
Comment 7 tusooa 2021-02-04 13:20:19 UTC
Git commit b3b545bb47b2dfbff3f88a60c0922bdb589f20ce by Tusooa Zhu.
Committed on 04/02/2021 at 13:15.
Pushed by tusooaw into branch 'tusooaw/432406-non-ascii-file'.

Fix default codec being set to "US-ASCII" in KIO apps

From https://doc.qt.io/qt-5/qloggingcategory.html#Q_LOGGING_CATEGORY-1 :
"The implicitly-defined QLoggingCategory object is created on first use,
in a thread-safe manner."

The original way to explicitly define a static QLoggingCategory will lead
to a call to ucnv_getDefaultName() before QApplication constructor (where
setlocale() is called), thus making QTextCodec::codecForLocale() misbehave.
Here we replace the explicit definition with the Q_LOGGING_CATEGORY macro,
and thus avoid this problem.

M  +1    -1    src/widgets/kdirmodel.cpp

https://invent.kde.org/frameworks/kio/commit/b3b545bb47b2dfbff3f88a60c0922bdb589f20ce
Comment 8 tusooa 2021-02-04 14:17:41 UTC
As this is not yet merged, reopen this.
Comment 9 Luigi Toscano 2021-02-04 14:25:53 UTC
It may not be the best policy, but bugs are marked RESOLVED/FIXED when the fix is merged, so let's keep it for consistency.

(I may have heard the original plan was to use CLOSED/FIXED as the final-final state, but it never happened. Anyway the "Latest Commit" field should help).
Comment 10 Nate Graham 2021-02-04 14:32:44 UTC
Tusooa meant that the branch has not yet been merged to master. The hookscript got confused because the branch was not prefixed with work/, so it interpreted the Bug: keyword to mean that this bug should be closed.
Comment 11 tusooa 2021-02-04 14:36:22 UTC
(In reply to Nate Graham from comment #10)
> Tusooa meant that the branch has not yet been merged to master. The
> hookscript got confused because the branch was not prefixed with work/, so
> it interpreted the Bug: keyword to mean that this bug should be closed.

Yes, that is what i mean--Because in Krita people tend to name their branch prefixed with their username and I thought doing that will not trigger the closing of a bug, but looks like I messed up.
Comment 12 tusooa 2021-02-05 13:04:01 UTC
Git commit 423bea37e4bb06b9c04a62a9a91b720ec5d5e3d5 by Tusooa Zhu.
Committed on 04/02/2021 at 23:06.
Pushed by tusooaw into branch 'master'.

Use Q_LOGGING_CATEGORY macro instead of explicit QLoggingCategory

Explicitly defining a QLoggingCategory will make QTextCodec::codecForLocale()
misbehave, so we change it to Q_LOGGING_CATEGORY macro.

M  +1    -1    src/kpasswdserver/kpasswdserver.cpp
M  +1    -1    src/urifilters/ikws/kuriikwsfilter.cpp
M  +1    -1    src/urifilters/ikws/kuriikwsfiltereng.cpp
M  +1    -1    src/urifilters/ikws/kurisearchfilter.cpp
M  +1    -1    src/urifilters/localdomain/localdomainurifilter.cpp
M  +1    -1    src/urifilters/shorturi/kshorturifilter.cpp

https://invent.kde.org/frameworks/kio/commit/423bea37e4bb06b9c04a62a9a91b720ec5d5e3d5
Comment 13 tusooa 2021-02-05 13:04:09 UTC
Git commit 0a13e0a3e830be2b2b2e5c2c6cf8cef25bd68bd8 by Tusooa Zhu.
Committed on 04/02/2021 at 14:45.
Pushed by tusooaw into branch 'master'.

Fix default codec being set to "US-ASCII" in KIO apps

From https://doc.qt.io/qt-5/qloggingcategory.html#Q_LOGGING_CATEGORY-1 :
"The implicitly-defined QLoggingCategory object is created on first use,
in a thread-safe manner."

The original way to explicitly define a static QLoggingCategory will lead
to a call to ucnv_getDefaultName() before QApplication constructor (where
setlocale() is called), thus making QTextCodec::codecForLocale() misbehave.
Here we replace the explicit definition with the Q_LOGGING_CATEGORY macro,
and thus avoid this problem.

M  +1    -1    src/widgets/kdirmodel.cpp

https://invent.kde.org/frameworks/kio/commit/0a13e0a3e830be2b2b2e5c2c6cf8cef25bd68bd8