Summary: | krosspython: UTF-8 python strings are encoded as ASCII when converted to QString | ||
---|---|---|---|
Product: | [Developer tools] bindings | Reporter: | Daniel Calviño Sánchez <danxuliu> |
Component: | general | Assignee: | kde-bindings |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: | |||
Attachments: | Patch for unittest.py showing the bug; Patch for pythonvariant.h with a workaround to the bug | ||
Description
Daniel Calviño Sánchez
2009-10-01 00:38:43 UTC
Created attachment 37271
Patch for unittest.py showing the bug
Here is a unit test showing the problem: the file is encoded in UTF-8, and so are the strings, but the string returned by the method differs from the Python string that was passed in.
However, as the summary says, the problem does not happen when the string is returned by the method (and thus converted from QString to a Python string), but when it is passed to the method (and converted from a Python string to QString).
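The round trip that the unit test exercises can be simulated in plain Python 3, without Kross or Qt. This is a minimal sketch under stated assumptions: a QString is modelled as a Python `str`, the bytes behind `Py::String(obj).as_string()` as a `bytes` object, the default `QString(const char*)` conversion as Latin-1 decoding, and the return direction (QString to Python) as UTF-8 encoding. All helper names are hypothetical.

```python
# Hypothetical simulation of the Kross round trip; no Kross or Qt code is used.

def python_to_qstring_buggy(data: bytes) -> str:
    # Stand-in for QString(const char*) with no codec for C strings set:
    # every byte becomes one Latin-1 character.
    return data.decode("latin-1")

def python_to_qstring_fixed(data: bytes) -> str:
    # Stand-in for QString::fromUtf8(const char*): multibyte sequences are decoded.
    return data.decode("utf-8")

def qstring_to_python(qstring: str) -> bytes:
    # Assumed (not confirmed) model of the return direction: encode as UTF-8.
    return qstring.encode("utf-8")

original = "Daniel Calviño Sánchez".encode("utf-8")

buggy = qstring_to_python(python_to_qstring_buggy(original))
fixed = qstring_to_python(python_to_qstring_fixed(original))

print(buggy == original)  # False: the string was mangled on the way in
print(fixed == original)  # True
```

Only the inbound conversion differs between the two cases, which matches the observation that the corruption happens when the string is passed in, not when it is returned.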
Created attachment 37272
Patch for pythonvariant.h with a workaround to the bug
Here is a workaround for the bug. I am not familiar with the Kross code, so there may be a better way to do it. The problem was that Py::String(obj).as_string().c_str() returned the const char* data of a std::string, in which each multibyte UTF-8 character is spread over several C++ chars. As toVariant(const Py::Object& obj) (the method this statement belongs to) returns a QString, the const char* was implicitly converted to QString through the QString(const char* str) constructor. That constructor does not decode UTF-8: it treats its argument as an 8-bit string (by default, every byte becomes one Latin-1 character) and thus ignores multibyte sequences.
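A small Python 3 illustration of the byte splitting. Latin-1 decoding is used as a stand-in (an assumption, not Qt code) for the byte-per-character `QString(const char*)` conversion:

```python
# "ñ" is one character, but two bytes in UTF-8; those bytes are what the
# const char* returned by c_str() points at.
text = "ñ"
raw = text.encode("utf-8")
print(len(text), len(raw))  # 1 2
print(list(raw))            # [195, 177]

# Treating each byte as its own character yields two characters instead of one.
split = raw.decode("latin-1")
print(split, len(split))    # Ã± 2
```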
To initialize a QString from a UTF-8 const char* while respecting multibyte characters, the static method QString::fromUtf8(const char*) has to be used. Since 7-bit characters are encoded identically in UTF-8 and ASCII, this stays backwards compatible with existing code that passed ASCII strings.
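The backwards-compatibility argument can be checked in Python 3. Here `bytes.decode("utf-8")` stands in for `QString::fromUtf8()` and `bytes.decode("latin-1")` for the old byte-per-character conversion; both are assumptions about equivalent behaviour, not Qt calls:

```python
# Every 7-bit byte decodes to the same character under both interpretations,
# so ASCII-only input is unaffected by switching to UTF-8 decoding.
ascii_bytes = bytes(range(128))
print(ascii_bytes.decode("utf-8") == ascii_bytes.decode("latin-1"))  # True

# Bytes >= 0x80 only occur inside multibyte UTF-8 sequences, and that is
# where the two interpretations diverge.
raw = "Sánchez".encode("utf-8")
print(raw.decode("utf-8"))    # Sánchez
print(raw.decode("latin-1"))  # SÃ¡nchez
```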
Of course, when Python strings use an encoding other than UTF-8, the same problem of badly encoded QStrings arises again, although I think it is fairly safe to assume that most strings will be encoded in UTF-8.
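A sketch of that remaining limitation, again in Python 3 with Python's own UTF-8 decoder as a stand-in. How Qt's fromUtf8() handles invalid input is not modelled here. The example assumes a string whose bytes are Latin-1 rather than UTF-8, as a Python 2 `str` literal could be when the source file declares a different coding:

```python
# "ñ" encoded as Latin-1 is the single byte 0xF1, which is not valid UTF-8 by itself.
latin1_bytes = "ñ".encode("latin-1")

# A strict UTF-8 decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("strict UTF-8 decode fails")

# A lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER, so the text is still wrong.
print(latin1_bytes.decode("utf-8", errors="replace") == "\ufffd")  # True
```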
SVN commit 1030960 by sebsauer:

fix UTF-8 python strings encoding
Patch by Daniel Calviño Sánchez
BUG:209046

M +1 -1 pythonvariant.h

WebSVN link: http://websvn.kde.org/?view=rev&revision=1030960

SVN commit 1030961 by sebsauer:

backport r1030960 from trunk to 4.3 branch; fix UTF-8 python strings encoding
Patch by Daniel Calviño Sánchez
BUG:209046

M +1 -1 pythonvariant.h

WebSVN link: http://websvn.kde.org/?view=rev&revision=1030961

great work, thanks :)