Bug 93466 - xml document without encoding= not loaded in utf8
Summary: xml document without encoding= not loaded in utf8
Status: RESOLVED DUPLICATE of bug 55355
Alias: None
Product: kate
Classification: Applications
Component: part
Version: unspecified
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
Reported: 2004-11-17 20:23 UTC by S. Burmeister
Modified: 2005-11-07 21:40 UTC

Attachments
xml file (17.55 KB, text/plain)
2004-11-17 22:58 UTC, S. Burmeister

Description S. Burmeister 2004-11-17 20:23:59 UTC
Version:           post 3.3.0 (using KDE 3.3.1)
Installed from:    SuSE RPMs
OS:                Linux

I have a document whose first line is <?xml version="1.0"?>. I was told that if no encoding is set, XML should use UTF-8. However, when I open that document in Quanta, it is opened with the wrong encoding. It is displayed correctly only when I open it from within a project that has UTF-8 set as the default encoding.
Comment 1 András Manţia 2004-11-17 21:55:35 UTC
On Wednesday 17 November 2004 21:24, S.Burmeister wrote:
> I have a document that has <?xml version="1.0"?> as first line. I was
> told that if no encoding is set, XML should use utf8. However when I
> open that document in quanta it is opened with the wrong encoding.
More exactly? 

> Only when I open it from within a project with utf8 set to be the
> default encoding it is displayed correctly.
Sure, as I suppose Quanta (or more exactly the Kate part) detects the 
encoding by verifying the characters present in the document. You have 
two choices (or three):
1. in the open dialog select the encoding you want
2. set the global default encoding to UTF8
3. move the report to Kate and tell them to detect the encoding based on 
the file type & content (I mean, Kate detects if it's an XML file and 
has the charset line, etc.)

Andras

Comment 2 S. Burmeister 2004-11-17 22:55:23 UTC
> More exactly?

I'll attach it. BTW, there is another bug: this document gets no highlighting 
at all instead of XML highlighting, maybe because it's named .xsl?

> > Only when I open it from within a project with utf8 set to be the
> > default encoding it is displayed correctly.
>
> Sure, as I suppose Quanta (or more exactly the Kate part) detects the
> encoding by verifying the characters present in the document. You have
> two choices (or three):
> 1. in the open dialog select the encoding you want
> 2. set the global default encoding to UTF8
> 3. move the report to Kate and tell them to detect the encoding based on
> the file type & content (I mean, Kate detects if it's an XML file and
> has the charset line, etc.)

Yet, if I open it in kate it displays correctly. If I open an instance of 
quanta, then select the file in question and use open with..., quanta, the 
very same file is opened in another quanta instance and displayed incorrectly. 
I tried different settings. When quanta starts up and reopens a utf8 encoded 
project (because that project was open when quanta was last closed), the file 
is displayed correctly in that first instance. If I leave that instance open 
and open the same file in another instance using open with..., quanta starts 
without opening the project that was open on last exit (a feature I guess) 
and displays the file incorrectly. The file is also displayed incorrectly 
when I close all quanta instances with no project open and then open the 
file using open with...

So what I do not understand is, if kate gets it right without having a project 
open that tells it to use utf8 for the document, why would quanta fail?

Comment 3 S. Burmeister 2004-11-17 22:58:53 UTC
Created attachment 8322 [details]
xml file

the file in question
Comment 4 András Manţia 2004-11-22 12:01:06 UTC
On Wednesday 17 November 2004 23:55, S.Burmeister wrote:
> I'll attach it. BTW, there is another bug, as this document is not
> highlighted for XML but none, maybe because it's named .xsl?
Might be. The highlighting is done by Kate. Does it highlight correctly? 
Here Quanta highlights it correctly (xslt) as well, but I have KDE CVS 
HEAD.

> Yet, if I open it in kate it displays correctly. If I open an
> instance of quanta, then select the file in question, open with...,
> quanta, the very same file is opened in another quanta instance,
> displayed incorrectly. 

But what is the global encoding setting (Settings->Configure 
Quanta->Default character encoding) in your Quanta? Quanta uses that 
one if a document is opened in other ways but File->Open. In case of 
File->Open, you can override it in the Open dialog.

> I tried different settings and it seems that 
> when I open it with quanta opening a utf8 encoded project at the same
> time (because of it being opened when closing it before) the file is
> displayed correctly on the first instance, if I leave that instance
> open and open the same file in another instance, using open with...,
> quanta opens without opening the project that was open on last exit
> (a feature I guess)

Quanta doesn't allow opening the same project in two instances.

> and displaying the file incorrectly.

Because no project is loaded, the global encoding setting is used, 
not the project one.
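
The precedence described so far in this thread can be sketched roughly as follows. This is only an illustration of the order as stated here (Open dialog choice, then project default, then global default), not Quanta's actual code; the function and parameter names are hypothetical.

```python
from typing import Optional


def pick_encoding(
    dialog_choice: Optional[str],    # encoding chosen in the File->Open dialog, if any
    project_default: Optional[str],  # default encoding of the loaded project, if any
    global_default: str,             # Settings->Configure Quanta->Default character encoding
) -> str:
    """Return the first encoding that is set, in the order described in the thread."""
    for candidate in (dialog_choice, project_default):
        if candidate:
            return candidate
    # Nothing more specific is available (e.g. "open with..." and no project loaded).
    return global_default
```

With no dialog choice and no project loaded, the call falls through to the global default, which matches the behavior reported for files opened via "open with...".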

Please check that setting, and let me know if this was the problem or 
not.

Comment 5 S. Burmeister 2004-11-22 13:47:29 UTC
> Might be. The highlighting is done by Kate. Does it highlight correctly?
> Here Quanta highlights it correctly (xslt) as well, but I have KDE CVS
> HEAD.

Kate does not highlight it, as no highlighting is associated with *.xsl, 
yet it should recognise the <?xml, I guess.

> > Yet, if I open it in kate it displays correctly. If I open an
> > instance of quanta, then select the file in question, open with...,
> > quanta, the very same file is opened in another quanta instance,
> > displayed incorrectly.
>
> But what is the global encoding setting (Settings->Configure
> Quanta->Default character encoding) in your Quanta? Quanta uses that
> one if a document is opened in other ways but File->Open. In case of
> File->Open, you can override it in the Open dialog.

Why should Quanta use the default encoding, if a file states what encoding it 
uses? Using the default only makes sense for new documents and for documents, 
that do not state their encoding. Why would somebody try to read a document 
in Chinese, if the first line states that it is written in English? In case 
of xml, not stating the encoding means utf8.

> Please check that setting, and let me know if this was the problem or
> not.

Yes, but no. Quanta should obey encodings stated by definition (XML being utf8 
if not stated otherwise) and encodings stated by the file that is opened. It 
does not make sense to force the user to override the default-setting, if a 
file states what encoding it uses. Apart from that, this also means that a 
user is forced to open quanta first, instead of opening the file directly 
from konqueror.
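
For illustration, a minimal sketch of the rule argued for above: use a byte order mark or the encoding named in the XML declaration if there is one, and otherwise assume UTF-8. This is not code from Kate or Quanta; the function name is hypothetical, and UTF-32 and other exotic cases are deliberately ignored.

```python
import codecs
import re

# Byte order marks that settle the encoding before the declaration is read.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

# Matches encoding="..." inside an XML declaration at the very start of the file.
_DECLARED = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML file from its first bytes."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    match = _DECLARED.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: fall back to UTF-8.
    return "utf-8"
```

For the first line from this report, `sniff_xml_encoding(b'<?xml version="1.0"?>')` falls through to the UTF-8 default. The regular expression only handles ASCII-compatible encodings, which is enough for a sketch but not for a full parser.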

Comment 6 András Manţia 2004-11-22 17:26:51 UTC
On Monday 22 November 2004 14:47, S.Burmeister wrote:
> Kate does not highlight it, as there is no highlighting with *.xsl
> attached, yet it should recognise the <?xml, I guess.
As I said, it works in KDE CVS HEAD, so it will work in 3.4 at the latest.


> Why should Quanta use the default encoding, if a file states what
> encoding it uses? 

Because we don't detect the encoding of a file, but Kate does. We just 
set it to the default one. Try to open the file with Kate as:
kate filename. Does it open with UTF8? If yes, now create some file with 
another encoding and see if it opens correctly.

> Using the default only makes sense for new 
> documents and for documents, that do not state their encoding. Why
> would somebody try to read a document in Chinese, if the first line
> states that it is written in English?

I don't know, but I believe Kate doesn't check the content of the 
document by looking for <?xml lines, but it tries to figure out if the 
whole document has characters that do not fit in iso-8859-1 and so on. 
So it detects the encoding by the structure of the document, not by the 
meaning of the text that's inside.
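
A content-based guess of this kind could look roughly like the sketch below. It is an assumption about the general approach, not Kate's actual detection code: it checks whether the bytes form valid UTF-8 and otherwise falls back to a single-byte encoding. The function name and the fallback are hypothetical.

```python
def guess_encoding(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess an encoding from the byte structure alone, ignoring any declaration."""
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"
```

A guess like this cannot tell one single-byte encoding from another (iso-8859-1 versus iso-8859-15, for example), which is one reason a declaration in the file is more reliable when it exists.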

> Yes, but no. Quanta should obey encodings stated by definition (XML
> being utf8 if not stated otherwise) and encodings stated by the file
> that is opened. It does not make sense to force the user to override
> the default-setting, if a file states what encoding it uses. Apart
> from that, this also means that a user is forced to open quanta
> first, instead of opening the file directly from konqueror.

Quanta obeys what the user selects (in File->Open), or otherwise the 
default encoding setting. I can't find any better way of doing it with 
the current katepart. If Kate opens various types of documents correctly 
*based* on the <?xml line when you open them by running "kate 
filename", let me know and I will try to find a way to have the same 
behavior in Quanta as well.

Andras

Comment 7 András Manţia 2005-11-07 20:25:09 UTC
Does this (analyzing the beginning of the file to find the encoding) make sense to you? If not, just close the report, as I believe this is not Quanta's job.
Comment 8 Christoph Cullmann 2005-11-07 20:34:03 UTC

*** This bug has been marked as a duplicate of 55355 ***
Comment 9 Anders Lund 2005-11-07 21:40:18 UTC
We ship a filetype for katepart that makes utf-8 the encoding for XML files. But this also overrides the information found in the file, so actually looking in the document makes a lot of sense, and we have the problem on the radar.

There is a problem with katepart having to reload the file when the encoding is changed, but it's still better than getting it wrong. My personal opinion is that we should have a trigger in the highlight parser, which analyzes the file anyway.
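
The reload trade-off can be sketched as follows, assuming the declared encoding has already been extracted from the file by some other means. This is a hypothetical illustration, not katepart code: decode with the filetype default first, and decode a second time only if the file declares something different.

```python
from typing import Optional, Tuple


def load_text(
    raw: bytes,
    filetype_default: str,    # e.g. "utf-8" for the shipped XML filetype
    declared: Optional[str],  # encoding found in the file itself, if any
) -> Tuple[str, str, bool]:
    """Return (text, encoding used, whether a second decode was needed)."""
    text = raw.decode(filetype_default, errors="replace")
    if declared and declared.lower() != filetype_default.lower():
        # The file contradicts the default: pay for one reload to get it right.
        return raw.decode(declared, errors="replace"), declared, True
    return text, filetype_default, False
```

The second decode only happens when the file and the default disagree, so files that follow the default cost nothing extra. An unknown encoding name in `declared` would raise `LookupError` here; a real implementation would have to handle that case.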