Bug 252168 - Konqueror's Advanced Text Editor View ignores character set
Summary: Konqueror's Advanced Text Editor View ignores character set
Status: REOPENED
Alias: None
Product: konqueror
Classification: Applications
Component: general
Version: 4.11.5
Platform: openSUSE Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Konqueror Developers
URL:
Keywords:
Depends on: 329454
Blocks:
Reported: 2010-09-23 20:14 UTC by Christopher Yeleighton
Modified: 2014-03-22 17:45 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Naive page (with a-umlaut, ISO-8859-1) (144 bytes, text/html)
2012-07-21 22:18 UTC, groot
Naive page (with a-umlaut, ISO-8859-1) (185 bytes, text/html)
2012-07-21 22:20 UTC, groot

Description Christopher Yeleighton 2010-09-23 20:14:44 UTC
Version:           4.4.4 (using KDE 4.4.4) 
OS:                Linux

When Konqueror switches to display the Advanced Text Editor, it should inform the text editor about the source character set.

Reproducible: Always

Steps to Reproduce:
1. 
iconv '-f' 'utf-8' '-t' 'latin1' >'/tmp/index.html' <<'<!-- EOF -->' &&
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML LANG=en 
><META HTTP-EQUIV=CONTENT-TYPE 
CONTENT="TEXT/HTML; CHARSET=ISO-8859-1" 
><TITLE >Naïve page </TITLE 
><P >Naïve page
<!-- EOF -->
xdg-open '/tmp/index.html'

2. 
Tell Konqueror to display the page in Text Editor.

Actual Results:  
1. 
Konqueror displays "Naïve page" in HTML view.

2.
The file was opened read-only because it contains invalid UTF-8.
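
The message in result 2 follows from the bytes that step 1 writes. A minimal Python sketch of the check (an illustration only, not the editor's actual code; `is_valid_utf8` is a hypothetical helper name):

```python
# "Naïve page" encoded as ISO-8859-1, which is what iconv writes in step 1.
latin1_bytes = "Na\u00efve page".encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The i-diaeresis becomes the single byte 0xEF. In UTF-8 that byte starts a
# three-byte sequence, but the next byte is a plain "v", so decoding fails.
print(latin1_bytes)                 # b'Na\xefve page'
print(is_valid_utf8(latin1_bytes))  # False
```

An editor that assumes UTF-8 therefore sees an invalid file, even though the page declares its own charset.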

Expected Results:  
2.
The Text Editor should automatically convert the text from its actual encoding, which is known to Konqueror at display time.
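
As a rough sketch of the expected behavior, assuming the declared charset is available when the view is switched: the Python below reads the charset from the META tag and decodes with it. The helper `decode_with_declared_charset` and its regular expression are hypothetical and only stand in for whatever Konqueror already knows from rendering the page; this report does not show how Konqueror hands data to the text editor.

```python
import re

# Hypothetical pattern: picks up "charset=..." inside a META tag's CONTENT value.
_META_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def decode_with_declared_charset(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using the charset it declares, else the fallback encoding."""
    match = _META_CHARSET.search(data[:1024])  # only look near the top of the file
    encoding = match.group(1).decode("ascii") if match else fallback
    return data.decode(encoding)


# A shortened version of the page from step 1, stored as ISO-8859-1 bytes.
page = (
    '<META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-1">'
    "<P>Na\u00efve page"
).encode("iso-8859-1")

text = decode_with_declared_charset(page)
print(text.endswith("Na\u00efve page"))  # True
```

The sketch does not handle an unknown charset name, which would raise `LookupError`.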

OS: Linux (x86_64) release 2.6.34.7-0.2-desktop
Compiler: gcc

Workaround: Tell the Text Editor to use the encoding ISO-8859-1.
Comment 1 groot 2012-07-21 22:18:02 UTC
Created attachment 72675
Naive page (with a-umlaut, ISO-8859-1)

Pretty much the content of the iconverted HTML in step 1 of the test.
Comment 2 groot 2012-07-21 22:20:38 UTC
Created attachment 72676
Naive page (with a-umlaut, ISO-8859-1)
Comment 3 groot 2012-07-21 22:33:20 UTC
Can't reproduce the problem either in KDE 4.4.4 or KDE 4.7.2. Step 2 is View->View Mode->Embedded Advanced Text Editor. 

In KDE 4.4.4, the attached file opens RW in the kate view (even though in kate my encoding is set, mysteriously, to Simplified Chinese).

In KDE 4.7.2, the attached file opens RO in the kate view, but no mention of encoding problem (it makes sense to open RO, because the resource the view represents is a read-only website).
Comment 4 Myriam Schweingruber 2012-07-22 15:55:48 UTC
(In reply to comment #3)
> Can't reproduce the problem either in KDE 4.4.4 or KDE 4.7.2. Step 2 is
> View->View Mode->Embedded Advanced Text Editor. 
> 
> In KDE 4.4.4, the attached file opens RW in the kate view (even though in
> kate my encoding is set, mysteriously, to Simplified Chinese).
> 
> In KDE 4.7.2, the attached file opens RO in the kate view, but no mention of
> encoding problem (it makes sense to open RO, because the resource the view
> represents is a read-only website).

Hm, you actually confirm this is NOT reproducible? Then this should not be set to confirmed but on the contrary, set to resolved.
Comment 5 groot 2012-07-22 21:25:39 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Can't reproduce the problem either in KDE 4.4.4 or KDE 4.7.2. Step 2 is
> > View->View Mode->Embedded Advanced Text Editor. 
> 
> Hm, you actually confirm this is NOT reproducible? Then this should not be
> set to confirmed but on the contrary, set to resolved.

It was more "The behavior I get is sufficiently different from the original reporter's that I can't tell if it's resolved", but on reconsidering, the exact test the OP proposes works fine under KDE 4.7.2.
Comment 6 Christopher Yeleighton 2014-03-22 17:45:08 UTC
The following step produces Latin-2 text that is reinterpreted as Latin-1:

iconv '-f' 'utf-8' '-t' 'latin2' >'/tmp/index.html' <<'<!-- EOF -->' 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" >
<HTML LANG=en >
<META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" >
<TITLE >Koń i żółw </TITLE >
<P >Koń i żółw grali w kości z piękną ćmą u źródła. 
<!-- EOF -->
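
For reference, a minimal Python sketch of what this reinterpretation does to the title text (an illustration of the symptom, not of Konqueror's code path):

```python
original = "Ko\u0144 i \u017c\u00f3\u0142w"  # "Koń i żółw"

# Bytes as written by iconv with '-t' 'latin2'.
latin2_bytes = original.encode("iso-8859-2")

# Decoding with the wrong table succeeds silently, because every byte is a
# valid Latin-1 character; only the letters come out wrong.
misread = latin2_bytes.decode("iso-8859-1")
correct = latin2_bytes.decode("iso-8859-2")

print(misread)              # Koñ i ¿ó³w
print(correct == original)  # True
```

Unlike the Latin-1 versus UTF-8 case, no decoding error is raised here, so the editor has no way to notice the mismatch unless it is told the declared charset.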