Bug 252168

Summary: Konqueror's Advanced Text Editor View ignores character set
Product: [Applications] konqueror Reporter: Christopher Yeleighton <giecrilj>
Component: general Assignee: Konqueror Developers <konq-bugs>
Status: REOPENED ---    
Severity: normal CC: groot
Priority: NOR    
Version: 4.11.5   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Bug Depends on: 329454    
Bug Blocks:    
Attachments: Naive page (with a-umlaut, ISO-8859-1)
Naive page (with a-umlaut, ISO-8859-1)

Description Christopher Yeleighton 2010-09-23 20:14:44 UTC
Version:           4.4.4 (using KDE 4.4.4) 
OS:                Linux

When Konqueror switches to display the Advanced Text Editor, it should inform the text editor about the source character set.

Reproducible: Always

Steps to Reproduce:
1. 
iconv '-f' 'utf-8' '-t' 'latin1' >'/tmp/index.html' <<'<!-- EOF -->' &&
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML LANG=en 
><META HTTP-EQUIV=CONTENT-TYPE 
CONTENT="TEXT/HTML; CHARSET=ISO-8859-1" 
><TITLE >Naïve page </TITLE 
><P >Naïve page
<!-- EOF -->
xdg-open '/tmp/index.html'

2. 
Tell Konqueror to display the page in Text Editor.

Actual Results:  
1. 
Konqueror displays "Naïve page" in HTML view.

2.
The Text Editor reports that the file was opened read-only because it contains invalid UTF-8.
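
For illustration, a minimal Python sketch of why the editor sees invalid UTF-8 here. It assumes that iconv's "latin1" corresponds to Python's "latin-1" codec and that the editor falls back to a strict UTF-8 decode; try_utf8 is a hypothetical helper, not part of Konqueror or the text editor.

```python
# Bytes that step 1 writes for the visible text (assumption: iconv "latin1"
# is ISO-8859-1, i.e. Python's "latin-1" codec).
raw = "Na\u00efve page".encode("latin-1")

def try_utf8(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

# The i-diaeresis is the single byte 0xEF. In UTF-8, 0xEF opens a three-byte
# sequence, but the next byte is a plain "v", so a strict decode fails.
print(raw)
print(try_utf8(raw))
# Decoding with the encoding the page declares recovers the text.
print(raw.decode("latin-1"))
```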

Expected Results:  
2.
The Text Editor should automatically convert the text from its actual encoding, which is known to Konqueror at display time.
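
A rough sketch of the expected behaviour, in Python rather than in Konqueror's own code: read the charset named in the META declaration and decode the bytes with it. The names CHARSET_RE, declared_charset and decode_like_html_view are hypothetical, the 1024-byte scan window and the UTF-8 default are assumptions, and a real browser's charset detection is considerably more involved.

```python
import re

# Hypothetical helper, not Konqueror or text editor code: find the charset
# named in a META CONTENT-TYPE declaration by scanning the raw bytes.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def declared_charset(data: bytes, default: str = "utf-8") -> str:
    """Return the declared charset in lower case, or the default if none is found."""
    match = CHARSET_RE.search(data[:1024])  # assumption: declaration is near the top
    return match.group(1).decode("ascii").lower() if match else default

def decode_like_html_view(data: bytes) -> str:
    """Decode the page using the charset it declares for itself."""
    return data.decode(declared_charset(data))

# Part of the page from step 1, as the bytes iconv would write.
page = (
    b"<META HTTP-EQUIV=CONTENT-TYPE\n"
    b'CONTENT="TEXT/HTML; CHARSET=ISO-8859-1"\n'
    b"><TITLE >Na\xefve page </TITLE\n"
)

print(declared_charset(page))
print(decode_like_html_view(page))
```

Handing the value of declared_charset to the editor, instead of letting it guess, is the behaviour this report asks for.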

OS: Linux (x86_64) release 2.6.34.7-0.2-desktop
Compiler: gcc

Workaround: Tell the Text Editor to use the encoding ISO-8859-1.
Comment 1 groot 2012-07-21 22:18:02 UTC
Created attachment 72675 [details]
Naive page (with a-umlaut, ISO-8859-1)

Pretty much the content of the iconverted HTML in step 1 of the test.
Comment 2 groot 2012-07-21 22:20:38 UTC
Created attachment 72676 [details]
Naive page (with a-umlaut, ISO-8859-1)
Comment 3 groot 2012-07-21 22:33:20 UTC
Can't reproduce the problem either in KDE 4.4.4 or KDE 4.7.2. Step 2 is View->View Mode->Embedded Advanced Text Editor. 

In KDE 4.4.4, the attached file opens RW in the kate view (even though in kate my encoding is set, mysteriously, to Simplified Chinese).

In KDE 4.7.2, the attached file opens RO in the kate view, but no mention of encoding problem (it makes sense to open RO, because the resource the view represents is a read-only website).
Comment 4 Myriam Schweingruber 2012-07-22 15:55:48 UTC
(In reply to comment #3)
> Can't reproduce the problem either in KDE 4.4.4 or KDE 4.7.2. Step 2 is
> View->View Mode->Embedded Advanced Text Editor. 
> 
> In KDE 4.4.4, the attached file opens RW in the kate view (even though in
> kate my encoding is set, mysteriously, to Simplified Chinese).
> 
> In KDE 4.7.2, the attached file opens RO in the kate view, but no mention of
> encoding problem (it makes sense to open RO, because the resource the view
> represents is a read-only website).

Hm, you actually confirm this is NOT reproducible? Then this should not be set to confirmed but on the contrary, set to resolved.
Comment 5 groot 2012-07-22 21:25:39 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Can't reproduce the problem either in KDE 4.4.4 or KDE 4.7.2. Step 2 is
> > View->View Mode->Embedded Advanced Text Editor. 
> 
> Hm, you actually confirm this is NOT reproducible? Then this should not be
> set to confirmed but on the contrary, set to resolved.

It was more "The behavior I get is sufficiently different from the original reporter's that I can't tell if it's resolved" but reconsidering, the exact test the OP proposes works fine under KDE 4.7.2.
Comment 6 Christopher Yeleighton 2014-03-22 17:45:08 UTC
The following step will get you Latin-2 text reinterpreted as Latin-1:

iconv '-f' 'utf-8' '-t' 'latin2' >'/tmp/index.html' <<'<!-- EOF -->' 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" >
<HTML LANG=en >
<META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" >
<TITLE >Koń i żółw </TITLE >
<P >Koń i żółw grali w kości z piękną ćmą u źródła. 
<!-- EOF -->
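
For illustration, the same misreading reproduced in Python, assuming iconv's "latin2" corresponds to the "iso-8859-2" codec. This case differs from the Latin-1 test in that no error is raised: every byte is a valid Latin-1 character, so the text is silently turned into the wrong letters.

```python
text = "Ko\u0144 i \u017c\u00f3\u0142w"  # the page title, "Koń i żółw"

# What iconv writes to the file (assumption: "latin2" is ISO-8859-2).
raw = text.encode("iso-8859-2")

# Reading those bytes as Latin-1 never fails, it just maps them to other letters.
misread = raw.decode("iso-8859-1")
print(misread)

# Decoding with the charset the page declares gives the title back.
print(raw.decode("iso-8859-2"))
```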