Bug 329454

Summary: Kate ignores HTML character set declaration
Product: [Applications] kate Reporter: Christopher Yeleighton <giecrilj>
Component: encoding Assignee: KWrite Developers <kwrite-bugs-null>
Status: RESOLVED FIXED    
Severity: minor CC: michal.humpula
Priority: NOR    
Version: 3.11.3   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Bug Depends on:    
Bug Blocks: 252168    

Description Christopher Yeleighton 2013-12-31 07:49:56 UTC
Kate ignores the HTML character set declaration and uses ISO Latin-1 for all non-UTF-8 content.

Reproducible: Always

Steps to Reproduce:
  1.  { cat > /tmp/kate.html <<'<!-- EOF -->'; }
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML ><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" 
><TITLE >Character set detection</TITLE ><P >Zażółć gęślą jaźń
<!-- EOF -->

  2. Tell Kate to open <URL: /tmp/kate.html > using the default encoding (UTF-8).

Actual Results:  
  2. Kate detects that the encoding is not UTF-8 but falls back to ISO Latin-1:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML ><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" 
><TITLE >Character set detection</TITLE ><P >Za¿ó³æ gê¶l± jaŒñ

Expected Results:  
  2. Kate should use the encoding declared in the CONTENT-TYPE instead:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML ><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" 
><TITLE >Character set detection</TITLE ><P >Zażółć gęślą jaźń

Of course, you can always tell Kate to use a different encoding after the document has been opened (but before it is modified).
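
The difference between the actual and expected results can be reproduced at the byte level. The following is a minimal Python sketch, not Kate code, and it assumes the test file is really stored as ISO-8859-2, as its declaration says. It also suggests that the "Œ" in the actual output corresponds to byte 0xBC read as ISO-8859-15; strict ISO-8859-1 would show "¼" there, so the fallback may have been ISO-8859-15 rather than Latin-1 proper.

```python
# Illustration only: how the same bytes read under different charsets.
text = "Zażółć gęślą jaźń"

# Bytes as they would sit in a file that really is ISO-8859-2.
raw = text.encode("iso8859_2")

# Decoding with the declared charset round-trips the phrase.
declared = raw.decode("iso8859_2")

# Decoding with a Western European charset produces mojibake.
latin1 = raw.decode("latin-1")      # strict ISO-8859-1
latin9 = raw.decode("iso8859_15")   # ISO-8859-15 ("Latin-9")

# The bytes are not valid UTF-8, so a UTF-8 attempt fails outright.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(declared)
print(latin1)
print(latin9)
print(valid_utf8)
```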
Comment 1 Michal Humpula 2013-12-31 12:40:20 UTC
I guess Kate could use QTextCodec::codecForHtml when it detects the text/html MIME type.
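
The idea of honouring a declared charset can be sketched roughly as follows. This is a Python illustration, not the implementation of QTextCodec::codecForHtml and not the fix that went into Kate. The helper names `sniff_html_charset` and `decode_html`, the 1024-byte scan window, and the ISO-8859-15 fallback are all assumptions made for the example.

```python
import codecs
import re

# Hypothetical pattern: find "charset=NAME" as carried by a <META> declaration.
_CHARSET_RE = re.compile(
    rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_html_charset(raw, limit=1024):
    """Return a codec name declared in the first `limit` bytes, or None."""
    match = _CHARSET_RE.search(raw[:limit])
    if match is None:
        return None
    name = match.group(1).decode("ascii")
    try:
        # Normalise the label and reject names Python does not know.
        return codecs.lookup(name).name
    except LookupError:
        return None


def decode_html(raw, fallback="iso8859_15"):
    """Try UTF-8, then the declared charset, then an assumed fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    declared = sniff_html_charset(raw)
    if declared is not None:
        try:
            return raw.decode(declared)
        except UnicodeDecodeError:
            pass
    # Last resort: never fail, replace anything undecodable.
    return raw.decode(fallback, errors="replace")
```

With the test document from the report stored as ISO-8859-2, `decode_html` would take the second branch and return the Polish phrase intact; without a declaration it would fall through to the assumed fallback.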
Comment 2 Christoph Cullmann 2014-01-26 14:28:08 UTC
Git commit 91e1030a512910d120175bf519c1662a35cff68c by Christoph Cullmann.
Committed on 26/01/2014 at 14:15.
Pushed by cullmann into branch 'master'.

allow detection of encoding by HTML encoding

M  +19   -6    src/buffer/katetextloader.h

http://commits.kde.org/ktexteditor/91e1030a512910d120175bf519c1662a35cff68c