Bug 329454 - Kate ignores HTML character set declaration
Summary: Kate ignores HTML character set declaration
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: encoding
Version: 3.11.3
Platform: openSUSE Linux
Importance: NOR minor
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Depends on:
Blocks: 252168
Reported: 2013-12-31 07:49 UTC by Christopher Yeleighton
Modified: 2014-01-26 14:28 UTC (History)

See Also:
Latest Commit:
Version Fixed In:


Description Christopher Yeleighton 2013-12-31 07:49:56 UTC
Kate ignores the HTML character set declaration and falls back to ISO Latin-1 for all non-UTF-8 content.

Reproducible: Always

Steps to Reproduce:
  1.  { cat > /tmp/kate.html <<'<!-- EOF -->'; }
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML ><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" 
><TITLE >Character set detection</TITLE ><P >Zażółć gęślą jaźń
<!-- EOF -->

  2. Tell Kate to open <URL: /tmp/kate.html > using the default encoding (UTF-8).

Actual Results:  
  2. Kate determines that the encoding is not UTF-8 but falls back to ISO Latin-1:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML ><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" 
><TITLE >Character set detection</TITLE ><P >Za¿ó³æ gê¶l± jaŒñ

Expected Results:  
  2. Kate should use the encoding declared in the CONTENT-TYPE instead:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML ><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2" 
><TITLE >Character set detection</TITLE ><P >Zażółć gęślą jaźń

Of course, you can always tell Kate to use a different encoding after the document has been opened (but before it is modified).
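
To illustrate the mismatch at the byte level, here is a minimal Python sketch (not part of Kate; the variable names are illustrative). It assumes the file on disk really contains ISO-8859-2 bytes. Decoding those bytes as ISO-8859-15 (Latin-9) reproduces the text under Actual Results exactly, whereas strict ISO-8859-1 would show ¼ where the report shows Œ. The fallback codec may therefore have been Latin-9 rather than Latin-1; that is an inference from the output, not something confirmed in this report.

```python
# Sample text from the reproduction steps, assumed to be stored as ISO-8859-2.
text = "Zażółć gęślą jaźń"
raw = text.encode("iso-8859-2")

# The bytes are not valid UTF-8, so a UTF-8 attempt fails and a fallback is needed.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decode the same bytes with two Western fallbacks and with the declared charset.
as_latin1 = raw.decode("iso-8859-1")
as_latin9 = raw.decode("iso-8859-15")
as_declared = raw.decode("iso-8859-2")

print("valid UTF-8:", utf8_ok)
print("ISO-8859-1: ", as_latin1)
print("ISO-8859-15:", as_latin9)
print("ISO-8859-2: ", as_declared)
```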
Comment 1 Michal Humpula 2013-12-31 12:40:20 UTC
I guess Kate could use QTextCodec::codecForHtml when it detects the text/html MIME type.
Comment 2 Christoph Cullmann 2014-01-26 14:28:08 UTC
Git commit 91e1030a512910d120175bf519c1662a35cff68c by Christoph Cullmann.
Committed on 26/01/2014 at 14:15.
Pushed by cullmann into branch 'master'.

allow detection of encoding by HTML encoding

M  +19   -6    src/buffer/katetextloader.h

http://commits.kde.org/ktexteditor/91e1030a512910d120175bf519c1662a35cff68c