Kate ignores the HTML character set declaration and uses ISO Latin-1 for all non-UTF-8 content.

Reproducible: Always

Steps to Reproduce:
1. { cat > /tmp/kate.html <<'<!-- EOF -->'; }
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
><HTML
><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2"
><TITLE
>Character set detection</TITLE
><P
>Zażółć gęślą jaźń
<!-- EOF -->
2. Tell Kate to open <URL: /tmp/kate.html > using the default encoding (UTF-8).

Actual Results:
2. Kate detects that the file is not UTF-8, but then decodes it as ISO Latin-1:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
><HTML
><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2"
><TITLE
>Character set detection</TITLE
><P
>Za¿ó³æ gê¶l± jaŒñ

Expected Results:
2. Kate should use the encoding declared in the CONTENT-TYPE META element instead:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
><HTML
><META HTTP-EQUIV=CONTENT-TYPE CONTENT="TEXT/HTML; CHARSET=ISO-8859-2"
><TITLE
>Character set detection</TITLE
><P
>Zażółć gęślą jaźń

Of course, you can always tell Kate to use a different encoding after the document has been opened (but before it is modified).
Kate could probably use QTextCodec::codecForHtml() when it detects the text/html MIME type.
Git commit 91e1030a512910d120175bf519c1662a35cff68c by Christoph Cullmann.
Committed on 26/01/2014 at 14:15.
Pushed by cullmann into branch 'master'.

allow detection of encoding by HTML encoding

M  +19   -6    src/buffer/katetextloader.h

http://commits.kde.org/ktexteditor/91e1030a512910d120175bf519c1662a35cff68c