Version: 3.4.0 (using KDE 3.4.0, Debian Package 4:3.4.0-0ubuntu3.2 (3.1)) Compiler: gcc version 3.3.5 (Debian 1:3.3.5-8ubuntu2) OS: Linux (i686) release 2.6.10-5-686 When some characters appear before the HTML code at the top of a web page, the tag <meta http-equiv="Content-Type" content="text/HTML; charset=UTF-8" /> isn't read by Konqueror, and the charset falls back to ISO8859-1. Firefox reads the charset correctly in the same situation.
Can you give us a test case? Konqueror enforces correctness when searching for the <meta> tag. It must be inside <head>, for instance, so it will stop processing if it sees a non-<head> tag.
Every "View as HTML" link on Google can be used as a test case. One example: http://66.102.9.104/search?q=cache:JTl6CLhE8ZcJ:www.testdaf.de/dokumente/anmeldung.pdf+test+dokument&hl=de Sadly, they simply put a <table> before the output of their converted document, which itself has a correct <meta http-equiv="Content-Type"... The table prevents Konqueror from finding the meta tags.
I can see the problem, but I'm not sure if we can fix this reasonably. The reasonable thing is to scan the start of the document until we can be sure we will not find the tag. Then we have to start showing the page to the user. What we currently implement is a scan of the HTML header, which isn't shown anyway. As soon as we see content that has to be shown, we give up.
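To make the behaviour described above concrete, here is a minimal sketch in Python. It is not Konqueror's decoder code; the names HEAD_TAGS, StrictHeadScanner and strict_scan are hypothetical, and the set of tags treated as "header" tags is an assumption made for illustration.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Assumption for this sketch: tags that do not produce visible output.
# The real list used by Konqueror may differ.
HEAD_TAGS = {"html", "head", "title", "meta", "link", "script", "style", "base"}


class StrictHeadScanner(HTMLParser):
    """Looks for a charset <meta> tag, giving up at the first visible tag."""

    def __init__(self):
        super().__init__()
        self.charset = None
        self.gave_up = False

    def handle_starttag(self, tag, attrs):
        if self.gave_up or self.charset is not None:
            return
        if tag not in HEAD_TAGS:
            # Something that must be shown to the user: stop searching.
            self.gave_up = True
            return
        if tag == "meta":
            a = dict(attrs)
            if (a.get("http-equiv") or "").lower() == "content-type":
                m = re.search(r"charset\s*=\s*([\w.:-]+)",
                              a.get("content") or "", re.I)
                if m:
                    self.charset = m.group(1).lower()


def strict_scan(html: str) -> Optional[str]:
    """Return the declared charset, or None if the scan gave up first."""
    scanner = StrictHeadScanner()
    scanner.feed(html)
    return scanner.charset
```

With this approach, a page that starts with <html><head> is handled, while the same page with a <table> in front of it yields no charset, which matches the failure reported here.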
Maybe we should do it like Firefox. It searches up to the first 2048 bytes of the page for a <meta> tag containing "charset", regardless of any other tags.
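A rough sketch of such a byte-window prescan, in Python. This only illustrates the idea as described in this comment; it is not Firefox's actual implementation, and the names PRESCAN_LIMIT and prescan_charset are hypothetical. The 2048-byte limit is taken from the comment, not verified against Firefox.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 2048  # byte budget taken from the comment above

# Any <meta ...> tag, wherever it appears in the window.
META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
# A charset=... declaration inside that tag.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)",
                        re.IGNORECASE)


def prescan_charset(raw: bytes, limit: int = PRESCAN_LIMIT) -> Optional[str]:
    """Search the first `limit` bytes for a <meta> tag declaring a charset.

    Surrounding tags are ignored entirely, so a <table> placed before
    <head> does not stop the search.
    """
    window = raw[:limit]
    for meta in META_RE.finditer(window):
        m = CHARSET_RE.search(meta.group(0))
        if m:
            return m.group(1).decode("ascii").lower()
    return None
```

The trade-off is that this works on raw bytes before decoding and does not care whether the markup is valid, so it is more tolerant but also easier to fool than a scan that respects document structure.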
*** Bug 120036 has been marked as a duplicate of this bug. ***
Created attachment 14253: decoder more tolerant. This simple change makes the mentioned Google pages work. Google adds nearly 80 tags before the header, so we must allow that many tags to be skipped before we find the meta tag.
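The attachment itself is not reproduced in this thread, so the following Python sketch only illustrates the general idea of skipping a bounded number of tags before giving up. The names TolerantScanner, tolerant_scan and MAX_SKIPPED_TAGS are hypothetical, and the budget of 100 is an assumption chosen to leave headroom over the roughly 80 tags mentioned above.

```python
import re
from html.parser import HTMLParser
from typing import Optional

MAX_SKIPPED_TAGS = 100  # hypothetical budget, not the value from the patch


class TolerantScanner(HTMLParser):
    """Looks for a charset <meta> tag within the first `max_tags` start tags."""

    def __init__(self, max_tags: int = MAX_SKIPPED_TAGS):
        super().__init__()
        self.max_tags = max_tags
        self.seen = 0          # start tags inspected so far
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if self.charset is not None or self.seen >= self.max_tags:
            return
        self.seen += 1
        if tag != "meta":
            return  # skip any other tag, visible or not
        a = dict(attrs)
        if (a.get("http-equiv") or "").lower() == "content-type":
            m = re.search(r"charset\s*=\s*([\w.:-]+)",
                          a.get("content") or "", re.I)
            if m:
                self.charset = m.group(1).lower()


def tolerant_scan(html: str, max_tags: int = MAX_SKIPPED_TAGS) -> Optional[str]:
    """Return the declared charset, or None if the tag budget ran out."""
    scanner = TolerantScanner(max_tags)
    scanner.feed(html)
    return scanner.charset
```

Only start tags are counted in this sketch. A fixed tag budget is fragile: if the page author adds more markup in front of the header, the limit has to be raised again.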
The proposed patch does not seem to help. Have you actually tested it? :)
Yes, I tested it and it worked. But Google seems to have changed their pages a bit since then. The best thing is that they finally put a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> line in front of their converter output. So at least "View as HTML" works now ;)
This is still broken for me. To reproduce, open the original test case and view the HTML source. There are two <meta> tags specifying the encoding: one at the top, added by Google, and one in the original <head> section. Remove the first <meta> tag at the top of the page, save the document locally, and open it again. Konqueror still fails to detect the second tag, and defaults to the wrong charset.
Reproducible. Whether the konqueror devs think this should be "fixed", I'll leave up to them.
Message from the Bugsquad and Konqueror teams: This bug is closed as outdated, as we do not have the manpower to maintain the KDE3 version anymore. If you still can reproduce this issue with Konqueror 4.8.4 or later, please open a new report. Thank you for your understanding.