Bug 79065

Summary: External CSS style-sheets default to wrong charset
Product: [Applications] konqueror
Reporter: Thiago Macieira <thiago>
Component: khtml parsing
Assignee: Konqueror Developers <konq-bugs>
Status: RESOLVED FIXED
Severity: normal
CC: illogical1
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: unspecified   
OS: Linux   
Attachments:
    Test page, HTML 4.01 Transitional, UTF-8
    Test CSS stylesheet, UTF-8 encoded
    Attempt at fixing the problem
    Second attempt at fixing

Description Thiago Macieira 2004-04-04 21:20:45 UTC
Version:           3.2.0 (using KDE 3.2.90 (CVS >= 20040117), compiled sources)
Compiler:          gcc version 3.3.3
OS:          Linux (i686) release 2.6.3

When a web page (HTML or XHTML) references an external style sheet through a <LINK> element, the charset for the loaded file is set incorrectly: it defaults to ISO-8859-1 (Latin 1), even if metadata from the server specifies a different encoding.

The attached test page (valid HTML 4.01 Transitional) demonstrates this error. When the external stylesheet is loaded like this:
    <link rel="StyleSheet" type="text/css" href="test.css">

the text appears in Konqueror as:
	This should appear « quoted ». And this is a test of UTF-8: €. 

Changing the link element to the following:
    <link rel="StyleSheet" type="text/css" charset="utf-8" href="test.css">

makes the text appear as it should (and as it does in Mozilla):
	This should appear « quoted ». And this is a test of UTF-8: €. 
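
For reference, the kind of mis-decoding described here can be reproduced outside Konqueror. The following is only a minimal sketch: latin1ToUtf8 is a hypothetical helper written for illustration, not KHTML code. It treats every input byte as one ISO-8859-1 character and re-encodes the result as UTF-8, which is what effectively happens when a UTF-8 stylesheet is read as Latin 1.

```cpp
#include <string>

// Hypothetical helper (not part of KHTML): interpret each byte of `bytes`
// as one ISO-8859-1 character and re-encode the result as UTF-8.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF needs a two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With this helper, the two UTF-8 bytes of « (0xC2 0xAB) come out as two separate characters, Â followed by «, instead of a single quotation mark.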

Note: my locale is UTF-8, so all files are supposed to be loaded as UTF-8 (as the webpage showing the Euro symbol demonstrates). Also, when retrieving the webpage from a server, I get:
kio_http: (918400) "Content-Type: text/css; charset=utf-8"
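
To illustrate what "metadata from the server" means here, the sketch below pulls the charset parameter out of a Content-Type value like the one logged above. The function name charsetFromContentType is an assumption made for this example; it is a simplified stand-alone parser, not the code KIO or KHTML uses.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return the lower-cased charset parameter of a
// Content-Type value, or an empty string if none is present.
// Simplified on purpose; it does not implement full header parsing.
std::string charsetFromContentType(const std::string& contentType)
{
    std::string lower = contentType;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();

    pos += key.size();
    // The value ends at the next separator or at the end of the string.
    const std::string::size_type end = lower.find_first_of("; \t", pos);
    std::string value = lower.substr(
        pos, end == std::string::npos ? std::string::npos : end - pos);

    // Strip optional surrounding double quotes.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);
    return value;
}
```

An empty return value signals that the server gave no charset, in which case some other default has to be chosen.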
Comment 1 Thiago Macieira 2004-04-04 21:21:32 UTC
Created attachment 5531 [details]
Test page, HTML 4.01 Transitional, UTF-8
Comment 2 Thiago Macieira 2004-04-04 21:22:11 UTC
Created attachment 5532 [details]
Test CSS stylesheet, UTF-8 encoded
Comment 3 Thiago Macieira 2004-04-04 22:25:44 UTC
The functions at fault are:
	CachedObject::codecForBuffer (khtml/misc/loader.cpp)
	DocLoader::requestStyleSheet (same)

Nowhere in misc/loader.cpp does the code try to get the charset from the KIO metadata.
Comment 4 Thiago Macieira 2004-04-05 00:01:35 UTC
Created attachment 5535 [details]
Attempt at fixing the problem

The attached patch fixes the problem for me, for both remote and local files.
It does the following:

- moves the m_charset member from khtml::CachedCSSStyleSheet and
khtml::CachedScript into khtml::CachedObject. It won't be used, of course, for
images (khtml::CachedImage).

- in khtml::Loader::slotFinished, queries the metadata from the job before
calling r->object->data. For local files, it uses the charset from
QTextCodec::codecForLocale.
Comment 5 Thiago Macieira 2004-04-14 05:52:00 UTC
Created attachment 5632 [details]
Second attempt at fixing

The previous patch made the server charset parameter override the user's. This
one inverts that logic, so the user's charset takes precedence.
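
The precedence described in the two patches could be sketched roughly as follows. This is an illustration only, not the attached patch: chooseCharset and its parameter names are hypothetical, and reading "the user's" charset as the one explicitly requested for the resource is an assumption.

```cpp
#include <string>

// Hypothetical sketch of the charset precedence, not the actual patch.
// Assumed order: explicitly requested charset first, then the charset
// reported by the server, then a fallback such as the locale's charset.
std::string chooseCharset(const std::string& requested,
                          const std::string& fromServer,
                          const std::string& fallback)
{
    if (!requested.empty())
        return requested;   // explicit choice wins
    if (!fromServer.empty())
        return fromServer;  // otherwise trust the server metadata
    return fallback;        // e.g. local files without any metadata
}
```

The first attempt corresponds to swapping the first two checks, which is why it let the server override an explicit choice.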
Comment 6 Thiago Macieira 2005-03-06 22:14:30 UTC
*** Bug 100993 has been marked as a duplicate of this bug. ***
Comment 7 Allan Sandfeld 2005-03-22 00:15:04 UTC
CVS commit by carewolf: 

Make charset in <link> actually mean something. Patch is simplified version 
of one by Thiago Macieira
BUG: 79065


  M +4 -0      ChangeLog   1.408
  M +2 -2      misc/loader.cpp   1.181


--- kdelibs/khtml/ChangeLog  #1.407:1.408
@@ -1,2 +1,6 @@
+2005-03-22  Allan Sandfeld Jensen <kde@carewolf.com>
+
+        * misc/loader.cpp: Do not override existing charset with an empty one.
+
 2005-03-21  Allan Sandfeld Jensen <kde@carewolf.com>
 

--- kdelibs/khtml/misc/loader.cpp  #1.180:1.181
@@ -968,5 +968,5 @@ CachedCSSStyleSheet *DocLoader::requestS
 
     CachedCSSStyleSheet* s = Cache::requestObject<CachedCSSStyleSheet, CachedObject::CSSStyleSheet>( this, fullURL, accept );
-    if ( s ) {
+    if ( s && !charset.isEmpty() ) {
         s->setCharset( charset );
     }
@@ -981,5 +981,5 @@ CachedScript *DocLoader::requestScript( 
 
     CachedScript* s = Cache::requestObject<CachedScript, CachedObject::Script>( this, fullURL, 0 );
-    if ( s )
+    if ( s && !charset.isEmpty() )
         s->setCharset( charset );
     return s;
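
The effect of the added `!charset.isEmpty()` guard can be shown with a small stand-alone sketch. CachedResource and applyRequestedCharset are hypothetical stand-ins written for this example; they use std::string instead of QString and are not the KHTML classes touched by the commit.

```cpp
#include <string>

// Hypothetical stand-in for a cached stylesheet or script.
class CachedResource {
public:
    explicit CachedResource(const std::string& charset) : m_charset(charset) {}
    void setCharset(const std::string& charset) { m_charset = charset; }
    const std::string& charset() const { return m_charset; }
private:
    std::string m_charset;
};

// Mirrors the guard from the diff above: an empty charset coming from the
// requesting element must not overwrite one the object already has.
void applyRequestedCharset(CachedResource* s, const std::string& charset)
{
    if (s && !charset.empty())
        s->setCharset(charset);
}
```

Without the emptiness check, a <link> element lacking a charset attribute would reset the object's charset to an empty string, which matches the ChangeLog entry "Do not override existing charset with an empty one."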