Bug 67348 - Font selection in HTML gives unexpected results
Summary: Font selection in HTML gives unexpected results
Status: RESOLVED FIXED
Alias: None
Product: konqueror
Classification: Applications
Component: khtml parsing (show other bugs)
Version: unspecified
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Konqueror Developers
URL:
Keywords:
: 67819 (view as bug list)
Depends on:
Blocks:
 
Reported: 2003-11-05 22:21 UTC by Carl Thompson
Modified: 2003-11-24 20:09 UTC (History)
2 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Carl Thompson 2003-11-05 22:21:53 UTC
Version:            (using KDE Devel)
Installed from:    Compiled sources

It appears that Konqueror is trying to be "smart" about fonts that are specified using the "font-family" CSS style or "font" HTML tag but not matched exactly.  This causes some incorrect results.

Test case: http://carlthompson.net/konqueror_font_selection_bug.html

The test case contains the following HTML:
<blockquote>
<pre>
<html>
<body>
<div style="font-family:utopia">
This is test 1 - style=&quot;font-family:utopia&quot;
</div>
<div style="font-family:'ms sans serif', utopia">
This is test 2 - style=&quot;font-family:'ms sans serif', utopia&quot;
</div>
<div style="font-family:ms sans serif, utopia">
This is test 3 - style=&quot;font-family:ms sans serif, utopia&quot;
</div>
</body>
</html>
</pre>
</blockquoute>
On my system, the 'utopia' font exists but the 'ms sans serif' font does not. Therefore I would expect that all three test lines would be rendered using 'utopia.'

The first test line is displayed correctly because 'utopia' is the only font specified.

The second test has two fonts specified: 'ms sans serif' and 'utopia.' This test is displayed in a sans serif font apparently because 'ms sans serif' contains the words 'sans serif' so Konqueror decided it would be a good idea to substitute a sans serif font it knows about for one that it does not.

The third test has the same fonts specified as the second but the first font (which contains multiple words) is not properly quoted. On my system this test is displayed in the 'comic sans ms' font apparently because Konqueror saw 'ms sans serif' and 'comic sans ms' both have the words 'sans' and 'ms' in them and decided that was close enough.


The big problem with the second and third tests is that konqueror believes it knows better than the web developer when it comes to deciding what font to use. When a web developer specifies two fonts, what that means is:

<blockquote>
Use the first font I specified if an <em><strong>exact</strong></em> match can be found. If an exact match cannot use the second font I specified. If an exact match cannot be found for the second font, use the default font.
</blockquote>

Konqueror is perverting that logic by deciding that certain fonts are close enough.

In my humble opinion, all of this type of font-guessing logic should be removed from Konqueror and left up to the web developer and user.  Further, the second and third tests should be handled exactly the same way because there are many real-world sites that do not properly quote multi-word font names.

The real-world site where I first noticed this problem is http://washingtonpost.com/ where it specifies 'ms sans serif' in its fonts but Konqueror uses 'comic sans ms.'

This bug seems similar to bug 57485.  However, that bug report is overly-complicated and emphasizes the reporter's fontconfig/Xft setup, which really has very little to do with this.  The problem does not seem to be in the this layer or QT because other QT-based applications behave "correctly."
Comment 1 Stephan Kulow 2003-11-06 10:41:09 UTC
the two first cases display utopia to me and the third one is simply invalid
CSS, so it's result should be fine to be undefined. 

BTW: it's not konqueror that's deciding it, it's Qt.
Comment 2 Carl Thompson 2003-11-06 20:15:06 UTC
Subject: Re:  Font selection in HTML gives unexpected results

> ------- Additional Comments From coolo@kde.org  2003-11-06 10:41 -------
> the two first cases display utopia to me

Hmmm, this is weird.  For me the second example normally shows up as 'ms
reference sans serif' (not the requested 'ms sans serif').  However, if I
change the default font in Konqueror's settings to 'ms reference sans
serif' then the second example shows up as 'utopia!'  I have no idea why
changing the default font would have anything to do with the font used for
the second example.  Clearly, there is something very wrong here...

> and the third one is simply invalid CSS, so it's result should be fine to
> be undefined.

I agree that it is invalid CSS (or invalid HTML if the 'font' tag is used). 
However, I have seen several sites in addition to washingtonpost.com that
are requesting 'ms sans serif' in the exact same broken way so there is
probably some common site building tool that generates this broken code. 
Since several real world sites are known to do this, Konqueror should
probably display it properly.  Since the font list is comma delimited
anyway, it does not seem like it should be hard to handle unquoted font
names the exact same way and fix the broken rendering of these sites.

> BTW: it's not konqueror that's deciding it, it's Qt.

I tried to test this by changing the default font used by Qt applications in
my qtrc file.  I cannot duplicate this type of behavior in pure Qt
applications so it does seem to be Konqueror/KDE specific.

In any case, Konqueror/KHTML _should_ absolutely be deciding the exact font
used.  Only Konqueror parses the HTML and knows the exact fonts specified
by the web developers and the default font specified by the user.  To
reiterate what I said previously, the font used should be an exact match of
one of the fonts specified by the web developer or if none match the
default font specified by the user.  No other font should ever be used in
my opinion.

Thanks,
Carl

Comment 3 Carl Thompson 2003-11-06 20:30:01 UTC
By the way, I forgot mention that Mozilla and Opera both choose the fonts that I expect and act the way I've described even for the third case (invalid CSS/HTML because multi-word font name is unquoted).  Internet Explorer does too (but I needed to change the test a little because 'ms sans serif' is actually on my Windows box).  Konqueror is doing something different than the three "biggest" browsers and to me this means that Konqueror should be fixed to choose fonts in the same way.
Comment 4 Stephan Kulow 2003-11-24 11:34:44 UTC
Subject: kdelibs/khtml

CVS commit by coolo: 

handle invalid font families (as discussed with Dirk I'm commiting my version
of the patch and will after that try to merge the safari code to handle even
more cases)
(this problem is the cause of many bug reports, I'm just picking one :)
CCMAIL: 67348-done@bugs.kde.org


  M +5 -0      ChangeLog   1.106
  M +24 -3     css/cssparser.cpp   1.278


--- kdelibs/khtml/ChangeLog  #1.105:1.106
@@ -1,2 +1,7 @@
+2003-11-24  Stephan Kulow  <coolo@kde.org>
+
+        * css/cssparser.cpp (parseFontFamily): handle invalid font families like
+        font-family: ms sans serif, utopia
+
 2003-11-24  David Faure  <faure@kde.org>
 

--- kdelibs/khtml/css/cssparser.cpp  #1.277:1.278
@@ -1518,6 +1518,27 @@ CSSValueListImpl *CSSParser::parseFontFa
             list->append( new CSSPrimitiveValueImpl( id ) );
         else if ( value->unit == CSSPrimitiveValue::CSS_STRING ||
-                  value->unit == CSSPrimitiveValue::CSS_IDENT )
-            list->append( new FontFamilyValueImpl( qString( value->string ) ) );
+                  value->unit == CSSPrimitiveValue::CSS_IDENT ) {
+            QString face = qString( value->string );
+            // the grammar says we can only have comma separated strings,
+            // but the real life HTML contains stuff like "font-family: ms sans serif, adobe helvetica"
+            // so we look ahead in the list and if the next item is a string too, we concat the strings
+            // before we take it as item
+        caughtinvalid:
+            value = valueList->next();
+            if ( value ) {
+                if ( value->unit == CSSPrimitiveValue::CSS_STRING ||
+                     value->unit == CSSPrimitiveValue::CSS_IDENT ) {
+                    face += " " + qString( value->string );
+                    goto caughtinvalid;
+                } else if ( value->unit == Value::Operator && value->iValue == ',' ) {
+                    value = valueList->next();
+                }
+            }
+            list->append( new FontFamilyValueImpl( face ) );
+            if ( value )
+                continue;
+            else
+                break;
+        }
         else {
 //             kdDebug( 6080 ) << "invalid family part" << endl;


Comment 5 Stephan Kulow 2003-11-24 11:47:23 UTC
*** Bug 67819 has been marked as a duplicate of this bug. ***
Comment 6 Carl Thompson 2003-11-24 19:57:54 UTC
This fix works for me as expected to resolve this bug.

Thank You!
Comment 7 Stephan Kulow 2003-11-24 20:09:58 UTC
thanks for telling