Bug 47682

Summary: [Qt] Glyph substitution poor
Product: [Applications] konqueror
Reporter: Stevan White <stevan_white>
Component: khtml renderer
Assignee: Konqueror Developers <konq-bugs>
Status: RESOLVED FIXED
Severity: normal
CC: aiacovitti, ce, chahibi, cheeth, davidpjames, dev, dhaval, erayo, fc_kdebugs, gassauer, gt, hewlett, juuso.alasuutari, kde, kde_bugzilla_2, khanreaper, kuba, mail, mboquien, nicholas, nicolasg, pacho, petr.hroudny, rrick, r_o_rossi, sebastian_ml, sonic, the.rhorn, tyrerj
Priority: NOR
Version: 4.0
Target Milestone: ---
Platform: Mandrake RPMs
OS: Linux
Attachments: Table of all HTML 4 SGML Character Entities
             Mozilla vs. Konqueror CVS showing "rarr; / larr;" arrows

Description Bugzilla Maintainers 2002-09-10 06:18:58 UTC
(*** This bug was imported into bugs.kde.org ***)

Package:           khtml
Version:           4.0 (using KDE 3.0.3 )
Severity:          normal
Installed from:    Mandrake Linux Cooker i586 - Cooker
Compiler:          gcc version 3.2 (Mandrake Linux 9.0 3.2-1mdk)
OS:                Linux (i686) release 2.4.19-5mdk
OS/Compiler notes: 

Many of the HTML 4 SGML Entities don't display at all.

See my page at
http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html


(Submitted via bugs.kde.org)
(Called from KBugReport dialog)
Comment 1 Stevan White 2002-11-01 11:20:16 UTC
This depends on the fonts used in the document.  If I don't specify them (my  
browser is set to use a serif font by default), the arrow characters  
	&larr; &uarr; &rarr; &darr; &harr; &crarr; &lArr; &rArr; &dArr; &hArr;  
are not displayed.  
If I specify the style
	body{ font-family: times, serif; }
then the above arrow symbols are displayed.
This improves many other SGML entities, but not all:
	&loz; &spades; &clubs; &hearts; &diams;
	&lceil; &rceil; &lfloor; &rfloor; &lang; &rang;
are still not displayed.
See also my page 
	<http://www.zipcon.net/~swhite/docs/math/math.html> 
 
Comment 2 Stevan White 2002-11-08 09:03:22 UTC
Created attachment 396 [details]
Table of all HTML 4 SGML Character Entities
Comment 3 Kai Lahmann 2003-06-14 03:41:50 UTC
If they are displayed as the literal entity text, it's a bug. If they are displayed as a box (to show "the font has
nothing for this"), it isn't. As I don't see any literal entities, this bug is WFM (works for me).
Comment 4 Stevan White 2003-06-15 02:39:31 UTC
I disagree.

The standard HTML 4 SGML Entities should be displayed, regardless of fonts.  

If the current font doesn't contain a glyph to represent the entity, the browser
should try to find one that does.  Failing that, the browser should find some
other way to display the entity (either construct a nice representation, or fall
back to displaying the entity name.)

You will find that various browsers take different strategies on this point. 
The best browsers bend over backward to display the entities as the standard
specifies that they're to be displayed.
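Comment 4 describes a three-step strategy: use the current font if it has the glyph, otherwise look for another font that does, and only as a last resort show the entity name. A minimal Python sketch of that strategy follows. It assumes each font is modelled as a (name, set of covered code points) pair; the helpers `pick_font` and `render_char` are hypothetical and are not khtml or Qt code. Only `html.entities.codepoint2name` comes from the Python standard library.

```python
from html.entities import codepoint2name


def pick_font(ch, preferred, fallbacks):
    """Return the name of the first font (preferred first) with a glyph for ch."""
    for name, coverage in [preferred, *fallbacks]:
        if ord(ch) in coverage:
            return name
    return None  # no font in the list covers ch


def render_char(ch, preferred, fallbacks):
    """Return (font_name, text) describing how ch would be drawn."""
    font = pick_font(ch, preferred, fallbacks)
    if font is not None:
        return (font, ch)
    # Last resort: show the entity name (e.g. "&lceil;") instead of a box,
    # or a numeric reference when the character has no HTML 4 name.
    name = codepoint2name.get(ord(ch))
    text = f"&{name};" if name else f"&#{ord(ch)};"
    return (preferred[0], text)
```

Checking coverage per character, rather than per run of text, is what lets a single line mix glyphs from several fonts, which is the behaviour later attributed to Firefox in comment 47.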
Comment 5 Kai Lahmann 2003-06-15 13:20:47 UTC
*** Bug 39618 has been marked as a duplicate of this bug. ***
Comment 6 Kai Lahmann 2003-06-15 13:21:21 UTC
*** Bug 33332 has been marked as a duplicate of this bug. ***
Comment 7 Kai Lahmann 2003-06-15 13:21:28 UTC
*** Bug 32095 has been marked as a duplicate of this bug. ***
Comment 8 Stevan White 2003-07-11 09:31:36 UTC
More information:

First, here's a table with all the entities all in one page:
    http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html 

Second, I've run numerous experiments using that page, with Konqueror, Opera,
Netscape, and Explorer on Linux and Windows.  I altered the charset in the meta
tag, and the font-family specified in CSS.

My reading of the HTML standard is that these entities should be displayed
intelligibly regardless of the specified font-family or encoding.  

Konqueror is independent of encoding, but dependent on font (it displays
perfectly if the font-family is "arial unicode ms").

Opera 7.11 Linux displays the entities correctly independently of both font and
encoding.  On Windows, it is missing a few characters.

NS 7.2 Linux is independent of encoding, but depends on fonts (it displays correctly
using font "times", but many entities are messed up if I use "arial unicode ms").
Note this is the *opposite* of Konqueror.

MSIE 6.0.2800.1106 is bad in both ways: a change of font or a change in page
encoding will cause most of the SGML Character Entities to be displayed incorrectly.
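Comment 8 varies two parameters of the test page: the charset in the meta tag and the CSS font-family. A hedged sketch of a generator for such a page follows. `entity_test_page` and its table layout are hypothetical (this is not the reporter's actual page); it relies on Python's `html.entities.name2codepoint`, which maps HTML 4 entity names to code points. The inputs are assumed to be trusted test values, so they are inserted without escaping.

```python
from html.entities import name2codepoint


def entity_test_page(font_family="serif", charset="utf-8"):
    """Build an HTML page listing every HTML 4 named entity in one table."""
    rows = []
    # Sort by code point so related symbols (arrows, math operators) sit together.
    for name, cp in sorted(name2codepoint.items(), key=lambda kv: kv[1]):
        # Columns: the escaped entity source, its code point, and the entity itself.
        rows.append(
            f"<tr><td>&amp;{name};</td><td>U+{cp:04X}</td><td>&{name};</td></tr>"
        )
    return "\n".join([
        "<html>",
        "<head>",
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">',
        f"<style>body {{ font-family: {font_family}; }}</style>",
        "</head>",
        "<body>",
        "<table>",
        *rows,
        "</table>",
        "</body>",
        "</html>",
    ])
```

Because the table cells contain only entity references, the generated markup is pure ASCII, so changing the declared charset does not change the bytes of the content; only the browser's interpretation can differ.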
Comment 9 Stephan Kulow 2003-10-22 20:53:33 UTC
That bug is in Qt somehow.
Comment 10 Stephan Kulow 2003-10-25 12:24:01 UTC
*** Bug 66532 has been marked as a duplicate of this bug. ***
Comment 11 Stephan Kulow 2003-11-28 23:26:14 UTC
*** Bug 63751 has been marked as a duplicate of this bug. ***
Comment 12 Stephan Kulow 2003-12-11 22:32:18 UTC
*** Bug 70167 has been marked as a duplicate of this bug. ***
Comment 13 Nicolas Goutte 2003-12-21 23:26:00 UTC
The page http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html works for me! The font does not have all the characters, but in that case the box is displayed. (So I suppose that Qt has been fixed in the meantime.)

Have a nice day!
Comment 14 Gerald Teschl 2003-12-22 11:41:42 UTC
Dear Nicolas Goutte, you obviously missed the point:

>My reading of the HTML standard is that these entities should be displayed
>intelligibly regardless of the specified font-family or encoding.

This bug makes Konqueror useless for scientific text, which is a pity IMHO ;-(
Comment 15 Nicolas Goutte 2003-12-22 12:17:46 UTC
Sorry, re-opening!
Comment 16 Stephan Kulow 2003-12-22 12:53:05 UTC
Subject: Re:  HTML 4 SGML Entities don't all work

> http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
> works for me! The font has not all characters but then the box is
> displayed. (So I suppose that Qt has been fixed in the meantime.)
The displaying of the box _is_ the bug.

Greetings, Stephan

Comment 17 Jens 2004-01-15 22:59:33 UTC
Created attachment 4185 [details]
Mozilla vs. Konqueror CVS showing "rarr; / larr;" arrows

I can confirm this bug in CVS from 2-jan-2004. (see screenshot)
Comment 18 Jesper Juhl 2004-01-20 23:12:40 UTC
Still not fixed in KDE 3.2 RC1.
Comment 19 Frerich Raabe 2004-02-19 13:51:20 UTC
*** Bug 75569 has been marked as a duplicate of this bug. ***
Comment 20 Axel Boldt 2004-03-17 23:07:17 UTC
This bug is a major showstopper (for me). As it stands, the Wikipedia math pages cannot be read with Konqueror. We have long given up on IE and usually recommend that people use Mozilla, but Konqueror would be a good alternative!
Comment 21 Stephan Kulow 2004-07-20 11:04:05 UTC
*** Bug 85523 has been marked as a duplicate of this bug. ***
Comment 22 Maksim Orlovich 2004-07-22 19:16:07 UTC
*** Bug 85709 has been marked as a duplicate of this bug. ***
Comment 23 Stephan Kulow 2004-08-09 11:06:06 UTC
*** Bug 86822 has been marked as a duplicate of this bug. ***
Comment 24 Stephan Kulow 2004-10-17 10:58:41 UTC
*** Bug 91495 has been marked as a duplicate of this bug. ***
Comment 25 Stephan Kulow 2004-11-12 15:42:46 UTC
*** Bug 77348 has been marked as a duplicate of this bug. ***
Comment 26 Allan Sandfeld 2005-01-08 13:16:41 UTC
Marking as "should be fixed by Qt"
Comment 27 Allan Sandfeld 2005-02-21 23:22:00 UTC
*** Bug 44290 has been marked as a duplicate of this bug. ***
Comment 28 Stevan White 2005-02-27 11:59:22 UTC
Allan,

Have the Qt folks been notified that they are going to fix this bug?

(Or are you saying that some recent improvement in Qt should already have fixed it?)
Comment 29 Allan Sandfeld 2005-02-27 12:17:06 UTC
Yes, they know. They even fixed it once in Qt 3.1.0 I think, but reverted it because it created new problems. 

I can only hope they have fixed it for real in Qt 4.
Comment 30 Allan Sandfeld 2005-03-08 19:20:55 UTC
*** Bug 98394 has been marked as a duplicate of this bug. ***
Comment 31 Maksim Orlovich 2005-03-17 05:52:50 UTC
*** Bug 101659 has been marked as a duplicate of this bug. ***
Comment 32 Stevan White 2005-03-20 11:11:03 UTC
Allan,

Can you provide me with a report number for this Qt issue?

If it's their problem, I would like to pester them.
Comment 33 Maksim Orlovich 2005-07-10 03:19:34 UTC
*** Bug 108820 has been marked as a duplicate of this bug. ***
Comment 34 Nick Andrew 2005-09-21 05:11:45 UTC
The HTML character entity rarr displays as a box for me if the font is sans-serif, and displays correctly if it is serif. I also frequently see boxes in the text of web pages where I would expect dashes or quotes.

Konqueror should either display the literal '&' 'rarr' ';' or find a substitute font in which the character is provided.

My test case:

<html>
 <head>
  <title>right arrow test</title>
 </head>
<body>
Here it is ... &rarr;<br>
<font size="+1">1&rarr;</font>
<font size="+2">2&rarr;</font>
<font size="+3">3&rarr;</font>
<font size="+4">4&rarr;</font>
<font size="+5">5&rarr;</font><br>
<font face="serif">serif&rarr;</font><br>
<font face="sans">sans&rarr;</font>
</body>
</html>
Comment 35 Maksim Orlovich 2005-11-14 19:09:20 UTC
*** Bug 116356 has been marked as a duplicate of this bug. ***
Comment 36 Maksim Orlovich 2006-02-04 19:41:12 UTC
*** Bug 75314 has been marked as a duplicate of this bug. ***
Comment 37 Marijn Schouten 2006-02-15 17:46:52 UTC
I see only boxes (all eight of them) for the test case in comment #34. Increasing the font size manually doesn't change it. Using Konqueror 3.5.1.
Comment 38 Marijn Schouten 2006-02-15 18:01:43 UTC
Also for some reason &notin; is displayed as &not;in;
I've opened Bug 122047: &notin; is displayed as &not;in;
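The thread does not say why `&notin;` turns into `&not;` followed by `in;`. One plausible explanation, offered here only as an assumption, is a decoder that accepts the legacy semicolon-less form `&not` and stops at the first (shortest) matching name instead of the longest one. The sketch below uses a hypothetical three-entry table `ENTITIES` and a hypothetical `decode_entities` function to show both behaviours.

```python
# Hypothetical subset of the HTML 4 entity table, for illustration only.
ENTITIES = {"not": "\u00ac", "notin": "\u2209", "amp": "&"}


def decode_entities(text, longest_match=True):
    """Decode &name references in text, with or without a trailing ';'.

    With longest_match=False the first (shortest) known name wins, which
    reproduces the "&notin;" -> "¬in;" symptom; with True the longest name
    wins, so "&notin;" -> "∉".
    """
    names = sorted(ENTITIES, key=len, reverse=longest_match)
    out, i = [], 0
    while i < len(text):
        if text[i] == "&":
            rest = text[i + 1:]
            for name in names:
                if rest.startswith(name):
                    out.append(ENTITIES[name])
                    i += 1 + len(name)
                    if text.startswith(";", i):  # consume the terminator if present
                        i += 1
                    break
            else:
                out.append("&")  # unknown name: keep the ampersand literally
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For comparison, Python's standard `html.unescape("&notin;")` returns `∉`.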
Comment 39 Stevan White 2006-02-18 22:04:52 UTC
Marijn,

The &notin; entity is curious, but the display is intelligible, which is all the standard requires. But sure, in light of everything else, call it a bug.

At this time, Konqueror is still failing to display, or displaying unintelligibly,
about 20% of all the HTML 4 character entities.  This page still works:
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
Comment 40 Didier Raboud 2006-03-27 15:12:32 UTC
Still not solved in 3.5.1. Isn't this bug more than three years old? Wikipedia is sometimes unreadable (math pages) and cannot be used correctly with Konqueror. What about Qt solving it?
Comment 41 Maksim Orlovich 2006-03-31 02:47:39 UTC
*** Bug 124603 has been marked as a duplicate of this bug. ***
Comment 42 Allan Sandfeld 2006-04-01 11:40:25 UTC
*** Bug 124689 has been marked as a duplicate of this bug. ***
Comment 43 Thiago Macieira 2006-04-01 23:51:25 UTC
Qt 4 solves it.
Comment 44 Thorsten Staerk 2006-04-02 05:54:54 UTC
I want to confirm that it works beautifully with KDE 4. Thanks, Thiago & others!
Comment 45 Thiago Macieira 2006-04-04 22:03:39 UTC
I have nothing to do with this :-)
Comment 46 Stevan White 2006-06-04 18:16:54 UTC
I just installed KDE 5.3.2, with Konqueror of the same version, as distributed in Kubuntu Linux 6.6

Things are vastly improved.

There remain just a few glitches.  The following entities are not correctly displayed:

&notin;	<- this one is rather amusing.
&or;

&lceil;
&rceil;
&lfloor;
&rfloor;
&lang;
&rang;

Let's go for perfection!

Once again, have a look at my table at
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
Comment 47 Stevan White 2006-06-04 18:37:46 UTC
In Konqueror, at any rate, the SGML display is very font-dependent. This is not a good sign: according to the standard, SGML characters must be intelligibly displayed, regardless of such considerations.

My table is missing only the above-mentioned entities if I set the font (in CSS) to display in the distro's "sans-serif" font.  But in the serif font, things are much worse.  Considering this, I now have to say things are not much improved since I first reported the problem.

Notice that this does not happen in other browsers:  Firefox in particular finds a font in which the required glyph is available, and uses that glyph.

I'll work on my page so that you can easily switch the font it uses.
Comment 48 Stevan White 2006-06-04 18:41:20 UTC
Apologies:

I read 5.3.2, when it is 3.5.2.  This is of course a lower number than the one everybody says fixes the trouble.

So, to redeem myself now I have to go on and install KDE 4, I guess.
Comment 49 Allan Sandfeld 2006-06-04 23:08:40 UTC
Sorry, it is CANTFIX/WONTFIX for 3.5 and fixed in 4.0.
Comment 50 Maksim Orlovich 2006-07-17 04:45:01 UTC
*** Bug 130957 has been marked as a duplicate of this bug. ***
Comment 51 Allan Sandfeld 2006-09-17 13:42:19 UTC
*** Bug 123133 has been marked as a duplicate of this bug. ***
Comment 52 Tommi Tervo 2006-12-07 13:00:26 UTC
*** Bug 138496 has been marked as a duplicate of this bug. ***
Comment 53 Tommi Tervo 2007-02-20 15:42:53 UTC
*** Bug 130171 has been marked as a duplicate of this bug. ***
Comment 54 Tommi Tervo 2007-08-04 09:24:57 UTC
*** Bug 146922 has been marked as a duplicate of this bug. ***
Comment 55 Maksim Orlovich 2007-09-26 20:57:55 UTC
*** Bug 150228 has been marked as a duplicate of this bug. ***
Comment 56 Tommi Tervo 2007-11-01 21:42:31 UTC
*** Bug 151673 has been marked as a duplicate of this bug. ***
Comment 57 Tommi Tervo 2007-11-17 17:46:26 UTC
*** Bug 152462 has been marked as a duplicate of this bug. ***
Comment 58 Tommi Tervo 2008-07-15 12:57:02 UTC
*** Bug 166604 has been marked as a duplicate of this bug. ***
Comment 59 Stevan White 2008-08-02 21:07:21 UTC
Hi again!

Over six years after I reported this ugly bug, two years after it had been proclaimed fixed in development versions not easily available to the public or to me (and the bug accordingly closed) I can finally, gratefully report that in Ubuntu 8.05, with KDE/Konqueror 4.0.3, all the HTML 4 entities are displayed correctly, under some conditions.

So I will finally agree that this problem is fixed.  Whoo Hoo!

Just in case somebody in the inner circle is listening (not to dampen the festive mood, but): this is an AWFUL development cycle. Look at it as a challenge to find a way to streamline the development system so that the product does not continue to look stupid for the better part of a decade after a problem has been reported. (I mean the development environments of both KDE and Qt.)
Comment 60 Alain Knaff 2009-05-30 13:42:51 UTC
&shy; is still not correctly handled (in 4.2.2):

If no break is needed, the two parts of the word separated by &shy; are displayed joined together, without a hyphen (as they should be).
However, if a break is needed, they are displayed on two lines (as expected), but no hyphen is displayed after the first part!

All other characters from http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html are handled correctly.

Test case:

<html>
 <body>
  long long long sentence longlonglonglonglong&shy;wordwordword test&iexcl;
 </body>
</html>

Then make the window narrow enough that the long word is broken in two: the hyphen is missing!
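Comment 60 describes the expected `&shy;` (U+00AD SOFT HYPHEN) behaviour: invisible when the word fits on one line, rendered as a hyphen at the end of the first line when the word is broken there. A minimal greedy sketch of that rule follows, assuming monospace widths (one character per column) and treating soft hyphens as the only break opportunities inside the word; `break_word` is a hypothetical helper, not khtml code.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN, the character &shy; decodes to


def break_word(word, width):
    """Lay out one word, breaking only at soft hyphens.

    Fragments are joined while the result fits in width characters; otherwise
    the line is broken at the soft hyphen and a visible "-" is appended.
    Overlong fragments are not split further.
    """
    parts = word.split(SHY)
    lines, current = [], parts[0]
    for part in parts[1:]:
        if len(current) + len(part) <= width:
            current += part  # no break here: the soft hyphen stays invisible
        else:
            lines.append(current + "-")  # break here: the hyphen must be shown
            current = part
    lines.append(current)
    return lines
```

With the word from the test case above, a width of 40 keeps `longlonglonglonglongwordwordword` on one line, while a width of 25 yields `longlonglonglonglong-` followed by `wordwordword`; the reported bug is that the trailing `-` was missing in the second case.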
Comment 61 Alain Knaff 2009-05-30 13:48:36 UTC
Reopening this due to issue with &shy;

All other entities seem to be handled all right.
Comment 62 Maksim Orlovich 2009-05-30 15:57:44 UTC
&shy; is fixed in either 4.2.3 or 4.2.4. And please don't touch a bug report with 20 or so people CC'd for an unrelated issue!