(*** This bug was imported into bugs.kde.org ***) Package: khtml Version: 3.0 (using KDE 2.2.1 -0.rh71.1.cups) Severity: normal Installed from: Red Hat Linux 7.1 Compiler: gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85) OS: Linux (i686) release 2.4.9-0.18 OS/Compiler notes: The Ascii character <AD> which looks like a hyphen is a "soft" hyphen and is used to indicate places in words where a line-break may occur (just like \- in LaTeX). It should only be displayed at the end of the line when a break does indeed occur. However konqueror unconditionnally displays it just as if it were a normal hyphen (2D). (See also http://www.w3.org/TR/REC-html40/struct/text.html#h-9.3.3) Currently konqueror doesn't seem to handle the soft hyphen. However when fixing this please be aware that unfortunately the soft hyphen is misused in a number of places such as for example http://www.newscientist.com/news/news.jsp?id=ns99991443 where it is misused as a numeric minus sign (Â255). Obviously suppressing its display in this instance would severly distort the meaning of the text. In order to prevent that from happening maybe <AD> should always be displayed after a space or before a digit. (Submitted via bugs.kde.org) (Called from KBugReport dialog)
The problem is still present in KDE 3.1.0. Moreover, Konqueror does never break a word containing the soft hyphen. Example: test
This is a confirmation that the bug still exists in KDE 3.1.3.
And in HEAD (20030725).
*** Bug 52956 has been marked as a duplicate of this bug. ***
and in 20031016 HEAD
kedit / Qt3.3 handles it correctly but khtml (KDE 3.2) fails. Testcase: khtmltests/regression/tests/unsorted/33855.html
*** Bug 60053 has been marked as a duplicate of this bug. ***
Dave Hyatt in his blog says he has fixed this for Safari. So hopefully this will be fixed in khtml too, in a not too distant future. See http://weblogs.mozillazine.org/hyatt/archives/2004_03.html#005130
ASCII codes above 127 isn't real ASCII codes so in some extended ASCII tables 173 aka AD is soft hyphen in others it isn't. So please referre to soft hyphen as either ­, &173; or &x0AD; like the W3C HTML 4.0 recommendation says: http://www.w3.org/TR/REC-html40/struct/text.html#hyphenation Right now I use Konqueror 3.2.2-4 Red Hat and the soft hyphens are shown as small spaces which doesn't break
The subject of soft hyphenation is not that easy. Please read http://www.cs.tut.fi/~jkorpela/shy.html
I have read the above link, and I understand the issue may be complicated. However, I have seen in the above article that: - in unicode, the character is meant to be a discretonary hyphen - HTML 4 recommends to assign to the character the same meaning. - Mozilla and Netscape hide the character from the stream, partially ignoring the problem, but acting substantially in accordance with the HTML 4 spec. I think we should either do what Mozilla is doing in this case, or if it is not too difficult, do the extra step of using the hyphen as a line-break hint. To make still better, the document should be hyphenated on the fly, but that is not possible unless the document language is known. Until then, at least we would give authors a way to improve the document look and legibility. If the feature is implemented, the hyphen must be ignored when searching in the text, according to the above document.
The ­ (­) still always displays as a normal hyphen in KDE 3.3.2
*** Bug 102577 has been marked as a duplicate of this bug. ***
*** Bug 111212 has been marked as a duplicate of this bug. ***
SVN commit 468638 by ggarand: Port soft-hyphen support from WebCore BUG: 33855, 113900 M +8 -0 ChangeLog M +65 -4 rendering/bidi.cpp M +5 -1 rendering/render_text.cpp M +3 -0 rendering/render_text.h --- branches/KDE/3.5/kdelibs/khtml/ChangeLog #468637:468638 @@ -1,5 +1,13 @@ 2005-10-08 Germain Garand <germain@ebooksfrance.org> + Port soft-hyphen support from WebCore + + * rendering/bidi.cpp (findNextLineBreak): main skipping/breaking logic + + * rendering/render_text.{cpp,h} (calcMinMaxWidth): account for soft hyphen + +2005-10-08 Germain Garand <germain@ebooksfrance.org> + Follow Mozilla/Opera in considering the Window object as an implementation of the AbstractView/ViewCSS DOM2 interfaces. Return the Window object when available ; an AbstractView if not. --- branches/KDE/3.5/kdelibs/khtml/rendering/bidi.cpp #468637:468638 @@ -420,6 +420,19 @@ sLastBidiRun = startRun; } +static void chopMidpointsAt(RenderObject* obj, uint pos) +{ + if (!sNumMidpoints) return; + BidiIterator* midpoints = smidpoints->data(); + for (uint i = 0; i < sNumMidpoints; i++) { + const BidiIterator& point = midpoints[i]; + if (point.obj == obj && point.pos == pos) { + sNumMidpoints = i; + break; + } + } +} + static void checkMidpoints(BidiIterator& lBreak, BidiState &bidi) { // Check to see if our last midpoint is a start point beyond the line break. If so, @@ -435,8 +448,16 @@ if (currpoint == lBreak) { // We hit the line break before the start point. Shave off the start point. sNumMidpoints--; - if (!endpoint.obj->style()->preserveWS()) + if (!endpoint.obj->style()->preserveWS()) { + if (endpoint.obj->isText()) { + // Don't shave a character off the endpoint if it was from a soft hyphen. + RenderText* textObj = static_cast<RenderText*>(endpoint.obj); + if (endpoint.pos+1 < textObj->length() && + textObj->text()[endpoint.pos+1].unicode() == SOFT_HYPHEN) + return; + } endpoint.pos--; + } } } } @@ -1817,19 +1838,47 @@ bool appliedStartWidth = pos > 0; // If the span originated on a previous line, // then assume the start width has been applied. bool appliedEndWidth = false; + bool nextIsSoftBreakable = false; while(len) { bool previousCharacterIsSpace = currentCharacterIsSpace; + bool isSoftBreakable = nextIsSoftBreakable; + nextIsSoftBreakable = false; const QChar c = str[pos]; currentCharacterIsSpace = c == ' '; if (preserveWS || !currentCharacterIsSpace) isLineEmpty = false; + // Check for soft hyphens. Go ahead and ignore them. + if (c.unicode() == SOFT_HYPHEN && pos > 0) { + nextIsSoftBreakable = true; + if (!ignoringSpaces) { + // Ignore soft hyphens + BidiIterator endMid(0, o, pos-1); + addMidpoint(endMid); + + // Add the width up to but not including the hyphen. + tmpW += t->width(lastSpace, pos - lastSpace, f); + + // For whitespace normal only, include the hyphen. We need to ensure it will fit + // on the line if it shows when we break. + if (o->style()->whiteSpace() == NORMAL) + tmpW += t->width(pos, 1, f); + + BidiIterator startMid(0, o, pos+1); + addMidpoint(startMid); + } + + pos++; + len--; + lastSpace = pos; // Cheesy hack to prevent adding in widths of the run twice. + continue; + } #ifdef APPLE_CHANGES // KDE applies wordspacing differently bool applyWordSpacing = false; #endif - if ( (preserveLF && c == '\n') || (autoWrap && isBreakable( str, pos, strlen )) ) { + if ( (preserveLF && c == '\n') || (autoWrap && (isBreakable( str, pos, strlen ) || isSoftBreakable)) ) { if (ignoringSpaces) { if (!currentCharacterIsSpace) { // Stop ignoring spaces and begin at this @@ -1880,8 +1929,12 @@ } } - if (w + tmpW > width && autoWrap) { - goto end; + if (autoWrap) { + if (w+tmpW > width) + goto end; + else if ( (pos > 1 && str[pos-1].unicode() == SOFT_HYPHEN) ) + // Subtract the width of the soft hyphen out since we fit on a line. + tmpW -= t->width(pos-1, 1, f); } if( preserveLF && *(str+pos) == '\n' ) { @@ -2114,6 +2167,14 @@ lBreak.increment(bidi); } + if (lBreak.obj && lBreak.pos >= 2 && lBreak.obj->isText()) { + // For soft hyphens on line breaks, we have to chop out the midpoints that made us + // ignore the hyphen so that it will render at the end of the line. + QChar c = static_cast<RenderText*>(lBreak.obj)->text()[lBreak.pos-1]; + if (c.unicode() == SOFT_HYPHEN) + chopMidpointsAt(lBreak.obj, lBreak.pos-2); + } + return lBreak; } --- branches/KDE/3.5/kdelibs/khtml/rendering/render_text.cpp #468637:468638 @@ -1041,9 +1041,13 @@ m_hasBeginWS = true; if ((isSpace || isNewline) && i == len-1) m_hasEndWS = true; + + if (i && c == SOFT_HYPHEN) + continue; int wordlen = 0; - while( i+wordlen < len && !(isBreakable( str->s, i+wordlen, str->l )) ) + while( i+wordlen < len && (i+wordlen == 0 || str->s[i+wordlen].unicode() != SOFT_HYPHEN) && + !(isBreakable( str->s, i+wordlen, str->l )) ) wordlen++; if (wordlen) --- branches/KDE/3.5/kdelibs/khtml/rendering/render_text.h #468637:468638 @@ -36,6 +36,9 @@ class QPainter; class QFontMetrics; +// Define a constant for soft hyphen's unicode value. +#define SOFT_HYPHEN 173 + namespace khtml { class RenderText;