134036 – [PATCH] Stop treating every line as a separate paragraph for BIDI purposes

Bug 134036 - [PATCH] Stop treating every line as a separate paragraph for BIDI purposes

Summary: [PATCH] Stop treating every line as a separate paragraph for BIDI purposes

Status:	RESOLVED FIXED

Alias:	None

Product:	kmail
Classification:	Unmaintained
Component:	general (show other bugs)
Version:	unspecified
Platform:	Debian testing Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Diego Iastrubni

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-09-13 23:07 UTC by Shai
Modified:	2008-09-30 20:30 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
An example of what the mail should look like (58.62 KB, image/png) 2006-09-13 23:28 UTC, Shai	Details
An example of what the mail looks like when saved (58.15 KB, image/png) 2006-09-13 23:31 UTC, Shai	Details
Patch for better treatment of directionality (3.93 KB, patch) 2007-12-20 17:45 UTC, Shai	Details
corrected version of the patch (4.91 KB, patch) 2008-08-21 03:10 UTC, Shai	Details
Show Obsolete (1) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Shai 2006-09-13 23:07:36 UTC

Version:            (using KDE KDE 3.5.4)
Installed from:    Debian testing/unstable Packages

When writing or reading messages containing bidirectional text (e.g. messages in Hebrew with some English words included), every line in the message is considered as a new paragraph for the purpose of deciding its direction. Thus, if a word of the wrong language happens to start a line, that line is presented all wrong (not only are its words mis-ordered, it is usually justified to the wrong side). Of course, this appears in full force when wordwrap is used.

What I'd like to see is that paragraphs are only separated by a blank (all-whitespace) line. 

I'll try to  add a  picture demonstrating the problem.

Comment 1 Shai 2006-09-13 23:23:59 UTC

Ok, I described the bug a little worse than it really is. The problem does exist, both in composing and in presenting existing messages, but only with saved messages, so while composing a completely new message it does not appear.

Anyway, see the difference between the two images coming up to see the problem. In the good image, every paragraph is in its own side. In the bad image, the second line of each paragraph is in the wrong side, and messed up. The steps taken to go from good to bad: Save draft, and re-open.

Just to make sure I'm not misunderstood: The problem shows up also with received messages, sent messages, every time the message is saved; and my interpretation is that the line break character (which I guess is not used for wordwrap until saving) is interpreted, for BiDi purposes, as a paragraph separator, and it shouldn't be.

Comment 2 Shai 2006-09-13 23:28:19 UTC

Created attachment 17759 [details]
An example of what the mail should look like

In this image, we see the composer with two paragraphs, one in English (LTR)
and one in Hebrew (RTL), each containing two lines of which the second starts
with a word from the other language. This causes no problem, as the message is
newly created and has not been saved yet.

Comment 3 Shai 2006-09-13 23:31:31 UTC

Created attachment 17760 [details]
An example of what the mail looks like when saved

In this image, we see the same message after it has been saved and re-opened.
the second line in each paragraph is messed up and justified wrong, because it
is treated as an independent paragraph, and its directionality is then set
according to the first strong character.

Comment 4 Diego Iastrubni 2007-04-28 00:11:47 UTC

Hi Shay,

I definetly agree with you, just it does not work in real life. When I created the original patch (4-5 years ago) I tested the idea of detecting paragraphs instead of lines. I also detected paragraphs a sequence of 2 "newlines".

The problem is that in emails, many people do not use '2 newlines' as a new paragraph, but 1. I also found that the line breaking will happen at random places (due to encoding and the fact that emails get truncated at char 78).

Unless you provide a working patch (for trunk, not branch please), I will close this bug as "wont fix", just because I tried it once, and I have no idea about how to implement it. Sorry

Comment 5 Shai 2007-04-28 18:46:12 UTC

Hi Cuco,

1) Thanks for the warning. To my regret, I am not in a position to work on fixing KMail right now (I might be in a couple of months, but not now).

2) I don't understand why you want to close the bug. I understand that currently, nobody has a good solution. Closing with "won't fix" is like saying nobody will ever have a good solution. Do you really think this cannot be solved?

3) At the very least, something needs to be done about KMail showing one thing when editing, and another after saving (or sending). In case you decide that a single newline should serve as a paragraph separator for Bidi decisions, make sure that wordwrap does the same thing (that is, each line is considered separately even when there's no newline).

Thanks,
Shai.

Comment 6 Diego Iastrubni 2007-12-14 11:47:31 UTC

Hi Shai,

I don't think that there is a way to fix it. Since line breaks happen at "random" locations as far as the text editor knows.

Comment 7 Shai 2007-12-14 18:04:22 UTC

Diego,

I will be happy to work on this a little -- I have some time for it these days. I may bug you for assistance.

I haven't looked at the text editor or any other part of KMail, but I will be very surprised if at least my item 3 above cannot be handled.

One idea I have about handling it is doing "line unwrap" -- a paragraph separator would be any newline that couldn't have been inserted by line wrap (that is, if adding the first word from the next line to the line would still not make the line length over 78). This will still miss some cases (e.g. broken-off end-lines in 5-times-nested quotations), but I think it will be a huge improvement over the current situation. And the cases where it would err on the other side (connect separate paragraphs) are, I think, really rare.

I'll try to contact you about this next week.

Have fun,
Shai.

Comment 8 Shai 2007-12-20 17:45:21 UTC

Created attachment 22634 [details]
Patch for better treatment of directionality

I am providing here a patch to implement what I described above: Each line
takes its direction from the line before it, unless it is beginning a
paragraph. A line begins a paragraph if it is the first, or the previous line
was empty, or the previous line is longer than 78 (indicating that it was in
fact a paragraph and not a line -- outlook messages sometimes arrive like
that), or the previous line is shorter than 78 and adding the first word from
this line wouldn't make it longer (that is, we guess that it wasn't broken just
for line wrap).

I'm pretty sure this is not perfect, because the problem is not well-defined.
However, I'm pretty sure it's an improvement. It should fix most of the
problems we BiDi users see with lines jumping around and getting messed up in
the middle of paragraphs.

The only situation I can see, where the patch produces a behavior which is
worse than the unpatached version, is where a message has a paragraph ending in
a line of length close to 78, the next paragraph is not separated by an empty
line (perhaps it is designated by indenting the first line in), and this next
paragraph should switch direction. I think this is much more rare than the
current problem (a line starting in a word of the other direction in the middle
of a paragraph), so I think this patch is worthy.

Of course, the patch is GPLed.

Comment 9 Diego Iastrubni 2008-08-13 09:33:35 UTC

Hi Shai,

I am very surpriced, but your patch works for KDE trunk (4.2). It applies, compiles and does not mess up English text (I have enough texts to tests...) and the minimal amount of Hebrew emails I have - I see no regresion compared to KDE3, so far so good.

Now, instead of using QString::isRightToLeft(), I would like to use a function which does a better buessing of the direction KateRenderer::isLineRightToLeft() (get it here 
http://websvn.kde.org/trunk/KDE/kdelibs/kate/render/katerenderer.cpp?revision=830563&view=markup )

Does it work better for you? Can you exchange some emails with me so I can test this any further? 
KDEpim guys, as much as I think this is a stupid idea - Shai is proving me stupid and this patch can be added in the next few weeks into trunk for KDE 4.2. Can any of you comment on this issue? My understanding of email internals is not that good.

Comment 10 Shai 2008-08-15 01:10:19 UTC

Hi Diego and PIMsters,

I am currently abroad and with very limited connectivity -- I'll be able to test and say more about this next week, and also write you some more.

I have a feeling that using this patch with KateRenderer::isLineRightToLeft will produce better results most of the time, but (assuming this is the method you mentioned earlier, where the direction is set by majority of strong characters) it yields a strange algorithm, where the direction of each paragraph is set by the majority of characters in its first line -- this will be hard to debug, when the corner cases show up. With QString::isRightToLeft, there is no such problem, because the first strong char of the line (assuming one exists) is also the first of the paragraph, so it's really just using paragraphs instead of lines for the BiDi algorithm.

On the other hand, if you just use majority for every line separately, you'll probably get as much lines right as this patch would (there will be a small number of mistakes for each of them, on different weird cases).

On the third hand, identifying paragraphs can be useful for other things too -- improvements, that is, not just bug-fixes -- so I think it better if this patch is not completely forgotten...

It's your call.

Comment 11 Thomas McGuire 2008-08-15 13:52:49 UTC

Hi,

sorry that it took me so long to respond to this patch.
I have actually not tested it, but since it only affects RTL text I'm fine with committing it and trust you that it works correctly.

Please change a few things until this can be committed:

> unsigned int end = s.indexOf('\n', pos+1, Qt::CaseInsensitive);
> if (end == (unsigned int)(-1))
Please use int instead of unsigned int, which doesn't make sense here.

- Use more comments. From just reading the code, I don't understand at all
  how looksLikeParagraphBreak works, please add some description there, for
  example how the decision algorithm works.

- What is the pos paramter in looksLikeParaBreak? Use a more descriptive   
  parameter name

> which calls for some refactoring
Actually doing the refactoring here would be nice.

- Coding style, see http://websvn.kde.org/trunk/KDE/kdepim/kmail/HACKING?revision=839299&view=markup
  Especially remove trailing whitespace and add more whitespace like described in the document

Comment 12 Shai 2008-08-21 03:10:35 UTC

Created attachment 26954 [details]
corrected version of the patch

This newer patch addresses most of Thomas' notes, and also fixes some bugs (thanks Diego).

The one note I know I haven't addressed is about the refactoring vis-a-vis KMMessage::textFlow() -- I would like to first have this working (if accepted), and only then start to worry about factoring out the separation-to-words functionality. There are other things to factor out too -- like the definition of the WRAP_COL constant; 78 appears in kmmessage.cpp and some other places as a "magic" constant.

I hope this one is good and clear enough. As Diego said, he has a different method to solve the same problem, which may prove simpler, but as I said, identifying paragraphs may have further benefits.

Thanks for your attention,

Shai.

Comment 13 Thomas McGuire 2008-09-30 20:30:37 UTC

SVN commit 866375 by tmcguire:

Improve detection of when a line in the mail text should be rendered right-to-left.

Patch by Shai Berger <shai@platonix.com>

BUG: 134036