TAB characters may appear in Subject headers. Currently, TAB characters are correctly shown in the message list view, but are replaced by a SPACE character in the envelope view and the titles of windows (composer, message view window). Reproducible: Always Steps to Reproduce: 1. Create message with TAB in Subject header 2. Send it, so that it is in an IMAP mailbox 3. Fetch the message and view it with Trojita Actual Results: TAB is replaced by SPACE in envelope view and in window titles Expected Results: TAB is not replaced by SPACE in envelope view and in window titles, but is kept as-is, as in the message list view. I looked at the source, but could not find the function actually doing the incorrect replacement.
"toHtmlEscaped()" The concept of a tabulator doesn't exist in html and I *assume* the function won't translate it to a stack of 's Kinda "wontfix" (notably for window titles, WMs will usually simplify all whitespace and other funky junk, just assume there was a couple of newlines in it... ;-)
(In reply to Thomas Lübking from comment #1) > "toHtmlEscaped()" > The concept of a tabulator doesn't exist in html and I *assume* the function > won't translate it to a stack of 's I understand. Another option might be to translate TAB to  , an em space, to better approximate the visual intent of a TAB. > Kinda "wontfix" (notably for window titles, WMs will usually simplify all > whitespace and other funky junk, just assume there was a couple of newlines > in it... ;-) Ok, I see why window titling code may do its own pre-processing. Feel free to close as WONFIX.
I wonder what is a correct rendering of \t in a GUI with non-monospace fonts. How do you expect to see them, and what it the use case behind showing them in a different way than "just space"?
(In reply to Jan Kundrát from comment #3) > I wonder what is a correct rendering of \t in a GUI with non-monospace > fonts. How do you expect to see them, and what it the use case behind > showing them in a different way than "just space"? Assume the TAB was intentionally placed, so that a space would have been used if that was meant instead. Then it was meant to convey meaning. That meaning could be ‘a longer stretch of white space’, e.g, for obtaining some visual effect. It cannot be ‘alignment’, because line-breaks in subjects are not allowed. The use case of not using a space here is to convey that meaning. (I currently do not see another meaning than ‘a longer stretch of white space’ in the context of a Subject header.) If it is ‘a longer stretch of white space’, then using an em-space would be more appropriate than a normal space.
On Monday 07 March 2016 12:31:08 Erik Quaeghebeur via KDE Bugzilla wrote: > line-breaks in subjects are not allowed. Hm... is there restriction in RFC for this? Just asking.
> If it is ‘a longer stretch of white space’, then using an em-space would be > more appropriate than a normal space. Sounds like plan -- please submit a patch for this. > Hm... is there restriction in RFC for this? Just asking. None that I'm aware of.
(In reply to Pali Rohár from comment #5) > On Monday 07 March 2016 12:31:08 Erik Quaeghebeur via KDE Bugzilla wrote: > > line-breaks in subjects are not allowed. > > Hm... is there restriction in RFC for this? Just asking. In RFC 5322, it is said that Subject is an unstructured field, which is described in Section 2.2.1 as « Some field bodies in this specification are defined simply as "unstructured" (which is specified in section 3.2.5 as any printable US-ASCII characters plus white space characters) with no further restrictions. These are referred to as unstructured field bodies. Semantically, unstructured field bodies are simply to be treated as a single line of characters with no further processing (except for "folding" and "unfolding" as described in section 2.2.3). » More generally, as far as I've understood (and modulo folding), a header value MUST NOT contain any line breaks.
> More generally, as far as I've understood (and modulo folding), > a header value MUST NOT contain any line breaks. Line breaks are special when they appear as-is because RFC5322 assigns a special meaning to them. There is RFC2047 which specifies how to send Unicode. It gets triggered by presence of LFs in my testing, and I think that this usage is actually kosher -- but i'll be happy to be proven wrong by a quite from some other RFC.
On Monday 07 March 2016 13:17:45 Erik Quaeghebeur via KDE Bugzilla wrote: > https://bugs.kde.org/show_bug.cgi?id=360205 > > --- Comment #7 from Erik Quaeghebeur <kdebugs@equaeghe.nospammail.net> --- > (In reply to Pali Rohár from comment #5) > > On Monday 07 March 2016 12:31:08 Erik Quaeghebeur via KDE Bugzilla wrote: > > > line-breaks in subjects are not allowed. > > > > Hm... is there restriction in RFC for this? Just asking. > > In RFC 5322, it is said that Subject is an unstructured field, which is > described in Section 2.2.1 as > > « Some field bodies in this specification are defined simply as > "unstructured" (which is specified in section 3.2.5 as any printable > US-ASCII characters plus white space characters) with no further > restrictions. These are referred to as unstructured field bodies. > Semantically, unstructured field bodies are simply to be treated as a > single line of characters with no further processing (except for > "folding" and "unfolding" as described in section 2.2.3). » > > More generally, as far as I've understood (and modulo folding), a header value > MUST NOT contain any line breaks. > This is truth, but RFC 2047 allows you to include any unicode character into header body (via encoding unicode character to UTF-8 and encoding it to either QP or Base64). It means that also newline can be encoded into Base64. From RFC 5322 point of view, Base64 is single line (after unfolding) but you have funny unicode and also ascii whitespaces in header body... Or is there some restriction that above is violation of MIME or something else?
(In reply to Jan Kundrát from comment #8) > > More generally, as far as I've understood (and modulo folding), > > a header value MUST NOT contain any line breaks. > > There is RFC2047 which specifies how to send Unicode. It gets triggered by > presence of LFs in my testing, and I think that this usage is actually > kosher -- but i'll be happy to be proven wrong by a quite from some other > RFC. The token definition in RFC822 that allowed bare CR and bare LF was updated in RFC2822 to preclude them: RFC2047 Section 5. (1) on p.7: « An ’encoded-word’ may replace a ’text’ token (as defined by RFC 822) in any Subject or Comments header field [...] » RFC822 Section 3.3 on p.10: « text = <any CHAR, including bare CR & bare LF, but NOT including CRLF> » RFC2822 Section 3.2.1 on p.10: « text = %d1-9 / %d11 / %d12 / %d14-127 / obs-text ; Characters excluding CR and LF » (RFC5322 doesn't reintroduce them.) So digesting them is needed for dealing with old messages, but producing them is, in my understanding, against the latest RFC.
On Monday 07 March 2016 15:05:24 Erik Quaeghebeur via KDE Bugzilla wrote: > https://bugs.kde.org/show_bug.cgi?id=360205 > > --- Comment #10 from Erik Quaeghebeur <kdebugs@equaeghe.nospammail.net> --- > (In reply to Jan Kundrát from comment #8) > > > More generally, as far as I've understood (and modulo folding), > > > a header value MUST NOT contain any line breaks. > > > > There is RFC2047 which specifies how to send Unicode. It gets triggered by > > presence of LFs in my testing, and I think that this usage is actually > > kosher -- but i'll be happy to be proven wrong by a quite from some other > > RFC. > > The token definition in RFC822 that allowed bare CR and bare LF was updated in > RFC2822 to preclude them: > > RFC2047 Section 5. (1) on p.7: > « An ’encoded-word’ may replace a ’text’ token (as defined by RFC 822) in any > Subject or Comments header field [...] » So it means that UTF-8 encoded unicode character "newline" in MIME Base64 is valid content for Subject header. Right? > RFC822 Section 3.3 on p.10: > « text = <any CHAR, including bare CR & bare LF, but NOT including CRLF> » Ah, so old RFC 822 allows directly "\n" in Subject. I did not know that. > RFC2822 Section 3.2.1 on p.10: > « text = %d1-9 / %d11 / %d12 / %d14-127 / obs-text ; Characters excluding > CR and LF » > > (RFC5322 doesn't reintroduce them.) > > So digesting them is needed for dealing with old messages, but producing them > is, in my understanding, against the latest RFC. Right, my original question about newlines was because I wanted to know if compliant email client needs to deal with such thing. So answer is yes, "\n" in Subject is valid character.
(In reply to Jan Kundrát from comment #6) > > If it is ‘a longer stretch of white space’, then using an em-space would be > > more appropriate than a normal space. > > Sounds like plan -- please submit a patch for this. It seems unicode em-spaces get converted to plain spaces by QString::toHtmlEscaped, or my font doesn't have the em-spaces. Is there an easy way to have a look at the html of a message view in Trojita? Another observation: it seems that tabs in message bodies get converted to 4 spaces within the message view. Where should I look for that code? I guess it is best to treat tabs in the subject the same as in the body.
The message works here (it's pre-wrapped) and lead me to a solution that should work for you: diff --git a/src/Gui/EnvelopeView.cpp b/src/Gui/EnvelopeView.cpp index 1850456..14c049a 100644 --- a/src/Gui/EnvelopeView.cpp +++ b/src/Gui/EnvelopeView.cpp @@ -87,7 +87,7 @@ void EnvelopeView::setMessage(const QModelIndex &index) if (e.date.isValid()) { subDate = QStringLiteral("<table style=\"margin:0px; margin-left:4em; float:right;\"><tr style=\"margin:0px;\"><td style=\"margin:0px;\">%1</td></tr></table>").arg(e.date.toLocalTime().toString(Qt::SystemLocaleLongDate)); } - subDate += QStringLiteral("<span style=\"font:bold large;\">%1</span>").arg(e.subject.toHtmlEscaped()); + subDate += QStringLiteral("<span style=\"font:bold large; white-space:pre;\">%1</span>").arg(e.subject.toHtmlEscaped()); auto lbl = new QLabel(subDate, this); SET_LABEL_OPTIONS(lbl) layout()->addWidget(lbl);
> - subDate += QStringLiteral("<span style=\"font:bold > large;\">%1</span>").arg(e.subject.toHtmlEscaped()); > + subDate += QStringLiteral("<span style=\"font:bold large; > white-space:pre;\">%1</span>").arg(e.subject.toHtmlEscaped()); While you're changing this, try to consider a situation where the subject consists of just one overly long sequence of characters with no spaces in the middle -- just a reminder for future. On topic of preserving whitespace, let's be realistic, please. We already wrap the subject in funny ways around the date, and I *like* this. If we want to go full RFC-lawyering here, nothing defines how to render these subjects, and wrapping them into several lines is an obvious usability feature. I do not think that distinguishing between tabs, spaces, emspaces and newlines makes a lot of sense in subjects. If anything, my today's vote :) is to ensure that newlines do not lead to visible line breaks (think possible information hiding) and that spaces are normalized to a reasonable amount of whitespace. > It seems unicode em-spaces get converted to plain spaces by > QString::toHtmlEscaped I've checked the source of that method, and it only replaces <, >, & and ". > Is there an easy way to have a look at the html of a message view in Trojita? Add a call to qDebug() to an appropriate place. The markup is asked from from within src/Gui/SimplePartWidget.cpp.
(In reply to Jan Kundrát from comment #14) > While you're changing this, try to consider a situation where the subject > consists of just one overly long sequence of characters with no spaces in > the middle -- just a reminder for future. => "white-space:pre-line;" Personally, I'm not overly interested in showing special whitespace *anywhere* (including the message list) since it only means that valuable information ends up being out of sight ;-P
(In reply to Jan Kundrát from comment #14) > > [...] If anything, my today's vote > :) is to ensure that newlines do not lead to visible line breaks (think > possible information hiding) and that spaces are normalized to a reasonable > amount of whitespace. (In reply to Thomas Lübking from comment #15) > > Personally, I'm not overly interested in showing special whitespace > *anywhere* (including the message list) [...] 1. I agree that information-hiding should be avoided. 2. I agree that stretches of white-space should not be overly long. I feel that 1. includes faithful representation of the message subject and content, including tabs. If a tab is there, it is either by intent (which we should consider), or by some error (which we could guess at... but I don't think we want to, as tabs in subjects are rather rare). Currently, I think that in the message list, information hiding may take place. (My patch does not address this, but I feel that there actually, it is OK to deviate from faithful representation.) My patch should not give rise to information-hiding, perhaps the tooltip case needs to be treated differently from the envelope view; let me know. Regarding 2.: I tried to set the tab length to something smaller than 8 characters, but alas, it seems that the ‘tab-size’ CSS is not yet supported. In any case, now there is some consistency between what is in the message list and what is in the envelope (which is actually what prompted my report/patch). In the future, this should ‘start to work’, once the Qt html viewer is brought up-to-date.
> I feel that 1. includes faithful representation of the message subject and > content, including tabs. If a tab is there, it is either by intent (which we > should consider), or by some error (which we could guess at... but I don't > think we want to, as tabs in subjects are rather rare). I've, unfortunately, seen totally botched subjects, including a case where a single Unicode codepoint which gets encoded into two bytes in UTF-8 was separted into two encoded-words bby the sender's RFC2047 generator. Based on this experience and on a quick consultation with an oracle (the grapefruit-flavored Salzburger Stiegel was particularly helpful), I think that it's going to be much more common to see tabs as an accident. Otherwise, consider the question of there to draw a line. Should we support line breaks for the same reason, for example? What about Unicode characters for drawing boxes? > In the future, this should ‘start to work’, once the Qt html viewer is brought > up-to-date. We're using a simple QLabel. It will likely never come with a full-blown HTML renderer. The supported subset of HTML is documented somewhere in Qt's docs, but it seems that it is actualy richer than what is documented.
Fwwi, mutt displays "?" for tabs (didn't try newline) I actually wonder about the trigger of this - few MUAs will (easily) allow to enter a tab into the subject (but jump to the next GUI element), ie. you'll copy it there (from a webpage)?
(In reply to Jan Kundrát from comment #17) > > [...], I think that it's going to be much more common to see tabs as an accident. (In reply to Thomas Lübking from comment #18) > > I actually wonder about the trigger of this - few MUAs will (easily) allow > to enter a tab into the subject (but jump to the next GUI element), [...] I'm convinced. I suggest you close this WONTFIX. I hope you don't mind the effort spent too much.
(In reply to Erik Quaeghebeur from comment #19) > I hope you don't mind the effort spent too much. *Your* efforts? ;-) Questioning things and justification is never wasted time ever. And: we figured to maybe re-check on the display in the message list.