Bug 448112

Summary:	Parsing broken subjects and other possibly UTF-8-encoded headers
Product:	[Applications] kmail2	Reporter:	ratijas <me>
Component:	general	Assignee:	kdepim bugs <pim-bugs-null>
Status:	REPORTED ---
Severity:	normal
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Other
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description ratijas 2022-01-08 12:13:33 UTC

SUMMARY

Sometimes I receive automated emails from systems that do not properly encode multi-line UTF-8 subjects.
For example, KMail displays one such subject as:

> Кассовый чек 500 ₽ от «ПАО "ТАТ��ЕЛЕКОМ"»

In the source view of that email, the Subject header appears on a single line:

> From: "OFD.RU" <noreply@ofd.ru>
> Subject: =?UTF-8?B?0JrQsNGB0YHQvtCy0YvQuSDRh9C10LogNTAw?==?UTF-8?B?IOKCvSDQvtGCIMKr0J/QkNCeICLQotCQ0KLQ?==?UTF-8?B?otCV0JvQldCa0J7QnCLCuw==?=
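Decoding the three encoded-words one at a time shows where the split happens. The Python sketch below uses a hypothetical helper, `decode_each`, that decodes each base64 payload separately, as a strict per-word decoder would. It then decodes the concatenated bytes for comparison.

```python
import base64

# Base64 payloads of the three =?UTF-8?B?...?= words from the Subject header above.
PAYLOADS = [
    "0JrQsNGB0YHQvtCy0YvQuSDRh9C10LogNTAw",
    "IOKCvSDQvtGCIMKr0J/QkNCeICLQotCQ0KLQ",
    "otCV0JvQldCa0J7QnCLCuw==",
]


def decode_each(payloads):
    """Decode every encoded-word on its own, replacing invalid UTF-8 with U+FFFD."""
    results = []
    for payload in payloads:
        raw = base64.b64decode(payload)
        results.append((raw, raw.decode("utf-8", errors="replace")))
    return results


chunks = decode_each(PAYLOADS)
for i, (raw, text) in enumerate(chunks, start=1):
    print(f"word {i}: first byte {raw[0]:#04x}, last byte {raw[-1]:#04x}: {text!r}")

# Joining the raw bytes before decoding gives valid UTF-8 again.
print(b"".join(raw for raw, _ in chunks).decode("utf-8"))
```

Word 2 ends with byte 0xd0 and word 3 starts with 0xa2. Together, D0 A2 is the UTF-8 encoding of Т (U+0422), so each half decodes to U+FFFD on its own.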

There are two problems, as far as I can tell:

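1. The header should have been folded onto multiple lines after each closing ?= sequence. RFC 2047 requires adjacent encoded-words to be separated by linear whitespace.
2. A character's UTF-8 bytes should not be split across multiple =?UTF-8?B?...?= chunks, because RFC 2047 requires each encoded-word to contain whole characters. Here, the fourth letter of «ТАТТЕЛЕКОМ» (Т) is split between the second and third chunks.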
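For comparison, here is a minimal Python sketch of an encoder that follows both rules above, assuming the whole subject uses UTF-8 and B encoding. `encode_subject` and `MAX_BYTES_PER_WORD` are illustrative names, not part of any mail library. The function cuts only on code-point boundaries and joins the words with folding whitespace.

```python
import base64

PREFIX, SUFFIX = "=?UTF-8?B?", "?="
# 39 raw bytes -> 52 base64 characters -> a 64-character encoded-word, so even
# the first line ("Subject: " + word) stays within RFC 2047's 76-character limit.
MAX_BYTES_PER_WORD = 39


def encode_subject(text: str) -> str:
    """Encode text as B-encoded words that never split a character's UTF-8 bytes."""
    words, current = [], b""
    for ch in text:  # iterates over code points
        encoded_ch = ch.encode("utf-8")
        # Close the current word before a character would overflow it, so every
        # multi-byte sequence stays inside a single encoded-word.
        if current and len(current) + len(encoded_ch) > MAX_BYTES_PER_WORD:
            words.append(current)
            current = b""
        current += encoded_ch
    if current:
        words.append(current)
    # Separate adjacent encoded-words with folding whitespace (CRLF + space).
    return "\r\n ".join(
        PREFIX + base64.b64encode(w).decode("ascii") + SUFFIX for w in words
    )


print("Subject: " + encode_subject('Кассовый чек 500 ₽ от «ПАО "ТАТТЕЛЕКОМ"»'))
```

Because every word holds complete characters, each one decodes to valid UTF-8 even when a client decodes the words independently.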

Still, KMail could make our lives easier by trying to recover such broken subjects.
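One possible recovery is sketched below in Python. It decodes each payload to bytes, concatenates the bytes of adjacent encoded-words that declare the same charset, and only then decodes the result. `decode_lenient` and `_payload_bytes` are hypothetical names, not KMime's API. The sketch also ignores RFC 2231 language tags and does not handle unknown charsets.

```python
import base64
import binascii
import re

# An RFC 2047 encoded-word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def _payload_bytes(encoding: str, payload: str) -> bytes:
    if encoding in "Bb":
        # Tolerate senders that drop the trailing "=" padding.
        return base64.b64decode(payload + "=" * (-len(payload) % 4))
    # Q encoding: like quoted-printable, with "_" standing for a space.
    return binascii.a2b_qp(payload.encode("ascii"), header=True)


def decode_lenient(header: str) -> str:
    """Decode a header, merging adjacent same-charset encoded-words byte-wise."""
    pieces = []  # (bytes, charset) for encoded-words, (str, None) for plain text
    pos = 0
    for m in ENCODED_WORD.finditer(header):
        between = header[pos:m.start()]
        follows_word = bool(pieces) and pieces[-1][1] is not None
        # Whitespace between two encoded-words is not part of the text.
        if between and not (follows_word and between.isspace()):
            pieces.append((between, None))
            follows_word = False
        charset = m.group(1).lower()
        data = _payload_bytes(m.group(2), m.group(3))
        if follows_word and pieces[-1][1] == charset:
            # Concatenate raw bytes so a character split across words is rejoined.
            pieces[-1] = (pieces[-1][0] + data, charset)
        else:
            pieces.append((data, charset))
        pos = m.end()
    if pos < len(header):
        pieces.append((header[pos:], None))
    return "".join(
        p if cs is None else p.decode(cs, errors="replace") for p, cs in pieces
    )
```

For well-formed UTF-8 headers, merging at the byte level should not change the result, because the whitespace between adjacent encoded-words is dropped anyway. The only visible difference is that split characters are rejoined before decoding.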

After all, KMail already does a good job of recovering from an unspecified encoding, as with this follow-up email from my internet provider:

> From: <pay@ais.tattelecom.ru>
> Subject: ÐÐ²Ð¸ÑÐ°Ð½ÑÐ¸Ñ Ð¿Ð¾ Ð¾Ð¿Ð»Ð°ÑÐµ ÑÑÐ»ÑÐ³ ÑÐ²ÑÐ·Ð¸ ÐÐÐ Â«Ð¢Ð°ÑÑÐµÐ»ÐµÐºÐ¾Ð¼Â»

…which KMail manages to recover correctly as

> Квитанция по оплате услуг связи ПАО «Таттелеком»
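Assume the provider's header carried raw UTF-8 bytes with no declared charset, and that the source view rendered them as Latin-1. Under that assumption, reinterpreting the bytes as UTF-8 reverses the damage. The sketch below illustrates this heuristic, but it is not necessarily what KMail or KMime actually does, and `reinterpret_as_utf8` is a hypothetical name. Some of the bytes map to invisible C1 control characters in Latin-1, such as 0x9A from К (D0 9A). Text copied out of a source view may therefore not round-trip, so the example builds the garbled string from the expected subject.

```python
def reinterpret_as_utf8(header: str) -> str:
    """Hypothetical fallback: undo UTF-8 bytes that were displayed as Latin-1."""
    try:
        # Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
        # so encoding back to Latin-1 restores the original bytes exactly.
        raw = header.encode("latin-1")
    except UnicodeEncodeError:
        return header  # has characters outside Latin-1, so it is not this kind of mojibake
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return header  # not valid UTF-8, so keep the Latin-1 reading


expected = "Квитанция по оплате услуг связи ПАО «Таттелеком»"
garbled = expected.encode("utf-8").decode("latin-1")  # what a Latin-1 view would show
print(reinterpret_as_utf8(garbled))
```

The two fallbacks keep the heuristic conservative. Genuine Latin-1 text such as "Grüße" is usually not valid UTF-8 when re-encoded, so the function leaves it unchanged.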

STEPS TO REPRODUCE
1. Receive an email from OFD.RU with a Subject header like the one above.
2. View it in KMail.

OBSERVED RESULT

A character whose bytes are split across two encoded-words is shown as two replacement characters (U+FFFD), as in

> "ТАТ��ЕЛЕКОМ"

EXPECTED RESULT

> "ТАТТЕЛЕКОМ"

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 5.23.80
KDE Frameworks Version: 5.90.0
Qt Version: 5.15.2
Kernel Version: 5.15.12-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 8 × Intel® Core™ i7-6700HQ CPU @ 2.60GHz
Memory: 15.6 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 970M/PCIe/SSE2

ADDITIONAL INFORMATION