Bug 510945 - okular fails to render strike through formatted text of markdown documents (2)
Summary: okular fails to render strike through formatted text of markdown documents (2)
Status: RESOLVED FIXED
Alias: None
Product: okular
Classification: Applications
Component: markdown backend (other bugs)
Version First Reported In: 25.08.2
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Okular developers
URL:
Keywords:
Duplicates: 510944
Depends on:
Blocks:
 
Reported: 2025-10-23 07:37 UTC by Christian Hartmann
Modified: 2025-11-14 10:37 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments

Description Christian Hartmann 2025-10-23 07:37:22 UTC
SUMMARY

Okular fails to render struck-through text (using ~~ ... ~~) in Markdown 
documents that also contain characters which Discount 
(https://www.pell.portland.or.us/~orc/Code/discount/) converts 
into HTML entities, such as … or ©.

This might affect all SmartyPants substitutions: 
https://www.pell.portland.or.us/~orc/Code/discount/#smartypants

We already discussed this here:
https://www.reddit.com/r/kde/comments/1o86fz9/kate_and_kwrite_know_and_can_display_markdown/

OBSERVED RESULT
strike through text is not rendered as such

EXPECTED RESULT
strike through text is rendered as such

SOFTWARE/OS VERSIONS
Operating System: Fedora Linux 42
KDE Plasma Version: 6.4.5
KDE Frameworks Version: 6.19.0
Qt Version: 6.9.2
Kernel Version: 6.16.12-200.fc42.x86_64 (64-bit)
Graphics Platform: Wayland

ADDITIONAL INFORMATION
https://imgbox.com/g/63LQenR1jY
https://imgbox.com/51ZHNRej (the better one)
Comment 1 Christian Hartmann 2025-10-23 07:38:36 UTC
*** Bug 510944 has been marked as a duplicate of this bug. ***
Comment 2 Ben Morris 2025-11-09 23:35:22 UTC
I think I see where this is going wrong. It's this line: https://invent.kde.org/graphics/okular/-/blob/420d551a9fe9c2904f67c6b47efdde7b9e4faa98/generators/markdown/converter.cpp#L54

QDomDocument::setContent() parses XML. Discount emits named character references which are valid HTML but *not* valid XML.

This can be demonstrated by inserting the line
    qDebug() << dom.setContent(html).errorMessage;

and then loading a file containing an ellipsis. Okular prints "Entity 'hellip' not declared."
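
To illustrate why the parser complains, here is a minimal sketch in standard C++ (no Qt) that lists the named entity references an XML parser without a DTD would consider undeclared. XML predefines only `amp`, `lt`, `gt`, `apos` and `quot`. The function name `undeclaredXmlEntities` is hypothetical and not part of Okular or Qt, and the scan is deliberately simplistic (it does not handle CDATA sections or comments).

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of entity references in `markup`
// that are not among the five entities predefined by XML.
std::vector<std::string> undeclaredXmlEntities(const std::string &markup)
{
    static const std::set<std::string> predefined = {"amp", "lt", "gt", "apos", "quot"};
    std::vector<std::string> result;
    std::size_t pos = 0;
    while ((pos = markup.find('&', pos)) != std::string::npos) {
        // Collect the alphanumeric name that follows the ampersand.
        std::size_t end = pos + 1;
        while (end < markup.size() && std::isalnum(static_cast<unsigned char>(markup[end]))) {
            ++end;
        }
        // A named reference is '&' + non-empty name + ';'. Numeric references
        // such as &#8230; start with '#', give an empty name, and are skipped.
        if (end > pos + 1 && end < markup.size() && markup[end] == ';') {
            const std::string name = markup.substr(pos + 1, end - pos - 1);
            if (predefined.count(name) == 0) {
                result.push_back(name);
            }
        }
        pos = end;
    }
    return result;
}
```

Called on `"<p>wait&hellip; &amp; &copy;</p>"`, this reports `hellip` and `copy`, which are the kind of references Discount emits in Smartypants mode.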
Comment 3 Christian Hartmann 2025-11-10 07:30:12 UTC
> and then loading a file containing an ellipsis. Okular prints "Entity
> 'hellip' not declared."

Nice find! Thanks for digging into it.
Comment 4 Bug Janitor Service 2025-11-10 23:35:21 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/1270
Comment 5 Ben Morris 2025-11-11 12:43:22 UTC
I've put in an MR for, as it were, the stupid solution: just fix the tags with a plain-text find and replace.

I came across a couple of alternative solutions which seemed complex beyond my comfort level, and also possibly bad ideas anyway:

1. Something a bit like this: https://invent.kde.org/education/rkward/-/commit/4ea710a77a90f1329ab57661283495bffdffe42c#6f1ea88f82d81036331f50a8ce6c32e36556e4e0

2. I *thought* QDomDocument offered a way of declaring additional entity references, or of handling undeclared ones, but a) I can't actually find it any more and b) that feels a bit too much like trying to build an HTML parser on top of the XML parser.
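
For reference, the plain-text find and replace could look roughly like the sketch below. It is written against `std::string` so that it stands alone, and it is only an analogue of what `QString::replace()` would do, not the code from the MR. The helper names `replaceAll` and `delToStrike` are hypothetical, and it assumes Discount emits `<del>` and `</del>` without attributes.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: replaces every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string &from, const std::string &to)
{
    if (from.empty()) {
        return text; // nothing sensible to search for
    }
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text
    }
    return text;
}

// Hypothetical helper: rewrites <del> tags as <s> tags without parsing the
// markup, so named character references such as &hellip; pass through as-is.
std::string delToStrike(std::string html)
{
    html = replaceAll(std::move(html), "<del>", "<s>");
    return replaceAll(std::move(html), "</del>", "</s>");
}
```

Because nothing is parsed, an input such as `<p><del>old&hellip;</del> new</p>` comes out as `<p><s>old&hellip;</s> new</p>` with the entity left intact.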
Comment 6 Bug Janitor Service 2025-11-14 09:38:29 UTC
A possibly relevant merge request was started @ https://invent.kde.org/graphics/okular/-/merge_requests/1275
Comment 7 Sune Vuorela 2025-11-14 09:46:38 UTC
Git commit 6566838bb259622f023476af17f753ae4a9b3530 by Sune Vuorela, on behalf of Ben Morris.
Committed on 14/11/2025 at 09:36.
Pushed by sune into branch 'master'.

Do not process HTML with QDomDocument

`QDomDocument` was used to replace `<del>` tags in Discount's output with `<s>` tags.

`QDomDocument::setContent()` parses XML only. This usually works, because Discount's HTML is usually valid XML. However, since it is in "Smartypants" mode, Discount generates named character references in response to certain inputs, e.g. `(c)` -> `&copy;` and `...` -> `&hellip;`. These are valid HTML, but most are not predefined in standard XML, and so QDomDocument refuses to parse them.

This MR uses `QString::replace()` in place of `QDomDocument`.

I know it's generally frowned upon to process HTML by such simple approaches, but within the constraints of HTML which Discount generates, I can't see a way that this could go wrong.

M  +6    -18   autotests/markdowntest.cpp
M  +4    -28   generators/markdown/converter.cpp

https://invent.kde.org/graphics/okular/-/commit/6566838bb259622f023476af17f753ae4a9b3530
Comment 8 Sune Vuorela 2025-11-14 10:28:27 UTC
Git commit 2080ad79ab08d17c5f7f244bad36c108a69bd7f1 by Sune Vuorela.
Committed on 14/11/2025 at 09:38.
Pushed by sune into branch 'release/25.12'.

Do not process HTML with QDomDocument

`QDomDocument` was used to replace `<del>` tags in Discount's output with `<s>` tags.

`QDomDocument::setContent()` parses XML only. This usually works, because Discount's HTML is usually valid XML. However, since it is in "Smartypants" mode, Discount generates named character references in response to certain inputs, e.g. `(c)` -> `&copy;` and `...` -> `&hellip;`. These are valid HTML, but most are not predefined in standard XML, and so QDomDocument refuses to parse them.

This MR uses `QString::replace()` in place of `QDomDocument`.

I know it's generally frowned upon to process HTML by such simple approaches, but within the constraints of HTML which Discount generates, I can't see a way that this could go wrong.


(cherry picked from commit 6566838bb259622f023476af17f753ae4a9b3530)

9acf9eeb Do not process HTML with QDomDocument
1337effa Removed wrapper tags from Markdown tests
8238e4e7 Whitespace-only changes to some Markdown tests

Co-authored-by: Ben Morris <bugs@benmorris.org.uk>

M  +6    -18   autotests/markdowntest.cpp
M  +4    -28   generators/markdown/converter.cpp

https://invent.kde.org/graphics/okular/-/commit/2080ad79ab08d17c5f7f244bad36c108a69bd7f1
Comment 9 Sune Vuorela 2025-11-14 10:37:53 UTC
(In reply to Bug Janitor Service from comment #4)
> A possibly relevant merge request was started @
> https://invent.kde.org/graphics/okular/-/merge_requests/1270

I ended up being convinced it is an improvement so I merged it. 
Note that it just missed the 25.12 beta cutoff, but will be in the RC.