Bug 515749

Summary: HTML Named Character References in Markdown Are Not Parsed Correctly
Product: [Applications] okular Reporter: Jonathan Gruber <jonathan.gruber.jg>
Component: markdown backend Assignee: Okular developers <okular-devel>
Status: REPORTED ---    
Severity: normal CC: jonathan.gruber.jg
Priority: NOR    
Version First Reported In: 25.12.2   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:
Attachments: Test file for reproducing bug.
Test file for reproducing bug.

Description Jonathan Gruber 2026-02-09 01:08:33 UTC
Created attachment 189373
Test file for reproducing bug.

SUMMARY
When certain HTML named character references appear in a Markdown file, Okular does not parse them correctly. I only tested the named character references "&OpenCurlyDoubleQuote;" and "&CloseCurlyDoubleQuote;", but I'm sure the problem is not limited to them.

STEPS TO REPRODUCE
1. Open the attached file test.html in a web browser.
2. Open the attached file test.md in Okular.
3. Compare the displays of the two files.

OBSERVED RESULT
Okular displays the named character references "&OpenCurlyDoubleQuote;" and "&CloseCurlyDoubleQuote;" verbatim. However, Okular does correctly parse the equivalent numeric character references "&#8220;" and "&#8221;".

EXPECTED RESULT
Okular should render the Markdown "&OpenCurlyDoubleQuote;" as the Unicode character U+201C ("Left Double Quotation Mark") and "&CloseCurlyDoubleQuote;" as the Unicode character U+201D ("Right Double Quotation Mark").
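For reference, the expected mapping can be checked against an independent decoder. The sketch below uses Python's standard-library `html.unescape`, which applies the HTML5 named character reference table, to show that each named reference and its numeric counterpart decode to the same code point. It only illustrates the expected result; it is not Okular's code and says nothing about how the markdown backend is implemented.

```python
import html

# Named references from this report, paired with their numeric equivalents.
PAIRS = [
    ("&OpenCurlyDoubleQuote;", "&#8220;"),   # expected U+201C LEFT DOUBLE QUOTATION MARK
    ("&CloseCurlyDoubleQuote;", "&#8221;"),  # expected U+201D RIGHT DOUBLE QUOTATION MARK
]

for named, numeric in PAIRS:
    # html.unescape resolves both HTML5 named references and numeric references.
    decoded_named = html.unescape(named)
    decoded_numeric = html.unescape(numeric)
    # Both spellings should yield the same single code point.
    print(f"{named} -> U+{ord(decoded_named):04X}, {numeric} -> U+{ord(decoded_numeric):04X}")
```

Running it prints `U+201C` for both spellings on the first line and `U+201D` for both on the second, which is the rendering this report expects from Okular.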

SOFTWARE/OS VERSIONS
Operating System: Arch Linux 
KDE Plasma Version: 6.5.5
KDE Frameworks Version: 6.22.0
Qt Version: 6.10.2
Kernel Version: 6.18.7-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 12 × 12th Gen Intel® Core™ i7-1255U
Memory: 17 GB of RAM (16.4 GB usable)
Graphics Processor: Intel® Iris® Xe Graphics
Comment 1 Jonathan Gruber 2026-02-09 01:09:33 UTC
Created attachment 189374
Test file for reproducing bug.