Bug 515749

Summary:	HTML Named Character References in Markdown Are Not Parsed Correctly
Product:	[Applications] okular	Reporter:	Jonathan Gruber <jonathan.gruber.jg>
Component:	markdown backend	Assignee:	Okular developers <okular-devel>
Status:	REPORTED ---
Severity:	normal	CC:	jonathan.gruber.jg
Priority:	NOR
Version First Reported In:	25.12.2
Target Milestone:	---
Platform:	Arch Linux
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	Test file for reproducing bug. Test file for reproducing bug.

Description Jonathan Gruber 2026-02-09 01:08:33 UTC

Created attachment 189373 [details]
Test file for reproducing bug.

SUMMARY
When certain HTML named character references appear in a Markdown file, Okular does not parse them correctly. I tested only the named character references "&OpenCurlyDoubleQuote;" and "&CloseCurlyDoubleQuote;", but I expect the problem is not isolated to them.

STEPS TO REPRODUCE
1. Open the attached file test.html in a web browser.
2. Open the attached file test.md in Okular.
3. Compare the displays of the two files.

OBSERVED RESULT
Okular displays the named character references "&OpenCurlyDoubleQuote;" and "&CloseCurlyDoubleQuote;" verbatim. However, Okular does correctly parse the ostensibly equivalent "&#8220;" and "&#8221;".

EXPECTED RESULT
Okular should parse the Markdown "&OpenCurlyDoubleQuote;" as the Unicode character U+201C ("Left Double Quotation Mark") and "&CloseCurlyDoubleQuote;" as the Unicode character U+201D ("Right Double Quotation Mark").
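
For reference, the expected mapping can be checked outside Okular. The following is a minimal sketch that uses Python's standard html.unescape, which decodes HTML5 named and numeric character references. It is not Okular's code path, and the helper name decode and the CASES table are made up for this illustration.

```python
import html

# Named references from this report, paired with the numeric forms
# that Okular already renders correctly.
CASES = {
    "&OpenCurlyDoubleQuote;": "&#8220;",
    "&CloseCurlyDoubleQuote;": "&#8221;",
}


def decode(text: str) -> str:
    """Decode HTML5 named and numeric character references in text."""
    return html.unescape(text)


for named, numeric in CASES.items():
    ch = decode(named)
    # Both spellings should decode to the same single code point.
    assert ch == decode(numeric)
    print(f"{named} -> U+{ord(ch):04X}")
```

Running this prints U+201C for the first reference and U+201D for the second, which is the rendering expected from Okular for both the named and the numeric forms.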

SOFTWARE/OS VERSIONS
Operating System: Arch Linux 
KDE Plasma Version: 6.5.5
KDE Frameworks Version: 6.22.0
Qt Version: 6.10.2
Kernel Version: 6.18.7-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 12 × 12th Gen Intel® Core™ i7-1255U
Memory: 17 GB of RAM (16.4 GB usable)
Graphics Processor: Intel® Iris® Xe Graphics

Comment 1 Jonathan Gruber 2026-02-09 01:09:33 UTC

Created attachment 189374 [details]
Test file for reproducing bug.