Bug 390309

Summary: Markdown: Bold style not applied when underscore(s) present in text surrounded by ** or __
Product: [Frameworks and Libraries] frameworks-syntax-highlighting
Reporter: CnZhx <zhx>
Component: syntax
Assignee: Nibaldo G. <nibgonz>
Status: RESOLVED FIXED
Severity: normal
CC: nibgonz
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: openSUSE
OS: Linux
Version Fixed In: 5.62.0
Attachments: Kate syntax highlighting issues with Markdown

Description CnZhx 2018-02-12 09:49:41 UTC
Created attachment 110554
Kate syntax highlighting issues with Markdown

Kate has very impressive syntax highlighting, but there are some glitches when highlighting Markdown. A screenshot showing the issues is attached (please refer to the first several lines).

This is easy to reproduce in Kate: create a new file, save it as Markdown, then paste the following text into it and look at the syntax highlighting:
```

# Bold not correct when underscores are present in the text

**md**

**m_d**

__m_d__

# Bold not correct with only one character (this looks similar to Bug #70726)

**b**

__md__

# Bold and Emphasis together not correct

**asterisks and _underscores_**
```
Comment 1 Nibaldo G. 2019-08-24 09:45:20 UTC
Git commit c461c5a78fc8f952a31d7ea846ebd2eeb2278cd6 by Nibaldo González.
Committed on 24/08/2019 at 09:45.
Pushed by ngonzalez into branch 'master'.

Markdown: multiple improvements and fixes

Summary:
## Improve & fix detection of bold/italic text

Previously, underscores and asterisks were not allowed inside bold and italic text, so, for example, `**some_text**` was not highlighted correctly. I have improved the regular expressions that detect bold and italic text: asterisks, underscores and escapes are now allowed inside.
{F7273185}
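To illustrate the idea behind the new bold rule, here is a minimal Python sketch. The patterns `BOLD_STAR` and `BOLD_UNDERSCORE` and the helper `bold_spans` are hypothetical and are not the RegExpr rules in `markdown.xml`; they only assume the behaviour described above, i.e. that backslash escapes, the other emphasis character, and a single copy of the delimiter character may appear between the delimiters. `NAIVE_BOLD` is a made-up pattern that forbids `*` and `_` inside, which reproduces the symptom from the report.

```python
import re

# Hypothetical patterns for illustration; they are NOT the RegExpr rules from markdown.xml.
# Between the delimiters we accept: a backslash escape (\x), any character other than
# the delimiter character or a backslash, or a single delimiter character that is not
# followed by another one (so "_" is fine inside "**...**" and "*" inside "__...__").
BOLD_STAR = re.compile(r"\*\*(?=\S)((?:\\.|[^\\*]|\*(?!\*))+?)(?<=\S)\*\*")
BOLD_UNDERSCORE = re.compile(r"__(?=\S)((?:\\.|[^\\_]|_(?!_))+?)(?<=\S)__")

# A naive pattern that forbids "*" and "_" inside, for comparison with the reported bug.
NAIVE_BOLD = re.compile(r"\*\*[^*_]+\*\*")


def bold_spans(line: str) -> list[str]:
    """Return the inner text of every bold span found in one line."""
    spans = []
    for pattern in (BOLD_STAR, BOLD_UNDERSCORE):
        spans.extend(m.group(1) for m in pattern.finditer(line))
    return spans


if __name__ == "__main__":
    for sample in ["**m_d**", "__m_d__", "**b**", "**asterisks and _underscores_**"]:
        print(sample, "->", bold_spans(sample))
```

Run against the samples from the report, the sketch finds `m_d` inside both `**m_d**` and `__m_d__`, while a pattern like `NAIVE_BOLD` finds nothing in `**m_d**`.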

## Improve fenced code blocks

The way to write fenced code blocks varies depending on the Markdown/MultiMarkdown implementation. Some support only 3 backticks, some more than 3, some both, and others 3 tildes (`~`). [1]

Therefore, I think it's a good idea to support all implementations: code blocks can be written with 3 or more backticks, or with 3 or more tildes. Other editors, such as Visual Studio Code, Atom and Sublime Text, highlight fenced code blocks this way.
Unfortunately, this is only possible using dynamic rules with several RegExpr rules, which isn't ideal. I'm happy to change it if you don't like it.
{F7273187}
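A rough sketch of the fence-matching logic, in Python rather than Kate's XML: the opening run of backticks or tildes is captured (the role a dynamic rule plays in the syntax file), and the block closes on a line made only of the same character that is at least as long. The "at least as long" rule is taken from the CommonMark specification; whether Kate's rules require exactly the same length is not stated here, so treat it as an assumption. `OPEN_FENCE` and `fenced_blocks` are hypothetical names.

```python
import re

# Opening fence: a run of 3+ backticks or 3+ tildes, optionally followed by a
# language tag such as "python". The captured run plays the role of the text
# a dynamic rule would remember.
OPEN_FENCE = re.compile(r"^(`{3,}|~{3,})[ \t]*([^`\s]*)")


def fenced_blocks(text: str) -> list[tuple[str, list[str]]]:
    """Return (language, body_lines) for every fenced code block in text."""
    blocks: list[tuple[str, list[str]]] = []
    fence = None  # the captured opening fence, e.g. "````" or "~~~"
    lang = ""
    body: list[str] = []
    for line in text.splitlines():
        if fence is None:
            m = OPEN_FENCE.match(line)
            if m:
                fence, lang, body = m.group(1), m.group(2), []
            continue
        run = line.strip()
        # Close only on a line made entirely of the fence character that is
        # at least as long as the opening fence (CommonMark's rule).
        if run and set(run) == {fence[0]} and len(run) >= len(fence):
            blocks.append((lang, body))
            fence = None
        else:
            body.append(line)
    if fence is not None:
        # An unclosed block runs to the end of the document.
        blocks.append((lang, body))
    return blocks
```

With this rule, a four-backtick fence can wrap an example that itself contains three-backtick fences, and a tilde fence is not closed by a line of backticks.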

**Also:**
* Add folding for fenced code blocks.
* Add more keywords to identify languages.
* Highlight, inside fenced code blocks, the languages that are already loaded by default because they are included via IncludeRules (JavaScript, TypeScript, JSX, SQL, Mustache/Handlebars, reST & Doxygen).
* Add more languages to code blocks, including some of the most popular ones [3] [4]: C, Go, Java, JavaScript, TypeScript, Matlab, Perl, R & Ruby.

## Improve code blocks

* Highlight inline code delimited by more than one backtick. [2]
   {F7273189}
* Fix: highlight indented code only after an empty line.
   {F7273193}
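Both rules in the list above can be sketched in Python. For inline code, a span opened by a run of N backticks ends at the next run of exactly N backticks (as CommonMark specifies), which is what lets a single backtick appear inside a double-backtick span [2]. For indented code, a line indented by at least 4 spaces or a tab is treated as code only when the previous line is empty, or when it continues a block already started. The regex, the helper names and the choice to treat the start of the document as an empty line are assumptions for illustration, not taken from `markdown.xml`.

```python
import re

# Inline code: an opening run of N backticks is closed by the next run of
# exactly N backticks, so a shorter run (like a single `) can appear inside.
INLINE_CODE = re.compile(r"(?<!`)(`+)(?!`)(.+?)(?<!`)\1(?!`)")


def inline_code(line: str) -> list[str]:
    """Return the contents of every inline code span in one line."""
    return [m.group(2) for m in INLINE_CODE.finditer(line)]


def indented_code_lines(lines: list[str]) -> list[int]:
    """Return indexes of lines treated as indented code.

    A line indented by 4 spaces or a tab starts a code block only if the
    previous line is empty; once inside, further indented lines (and blank
    lines) keep the block open until a non-indented, non-blank line.
    """
    result = []
    prev_blank = True  # assumption: the start of the document counts as empty
    in_code = False
    for i, line in enumerate(lines):
        indented = line.startswith("    ") or line.startswith("\t")
        if indented and (prev_blank or in_code):
            in_code = True
            result.append(i)
        elif line.strip():
            in_code = False
        prev_blank = not line.strip()
    return result
```

In the sketch, an indented line directly under a paragraph line stays part of the paragraph, which is the behaviour the fix above describes.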

## Improve highlighting of links and references
{F7273195}

## Improve metadata highlighting

Previously, some metadata keys were highlighted anywhere in the document. Now metadata is highlighted only on the first line of the Markdown document. [6] [7]
{F7273196}
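A minimal Python sketch of "metadata only at the top", assuming MultiMarkdown-style `Key: value` lines [6] that must begin on the first line and end at the first line that doesn't match. `META_LINE` and `leading_metadata` are hypothetical; MultiMarkdown's continuation lines and key normalization are simplified away (keys are only lower-cased here for convenience).

```python
import re

# A MultiMarkdown-style metadata line: "Key: value".
META_LINE = re.compile(r"^([A-Za-z0-9][\w -]*):\s*(.*)$")


def leading_metadata(text: str) -> dict[str, str]:
    """Collect "Key: value" lines only from the top of the document.

    Parsing stops at the first line that is not a metadata line, so a
    "Date: ..." line further down the document is left alone.
    """
    meta = {}
    for line in text.splitlines():
        m = META_LINE.match(line)
        if m is None:
            break
        meta[m.group(1).strip().lower()] = m.group(2)
    return meta
```

A document that starts with a heading yields no metadata at all, even if a `Key: value` line appears later.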

## Improve list detection

Dynamic rules capture the indentation of list items, so that content indented within a list is highlighted correctly.
Text within lists is now highlighted with "dsNormal", so that Markdown documents are not saturated with colors.
{F7273197}
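The indentation capture can be illustrated with a small Python sketch: when a list marker is found, the column where the item's text starts is remembered, and following lines belong to that item while they are blank or indented at least that far. This mirrors the idea of a dynamic rule reusing captured text, but `LIST_ITEM`, `list_items` and the continuation rule are hypothetical simplifications (for example, nested lists simply start a new item instead of being tracked separately).

```python
import re

# A list marker ("-", "*", "+", "1." or "1)") with optional leading indentation,
# followed by whitespace. The end of group(0) is where the item's content starts.
LIST_ITEM = re.compile(r"^(\s*)([*+-]|\d+[.)])(\s+)")


def list_items(lines: list[str]) -> list[list[str]]:
    """Group each list item with the lines that continue it.

    The item's content column is captured, and later lines belong to the
    item while they are blank or indented by at least that many spaces.
    """
    items: list[list[str]] = []
    indent = None  # captured content column of the current item
    for line in lines:
        m = LIST_ITEM.match(line)
        if m:
            indent = len(m.group(0))
            items.append([line])
        elif indent is not None and (
            not line.strip() or len(line) - len(line.lstrip(" ")) >= indent
        ):
            items[-1].append(line)
        else:
            indent = None
    return items
```

Note that the captured column differs per marker: `- ` gives 2, while `10. ` gives 4, which is why a fixed indentation rule would not be enough.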

## Add support for inline HTML

`IncludeRules` is used to highlight only HTML tags. [8]

## Others

* Highlight checkboxes in lists.
   {F7273202}
* Highlight backslash escape characters [5].
* Include `##Alerts` and `##Modelines` in comments.
* Some minor improvements, such as replacing some RegExpr rules and adding `column="0"` in some rules.
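Two of the items above can be sketched in Python. The escapable characters are the ones listed on the Daring Fireball syntax page [5]. The checkbox pattern assumes GitHub-style task-list syntax (`- [ ]` / `- [x]`), which the commit message doesn't spell out. `ESCAPE`, `CHECKBOX` and both helpers are hypothetical, not the rules in `markdown.xml`.

```python
import re
from typing import Optional

# Characters Markdown lets you escape with a backslash, per the Daring
# Fireball syntax page [5]: \ ` * _ { } [ ] ( ) # + - . !
ESCAPE = re.compile(r"\\([\\`*_{}\[\]()#+\-.!])")

# Assumed GitHub-style task-list checkbox right after a bullet marker.
CHECKBOX = re.compile(r"^\s*[*+-]\s+\[([ xX])\]\s")


def escaped_chars(line: str) -> list[str]:
    """Return the characters escaped with a backslash in one line."""
    return ESCAPE.findall(line)


def checkbox_state(line: str) -> Optional[bool]:
    """True if checked, False if unchecked, None if the line has no checkbox."""
    m = CHECKBOX.match(line)
    return None if m is None else m.group(1).lower() == "x"
```

A backslash before a character outside that list (such as `\q`) is left as ordinary text by the sketch.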

**Sources:**
* [1] Fenced Markdown code blocks: <https://meta.stackexchange.com/questions/125148/implement-style-fenced-markdown-code-blocks/143705#143705>
* [2] Markdown Syntax Documentation. Code: <https://daringfireball.net/projects/markdown/syntax#code>
* [3] GitHut - Programming Languages and GitHub: <https://githut.info/>
* [4] Most Popular and Influential Programming Languages of 2018: <https://stackify.com/popular-programming-languages-2018/>
* [5] Markdown Syntax Documentation. Backslash escapes: <https://daringfireball.net/projects/markdown/syntax#backslash>
* [6] MultiMarkdown Metadata: <https://fletcher.github.io/MultiMarkdown-5/metadata.html>
* [7] Markdown metadata format: <https://stackoverflow.com/questions/44215896/markdown-metadata-format>
* [8] Markdown Syntax Documentation. Inline HTML: <https://daringfireball.net/projects/markdown/syntax#html>

Reviewers: cullmann, dhaumann, #framework_syntax_highlighting

Reviewed By: dhaumann, #framework_syntax_highlighting

Subscribers: kwrite-devel, kde-frameworks-devel

Tags: #kate, #frameworks

Differential Revision: https://phabricator.kde.org/D23371

M  +40   -11   autotests/folding/example.rmd.fold
M  +453  -53   autotests/folding/test.markdown.fold
M  +2    -2    autotests/html/basic.markdown.html
M  +35   -6    autotests/html/example.rmd.html
M  +414  -14   autotests/html/test.markdown.html
M  +29   -0    autotests/input/example.rmd
M  +405  -5    autotests/input/test.markdown
M  +3    -3    autotests/reference/basic.markdown.ref
M  +37   -8    autotests/reference/example.rmd.ref
M  +467  -67   autotests/reference/test.markdown.ref
M  +567  -133  data/syntax/markdown.xml
M  +104  -12   data/syntax/rmarkdown.xml

https://commits.kde.org/syntax-highlighting/c461c5a78fc8f952a31d7ea846ebd2eeb2278cd6