Bug 475921 - Clang Format custom script formatter fails when includes violate the style-defined order
Summary: Clang Format custom script formatter fails when includes violate the style-de...
Status: CONFIRMED
Alias: None
Product: kdevelop
Classification: Applications
Component: Language Support: CPP (Clang-based) (show other bugs)
Version: 5.12.230802
Platform: Arch Linux Linux
: NOR normal
Target Milestone: ---
Assignee: kdevelop-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-10-21 12:14 UTC by shlyonskikh.alexey
Modified: 2023-10-23 08:01 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Example file where formatting fails (174 bytes, text/x-c++src)
2023-10-21 12:14 UTC, shlyonskikh.alexey
Details

Note You need to log in before you can comment on or make changes to this bug.
Description shlyonskikh.alexey 2023-10-21 12:14:07 UTC
Created attachment 162474 [details]
Example file where formatting fails

When using Clang Format custom script formatter, Kdevelop fails to format lines that follow lines starting with "#include" if they are ordered in a way that violates the order defined by clang-format's style.

STEPS TO REPRODUCE
1. Use a clang-format style that enforces the order of #includes (for example, by including "BasedOnStyle: Google" in .clang-format in the project's directory)
2. Try to format some but not all lines in a file in which the order of #includes is violated (for example, line 11 in the attached file)

OBSERVED RESULT
No formatting takes place. If the order-violating lines precede the formatted lines, a "problem matching the left context" error message is displayed in the terminal. If the order-violating lines are included in the formatted lines, but the whole file is not formatted, a "problem matching the text while formatting" error message is displayed.

EXPECTED RESULT
Selected lines are formatted according to clang-format's output.

ADDITIONAL INFORMATION
The cause is likely the KDevelop::extractFormattedTextFromContext function from kdevplatform/util/formattinghelpers.cpp file, called from plugins/customscript/customscript_plugin.cpp . The text matching done there fails to match parts of the file when its lines have been switched around.
Comment 1 Igor Kushnir 2023-10-22 11:11:52 UTC
> The cause is likely the KDevelop::extractFormattedTextFromContext function from kdevplatform/util/formattinghelpers.cpp file, called from plugins/customscript/customscript_plugin.cpp . The text matching done there fails to match parts of the file when its lines have been switched around.
You are right. My reimplementation of extractFormattedTextFromContext() in https://invent.kde.org/kdevelop/kdevelop/-/merge_requests/118 was recently merged and will be available in the next released KDevelop version (5.13 or 6.0). But the reimplementation does not support reordering include lines.

Since the include lines contain alphanumeric (significant) characters, a separate specific mechanism to ignore such diff will have to be implemented. The implementation should be very careful and limited to avoid breaking user code by mismatching context(s) or text. If you have an idea how to fix this, we can discuss it here or you can create a (Draft) merge request for a more involved discussion and review.
Comment 2 shlyonskikh.alexey 2023-10-22 21:52:03 UTC
(In reply to Igor Kushnir from comment #1)
> > The cause is likely the KDevelop::extractFormattedTextFromContext function from kdevplatform/util/formattinghelpers.cpp file, called from plugins/customscript/customscript_plugin.cpp . The text matching done there fails to match parts of the file when its lines have been switched around.
> You are right. My reimplementation of extractFormattedTextFromContext() in
> https://invent.kde.org/kdevelop/kdevelop/-/merge_requests/118 was recently
> merged and will be available in the next released KDevelop version (5.13 or
> 6.0). But the reimplementation does not support reordering include lines.
> 
> Since the include lines contain alphanumeric (significant) characters, a
> separate specific mechanism to ignore such diff will have to be implemented.
> The implementation should be very careful and limited to avoid breaking user
> code by mismatching context(s) or text. If you have an idea how to fix this,
> we can discuss it here or you can create a (Draft) merge request for a more
> involved discussion and review.

I think the most obvious solution that will work in every case is making a separate plugin that passes --offset and --length to clang-format, then replaces the whole file with the result. 

But if that's not an option, here is a rough idea for a new way to format that should handle this case well enough:
1. Split the original file and the formatted file into blocks of lines that start with #include (plus empty ones) and other lines. Take care not to confuse #include lines with raw literal text where one of its lines starts with #include.
2. For every #include block, ensure the user either selects the whole block or none of it, error otherwise. Ensure all blocks match up, error otherwise (if it's not the whole file that's selected): match the normal blocks of code normally (for brace checking, the non-#include line blocks could be treated as continuous text); match includes by remembering the lines contained in the block, splitting them into lexemes beforehand --- if all lines in such a block are the same includes, everything is fine, error otherwise.
3. If no lines in a block are selected, write the original lines. If the whole block is selected, write its lines from the formatter's output. Otherwise (it's not an #include block), do the same thing as now but instead of the whole file only consider this block of lines.
Comment 3 Igor Kushnir 2023-10-23 07:55:31 UTC
(In reply to shlyonskikh.alexey from comment #2)
> I think the most obvious solution that will work in every case is making a
> separate plugin that passes --offset and --length to clang-format, then
> replaces the whole file with the result. 
That's a great idea. clang-format is probably the most popular automatic formatting tool. I don't think many other tools support sorting includes. So a separate plugin for clang-format would be most useful. Furthermore, if other formatting tools support specifying offset and length, they can be supported by the new plugin as well. This would also allow to optimize clang-format-ting by passing the code to be formatted via the standard input and reading from standard output instead of formatting a temporary file (though in case KDevelop's temporary directory resides in a tmpfs, this optimization might not be significant). The plugin would also greatly decrease the probability of misformatting or refusing to format a block of code. After all, even the new implementation of extractFormattedTextFromContext() is crude and unsophisticated (doesn't go much further than matching parentheses).

https://invent.kde.org/kdevelop/kdevelop/-/merge_requests/52 proposed a clang-format plugin in 2019. That plugin's main feature is a UI to configure formatting options. Merging that plugin into KDevelop was rejected because of its dependence on the C++ API of LLVM/Clang. But that plugin could be distributed separately from KDevelop, like kdev-python and other language plugins are. However, this bug can be fixed by a simpler plugin that keeps using the clang-format executable, and therefore can be included in KDevelop by default. A merge request is welcome.

> But if that's not an option, here is a rough idea for a new way to format
> that should handle this case well enough:
I like the separate plugin idea much more, but here are my thoughts on this approach:
1. Confusing with string literals is not much of a concern, because a formatter tool is highly unlikely to reorder include lines within a string literal.
2. The include-reordering validator could kick in only when extractFormattedTextFromContext() is about to refuse to format because of a fatal mismatch. It would search for the include lines around the mismatch spot. First parse the block of includes starting at the fatal-mismatch line in the smaller text fragment (prefix or unformatted text, not the whole formatted file). Put all the parsed include file names (but not '<' or '"', which could theoretically be replaced by a formatter tool too) into a std::vector and sort it. Then parse the number of includes equal to the computed std::vector's size starting at the fatal-mismatch line in the whole formatted file, put these includes into another std::vector and sort it too. Then compare the two vectors: if equal, continue regular non-include matching after the matched include blocks in formatted and unformatted texts.
Comment 4 Igor Kushnir 2023-10-23 08:01:22 UTC
(In reply to shlyonskikh.alexey from comment #2)
> I think the most obvious solution that will work in every case is making a
> separate plugin that passes --offset and --length to clang-format, then
> replaces the whole file with the result. 
Before anyone sets to write such a plugin, [s]he should make sure (e.g. thoroughly test) that these two clang-format options make it work in the same way as extractFormattedTextFromContext() aims to. That is, that clang-format never automatically extends formatting to surrounding code and never breaks code in the formatted code fragment.