332161 – Add support for identifiers with special characters or equal to reserved words in queries

Bug 332161 - Add support for identifiers with special characters or equal to reserved words in queries

Summary: Add support for identifiers with special characters or equal to reserved word...

Status:	CONFIRMED

Alias:	None

Product:	KEXI
Classification:	Applications
Component:	Queries (other bugs)
Version First Reported In:	2.8.0
Platform:	unspecified All

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	Kexi Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-03-14 23:13 UTC by Jarosław Staniek
Modified:	2020-04-29 15:39 UTC (History)
CC List:	1 user (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Attachments
Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Jarosław Staniek 2014-03-14 23:13:57 UTC

It's not possible to use identifier equal to a reserved word in a query.

Reproducible: Always

Steps to Reproduce:
Assuming table T with column A exists.

Then, query "SELECT A AS order FROM T" fails.

SELECT A AS "order" FROM T -works in native SQLite and PostgreSQL
` ` quotes work in MySQL and SQLite

Escaping using [ ] works in SQLite, this is MS Access and MS SQL Server compatibility.

Expected: a way to escape identifier in the SQL view.

Remark:
Using " " to escape identifier is:
- dangerous because can be silently interpreted as string in SQLite; from http://www.sqlite.org/lang_keywords.html : "If a keyword in double quotes (ex: "key" or "glob") is used in a context where it cannot be resolved to an identifier but where a string literal is allowed, then the token is understood to be a string literal instead of an identifier."
- breaks backward compatibility with Kexi release so far

So suggested escaping is [ ] or ` `.  [ ] is well known from MS databases and supported by SQLite already, more visible. Kexi is able to support it for other backends too.

Comment 1 Jarosław Staniek 2014-03-14 23:17:38 UTC

Remark: Using identifier escaping in SQL will break backward compatibility with earlier Kexi anyway. All that can be done is to remove escaping upon saving the SQL unless really needed.

Comment 2 Jarosław Staniek 2015-12-03 00:33:46 UTC

Note: when [] notation is used it's not possible to distinguish between escaped identifier and query parameter using just grammar. To distinguish one needs to try to find the identifier in the current scope/context. If it's not found, it's assumed to be a query parameter.

This is also why query parameters dialogs appear in MSA when field is not found (e.g. because it has been removed or renamed). See also https://msdn.microsoft.com/en-us/library/office/ff845220.aspx "PARAMETERS Declaration (Microsoft Access SQL)".

Summing up: using the [] notation for identifiers breaks compatibility with query parameters notation introduced in Kexi 1.x. Example:

Let T be table with one field "foo INTEGER".
Let Q be query "SELECT [foo], [bar] from T"

Results of running Q:
- Before the proposed semantics of identifier escaping: Ask for "foo" parameter, ask for "bar" parameter
- After the proposed semantics of identifier escaping: Ask for "bar" parameter. [foo] is recognized as T.foo.

Comment 3 Jarosław Staniek 2015-12-03 10:45:57 UTC

My conclusion. One requirement is to keep *format* compatibility with [] notation for  query parameters.

Another "nice to have" requirement is to avoid (I think) a bad practice of MSA, which treats undefined identifiers as query parameters. This is both confusing and error-prone practice, which once introduced, is hard to fix (so MSA is in a trap somehow here).

Based on that:

By format I mean representation of a query stored in Kexi projects. (note: it's represented in KEXISQL / KDBSQL format, normalized subset of SQL)

Proposed solution is to introduce an internal format transparently:
- in the format use [ ] for query parameters as before, so no change for Kexi <= 2.x
- in the format use ` ` ("grave accent" quotes) for escaping identifiers
- in the Kexi GUI use [[ ]] for query parameters, this is a change compared to Kexi <= 2.x
- in the Kexi GUI use [ ] for escaping identifiers (Like in MSA) 

By 'internal' we mean that end-users never see queries in this format unless they read debug information of KDb.

The above means that when saving user-supplied SQL (or fragments of it) to KEXISQL format, following conversions should be made:
- [ ] to ` `
- [[ ]] to [ ]

Similarly, opposite conversion is needed while reading stored SQL to the KEXISQL format.

Conversions are deterministic, do not require complex parsing, we're not loosing any information.

Update for the notations ([[ ]] and [ ]) used by Kexi GUI should be made by modification in the KEXISQL / KDBSQL parser. Conversions should happen in KDb's routines where KDbQuerySchema instances are stored/retrieved. The routines can be exposed in KDb's Utils API for engineers who wish to use the internal format.

Extras:
- '[', ']', ` characters should be escaped as \[, \], respectively, if they exist inside of [ ] or [[ ]]
- ` characters should be escaped as \` respectively if they exist inside of ` `

Comment 4 Jarosław Staniek 2015-12-03 11:06:58 UTC

Version information:
- Despite backward format compatibility built into the solution, the change from [] to [[]] for query parameters in the GUI is too large for an update in 2.9.x series. I propose to move it to Kexi 3.0 and related KDb framework.
- Kexi 2.9.x needs no change for query parameters support: its storage format won't change.
- A small addition in Kexi 2.9.x is welcome to react to the identifiers escaping coming from Kexi >= 3.x. When ` ` is encountered in the internal (KEXISQL) source format, Kexi should display error message (on query loading) such as "This query contains identifiers enclosed in [ ] brackets, what is only supported by Kexi 3 or newer. To use the query with Kexi 2, open the query in SQL view and rename the identifiers so they start with a latin letter or '_' character and only contain only latin letters, numbers and '_' characters".
- In Kexi M= 2.9.9 "SELECT ą" query has a syntax error. This can be improved to be more usable. KEXISQL scanner should be changed so it allows all characters (except space) so this query. Checking should be performed in the scanning routine instead. If invalid character is encountered, message "Identifiers should start with a latin letter or '_' character and only contain only latin letters, numbers and '_' characters" is presented and query can be only opened in the SQL view.

PS: If identifier is a reserved keyword Kexi (calligradb) already has "\"%1\" is a reserved keyword." error.

Comment 5 Jarosław Staniek 2015-12-03 11:08:49 UTC

Changed the summary to more general "Add support for identifiers with special characters or equal to reserved words in queries".

Comment 6 Jarosław Staniek 2015-12-03 11:39:22 UTC

In Kexi select <= 2.9.9 "SELECT `ą`" gives syntax error too. It will stay like this, but in newer Kexi 2.9.x we can display the same error as for the "SELECT ą" explained in comment https://bugs.kde.org/show_bug.cgi?id=332161#c4. Just add "`" character to the list of scanned characters for identifiers.

Comment 7 jordi fita i mas 2020-04-29 15:20:50 UTC

This comment is just to see whether i understood the change proposed in this report.

Let's say we have a table with the terrible name of “table” and two columns, “id” and “integer”, to make things difficult, and want to filter this table by the “integer” column using some criteria.

If i understood all the comments correctly, the native KDbSQL statement, the string that someone using KDbParser::parse(KDbEscapedString) to get a KDbQuerySchema would need to write, should be: 

    SELECT id, `integer` FROM `table` WHERE `integer` = [some criteria]

This is also the string that Kexi would store in kexi__objectdata for that query, regardless of whether it was generated with the designer or in the text editor. What you call the “internal format”.

However, you do not want Kexi users to see that string and, to make it look more like an Access’ query i imagine, when switching to text view it should be shown like this instead:

    SELECT id, [integer] FROM [table] WHERE [integer] = [[some criteria]]

Let's call this the “external format” and, as i understand, it is merely a “cosmetic” query, as it were, because the actual query that KDb should parse and convert to the driver-native SQL is the first, the one stored in kexi__objectdata using the internal format.

Kexi, somehow, should transform from internal format to the external when switching *to* text view and convert back to internal when switching *from* text. However, in comment 3 you said:

> Conversions should happen in KDb's routines where KDbQuerySchema instances are
> stored/retrieved. The routines can be exposed in KDb's Utils API for engineers who
> wish to use the internal format.

I may be completely wrong here, but that seems to imply that the conversion between internal and external formats should be done in KDbConnection::storeObjectData and KDbConnection::loadObjectData. But these functions are *not* called from Kexi’s query plugin, as far as i can see: it uses storeDataBlock/loadDataBlock with the query as a string.

The other possibility is that the format transformation is in KDbParser and KDbNativeStatementBuilder, the classes that Kexi use to convert the SQL string to KDbQuerySchema and back to string.  This option would mean no need to change Kexi’s code, but then i do not see what the internal format gives us.

I am no trying to say that the internal format is useless, just that i do not see it yet. If you could, please, clarify where this conversion would be perhaps i would understand it  better. Thank you :)

Finally, regarding the desire to keep queries to look like MSA as much as possible, is the change for criteria grammar from [this] to [[this]] much better than changing columns references from [this] to `this`?

What i mean is that, assuming KDb would only quote columns when strictly necessary, people will see parameters more often than quoted columns in the SQL generated by Kexi.

It seems, though, that in all probability you had another use case here that i can not see. I would appreciate if you could also explain this a bit more.


Sorry about the rumbling.

Comment 8 Jarosław Staniek 2020-04-29 15:24:45 UTC

Git commit a85f2c20c8af962fadc7566ec5ae5249df079389 by Jaroslaw Staniek.
Committed on 29/04/2020 at 15:24.
Pushed by staniek into branch '332161-autotests'.

Initial change for autotests supporting #332161 (support for id escaping)

- add table 'table' to sqlParserTest.kexi
- add test cases (3 fail for now)

M  +-    --    autotests/parser/data/sqlParserTest.kexi
M  +8    -0    autotests/parser/data/statements.txt

https://commits.kde.org/kdb/a85f2c20c8af962fadc7566ec5ae5249df079389

Comment 9 Jarosław Staniek 2020-04-29 15:39:30 UTC

Conclusion from https://bugs.kde.org/show_bug.cgi?id=332161#c3 is the solution I'd propose. So we're using [] not the quotes in the KDBSQL.
 
> SELECT id, `integer` FROM `table` WHERE `integer` = [some criteria]

SELECT id, [integer] FROM [table] WHERE [integer] = [some criteria]


And I propose to call it "KDBSQL", not a "native KDbSQL" which would be confusing.
Native is a term reserved for whatever dialect the driver's backend requires.

I am sorry that the description. Bugs.kde.org does not allow message editing so it is hard to follow. More complex are proposed on phabricator or https://community.kde.org/KDb / https://community.kde.org/Kexi.

I will not comment others just now, because my answer above probably solves some of them as well. If you visit KEXI IRC or Telegram I can be there.