Bug 420599

Summary:	KDbNativeStatementBuilder escapes identifiers in a way that KDbParser parses as string literals
Product:	[Frameworks and Libraries] KDb	Reporter:	jordi fita i mas <jfita>
Component:	General	Assignee:	Jarosław Staniek <staniek>
Status:	REPORTED ---
Severity:	normal
Priority:	NOR
Version:	3.2.0
Target Milestone:	---
Platform:	Compiled Sources
OS:	Linux
Latest Commit:		Version Fixed In:
Sentry Crash Report:
Attachments:	Database with a table using SQL keywords as column names and a query showing incorrect results

Description jordi fita i mas 2020-04-26 02:43:17 UTC

Created attachment 127876 [details]
Database with a table using SQL keywords as column names and a query showing incorrect results

SUMMARY

This is a problem for KEXI when building a query based on a table that has columns named after SQL reserved keywords (e.g., “group”).


STEPS TO REPRODUCE (IN KEXI 3.2.1)
1. Create a table with a column named “group”; it does not matter its type or whether its primary key or not.
2. Insert any data that column, but avoid “group” as value.
3. Create a new query with only the “group” column from that table.
4. Switch to SQL view and check that the SQL is indeed correct, with group double quoted.
5. Switch to data view
6. Switch back to SQL view

OBSERVED RESULT
In data view, all rows have the literal string “group” as value of a column named “expr1”.

In the second switch to SQL, the query has changed from

  SELECT "group" FROM source

To

  SELECT 'group' AS expr1 FROM source

In the attached database file the SQL query does not changed because it was saved before switching view, but in data mode it still shows all rows with the same incorrect literal string.


EXPECTED RESULT
In data view, the column should be named “group” and all its values should be the same as in the source table.

In SQL, the query should still have the “group” column double quoted and without any alias.


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: openSUSE 15.1
KDE Plasma Version: N/A
KDE Frameworks Version: 5.69.0
Qt Version: 5.12.2

ADDITIONAL INFORMATION
This seems to be a side effect of bug #364950 (SQL: Handle escape sequences in string literals), in where the parser was changed to parse double-quoted strings as literals in order to recognize escape sequences in them.

As a quick test, i modified KDbSqlScanner.l from

   e_str1  (\\\'|\'\'|[^'])
   e_str2  (\\\"|\"\"|[^"])
   string  (\'{e_str1}*\'|\"{e_str2}*\")

to

   e_str1  (\\\'|\'\'|[^'])
   e_str2  (\\\"|\"\"|[^"])
   string  (\'{e_str1}*\')
   quoted_identifier (\"{e_str2}*\")

plus a new quoted_identifier block, and then KEXI produced the expected result.

I am not sure if this is a but of KDb per se; maybe KEXI is just using the wrong components to generate the SQL. Any thoughts on this?

Comment 1 Jarosław Staniek 2020-04-28 20:01:52 UTC

Thanks Jordi for the greatly explained issue, as always.

This time I am unsure it applies here. See #376052 and particular solution at 
 https://bugs.kde.org/show_bug.cgi?id=376052#c2.

We have this autotested already as well.

We're victims of insufficent docs for the KDbSQL dialect.

PS: There are two consumers of such docs:
- KEXI users
- KDb users.

It would be good to have a single "KEXI SQL documentation" chapter (kind of appendix) and appear, accordingly, both in KDb Doxygen docs and KEXI's handbook.
I am unsure what format would be good for that, maybe markdown. Things are also undecided recently also technical-wise because user docs based on wikis are not the most modern and convenient tech today (https://userbase.kde.org/Kexi/Handbook).

Comment 2 Jarosław Staniek 2020-04-28 20:22:02 UTC

For the example query given in https://bugs.kde.org/show_bug.cgi?id=420599#attach_127876

This works: SELECT group FROM source
(because KDb escapes indentifiers in the driver's dialect and there is no confusion for the parser)

This should work 

SELECT [group] FROM source

once this wish is implemented: https://bugs.kde.org/show_bug.cgi?id=332161

Don't forget to read conclusion from there. And possible find disadvantages.

(As of KDb 3.2.0 [group] means unconditionally a query parameter)

PS: Any such improvements are apparently KDb-only. KEXI would follow.

Comment 3 Jarosław Staniek 2020-04-28 20:25:53 UTC

BTW the task for the bug #332161 is open for adoption.

Comment 4 jordi fita i mas 2020-04-28 21:19:04 UTC

Sorry if i misunderstood it, but from your comment that says

> This time I am unsure it applies here. See #376052 and particular solution at 
> https://bugs.kde.org/show_bug.cgi?id=376052#c2.

it seems like you are talking about the conversion from KDbSQL dialect to the driver’s dialect.
That works as expected because, as you noted, the query

SELECT group FROM source

does indeed produce the expected result.
However, what i am reporting is a problem with the SQL that Kexi stores in its kexi__objectdata table for queries.

In fact, i did not write the SQL query in the attached file; i only used the designer mode and switched to text view just to debug, as it were, the results.

My understanding is that KexiQueryDesignerGuiEditor uses a KDbQueryScheme to build the SELECT query without resorting to string manipulations, but when it is time to save the query and serialize it into kexi__objectdata uses a KDbNativeStatementBuilder to create the SQL string. KDbNativeStatementBuilder calls KDb::escapeIdentifier to escape, if needed, field and table names and, in this case, recognizes the group column as a SQL keyword and escapes it as "group" — KDb::escapeIdentifier always uses double quote. Then, to load the query in data mode, Kexi reads the serialized SQL from kexi__objectdata with KDbParser that recognizes "group" as the string literal 'group' instead of a column name.  Afterwards, this query is converted to the driver-specific dialect, but at that moment it is too late and the column is now an anonymous literal.

One source of confusion is that i dot know if the “native” in KDbNativeStatementBuilder refers to driver-native or KDbSql-native statement. If the former, the problem is from Kexi and not KDb, but then KDbNativeStatementBuilder should use KDbConnection::escapeIdentifier instead of KDb::escapeIdentifier. If the latter, KDbNativeStatementBuilder should build its statements in a way that KDbParser should be able read correctly.


> This should work 
>
> SELECT [group] FROM source
>
> once this wish is implemented: https://bugs.kde.org/show_bug.cgi?id=332161

Using [] to escape identifiers would free the double quotes to be used only for string literals, but it seems to me that KDbNativeStatementBuilder, or KDB::escapeIdentifier, should still be corrected to use the proper KDbSql escape characters.


Again, sorry if i misunderstood your response.

Comment 5 Jarosław Staniek 2020-04-28 22:10:00 UTC

The SQL string in kexi__objectdata is in KDBSQL dialect. This makes the KEXI format backend independent and the database content can be migrated without any change e.g. from a file to a server. I am sorry - it must be too deeply hidden behind the mask.

So that string, when retrieved from kexi__objectdata, is displayed as-is in the SQL view of KEXI because it's assumed to be the portable KDBSQL dialect. (Well, in recent KEXIs (>=3.0 probably) it's actually parsed to check validity and if we're switched from the visual editor, it's formatted as well).

> One source of confusion is that i dot know if the “native” in KDbNativeStatementBuilder refers 
> to driver-native or KDbSql-native statement.


In KexiQueryDesignerGuiEditor::storeLayout() there's 

    KDbNativeStatementBuilder builder(conn, KDb::KDbEscaping);

This means that we store the KDbSql-native statements. The generateSelectStatement() has null driver in the logic everywhere in this case.
I admit the KDbNativeStatementBuilder class name can be confusing in a way that it serves both native SQLs and the KDBSQL. Original purpose was for the native only but then it was soon apparent that large portions of code are common for KDBSQL and native SQLs.

KDbNativeStatementBuilder started as an internal class but now in >=3.1 it's a public API. Therefore renaming to a better name, maybe KDbStatementBuilder, is not possible :)

> Using [] to escape identifiers would free the double quotes to be used only for string literals, but 
> it seems to me that KDbNativeStatementBuilder, or KDB::escapeIdentifier, should still be corrected to 
> use the proper KDbSql escape characters.

Is this still the case given what I said above?

Comment 6 jordi fita i mas 2020-04-28 23:30:10 UTC

>> Using [] to escape identifiers would free the double quotes to be used
>> only for string literals, but it seems to me that
>> KDbNativeStatementBuilder, or KDB::escapeIdentifier, should still be
>> corrected to use the proper KDbSql escape characters.
> 
> Is this still the case given what I said above?

I believe it is, yes.

When i open the query from the attached database, using Kexi 3.2 or 3.3, in text view it is true that the displayed SQL remains unchanged, as you said, but when switching to data view all the rows have a value of 'group', even though the source table, the one with the "group" column, has only integers; design view the expression under “query columns” is now “expr1: 'group'”.

Moreover, if i make any change to the query in design mode, for example toggling twice the visible check box, and *then* switch to text view without saving the query, it changes again to SELECT 'group' AS expr1 FROM source.

Checking with sqlite3 the value stored in kexi__objectdata, it outputs:

> sqlite> select * from kexi__objectdata where o_sub_id = 'sql';
> 2|SELECT "group" FROM source|sql
> sqlite> 

If i am not mistaken, it means that KDbNativeStatementBuilder generated a statement with the column quoted as "group" straight from the KDbQuerySchema, but KDbParser reads it as if it were:

SELECT 'group' AS expr1 FROM source

Thus, even if KDbParser could read columns quoted [as this], unless KDbNativeStatementBuilder used [ and ] as quote characters, the problem would still be there, wouldn’t it?

Sorry if a am not making any sense; i must not understand something from KDb internals :)

Comment 7 Jarosław Staniek 2020-04-29 08:58:18 UTC

> When i open the query from the attached database, using Kexi 3.2 or 3.3, 
> in text view it is true that the displayed SQL remains unchanged, as you 
> said, but when switching to data view all the rows have a value of 'group', 
> even though the source table, the one with the "group" column, has only 
> integers; design view the expression under “query columns” is now “expr1: 
> 'group'”.

KEXI Query plugin tries to avoid modifications if it's possible or easy.
WHen SQL view is open directly, no SQL is altered.
If the design view is used, sometimes alias is added (this is a feature of KDb) in order to have clearly named all columns. There's such case when we use constants or in general, expressions, instead of column names in the "FROM" SQL section.

This is all still KDBSQL dialect we're within. Native dialects are touched only when data view is involved, so - as expected - in a lazy manner.

> Moreover, if i make any change to the query in design mode, for example 
> toggling twice the visible check box, and *then* switch to text view 
> without saving the query, it changes again to SELECT 'group' AS expr1 FROM 
> source.

This is a quality of round-tripping between these two views and can be always improved. I remember a lot of code and states being added to achieve as small changes as possible. 

> If i am not mistaken, it means that KDbNativeStatementBuilder generated a statement with the column quoted as "group" straight from the KDbQuerySchema, but KDbParser reads it as if it were:


There's string  (\'{e_str1}*\'|\"{e_str2}*\") in the parser; type of quotes are unimportant. Double quotes are never meant to be column names (unlike in some SQL dialects) and for 2 more reason this would stay as is: 1. compatibility with prev. KEXIs and MS Access, 2. "group" really looks like a string literal.


> Thus, even if KDbParser could read columns quoted [as this], unless KDbNativeStatementBuilder used [ and ] as quote characters, the problem would still be there, wouldn’t it?

When the #332161 is implemented, KDbNativeStatementBuilder shall support [foo] the way it's designed in #332161, that is, first attempt is to locate identifier foo and if it fails, foo is meant to be a query parameter.