Generated columns and COLLATE in the AS parens

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Generated columns and COLLATE in the AS parens

Markus Winand
Hi!

I think there might be a glitch in the way SQLite 3.31.x derives the collation information from the expression of a generated column.

In particular, COLLATE inside the AS parens seems to be ignored, but it is honoured after the parens:

SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE tmp (str VARCHAR(255), str_nc1 GENERATED ALWAYS AS (str COLLATE NOCASE), str_nc2 GENERATED ALWAYS AS (str) COLLATE NOCASE);
sqlite> INSERT INTO tmp VALUES ('a'), ('A'), ('b'), ('B');
sqlite> SELECT * FROM tmp ORDER BY str;
A|A|A
B|B|B
a|a|a
b|b|b
sqlite> SELECT * FROM tmp ORDER BY str_nc1;
A|A|A
B|B|B
a|a|a
b|b|b
sqlite> SELECT * FROM tmp ORDER BY str_nc2;
a|a|a
A|A|A
b|b|b
B|B|B
sqlite> SELECT * FROM tmp ORDER BY (str COLLATE NOCASE);
a|a|a
A|A|A
b|b|b
B|B|B
sqlite>

I believe the "ORDER BY str_nc1” should yield the same row order as the last two queries.

-markus
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Richard Hipp-3
On 1/29/20, Markus Winand <[hidden email]> wrote:
> Hi!
>
> I think there might be a glitch in the way SQLite 3.31.x derives the
> collation information from the expression of a generated column.

I think the current behavior is correct.

If you want a column to have a non-standard collating sequence, you
should add a COLLATE constraint to that column definition.  The fact
that there is a COLLATE operator on the expression that determines the
value of that column seems irrelevant.

Consider this:

CREATE TABLE t1(a TEXT DEFAULT('xyzzy' COLLATE nocase));

Would you expect the COLLATE operator in the DEFAULT clause to change
the collating sequence associated with column a?  Why should a
GENERATED ALWAYS AS constraint work differently from a DEFAULT
constraint?

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Markus Winand

> On 29 Jan 2020, at 14:59, Richard Hipp <drh at sqlite.org> wrote:
>
> On 1/29/20, Markus Winand <markus.winand at winand.at> wrote:
>> Hi!
>>
>> I think there might be a glitch in the way SQLite 3.31.x derives the
>> collation information from the expression of a generated column.
>
> I think the current behavior is correct.
>
> If you want a column to have a non-standard collating sequence, you
> should add a COLLATE constraint to that column definition.  The fact
> that there is a COLLATE operator on the expression that determines the
> value of that column seems irrelevant.
>
> Consider this:
>
> CREATE TABLE t1(a TEXT DEFAULT('xyzzy' COLLATE nocase));
>
> Would you expect the COLLATE operator in the DEFAULT clause to change
> the collating sequence associated with column a?  Why should a
> GENERATED ALWAYS AS constraint work differently from a DEFAULT
> constraint?

The critical point here is that SQLite accepts a GENERATED clause without explicit type:

CREATE TABLE tmp (str VARCHAR(255), str_nc1 GENERATED ALWAYS AS (str COLLATE NOCASE), str_nc2 GENERATED ALWAYS AS (str) COLLATE NOCASE);

In that case, the type of the generated column should taken from the result type of the expression[0]. As collations are part of the character string types[1], also the collation is taken from the expression. The COLLATE clause applied to expressions (as opposed to following the name of a data type) just changes the collation in of the character string type of that expression[2].

If the GENERATED clause explicitly sets a data type, then the collation of the expression is irrelevant as it doesn’t affect the type of the generated column, similar to your example with DEFAULT (it is only a value assignment in that case).

At the end everything boils down to this question:
What is the type, including the character set and collation, of generated columns that don’t specify a type explicitly? I think the only sensible answer is that it is the type of the expression, including its character set and collation.

Unfortunately, the “what would PostgreSQL do” approach doesn’t provide guidance here as PostgreSQL requires an explicit type for generated columns (and so do MySQL and MariaDB). In SQL Server, however, it works like I would expect it.

I guess the SQLite approach to types might affect all of my reasoning, but in other cases like the following it seems that SQLite also “transports” the applicable collation together with the result (type?) of an expression:

sqlite> CREATE TABLE tmp (str VARCHAR(255));
sqlite> INSERT INTO tmp VALUES ('a'), ('A'), ('b'), ('B');
sqlite> SELECT * FROM (SELECT str COLLATE NOCASE FROM tmp) ORDER BY str;
a
A
b
B

-markus

References to SQL:2016, Part 2. I know that this is not a strong argument to you, but nevertheless ;)

[0] 11.4 SR 13 c:
        • If <generation clause> is specified, then the declared type of GE.
 Whereas GE is defined in SR 10 a:
        • Let GE be the <generation expression> contained in GC.

[1] 4.2.1
   A character string type descriptor contains:
   [….]
    • —  The catalog name, schema name, and collation name of the collation of the character string type.

[2] 6.31 SR 4a:
        • If <collate clause> is specified, then the declared type of the <character factor> is the declared type of the <character primary>, except that the declared type collation is the collation identified by <collate clause>, and its collation derivation is explicit.


>
> --
> D. Richard Hipp
> drh at sqlite.org
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Richard Hipp-3
On 1/30/20, Markus Winand <[hidden email]> wrote:
>
> Unfortunately, the “what would PostgreSQL do” approach doesn’t provide
> guidance here.

Maybe it does.  PostgreSQL doesn't allow typeless columns, but it does
allow columns with unspecified collating sequences, does it not?  What
if you have a normal column X with some collating sequence C and then
a generated column Y that as just "AS(X)".  If you do comparisons on
column Y, which collating sequence does it use - the default or C?
Can you run that experiment for us?

Or maybe you are thinking the collating sequence of the expression in
the AS clause should only be carried through into the generated column
if it is explicitly stated, and not implied?

What happens if there is a collating sequence specified in the AS
clause and also another collating sequence on the column definitions?

  CREATE TABLE t1(x TEXT, y TEXT AS (x COLLATE nocase) COLLATE rtree);

Which collating sequence should be used for "SELECT * FROM t1 WHERE y='xyz';"?

Regarding WWPD: If you can convince Tom Lane and/or Bruce Momjian to
send me an email that says "An explicit collating sequence at the
top-level of a GENERATED ALWAYS AS clause should be carried through as
the default collating sequence of the generated column itself", then
I'll change it.  :-)

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

David Raymond
From a Windows installation of 12.0


testing=> select version();
                          version
------------------------------------------------------------
 PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.283 ms
testing=> create table foo (
testing(> a text,
testing(> b text collate "C",
testing(> c text collate "en-US-x-icu",
testing(> d text generated always as (a) stored,
testing(> e text generated always as (b) stored,
testing(> f text generated always as (c) stored,
testing(> g text collate "C" generated always as (c) stored,
testing(> h text generated always as (c collate "C") stored,
testing(> i text collate "C" generated always as (c collate "en-US-x-icu") stored
testing(> );
CREATE TABLE
Time: 81.025 ms
testing=> \d foo
                                      Table "public.foo"
 Column | Type |  Collation  | Nullable |                       Default
--------+------+-------------+----------+------------------------------------------------------
 a      | text |             |          |
 b      | text | C           |          |
 c      | text | en-US-x-icu |          |
 d      | text |             |          | generated always as (a) stored
 e      | text |             |          | generated always as (b) stored
 f      | text |             |          | generated always as (c) stored
 g      | text | C           |          | generated always as (c) stored
 h      | text |             |          | generated always as (c COLLATE "C") stored
 i      | text | C           |          | generated always as (c COLLATE "en-US-x-icu") stored

testing=> select attnum, attname, attgenerated, collname from pg_attribute left outer join pg_collation on pg_attribute.attcollation = pg_collation.oid where attrelid = 'foo'::regclass and attnum > 0 order by attnum;
 attnum | attname | attgenerated |  collname
--------+---------+--------------+-------------
      1 | a       |              | default
      2 | b       |              | C
      3 | c       |              | en-US-x-icu
      4 | d       | s            | default
      5 | e       | s            | default
      6 | f       | s            | default
      7 | g       | s            | C
      8 | h       | s            | default
      9 | i       | s            | C
(9 rows)

Time: 0.590 ms
testing=>


-----Original Message-----
From: sqlite-users <[hidden email]> On Behalf Of Richard Hipp
Sent: Thursday, January 30, 2020 12:21 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Generated columns and COLLATE in the AS parens

On 1/30/20, Markus Winand <[hidden email]> wrote:
>
> Unfortunately, the “what would PostgreSQL do” approach doesn’t provide
> guidance here.

Maybe it does.  PostgreSQL doesn't allow typeless columns, but it does
allow columns with unspecified collating sequences, does it not?  What
if you have a normal column X with some collating sequence C and then
a generated column Y that as just "AS(X)".  If you do comparisons on
column Y, which collating sequence does it use - the default or C?
Can you run that experiment for us?

Or maybe you are thinking the collating sequence of the expression in
the AS clause should only be carried through into the generated column
if it is explicitly stated, and not implied?

What happens if there is a collating sequence specified in the AS
clause and also another collating sequence on the column definitions?

  CREATE TABLE t1(x TEXT, y TEXT AS (x COLLATE nocase) COLLATE rtree);

Which collating sequence should be used for "SELECT * FROM t1 WHERE y='xyz';"?

Regarding WWPD: If you can convince Tom Lane and/or Bruce Momjian to
send me an email that says "An explicit collating sequence at the
top-level of a GENERATED ALWAYS AS clause should be carried through as
the default collating sequence of the generated column itself", then
I'll change it.  :-)

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Markus Winand
In reply to this post by Richard Hipp-3


> On 30 Jan 2020, at 18:20, Richard Hipp <[hidden email]> wrote:
>
> On 1/30/20, Markus Winand <[hidden email]> wrote:
>>
>> Unfortunately, the “what would PostgreSQL do” approach doesn’t provide
>> guidance here.
>
> Maybe it does.  PostgreSQL doesn't allow typeless columns, but it does
> allow columns with unspecified collating sequences, does it not?  

If you do not specify a COLLATE clause in a column definition, the default collation of the effective character set is used.

The name ‘default’ is still a collation. Quoting https://www.postgresql.org/docs/current/collation.html:

        The collation of an expression can be the "default" collation,
        which means the locale settings defined for the database.


> What
> if you have a normal column X with some collating sequence C and then
> a generated column Y that as just "AS(X)”.

It depends on whether the generated column specifies a <data type> or not (see below).


>  If you do comparisons on
> column Y, which collating sequence does it use - the default or C?
>
> Can you run that experiment for us?

David Raymond did.

Here is what happens, split into the three relevant cases.

> testing=> create table foo (
> testing(> a text,
> testing(> b text collate "C",
> testing(> c text collate "en-US-x-icu”,
> […]
> attnum | attname | attgenerated |  collname
> --------+---------+--------------+-------------
>      1 | a       |              | default
>      2 | b       |              | C
>      3 | c       |              | en-US-x-icu

Pretty obvious.

The next columns:

> testing(> d text generated always as (a) stored,
> testing(> e text generated always as (b) stored,
> testing(> f text generated always as (c) stored,

>      4 | d       | s            | default
>      5 | e       | s            | default
>      6 | f       | s            | default

The generated column definitions mention a <data type>, in that case the types character set’s default collation is used — coincidentally called “default”. Thus, it is the same case as column “a”.

Finally:

> testing(> g text collate "C" generated always as (c) stored,
> testing(> h text generated always as (c collate "C") stored,
> testing(> i text collate "C" generated always as (c collate "en-US-x-icu") stored
> testing(> );

>      7 | g       | s            | C
>      8 | h       | s            | default
>      9 | i       | s            | C

Again, <data type> is specified, thus this collation is used. The collation of “h” is not “C” because there is an explicit type definition “text”, which includes a character set and a default collation.

The collation of the expression is really only relevant if there is no <data type> (and no COLLATE outside the expression).

>
> Or maybe you are thinking the collating sequence of the expression in
> the AS clause should only be carried through into the generated column
> if it is explicitly stated, and not implied?

There is only one case when the type of the expression, including the character set and collation should be carried through into the generated column: if the generated column doesn’t explicitly define a type.

>
> What happens if there is a collating sequence specified in the AS
> clause and also another collating sequence on the column definitions?
>
>  CREATE TABLE t1(x TEXT, y TEXT AS (x COLLATE nocase) COLLATE rtree);
>
> Which collating sequence should be used for "SELECT * FROM t1 WHERE y='xyz';"?

There are three places where a COLLATE clause in the definition of a generated column is allowed:

(1) After the data type (see 6.1, <predefined type> in the BNF)
(2) Inside the expression (see 6.31, <character factor> in the BNF)
(3) At the very end (see 11.4, <column definition> in the BNF).

There is a syntax rule prohibiting (1) and (3) being used at the same time (11.4 SR12b). The same rule says that the effect of (1) and (3) is the same:

> • Otherwise, <collate clause> shall not be both specified in <data type> and immediately contained in <column definition>. If <collate clause> is immediately contained in <column definition>, then it is equivalent to specifying an equivalent <collate clause> in <data type>.

This is the case you were just are asking about.

Case (2) might affect the collation of the expression, which is only relevant in case there is neither (1) or (3) specified. In that case, also the collation is taken from the expression—via the data type (11.4 SR13c).

> 13)  The declared type of the column is
> Case:
> • a)  If <data type> is specified, then that data type. If <generation clause> is also specified, then the declared type of <generation expression> shall be assignable to the declared type of the column.
> • b)  If <domain name> is specified, […]
>   • c)  If <generation clause> is specified, then the declared type of GE.

As you see, as soon as a <data type> is specified, the collation of the expression is irrelevant. This is what can be demonstrated in PostgreSQL. Skipping <data type> is currently not possible in PostgreSQL, thus PostgreSQL cannot be used to demonstrate c).

> Regarding WWPD: If you can convince Tom Lane and/or Bruce Momjian to
> send me an email that says "An explicit collating sequence at the
> top-level of a GENERATED ALWAYS AS clause should be carried through as
> the default collating sequence of the generated column itself", then
> I'll change it.  :-)

I won’t because that’s not how it works.

It doesn’t matter if there is a COLLATE clause in the expression nor where it appears. My example was just making it explicit.

The only thing that matters is whether there is a <data type> or COLLATE clause outside the expression or not. If neither of them is present, the collation of the expression is the relevant one for the generated column. It doesn’t matter where the expression got its collation from.

Even if there is no COLLATE clause in the expression, the expression’s collation should still be taken if there is neither a <data type> nor a COLLATE outside the expression—like in this example:

sqlite> CREATE TABLE tmp (base_column text COLLATE NOCASE, gen_column GENERATED ALWAYS AS (base_column));
sqlite> INSERT INTO tmp VALUES ('a'), ('A'), ('b'), ('B');
sqlite> select * from tmp order by base_column;
a|a
A|A
b|b
B|B
sqlite> select * from tmp order by gen_column;
A|A
B|B
a|a
b|b

The expected result would be the same order for both queries (NOCASE). IMHO: that would also be the only reasonable expectation. Similar to a “pass through” column of a view:

sqlite> CREATE VIEW tmpv AS SELECT * FROM tmp;
sqlite> SELECT * FROM tmpv ORDER BY base_column;
a|a
A|A
b|b
B|B

As soon as you add a <data type> to the generated column, the expression's collation becomes irrelevant.

   gen_column TEXT GENERATED ALWAYS AS (base_column)

Adding “TEXT” as <data type> activates the branch a) of the above quoted SR13.

In lack of a <data type> and COLLATE outside the expression, brach c) triggers. This is the case I’m talking about. Whether that collation of the expression is implicit or explicit doesn’t matter.

-markus

>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Keith Medcalf
In reply to this post by Markus Winand

On: Wednesday, 29 January, 2020 06:45, Markus Winand <[hidden email]> wrote:

>I think there might be a glitch in the way SQLite 3.31.x derives the
>collation information from the expression of a generated column.

>In particular, COLLATE inside the AS parens seems to be ignored, but it
>is honoured after the parens:

Carrying the COLLATE from an expression into the column definition is incorrect.  The definition of a generated column is:

<identifier> [type affinity] [GENERATED ALWAYS AS (<expression>)] [COLLATE <identifier>] [<constraints> ...]

so why would the so including a COLLATE as part of the expression applies to the expression, and not to the column.  Except for the ordering of <identifier> which must be first, followed by the type which must come second, the other bits can appear in pretty much any ordering you heart desires (it is a Dim Sum of clauses).  

If the <type affinity> of the column not specified then its "type affinity" is BLOB (None).
If no COLLATE is specified for the column, then the default collation BINARY applies.
If no NULL constraint is specified then NULL values are permitted.
If no DEFAULT expression is specified then the default value is NULL.

Why would you think that the COLLATE applied to the evaluation of an expression would "carry out" to the column itself?  Especially in light of the fact that the type/collation of the expression in a "DEFAULT (<expression>)" does not carry out to the column definition?

The collation sequence applies to "comparison" operations of TEXT values, and not to anything else (it is ignored for all other value types).  It is an attribute of a column (like column affinity) and not an attribute of a value.

In other words,

x AS (y == 'yahoo' collate nocase)

applies the affinity nocase to the evaluation of the == comparison operation.  The result of evaluation of the expression is a data value with a concrete type (integer) and no associated collation.  This value then has whatever column affinity is specified applied to it and inherits the collation of the containing column.

x AS (y collate nocase)

is the same thing as

x AS (y)

since there is no "comparison" operation in the expression to which the collation can be applied (it is a useless attribute to the expression).  The result of the expression is a value of the concrete type of y with no associated collation.  This value then has the whatever column affinity is specified applied to it and inherits the collation of the containing column.

ie, "values" do not have a collation associated with them, column definitions and comparison operators on text have collation sequences associated with them.

>SQLite version 3.31.1 2020-01-27 19:55:54
>Enter ".help" for usage hints.
>Connected to a transient in-memory database.
>Use ".open FILENAME" to reopen on a persistent database.
>sqlite> CREATE TABLE tmp (str VARCHAR(255), str_nc1 GENERATED ALWAYS AS
>(str COLLATE NOCASE), str_nc2 GENERATED ALWAYS AS (str) COLLATE NOCASE);
>sqlite> INSERT INTO tmp VALUES ('a'), ('A'), ('b'), ('B');
>sqlite> SELECT * FROM tmp ORDER BY str;
>A|A|A
>B|B|B
>a|a|a
>b|b|b
>sqlite> SELECT * FROM tmp ORDER BY str_nc1;
>A|A|A
>B|B|B
>a|a|a
>b|b|b
>sqlite> SELECT * FROM tmp ORDER BY str_nc2;
>a|a|a
>A|A|A
>b|b|b
>B|B|B
>sqlite> SELECT * FROM tmp ORDER BY (str COLLATE NOCASE);
>a|a|a
>A|A|A
>b|b|b
>B|B|B
>sqlite>

>I believe the "ORDER BY str_nc1” should yield the same row order as the
>last two queries.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Markus Winand
Let me first explain how collations work in standard SQL. I think my answers below make more sense then.

In the SQL standard….

- character string **types** have properties such as fix-length vs. variable length, the length (limit), character set and also the collation.

- columns have types, which include the collation if it is a character string type.
  However, it is not the column that has a collation. The column has a type, which might have a collation.

- values have a static type (the so-called “declared type”). Static means it is determined at “compile time” from the syntax and the data dictionary.

- Expressions also have a declared type. More elaborate: the result value of an expression has a type that is statically determined (again  the “declared type”).

- expressions of which the declared type is a character string type, will also have all the properties of character string types (including the collation).

- When character strings are compared, the effective collation is determined from the declared types of the operands.
   The SQL standard defines rules how to do that (e.g. in 9075-2 9.15 "Collation determination”)
   The most important rule is the "collation derivation” order: explicit, implicit, none. That’s the rule that says
   the COLLATE clause on expressions (“explicit” derivation) is stronger than the COLLATE clause following type names (“implicit” derivation).

But note that the last statement is not the reason the collation of the generation expression takes precedence of the data type — IT DOESN’T!


As far as I know, this is basically unchanged sind SQL-92. If you do not have access to the current release of the standard (of 2016, to which my references apply), you can also have a look at SQL-92 here (search for "4.2  Character strings”):

https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Generated columns where introduced with SQL:2003 and are thus not in the linked document.


Putting these mechanics of working with collations in context of generated columns:

- Generated columns need a type like any other column.
  If that type happens to be a character string type, the type includes the collation as part of the types properties.

- Even though generated columns need a type, they allow skipping the type in the definition of the generated column—that’s very unique to generated columns.
  E.g. specifying a DEFAULT clause does not lift the requirement to explicitly state the type of the column.

- If a generated column doesn’t have an explicit data type mentioned, the the data type of the result of the expression is used.
  Whatever data type that is, whatever properties that type has.
  If it happens to be a character string type, it will also have a “collation” property.

However, the declared type of the expression (including its properties) is only relevant if the column definition doesn’t explicitly state a data type.

These are two different cases in standard SQL:

  C1 TEXT GENERATED ALWAYS AS (…)
  C2      GENERATED ALWAYS AS (…)

The type of C1 is TEXT, including all its default properties.
The type of C2 is the type of the result of the expression, including all its properties.

The “what’s new” paper for SQL:2003 mentions that case explicitly.

http://sigmodrecord.org/publications/sigmodRecord/0403/E.JimAndrew-standard.pdf

The example on the last page:

> CREATE TABLE EMPLOYEES (
>  EMP_ID INTEGER,
>  SALARY DECIMAL(7,2),
>  BONUS DECIMAL(7,2),
>  TOTAL_COMP GENERATED ALWAYS AS (SALARY + BONUS)
>  )
>
> TOTAL_COMP is a generated column of the EMPLOYEES table. The data type of the TOTAL_COMP is the data type of the expression (SALARY_BONUS).

I think that’s a typo and should read (SALARY + BONUS).

> Users may optionally specify a data type for a generated column, in which case the specified data type must match with the data type of the associated expression.

The last part is not 100% in line with the current standard. The types don’t need to match, they need to be assignable. I don’t know if this is just a simplification for this paper or if SQL:2003 really hat that requirement.

Further comments to your statements below…

> On 30 Jan 2020, at 23:06, Keith Medcalf <[hidden email]> wrote:
>
>
> On: Wednesday, 29 January, 2020 06:45, Markus Winand <[hidden email]> wrote:
>
>> I think there might be a glitch in the way SQLite 3.31.x derives the
>> collation information from the expression of a generated column.
>
>> In particular, COLLATE inside the AS parens seems to be ignored, but it
>> is honoured after the parens:
>
> Carrying the COLLATE from an expression into the column definition is incorrect.  The definition of a generated column is:
>
> <identifier> [type affinity] [GENERATED ALWAYS AS (<expression>)] [COLLATE <identifier>] [<constraints> ...]
>
> so why would the so including a COLLATE as part of the expression applies to the expression, and not to the column.  Except for the ordering of <identifier> which must be first, followed by the type which must come second, the other bits can appear in pretty much any ordering you heart desires (it is a Dim Sum of clauses).  
>
> If the <type affinity> of the column not specified then its "type affinity" is BLOB (None).

This is where the SQL standard is different: if there is no type, the column gets the type of the expression (ISO/IEC 9075-2, 11.4 SR13c)

> If no COLLATE is specified for the column, then the default collation BINARY applies.

Again, the SQL standard is different: If there is no COLLATE clause **and** not data type specified, the collation is inherited via the data type of the result of the expression.

If the column definition explicitly states a data type, that type comes with a collation, which takes precedence of the type of the expression.

> If no NULL constraint is specified then NULL values are permitted.

If not NOT NULL constraint….

> If no DEFAULT expression is specified then the default value is NULL.
>
> Why would you think that the COLLATE applied to the evaluation of an expression would "carry out" to the column itself?

Because this is what is written in the SQL standard. And it make sense when collations are a property of character string types, which they are in the SQL standard.

> Especially in light of the fact that the type/collation of the expression in a "DEFAULT (<expression>)" does not carry out to the column definition?

Because you cannot use DEFAULT without specifying a type name at the very same time (in standard SQL).

11.4, SR 11) If <generation clause> is omitted, then either <data type> or <domain name> shall be specified.

If it would be possible to skip the data type when a DEFAULT clause is present, I guess the same rules as for generated columns would apply (take the type, including all its properties) of the expressions result.

>
> The collation sequence applies to "comparison" operations of TEXT values, and not to anything else (it is ignored for all
> other value types).  It is an attribute of a column (like column affinity) and not an attribute of a value.

It is an attribute of character string **types**.

Character string **columns** have collations because their type has one.

Values have a collation because their type has one.

>
> In other words,
>
> x AS (y == 'yahoo' collate nocase)
>
> applies the affinity nocase to the evaluation of the == comparison operation.  The result of evaluation of the expression is a data value with a concrete type (integer) and no associated collation.  This value then has whatever column affinity is specified applied to it and inherits the collation of the containing column.

But the case was bringing up is that the generated column doesn’t have a type declared.

>
> x AS (y collate nocase)
>
> is the same thing as
>
> x AS (y)
>
> since there is no "comparison" operation in the expression to which the collation can be applied (it is a useless attribute to the expression).

Assuming Y is a character string and the collation of Y is not NOCASE, the type of both expressions is different (but not the value). As the type contains the collation, it is not necessarily useless.

>  The result of the expression is a value of the concrete type of y with no associated collation.

The result of the expression “(Y)” has the very same declared type as Y itself. If Y is a character string, the result of the expression has the same collation as Y itself.

>  This value then has the whatever column affinity is specified applied to it and inherits the collation of the containing column.

If the column has a defined data type or an explicit COLLATE outside the expression, these collations will be used. Otherwise: the collation of the declared type of the expression is used because the entire data type of the result of the expression is used.

>
> ie, "values" do not have a collation associated with them, column definitions and comparison operators on text have collation sequences associated with them.

Values have a declared types, character string types have collations.

Nobody cares about the collation of a character string until it takes part in a comparison.

-markus

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Keith Medcalf

If the standard requires static typing then anything it says about static types does not apply to SQLite3 because SQLite3 uses dynamic strong typing and does not use static fixed typing.

That means that the "type" assigned in a column is merely a "suggestion" on how to store the data for this column, and that data will be converted to this type if possible (this is called the column affinity).  A particular "value" (stored at the intersection of a column and a row) may be of any "type", notwithstanding what the column declaration of the table has to say about it.

The supported "types" are:

 NULL - a NULL value
 INTEGER a 64-bit signed integer
 REAL - a 64-bit IEEE-754 double precision float
 TEXT - a sequence of bytes in the encoding of the underlying database (set by pragma encoding)
 BLOB - a sequence of bytes with no meaning

A "value" has a concrete "type" associated with it and nothing else.
A "column" has a preferred storage "type" (called the affinity) and a bunch of constraints, plus a collation specifying the collation sequence to apply when the "value" contained in some particular row of that column contains TEXT.

For example:

sqlite> create table x(x integer default ('1'));
sqlite> insert into x default values;
sqlite> select typeof(x), x from x;
integer|1

The column "x" in the table "x" has integer affinity (would prefer data to be stored as integers if it can be).
The default value is a text string '1'.
When you execute the insert, x takes the default value '1'.
When this value is stored the column affinity is applied and the value that is actually stored is the integer 1.

Similarly,

sqlite> drop table x;
sqlite> create table x(x integer default ('test'));
sqlite> insert into x default values;
sqlite> select typeof(x), x from x;
text|test

The application of affinity integer for the column cannot convert the value 'test' into an integer, so the value is stored as text.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Markus Winand
>Sent: Friday, 31 January, 2020 03:29
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] Generated columns and COLLATE in the AS parens
>
>Let me first explain how collations work in standard SQL. I think my
>answers below make more sense then.
>
>In the SQL standard….
>
>- character string **types** have properties such as fix-length vs.
>variable length, the length (limit), character set and also the
>collation.
>
>- columns have types, which include the collation if it is a character
>string type.
>  However, it is not the column that has a collation. The column has a
>type, which might have a collation.
>
>- values have a static type (the so-called “declared type”). Static means
>it is determined at “compile time” from the syntax and the data
>dictionary.
>
>- Expressions also have a declared type. More elaborate: the result value
>of an expression has a type that is statically determined (again  the
>“declared type”).
>
>- expressions of which the declared type is a character string type, will
>also have all the properties of character string types (including the
>collation).
>
>- When character strings are compared, the effective collation is
>determined from the declared types of the operands.
>   The SQL standard defines rules how to do that (e.g. in 9075-2 9.15
>"Collation determination”)
>   The most important rule is the "collation derivation” order: explicit,
>implicit, none. That’s the rule that says
>   the COLLATE clause on expressions (“explicit” derivation) is stronger
>than the COLLATE clause following type names (“implicit” derivation).
>
>But note that the last statement is not the reason the collation of the
>generation expression takes precedence of the data type — IT DOESN’T!
>
>
>As far as I know, this is basically unchanged sind SQL-92. If you do not
>have access to the current release of the standard (of 2016, to which my
>references apply), you can also have a look at SQL-92 here (search for
>"4.2  Character strings”):
>
>https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
>
>Generated columns where introduced with SQL:2003 and are thus not in the
>linked document.
>
>
>Putting these mechanics of working with collations in context of
>generated columns:
>
>- Generated columns need a type like any other column.
>  If that type happens to be a character string type, the type includes
>the collation as part of the types properties.
>
>- Even though generated columns need a type, they allow skipping the type
>in the definition of the generated column—that’s very unique to generated
>columns.
>  E.g. specifying a DEFAULT clause does not lift the requirement to
>explicitly state the type of the column.
>
>- If a generated column doesn’t have an explicit data type mentioned, the
>the data type of the result of the expression is used.
>  Whatever data type that is, whatever properties that type has.
>  If it happens to be a character string type, it will also have a
>“collation” property.
>
>However, the declared type of the expression (including its properties)
>is only relevant if the column definition doesn’t explicitly state a data
>type.
>
>These are two different cases in standard SQL:
>
>  C1 TEXT GENERATED ALWAYS AS (…)
>  C2      GENERATED ALWAYS AS (…)
>
>The type of C1 is TEXT, including all its default properties.
>The type of C2 is the type of the result of the expression, including all
>its properties.
>
>The “what’s new” paper for SQL:2003 mentions that case explicitly.
>
>http://sigmodrecord.org/publications/sigmodRecord/0403/E.JimAndrew-
>standard.pdf
>
>The example on the last page:
>
>> CREATE TABLE EMPLOYEES (
>>  EMP_ID INTEGER,
>>  SALARY DECIMAL(7,2),
>>  BONUS DECIMAL(7,2),
>>  TOTAL_COMP GENERATED ALWAYS AS (SALARY + BONUS)
>>  )
>>
>> TOTAL_COMP is a generated column of the EMPLOYEES table. The data type
>of the TOTAL_COMP is the data type of the expression (SALARY_BONUS).
>
>I think that’s a typo and should read (SALARY + BONUS).
>
>> Users may optionally specify a data type for a generated column, in
>which case the specified data type must match with the data type of the
>associated expression.
>
>The last part is not 100% in line with the current standard. The types
>don’t need to match, they need to be assignable. I don’t know if this is
>just a simplification for this paper or if SQL:2003 really hat that
>requirement.
>
>Further comments to your statements below…
>
>> On 30 Jan 2020, at 23:06, Keith Medcalf <[hidden email]> wrote:
>>
>>
>> On: Wednesday, 29 January, 2020 06:45, Markus Winand
><[hidden email]> wrote:
>>
>>> I think there might be a glitch in the way SQLite 3.31.x derives the
>>> collation information from the expression of a generated column.
>>
>>> In particular, COLLATE inside the AS parens seems to be ignored, but
>it
>>> is honoured after the parens:
>>
>> Carrying the COLLATE from an expression into the column definition is
>incorrect.  The definition of a generated column is:
>>
>> <identifier> [type affinity] [GENERATED ALWAYS AS (<expression>)]
>[COLLATE <identifier>] [<constraints> ...]
>>
>> so why would the so including a COLLATE as part of the expression
>applies to the expression, and not to the column.  Except for the
>ordering of <identifier> which must be first, followed by the type which
>must come second, the other bits can appear in pretty much any ordering
>you heart desires (it is a Dim Sum of clauses).
>>
>> If the <type affinity> of the column not specified then its "type
>affinity" is BLOB (None).
>
>This is where the SQL standard is different: if there is no type, the
>column gets the type of the expression (ISO/IEC 9075-2, 11.4 SR13c)
>
>> If no COLLATE is specified for the column, then the default collation
>BINARY applies.
>
>Again, the SQL standard is different: If there is no COLLATE clause
>**and** not data type specified, the collation is inherited via the data
>type of the result of the expression.
>
>If the column definition explicitly states a data type, that type comes
>with a collation, which takes precedence of the type of the expression.
>
>> If no NULL constraint is specified then NULL values are permitted.
>
>If not NOT NULL constraint….
>
>> If no DEFAULT expression is specified then the default value is NULL.
>>
>> Why would you think that the COLLATE applied to the evaluation of an
>expression would "carry out" to the column itself?
>
>Because this is what is written in the SQL standard. And it make sense
>when collations are a property of character string types, which they are
>in the SQL standard.
>
>> Especially in light of the fact that the type/collation of the
>expression in a "DEFAULT (<expression>)" does not carry out to the column
>definition?
>
>Because you cannot use DEFAULT without specifying a type name at the very
>same time (in standard SQL).
>
>11.4, SR 11) If <generation clause> is omitted, then either <data type>
>or <domain name> shall be specified.
>
>If it would be possible to skip the data type when a DEFAULT clause is
>present, I guess the same rules as for generated columns would apply
>(take the type, including all its properties) of the expressions result.
>
>>
>> The collation sequence applies to "comparison" operations of TEXT
>values, and not to anything else (it is ignored for all
>> other value types).  It is an attribute of a column (like column
>affinity) and not an attribute of a value.
>
>It is an attribute of character string **types**.
>
>Character string **columns** have collations because their type has one.
>
>Values have a collation because their type has one.
>
>>
>> In other words,
>>
>> x AS (y == 'yahoo' collate nocase)
>>
>> applies the affinity nocase to the evaluation of the == comparison
>operation.  The result of evaluation of the expression is a data value
>with a concrete type (integer) and no associated collation.  This value
>then has whatever column affinity is specified applied to it and inherits
>the collation of the containing column.
>
>But the case was bringing up is that the generated column doesn’t have a
>type declared.
>
>>
>> x AS (y collate nocase)
>>
>> is the same thing as
>>
>> x AS (y)
>>
>> since there is no "comparison" operation in the expression to which the
>collation can be applied (it is a useless attribute to the expression).
>
>Assuming Y is a character string and the collation of Y is not NOCASE,
>the type of both expressions is different (but not the value). As the
>type contains the collation, it is not necessarily useless.
>
>>  The result of the expression is a value of the concrete type of y with
>no associated collation.
>
>The result of the expression “(Y)” has the very same declared type as Y
>itself. If Y is a character string, the result of the expression has the
>same collation as Y itself.
>
>>  This value then has the whatever column affinity is specified applied
>to it and inherits the collation of the containing column.
>
>If the column has a defined data type or an explicit COLLATE outside the
>expression, these collations will be used. Otherwise: the collation of
>the declared type of the expression is used because the entire data type
>of the result of the expression is used.
>
>>
>> ie, "values" do not have a collation associated with them, column
>definitions and comparison operators on text have collation sequences
>associated with them.
>
>Values have a declared types, character string types have collations.
>
>Nobody cares about the collation of a character string until it takes
>part in a comparison.
>
>-markus
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Generated columns and COLLATE in the AS parens

Richard Hipp-3
In reply to this post by Markus Winand
On 1/31/20, Markus Winand <[hidden email]> wrote:
>
> - Expressions also have a declared type. More elaborate: the result value of
> an expression has a type that is statically determined (again  the “declared
> type”).

SQLite is a generalization of "Standard SQL" that allows expressions
that can have an arbitrary type.  Some (most?) expressions in SQLite
will always return the same type, but it is not hard to construct an
SQLite expression that returns different types based on its inputs.
You can also create application-defined functions including
table-valued functions that return different types based on their
inputs.

Internally, SQLite makes no attempt to track the datatype of an
expression.  Such tracking could be added, but doing so would just
make the library bigger and slower without serving any useful purpose.

>
> - If a generated column doesn’t have an explicit data type mentioned, the
> the data type of the result of the expression is used.

Having no explicit data type is a kind of explicit data type in
SQLite, since any SQLite column (generated or otherwise) can omit the
data type specification and the data type affinity becomes the
equivalent of "ANY".  From that point of view, SQLite is like PG in
that it requires an explicit data type on generated columns, though
SQLite extends PG by allowing that explicit data type to be the
unnamed "ANY" data type.  And since the data type is always implicit
in the column definition, there is never an occasion to inherit the
data type from the AS expression.

Furthermore, SQLite expressions do not have data types, so even if a
generated column were said to omit the data type, there would be no
way of capturing it from the AS expression.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users