How to enforce a specific order of group_concat?

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

How to enforce a specific order of group_concat?

mailing lists
Assume I create the following table:

CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
INSERT INTO Test (Value) VALUES('Alpha');
INSERT INTO Test (Value) VALUES('Beta');
INSERT INTO Test (Value) VALUES('Beta');
INSERT INTO Test (Value) VALUES('Alpha');

According to the documentation of group_concat the order is undefined, indeed:

SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;

Both queries result in Alpha,Beta.

Changing the queries to

WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) SELECT group_concat(x) FROM Result;
WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) SELECT group_concat(x) FROM Result;

leads to the results Alpha,Beta, respectively Beta,Alpha.

Is this a coincidence or is this guaranteed to work?

Are there any other solutions / possibilities?

Regards,
Hardy
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Keith Medcalf

On Sunday, 1 March, 2020 14:58, mailing lists <[hidden email]> wrote:

>Assume I create the following table:

>CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
>INSERT INTO Test (Value) VALUES('Alpha');
>INSERT INTO Test (Value) VALUES('Beta');
>INSERT INTO Test (Value) VALUES('Beta');
>INSERT INTO Test (Value) VALUES('Alpha');

>According to the documentation of group_concat the order is undefined,
>indeed:

>SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
>SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;

>Both queries result in Alpha,Beta.

>Changing the queries to

>WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC)
>SELECT group_concat(x) FROM Result;
>WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC)
>SELECT group_concat(x) FROM Result;

>leads to the results Alpha,Beta, respectively Beta,Alpha.

>Is this a coincidence or is this guaranteed to work?

>Are there any other solutions / possibilities?

group_concat builds a "group" by concatenating the values sent to it in the order they are sent.  If you do not know this order then for all intents and purposes the order is "undefined" because it is defined as the order in which the query planner decides to visit the rows forming the group.  SQLite3 believes that all aggregate functions are commutative and that the order in which rows are fed into them is immaterial to the result and there (presently) is no way to specify that this is not the case.

So in the rare case where the aggregate is not commutative and you depend on the presentation order, then you must specify it.  The only built-in aggregate that is not commutative is the group_concat function.  If you were to write another non-commutative aggregate function, lets say SHA1(...), that computed the SHA1 hash of the values fed into it, you would also have to control the presentation order or the result would be "undefined".

select group_concat(value) from (select distinct value from test order by value desc); will do that.  (rephrasing as a CTE makes no difference)

This works because the query as phrased cannot be flattened since the outer query contains an aggregate and the inner query contains an order by.

Moving the distinct does not alter the fact that the query cannot be flattened.

select group_concat(distinct value) from (select value from test order by value desc);

Whether the query planner will always not flatten a query where the outer query contains an aggregate and the inner query contains an order by is something on which I cannot comment other than to say that is does not flatten such a query up to now.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

mailing lists
Hi Keith,

thanks for the explanation.

PS: I used a CTE because official examples (e.g. Mandelbrot) also used CTEs in combination with group_concat. Although the incorporation of group_concat was not the primary reason to use CTEs.
PPS: Is it possible to rephrase the documentation for group_concat in the sense that it does not mention that the order is (always) arbitrary?

Regards,
Hardy

> Am 2020-03-02 um 04:46 schrieb Keith Medcalf <[hidden email] <mailto:[hidden email]>>:
>
>
> On Sunday, 1 March, 2020 14:58, mailing lists <[hidden email] <mailto:[hidden email]>> wrote:
>
>> Assume I create the following table:
>
>> CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
>> INSERT INTO Test (Value) VALUES('Alpha');
>> INSERT INTO Test (Value) VALUES('Beta');
>> INSERT INTO Test (Value) VALUES('Beta');
>> INSERT INTO Test (Value) VALUES('Alpha');
>
>> According to the documentation of group_concat the order is undefined,
>> indeed:
>
>> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
>> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;
>
>> Both queries result in Alpha,Beta.
>
>> Changing the queries to
>
>> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC)
>> SELECT group_concat(x) FROM Result;
>> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC)
>> SELECT group_concat(x) FROM Result;
>
>> leads to the results Alpha,Beta, respectively Beta,Alpha.
>
>> Is this a coincidence or is this guaranteed to work?
>
>> Are there any other solutions / possibilities?
>
> group_concat builds a "group" by concatenating the values sent to it in the order they are sent.  If you do not know this order then for all intents and purposes the order is "undefined" because it is defined as the order in which the query planner decides to visit the rows forming the group.  SQLite3 believes that all aggregate functions are commutative and that the order in which rows are fed into them is immaterial to the result and there (presently) is no way to specify that this is not the case.
>
> So in the rare case where the aggregate is not commutative and you depend on the presentation order, then you must specify it.  The only built-in aggregate that is not commutative is the group_concat function.  If you were to write another non-commutative aggregate function, lets say SHA1(...), that computed the SHA1 hash of the values fed into it, you would also have to control the presentation order or the result would be "undefined".
>
> select group_concat(value) from (select distinct value from test order by value desc); will do that.  (rephrasing as a CTE makes no difference)
>
> This works because the query as phrased cannot be flattened since the outer query contains an aggregate and the inner query contains an order by.
>
> Moving the distinct does not alter the fact that the query cannot be flattened.
>
> select group_concat(distinct value) from (select value from test order by value desc);
>
> Whether the query planner will always not flatten a query where the outer query contains an aggregate and the inner query contains an order by is something on which I cannot comment other than to say that is does not flatten such a query up to now.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
>
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email] <mailto:[hidden email]>
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users <http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Jean-Luc Hainaut
In reply to this post by mailing lists
On 1/03/2020 22:57, mailing lists wrote:

> Assume I create the following table:
>
> CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
> INSERT INTO Test (Value) VALUES('Alpha');
> INSERT INTO Test (Value) VALUES('Beta');
> INSERT INTO Test (Value) VALUES('Beta');
> INSERT INTO Test (Value) VALUES('Alpha');
>
> According to the documentation of group_concat the order is undefined, indeed:
>
> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;
>
> Both queries result in Alpha,Beta.
>
> Changing the queries to
>
> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) SELECT group_concat(x) FROM Result;
> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) SELECT group_concat(x) FROM Result;
>
> leads to the results Alpha,Beta, respectively Beta,Alpha.
>
> Is this a coincidence or is this guaranteed to work?
>
> Are there any other solutions / possibilities?

I also sometimes need a deterministic version of group_concat(). For
such cases, I have written the short UDF below (in Python, but I guess
the algorithm can easily be translated in other languages):

JLH

class group_concat2:
     # Rewriting of "group_concat" of SQLite to simulate that of MySQL.
     # Implements "distinct", "order by", "descending" and "separator".
     # Interprets "null" values "intuitively"
     #
     # Format: group_concat2(value,distinct,sortkey,direction,separator)
     #   value:     char or numeric SQL expression; if numeric,
converted into char;
     #              the next value to concatenate;
     #              discarded if None (Python translation of SQL null).
     #   distinct:  numeric or char SQL expression; if char, converted
into integer;
     #              uniqueness indicator;
     #              if 1, duplicates ignored; if 0, duplicates allowed.
     #   sortkey:   char or numeric SQL expression (no conversion);
     #              the order key value for the current "value" instance;
     #              If None or u'', the current "value" instance is used
instead.
     #   direction: numeric or char SQL expression; if char, converted
into integer;
     #              ordering direction (1 = asc; 2 = desc).
     #   sep:       char or numeric SQL expression; if numeric,
converted into char;
     #              value separator;
     #              If None, = default u','.
     # Example:
     #   select City,group_concat2(lower(CustID),1,Account,'2','; ') as
Customers
     #   from   CUSTOMER group by City;

     def __init__(self):
         # Initialize
         self.number  = 0    # number of values added
         self.valList = []   # List of values to concatenate
         self.orderby = []   # list of values of the order key
         self.distinct = 0   # whether "valList" values must be unique
(0 = no; 1 = yes)
         self.direction = 1  # ordering direction (1 = asc; 2 = desc)
         self.sep = u','     # separator

     def step(self,value,distinct,sortkey,direction,sep):
         # Adding a new value to concatenate.
         # Each call of this method may specify different values of
         #      (distinct,sortkey,direction,sep) parameters.
         # However, only those specified by the call of the first
"value" instance
         # will be considered, the others being ignored.

         import numbers

         self.number += 1

         # Initialize user values of "distinct", "direction" and "sep"
         if self.number == 1:
             if distinct in [1,u'1']:
                 self.distinct = 1
             if direction in [1,2,u'1',u'2']:
                 self.direction = int(direction)
             if sep is not None:
                 if isinstance(sep,numbers.Number):
                     self.sep = unicode(sep)
                 else:
                     self.sep = sep

         if sortkey is None:
             sortkey = value if value is not None else 1
         elif sortkey == u'':
             sortkey = value if value is not None else 1

         if value is not None:
             if isinstance(value,numbers.Number):
                 value = unicode(value)

             if self.distinct:
                 if value not in self.valList:
                     self.valList.append(value)
                     self.orderby.append(sortkey)
             else:
                 self.valList.append(value)
                 self.orderby.append(sortkey)
         else:
             # value discarded
             pass

     def finalize(self):
         if self.direction == 1:
             self.valList = [y for x,y in
sorted(zip(self.orderby,self.valList),reverse=False)]
         else:
             self.valList = [y for x,y in
sorted(zip(self.orderby,self.valList),reverse=True)]
         return self.sep.join(self.valList)








_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Dominique Devienne
In reply to this post by mailing lists
On Sun, Mar 1, 2020 at 10:58 PM mailing lists <[hidden email]> wrote:
> Are there any other solutions / possibilities?

I thought someone more knowledgeable than I about Window Functions [1]
would answer,
but since nobody mentioned them so far, I'll do it, as I believe this
is the "SQL native" way
to achieve what you want (modulo DISTINCT perhaps). Notably (from the doc):

Every aggregate window function can also work as a ordinary aggregate function,
simply by omitting the OVER and FILTER clauses. Furthermore, all of
the built-in aggregate
functions of SQLite can be used as an aggregate window function by
adding an appropriate OVER clause

[2] has an example with group_concat() and OVER (ORDER BY ...). I
assume that's what you need,
someone better at Window Functions then me (not difficult!) can
confirm or not that. --DD

[1] https://www.sqlite.org/windowfunctions.html
[2] https://www.sqlite.org/windowfunctions.html#aggwinfunc
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Keith Medcalf

You mean like:

select group_concat(value) over (order by value rows between unbounded preceding and unbounded following) from (select distinct value from test) limit 1;
and
select group_concat(value) over (order by value desc rows between unbounded preceding and unbounded following) from (select distinct value from test) limit 1;

which seems far more convoluted than just:

select group_concat(value) from (select distinct value from test order by value);
and
select group_concat(value) from (select distinct value from test order by value desc);


--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Dominique Devienne
>Sent: Monday, 2 March, 2020 08:02
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] How to enforce a specific order of group_concat?
>
>On Sun, Mar 1, 2020 at 10:58 PM mailing lists <[hidden email]>
>wrote:
>> Are there any other solutions / possibilities?
>
>I thought someone more knowledgeable than I about Window Functions [1]
>would answer,
>but since nobody mentioned them so far, I'll do it, as I believe this
>is the "SQL native" way
>to achieve what you want (modulo DISTINCT perhaps). Notably (from the
>doc):
>
>Every aggregate window function can also work as a ordinary aggregate
>function,
>simply by omitting the OVER and FILTER clauses. Furthermore, all of
>the built-in aggregate
>functions of SQLite can be used as an aggregate window function by
>adding an appropriate OVER clause
>
>[2] has an example with group_concat() and OVER (ORDER BY ...). I
>assume that's what you need,
>someone better at Window Functions then me (not difficult!) can
>confirm or not that. --DD
>
>[1] https://www.sqlite.org/windowfunctions.html
>[2] https://www.sqlite.org/windowfunctions.html#aggwinfunc
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Dominique Devienne
On Mon, Mar 2, 2020 at 5:09 PM Keith Medcalf <[hidden email]> wrote:
> select group_concat(value) from (select distinct value from test order by value);

But is that guaranteed to be ordered correctly "forever" instead of by
"happenstance"
from current implementation details? My point was that the Window
Function version
is ordered "by design", and not an implementation detail (as I think
the simpler version is).

Your subquery returns rows in a given order too, but "who" says
they'll be processed in that order?
Tables are just "sets of rows" after all, and the relational model is
about set-theory, no? order by
in subquery therefore make little to no sense in nested SQL (in theory...). --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Keith Medcalf

On Monday, 2 March, 2020 09:20, Dominique Devienne <[hidden email]> wrote:

>On Mon, Mar 2, 2020 at 5:09 PM Keith Medcalf <[hidden email]> wrote:

>> select group_concat(value) from (select distinct value from test order by value);

>But is that guaranteed to be ordered correctly "forever" instead of by
>"happenstance" from current implementation details?

>My point was that the Window Function version is ordered "by design",
>and not an implementation detail (as I think
>the simpler version is).

>Your subquery returns rows in a given order too, but "who" says
>they'll be processed in that order?

>Tables are just "sets of rows" after all, and the relational model is
>about set-theory, no? order by in subquery therefore make little to no
>sense in nested SQL (in theory...). --DD

Well, in theory an order by in a nested select means that the result of the operation is an ordered projection and not merely a set of rows.  For this particular case (a nested select with an order by and the outer query with an aggregate) the query will not be flattened (#16)

https://sqlite.org/optoverview.html#flattening

Consider that

create table t (x,y);
insert into t values (1,1),(2,2),(3,3),(4,1),(4,2),(4,3),(4,4);
select x,y from (select x, y from t order by y) order by x;

will do two order-by sorts to obtain the result even though the query could be (in this particular case) re-written as "select x, y from t order by x, y" because a query with an order-by in both the outer and nested query cannot be flattened (#11).

This is why putting an "order by" in a view will usually preclude query flattening because the view is not merely producing a "set of rows" it is producing an "ordered projection" and the ordering must be significant else it would not be there.

Of course in the case of the original statement:

select group_concat(distinct value) from test order by value;

The "order by value" might in fact be used to select the use of an index on value to scan, rather than the table test, if that results in less I/O that scanning the table.  However, the Query Planner doesn't believe aggregate functions (including group_concat) results are non-commutative so determines that the "order by value" clause is superfluous (perhaps this is an error, since other functions such as sum(), total(), avg() can also be non-commutative in certain pathological cases and have varying results depending on the ordering of the data sent to them, especially avg() since it merely returns sum()/count() rather than a successive approximation to the mean, though successive approximation still has pathological cases for non-sorted input, they are fewer than the sum()/count() method).

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: How to enforce a specific order of group_concat?

Dominique Devienne
On Mon, Mar 2, 2020 at 6:35 PM Keith Medcalf <[hidden email]> wrote:
> Well, in theory an order by in a nested select means that the result of the operation is an ordered projection and not merely a set of rows.
> For this particular case (a nested select with an order by and the outer query with an aggregate) the query will not be flattened (#16)

OK. I was more trying to find out whether such nested "ordered"
projections were a standard-SQL thing or not.

> select x,y from (select x, y from t order by y) order by x;
> will do two order-by sorts to obtain the result even though the query could be (in this particular case) re-written as "select x, y from t order by x, y"

That's assuming the sort is "stable" :)  Stable-sort is typically
slower than non-stable-sort, that's why the STL has std::sort and
std::stable_sort.

> This is why putting an "order by" in a view will usually preclude query flattening because the view is not merely producing a "set of rows" it is producing an "ordered projection" and the ordering must be significant else it would not be there.

I would actually prefer these nested order-by to be ignored, and the
"set of rows" being assumed, forcing the outer query to do its own
ordering.
The very notion of "ordered projection" for nested query sounds more
like an implementation detail, to word-around the lack of window
functions,
than something "official" from the SQL standard or relational theory.

I'm not disputing how SQLite implements things, for historical or
practical reasons, I just want to understand whether such "ordered
projection"
is an official concept from SQL or just an SQLite thing. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users