Assume I create the following table:
CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT); INSERT INTO Test (Value) VALUES('Alpha'); INSERT INTO Test (Value) VALUES('Beta'); INSERT INTO Test (Value) VALUES('Beta'); INSERT INTO Test (Value) VALUES('Alpha'); According to the documentation of group_concat the order is undefined, indeed: SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC; SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC; Both queries result in Alpha,Beta. Changing the queries to WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) SELECT group_concat(x) FROM Result; WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) SELECT group_concat(x) FROM Result; leads to the results Alpha,Beta, respectively Beta,Alpha. Is this a coincidence or is this guaranteed to work? Are there any other solutions / possibilities? Regards, Hardy _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
On Sunday, 1 March, 2020 14:58, mailing lists <[hidden email]> wrote: >Assume I create the following table: >CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT); >INSERT INTO Test (Value) VALUES('Alpha'); >INSERT INTO Test (Value) VALUES('Beta'); >INSERT INTO Test (Value) VALUES('Beta'); >INSERT INTO Test (Value) VALUES('Alpha'); >According to the documentation of group_concat the order is undefined, >indeed: >SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC; >SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC; >Both queries result in Alpha,Beta. >Changing the queries to >WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) >SELECT group_concat(x) FROM Result; >WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) >SELECT group_concat(x) FROM Result; >leads to the results Alpha,Beta, respectively Beta,Alpha. >Is this a coincidence or is this guaranteed to work? >Are there any other solutions / possibilities? group_concat builds a "group" by concatenating the values sent to it in the order they are sent. If you do not know this order then for all intents and purposes the order is "undefined" because it is defined as the order in which the query planner decides to visit the rows forming the group. SQLite3 believes that all aggregate functions are commutative and that the order in which rows are fed into them is immaterial to the result and there (presently) is no way to specify that this is not the case. So in the rare case where the aggregate is not commutative and you depend on the presentation order, then you must specify it. The only built-in aggregate that is not commutative is the group_concat function. If you were to write another non-commutative aggregate function, lets say SHA1(...), that computed the SHA1 hash of the values fed into it, you would also have to control the presentation order or the result would be "undefined". select group_concat(value) from (select distinct value from test order by value desc); will do that. (rephrasing as a CTE makes no difference) This works because the query as phrased cannot be flattened since the outer query contains an aggregate and the inner query contains an order by. Moving the distinct does not alter the fact that the query cannot be flattened. select group_concat(distinct value) from (select value from test order by value desc); Whether the query planner will always not flatten a query where the outer query contains an aggregate and the inner query contains an order by is something on which I cannot comment other than to say that is does not flatten such a query up to now. -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Hi Keith,
thanks for the explanation. PS: I used a CTE because official examples (e.g. Mandelbrot) also used CTEs in combination with group_concat. Although the incorporation of group_concat was not the primary reason to use CTEs. PPS: Is it possible to rephrase the documentation for group_concat in the sense that it does not mention that the order is (always) arbitrary? Regards, Hardy > Am 2020-03-02 um 04:46 schrieb Keith Medcalf <[hidden email] <mailto:[hidden email]>>: > > > On Sunday, 1 March, 2020 14:58, mailing lists <[hidden email] <mailto:[hidden email]>> wrote: > >> Assume I create the following table: > >> CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT); >> INSERT INTO Test (Value) VALUES('Alpha'); >> INSERT INTO Test (Value) VALUES('Beta'); >> INSERT INTO Test (Value) VALUES('Beta'); >> INSERT INTO Test (Value) VALUES('Alpha'); > >> According to the documentation of group_concat the order is undefined, >> indeed: > >> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC; >> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC; > >> Both queries result in Alpha,Beta. > >> Changing the queries to > >> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) >> SELECT group_concat(x) FROM Result; >> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) >> SELECT group_concat(x) FROM Result; > >> leads to the results Alpha,Beta, respectively Beta,Alpha. > >> Is this a coincidence or is this guaranteed to work? > >> Are there any other solutions / possibilities? > > group_concat builds a "group" by concatenating the values sent to it in the order they are sent. If you do not know this order then for all intents and purposes the order is "undefined" because it is defined as the order in which the query planner decides to visit the rows forming the group. SQLite3 believes that all aggregate functions are commutative and that the order in which rows are fed into them is immaterial to the result and there (presently) is no way to specify that this is not the case. > > So in the rare case where the aggregate is not commutative and you depend on the presentation order, then you must specify it. The only built-in aggregate that is not commutative is the group_concat function. If you were to write another non-commutative aggregate function, lets say SHA1(...), that computed the SHA1 hash of the values fed into it, you would also have to control the presentation order or the result would be "undefined". > > select group_concat(value) from (select distinct value from test order by value desc); will do that. (rephrasing as a CTE makes no difference) > > This works because the query as phrased cannot be flattened since the outer query contains an aggregate and the inner query contains an order by. > > Moving the distinct does not alter the fact that the query cannot be flattened. > > select group_concat(distinct value) from (select value from test order by value desc); > > Whether the query planner will always not flatten a query where the outer query contains an aggregate and the inner query contains an order by is something on which I cannot comment other than to say that is does not flatten such a query up to now. > > -- > The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. > > > > > _______________________________________________ > sqlite-users mailing list > [hidden email] <mailto:[hidden email]> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users <http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users> sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by mailing lists
On 1/03/2020 22:57, mailing lists wrote:
> Assume I create the following table: > > CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT); > INSERT INTO Test (Value) VALUES('Alpha'); > INSERT INTO Test (Value) VALUES('Beta'); > INSERT INTO Test (Value) VALUES('Beta'); > INSERT INTO Test (Value) VALUES('Alpha'); > > According to the documentation of group_concat the order is undefined, indeed: > > SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC; > SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC; > > Both queries result in Alpha,Beta. > > Changing the queries to > > WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) SELECT group_concat(x) FROM Result; > WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) SELECT group_concat(x) FROM Result; > > leads to the results Alpha,Beta, respectively Beta,Alpha. > > Is this a coincidence or is this guaranteed to work? > > Are there any other solutions / possibilities? I also sometimes need a deterministic version of group_concat(). For such cases, I have written the short UDF below (in Python, but I guess the algorithm can easily be translated in other languages): JLH class group_concat2: # Rewriting of "group_concat" of SQLite to simulate that of MySQL. # Implements "distinct", "order by", "descending" and "separator". # Interprets "null" values "intuitively" # # Format: group_concat2(value,distinct,sortkey,direction,separator) # value: char or numeric SQL expression; if numeric, converted into char; # the next value to concatenate; # discarded if None (Python translation of SQL null). # distinct: numeric or char SQL expression; if char, converted into integer; # uniqueness indicator; # if 1, duplicates ignored; if 0, duplicates allowed. # sortkey: char or numeric SQL expression (no conversion); # the order key value for the current "value" instance; # If None or u'', the current "value" instance is used instead. # direction: numeric or char SQL expression; if char, converted into integer; # ordering direction (1 = asc; 2 = desc). # sep: char or numeric SQL expression; if numeric, converted into char; # value separator; # If None, = default u','. # Example: # select City,group_concat2(lower(CustID),1,Account,'2','; ') as Customers # from CUSTOMER group by City; def __init__(self): # Initialize self.number = 0 # number of values added self.valList = [] # List of values to concatenate self.orderby = [] # list of values of the order key self.distinct = 0 # whether "valList" values must be unique (0 = no; 1 = yes) self.direction = 1 # ordering direction (1 = asc; 2 = desc) self.sep = u',' # separator def step(self,value,distinct,sortkey,direction,sep): # Adding a new value to concatenate. # Each call of this method may specify different values of # (distinct,sortkey,direction,sep) parameters. # However, only those specified by the call of the first "value" instance # will be considered, the others being ignored. import numbers self.number += 1 # Initialize user values of "distinct", "direction" and "sep" if self.number == 1: if distinct in [1,u'1']: self.distinct = 1 if direction in [1,2,u'1',u'2']: self.direction = int(direction) if sep is not None: if isinstance(sep,numbers.Number): self.sep = unicode(sep) else: self.sep = sep if sortkey is None: sortkey = value if value is not None else 1 elif sortkey == u'': sortkey = value if value is not None else 1 if value is not None: if isinstance(value,numbers.Number): value = unicode(value) if self.distinct: if value not in self.valList: self.valList.append(value) self.orderby.append(sortkey) else: self.valList.append(value) self.orderby.append(sortkey) else: # value discarded pass def finalize(self): if self.direction == 1: self.valList = [y for x,y in sorted(zip(self.orderby,self.valList),reverse=False)] else: self.valList = [y for x,y in sorted(zip(self.orderby,self.valList),reverse=True)] return self.sep.join(self.valList) _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
In reply to this post by mailing lists
On Sun, Mar 1, 2020 at 10:58 PM mailing lists <[hidden email]> wrote:
> Are there any other solutions / possibilities? I thought someone more knowledgeable than I about Window Functions [1] would answer, but since nobody mentioned them so far, I'll do it, as I believe this is the "SQL native" way to achieve what you want (modulo DISTINCT perhaps). Notably (from the doc): Every aggregate window function can also work as a ordinary aggregate function, simply by omitting the OVER and FILTER clauses. Furthermore, all of the built-in aggregate functions of SQLite can be used as an aggregate window function by adding an appropriate OVER clause [2] has an example with group_concat() and OVER (ORDER BY ...). I assume that's what you need, someone better at Window Functions then me (not difficult!) can confirm or not that. --DD [1] https://www.sqlite.org/windowfunctions.html [2] https://www.sqlite.org/windowfunctions.html#aggwinfunc _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
You mean like: select group_concat(value) over (order by value rows between unbounded preceding and unbounded following) from (select distinct value from test) limit 1; and select group_concat(value) over (order by value desc rows between unbounded preceding and unbounded following) from (select distinct value from test) limit 1; which seems far more convoluted than just: select group_concat(value) from (select distinct value from test order by value); and select group_concat(value) from (select distinct value from test order by value desc); -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-----Original Message----- >From: sqlite-users <[hidden email]> On >Behalf Of Dominique Devienne >Sent: Monday, 2 March, 2020 08:02 >To: SQLite mailing list <[hidden email]> >Subject: Re: [sqlite] How to enforce a specific order of group_concat? > >On Sun, Mar 1, 2020 at 10:58 PM mailing lists <[hidden email]> >wrote: >> Are there any other solutions / possibilities? > >I thought someone more knowledgeable than I about Window Functions [1] >would answer, >but since nobody mentioned them so far, I'll do it, as I believe this >is the "SQL native" way >to achieve what you want (modulo DISTINCT perhaps). Notably (from the >doc): > >Every aggregate window function can also work as a ordinary aggregate >function, >simply by omitting the OVER and FILTER clauses. Furthermore, all of >the built-in aggregate >functions of SQLite can be used as an aggregate window function by >adding an appropriate OVER clause > >[2] has an example with group_concat() and OVER (ORDER BY ...). I >assume that's what you need, >someone better at Window Functions then me (not difficult!) can >confirm or not that. --DD > >[1] https://www.sqlite.org/windowfunctions.html >[2] https://www.sqlite.org/windowfunctions.html#aggwinfunc >_______________________________________________ >sqlite-users mailing list >[hidden email] >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
On Mon, Mar 2, 2020 at 5:09 PM Keith Medcalf <[hidden email]> wrote:
> select group_concat(value) from (select distinct value from test order by value); But is that guaranteed to be ordered correctly "forever" instead of by "happenstance" from current implementation details? My point was that the Window Function version is ordered "by design", and not an implementation detail (as I think the simpler version is). Your subquery returns rows in a given order too, but "who" says they'll be processed in that order? Tables are just "sets of rows" after all, and the relational model is about set-theory, no? order by in subquery therefore make little to no sense in nested SQL (in theory...). --DD _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
On Monday, 2 March, 2020 09:20, Dominique Devienne <[hidden email]> wrote: >On Mon, Mar 2, 2020 at 5:09 PM Keith Medcalf <[hidden email]> wrote: >> select group_concat(value) from (select distinct value from test order by value); >But is that guaranteed to be ordered correctly "forever" instead of by >"happenstance" from current implementation details? >My point was that the Window Function version is ordered "by design", >and not an implementation detail (as I think >the simpler version is). >Your subquery returns rows in a given order too, but "who" says >they'll be processed in that order? >Tables are just "sets of rows" after all, and the relational model is >about set-theory, no? order by in subquery therefore make little to no >sense in nested SQL (in theory...). --DD Well, in theory an order by in a nested select means that the result of the operation is an ordered projection and not merely a set of rows. For this particular case (a nested select with an order by and the outer query with an aggregate) the query will not be flattened (#16) https://sqlite.org/optoverview.html#flattening Consider that create table t (x,y); insert into t values (1,1),(2,2),(3,3),(4,1),(4,2),(4,3),(4,4); select x,y from (select x, y from t order by y) order by x; will do two order-by sorts to obtain the result even though the query could be (in this particular case) re-written as "select x, y from t order by x, y" because a query with an order-by in both the outer and nested query cannot be flattened (#11). This is why putting an "order by" in a view will usually preclude query flattening because the view is not merely producing a "set of rows" it is producing an "ordered projection" and the ordering must be significant else it would not be there. Of course in the case of the original statement: select group_concat(distinct value) from test order by value; The "order by value" might in fact be used to select the use of an index on value to scan, rather than the table test, if that results in less I/O that scanning the table. However, the Query Planner doesn't believe aggregate functions (including group_concat) results are non-commutative so determines that the "order by value" clause is superfluous (perhaps this is an error, since other functions such as sum(), total(), avg() can also be non-commutative in certain pathological cases and have varying results depending on the ordering of the data sent to them, especially avg() since it merely returns sum()/count() rather than a successive approximation to the mean, though successive approximation still has pathological cases for non-sorted input, they are fewer than the sum()/count() method). -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
On Mon, Mar 2, 2020 at 6:35 PM Keith Medcalf <[hidden email]> wrote:
> Well, in theory an order by in a nested select means that the result of the operation is an ordered projection and not merely a set of rows. > For this particular case (a nested select with an order by and the outer query with an aggregate) the query will not be flattened (#16) OK. I was more trying to find out whether such nested "ordered" projections were a standard-SQL thing or not. > select x,y from (select x, y from t order by y) order by x; > will do two order-by sorts to obtain the result even though the query could be (in this particular case) re-written as "select x, y from t order by x, y" That's assuming the sort is "stable" :) Stable-sort is typically slower than non-stable-sort, that's why the STL has std::sort and std::stable_sort. > This is why putting an "order by" in a view will usually preclude query flattening because the view is not merely producing a "set of rows" it is producing an "ordered projection" and the ordering must be significant else it would not be there. I would actually prefer these nested order-by to be ignored, and the "set of rows" being assumed, forcing the outer query to do its own ordering. The very notion of "ordered projection" for nested query sounds more like an implementation detail, to word-around the lack of window functions, than something "official" from the SQL standard or relational theory. I'm not disputing how SQLite implements things, for historical or practical reasons, I just want to understand whether such "ordered projection" is an official concept from SQL or just an SQLite thing. --DD _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Free forum by Nabble | Edit this page |