Ordering a GROUP BY, or other concatenating tricks?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Ordering a GROUP BY, or other concatenating tricks?

Doug Nebeker-2
Imagine a table that holds individual lines of text documents:

CREATE TABLE DocLines
(
        DocID INTEGER,
        LineIndex INTEGER,
        LineText TEXT
);

INSERT INTO DocLines (DocID, LineIndex, LineText) VALUES (1, 1, 'Mary had a little lamb');
INSERT INTO DocLines (DocID, LineIndex, LineText) VALUES (1, 2, 'It had white fleece');

//inserted in reverse order so insertion order doesn't happen to make the SELECT work

INSERT INTO DocLines (DocID, LineIndex, LineText) VALUES (2, 2, 'Humpty dumpty had a great fall');
INSERT INTO DocLines (DocID, LineIndex, LineText) VALUES (2, 1, 'Humpty dumpty sat on a wall');

How can I select a document and get the complete sorted text back in a single row (so I can do a JOIN on a different table with additional information)?  

The below happens to work fine for DocID 1 (because of insertion order), but fails for DocID 2.

SELECT group_concat(LineText, '\n')
FROM DocLines
WHERE DocID = 1
GROUP BY DocID;

Is there any way to order a GROUP BY, or some other way to concatenate text?


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

Simon Slavin-3


On 3 Oct 2017, at 11:13pm, Doug Nebeker <[hidden email]> wrote:

> How can I select a document and get the complete sorted text back in a single row (so I can do a JOIN on a different table with additional information)?  

There is a way which will probably work but the documentation adds a careful note that it will not always work.

<https://sqlite.org/lang_aggfunc.html#groupconcat>

So you would want something like

SELECT group_concat(LineText, '\n') FROM
        (SELECT LineText FROM DocLines
                WHERE DocID = 10
                ORDER BY LineIndex)

The problem is that the order of concatenation is arbitrary, in other words you shouldn’t rely on this working.

If you don’t want to use group_concat(), do it in your programming language.  Use

SELECT LineText FROM DocLines
                WHERE DocID = 10
                ORDER BY LineIndex

and concatenate the retrieved values in your programming language.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

Jean-Luc Hainaut
On 04/10/2017 02:16, Simon Slavin wrote:

The differences between SQLite and (a.o.) MySQL versions of
"group_concat" are a recurrent topic.
Since I often need to specify "distinct", "order by", "order direction"
and "separator", I have written a simple UDF class that simulates the
MySQL full version.
It is written in Python 2.7 through the standard SQLite3 interface but
it should be easy to translate it in C:

https://www.dropbox.com/s/ilpx8duppbus8u3/group_concat2.py?dl=0

Hoping it will help!

Jean-Luc Hainaut

> On 3 Oct 2017, at 11:13pm, Doug Nebeker <[hidden email]> wrote:
>
>> How can I select a document and get the complete sorted text back in a single row (so I can do a JOIN on a different table with additional information)?
> There is a way which will probably work but the documentation adds a careful note that it will not always work.
>
> <https://sqlite.org/lang_aggfunc.html#groupconcat>
>
> So you would want something like
>
> SELECT group_concat(LineText, '\n') FROM
> (SELECT LineText FROM DocLines
> WHERE DocID = 10
> ORDER BY LineIndex)
>
> The problem is that the order of concatenation is arbitrary, in other words you shouldn’t rely on this working.
>
> If you don’t want to use group_concat(), do it in your programming language.  Use
>
> SELECT LineText FROM DocLines
> WHERE DocID = 10
> ORDER BY LineIndex
>
> and concatenate the retrieved values in your programming language.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

Doug Nebeker-2
In reply to this post by Simon Slavin-3
Is it just a matter of using sqlite3_create_function to register a function that guarantees it will concatenate in the order rows are received?  Would that guarantee that your example works, or is order no longer guaranteed once they leave the inner select?

SELECT group_concat(LineText, '\n') FROM
        (SELECT LineText FROM DocLines
                WHERE DocID = 10
                ORDER BY LineIndex)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

Simon Slavin-3


On 4 Oct 2017, at 4:06pm, Doug Nebeker <[hidden email]> wrote:

> Is it just a matter of using sqlite3_create_function to register a function that guarantees it will concatenate in the order rows are received?  Would that guarantee that your example works, or is order no longer guaranteed once they leave the inner select?
>
> SELECT group_concat(LineText, '\n') FROM
> (SELECT LineText FROM DocLines
> WHERE DocID = 10
> ORDER BY LineIndex)

I’ve never tried it but I can’t think why it wouldn’t work.  Yes, you could register a function written in your favourite shimmed language.  All this really does is move the "concat_in_order" function from your own code into SQLite.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

Richard Hipp-3
In reply to this post by Doug Nebeker-2
On 10/4/17, Doug Nebeker <[hidden email]> wrote:
> Is it just a matter of using sqlite3_create_function to register a function
> that guarantees it will concatenate in the order rows are received?  Would
> that guarantee that your example works, or is order no longer guaranteed
> once they leave the inner select?
>
> SELECT group_concat(LineText, '\n') FROM
> (SELECT LineText FROM DocLines
> WHERE DocID = 10
> ORDER BY LineIndex)

The group_concat() function has always concatenated rows in the order
they are received.  The problem is that the order they are received by
the function is not necessarily obvious from the input SQL, because
SQLite is prone doing some serious reorganizations of the input SQL in
its quest to come up with the fastest execution plan.

But, as it happens, we long ago added constraint 16 to the query
flattener (https://www.sqlite.org/draft/optoverview.html#flattening)
to prevent the flattener from running on queries like the one you show
above.  This restriction on the query flattener causes your example
query above to do what you want.

SQLite version 3.21.0 adds new restrictions on the query flattener
which allows the application to control whether expensive functions
(or subqueries) are run before or after sorting.  See
https://www.sqlite.org/draft/optoverview.html#deferred_work for
further information.  These new flattener restrictions, together with
the increased preference for using co-routines, are found in the
latest "Pre-release Snapshot".  Please try them out if you are able
to.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

sub sk79
On Wed, Oct 4, 2017 at 12:29 PM, Richard Hipp <[hidden email]> wrote:

>
> This restriction on the query flattener causes your example
> query above to do what you want.
>

If subquery-flattening needs to be disabled explicitly, is using "LIMIT -1
OFFSET 0 " the recommended way?



> SQLite version 3.21.0 adds new restrictions on the query flattener
> which allows the application to control whether expensive functions
> (or subqueries) are run before or after sorting.
>

Is 'expensive' going to be a flag in fourth parameter to create_function
like SQLITE_DETERMINISTIC is?

Thanks,
SK
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

Richard Hipp-3
On 10/4/17, sub sk79 <[hidden email]> wrote:

> On Wed, Oct 4, 2017 at 12:29 PM, Richard Hipp <[hidden email]> wrote:
>
>>
>> This restriction on the query flattener causes your example
>> query above to do what you want.
>>
>
> If subquery-flattening needs to be disabled explicitly, is using "LIMIT -1
> OFFSET 0 " the recommended way?
>

Just LIMIT -1 is sufficient.

>
>
>> SQLite version 3.21.0 adds new restrictions on the query flattener
>> which allows the application to control whether expensive functions
>> (or subqueries) are run before or after sorting.
>>
>
> Is 'expensive' going to be a flag in fourth parameter to create_function
> like SQLITE_DETERMINISTIC is?
>

No.  "expensive_function()" is just a place-holder.  It might be a
large expression, involving one or more subquery expressions, or it
might be something that uses a lot of memory rather than a lot of CPU
cycles.  That is part of the problem of why the query planner has such
a hard time determining whether or not to make this optimization
automatically - it is difficult to pin down what the application is
likely to think is "expensive".
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Ordering a GROUP BY, or other concatenating tricks?

petern
In reply to this post by Jean-Luc Hainaut
You could also write it even more clearly as:

WITH IndexedLines AS (SELECT LineText FROM DocLines WHERE DocID = 10 ORDER
BY LineIndex)
SELECT group_concat(LineText, char(10)) FROM IndexedLines;

That code will actually work.  As it is not C, SQLite will not recognize
the '\n' C escaped line feed in Simon's example. Special characters, like
char(10), must be generated by call out to an extension or values from a
table.

As was explained many times, despite malicious insistence on the
preservation of obtuse comments in the group_concat() docs, the output of
group_concat() IS directly controlled by the order of the supplied rowset
and one is free to specify that order directly.

Group_concat() itself is not idiosyncratically jumbling the output order.
If the optimizer's discretionary reordering is to be communicated in the
documents of built in functions then, for completeness, every aggregate
function should have the same absurd disclaimer of inexorable randomness.

Consider the aggregate round off error of the built in avg() function.  By
the same illogic isn't the roundoff error of the avg() output both
intractable and implacable because the optimizer might change the order of
the rows of an unordered query?  Yet, it is completely true that one is
free to specify a round off error minimizing order if one chooses to do so.

OK.  So everybody in favor of flawed documentation, raise their hand.  The
avg() function should also have the same obtuse remark suggesting, with a
wink and nod, that that using avg() comes with an inexorable and implacable
input ordering that one cannot control.





On Wed, Oct 4, 2017 at 2:12 AM, Jean-Luc Hainaut <[hidden email]
> wrote:

> On 04/10/2017 02:16, Simon Slavin wrote:
>
> The differences between SQLite and (a.o.) MySQL versions of "group_concat"
> are a recurrent topic.
> Since I often need to specify "distinct", "order by", "order direction"
> and "separator", I have written a simple UDF class that simulates the MySQL
> full version.
> It is written in Python 2.7 through the standard SQLite3 interface but it
> should be easy to translate it in C:
>
> https://www.dropbox.com/s/ilpx8duppbus8u3/group_concat2.py?dl=0
>
> Hoping it will help!
>
> Jean-Luc Hainaut
>
> On 3 Oct 2017, at 11:13pm, Doug Nebeker <[hidden email]> wrote:
>>
>> How can I select a document and get the complete sorted text back in a
>>> single row (so I can do a JOIN on a different table with additional
>>> information)?
>>>
>> There is a way which will probably work but the documentation adds a
>> careful note that it will not always work.
>>
>> <https://sqlite.org/lang_aggfunc.html#groupconcat>
>>
>> So you would want something like
>>
>> SELECT group_concat(LineText, '\n') FROM
>>         (SELECT LineText FROM DocLines
>>                 WHERE DocID = 10
>>                 ORDER BY LineIndex)
>>
>> The problem is that the order of concatenation is arbitrary, in other
>> words you shouldn’t rely on this working.
>>
>> If you don’t want to use group_concat(), do it in your programming
>> language.  Use
>>
>> SELECT LineText FROM DocLines
>>                 WHERE DocID = 10
>>                 ORDER BY LineIndex
>>
>> and concatenate the retrieved values in your programming language.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users