Do you really need an ORDER BY after a GROUP BY

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Do you really need an ORDER BY after a GROUP BY

Cecil Westerhof-5
I always use an ORDER BY after a GROUP BY. For example:
SELECT   used
,        COUNT(used)
FROM     usedProverbs
GROUP BY used
ORDER BY used

But when I leave the ORDER BY out in this case, the result is the same, but
it looks like it is a bit faster.

I probably keep using it, because in my case the performance is not
important and if this works for SQLite3, but not for other databases, it is
better to keep using it.

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

Clemens Ladisch
Cecil Westerhof wrote:
> I always use an ORDER BY after a GROUP BY.

Without an ORDER BY, there is no guarantee that the result has any
specific order (in SQLite and in any other SQL database).

> For example:
> SELECT   used
> ,        COUNT(used)
> FROM     usedProverbs
> GROUP BY used
> ORDER BY used
>
> But when I leave the ORDER BY out in this case, the result is the same, but
> it looks like it is a bit faster.

Are you sure?  How does the output of EXPLAIN QUERY PLAN look like?


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

Dave Wellman
In reply to this post by Cecil Westerhof-5
I always work on the premise that "if I want data returned to my application in a specific order then code an ORDER BY clause"

If the dbms 'happens to' return data in the desired order then that is obviously good, but if that behaviour is not documented (and I don't know about SQLite) then without the ORDER BY you have to recognise that this behaviour may change.

I typically work with a different dbms and the result set (almost gauranteed) will not be returned in 'group by' order.

Cheers,
Dave

Ward Analytics Ltd - information in motion
Tel: +44 (0) 118 9740191
Fax: +44 (0) 118 9740192
www: http://www.ward-analytics.com

Registered office address: The Oriel, Sydenham Road, Guildford, Surrey, United Kingdom, GU1 3SR
Registered company number: 3917021 Registered in England and Wales.

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Cecil Westerhof
Sent: 24 August 2017 10:13
To: SQLite mailing list
Subject: [sqlite] Do you really need an ORDER BY after a GROUP BY

I always use an ORDER BY after a GROUP BY. For example:
SELECT   used
,        COUNT(used)
FROM     usedProverbs
GROUP BY used
ORDER BY used

But when I leave the ORDER BY out in this case, the result is the same, but
it looks like it is a bit faster.

I probably keep using it, because in my case the performance is not
important and if this works for SQLite3, but not for other databases, it is
better to keep using it.

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

Cecil Westerhof-5
In reply to this post by Clemens Ladisch
2017-08-24 11:29 GMT+02:00 Clemens Ladisch <[hidden email]>:

> Cecil Westerhof wrote:
> > I always use an ORDER BY after a GROUP BY.
>
> Without an ORDER BY, there is no guarantee that the result has any
> specific order (in SQLite and in any other SQL database).
>

​That is what I thought, but you never know. So I just wanted to check.




> > For example:
> > SELECT   used
> > ,        COUNT(used)
> > FROM     usedProverbs
> > GROUP BY used
> > ORDER BY used
> >
> > But when I leave the ORDER BY out in this case, the result is the same,
> but
> > it looks like it is a bit faster.
>
> Are you sure?  How does the output of EXPLAIN QUERY PLAN look like?
>

​No, I am not sure. I executed both a few times in the database browser.
Both have widely varying used times. But at first glance it looked that
without ORDER BY is about 2 times faster. But it is only at first glance,
so certainly not something concrete.​

​The explains. I do not really see a difference. There is only an offset
difference. I also do not see a sort. (But this is the first time I look at
this kind of code.)​


​EXPLAIN SELECT used, COUNT(used) FROM usedProverbs GROUP BY used ORDER BY
used;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     39    0                    00  Start at 39
1     Noop           2     4     0                    00
2     Noop           3     1     0                    00
3     Integer        0     4     0                    00  r[4]=0; clear
abort flag
4     Integer        0     3     0                    00  r[3]=0; indicate
accumulator empty
5     Null           0     7     7                    00  r[7..7]=NULL
6     Gosub          6     36    0                    00
7     OpenRead       1     290   0     3              00  root=290 iDb=0;
proverbs
8     OpenRead       4     6     0     k(2,,)         00  root=6 iDb=0;
proverbs_used_idx
9     Rewind         4     26    9     0              00
10      Seek           4     0     1                    00  Move 1 to
4.rowid
11      Column         4     0     9     notUsed        00
r[9]=proverbs.used
12      Cast           9     68    0                    00  affinity(r[9])
13      Eq             10    25    9     (BINARY)       54  if r[9]==r[10]
goto 25
14      Column         4     0     8     notUsed        00
r[8]=proverbs.used
15      Compare        7     8     1     k(1,B)         00  r[7] <-> r[8]
16      Jump           17    21    17                   00
17      Move           8     7     1                    00  r[7]=r[8]
18      Gosub          5     30    0                    00  output one row
19      IfPos          4     38    0                    00  if r[4]>0 then
r[4]-=0, goto 38; check abort flag
20      Gosub          6     36    0                    00  reset
accumulator
21      Column         4     0     9     notUsed        00
r[9]=proverbs.used
22      AggStep0       0     9     2     count(1)       01  accum=r[2]
step(r[9])
23      Column         4     0     1     notUsed        00
r[1]=proverbs.used
24      Integer        1     3     0                    00  r[3]=1;
indicate data in accumulator
25    Next           4     10    0                    01
26    Gosub          5     30    0                    00  output final row
27    Goto           0     38    0                    00
28    Integer        1     4     0                    00  r[4]=1; set abort
flag
29    Return         5     0     0                    00
30    IfPos          3     32    0                    00  if r[3]>0 then
r[3]-=0, goto 32; Groupby result generator entry point
31    Return         5     0     0                    00
32    AggFinal       2     1     0     count(1)       00  accum=r[2] N=1
33    Copy           1     11    1                    00  r[11..12]=r[1..2]
34    ResultRow      11    2     0                    00  output=r[11..12]
35    Return         5     0     0                    00  end groupby
result generator
36    Null           0     1     2                    00  r[1..2]=NULL
37    Return         6     0     0                    00
38    Halt           0     0     0                    00
39    Transaction    0     0     65    0              01  usesStmtJournal=0
40    Integer        0     10    0                    00  r[10]=0
41    Goto           0     1     0                    00

EXPLAIN SELECT used, COUNT(used) FROM usedProverbs GROUP BY used;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     38    0                    00  Start at 38
1     Noop           2     1     0                    00
2     Integer        0     4     0                    00  r[4]=0; clear
abort flag
3     Integer        0     3     0                    00  r[3]=0; indicate
accumulator empty
4     Null           0     7     7                    00  r[7..7]=NULL
5     Gosub          6     35    0                    00
6     OpenRead       1     290   0     3              00  root=290 iDb=0;
proverbs
7     OpenRead       3     6     0     k(2,,)         00  root=6 iDb=0;
proverbs_used_idx
8     Rewind         3     25    9     0              00
9       Seek           3     0     1                    00  Move 1 to
3.rowid
10      Column         3     0     9     notUsed        00
r[9]=proverbs.used
11      Cast           9     68    0                    00  affinity(r[9])
12      Eq             10    24    9     (BINARY)       54  if r[9]==r[10]
goto 24
13      Column         3     0     8     notUsed        00
r[8]=proverbs.used
14      Compare        7     8     1     k(1,B)         00  r[7] <-> r[8]
15      Jump           16    20    16                   00
16      Move           8     7     1                    00  r[7]=r[8]
17      Gosub          5     29    0                    00  output one row
18      IfPos          4     37    0                    00  if r[4]>0 then
r[4]-=0, goto 37; check abort flag
19      Gosub          6     35    0                    00  reset
accumulator
20      Column         3     0     9     notUsed        00
r[9]=proverbs.used
21      AggStep0       0     9     2     count(1)       01  accum=r[2]
step(r[9])
22      Column         3     0     1     notUsed        00
r[1]=proverbs.used
23      Integer        1     3     0                    00  r[3]=1;
indicate data in accumulator
24    Next           3     9     0                    01
25    Gosub          5     29    0                    00  output final row
26    Goto           0     37    0                    00
27    Integer        1     4     0                    00  r[4]=1; set abort
flag
28    Return         5     0     0                    00
29    IfPos          3     31    0                    00  if r[3]>0 then
r[3]-=0, goto 31; Groupby result generator entry point
30    Return         5     0     0                    00
31    AggFinal       2     1     0     count(1)       00  accum=r[2] N=1
32    Copy           1     11    1                    00  r[11..12]=r[1..2]
33    ResultRow      11    2     0                    00  output=r[11..12]
34    Return         5     0     0                    00  end groupby
result generator
35    Null           0     1     2                    00  r[1..2]=NULL
36    Return         6     0     0                    00
37    Halt           0     0     0                    00
38    Transaction    0     0     65    0              01  usesStmtJournal=0
39    Integer        0     10    0                    00  r[10]=0
40    Goto           0     1     0                    00

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

Clemens Ladisch
Cecil Westerhof wrote:

> 2017-08-24 11:29 GMT+02:00 Clemens Ladisch <[hidden email]>:
>> Cecil Westerhof wrote:
>>> But when I leave the ORDER BY out in this case, the result is the same, but
>>> it looks like it is a bit faster.
>>
>> Are you sure?  How does the output of EXPLAIN QUERY PLAN look like?
>
> ​No, I am not sure. I executed both a few times in the database browser.
> Both have widely varying used times. But at first glance it looked that
> without ORDER BY is about 2 times faster. But it is only at first glance,
> so certainly not something concrete.​
>
> ​The explains.

But I asked for EXPLAIN QUERY PLAN instead.  :)

> I do not really see a difference.

The only difference is a Noop.

> I also do not see a sort.

Because it reads everything from an index that already is in the correct
order.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

J Decker
I like this feature :)

https://sqlite.org/pragma.html#pragma_reverse_unordered_selects

On Thu, Aug 24, 2017 at 5:38 AM, Clemens Ladisch <[hidden email]> wrote:

> Cecil Westerhof wrote:
> > 2017-08-24 11:29 GMT+02:00 Clemens Ladisch <[hidden email]>:
> >> Cecil Westerhof wrote:
> >>> But when I leave the ORDER BY out in this case, the result is the
> same, but
> >>> it looks like it is a bit faster.
> >>
> >> Are you sure?  How does the output of EXPLAIN QUERY PLAN look like?
> >
> > ​No, I am not sure. I executed both a few times in the database browser.
> > Both have widely varying used times. But at first glance it looked that
> > without ORDER BY is about 2 times faster. But it is only at first glance,
> > so certainly not something concrete.​
> >
> > ​The explains.
>
> But I asked for EXPLAIN QUERY PLAN instead.  :)
>
> > I do not really see a difference.
>
> The only difference is a Noop.
>
> > I also do not see a sort.
>
> Because it reads everything from an index that already is in the correct
> order.
>
>
> Regards,
> Clemens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

Cecil Westerhof-5
In reply to this post by Clemens Ladisch
2017-08-24 14:38 GMT+02:00 Clemens Ladisch <[hidden email]>:

> Cecil Westerhof wrote:
> > 2017-08-24 11:29 GMT+02:00 Clemens Ladisch <[hidden email]>:
> >> Cecil Westerhof wrote:
> >>> But when I leave the ORDER BY out in this case, the result is the
> same, but
> >>> it looks like it is a bit faster.
> >>
> >> Are you sure?  How does the output of EXPLAIN QUERY PLAN look like?
> >
> > ​No, I am not sure. I executed both a few times in the database browser.
> > Both have widely varying used times. But at first glance it looked that
> > without ORDER BY is about 2 times faster. But it is only at first glance,
> > so certainly not something concrete.​
> >
> > ​The explains.
>
> But I asked for EXPLAIN QUERY PLAN instead.  :)
>

​Oops, mea culpa.

EXPLAIN QUERY PLAN SELECT used, COUNT(used) FROM usedProverbs GROUP BY used
ORDER BY used;
0|0|0|SCAN TABLE proverbs USING INDEX proverbs_used_idx

EXPLAIN QUERY PLAN SELECT used, COUNT(used) FROM usedProverbs GROUP BY used;
0|0|0|SCAN TABLE proverbs USING INDEX proverbs_used_idx




> > I do not really see a difference.
>
> The only difference is a Noop.
>

​Not important I think, but I like to know ‘everything’: why is there an
extra Noop.




> > I also do not see a sort.
>
> Because it reads everything from an index that already is in the correct
> order.
>

​Yes, and the EXPLAIN QUERY PLAN shows that.


But that leads to my next question:
With a scan the whole table is used. In the first situation I can
understand that the index is used, because that saves the sort, which can
be significant. But why is it used in the second case?
To be clear: I do not want to be smart, I want to understand.

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

R Smith

On 2017/08/24 4:48 PM, Cecil Westerhof wrote:

>
> ​Yes, and the EXPLAIN QUERY PLAN shows that.
>
>
> But that leads to my next question:
> With a scan the whole table is used. In the first situation I can
> understand that the index is used, because that saves the sort, which can
> be significant. But why is it used in the second case?
> To be clear: I do not want to be smart, I want to understand.
>

That's because the GROUP BY functions by grouping things together, and
to look-up what group the next item should fall in with, the Query
planner needs to look up that group's index in the grouped list, and for
THAT it needs an Index.  It was actually added as an optimization
recently (I think) that if your order-by happens to use the same column
structure as the group-by, then that same Index that was conjured up for
grouping is also used for the ordering, so it saves a step.

If your order-by differs however, it will need to conjure up a whole new
sorting-index mechanism just for that.

Hope that clears it up,
Cheers!
Ryan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Do you really need an ORDER BY after a GROUP BY

Clemens Ladisch
In reply to this post by Cecil Westerhof-5
Cecil Westerhof wrote:
> EXPLAIN QUERY PLAN SELECT used, COUNT(used) FROM usedProverbs GROUP BY used ORDER BY used;
> 0|0|0|SCAN TABLE proverbs USING INDEX proverbs_used_idx
>
> EXPLAIN QUERY PLAN SELECT used, COUNT(used) FROM usedProverbs GROUP BY used;
> 0|0|0|SCAN TABLE proverbs USING INDEX proverbs_used_idx

> why is there an extra Noop.

When the query planner sees the ORDER BY, it prepares a temporary index
to do the sorting.  When it can later prove that the sorting is not
actually needed, it deactivates that instruction by overwriting it with
the Noop.

> With a scan the whole table is used. In the first situation I can
> understand that the index is used, because that saves the sort, which can
> be significant. But why is it used in the second case?

The GROUP BY must compute a sum for each distinct "used" value.  When
it reads those values in order, it can compute a single sum at a time
and does not need to keep temporary results.  So it prefers to read from
an index, too.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users