Quantcast

Indexing WHERE with GROUP BY and ORDER BY

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Indexing WHERE with GROUP BY and ORDER BY

metanym
Given a table:

CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);

the query:

SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;

shows the following plan, without indexes:

0|0|0|SCAN TABLE x
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

I can create an index to cover the WHERE clause:

CREATE INDEX b_index ON x (b);

which gives the plan:

0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

or I can create an index to cover the GROUP BY clause:

DROP INDEX b_index;
CREATE INDEX c_index ON x (c);

which gives the plan:

0|0|0|SCAN TABLE x USING INDEX c_index
0|0|0|USE TEMP B-TREE FOR ORDER BY

but I don't seem to be able to create a combined index to cover both
the WHERE and GROUP BY clauses (let alone the ORDER BY clause).

Am I missing something, or is this just not possible?

If it's not possible, which is the more efficient of those indexes  --
or is there a third way, using an index for the ORDER BY, which would
be more efficient still?

Thanks,
Hamish
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

Jeffrey Mattox
I had nearly the same question a month ago (Subject: Index usefulness for GROUP BY).  In my case, the best index was on the WHERE clause because it eliminated the scan and returned only the few important rows for the other clauses.  However, the best result will depend on how many rows are eliminated compared to the number of rows remaining for the GROUP BY to deal with.

Jeff


> On Apr 2, 2017, at 6:27 PM, Hamish Allan <[hidden email]> wrote:
>
> Given a table:
>
> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>
> the query:
>
> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>
> shows the following plan, without indexes:
>
> 0|0|0|SCAN TABLE x
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
> I can create an index to cover the WHERE clause:
>
> CREATE INDEX b_index ON x (b);
>
> which gives the plan:
>
> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
> or I can create an index to cover the GROUP BY clause:
>
> DROP INDEX b_index;
> CREATE INDEX c_index ON x (c);
>
> which gives the plan:
>
> 0|0|0|SCAN TABLE x USING INDEX c_index
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
> but I don't seem to be able to create a combined index to cover both
> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>
> Am I missing something, or is this just not possible?
>
> If it's not possible, which is the more efficient of those indexes  --
> or is there a third way, using an index for the ORDER BY, which would
> be more efficient still?
>
> Thanks,
> Hamish
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

Keith Medcalf
In reply to this post by metanym

What is the purpose of the ORDER BY?  

The value of a used for the order by is from some random row in the grouping of c.  Are there relationships between a, b, c, d that you have not documented nor told us about?

In any case, your most efficient index is on (b, c).  

The order by is useless.

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Hamish Allan
> Sent: Sunday, 2 April, 2017 17:28
> To: [hidden email]
> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>
> Given a table:
>
> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>
> the query:
>
> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>
> shows the following plan, without indexes:
>
> 0|0|0|SCAN TABLE x
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
> I can create an index to cover the WHERE clause:
>
> CREATE INDEX b_index ON x (b);
>
> which gives the plan:
>
> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
> or I can create an index to cover the GROUP BY clause:
>
> DROP INDEX b_index;
> CREATE INDEX c_index ON x (c);
>
> which gives the plan:
>
> 0|0|0|SCAN TABLE x USING INDEX c_index
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
> but I don't seem to be able to create a combined index to cover both
> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>
> Am I missing something, or is this just not possible?
>
> If it's not possible, which is the more efficient of those indexes  --
> or is there a third way, using an index for the ORDER BY, which would
> be more efficient still?
>
> Thanks,
> Hamish
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

Simon Slavin-3
In reply to this post by metanym

On 3 Apr 2017, at 12:27am, Hamish Allan <[hidden email]> wrote:

> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;

Your problem comes down to this:

If you are GROUPing BY c, why do you want ORDER BY a ?

If you remove the "ORDER BY a" clause then the ideal index would be on (b, c).  But to deal with the ORDER BY we need to know whether "a" is a subgroup of "c".

My guess is that (b, c, a) would be the best index.  But it depends on whether a is a subgroup of b or c.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

metanym
In reply to this post by Keith Medcalf
Ah. My purpose is to determine "d for the most recent c with b=1",
with "most recent" being "largest a".

My query had been working for this, but I'd only been adding
monotonically increasing values for "a" in my tests. I just tried
doing otherwise and found that I had been relying on an implementation
detail.

Is there any way I can perform this collation at query time, or do I
need to do it programmatically later on?

Thanks,
Hamish





On 3 April 2017 at 00:41, Keith Medcalf <[hidden email]> wrote:

>
> What is the purpose of the ORDER BY?
>
> The value of a used for the order by is from some random row in the grouping of c.  Are there relationships between a, b, c, d that you have not documented nor told us about?
>
> In any case, your most efficient index is on (b, c).
>
> The order by is useless.
>
>> -----Original Message-----
>> From: sqlite-users [mailto:[hidden email]]
>> On Behalf Of Hamish Allan
>> Sent: Sunday, 2 April, 2017 17:28
>> To: [hidden email]
>> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>
>> Given a table:
>>
>> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>>
>> the query:
>>
>> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>>
>> shows the following plan, without indexes:
>>
>> 0|0|0|SCAN TABLE x
>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>
>> I can create an index to cover the WHERE clause:
>>
>> CREATE INDEX b_index ON x (b);
>>
>> which gives the plan:
>>
>> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>
>> or I can create an index to cover the GROUP BY clause:
>>
>> DROP INDEX b_index;
>> CREATE INDEX c_index ON x (c);
>>
>> which gives the plan:
>>
>> 0|0|0|SCAN TABLE x USING INDEX c_index
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>
>> but I don't seem to be able to create a combined index to cover both
>> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>>
>> Am I missing something, or is this just not possible?
>>
>> If it's not possible, which is the more efficient of those indexes  --
>> or is there a third way, using an index for the ORDER BY, which would
>> be more efficient still?
>>
>> Thanks,
>> Hamish
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

R Smith


On 2017/04/03 10:51 AM, Hamish Allan wrote:

> Ah. My purpose is to determine "d for the most recent c with b=1",
> with "most recent" being "largest a".
>
> My query had been working for this, but I'd only been adding
> monotonically increasing values for "a" in my tests. I just tried
> doing otherwise and found that I had been relying on an implementation
> detail.
>
> Is there any way I can perform this collation at query time, or do I
> need to do it programmatically later on?

WITH CLT(d,c,a) AS (
   SELECT d,c,a FROM x WHERE b = 1 GROUP BY c,a
), CMX(c,mxa) AS (
   SELECT c, MAX(a) FROM CLT GROUP BY c
)
SELECT d
   FROM CLT
   JOIN CMX ON CMX.c = CLT.c AND CMX.mxa = CLT.a
;

Zero Order-by's



>
> Thanks,
> Hamish
>
>
>
>
>
> On 3 April 2017 at 00:41, Keith Medcalf <[hidden email]> wrote:
>> What is the purpose of the ORDER BY?
>>
>> The value of a used for the order by is from some random row in the grouping of c.  Are there relationships between a, b, c, d that you have not documented nor told us about?
>>
>> In any case, your most efficient index is on (b, c).
>>
>> The order by is useless.
>>
>>> -----Original Message-----
>>> From: sqlite-users [mailto:[hidden email]]
>>> On Behalf Of Hamish Allan
>>> Sent: Sunday, 2 April, 2017 17:28
>>> To: [hidden email]
>>> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>>
>>> Given a table:
>>>
>>> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>>>
>>> the query:
>>>
>>> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>>>
>>> shows the following plan, without indexes:
>>>
>>> 0|0|0|SCAN TABLE x
>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>
>>> I can create an index to cover the WHERE clause:
>>>
>>> CREATE INDEX b_index ON x (b);
>>>
>>> which gives the plan:
>>>
>>> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>
>>> or I can create an index to cover the GROUP BY clause:
>>>
>>> DROP INDEX b_index;
>>> CREATE INDEX c_index ON x (c);
>>>
>>> which gives the plan:
>>>
>>> 0|0|0|SCAN TABLE x USING INDEX c_index
>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>
>>> but I don't seem to be able to create a combined index to cover both
>>> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>>>
>>> Am I missing something, or is this just not possible?
>>>
>>> If it's not possible, which is the more efficient of those indexes  --
>>> or is there a third way, using an index for the ORDER BY, which would
>>> be more efficient still?
>>>
>>> Thanks,
>>> Hamish
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

R Smith
Another option, if a is simple and a suitable index exists, is to simply
get the max in a correlated subquery, so this should work faster even:

SELECT c,d FROM x AS x1 WHERE b=1 AND a=(SELECT MAX(x2.a) FROM x AS x2
WHERE x2.b=x1.b AND x2.c=x1.c)

This needs no grouping because the sub-query ensures it, unless a can
have duplicate values for any one c value.



On 2017/04/03 11:09 AM, R Smith wrote:

>
>
> On 2017/04/03 10:51 AM, Hamish Allan wrote:
>> Ah. My purpose is to determine "d for the most recent c with b=1",
>> with "most recent" being "largest a".
>>
>> My query had been working for this, but I'd only been adding
>> monotonically increasing values for "a" in my tests. I just tried
>> doing otherwise and found that I had been relying on an implementation
>> detail.
>>
>> Is there any way I can perform this collation at query time, or do I
>> need to do it programmatically later on?
>
> WITH CLT(d,c,a) AS (
>   SELECT d,c,a FROM x WHERE b = 1 GROUP BY c,a
> ), CMX(c,mxa) AS (
>   SELECT c, MAX(a) FROM CLT GROUP BY c
> )
> SELECT d
>   FROM CLT
>   JOIN CMX ON CMX.c = CLT.c AND CMX.mxa = CLT.a
> ;
>
> Zero Order-by's
>
>
>
>>
>> Thanks,
>> Hamish
>>
>>
>>
>>
>>
>> On 3 April 2017 at 00:41, Keith Medcalf <[hidden email]> wrote:
>>> What is the purpose of the ORDER BY?
>>>
>>> The value of a used for the order by is from some random row in the
>>> grouping of c.  Are there relationships between a, b, c, d that you
>>> have not documented nor told us about?
>>>
>>> In any case, your most efficient index is on (b, c).
>>>
>>> The order by is useless.
>>>
>>>> -----Original Message-----
>>>> From: sqlite-users
>>>> [mailto:[hidden email]]
>>>> On Behalf Of Hamish Allan
>>>> Sent: Sunday, 2 April, 2017 17:28
>>>> To: [hidden email]
>>>> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>>>
>>>> Given a table:
>>>>
>>>> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>>>>
>>>> the query:
>>>>
>>>> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>>>>
>>>> shows the following plan, without indexes:
>>>>
>>>> 0|0|0|SCAN TABLE x
>>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>
>>>> I can create an index to cover the WHERE clause:
>>>>
>>>> CREATE INDEX b_index ON x (b);
>>>>
>>>> which gives the plan:
>>>>
>>>> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>
>>>> or I can create an index to cover the GROUP BY clause:
>>>>
>>>> DROP INDEX b_index;
>>>> CREATE INDEX c_index ON x (c);
>>>>
>>>> which gives the plan:
>>>>
>>>> 0|0|0|SCAN TABLE x USING INDEX c_index
>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>
>>>> but I don't seem to be able to create a combined index to cover both
>>>> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>>>>
>>>> Am I missing something, or is this just not possible?
>>>>
>>>> If it's not possible, which is the more efficient of those indexes  --
>>>> or is there a third way, using an index for the ORDER BY, which would
>>>> be more efficient still?
>>>>
>>>> Thanks,
>>>> Hamish
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> [hidden email]
>>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

Hick Gunter
In reply to this post by metanym
I am not sure I correctly understand what you want.

This is the value of d associated with a randomly chosen record from the group of records having the highest value of a tha also fulfills  b == 1;

SELECT MAX(a),d FROM x WHERE b=1;

If you want the value of d within each group of records sharing a value of c, try

SELECT MAX(a),c,d FROM x WHERE b=1 GROUP BY c ORDER BY 1 DESC, 2 ASC;

See also http://sqlite.org/lang_select.html

"Side note: Bare columns in an aggregate queries. The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:

SELECT a, b, sum(c) FROM tab1 GROUP BY a;


In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.

Special processing occurs occurs when the aggregate function is either min() or max(). Example:

SELECT a, b, max(c) FROM tab1 GROUP BY a;

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way."

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Hamish Allan
Gesendet: Montag, 03. April 2017 10:51
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

Ah. My purpose is to determine "d for the most recent c with b=1", with "most recent" being "largest a".

My query had been working for this, but I'd only been adding monotonically increasing values for "a" in my tests. I just tried doing otherwise and found that I had been relying on an implementation detail.

Is there any way I can perform this collation at query time, or do I need to do it programmatically later on?

Thanks,
Hamish



___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

Keith Medcalf
In reply to this post by metanym
  select max(a), c, d
    from x
   where b == 1
group by c
order by 1 desc;

index on (b, c, a)

So, you are grouping by c where b=1, and within each grouping returning the grouping (c), the maximum value of (a) for that grouping, and (d) from one of perhaps multiple rows that contain the max(a) within that group.

Or perhaps:

select max(a), c, d
  from x
 where b == 1;

Which finds the max(a) where b==1, and returns that value and the value of c and d from one of the rows containing that maximum (1 row returned).

index on (b, a)

The former returns all groups of c with the top one being the one row returned by the latter.

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Hamish Allan
> Sent: Monday, 3 April, 2017 02:51
> To: SQLite mailing list
> Subject: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>
> Ah. My purpose is to determine "d for the most recent c with b=1",
> with "most recent" being "largest a".
>
> My query had been working for this, but I'd only been adding
> monotonically increasing values for "a" in my tests. I just tried
> doing otherwise and found that I had been relying on an implementation
> detail.
>
> Is there any way I can perform this collation at query time, or do I
> need to do it programmatically later on?
>
> Thanks,
> Hamish
>
>
>
>
>
> On 3 April 2017 at 00:41, Keith Medcalf <[hidden email]> wrote:
> >
> > What is the purpose of the ORDER BY?
> >
> > The value of a used for the order by is from some random row in the
> grouping of c.  Are there relationships between a, b, c, d that you have
> not documented nor told us about?
> >
> > In any case, your most efficient index is on (b, c).
> >
> > The order by is useless.
> >
> >> -----Original Message-----
> >> From: sqlite-users [mailto:sqlite-users-
> [hidden email]]
> >> On Behalf Of Hamish Allan
> >> Sent: Sunday, 2 April, 2017 17:28
> >> To: [hidden email]
> >> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
> >>
> >> Given a table:
> >>
> >> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
> >>
> >> the query:
> >>
> >> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
> >>
> >> shows the following plan, without indexes:
> >>
> >> 0|0|0|SCAN TABLE x
> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> >>
> >> I can create an index to cover the WHERE clause:
> >>
> >> CREATE INDEX b_index ON x (b);
> >>
> >> which gives the plan:
> >>
> >> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> >>
> >> or I can create an index to cover the GROUP BY clause:
> >>
> >> DROP INDEX b_index;
> >> CREATE INDEX c_index ON x (c);
> >>
> >> which gives the plan:
> >>
> >> 0|0|0|SCAN TABLE x USING INDEX c_index
> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> >>
> >> but I don't seem to be able to create a combined index to cover both
> >> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
> >>
> >> Am I missing something, or is this just not possible?
> >>
> >> If it's not possible, which is the more efficient of those indexes  --
> >> or is there a third way, using an index for the ORDER BY, which would
> >> be more efficient still?
> >>
> >> Thanks,
> >> Hamish
> >> _______________________________________________
> >> sqlite-users mailing list
> >> [hidden email]
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> >
> >
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

metanym
In reply to this post by R Smith
This second one also has the advantage that I understand what's being
expressed! 🙂

This works perfectly, thank you! (and thank you to all others that replied).

I'm not sure I understand the indexing, though. The query plan without
indexes is:

0|0|0|SCAN TABLE x AS x1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE x AS x2

If I subsequently CREATE INDEX b_c_index ON x (b, c):

0|0|0|SEARCH TABLE x AS x1 USING INDEX b_c_index (b=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE x AS x2 USING INDEX b_c_index (b=? AND c=?)

So far so good. But if I additionally CREATE INDEX b_a_index ON x (b, a):

0|0|0|SEARCH TABLE x AS x1 USING INDEX b_a_index (b=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE x AS x2 USING INDEX b_a_index (b=?)

I've not dropped the b_c_index. Why does the subquery no longer use it?

Thanks,
Hamish



On 3 April 2017 at 10:20, R Smith <[hidden email]> wrote:

> Another option, if a is simple and a suitable index exists, is to simply get
> the max in a correlated subquery, so this should work faster even:
>
> SELECT c,d FROM x AS x1 WHERE b=1 AND a=(SELECT MAX(x2.a) FROM x AS x2 WHERE
> x2.b=x1.b AND x2.c=x1.c)
>
> This needs no grouping because the sub-query ensures it, unless a can have
> duplicate values for any one c value.
>
>
>
>
> On 2017/04/03 11:09 AM, R Smith wrote:
>>
>>
>>
>> On 2017/04/03 10:51 AM, Hamish Allan wrote:
>>>
>>> Ah. My purpose is to determine "d for the most recent c with b=1",
>>> with "most recent" being "largest a".
>>>
>>> My query had been working for this, but I'd only been adding
>>> monotonically increasing values for "a" in my tests. I just tried
>>> doing otherwise and found that I had been relying on an implementation
>>> detail.
>>>
>>> Is there any way I can perform this collation at query time, or do I
>>> need to do it programmatically later on?
>>
>>
>> WITH CLT(d,c,a) AS (
>>   SELECT d,c,a FROM x WHERE b = 1 GROUP BY c,a
>> ), CMX(c,mxa) AS (
>>   SELECT c, MAX(a) FROM CLT GROUP BY c
>> )
>> SELECT d
>>   FROM CLT
>>   JOIN CMX ON CMX.c = CLT.c AND CMX.mxa = CLT.a
>> ;
>>
>> Zero Order-by's
>>
>>
>>
>>>
>>> Thanks,
>>> Hamish
>>>
>>>
>>>
>>>
>>>
>>> On 3 April 2017 at 00:41, Keith Medcalf <[hidden email]> wrote:
>>>>
>>>> What is the purpose of the ORDER BY?
>>>>
>>>> The value of a used for the order by is from some random row in the
>>>> grouping of c.  Are there relationships between a, b, c, d that you have not
>>>> documented nor told us about?
>>>>
>>>> In any case, your most efficient index is on (b, c).
>>>>
>>>> The order by is useless.
>>>>
>>>>> -----Original Message-----
>>>>> From: sqlite-users
>>>>> [mailto:[hidden email]]
>>>>> On Behalf Of Hamish Allan
>>>>> Sent: Sunday, 2 April, 2017 17:28
>>>>> To: [hidden email]
>>>>> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>>>>
>>>>> Given a table:
>>>>>
>>>>> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>>>>>
>>>>> the query:
>>>>>
>>>>> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>>>>>
>>>>> shows the following plan, without indexes:
>>>>>
>>>>> 0|0|0|SCAN TABLE x
>>>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>>
>>>>> I can create an index to cover the WHERE clause:
>>>>>
>>>>> CREATE INDEX b_index ON x (b);
>>>>>
>>>>> which gives the plan:
>>>>>
>>>>> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>>>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>>
>>>>> or I can create an index to cover the GROUP BY clause:
>>>>>
>>>>> DROP INDEX b_index;
>>>>> CREATE INDEX c_index ON x (c);
>>>>>
>>>>> which gives the plan:
>>>>>
>>>>> 0|0|0|SCAN TABLE x USING INDEX c_index
>>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>>
>>>>> but I don't seem to be able to create a combined index to cover both
>>>>> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>>>>>
>>>>> Am I missing something, or is this just not possible?
>>>>>
>>>>> If it's not possible, which is the more efficient of those indexes  --
>>>>> or is there a third way, using an index for the ORDER BY, which would
>>>>> be more efficient still?
>>>>>
>>>>> Thanks,
>>>>> Hamish
>>>>> _______________________________________________
>>>>> sqlite-users mailing list
>>>>> [hidden email]
>>>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> [hidden email]
>>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

metanym
In reply to this post by Hick Gunter
Thanks, Hick. I now understand that it's undefined which value is
selected from the bare column.

To (hopefully) clarify my purpose: I'm adding a row (with incrementing
"a") each time "d" is changed for a given combination of "b" and "c".
But the way I want to query is: for each "c" with a given "b", find me
the most recent "d" (in my example, the "b" I gave was 1).

R Smith's approach does the trick nicely!

Thanks,
Hamish






On 3 April 2017 at 10:48, Hick Gunter <[hidden email]> wrote:

> I am not sure I correctly understand what you want.
>
> This is the value of d associated with a randomly chosen record from the group of records having the highest value of a tha also fulfills  b == 1;
>
> SELECT MAX(a),d FROM x WHERE b=1;
>
> If you want the value of d within each group of records sharing a value of c, try
>
> SELECT MAX(a),c,d FROM x WHERE b=1 GROUP BY c ORDER BY 1 DESC, 2 ASC;
>
> See also http://sqlite.org/lang_select.html
>
> "Side note: Bare columns in an aggregate queries. The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:
>
> SELECT a, b, sum(c) FROM tab1 GROUP BY a;
>
>
> In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.
>
> Special processing occurs occurs when the aggregate function is either min() or max(). Example:
>
> SELECT a, b, max(c) FROM tab1 GROUP BY a;
>
> When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way."
>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Hamish Allan
> Gesendet: Montag, 03. April 2017 10:51
> An: SQLite mailing list <[hidden email]>
> Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>
> Ah. My purpose is to determine "d for the most recent c with b=1", with "most recent" being "largest a".
>
> My query had been working for this, but I'd only been adding monotonically increasing values for "a" in my tests. I just tried doing otherwise and found that I had been relying on an implementation detail.
>
> Is there any way I can perform this collation at query time, or do I need to do it programmatically later on?
>
> Thanks,
> Hamish
>
>
>
> ___________________________________________
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: [hidden email]
>
> This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

metanym
In reply to this post by Keith Medcalf
Thanks, Keith. I think I need the sub-select because I don't just want
the (d) for the max(a) for the grouping of (c), but for the
combination of the grouping (c) and the given value (b).


On 3 April 2017 at 14:32, Keith Medcalf <[hidden email]> wrote:

>   select max(a), c, d
>     from x
>    where b == 1
> group by c
> order by 1 desc;
>
> index on (b, c, a)
>
> So, you are grouping by c where b=1, and within each grouping returning the grouping (c), the maximum value of (a) for that grouping, and (d) from one of perhaps multiple rows that contain the max(a) within that group.
>
> Or perhaps:
>
> select max(a), c, d
>   from x
>  where b == 1;
>
> Which finds the max(a) where b==1, and returns that value and the value of c and d from one of the rows containing that maximum (1 row returned).
>
> index on (b, a)
>
> The former returns all groups of c with the top one being the one row returned by the latter.
>
>> -----Original Message-----
>> From: sqlite-users [mailto:[hidden email]]
>> On Behalf Of Hamish Allan
>> Sent: Monday, 3 April, 2017 02:51
>> To: SQLite mailing list
>> Subject: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>
>> Ah. My purpose is to determine "d for the most recent c with b=1",
>> with "most recent" being "largest a".
>>
>> My query had been working for this, but I'd only been adding
>> monotonically increasing values for "a" in my tests. I just tried
>> doing otherwise and found that I had been relying on an implementation
>> detail.
>>
>> Is there any way I can perform this collation at query time, or do I
>> need to do it programmatically later on?
>>
>> Thanks,
>> Hamish
>>
>>
>>
>>
>>
>> On 3 April 2017 at 00:41, Keith Medcalf <[hidden email]> wrote:
>> >
>> > What is the purpose of the ORDER BY?
>> >
>> > The value of a used for the order by is from some random row in the
>> grouping of c.  Are there relationships between a, b, c, d that you have
>> not documented nor told us about?
>> >
>> > In any case, your most efficient index is on (b, c).
>> >
>> > The order by is useless.
>> >
>> >> -----Original Message-----
>> >> From: sqlite-users [mailto:sqlite-users-
>> [hidden email]]
>> >> On Behalf Of Hamish Allan
>> >> Sent: Sunday, 2 April, 2017 17:28
>> >> To: [hidden email]
>> >> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>> >>
>> >> Given a table:
>> >>
>> >> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>> >>
>> >> the query:
>> >>
>> >> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>> >>
>> >> shows the following plan, without indexes:
>> >>
>> >> 0|0|0|SCAN TABLE x
>> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> >>
>> >> I can create an index to cover the WHERE clause:
>> >>
>> >> CREATE INDEX b_index ON x (b);
>> >>
>> >> which gives the plan:
>> >>
>> >> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> >>
>> >> or I can create an index to cover the GROUP BY clause:
>> >>
>> >> DROP INDEX b_index;
>> >> CREATE INDEX c_index ON x (c);
>> >>
>> >> which gives the plan:
>> >>
>> >> 0|0|0|SCAN TABLE x USING INDEX c_index
>> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> >>
>> >> but I don't seem to be able to create a combined index to cover both
>> >> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>> >>
>> >> Am I missing something, or is this just not possible?
>> >>
>> >> If it's not possible, which is the more efficient of those indexes  --
>> >> or is there a third way, using an index for the ORDER BY, which would
>> >> be more efficient still?
>> >>
>> >> Thanks,
>> >> Hamish
>> >> _______________________________________________
>> >> sqlite-users mailing list
>> >> [hidden email]
>> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> >
>> >
>> > _______________________________________________
>> > sqlite-users mailing list
>> > [hidden email]
>> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Indexing WHERE with GROUP BY and ORDER BY

Hick Gunter
In reply to this post by metanym
The correlated subquery is fine if your SQL must be portable, but will return n rows if the a value is duplicated (n times) within a group. The bare field select only works in SQLite and is probably faster, with the caveat that the row the bare field values are taken from is "random". Since you seem to guarantee unique a within each b/c group, both will return exactly one row for every b/c group (with the c and d values "randomly selected" from a set of 1 rows, in case of the bare field select).

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Hamish Allan
Gesendet: Montag, 03. April 2017 16:24
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

Thanks, Hick. I now understand that it's undefined which value is selected from the bare column.

To (hopefully) clarify my purpose: I'm adding a row (with incrementing
"a") each time "d" is changed for a given combination of "b" and "c".
But the way I want to query is: for each "c" with a given "b", find me the most recent "d" (in my example, the "b" I gave was 1).

R Smith's approach does the trick nicely!

Thanks,
Hamish






On 3 April 2017 at 10:48, Hick Gunter <[hidden email]> wrote:

> I am not sure I correctly understand what you want.
>
> This is the value of d associated with a randomly chosen record from
> the group of records having the highest value of a tha also fulfills
> b == 1;
>
> SELECT MAX(a),d FROM x WHERE b=1;
>
> If you want the value of d within each group of records sharing a
> value of c, try
>
> SELECT MAX(a),c,d FROM x WHERE b=1 GROUP BY c ORDER BY 1 DESC, 2 ASC;
>
> See also http://sqlite.org/lang_select.html
>
> "Side note: Bare columns in an aggregate queries. The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a "bare" column. Example:
>
> SELECT a, b, sum(c) FROM tab1 GROUP BY a;
>
>
> In the query above, the "a" column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for "a". The "c" column is contained within the sum() aggregate function and so that output column is the sum of all "c" values in rows that have the same value for "a". But what is the result of the bare column "b"? The answer is that the "b" result will be the value for "b" in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute "b", and so in many cases the value for "b" is undefined.
>
> Special processing occurs occurs when the aggregate function is either min() or max(). Example:
>
> SELECT a, b, max(c) FROM tab1 GROUP BY a;
>
> When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the "b" column in the output will be the value of the "b" column in the input row that has the largest "c" value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max() functions work this way."
>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users
> [mailto:[hidden email]] Im Auftrag von
> Hamish Allan
> Gesendet: Montag, 03. April 2017 10:51
> An: SQLite mailing list <[hidden email]>
> Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>
> Ah. My purpose is to determine "d for the most recent c with b=1", with "most recent" being "largest a".
>
> My query had been working for this, but I'd only been adding monotonically increasing values for "a" in my tests. I just tried doing otherwise and found that I had been relying on an implementation detail.
>
> Is there any way I can perform this collation at query time, or do I need to do it programmatically later on?
>
> Thanks,
> Hamish
>
>
>
> ___________________________________________
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: [hidden email]
>
> This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...