Optimising multiple group by clauses

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimising multiple group by clauses

Kevin Martin
Hi,

I am trying to find the best way to write a query that has two levels of group by where the outer group by columns are a subset of the inner group by columns. In my example below I want to do an aggregation grouping by per, prod, and mar, then I want aggregate the results of this aggregation, grouping by just prod, and per. From the results of explain query plan, I can see a B-Tree is not used if I only do the first group by - this is mentioned in the query optimisation page. However, a B-Tree is used for the second group by when both group bys are present, and I don't understand why, as I think the rows from the subquery come out in an order already suitable for the second group by?

My actual data is a bit more complex, but I am seeing a 10x-20x speed difference between the query with the single group by and the query with both. If the B-Tree is necessary, it would be good to be able to understand why, and if not, it would be great if there were some way to communicate this to the query planner.

This below was run on a freshly downloaded and compiled 3.28.

Thanks,
Kev

---

QUERY PLAN
`--SCAN TABLE data USING INDEX sqlite_autoindex_data_1
QUERY PLAN
|--CO-ROUTINE 1
|  `--SCAN TABLE data USING INDEX sqlite_autoindex_data_1
|--SCAN SUBQUERY 1
`--USE TEMP B-TREE FOR GROUP BY

------------

create table data(
  prod integer not null,
  per integer not null,
  mar integer not null,
  off integer not null,
  val real not null,
  primary key(prod, per, mar, off)
);

explain query plan select
  prod,
  per,
  mar,
  sum(val) as val
from
  data
group by
  prod,
  per,
  mar
;

explain query plan select
  prod,
  per,
  min(val)
from
  (select
    prod,
    per,
    mar,
    sum(val) as val
  from
    data
  group by
    prod,
    per,
    mar)
group by
  prod,
  per
;


   
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimising multiple group by clauses

E.Pasma
Hello,


> explain query plan select
>  prod,
>  per,
>  min(val)
> from
>  (select
>    prod,
>    per,
>    mar,
>    sum(val) as val
>  from
>    data
>  group by
>    prod,
>    per,
>    mar)
> group by
>  prod,
>  per
> ;
> QUERY PLAN
> |--CO-ROUTINE 1
> |  `--SCAN TABLE data USING INDEX sqlite_autoindex_data_1
> |--SCAN SUBQUERY 1
> `--USE TEMP B-TREE FOR GROUP BY


I constructed an equivalent query that does not involve a temporary B-tree. I don't expect this to be useful for the real case. But it proves that speed can be improved. With 1000*100*10*1 rows in the data table, my timing was reduced from 0.859 to 0.635.

E. Pasma

explain query plan select
  prod,
  per,
  (select min(val)
   from
    (select
      sum(val) as val
    from
      data
    where
      (prod,per)=(v1.prod,v1.per)
    group by
      prod,
      per,
      mar))
from
 (select
   prod,
   per
 from
   data
 group by
   prod,
   per) v1
;
QUERY PLAN
|--CO-ROUTINE 3
|  `--SCAN TABLE data USING COVERING INDEX sqlite_autoindex_data_1
|--SCAN SUBQUERY 3 AS v1
`--CORRELATED SCALAR SUBQUERY 2
   |--CO-ROUTINE 1
   |  `--SEARCH TABLE data USING INDEX sqlite_autoindex_data_1 (prod=? AND per=?)
   `--SEARCH SUBQUERY 1


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users