Request: Combining skip-scan with 'max' optimization

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Request: Combining skip-scan with 'max' optimization

Jens Alfke-2
I'm following up on my "Optimizing `SELECT a, max(b) GROUP BY a`" thread from a few weeks ago, rephrasing it as a clearer enhancement request.

ACTUAL BEHAVIOR: A query of the form `SELECT a, max(b) GROUP BY a` runs slowly (O(n) with the number of table rows), even if there is an index on (a, b DESC). The query plan explanation says "SCAN TABLE ... USING INDEX". This is in SQLite 3.28.

EXPECTED BEHAVIOR: Query runs faster :-) My big-O fu is not strong enough to express it that way, but I'd imagine it to be proportional to the number of distinct `a` values, not the number of rows in the table.

DIAGNOSIS: According to Keith Medcalf, "it appears that the optimizer will not utilize a skip-scan *AND* apply the max optimization concurrently."

According to Keith, a workaround is to rewrite the query as
        select name,
      (
       select max(timestamp)
         from table
        where name=outer.name
      )
 from (
       select distinct name
         from table
      );

This is of course a lot more complex. And unfortunately in my case the query generator my program uses does not (yet) have the capability to generate nested SELECTs, so the optimization is unavailable to me until/unless we implement that.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Request: Combining skip-scan with 'max' optimization

Andy Bennett
Hi,

I hadn't seen this thread when I posted my recent thread on optimising
MAX aggregates but I suspect this could help my case as well.

At the moment I'm trying to limit the amount of data that the aggregate
query has to visit in order to keep latency low but this optimisation
would give me a bigger window within my latency target.

I have similarly structured indexes to Jens.


> I'm following up on my "Optimizing `SELECT a, max(b) GROUP BY a`"
> thread from a few weeks ago, rephrasing it as a clearer enhancement
> request.
>
> ACTUAL BEHAVIOR: A query of the form `SELECT a, max(b) GROUP BY a`
> runs slowly (O(n) with the number of table rows), even if there is an
> index on (a, b DESC). The query plan explanation says "SCAN TABLE ...
> USING INDEX". This is in SQLite 3.28.
>
> EXPECTED BEHAVIOR: Query runs faster :-) My big-O fu is not strong
> enough to express it that way, but I'd imagine it to be proportional
> to the number of distinct `a` values, not the number of rows in the
> table.
>
> DIAGNOSIS: According to Keith Medcalf, "it appears that the optimizer
> will not utilize a skip-scan *AND* apply the max optimization
> concurrently."
>
> According to Keith, a workaround is to rewrite the query as
> select name,
>       (
>        select max(timestamp)
>          from table
>         where name=outer.name
>       )
>  from (
>        select distinct name
>          from table
>       );
>
> This is of course a lot more complex. And unfortunately in my case the
> query generator my program uses does not (yet) have the capability to
> generate nested SELECTs, so the optimization is unavailable to me
> until/unless we implement that.



[hidden email]
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users