Query Planner GROUP BY and HAVING clauses optimization ?

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Query Planner GROUP BY and HAVING clauses optimization ?

Jean-Baptiste Gardette
Hi,
Consider the following exemple :

CREATE TABLE t1 (
a TEXT PRIMARY KEY,
b INTEGER);

SELECT *
FROM t1
GROUP BY a
HAVING b > 1;

Will the GROUP BY clause be supressed and HAVING clause be rewritten in
WHERE clause by the optimizer ?

Jean-Baptiste
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Dominique Devienne
On Tue, Jan 14, 2020 at 2:57 PM Jean-Baptiste Gardette
<[hidden email]> wrote:
> SELECT * FROM t1 GROUP BY a HAVING b > 1;
>
> Will the GROUP BY clause be supressed and HAVING clause be rewritten in WHERE clause by the optimizer ?

My question would be why you wouldn't write it as a WHERE clause in
the first place :)
Sorry, OT, I know. --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Keith Medcalf
In reply to this post by Jean-Baptiste Gardette

On Tuesday, 14 January, 2020 06:58, Jean-Baptiste Gardette <[hidden email]> wrote:

>Consider the following exemple :

>CREATE TABLE t1 (
>a TEXT PRIMARY KEY,
>b INTEGER);

>SELECT *
>FROM t1
>GROUP BY a
>HAVING b > 1;

>Will the GROUP BY clause be supressed and HAVING clause be rewritten in
>WHERE clause by the optimizer ?

No.  The VDBE code will be generated as you have specified.  The "group by a" will be used to select the index to be used to access the table data, and at the end of the processing of each group of records, the "having b > 1" will be applied to the group to determine whether the group is output.

The PRIMARY KEY constraint on a does not make "a" unique since a is not constrained not null, so there may be multiple records in the same group.

Even if "a" is constrained not null the query is still processed as a group by/having even though each group can only consist of one record even though (and only in that case) the query equivalent to "select * from t1 where b > 1 order by a".

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Jean-Baptiste Gardette
In reply to this post by Jean-Baptiste Gardette
Thank you Dominic and Keith for your replies

The reason i asked this is that i have a query in wich one condition
filtering the recordset involves
an UDF and this UDF needs to be processed after all table filters have
been applied

Illustration :

additionnal table :

CREATE TABLE t2 (
a TEXT PRIMARY KEY,
b INTEGER,
c REAL);

UDF :
proc CallUDF {key} {
    if{[dict exists $myDict $key]} {
       dict get $myDict $key  /* returns a value stored in myDict */
    } else {error}
}
db function CallUDF -deterministic CallUDF

If i do :

SELECT *
FROM t1 CROSS JOIN t2
WHERE t1.b > 10
   AND t2.a = t1.a and t2.c < 0
   AND CallUDF(t2.a) <> 0

Even though "CROSS JOIN" garanties that t2 is traversed after t1 has
been filtered on "t1.b > 10",
it seems there is no garanty when t2 is traversed, that the filter
CallUDF(t2.a) <> 0 is evaluated *after*
filters "t2.b = t1.b and t2.c < 0".

In this case the sole solution we found is :

SELECT *
FROM t1, t2
WHERE t1.b > 10
   AND t2.a = t1.a and t2.c < 0
GROUP BY t2.a
HAVING CallUDF(t2.a) <> 0

Thank again
Jean-Baptiste
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Simon Slavin-3
On 14 Jan 2020, at 4:14pm, Jean-Baptiste Gardette <[hidden email]> wrote:

> The reason i asked this is that i have a query in wich one condition filtering the recordset involves
> an UDF and this UDF needs to be processed after all table filters have been applied

You cannot guarantee this.  And even if you find a solution that works now, the optimizer in a future version of SQLite might work differently.

Would it be possible to phrase your SELECT as a SELECT with a sub-SELECT ?  Have the sub-SELECT figure out which rows you want in which order, then use a SELECT to apply your UDF to them ?  It is guaranteed that the sub-SELECT is processed before the SELECT.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Keith Medcalf

On Tuesday, 14 January, 2020 09:23, Simon Slavin <[hidden email]> wrote:

>Would it be possible to phrase your SELECT as a SELECT with a sub-SELECT
>?  Have the sub-SELECT figure out which rows you want in which order,
>then use a SELECT to apply your UDF to them ?  It is guaranteed that the
>sub-SELECT is processed before the SELECT.

you mean if you do something like this:

select *
  from (
         select *
           from ...
          where ...
       )
 where ...

?

If the query flattener is operating the optimizer will push the outer where condition into the inner query after which there is no guarantee that the conditions in the outer query will be processed *after* the conditions in the inner query, especially not if the pushed terms are estimated to reduce the number of candidates "quicker" by executing it sooner.

I seem to recall something about "expensive" conditions that will be forced to be run on only as few surviving candidate rows as possible, but my recollection is vague (they say the memory is the second thing to go -- strange I can't remember the first).

Anyway, Richard may be able to help here.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Richard Hipp-3
On 1/14/20, Keith Medcalf <[hidden email]> wrote:
>
> I seem to recall something about "expensive" conditions that will be forced
> to be run on only as few surviving candidate rows as possible, but my
> recollection is vague (they say the memory is the second thing to go --
> strange I can't remember the first).
>
> Anyway, Richard may be able to help here.

Maybe you are thinking of SQLITE_ENABLE_SORTER_REFERENCES.
https://www.sqlite.org/compile.html#enable_sorter_references


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Jean-Baptiste Gardette
In reply to this post by Jean-Baptiste Gardette
Just to be sure, is it unsafe to write a non agregate SELECT with GROUP
BY and HAVING clauses (without sub-SELECT)
for the sole prupose explained before (even if the approache is
discutable) ?

I understand 2 different answers here :

- "No, this kind of query can't be rewritten by the optimizer for the
technical reasons (VDBE, index etc)"

- "Yes it is unsafe, a future version of SQLite may optimize differently
this kind of query"

I can't see something related to my problem in the doc
https://sqlite.org/optoverview.html
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Keith Medcalf

On Wednesday, 15 January, 2020 02:06, Jean-Baptiste Gardette <[hidden email]> wrote:

> Just to be sure, is it unsafe to write a non agregate SELECT with GROUP
> BY and HAVING clauses (without sub-SELECT) for the sole prupose
> explained before (even if the approache is discutable) ?

Presently, yes it is.

>I understand 2 different answers here :

>- "No, this kind of query can't be rewritten by the optimizer for the
>technical reasons (VDBE, index etc)"

This is presently the case.  Current versions of the query planner will not optimize these queries but will instead execute them as written.  That is statements of the form "select ... from ... where ... group by ... having ... order by ... limit ... offset ..." will not convert the "group by" into an "order by" or the "having" into "where".  The provers required to allow this sort of transformation do not exist at present.

>- "Yes it is unsafe, a future version of SQLite may optimize differently
>this kind of query"

Yes.  A future version of SQLite may indeed process the query by turning the "group by" into an "order by" (if there isn't one, or just ignoring it if there is an order by, or perhaps merging them) and moving the "having" to a "where" condition.  Doing this would require that the optimizer recognize that the group by expression can only result in single row groups and that neither the select list nor the having expression contain aggregate functions.  There is almost nothing to be gained from this optimization, however, so it is highly unlikely that such provers would be written in order to implement this particular optimization.

Contrast this with recent optimizations that have been added, for example, the LEFT JOIN optimization which downgrades an outer join into an inner join if it can be proved that the overall result will be the same (the extra candidates generated by the outer join will be removed from the result set by a where clause, so going to all the bother of adding and processing them in the first place serves no purpose), or that leaf tables which are not referenced in the select list are removed from the query since generating those results merely incurs a cost to no effect.  These optimizations can have a significant impact on performance.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query Planner GROUP BY and HAVING clauses optimization ?

Jean-Baptiste Gardette
Thank you Keith for the detail explanation.
I misunderstood the 2 replies were opposite but this is not the case.

Thank you again
Jean-bapstiste
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users