Equiv stmts, different explain plans

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Equiv stmts, different explain plans

Kyle
On another DB I came across 2 stmts, that I think are equivalent, but
generated different explain plans. I request a second opinion - are
these 2 stmts equivalent? If so, why do they generate different explain
plans even on sqlite?
TIA
--
create table t1(c,d);
create table t2(c,d);
explain select * from t1
   where c=1 and d in (select d from t2 where c=1);
explain select * from t1
   where c=1 and d in (select d from t2 where t2.c=t1.c);
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

Richard Hipp-3
On 3/4/19, Kyle <[hidden email]> wrote:
> On another DB I came across 2 stmts, that I think are equivalent, but
> generated different explain plans. I request a second opinion - are
> these 2 stmts equivalent? If so, why do they generate different explain
> plans even on sqlite?

The two SELECT statements below may well compute the same output
(unless I'm missing something) but they are not the same.  The WHERE
clause in the subquery is different. So why do you expect them to
generate the same query plan?

> create table t1(c,d);
> create table t2(c,d);
> explain select * from t1
>    where c=1 and d in (select d from t2 where c=1);
> explain select * from t1
>    where c=1 and d in (select d from t2 where t2.c=t1.c);


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

Kyle
On 05/03/2019 01:33, Richard Hipp wrote:

> On 3/4/19, Kyle <[hidden email]> wrote:
>> On another DB I came across 2 stmts, that I think are equivalent, but
>> generated different explain plans. I request a second opinion - are
>> these 2 stmts equivalent? If so, why do they generate different explain
>> plans even on sqlite?
>
> The two SELECT statements below may well compute the same output
> (unless I'm missing something) but they are not the same.  The WHERE
> clause in the subquery is different. So why do you expect them to
> generate the same query plan?
>
>> create table t1(c,d);
>> create table t2(c,d);
>> explain select * from t1
>>     where c=1 and d in (select d from t2 where c=1);
>> explain select * from t1
>>     where c=1 and d in (select d from t2 where t2.c=t1.c);
>
>
DRH, many thanks for your reply, I was expecting same output because I
believe stmts to be equivalent, so was not sure why query plan was
different. I see the explain plans are very similar.
But I believe original stmts mentioned are still equivalent?
Do you agree? And in SQLite what is best way to write such stmt (or in
other terms, what is difference)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

Keith Medcalf
In reply to this post by Kyle

In the first query the subselect that creates the list is independent.
In the second query the subselect that creates the list is correlated.

In the first query you have requested that the subquery be executed to create the list for use by the IN operator.  After this has been done the main (outer) query is executed and a result generated when the where condition (which includes the IN operator) are satisfied.  Since it is possible that the list may not need to be generated because the condition c==1 in the outer query may never be satisfied, the subquery is only executed ONCE the first time its results are required.

In the second query you have requested that the outer query be executed AND FOR EACH ROW that passes the WHERE c=1 constraint, execute the subquery and then check if d in the outer query is in the set of the results obtained by running the correlated subquery.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Kyle
>Sent: Monday, 4 March, 2019 18:05
>To: [hidden email]
>Subject: [sqlite] Equiv stmts, different explain plans
>
>On another DB I came across 2 stmts, that I think are equivalent, but
>generated different explain plans. I request a second opinion - are
>these 2 stmts equivalent? If so, why do they generate different
>explain
>plans even on sqlite?
>TIA
>--
>create table t1(c,d);
>create table t2(c,d);
>explain select * from t1
>   where c=1 and d in (select d from t2 where c=1);
>explain select * from t1
>   where c=1 and d in (select d from t2 where t2.c=t1.c);
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Equiv stmts, different explain plans

Hick Gunter
In reply to this post by Kyle
Both statements generate the same result set, but they are neither equivalent nor equally fast.

The first statement uses a *constant* subquery as the RHS of an IN expression. The QP is free to materialize this query (i.e. run it once and keep the results in an "ephemeral" table with an index for fast lookup).

The second statement uses a *correlated* subquery as the RHS of an IN expression. The QP needs to actually run this query for every record of t1 that matches the condition t1.c == 1.

Imagine what happens if the constraint t1.c == 1 is changed to t1.c == 2.

The first statement will still be checking against the same constant subquery result set.

The second statement will be checking against a new and possibly quite different correlated subquery result set.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Kyle
Gesendet: Dienstag, 05. März 2019 02:05
An: [hidden email]
Betreff: [EXTERNAL] [sqlite] Equiv stmts, different explain plans

On another DB I came across 2 stmts, that I think are equivalent, but generated different explain plans. I request a second opinion - are these 2 stmts equivalent? If so, why do they generate different explain plans even on sqlite?
TIA
--
create table t1(c,d);
create table t2(c,d);
explain select * from t1
   where c=1 and d in (select d from t2 where c=1); explain select * from t1
   where c=1 and d in (select d from t2 where t2.c=t1.c); _______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

R Smith-2
In reply to this post by Kyle

On 2019/03/05 4:06 AM, kk wrote:

> On 05/03/2019 01:33, Richard Hipp wrote:
>>
>>> create table t1(c,d);
>>> create table t2(c,d);
>>> explain select * from t1
>>>     where c=1 and d in (select d from t2 where c=1);
>>> explain select * from t1
>>>     where c=1 and d in (select d from t2 where t2.c=t1.c);
>>
>>
> DRH, many thanks for your reply, I was expecting same output because I
> believe stmts to be equivalent//...

They are very much not equivalent. They happen to produce the same
output with this very specific crafted schema and queries, but that does
not say that they mean the same thing, in fact they mean very different
things in execution. I think Keith explained it well enough technically,
but in case it is not 100% clear yet, let me add to it this example:

Say we have a group of random people, and I asked you to separate out
all the people aged above 25, and then from that group separate out all
the women, and then from that group separate all who have
husbands/partners in the original group.

The next day, with the same group of beings, I might ask to first
separate out all partnered pairs from the group, then from that group
separate out all females and from that remainder, get everything that's
been on Earth more than 25 years.

You might rightfully protest that, in the end, we would have the exact
same people we've already picked out yesterday, and it would be true -
however, the intermediate groups along the execution plan look very
different, and the method you've used to achieve this second result
follows a very different set of instructions, and, if the origin group
allowed non-humans in, the second query may actually yield different
results.

They are not equivalent in function just because they happen to yield
the same end-results for the specific schema and content.


Hope that is a useful clarification!

Ryan


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

Simon Slavin-3
In reply to this post by Kyle
On 5 Mar 2019, at 2:06am, kk <[hidden email]> wrote:

>>> select * from t1
>>>    where c=1 and d in (select d from t2 where c=1);
>>> select * from t1
>>>    where c=1 and d in (select d from t2 where t2.c=t1.c);

> DRH, many thanks for your reply, I was expecting same output because I believe stmts to be equivalent, so was not sure why query plan was different. I see the explain plans are very similar.
> But I believe original stmts mentioned are still equivalent?

How do you expect a SQL engine to approach the above statements ?  Should it process the inner SELECT first, or the outer SELECT first ?

If it processes the inner SELECT first, where does it get the value it needs for t1.c ?

If it processes the outer SELECT first, what strategy does it use for selecting on t1.d when it doesn't yet know whether there's going to be no, a single, or multiple values ?

> Do you agree? And in SQLite what is best way to write such stmt (or in other terms, what is difference)?

Using a JOIN.

SELECT t1.* FROM t1
    INNER JOIN t2 ON t2.c=1 AND t2.d = t1.d
    WHERE t1.c=1;
SELECT t1.* FROM t1
    INNER JOIN t2 ON t2.c=t1.c AND t2.d = t1.d
    WHERE t1.c=1;

The INNER JOIN (as opposed to OUTER JOIN) means that a row must exist in t2 for the equivalent row in t1 to be returned.  INNER is the default kind of JOIN.  Of the two statements, it seems that the fist one requires less processing.

Internally, SQLite does comparison and conversion when faced with different ways of phrasing your query.  But that's not your problem.  Phrase what you want in as specific terms as possible, and let SQLite pick its preferred way of solving the problem.

Simon.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

Keith Medcalf

On Tuesday, 5 March, 2019 04:09, Simon Slavin <[hidden email]> wrote:

>On 5 Mar 2019, at 2:06am, kk <[hidden email]> wrote:

>>>> select * from t1
>>>>    where c=1 and d in (select d from t2 where c=1);
>>>> select * from t1
>>>>    where c=1 and d in (select d from t2 where t2.c=t1.c);

>> DRH, many thanks for your reply, I was expecting same output
>because I believe stmts to be equivalent, so was not sure why query
>plan was different. I see the explain plans are very similar.
>> But I believe original stmts mentioned are still equivalent?

>How do you expect a SQL engine to approach the above statements ?
>Should it process the inner SELECT first, or the outer SELECT first ?

>If it processes the inner SELECT first, where does it get the value
>it needs for t1.c ?

>If it processes the outer SELECT first, what strategy does it use for
>selecting on t1.d when it doesn't yet know whether there's going to
>be no, a single, or multiple values ?

>> Do you agree? And in SQLite what is best way to write such stmt (or
>in other terms, what is difference)?
>
>Using a JOIN.
>
>SELECT t1.* FROM t1
>    INNER JOIN t2 ON t2.c=1 AND t2.d = t1.d
>    WHERE t1.c=1;

Technically this is invalid.  t2.c == 1 is NOT a equijoin condition and should NOT appear in the ON clause (though it would be valid in the case of an outer join).  However, since [INNER] JOIN is merely syntactic sugar for a , and the ON is merely syntactic sugar for a WHERE clause, this does not really matter much.  Moreover, the query plans will be different because in this case you are joining T1 against T2 using only the common column d, and then filtering the interim results based on t1.c and t2.c.  (The actual plan will use the c==1 condition to constrain the outer loop, then loop through the inner table based on the join column d, then filter the result of that with the "other" c==1 condition -- which table is chosen as the outer table is up to the query planner (and it will choose whichever one it things the constraint c==1 will produce the least rows)).

This is entirely different from the below query where you are joining T1 and T2 on the common columns c and d, then filtering for a specific value of c in t1 (which will constrain the outer loop).

The total number of candidate solutions (the number of intersects in the nested loops) can be quite different.

>SELECT t1.* FROM t1
>    INNER JOIN t2 ON t2.c=t1.c AND t2.d = t1.d
>    WHERE t1.c=1;

>The INNER JOIN (as opposed to OUTER JOIN) means that a row must exist
>in t2 for the equivalent row in t1 to be returned.  INNER is the
>default kind of JOIN.  Of the two statements, it seems that the fist
>one requires less processing.

Actually, the latter (the properly phrased equijoin) will require the least processing since it gives the query planner the greatest latitude to generate an optimal solution.  In one case (the former) you have the condition "t1.c == 1 AND t2.c == 1".  The query planner cannot possibly derive from this the fact that "t1.c == t2.c".  However, in the latter case, you have the condition "t1.c == t2.c AND t1.c == 1" from which the query planner can determine the fact that "t2.c == 1".  Depending on the shape of the data and the available statistics this may have a great effect on the performance of the query because the query planner has more information and may choose a more optimal solution (that is, it may now choose whether t1 or t2 is the outer loop, and the descent into the inner table uses both columns).

This is, of course, also fully equivalent to the properly phrased JOIN but it does constrain the solution (it is equivalent to specifying "t1 CROSS JOIN t2" in the above properly phrased equijoin in that it constrains the order of traversal of the tables):

SELECT *
  FROM t1
 WHERE c == 1
   AND EXISTS (SELECT * FROM t2 WHERE c == t1.c AND d == t1.d)
;

since no actual data from t2 needs to be returned.  Which "phrasing" of the myriad of perhaps different queries producing (quite possibly) the same (or perhaps different) results probably depends on how one "translates" the original problem statement from English into SQL (and the adequacy of that problem statement) together with the maintainability of the application and the actual schema including the indexes ...

>Internally, SQLite does comparison and conversion when faced with
>different ways of phrasing your query.  But that's not your problem.
>Phrase what you want in as specific terms as possible, and let SQLite
>pick its preferred way of solving the problem.

Of course, which one chooses depends entirely on the original problem statement.  If one "assumes" that the t1.c=1 and t2.c=1 in the original statement:

select * from t1
   where c=1 and d in (select d from t2 where c=1);

is merely a lazy programmer expressing that t1.c == t2.c without actually saying so, then the properly phrased equijoin is the most efficient because it gives the query planner the most latitude to generate an optimal solution.  This can only be determined by reference to the original statement of the problem that the SQL statement was designed to satisfy (since they are perhaps all the same yet different at the same time, and there is really insufficient information from which to make a decision).

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Equiv stmts, different explain plans

James K. Lowden
In reply to this post by Hick Gunter
On Tue, 5 Mar 2019 08:13:32 +0000
Hick Gunter <[hidden email]> wrote:

> The second statement uses a *correlated* subquery as the RHS of an IN
> expression. The QP needs to actually run this query for every record
> of t1 that matches the condition t1.c == 1.

I'm not sure what you mean be "needs", above.  If you're describing the
way the SQLite QP works, OK.  If you're asserting that the QP or any QP
must work that way, no, that's common fallacy.  The person writing the
query may think of a correlated subquery that way; it's *logically*
true.  But the planner is free to execute the query however it
chooses.  In fact, SQLite explains in great detail when the optimizer
will "flatten" a subquery into a join.  

>  select * from t1
>    where c=1 and d in (select d from t2 where c=1);
>  select * from t1
>    where c=1 and d in (select d from t2 where t2.c=t1.c);

Consider:  

        select distinct t1.*
        from t1 join t2
        on t1.c = t2.c and t1.d = t2.d
        where t1.c = 1

Every existential quantification can be recast as a join.  

--jkl

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

James K. Lowden
In reply to this post by Keith Medcalf
On Mon, 04 Mar 2019 20:20:08 -0700
"Keith Medcalf" <[hidden email]> wrote:

> In the first query the subselect that creates the list is independent.
> In the second query the subselect that creates the list is correlated.

Yes, and if it can be shown that the two queries are logically
equivalent under relational algebra, then it's theoretically possible
for the query planner to have arrived at the same plan in both cases.
That is the only test that could support/deny the assertion that they
could be rendered according to the same execution plan.  

> In the first query you have requested that the subquery be executed
> to create the list for use by the IN operator.  

No.  The query requests no such thing.  SQL makes no request or
suggestion for how to execute a query.  It simply describes a result.
It's up to the implementation to determine how to produce that result.  

--jkl


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

Keith Medcalf

On Tuesday, 5 March, 2019 12:53, James K. Lowden <[hidden email]> wrote:

>On Mon, 04 Mar 2019 20:20:08 -0700> "Keith Medcalf" <[hidden email]> wrote:

>> In the first query the subselect that creates the list is
>> independent.
>> In the second query the subselect that creates the list is
>> correlated.

>Yes, and if it can be shown that the two queries are logically
>equivalent under relational algebra, then it's theoretically possible
>for the query planner to have arrived at the same plan in both cases.
>That is the only test that could support/deny the assertion that they
>could be rendered according to the same execution plan.

>> In the first query you have requested that the subquery be executed
>> to create the list for use by the IN operator.

>No.  The query requests no such thing.  SQL makes no request or
>suggestion for how to execute a query.  It simply describes a result.
>It's up to the implementation to determine how to produce that
>result.

You are, of course, correct.  However for the two queries given I do not believe that any query planner currently in existence will recognize that t1.c == 1 and t2.c == 1 implies that t1.c == t2.c.  However, that implication may be stated explicitly (as it is in the correlated subquery).  It is also entirely possible that if the (first) query were phrased as:

select *
  from t1
 where c == ?0
   and d in (select d from t2 where c == ?0 and d == t1.d)
;

then it is quite possible for the query planner to take notice of the fact that t1.c == t2.c ...

Similarly I would not *expect* that a query planner would consider t1.c and t2.c to be transitively equal if the query were phrased as:

select *
  from t1
 where c == ?
   and d in (select d from t2 where c == ? and d == t1.d)
;

even if the two parameters were the same value ...

As another note, you also commented that a "select distinct * from t1 join ...." is (possibly) equivalent.  This is not necessarily the case because there is nothing in the schema which requires the rows of t1 (or t2 for that matter) to be distinct and thus the loop order does affect the output (without distinct).  Granted, with distinct the output (set) will be the same no matter the loop nesting order, it may not be the same without distinct depending on the data in the tables.

In other words, we arrive at the same point in the end.  It depends on the original "problem statement" which the SQL was composed to solve.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Equiv stmts, different explain plans

James K. Lowden
On Tue, 05 Mar 2019 13:58:06 -0700
"Keith Medcalf" <[hidden email]> wrote:

> >The query requests no such thing.  SQL makes no request or
> >suggestion for how to execute a query.  It simply describes a result.
> >It's up to the implementation to determine how to produce that
> >result.
>
> You are, of course, correct.  However for the two queries given I do
> not believe that any query planner currently in existence will
> recognize that t1.c == 1 and t2.c == 1 implies that t1.c == t2.c.  

Thank you for the clarification, Keith.  You may well be right about
the state of the art.  I fault SQL itself; if it implemented relational
algebra correctly, there would be no duplicates from SELECT, and no need
of DISTINCT.  Perhaps then the transformation of FORALL to JOIN would
be easier to infer.  

There is sometimes a tendency in this forum to use shorthand, to
describe what SQLite does as what SQL does.  It's useful for the users'
sake to distinguish between the two, so as not to confuse the
attainable with the attained.   :-)

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users