EXISTS optimisation?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

EXISTS optimisation?

Constantin Emil MARINA
Hi all,

I am wondering if in SQLITE the EXISTS clause is expanded and optimized
in any way.

This is generated by the observation that 2 algebrically equivalent queries,

SELECT WHERE EXISTS ()

and

SELECT WHERE id IN (SELECT ...)

produce different execution plans and different performance, with WHERE
id IN (SELECT ) looking properly optimized.

We could not find any reference to this in
https://sqlite.org/optoverview.html

Thanks,
Dinu

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXISTS optimisation?

Clemens Ladisch
Constantin Emil MARINA wrote:
> I am wondering if in SQLITE the EXISTS clause is expanded and optimized in any way.

No.

> This is generated by the observation that 2 algebrically equivalent queries,
>   SELECT WHERE EXISTS ()
> and
>   SELECT WHERE id IN (SELECT ...)
> produce different execution plans and different performance, with WHERE id IN (SELECT ) looking properly optimized.

All three ways of writing this query are executed in the same way.  In
particular, the IN and JOIN queries are rewritten to act like the EXISTS
query:

create table t(x);
create table lookup(x primary key);

select * from t where exists (select * from lookup where x = t.x);
--EQP-- 0,0,0,SCAN TABLE t
--EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 1
--EQP-- 1,0,0,SEARCH TABLE lookup USING COVERING INDEX sqlite_autoindex_lookup_1 (x=?)

select * from t where x in (select x from lookup);
--EQP-- 0,0,0,SCAN TABLE t
--EQP-- 0,0,0,USING INDEX sqlite_autoindex_lookup_1 FOR IN-OPERATOR

select t.* from t join lookup using (x);
--EQP-- 0,0,0,SCAN TABLE t
--EQP-- 0,1,1,SEARCH TABLE lookup USING COVERING INDEX sqlite_autoindex_lookup_1 (x=?)


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXISTS optimisation?

Dinu
Thanks Clemens,

I triple-checked it and it is indeed generating different execution plans,
with the queries being absolutely equivalent. I will try to produce a
minimal test case (right now the query where this occurs is a 100 lines long
monster).

However, I am a bit confused by the examples you provided:
The 3rd query is the equivalent of the first 2 only if the lookup table has
an unique index on x (which it does). However, this would make a very
restricted case of why you would use EXISTS().

In my case, this is completely opposite: x is definitely not unique in the
lookup table, that's precisely why I'm using EXISTS or IN, to avoid the row
multiplication generated by a JOIN.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXISTS optimisation?

Clemens Ladisch
Dinu wrote:
> I triple-checked it and it is indeed generating different execution plans

Probably different indexes?  Are these actual tables?

>> select * from t where exists (select * from lookup where x = t.x);
>> select * from t where x in (select x from lookup);
>> select t.* from t join lookup using (x);
>
> However, I am a bit confused by the examples you provided:
> The 3rd query is the equivalent of the first 2 only if the lookup table has
> an unique index on x (which it does). However, this would make a very
> restricted case of why you would use EXISTS().

It's just another example of how a query is executed the same; I was not
implying that it would be preferrable over the other ones.

(The third case is interesting if there is no index on the lookup column,
because with a join, SQLite estimates that it's worthwhile to create
a temporary index on the lookup column.  IN would create a temporary
index for the result of the subquery, while EXISTS would still execute
the subquery (now a table scan) for every outer row.)


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXISTS optimisation?

Dinu
Quick note to self and others:
IN() and EXISTS() in all RDB's I know of are the uncle noone mentions; that
is to say, they have different compare semantics than JOIN so the naive
strategy is to evaluate them as dependent subqueries, not correlated ones,
which would be consistent with the behavior I noticed. However, I do know of
Maria and Postgres that do a decent job at optimizing EXISTS () (which I
think is by all means the correct semantic for this intent). But there's by
no means a golden standard across RDB's so that's why it would be very
useful to have some documentation on it, as it's one of the migration
pitfalls.

I'm still in debt with the TC, will work on it the next days.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users