LEFT JOIN + WHERE / OR optimisation

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

LEFT JOIN + WHERE / OR optimisation

Dinu
Hi all,
I've ran into an optimisation problem with a double-left join that works as
an "either" clause.

The query is as follows:

SELECT *
FROM
  a
LEFT JOIN
  b ON <cond>
LEFT JOIN
  c ON <cond>
WHERE
  b.someId IN (1,2,3) OR
  c.someId IN (4,5)

This results in a bloated execution plan:
SEARCH a
SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX

However, the semantically equivalent:
SELECT *
FROM
  a
LEFT JOIN
  b ON <cond> AND b.someId IN (1,2,3)
LEFT JOIN
  c ON <cond>AND c.someId IN (4,5)
WHERE
  b.someId IS NOT NULL OR
  c.someId IS NOT NULL

Gets the proper execution plan:
SEARCH b
SEARCH c
EXECUTE LIST SUBQUERY



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
Probably related:

Compound join with a left outer join generates different execution plans:

LEFT JOIN (
  b
  JOIN c ON <cond>
)
WHERE
  b.something = 5

vs.

LEFT JOIN (
  b
  JOIN c ON <cond> AND b.something = 5
)
WHERE
  b.something IS NOT NULL




--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Keith Medcalf
In reply to this post by Dinu

They are not semantically equivalent.  join conditions attached to an outer join operation are not semantically equivalent to the same conditions being in the where clause.

In other words:

select a,b,c
  from a
  join b
  join c on a.a=b.b
 where c.c=b.d

is simply syntactic sugar for

select a,b,c
  from a, b, c
 where a.a=b.b
   and c.c=b.d;

In all cases the conditions in ON clauses of INNER JOINS are nothing more than WHERE clause filters.  You do not even have to have the tables used in the ON clause "referenced" at the point you refer to them.

the word "INNER JOIN" is syntactic sugar for a comma (,), and ON is sytactic sugar for the word WHERE (or AND).

However, for OUTER JOINS the conditions in the ON clause "glue themselves" to the OUTER JOIN operation and ARE NEITHER syntactically or symantically the same as WHERE clause conditions.

That is to say the behaviour observed is how it is designed to work and you expectations are misguided.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Dinu
>Sent: Thursday, 4 January, 2018 12:29
>To: [hidden email]
>Subject: [sqlite] LEFT JOIN + WHERE / OR optimisation
>
>Hi all,
>I've ran into an optimisation problem with a double-left join that
>works as
>an "either" clause.
>
>The query is as follows:
>
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON <cond>
>LEFT JOIN
>  c ON <cond>
>WHERE
>  b.someId IN (1,2,3) OR
>  c.someId IN (4,5)
>
>This results in a bloated execution plan:
>SEARCH a
>SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
>SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX
>
>However, the semantically equivalent:
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON <cond> AND b.someId IN (1,2,3)
>LEFT JOIN
>  c ON <cond>AND c.someId IN (4,5)
>WHERE
>  b.someId IS NOT NULL OR
>  c.someId IS NOT NULL
>
>Gets the proper execution plan:
>SEARCH b
>SEARCH c
>EXECUTE LIST SUBQUERY
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

R Smith-2
In reply to this post by Dinu

On 2018/01/04 9:28 PM, Dinu wrote:

> Hi all,
> I've ran into an optimisation problem with a double-left join that works as
> an "either" clause.
>
> The query is as follows:
>
> SELECT *
> FROM
>    a
> LEFT JOIN
>    b ON <cond>
> LEFT JOIN
>    c ON <cond>
> WHERE
>    b.someId IN (1,2,3) OR
>    c.someId IN (4,5)
>
> This results in a bloated execution plan:
> SEARCH a
> SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
> SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX
>
> However, the semantically equivalent:
> SELECT *
> FROM
>    a
> LEFT JOIN
>    b ON <cond> AND b.someId IN (1,2,3)
> LEFT JOIN
>    c ON <cond>AND c.someId IN (4,5)
> WHERE
>    b.someId IS NOT NULL OR
>    c.someId IS NOT NULL
>
> Gets the proper execution plan:
> SEARCH b
> SEARCH c
> EXECUTE LIST SUBQUERY

These Queries are not equivalent, they cannot and should not have the
same query plan.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

David Raymond
In reply to this post by Keith Medcalf
The ON condition is used <before> the "add one result row for each row of the outer table where nothing matches the ON condition"
The WHERE condition is used <after> those rows are added.

Example with the basic "not in" type of outer join:

SQLite version 3.21.0 2017-10-24 18:55:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table a (x);

sqlite> create table b (x);

sqlite> insert into a values (1), (2), (3);

sqlite> insert into b values (2);

sqlite> select a.x from a left outer join b on a.x = b.x where b.x is null;
x
1
3

sqlite> select a.x from a left outer join b on a.x = b.x and b.x is null;
x
1
2
3

sqlite> select a.x from a left outer join b where a.x = b.x and b.x is null;

sqlite>

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Keith Medcalf
Sent: Thursday, January 04, 2018 2:53 PM
To: SQLite mailing list
Subject: Re: [sqlite] LEFT JOIN + WHERE / OR optimisation


They are not semantically equivalent.  join conditions attached to an outer join operation are not semantically equivalent to the same conditions being in the where clause.

In other words:

select a,b,c
  from a
  join b
  join c on a.a=b.b
 where c.c=b.d

is simply syntactic sugar for

select a,b,c
  from a, b, c
 where a.a=b.b
   and c.c=b.d;

In all cases the conditions in ON clauses of INNER JOINS are nothing more than WHERE clause filters.  You do not even have to have the tables used in the ON clause "referenced" at the point you refer to them.

the word "INNER JOIN" is syntactic sugar for a comma (,), and ON is sytactic sugar for the word WHERE (or AND).

However, for OUTER JOINS the conditions in the ON clause "glue themselves" to the OUTER JOIN operation and ARE NEITHER syntactically or symantically the same as WHERE clause conditions.

That is to say the behaviour observed is how it is designed to work and you expectations are misguided.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Dinu
>Sent: Thursday, 4 January, 2018 12:29
>To: [hidden email]
>Subject: [sqlite] LEFT JOIN + WHERE / OR optimisation
>
>Hi all,
>I've ran into an optimisation problem with a double-left join that
>works as
>an "either" clause.
>
>The query is as follows:
>
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON <cond>
>LEFT JOIN
>  c ON <cond>
>WHERE
>  b.someId IN (1,2,3) OR
>  c.someId IN (4,5)
>
>This results in a bloated execution plan:
>SEARCH a
>SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
>SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX
>
>However, the semantically equivalent:
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON <cond> AND b.someId IN (1,2,3)
>LEFT JOIN
>  c ON <cond>AND c.someId IN (4,5)
>WHERE
>  b.someId IS NOT NULL OR
>  c.someId IS NOT NULL
>
>Gets the proper execution plan:
>SEARCH b
>SEARCH c
>EXECUTE LIST SUBQUERY
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
I think they are equivalent, if you look closer.

SELECT FROM a LEFT JOIN b ON a.x=b.x WHERE b.y=5 -is- equivalent to
SELECT FROM a JOIN b ON a.x=b.x AND b.y=5
SELECT FROM a JOIN b WHERE a.x=b.x AND b.y=5
SELECT FROM a LEFT JOIN b ON a.x=b.x AND b.y=5 WHERE b.y IS NOT NULL

All the above are semantically equivalent. When there is only one LEFT JOIN,
the presence of any non-null non-alternative condition on the joined table
in the WHERE clause transforms it in an INNER join. There is no other way to
have a non-null value except if the row exists. The reciprocal is not true
of course.

I don't know how difficult it is to compute the -OR- closure, as it is more
difficult. But for an imperative non-null condition, I did expect the WHERE
condition to be ported to the ON lookup for optimisation.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
Algebrically, having a non-null imperative lookup condition in the WHERE
clause means you have a stronger predicate on the same subject (ALL MUST fit
vs. ANY that fit).



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Keith Medcalf

They are not the same.  Just as 5 - 3 is not the same as 1 + 1, even though both come up with the same result, 2. by happenstance.

Your "where" condition is effectively converted an OUTER JOIN into an INNER JOIN through artifice (and quite likely mistake).  If you *want* an inner join, use an inner join.  If you want an outer join, use an outer join.  Just because subtraction of two different numbers may have the same result as addition of two other numbers, does not meant that addition and subtraction are the same thing.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Dinu
>Sent: Thursday, 4 January, 2018 16:01
>To: [hidden email]
>Subject: Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
>
>Algebrically, having a non-null imperative lookup condition in the
>WHERE
>clause means you have a stronger predicate on the same subject (ALL
>MUST fit
>vs. ANY that fit).
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
Thank you for your answer, Keith. I had my problem "fixed" before I wrote the
first mail. Also with every problem I also provided the fix that worked, for
anyone that might run into the same problem.
However, it's difficult to not get a little frustrated with your answer.

At https://sqlite.org/queryplanner.html I read:

"The best feature of SQL (in all its implementations, not just SQLite) is
that it is a declarative language, not a procedural language. When
programming in SQL you tell the system what you want to compute, not how to
compute it."

And I completely agree with this, "how to compute it" is called relational
algebra and it's what a query planner should do best. And the two queries
are algebrically identical. "(X ∊ S or X:=null) AND (X is not null)" is
equivalent to "X ∊ S is not null". The two queries might look different only
from an imperative programming point of view.

As to why the query is written that way: with the above in mind, I will
contend that there can absolutely never exist a "mistaken" way to write a
query, as long as the description of the predicates is correct and
consistent with the schema. You should consider that quite frequently
queries are the result of one or more levels of logic abstraction (ORM,
DBAL, etc). In my case, modifying the query was not difficult to do, but in
other cases one may have few options on rewriting the way the query
structure is generated. The only way to reduce a fabricated query is through
relational algebra, and that is up to the DB, not the programmer, not the
abstractions in-between.

In this particular case, the where is optional; depending on parameters, I
want the set of data that is correctly defined as the left join of tables a
and b, or I might want a subset of this join that has a particular property
over the left-joined set. The query was correctly written, to rewrite it so
that the query planner might know how to run it is wrong, IMHO.

To sum it up: I think it's every DB's intention to optimize as best possible
a query into an execution plan. None does it perfectly, but all try to, very
hard. With this intention, I reported a case where the query planner COULD
be improved. I think you will at least agree with me that making it better
can't be wrong. Whether that happens tomorrow, in a year or never, that's up
to the mercy, resources and priorities of the developers, so I am really am
not interested in an argue over this.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

R Smith-2

On 2018/01/05 4:24 AM, Dinu wrote:

> Thank you for your answer, Keith. I had my problem "fixed" before I wrote the
> first mail. Also with every problem I also provided the fix that worked, for
> anyone that might run into the same problem.
> However, it's difficult to not get a little frustrated with your answer.
>
> At https://sqlite.org/queryplanner.html I read:
>
> "The best feature of SQL (in all its implementations, not just SQLite) is
> that it is a declarative language, not a procedural language. When
> programming in SQL you tell the system what you want to compute, not how to
> compute it."

I'm sure his frustration is on a par. :)
While we've both stated that your queries are not equivalent, Keith took
the time to write an explanation of why that is, which seemingly did not
hit home, and now I will try again with an analogy:

You are essentially have a delivery person usually tasked to take a
truck, go to a farm and pick up a load of eggs. Now you ask the same
driver to use the same truck to go to the corner cafe and pick up 6
eggs, and then you exclaim "Wow, why he takes the truck? he can just
take the scooter!! The truck is sooo inefficient for this job!".

And yes, you are right, but the problem is you asked for it to be done
by truck. Now I agree the ideal in SQL is (as the quote above states)
that one should merely ask for a result and the engine should decide how
best to achieve it, but in practice there are many nuances in the
programming of the engine that thwarts this ideal, not to mention how
many programmers like to tweak their queries to get the engine to
execute the quickest, and there is nothing wrong with this. It does
however mean that the engine should in all circumstances, while trying
to find the best query plan, still adhere to the type of question that
was asked of it. You asked a METHOD A question, it won't (and shouldn't)
apply a METHOD B to reach the accidental similar result.

Second problem, your queries show the narrowest of use cases. The engine
has to work for ALL use cases which can get very involved and complex.
The engine needn't have another level of abstraction AI going "Oh this
one is simple, we will disregard what the programmer asked for and use
our own more simple query because it should get the same result."


> To sum it up: I think it's every DB's intention to optimize as best possible
> a query into an execution plan. None does it perfectly, but all try to, very
> hard. With this intention, I reported a case where the query planner COULD
> be improved. I think you will at least agree with me that making it better
> can't be wrong. Whether that happens tomorrow, in a year or never, that's up
> to the mercy, resources and priorities of the developers, so I am really am
> not interested in an argue over this.

It's a good idea to report possible improvements, and thank you for
that, but this case isn't able to improve since mangling an outer join
into an inner join when sometimes it might yield the same result is as
unsafe as it gets. However, that doesn't mean the devs (who would have
read all this) doesn't find something of interest and could possibly
think of a tweak that might improve things, so having this debate is
never a waste, but the specific algebraic essence of what you are
suggesting is not correct - 's all we're sayin.

Cheers!
Ryan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Richard Hipp-3
In reply to this post by Dinu
On 1/4/18, Dinu <[hidden email]> wrote:
>  I think it's every DB's intention to optimize as best possible
> a query into an execution plan. None does it perfectly, but all try to, very
> hard.

There are trade-offs here.  How much slower are you willing for
sqlite3_prepare() to run in order to get a better query plan?  How
much extra memory and disk space are you willing to allocation to
libsqlite3.so in order to get a better query plan?  Are you willing to
impose these costs on (literally) billions of other users that don't
really need the more advanced query planning?  These are hard
questions with no easy answers.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

E.Pasma
In reply to this post by Dinu
Dinu wrote:

> Hi all,
> I've ran into an optimisation problem with a double-left join that  
> works as
> an "either" clause.
>
> The query is as follows:
>
> SELECT *
> FROM
>  a
> LEFT JOIN
>  b ON <cond>
> LEFT JOIN
>  c ON <cond>
> WHERE
>  b.someId IN (1,2,3) OR
>  c.someId IN (4,5)
>
> This results in a bloated execution plan:
> SEARCH a
> SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
> SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX
>
> However, the semantically equivalent:
> SELECT *
> FROM
>  a
> LEFT JOIN
>  b ON <cond> AND b.someId IN (1,2,3)
> LEFT JOIN
>  c ON <cond>AND c.someId IN (4,5)
> WHERE
>  b.someId IS NOT NULL OR
>  c.someId IS NOT NULL
>
> Gets the proper execution plan:
> SEARCH b
> SEARCH c
> EXECUTE LIST SUBQUERY
>


Hello, the discussion about whether the two queries are equivalent is  
not satisfactory to me. What Keith sais

> Your "where" condition is effectively converted an OUTER JOIN into  
> an INNER JOIN ..

is true. But the OR condition makes this true for either the one or  
the other outer join. I hope this is what Dinu means here:

> And the two queries are algebrically identical. "(X ∊ S or X:=null)  
> AND (X is not null)" is
> equivalent to "X ∊ S is not null". The two queries might look  
> different only
> from an imperative programming point of view.

Anyway the two queries return the same set of rows.

Furthermore: what is a "bloated" execution plan?
I set up some test data and the query deamed bloated appears just as  
fast. See below.

This test also show a small semantic difference in the two queries.  
The set of rows is the same but the second query leaves certain  
details null if only one of the OR conditions is true. That occurs in  
row 1.
The outcome of the "bloated" execution plan is more complete.

Possibly I am too pragmatical and don't understand the discussion.

E. Pasma


My test script:

create table a (a integer primary key, ab, ac);
create table b (b integer primary key, d);
create table c (c integer primary key, d);
insert into a values (null,1,1);
insert into a select null, 2,2 from a;
insert into a select null, 3,2 from a;
insert into a select null, 4,4 from a;
insert into a select null, 5,5 from a;
insert into a select null, 6,6 from a;
insert into a select null, 7,7 from a;
insert into a select null, 8,8 from a;
insert into a select null, 9,9 from a;
insert into a select null, 10,10 from a;

insert into b values (1,1),(2,2),(3,3),(4,3),(6,3);
insert into c values (1,1),(4,5),(5,5),(7,1);
.eqp on
.timer on
SELECT *
FROM
  a
LEFT JOIN
  b ON b=ab
LEFT JOIN
  c ON c=ac
WHERE
  b.d IN (1,2,3) OR
  c.d IN (4,5)
;

SELECT *
FROM
  a
LEFT JOIN
  b ON b=ab AND b.d IN (1,2,3)
LEFT JOIN
  c ON c=ac AND c.d IN (4,5)
WHERE
  b.d IS NOT NULL OR
  c.d IS NOT NULL
;

Output:

--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1
1|1|1|1|1|1|1
2|2|2|2|2||
3|3|2|3|3||
4|3|2|3|3||
5|4|4|4|3|4|5
6|4|4|4|3|4|5
7|4|4|4|3|4|5
8|4|4|4|3|4|5
9|5|5|||5|5
10|5|5|||5|5
11|5|5|||5|5
12|5|5|||5|5
13|5|5|||5|5
14|5|5|||5|5
15|5|5|||5|5
16|5|5|||5|5
17|6|6|6|3||
18|6|6|6|3||
19|6|6|6|3||
20|6|6|6|3||
21|6|6|6|3||
22|6|6|6|3||
23|6|6|6|3||
24|6|6|6|3||
25|6|6|6|3||
26|6|6|6|3||
27|6|6|6|3||
28|6|6|6|3||
29|6|6|6|3||
30|6|6|6|3||
31|6|6|6|3||
32|6|6|6|3||
Run Time: real 0.003 user 0.001587 sys 0.000358
--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1
--EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
1|1|1|1|1||
2|2|2|2|2||
3|3|2|3|3||
4|3|2|3|3||
5|4|4|4|3|4|5
6|4|4|4|3|4|5
7|4|4|4|3|4|5
8|4|4|4|3|4|5
9|5|5|||5|5
10|5|5|||5|5
11|5|5|||5|5
12|5|5|||5|5
13|5|5|||5|5
14|5|5|||5|5
15|5|5|||5|5
16|5|5|||5|5
17|6|6|6|3||
18|6|6|6|3||
19|6|6|6|3||
20|6|6|6|3||
21|6|6|6|3||
22|6|6|6|3||
23|6|6|6|3||
24|6|6|6|3||
25|6|6|6|3||
26|6|6|6|3||
27|6|6|6|3||
28|6|6|6|3||
29|6|6|6|3||
30|6|6|6|3||
31|6|6|6|3||
32|6|6|6|3||
Run Time: real 0.002 user 0.001560 sys 0.000296








_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

David Raymond
> Anyway the two queries return the same set of rows.

> This test also show a small semantic difference in the two queries.  
> The set of rows is the same but the second query leaves certain  
> details null if only one of the OR conditions is true. That occurs in  
> row 1.

You're contradicting yourself there. If there's a difference in the results then they're not the same set of rows.

We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has (1, 1)

Best if viewed in a fixed-width font

Step by step version 1:

a left join b on b = ab

 a         b
 a ab ac   b  d
(1, 1, 1) (1, 1)

result:
 a ab ac  b  d
(1, 1, 1, 1, 1)

left join c on c = ab

                 c
 a ab ac  b  d   c  d
(1, 1, 1, 1, 1) (1, 1)

result:
 a ab ac  b  d  c  d
(1, 1, 1, 1, 1, 1, 1)

where b.d in (1, 2, 3) or c.d in (4, 5)

b.d is 1, so it passes
result:

 a ab ac  b  d  c  d
(1, 1, 1, 1, 1, 1, 1)

The OR worked. We got the values from both tables b, and c and because one of them was correct.



Now, Step by step version 2:

a left join b on b = ab and b.d in (1, 2, 3)

 a         b
 a ab ac   b  d
(1, 1, 1) (1, 1)

result:
 a ab ac  b  d
(1, 1, 1, 1, 1)  same so far (but only by coincidence)

left join c on c = ac and c.d in (4, 5)

                 c
 a ab ac  b  d   c  d
(1, 1, 1, 1, 1) (1, 1)

the ON condition doesn't match. Since this is an outer join, and there were no matches for the row in the left side, nulls are included

result:
 a ab ac  b  d    c     d
(1, 1, 1, 1, 1,  null, null)

where b.d is not null or c.d is not null

b.d isn't null, so that passes.
result:

 a ab ac  b  d    c     d
(1, 1, 1, 1, 1,  null, null)

So we get a row saying that a matched something in b, but we're throwing out the value from the c table, which is not what we wanted.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

E.Pasma
op 05-01-2018 17:23 schreef David Raymond op [hidden email]:

>> Anyway the two queries return the same set of rows.
>
>> This test also show a small semantic difference in the two queries.
>> The set of rows is the same but the second query leaves certain
>> details null if only one of the OR conditions is true. That occurs in
>> row 1.
>
> You're contradicting yourself there. If there's a difference in the results
> then they're not the same set of rows.
>
> We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has
> (1, 1)
>
> Best if viewed in a fixed-width font
>
> Step by step version 1:
>
> a left join b on b = ab
>
> a         b
> a ab ac   b  d
> (1, 1, 1) (1, 1)
>
> result:
> a ab ac  b  d
> (1, 1, 1, 1, 1)
>
> left join c on c = ab
>
> c
> a ab ac  b  d   c  d
> (1, 1, 1, 1, 1) (1, 1)
>
> result:
> a ab ac  b  d  c  d
> (1, 1, 1, 1, 1, 1, 1)
>
> where b.d in (1, 2, 3) or c.d in (4, 5)
>
> b.d is 1, so it passes
> result:
>
> a ab ac  b  d  c  d
> (1, 1, 1, 1, 1, 1, 1)
>
> The OR worked. We got the values from both tables b, and c and because one of
> them was correct.
>
>
>
> Now, Step by step version 2:
>
> a left join b on b = ab and b.d in (1, 2, 3)
>
> a         b
> a ab ac   b  d
> (1, 1, 1) (1, 1)
>
> result:
> a ab ac  b  d
> (1, 1, 1, 1, 1)  same so far (but only by coincidence)
>
> left join c on c = ac and c.d in (4, 5)
>
> c
> a ab ac  b  d   c  d
> (1, 1, 1, 1, 1) (1, 1)
>
> the ON condition doesn't match. Since this is an outer join, and there were no
> matches for the row in the left side, nulls are included
>
> result:
> a ab ac  b  d    c     d
> (1, 1, 1, 1, 1,  null, null)
>
> where b.d is not null or c.d is not null
>
> b.d isn't null, so that passes.
> result:
>
> a ab ac  b  d    c     d
> (1, 1, 1, 1, 1,  null, null)
>
> So we get a row saying that a matched something in b, but we're throwing out
> the value from the c table, which is not what we wanted.

Thanks, all clear except this last line. Did we not want the value to be
thrown out. Or not want the value?
It depends on that which query is favourite.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
In reply to this post by R Smith-2
Richard,

Thanks for acknowledging this, you are absolutely right, that's why I stated
that no DB does perfect optimisations and that computing the alternative
-OR- based closures are probably much harder to tackle. Also E. Pasma
pointed out the -OR- queries as I wrote them are not really semantically
equivalent unless the 2 joins are disjunct.

However, the case of the imperative WHERE NOT NULL implying INNER JOIN is
just a matter of replacing a predicate with a stronger one, so in all
fairness I imagined it a far lesser overhead than, say, the query flattener.
And I imagine it's a much more common situation, too, especially when users
are adding additional filters via WHERE clauses to a base query, so it might
benefit a lot of users, too. I know it would us, by not having to rewrite
these queries when porting; we are working on x86 servers, and a stick of
memory or a hard drive cost less than a programmer's day for us :)

For the extra memory, I know for computing relational closures the spatial
complexity can get big, but only when the structure of the query is written
warrants it in the first place, so it shouldn't manifest heavily on a query
that doesn't have this structure.

This is just my best view on this, obviously it's a political decision to be
made so it's no make-it-or-break-it thing, like mentioned before, we are
porting some pretty big system and when I notice differences with SQLite, I
jolt them down, in the hope it might benefit you or the millions of users,
if not by changing SQLite, then simply by pointing out the workaround to
other users, such as moving the WHERE condition out to the ON clause, it's
not necessarily a trivial thing to consider for everyone.

Ryan,

You cannot ask SQL a Method query, that's where my whole RDBMS understanding
takes me. It nullifies the purpose of queries as well as all efforts you
yourselves have put into a lot of things, query flattening to mention just
one. The "same result" is not accidental, the equivalent queries will
produce the same result no matter which data populates the tables. That is
the only deffinition I know of semantic equivalence. SQL is declarative and
thus everything that describes the same thing is the same thing.

E.Pasma,

Thanks for taking the time to make the TC. This is always a huge putdown for
me, because finally the execution plan depends on the data indexes are
populated with (via ANALYZE) and are tables are huge so it's always a
putdown for me to create a minimal TC.

Indeed I noticed just now the 2 queries are not equivalent that way :)
Thanks for pointing that out! I will work on an equivalent -and- optimized
rewrite :)

For the query plans though, here is where the index stats come in: here a
"SCAN a" makes sense, but in our case the number of records in a is on the
order of 10000x records to b and c, and also the cardinality of b.d and c.d
is on the order of 1000; so a "SEARCH b, SEARCH c" works out.

At minimum you should have indexes on b.d, c.d, a.ab, a.ac; but even so and
with adding another 1000 records on a, b and c and running the query:

EXPLAIN QUERY PLAN
SELECT *
FROM
  a
JOIN
  b ON b=ab AND b.d IN (1,2,3)
JOIN
  c ON c=ac AND c.d IN (4,5)

selectid |order |from |detail                                             |
---------|------|-----|---------------------------------------------------|
0        |0     |0    |SCAN TABLE a                                       |
0        |1     |1    |SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) |
0        |0     |0    |EXECUTE LIST SUBQUERY 1                            |
0        |2     |2    |SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) |

it still plans a "SCAN a" first. So I guess I'll have to backtrack from the
real data to generate a TC.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
Short sum-up:
- The -OR- alternative I provided is not semantically equivalent, I will
work on one that is :)
- The other one, without the -OR- (second post) still stands.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

R Smith-2
In reply to this post by Dinu
At the risk of preserving this thread well past its end of life cycle...

On 2018/01/05 6:58 PM, Dinu wrote:
> Ryan,
>
> You cannot ask SQL a Method query, that's where my whole RDBMS understanding
> takes me.

Everything you ask SQL is underpinned by a specific Method. Perhaps I
should have been more clear - by METHOD I mean, in the case of the join
method, "the /way/ you expect the query to enumerate rows" such as an
Inner join being one method, and outer join being another method.
Sorting/Ordering is a method of output, grouping, etc.

Why would the SQL standard propose these different methods if they were
not meaningful and distinct?

As I said before, there need not be an AI to judge that the query
uttered by the programmer can in fact, in a narrow case, be recomputed
as another query because the result will be the same and hopefully that
the alternate method would be more efficient.

Note that I said there needn't be... I did not say there /can't/ be one,
indeed, query flattening is a good example, but with query flattening
the cost is low and the reward is high for a really broad spectrum of cases.

Why did I say it is not needed?  Well, what you propose has a relatively
high cost (added heuristic AI) considering it is paid across all DB
engine query planning to achieve a small advantage in the narrowest of
use cases, not to mention that - should the programmer wish for a
speed-up for the left join that conforms to this narrow set of
circumstances, he or she could instantly change it to a normal join (the
way it should have been to start with) and enjoy the fruits of the added
speed with zero cost to the rest of us who wouldn't have made the
imperfect query in the first place.

Why do I call it narrow?  Have you looked at your example queries in
detail? Do you know how many things must be exactly just so (or how many
other normal query things must be absent) for that join replacement to
work algebraically? At least in the case of query flattening, it
improved a query construct that is found abundantly and considered the
correct construct for the expected results.

You essentially want the engine to second-guess programmers who didn't
write the best query for their expected results. That kind of
hand-holding belongs to the realms of Microsoft and MySQL.

(Apologies for all the word clarifications, but I'd rather avoid having
this turn into a "semantics" debate, so trying to be as clear as
possible on meanings of statements... not sure I succeeded though)  :)


Cheers,
Ryan

PS: I'm not judging MySQL, at least it has the benefit of being a fully
fledged server-side software and greatly tweak-able on the fly for all
its hand-holdy functionality.
PPS: I will say this - If you're not using MySQL in STRICT mode, you
are  n  hours away from some disaster, where n is a not-too-big positive
integer.
PPPS: I wish SQLite had a STRICT mode. :)



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Keith Medcalf
In reply to this post by Dinu

>Thanks for acknowledging this, you are absolutely right, that's why I
>stated
>that no DB does perfect optimisations and that computing the
>alternative
>-OR- based closures are probably much harder to tackle. Also E. Pasma
>pointed out the -OR- queries as I wrote them are not really
>semantically
>equivalent unless the 2 joins are disjunct.

I suspect that query re-write of an outer join to an inner join would be violating some rule in the new SQL standards, most likely about visitation (nested loop) order -- which is freely reorderable for INNER JOINS but cannot be re-ordered for OUTER JOINS.  Back in the "olden days" one specified outer join conditions in the WHERE clause using *= =* or *=* syntax, where the * was on the side of the operator where all rows came from.  This was deprecated many years ago when the <type> JOIN ON <conditions> syntatactic sugar was created because too many people where forgetting that they need to use the appropriate *<operator> to "bind" the where condition to the appropriate outer join binding, then complaining that it was too difficult to remember or figure out where a plain "WHERE" clause was needed and where a "OUTER JOIN" bound condition was required, resulting in many calls to SQL Database support lines.  And these same vendors also happen to be who write the specs, so they promptly changed the spec to eliminate the support calls.

So the *<operator>* format was written out of the standard and the OUTER JOIN ON became mandatory ONLY FOR OUTER JOIN operations.  In all other cases the JOIN ON syntax was just syntactic sugar for the old "list of tables" and where clause.  Don't recall exactly when this occurred but it was about two decades ago.  ( I remember it well because there were many queries that could not be expressed in the new-fangled format )

So really, adding conditions to apply to an OUTER JOIN to the where clause is equivalent to the common error of yester-decades of forgetting the *.

OLD SYNTAX:

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a

Of course, the old syntax allowed one to specify algebraic conditions that can no longer be expressed with the new syntax.  But that is OK, it is easier and simpler to say:  go RTFM, it is jolly clear, and if you don't like it then boo-hoo on you.  A good implementation was destroyed.  Live with it, love it, and get over it.

I would sincerely doubt that there is *any* SQL optimizer or query planner that can optimize mis-spoken queries containing OUTER JOINS.  Not even DB2's exhaustive search query planner/optimizer can do it, and it is quite possibly now one of the best in existence if you tell it that it can take unlimited time and resources to generate (and run) the plan.  All others pale in comparison.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
Keith Medcalf wrote
> but cannot be re-ordered for OUTER JOINS.

Actually, I think order requirements (or rather, row grouping requirements,
as far as I can paint it the requirement is just that all outer joined rows
come in a bunch for each main join tuple) would not be violated if the join
is made on an unique key left-side and an index is used right-side :) or
something similar. I don't know, extensive algebra must be involved :) Even
without index order inference, the main trunk keys can be sorted in a temp
structure to preserve the condition, like in a GROUP BY query. However, I do
see that SQLite seems to actually do it (scan b before a I mean) if I
reqrite the query as I showed.

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a
</>
Right, right, and with this in mind you can see my problem with the query is
so easy to understand:

My query, on the old format, is:

select *
from a, b
where
  a.a *= b.a AND
  b.c = 5

My "improved" query, on the old format:

select *
from a, b
where
  a.a *= b.a AND
  b.c *= 5 AND
  b.c = 5 // (OR IS NOT NULL)

You can see the b.c *= 5 (JOIN ON ... AND b.c=5 ... WHERE b.c IS NOT NULL)
is redundant, because it's just a weaker predicate, and I needed to add it
just as an index hint on the join loop to trigger the right execution plan.


Keith Medcalf wrote
> I would sincerely doubt that there is *any* SQL optimizer or query planner
> that can optimize mis-spoken queries containing OUTER JOINS.

I don't know about mis-spoken, I don't think anything is mis-spoken.
This app is running fine on Maria, I'm in the process of porting in to
SQLite. I wouldn't have picked on this query unless it was lagging behind
orders of magnitude (2.5s vs 50ms). So I think Maria does it (I haven't
bothered to check the execution plan there, went straight to hacking
SQLite).



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
To reiterate, Keith: to get the query to execute properly, I didn't change
the LEFT JOIN to an INNER JOIN!
Nope,
I rewrote

SELECT
FROM
  a
  LEFT JOIN b ON <key>
WHERE b.c=5

to

SELECT
FROM
  a
  LEFT JOIN b ON <key> AND b.c=5
WHERE b.c IS NOT NULL

So I just added a redundant predicate and it runs perfectly, on SQLite!
That's why I said this simple improvement can surely be taken care of on the
optimizer, while the larger discussion of actually changing the outer join
to an inner join or even tackling the -OR- case is for sure something nice
to think of, but increasingly more complicated.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12