LEFT JOIN + WHERE / OR optimisation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Keith Medcalf
>To reiterate, Keith: to get the query to execute properly, I didn't
>change
>the LEFT JOIN to an INNER JOIN!
>Nope,
>I rewrote
>
>SELECT
>FROM
>  a
>  LEFT JOIN b ON <key>
>WHERE b.c=5

This means:

Take all rows of table a
match with rows of table b on the key and where there is no match "fill" columns that would have come from b with a NULL
AND THEN AFTER THAT
return all rows where b.c = 5


>to

>SELECT
>FROM
>  a
>  LEFT JOIN b ON <key> AND b.c=5
>WHERE b.c IS NOT NULL

This translates to:
Take all rows of table a
match with rows of table b on the key AND where b.c = 5 and where there is no match "fill" columns that would have come from b with a NULL
AND THEN AFTER THAT
return only rows where b.c is not NULL

The primary difference is likely the test on b.c.  You see, to check if b.c is a particular value requires unpacking the row and retrieving the value.  Checking whether b.c IS NULL does not require unpacking the data but merely reading the 'datatype', which should be a much faster operation.

If you wrote the query as an inner join it would be much faster since it would not be possible after projection for b.c to be anything other than not null.

>So I just added a redundant predicate and it runs perfectly, on
>SQLite!

>That's why I said this simple improvement can surely be taken care of
>on the
>optimizer, while the larger discussion of actually changing the outer
>join
>to an inner join or even tackling the -OR- case is for sure something
>nice
>to think of, but increasingly more complicated.

I think the query is ill-conceived when written as an outer join.  You are asking the query to "project" a result containing a crap-load of rows which you will subsequently remove from the result set.  For bitty-bases (a couple of hundred rows) this is irrelevant.  However, for tables containing millions of rows, creating a googillion of rows then culling out all of them except 4 is a very bad use of computer resources.  Why generate them in the first place if you are just going to discard them?

I have seen many a mis-designed program which works blazingly quickly on a "test" database that contains a dozen customers, half-a-dozen products, with two or three components each.  But when populated with the production data of 1,000,000 customers, 500,000 products, and between 100 and 10,000 components per product frankly take "years" to run any step and are completely unuseable.

One should always design as if each table contained a googillion rows.  If it is fast enough then, it will certainly be efficient enough to handle your ten row per table database.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect intentionality the query should have been written as
such:

SELECT FROM (
  SELECT FROM a LEFT JOIN b
) WHERE b.c=5




--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
In reply to this post by Keith Medcalf
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  

/There was some text here that seems to have been lost and my browser opened
a "super resume templates" page... maybe I've been already spectre-hacked?
Anyway, I repost:/

The primary difference from what I see is that it changes the query plan to
start off with an index-based search on b, not a search (scan) on a. To me,
this can be due to the supplimentary condition in the ON clause making this
plan more appealing to the optimizer. So with this in mind, an easy
optimisation would be to just "permeate" any imperative non-null condition
in the where clause to the on clause. I don't know if the execution plan is
identical to that of an inner join (I never tried to actually convert it to
an inner join and I'm working on sth else atm), but it's satisfactory and it
starts off the right way, with a search on b, with an execution time well in
the expected margins.


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect intentionality the query should have been written as
such:

SELECT FROM (
  SELECT FROM a LEFT JOIN b
) WHERE b.c=5




--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT JOIN + WHERE / OR optimisation

Dinu
In reply to this post by Keith Medcalf
Another reason to do at least the LEFT JOIN + WHERE -> INNER JOIN
optimisation:
If a view is based on a LEFT JOIN, running a WHERE query on it will exhibit
the same poor behavior and here there will be no way to rewrite the query.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12