LEFT joins affect query plan for semantically equal queries

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

LEFT joins affect query plan for semantically equal queries

vitalif
Hi!

After playing a little with SQLite as a DBMS for Bugzilla, I've
discovered that LEFT/INNER join affects query plan in a bad way even for
semantically equal queries:

SELECT * FROM bugs b INNER JOIN profiles p ON p.userid=b.assigned_to
WHERE p.login_name='[hidden email]'

Query plan:
SEARCH TABLE profiles AS p USING INDEX profiles_login_name_idx
(login_name=?)
   SEARCH TABLE bugs AS b USING INDEX bugs_assigned_to_idx
(assigned_to=?)

But

SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to
WHERE p.login_name='[hidden email]'

Query plan:
SCAN TABLE bugs AS b
   SEARCH TABLE profiles AS p USING INTEGER PRIMARY KEY (rowid=?)

Which is of course very slow.

Maybe you'll fix it in later sqlite versions? :)

--
With best regards,
   Vitaliy Filippov
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

Simon Slavin-3

On 5 Nov 2014, at 12:13pm, [hidden email] wrote:

> Which is of course very slow.

Can you please run ANALYZE then try the plans again ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

Richard Hipp-3
In reply to this post by vitalif
On Wed, Nov 5, 2014 at 7:13 AM, <[hidden email]> wrote:

> Hi!
>
> After playing a little with SQLite as a DBMS for Bugzilla, I've discovered
> that LEFT/INNER join affects query plan in a bad way even for semantically
> equal queries:
>

I'm not sure what you mean by "semantically equal", but INNER JOIN and LEFT
JOIN do compute different answers in many cases.  And in particular, the
first query plan below (the one for INNER JOIN) will compute the wrong
answer for a LEFT JOIN.

SQLite seems to be doing the right thing here.


>
> SELECT * FROM bugs b INNER JOIN profiles p ON p.userid=b.assigned_to WHERE
> p.login_name='[hidden email]'
>
> Query plan:
> SEARCH TABLE profiles AS p USING INDEX profiles_login_name_idx
> (login_name=?)
>   SEARCH TABLE bugs AS b USING INDEX bugs_assigned_to_idx (assigned_to=?)
>
> But
>
> SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to WHERE
> p.login_name='[hidden email]'
>
> Query plan:
> SCAN TABLE bugs AS b
>   SEARCH TABLE profiles AS p USING INTEGER PRIMARY KEY (rowid=?)
>
> Which is of course very slow.
>
> Maybe you'll fix it in later sqlite versions? :)
>
> --
> With best regards,
>   Vitaliy Filippov
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

Clemens Ladisch
In reply to this post by vitalif
[hidden email] wrote:
> SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to WHERE p.login_name='[hidden email]'
>
> Query plan:
> SCAN TABLE bugs AS b
>   SEARCH TABLE profiles AS p USING INTEGER PRIMARY KEY (rowid=?)
>
> Which is of course very slow.
>
> Maybe you'll fix it in later sqlite versions?

To find bugs that do not have a matching profiles row, the database
always must go through the entire bugs table.

The WHERE expression then makes the outer join meaningless, but SQLite's
query optimizer does not protect you from the consequences of writing
inconsistent queries; if you don't want an outer join, omit the LEFT in
the first place.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

R Smith
In reply to this post by vitalif

On 2014/11/05 14:13, [hidden email] wrote:

> Hi!
>
> After playing a little with SQLite as a DBMS for Bugzilla, I've discovered that LEFT/INNER join affects query plan in a bad way
> even for semantically equal queries:
>
> SELECT * FROM bugs b INNER JOIN profiles p ON p.userid=b.assigned_to WHERE p.login_name='[hidden email]'
>
> Query plan:
> SEARCH TABLE profiles AS p USING INDEX profiles_login_name_idx (login_name=?)
>   SEARCH TABLE bugs AS b USING INDEX bugs_assigned_to_idx (assigned_to=?)
>
> But
>
> SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to WHERE p.login_name='[hidden email]'
>
> Query plan:
> SCAN TABLE bugs AS b
>   SEARCH TABLE profiles AS p USING INTEGER PRIMARY KEY (rowid=?)
>
> Which is of course very slow.

These queries are very different, not equal in any way (semantically or otherwise), the fact that they produce the exact same answer
is simply by virtue of your WHERE clause being specifically that and your table data being being special. Drop the where clause and
they produce very different results for different table data.

To clarify - Producing the same results does not imply semantic equality, if I asked you to find me some people of a specific kind
in a specific bus, and someone else asks you to find the same type of people in the New-York downtown traffic... it "may" end up
that those just happen to be the exact same people at the end, but the methods which you employ to look for those people will be
significantly different, the result coinciding does not make the syntax semantically equal.

Telling SQL (of any flavour) to "Left join" is asking for a very different method of computation than when asking it to "inner join"
and it should use the best tools for the request, it cannot see into the future to know the result will be fine if it stealthily
just does the other method. (Though Analyze might help it to see a bit clearer and take that decision differently).

Why would you use left join for that? Maybe this is just a really small extract from the real query you are doing and the real one
would shed more light?

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

Richard Hipp-3
In reply to this post by Clemens Ladisch
On Wed, Nov 5, 2014 at 8:12 AM, Clemens Ladisch <[hidden email]> wrote:

>
> The WHERE expression then makes the outer join meaningless,
>

Thank you, Clemens - I missed that detail.

So the suggestion is that we should enhance the SQLite query planner to
detect when the WHERE clause requires some column in the right-hand table
of a LEFT JOIN to be non-NULL and then promote that join to a normal INNER
JOIN.  That seems like a reasonable enhancement request which we will take
under advisement.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

David Woodhouse
In reply to this post by R Smith
On Wed, 2014-11-05 at 15:13 +0200, RSmith wrote:

> On 2014/11/05 14:13, [hidden email] wrote:
> > Hi!
> >
> > After playing a little with SQLite as a DBMS for Bugzilla, I've discovered that LEFT/INNER join affects query plan in a bad way
> > even for semantically equal queries:
> >
> > SELECT * FROM bugs b INNER JOIN profiles p ON p.userid=b.assigned_to WHERE p.login_name='[hidden email]'
> >
> > Query plan:
> > SEARCH TABLE profiles AS p USING INDEX profiles_login_name_idx (login_name=?)
> >   SEARCH TABLE bugs AS b USING INDEX bugs_assigned_to_idx (assigned_to=?)
> >
> > But
> >
> > SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to WHERE p.login_name='[hidden email]'
> >
> > Query plan:
> > SCAN TABLE bugs AS b
> >   SEARCH TABLE profiles AS p USING INTEGER PRIMARY KEY (rowid=?)
> >
> > Which is of course very slow.
>
> These queries are very different, not equal in any way (semantically
> or otherwise), the fact that they produce the exact same answer
> is simply by virtue of your WHERE clause being specifically that and
> your table data being being special. Drop the where clause and
> they produce very different results for different table data.
I don't think it's anything to do with the table data being special, is
it?

Isn't it generically true that for any LEFT JOIN of a,b WHERE b.anything
IS NOT NULL, the results are going to be equal with an INNER JOIN?

> Why would you use left join for that? Maybe this is just a really
> small extract from the real query you are doing and the real one
> would shed more light?

In some ways this looks very similar to the optimisation I was asking
about a few weeks ago. The query is coming from the user, and is being
"translated" into SQL from whatever the user puts into the actual UI. We
normally don't make users type SQL into a text box :)

In this specific query the filter criteria just *happen* to exclude
unmatched results from the profiles tables and thus it turns out we
*could* use an INNER JOIN and get the same results.

You are right, we *could* make a query planner of our own, and squeeze
it in between the simple UI->SQL translation and the SQL database. It
could spot these optimisations and automatically change the type of join
where it sees that it could.

Or we could hope that the SQL database has a query planner of its own
which can do such optimisations... :)

--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

R Smith

On 2014/11/05 15:26, David Woodhouse wrote:
> On Wed, 2014-11-05 at 15:13 +0200, RSmith wrote:
> I don't think it's anything to do with the table data being special, is it? Isn't it generically true that for any LEFT JOIN of
> a,b WHERE b.anything IS NOT NULL, the results are going to be equal with an INNER JOIN?

Yes, I was simply pointing to the fact that if indeed you had NULL values and omit the specific WHERE clause the results will depend
on your table. You also need to look through the entire A table in a Left join because NULLs in the B table does not disqualify
them, unless of course there is specifically either a NOT NULL or ColVal='Someting Specific'. I am now not sure if these would be
the only set of cases for the optimisation, but they are the only ones coming to mind currently.

> Or we could hope that the SQL database has a query planner of its own which can do such optimisations... :)

Quite a reasonable request if the optimisation can be boiled down to a specific always working set of rules. As I mention above, the
ones coming to mind is that the WHERE clause is simple and specifically excludes NULLs. Even if NULLs are not specifically excluded,
the column schema might include "NOT NULL" for that column which may also incur the optimisation (if it doesn't already), or even
have the sqlite_statx analyze point out column that do not contain any nulls, or rather not as that may change. I know there is an
optimisation step in SQLite NGQP that promotes ON clauses to WHERE clause statements (except for Left Joins for this reason), though
more complex queries might have some caveats which eludes me currently. Also if several left-joins exist in the same query, a light
implementation of this optimisation might influence the truth of the result (and of course I am not implying that the implementation
would be light, just that the optimisation might need a lot of work). It also doesn't affect people who know when to not use a left
join, but that is never a reason not to implement an obvious optimisation (as your use case illustrates).

In fact this may allow much wider use of Left-joins which is always the preferred join method system-side because it doesn't hide
missing/unlinked items. Ok, I convinced myself, +1 to the request. :)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

David Woodhouse
On Wed, 2014-11-05 at 16:00 +0200, RSmith wrote:

> On 2014/11/05 15:26, David Woodhouse wrote:
> > On Wed, 2014-11-05 at 15:13 +0200, RSmith wrote:
> > I don't think it's anything to do with the table data being special,
> is it? Isn't it generically true that for any LEFT JOIN of
> > a,b WHERE b.anything IS NOT NULL, the results are going to be equal
> with an INNER JOIN?
>
> Yes, I was simply pointing to the fact that if indeed you had NULL
> values and omit the specific WHERE clause the results will depend
> on your table.
Right. The proposed optimisation does fundamentally depend on having a
WHERE clause which discards all the additional records which the LEFT
JOIN includes, that the INNER JOIN would not.

Put simply, you can only change LEFT JOIN to INNER JOIN if you're going
to throw away the records from the output which make them different.

The optimiser's detection of such a WHERE clause could actually start
off being relatively simplistic (perhaps only notice that it can make
this optimisation if it sees an explicit IS NOT NULL on a column from
the right-hand table), and could grow to recognise more cases over time
as required. False negatives don't really hurt; they're just a missed
optimisation.

For example, I don't think anyone would be losing sleep if it initially
failed to spot that 'WHERE b.foo IS NOT NULL OR b.bar IS NOT NULL' also
means the optimisation could kick in. Or 'WHERE a.foo = b.bar' when
a.foo is actually a "NOT NULL" field.

> In fact this may allow much wider use of Left-joins which is always
> the preferred join method system-side because it doesn't hide
> missing/unlinked items. Ok, I convinced myself, +1 to the request. :)

If this is going to end up in the tracker, could I also request that the
optimisations from here be tracked too, please:
 https://www.mail-archive.com/sqlite-users@.../msg86350.html
 https://www.mail-archive.com/sqlite-users@.../msg86643.html


--
dwmw2

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

Keith Medcalf
In reply to this post by vitalif
On Wednesday, 5 November, 2014 05:14, [hidden email] said:

>After playing a little with SQLite as a DBMS for Bugzilla, I've
>discovered that LEFT/INNER join affects query plan in a bad way even for
>semantically equal queries:

>SELECT * FROM bugs b INNER JOIN profiles p ON p.userid=b.assigned_to
>WHERE p.login_name='[hidden email]'

>Query plan:
>SEARCH TABLE profiles AS p USING INDEX profiles_login_name_idx
>(login_name=?)
>   SEARCH TABLE bugs AS b USING INDEX bugs_assigned_to_idx
>(assigned_to=?)

>But

>SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to
>WHERE p.login_name='[hidden email]'

>Query plan:
>SCAN TABLE bugs AS b
>   SEARCH TABLE profiles AS p USING INTEGER PRIMARY KEY (rowid=?)

>Which is of course very slow.

The two queries are different.  They may end up with the same result, but you are asking different questions.  In the first you are returning only matching rows.  In the later you are requesting a projection (outer join) then applying the conditional (where) to the projection.  An index on profiles.userid will speed it up, but you are still asking a different question.  You cannot change the table visitation order around a left outer join because it changes the question.

You ought to obtain the same result with

SELECT *
  FROM bugs b CROSS JOIN profiles p
 WHERE p.userid=b.assigned_to
   AND p.login_name='[hidden email]'

as well, it is the same inner join.  However you have precluded the optimizer from re-ordering the tables in the join by using the cross join keyword, so the plan will be the same as the second (left outer join) rather than the first (equijoin) query.

>Maybe you'll fix it in later sqlite versions? :)

---
Theory is when you know everything but nothing works.  Practice is when everything works but no one knows why.  Sometimes theory and practice are combined:  nothing works and no one knows why.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

vitalif
In reply to this post by Simon Slavin-3
> Can you please run ANALYZE then try the plans again ?

This was just after running ANALYZE :)

> the fact that they produce the exact same answer is simply by virtue of  
> your WHERE clause being specifically that

Of course, I understand, that's what I've meant - the plan shouldn't  
differ for this specific query with this specific condition...

> Why would you use left join for that? Maybe this is just a really small  
> extract from the real query you are doing and the real one would shed  
> more light?

This is an autogenerated query, the conditions are user-specified, so I  
must analyze them to determine whether to use INNER or LEFT join in each  
case if SQLite does not do that. In fact that's what I've implemented at  
once in my fork of bugzilla (bugzilla4intranet) to optimise for SQLite,  
and in most cases it works.

But I think SQLite could do this kind of optimisation better inside  
itself, and most (if not all) client-server DBMSes (even MySQL) actually  
do it - their plans don't differ for the same query.

> Even if NULLs are not specifically excluded, the column schema might  
> include "NOT NULL" for that column which may also incur the optimisation

No-no, even if a column is NOT NULL in an outer-joined table, it may still  
be null in the result if no matching row is found.

So the condition is simple: turn outer join into an inner if the WHERE  
clause excludes NULLs.

--
With best regards,
   Vitaliy Filippov
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

James K. Lowden
In reply to this post by Keith Medcalf
On Wed, 05 Nov 2014 08:24:47 -0700
"Keith Medcalf" <[hidden email]> wrote:

> The two queries are different.  They may end up with the same result,
> but you are asking different questions.  In the first you are
> returning only matching rows.  In the later you are requesting a
> projection (outer join) then applying the conditional (where) to the
> projection.  

I'm surprised to see you say that, Keith, because it's not true.  

An SQL query is one piece, integral and whole.  There's no
before & after, no first this then that. Projection and selection are
independent operations and can be applied in either order, at the
discretion of the implementation. The only requirement, as you know, is
that the criteria be met.

> An index on profiles.userid will speed it up, but you are still
> asking a different question.  You cannot change the table visitation
> order around a left outer join because it changes the question.

Actually, he can't change the visitation order because he doesn't
*control* the visitation order.  That's up to SQLite.  He can only ask a
question.  

Equivalent to an outer join is a union.  Let's look at it that way
(simplified slightly for clarity):

>SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to
>WHERE p.login_name='[hidden email]'

        select b.*, p.login_name
        from bugs as b join profiles as p
        on p.userid=b.assigned_to
        where login_name='[hidden email]'
        UNION
        select *, NULL
        from bugs
        where assigned_to not in (select userid from profiles)
        and NULL = '[hidden email]'

Same question, differently expressed.  How much work would you suggest
the system do to answer the second part of that query?  ;-)  

--jkl


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: LEFT joins affect query plan for semantically equal queries

Keith Medcalf
On Wednesday, 5 November, 2014 22:23, James Lowden said:
>On Wed, 05 Nov 2014 08:24:47 -0700, "Keith Medcalf" <[hidden email]> wrote:

>> The two queries are different.  They may end up with the same result,
>> but you are asking different questions.  In the first you are
>> returning only matching rows.  In the later you are requesting a
>> projection (outer join) then applying the conditional (where) to the
>> projection.

>I'm surprised to see you say that, Keith, because it's not true.

>An SQL query is one piece, integral and whole.  There's no
>before & after, no first this then that. Projection and selection are
>independent operations and can be applied in either order, at the
>discretion of the implementation. The only requirement, as you know, is
>that the criteria be met.

You are, or course, entirely correct.  Premature optimization based on known implementation details is hard to avoid though.

>> An index on profiles.userid will speed it up, but you are still
>> asking a different question.  You cannot change the table visitation
>> order around a left outer join because it changes the question.

>Actually, he can't change the visitation order because he doesn't
>*control* the visitation order.  That's up to SQLite.  He can only ask a
>question.

Again, that is true.

>Equivalent to an outer join is a union.  Let's look at it that way
>(simplified slightly for clarity):

>>SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to
>>WHERE p.login_name='[hidden email]'
>
> select b.*, p.login_name
> from bugs as b join profiles as p
> on p.userid=b.assigned_to
> where login_name='[hidden email]'
> UNION
> select *, NULL
> from bugs
> where assigned_to not in (select userid from profiles)
> and NULL = '[hidden email]'
>
>Same question, differently expressed.  How much work would you suggest
>the system do to answer the second part of that query?  ;-)

Hehehe.  None because it cannot possibly have any results.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users