Feature request: merge joins (preferably through a hint)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Feature request: merge joins (preferably through a hint)

Davor Josipovic
Are there any plans to implement merge joins in sqlite? As far as I am aware, only nested loops are currently supported.

Merge joins could be an incredible optimization in some cases for large queries and would make sqlite much faster in such cases.

Personally, I would like to have this option rather as a sql HINT, than as an optimizer option, since the optimizer now, small and efficient as it is, does a great job. The merge join HINT could be used to greatly optimize specific queries - i.e. queries where poking the index many many times on an already ordered set is inefficient.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: merge joins (preferably through a hint)

Richard Hipp-3
On 11/5/17, Davor Josipovic <[hidden email]> wrote:
> Merge joins could be an incredible optimization in some cases for large
> queries and would make sqlite much faster in such cases.

SQLite does do a merge in some cases, though not for what you would
traditionally call a join.  For example, SQLite will do a merge to
combine the two halves of this query:

    SELECT a,b,c FROM tab1 UNION SELECT x,y,z FROM tab2 ORDER BY 1,2,3;

You are thinking that perhaps queries such as the following might be
faster using a merge:

    SELECT * FROM tab1 JOIN tab2 ON tab1.a=tab2.x;

I disagree.  In order to do this as a merge, we'd need indexes on both
tab1.a and tab2.x.  (In order for the merge to be practical, and not
require an arbitrary amount of auxiliary storage, both indexes would
need to be UNIQUE.)  But if you already either one of those two
indexes, then the nested-loop join will already be blazing fast.  It
is difficult to see how switching to a merge join would make it go any
faster.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: merge joins (preferably through a hint)

Simon Slavin-3


On 5 Nov 2017, at 11:04am, Richard Hipp <[hidden email]> wrote:

> SQLite does do a merge in some cases, though not for what you would
> traditionally call a join.  For example, SQLite will do a merge to
> combine the two halves of this query:
>
>    SELECT a,b,c FROM tab1 UNION SELECT x,y,z FROM tab2 ORDER BY 1,2,3;

In case it’s not clear from the above, SQL processes the UNION before the ORDER BY clause.   If "merge join" didn’t already have a definition, we could use the term for that.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: merge joins (preferably through a hint)

Davor Josipovic
In reply to this post by Davor Josipovic
>  You are thinking that perhaps queries such as the following might
> be faster using a merge:

>
>     SELECT * FROM tab1 JOIN tab2 ON tab1.a=tab2.x;

>

> I disagree.


I don't see any reason to disagree. Merge join will definitely be faster if the data is already sorted. See the reference: https://en.wikipedia.org/wiki/Sort-merge_join. It is a linear time operation.


What sqlite does now is for each "a" it searches through the index for "x". This search operation is logarithmic time. If there are many records in tab1, then this stacks and becomes quasilinear time. I experience this constantly with sqlite data wrangling and tab1 and 2 in the millions. sqlite's nested loops are very fast, but the joins _could_ be made much faster with merge joins in such situations. I just wish I had this hint...
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Feature request: merge joins (preferably through a hint)

Simon Slavin-3
On 7 Nov 2017, at 7:59am, Davor Josipovic <[hidden email]> wrote:

> What sqlite does now is for each "a" it searches through the index for "x".

If an ideal index already exists, accessing the correct records will be fast.  If one does not exist, how would you expect a merge join to be any faster ?

There are specific cases where a merge join is faster than using JOIN … ORDER BY.  For that to happen, both source tables must already have indexes ideally suited to the merge join, and the rows which you’re going to want returned must be a very large proportion of both source tables, probably the whole tables.  Also, SQLite has to be aware of those facts, it cannot simply assume them.

Except for the above cases, existing formats will be just as fast, and can be far faster, especially in cases where the rows wanted do not represent most of the rows of the existing tables.

Merge joins also represent a problem where you have to compare the two available rows.  There’s no good way to know what the programmer means by this, especially in cases involving many columns and collation methods.  Assumptions have to be made and whatever the development team picks is sure to annoy some users.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users