Join performance in SQLite

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Join performance in SQLite

D. Richard Hipp
There has been a recent flurry of comments about SQLite at

     http://www.reddit.com/r/programming/comments/8oed5/how_sqlite_is_tested/
     http://news.ycombinator.com/item?id=633151

One of the criticisms of SQLite is that it is slow to do joins.  That  
is true if SQLite is unable to figure out how to use an index to speed  
the join.  I was under the impression that SQLite actually did a  
fairly reasonable job of making use of indices, if they exist.  But  
without indices, an k-way join takes time proportional to N^k.

Do other SQL database engines not have this same limitation?  Are  
MySQL and PostgreSQL and Firebird and MS-SQL and Oracle creating  
phantom indices on-the-fly to help them do joins faster, for example?  
Or do their optimizers do a better job of finding ways to use indices  
in a join?  Can somebody supply me with specific examples of joins  
that other database engines do efficiently but that SQLite does  
slowly?  Is join efficiency really a frustration to many SQLite users?

Curiously, in some of our own internal tests, SQLite is much, much  
faster than MS-SQL, MySQL, and PostgreSQL for k-way joins where k is  
large - greater than 20 or 30.  (SQLite can handle up to a 64-way  
join.)  This is because SQLite uses a O(k*k) greedy algorithm for  
selecting the ordering of tables in the join whereas the other guys  
all do a much more extensive search.  So the performance loss in the  
other engines is due to the excessive time spent in the query planner,  
not the time actually running the query.  SQLite can plan a 64-way  
join in the blink of an eye, whereas PostgreSQL requires several  
minutes.

But for the tests described in the previous paragraph, there were  
always good indices so that the time to actually run the join was  
approximately linear.  What about situations where you have a 4- or 5-
way join on tables that are not indexed?  Do other database engines  
handle those more efficiently than SQLite somehow?  Is this something  
we need to look into?

D. Richard Hipp
[hidden email]



_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Pavel Ivanov-2
> Do other SQL database engines not have this same limitation?  Are
> MySQL and PostgreSQL and Firebird and MS-SQL and Oracle creating
> phantom indices on-the-fly to help them do joins faster, for example?

Sort of. There's 2 types of join methods in Oracle for this - Hash
joins and Sort merge joins - when server creates in memory (or in
temporary storage in general) sorted data for one or both merging data
sets and then merges these sets. You can read about it here:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i51523.
Not sure though if it's worth to implement such technique in SQLite.

Pavel

On Sat, May 30, 2009 at 11:11 AM, D. Richard Hipp <[hidden email]> wrote:

> There has been a recent flurry of comments about SQLite at
>
>     http://www.reddit.com/r/programming/comments/8oed5/how_sqlite_is_tested/
>     http://news.ycombinator.com/item?id=633151
>
> One of the criticisms of SQLite is that it is slow to do joins.  That
> is true if SQLite is unable to figure out how to use an index to speed
> the join.  I was under the impression that SQLite actually did a
> fairly reasonable job of making use of indices, if they exist.  But
> without indices, an k-way join takes time proportional to N^k.
>
> Do other SQL database engines not have this same limitation?  Are
> MySQL and PostgreSQL and Firebird and MS-SQL and Oracle creating
> phantom indices on-the-fly to help them do joins faster, for example?
> Or do their optimizers do a better job of finding ways to use indices
> in a join?  Can somebody supply me with specific examples of joins
> that other database engines do efficiently but that SQLite does
> slowly?  Is join efficiency really a frustration to many SQLite users?
>
> Curiously, in some of our own internal tests, SQLite is much, much
> faster than MS-SQL, MySQL, and PostgreSQL for k-way joins where k is
> large - greater than 20 or 30.  (SQLite can handle up to a 64-way
> join.)  This is because SQLite uses a O(k*k) greedy algorithm for
> selecting the ordering of tables in the join whereas the other guys
> all do a much more extensive search.  So the performance loss in the
> other engines is due to the excessive time spent in the query planner,
> not the time actually running the query.  SQLite can plan a 64-way
> join in the blink of an eye, whereas PostgreSQL requires several
> minutes.
>
> But for the tests described in the previous paragraph, there were
> always good indices so that the time to actually run the join was
> approximately linear.  What about situations where you have a 4- or 5-
> way join on tables that are not indexed?  Do other database engines
> handle those more efficiently than SQLite somehow?  Is this something
> we need to look into?
>
> D. Richard Hipp
> [hidden email]
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Mark Hamburg-2
Assuming memory is sufficiently inexpensive, I would think that it  
would almost always be useful to build an index for any field in a  
join rather than doing a full scan. (Or better yet, build a hash table  
if memory is sufficient.) Indices maintained in the database then  
become optimizations to avoid starting the query with an index build.

Mark

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

John Elrick-2
In reply to this post by D. Richard Hipp
D. Richard Hipp wrote:

> There has been a recent flurry of comments about SQLite at
>
>      http://www.reddit.com/r/programming/comments/8oed5/how_sqlite_is_tested/
>      http://news.ycombinator.com/item?id=633151
>
> One of the criticisms of SQLite is that it is slow to do joins.  That  
> is true if SQLite is unable to figure out how to use an index to speed  
> the join.  I was under the impression that SQLite actually did a  
> fairly reasonable job of making use of indices, if they exist.  But  
> without indices, an k-way join takes time proportional to N^k.
>  

We're finishing a system which produces auto generated queries where k
can potentially range from 1 to 16.  The only times I have seen Sqlite
slowdown were when the indexes needed tweaking.

I noticed one of the comments made was that a three inner join query
took 10 minutes and that the joins should have filtered the table,
however, I did not notice any example, contrived or obfuscated, which
would demonstrate the issue.  This absence means we must rely upon the
author's interpretation of what the query did being accurate.  The
author makes the assertion that each progressive inner join should make
the search table smaller, however, just because that was the intent does
not make it the reality.  The fact that temp tables with no apparent
indexing ran faster makes me question whether or not the initial
assumption was true.

I've been wrong far too often about the actual vs theoretical of my code
in operation to accept anything such as this at face value; however,
that caveat would apply from both directions.  The author said he
reported it as a bug, which implies he presented a repeatable test
case.  If that is so, that specific test case might merit further
examination.

FWIW


John
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Simon Slavin-2
I'm interested in how sqlite works differently to the SQL systems  
which keep a daemon running as a background task.  One of the  
advantages of having a daemon which persists between runs of an  
application is that the daemon can keep its own list of ORDERs, and  
JOINs which are asked for frequently, and decide to maintain them even  
when no SQL-using application is running.  This can give the  
impression that something is being done very quickly, when in fact the  
majority of the time was taken during a previous run of the  
application.  It can be particularly hard to figure out what a  
performance test means under these circumstances.

But the problem is that I like the way sqlite works.  I like the tiny  
library, I like the way that the SQL library is entirely inside my  
application, and any CPU load is mine.  I like knowing that when my  
app quits, nothing is going on.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Jim Wilcoxson
SQLite has surprised me with its quick performance, not the other way
around.  In fact, I've implemented all kinds of lookup queries that I
knew could be optimized by caching results so I didn't have to keep
repeating the SQL query, but the performance was so good even
repeating the queries that I never bothered with the caching.

I'm sure there are queries that SQLite doesn't run as fast as database
product X, and I'm sure it goes the other way too.  It's a balancing
act, and as the primary developer, you have to choose for us what's
important to optimize and what isn't.

So far, I'm very happy with the choices and trade-offs that have been
made in SQLite. :-)

Jim

On 5/30/09, Simon Slavin <[hidden email]> wrote:

> I'm interested in how sqlite works differently to the SQL systems
> which keep a daemon running as a background task.  One of the
> advantages of having a daemon which persists between runs of an
> application is that the daemon can keep its own list of ORDERs, and
> JOINs which are asked for frequently, and decide to maintain them even
> when no SQL-using application is running.  This can give the
> impression that something is being done very quickly, when in fact the
> majority of the time was taken during a previous run of the
> application.  It can be particularly hard to figure out what a
> performance test means under these circumstances.
>
> But the problem is that I like the way sqlite works.  I like the tiny
> library, I like the way that the SQL library is entirely inside my
> application, and any CPU load is mine.  I like knowing that when my
> app quits, nothing is going on.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


--
Software first.  Software lasts!
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Nicolas Williams
In reply to this post by Simon Slavin-2
On Sat, May 30, 2009 at 07:01:31PM +0100, Simon Slavin wrote:
> I'm interested in how sqlite works differently to the SQL systems  
> which keep a daemon running as a background task.  One of the  
> advantages of having a daemon which persists between runs of an  
> application is that the daemon can keep its own list of ORDERs, and  
> JOINs which are asked for frequently, and decide to maintain them even  
> when no SQL-using application is running.  [...]

You don't need a daemon to do that.  One could use a special table in
the database itself (much like the master table) to keep statistics
about all sorts of things.

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Florian Weimer
In reply to this post by D. Richard Hipp
* D. Richard Hipp:

> One of the criticisms of SQLite is that it is slow to do joins.  That  
> is true if SQLite is unable to figure out how to use an index to speed  
> the join.  I was under the impression that SQLite actually did a  
> fairly reasonable job of making use of indices, if they exist.  But  
> without indices, an k-way join takes time proportional to N^k.
>
> Do other SQL database engines not have this same limitation?  Are  
> MySQL and PostgreSQL and Firebird and MS-SQL and Oracle creating  
> phantom indices on-the-fly to help them do joins faster, for example?  

PostgreSQL roughly does one of the following (when dealing with a
two-way join):

  * If one side of the join is estimated to be a small set, PostgreSQL
    performs a sequential scan on it, hashes it, and joins the other
    table in a hash join.

  * If both sides are large, each side is sorted, and a merge join is
    performed.

Things go horribly wrong if the estimates are off and the wrong plan
is picked.

There's also a nested loop join (which would be what SQLite does), but
I haven't seen it in recent version.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Thomas Briggs-2
In reply to this post by D. Richard Hipp
   As others have already mentioned, hash joins can help in a
situation where there are no appropriate indexes.  They can make
things worse if the inputs aren't large enough though, so there's
still some gray area.

   The biggest thing that other databases have going for them - MSSQL
and Oracle at least - is parallelism.  If you've got 8 or 16 or 32
threads available to you, and plenty of RAM to boot, it's often faster
to ignore the indexes and either hash join or nested loop join subsets
of the affected tables.  Thus situations where there are no indexes
seem better too, and SQLite can look bad in comparison.  'tis the
price paid for being a zero-config embedded database vs. a full-blown
client/server database system, that's all.

   -T

On Sat, May 30, 2009 at 11:11 AM, D. Richard Hipp <[hidden email]> wrote:

> There has been a recent flurry of comments about SQLite at
>
>     http://www.reddit.com/r/programming/comments/8oed5/how_sqlite_is_tested/
>     http://news.ycombinator.com/item?id=633151
>
> One of the criticisms of SQLite is that it is slow to do joins.  That
> is true if SQLite is unable to figure out how to use an index to speed
> the join.  I was under the impression that SQLite actually did a
> fairly reasonable job of making use of indices, if they exist.  But
> without indices, an k-way join takes time proportional to N^k.
>
> Do other SQL database engines not have this same limitation?  Are
> MySQL and PostgreSQL and Firebird and MS-SQL and Oracle creating
> phantom indices on-the-fly to help them do joins faster, for example?
> Or do their optimizers do a better job of finding ways to use indices
> in a join?  Can somebody supply me with specific examples of joins
> that other database engines do efficiently but that SQLite does
> slowly?  Is join efficiency really a frustration to many SQLite users?
>
> Curiously, in some of our own internal tests, SQLite is much, much
> faster than MS-SQL, MySQL, and PostgreSQL for k-way joins where k is
> large - greater than 20 or 30.  (SQLite can handle up to a 64-way
> join.)  This is because SQLite uses a O(k*k) greedy algorithm for
> selecting the ordering of tables in the join whereas the other guys
> all do a much more extensive search.  So the performance loss in the
> other engines is due to the excessive time spent in the query planner,
> not the time actually running the query.  SQLite can plan a 64-way
> join in the blink of an eye, whereas PostgreSQL requires several
> minutes.
>
> But for the tests described in the previous paragraph, there were
> always good indices so that the time to actually run the join was
> approximately linear.  What about situations where you have a 4- or 5-
> way join on tables that are not indexed?  Do other database engines
> handle those more efficiently than SQLite somehow?  Is this something
> we need to look into?
>
> D. Richard Hipp
> [hidden email]
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Join performance in SQLite

Wiktor Adamski
In reply to this post by D. Richard Hipp

> Do other SQL database engines not have this same limitation?" Are MySQL
> and PostgreSQL and Firebird and MS-SQL and Oracle creating phantom
> indices on-the-fly to help them do joins faster, for example?" Or do
> their optimizers do a better job of finding ways to use indices in a
> join?" Can somebody supply me with specific examples of joins that other
> database engines do efficiently but that SQLite does slowly?"
 
Acoording to SQLite wiki other databases do better job without indices:
"
Test 6: INNER JOIN without an index
SELECT t1.a FROM t1 INNER JOIN t2 ON t1.b=t2.b;
SQLite 3.3.3 (sync):    14.473
SQLite 3.3.3 (nosync):  14.445
SQLite 2.8.17 (sync):   47.776
SQLite 2.8.17 (nosync): 47.750
PostgreSQL 8.1.2:       0.176
MySQL 5.0.18 (sync):    3.421
MySQL 5.0.18 (nosync):  3.443
FirebirdSQL 1.5.2:      0.141
"

> Is join
> efficiency really a frustration to many SQLite users?

Generally not, however the behaviour could be more user friendly. The way is I use SQLite is probably not common becase I don't write queries - apllication's users write them. I also deal with quite large data. The biggest problem with the way joins work is with subqueries. If flattening cannot be done the query runs slow. For example I was told by an user that joins on views are really slow (on large data it means that doesn't work at all).
The are other minor problems:
1. Creating indices on every (possibly very large) table makes database file much bigger that it would be if SQLite used temporary indices created before query is run.
2. Database users need to know how exaclty how SQLite work. That is not problem if programmer write queries, but can be a problem if database is used by a mathematician who doesn't really care about it and simply wants to do some calculations.

----------------------------------------------------------------------
Chcesz miec nawigacje GPS ?
Zamow lub przedluz umowe na neostrade, a nawigacja bedzie Twoja.
Kliknij na link po szczegoly! http://link.interia.pl/f219a


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users