Why are two select statements 2000 times faster than one?

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Why are two select statements 2000 times faster than one?

Steinar Midtskogen
Hello,

I have a table with "unix_time" as primary key and I want to get the
minimum and maximum values of "unix_time".  When I do:

  SELECT min(unix_time), max(unix_time) from table;

it is very slow.  It takes about 250ms, nearly everything in the
step() call.

However, if I do:

  SELECT min(unix_time) FROM table; SELECT max(unix_time) FROM table;

to get the same values, it takes a fraction of the time.  The speedup
is more than 2000x.

Why?

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Marc L. Allen
Maybe the query analyzer isn't smart enough to do two seeks in this case, so it does a scan?

> -----Original Message-----
> From: [hidden email] [mailto:sqlite-users-
> [hidden email]] On Behalf Of Steinar Midtskogen
> Sent: Friday, April 13, 2012 3:00 PM
> To: [hidden email]
> Subject: [sqlite] Why are two select statements 2000 times faster than
> one?
>
> Hello,
>
> I have a table with "unix_time" as primary key and I want to get the
> minimum and maximum values of "unix_time".  When I do:
>
>   SELECT min(unix_time), max(unix_time) from table;
>
> it is very slow.  It takes about 250ms, nearly everything in the
> step() call.
>
> However, if I do:
>
>   SELECT min(unix_time) FROM table; SELECT max(unix_time) FROM table;
>
> to get the same values, it takes a fraction of the time.  The speedup
> is more than 2000x.
>
> Why?
>
> --
> Steinar
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Alessandro Marzocchi
What does EXPLAIN QUERY PLAN says?


Il giorno 13 aprile 2012 21:04, Marc L. Allen
<[hidden email]>ha scritto:

> Maybe the query analyzer isn't smart enough to do two seeks in this case,
> so it does a scan?
>
> > -----Original Message-----
> > From: [hidden email] [mailto:sqlite-users-
> > [hidden email]] On Behalf Of Steinar Midtskogen
> > Sent: Friday, April 13, 2012 3:00 PM
> > To: [hidden email]
> > Subject: [sqlite] Why are two select statements 2000 times faster than
> > one?
> >
> > Hello,
> >
> > I have a table with "unix_time" as primary key and I want to get the
> > minimum and maximum values of "unix_time".  When I do:
> >
> >   SELECT min(unix_time), max(unix_time) from table;
> >
> > it is very slow.  It takes about 250ms, nearly everything in the
> > step() call.
> >
> > However, if I do:
> >
> >   SELECT min(unix_time) FROM table; SELECT max(unix_time) FROM table;
> >
> > to get the same values, it takes a fraction of the time.  The speedup
> > is more than 2000x.
> >
> > Why?
> >
> > --
> > Steinar
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
Alessandro Marzocchi <[hidden email]> writes:

> What does EXPLAIN QUERY PLAN says?

sqlite> EXPLAIN QUERY PLAN SELECT min(unix_time) FROM table;
0|0|0|SEARCH TABLE table USING INTEGER PRIMARY KEY (~1 rows)

sqlite> EXPLAIN QUERY PLAN SELECT max(unix_time) FROM table;
0|0|0|SEARCH TABLE table USING INTEGER PRIMARY KEY (~1 rows)

sqlite> EXPLAIN QUERY PLAN SELECT min(unix_time), max(unix_time) FROM table;
0|0|0|SCAN TABLE table (~1000000 rows)

I suppose a query for a single min/max gets optimised, while a query
involving multiple columns doesn't.

I have a much bigger table as well, and on that one the speedup is in
the millions to run two SELECTs.  It's hard to guess that there will
be such a difference, but I suppose I should be happy that there is at
least an optimised way to get min and max for the integer primary key.

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Igor Tandetnik
In reply to this post by Steinar Midtskogen
On 4/13/2012 2:59 PM, Steinar Midtskogen wrote:

> I have a table with "unix_time" as primary key and I want to get the
> minimum and maximum values of "unix_time".  When I do:
>
>    SELECT min(unix_time), max(unix_time) from table;
>
> it is very slow.  It takes about 250ms, nearly everything in the
> step() call.
>
> However, if I do:
>
>    SELECT min(unix_time) FROM table; SELECT max(unix_time) FROM table;
>
> to get the same values, it takes a fraction of the time.  The speedup
> is more than 2000x.

If you want to do it with a single query (say, to minimize disturbance
to existing code), you could make it

select (SELECT min(unix_time) FROM table), (SELECT max(unix_time) FROM
table);

I'm pretty sure this will get executed the fast way.
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Puneet Kishor-2
In reply to this post by Steinar Midtskogen
Try the following

sqlite> EXPLAIN QUERY PLAN SELECT Min(a) FROM t UNION ALL SELECT Max(a) FROM t;
selectid|order|from|detail
1|0|0|SEARCH TABLE t USING INTEGER PRIMARY KEY (~1 rows)
2|0|0|SEARCH TABLE t USING INTEGER PRIMARY KEY (~1 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

Should be a lot faster than a single query without UNION.

If you want the results in separate columns, you can do something like

SELECT Min(a) minimum, 'none' maximum FROM t UNION ALL SELECT 'none' minimum, Max(a) minimum FROM t;

On Apr 13, 2012, at 2:44 PM, Steinar Midtskogen wrote:

> Alessandro Marzocchi <[hidden email]> writes:
>
>> What does EXPLAIN QUERY PLAN says?
>
> sqlite> EXPLAIN QUERY PLAN SELECT min(unix_time) FROM table;
> 0|0|0|SEARCH TABLE table USING INTEGER PRIMARY KEY (~1 rows)
>
> sqlite> EXPLAIN QUERY PLAN SELECT max(unix_time) FROM table;
> 0|0|0|SEARCH TABLE table USING INTEGER PRIMARY KEY (~1 rows)
>
> sqlite> EXPLAIN QUERY PLAN SELECT min(unix_time), max(unix_time) FROM table;
> 0|0|0|SCAN TABLE table (~1000000 rows)
>
> I suppose a query for a single min/max gets optimised, while a query
> involving multiple columns doesn't.
>
> I have a much bigger table as well, and on that one the speedup is in
> the millions to run two SELECTs.  It's hard to guess that there will
> be such a difference, but I suppose I should be happy that there is at
> least an optimised way to get min and max for the integer primary key.
>
> --
> Steinar
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
Puneet Kishor <[hidden email]> writes:

> If you want the results in separate columns, you can do something like
>
> SELECT Min(a) minimum, 'none' maximum FROM t UNION ALL SELECT 'none' minimum, Max(a) minimum FROM t;

Then it does a full scan again.

But Igor's suggestion "SELECT (SELECT min(unix_time) FROM table),
(SELECT max(unix_time) FROM table)" works fine, and means less code.

Thanks!

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Puneet Kishor-2

On Apr 13, 2012, at 3:14 PM, Steinar Midtskogen wrote:

> Puneet Kishor <[hidden email]> writes:
>
>> If you want the results in separate columns, you can do something like
>>
>> SELECT Min(a) minimum, 'none' maximum FROM t UNION ALL SELECT 'none' minimum, Max(a) minimum FROM t;
>
> Then it does a full scan again.
>
> But Igor's suggestion "SELECT (SELECT min(unix_time) FROM table),
> (SELECT max(unix_time) FROM table)" works fine, and means less code.
>


Yes, Igor's suggestion is definitely better, but where is the full table scan?

        sqlite> EXPLAIN QUERY PLAN SELECT Min(a) FROM t UNION ALL SELECT Max(a) FROM t;
        selectid|order|from|detail
        1|0|0|SEARCH TABLE t USING INTEGER PRIMARY KEY (~1 rows)
        2|0|0|SEARCH TABLE t USING INTEGER PRIMARY KEY (~1 rows)
        0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

Am I missing something?


--
Puneet Kishor

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Dan Kennedy-4
In reply to this post by Steinar Midtskogen
On 04/14/2012 03:14 AM, Steinar Midtskogen wrote:

> Puneet Kishor<[hidden email]>  writes:
>
>> If you want the results in separate columns, you can do something like
>>
>> SELECT Min(a) minimum, 'none' maximum FROM t UNION ALL SELECT 'none' minimum, Max(a) minimum FROM t;
>
> Then it does a full scan again.
>
> But Igor's suggestion "SELECT (SELECT min(unix_time) FROM table),
> (SELECT max(unix_time) FROM table)" works fine, and means less code.

This:

   http://www.sqlite.org/optoverview.html#minmax

Both the subqueries qualify for the optimization, so the overall
query is fast. With the UNION ALL version, the second column in the
result set disqualifies both sides from using the optimization. So
it is slow.

I think if you were to change the UNION ALL version to the following
it would be just as fast as the sub-selects.

   SELECT Min(a) minimum FROM t
     UNION ALL
   SELECT Max(a) minimum FROM t;




_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
In reply to this post by Steinar Midtskogen
Hello again,

Another question about max()/min() optimisation.  Is there a way I can
implement a virtual table so that max()/min() of a sorted
(incrementing) column (which could be an integer primary key in a
regular table) gets fast?

For example,

sqlite> explain query plan select max(unix_time) from vtab;
0|0|0|SEARCH TABLE vtab VIRTUAL TABLE INDEX 0: (~1 rows)

Currently, "select max(unix_time) from vtab" causes SQLite to search
through millions of rows, which may take nearly half a minute for my
table, no faster than other non-sorted columns.

I've added special treatment of this sorted "unix_time" column in
xBestIndex, so that a query like:

 select max(unix_time) from vtab where unix_time > strftime("%s", "2012-04-14");

runs fast (i.e. then my table will only look through a few rows at the end).


Perhaps what I'm asking is whether it's possible to add a special
treatment for max() and min() in a virtual table.

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Simon Slavin-3

On 15 Apr 2012, at 1:31pm, Steinar Midtskogen <[hidden email]> wrote:

> Another question about max()/min() optimisation.  Is there a way I can
> implement a virtual table so that max()/min() of a sorted
> (incrementing) column (which could be an integer primary key in a
> regular table) gets fast?

The max() and min() functions work instantly if the columns they're looking at are indexed.  They simply find the first or last entry in the index.  So when defining your virtual table routines, just make sure your key columns have an index, and that your xBestIndex method finds the right index.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
[Simon Slavin]

> On 15 Apr 2012, at 1:31pm, Steinar Midtskogen <[hidden email]> wrote:
>
>> Another question about max()/min() optimisation.  Is there a way I can
>> implement a virtual table so that max()/min() of a sorted
>> (incrementing) column (which could be an integer primary key in a
>> regular table) gets fast?
>
> The max() and min() functions work instantly if the columns they're
> looking at are indexed.  They simply find the first or last entry in
> the index.  So when defining your virtual table routines, just make
> sure your key columns have an index, and that your xBestIndex method
> finds the right index.

According to the "Using SQLite" book, "you cannot create an index on
a view or on a virtual table".

Also, when declaring the virtual table using sqlite3_declare_vtab the
book says: "any constraints, default values, or key definitions within
the table definition are also ignord - this includes any definition of
INTEGER PRIMARY KEY as a ROWID alias".

So, is there really a way to create an index in a virtual table, or a
way to emulate this?

My xRowid function simply returns the value of the "unix_time" column,
but even "select max(rowid)" is equally slow.
--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Kit-18
2012/4/15 Steinar Midtskogen <[hidden email]>:
> So, is there really a way to create an index in a virtual table, or a
> way to emulate this?

Why? You don't need this. Use index on base tables.

> My xRowid function simply returns the value of the "unix_time" column,
> but even "select max(rowid)" is equally slow.
> Steinar

Why you need "select max(rowid)"? Something is wrong in your data
design. Use autoincrement.
--
Kit
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Gerry Snyder-4
In reply to this post by Steinar Midtskogen
At worst you could use another table to keep track of the maximum and
minimum, and update it with triggers when something is added to or deleted
from the virtual table.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
In reply to this post by Kit-18
[Kit]

> 2012/4/15 Steinar Midtskogen <[hidden email]>:
>> So, is there really a way to create an index in a virtual table, or a
>> way to emulate this?
>
> Why? You don't need this. Use index on base tables.

My base tables are indexed.  Let's say I want to make a very simple
virtual table this way:

 create virtual table vtab using copy(indexed_table);

which simply maps any query for vtab to indexed_table and returns that.
So let's say that indexed_table have an integer column "key" which
also a primary key.  So "select max(key) from indexed_table" will be
fast no matter how big it is and the module can find this value in a
blink.  What I would like to is to have "select max(key) from vtab"
run fast as well, without having to run through the billion rows in
index.

So what happens when I run "select max(key) from vtab"?  Well, all
xFilter will know is that it needs to produce the "key" column, and there
should be a "order by key" clause as well, but even if we can assume
that what we're dealing with is a sorted column, and xFilter could
look up the max in no time, xFilter doesn't know that the query is for
the max value.  Can my module do anything better than to produce all
the rows for sqlite to feed into the max aggregate function?

>> My xRowid function simply returns the value of the "unix_time" column,
>> but even "select max(rowid)" is equally slow.
>> Steinar
>
> Why you need "select max(rowid)"? Something is wrong in your data
> design. Use autoincrement.

I don't need it, but a virtual table must provide one.  I'm not sure
why.

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
In reply to this post by Gerry Snyder-4
[Gerry Snyder]

> At worst you could use another table to keep track of the maximum and
> minimum, and update it with triggers when something is added to or deleted
> from the virtual table.

My module knows what the maximum and minimum values are at all times.
It also knows that the column is sorted.  The trouble is that it
doesn't know that the values it can produce for that column will be
fed to a max() and min() function.  If it did, it could simply just
return one value.

I might be missing something, though.

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster thanone?

ProgenyUSA
In reply to this post by Steinar Midtskogen
Testing for my e mail address

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Steinar Midtskogen
Sent: 15 April 2012 18:38
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Why are two select statements 2000 times faster
thanone?

[Kit]

> 2012/4/15 Steinar Midtskogen <[hidden email]>:
>> So, is there really a way to create an index in a virtual table, or a
>> way to emulate this?
>
> Why? You don't need this. Use index on base tables.

My base tables are indexed.  Let's say I want to make a very simple virtual
table this way:

 create virtual table vtab using copy(indexed_table);

which simply maps any query for vtab to indexed_table and returns that.
So let's say that indexed_table have an integer column "key" which also a
primary key.  So "select max(key) from indexed_table" will be fast no matter
how big it is and the module can find this value in a blink.  What I would
like to is to have "select max(key) from vtab"
run fast as well, without having to run through the billion rows in index.

So what happens when I run "select max(key) from vtab"?  Well, all xFilter
will know is that it needs to produce the "key" column, and there should be
a "order by key" clause as well, but even if we can assume that what we're
dealing with is a sorted column, and xFilter could look up the max in no
time, xFilter doesn't know that the query is for the max value.  Can my
module do anything better than to produce all the rows for sqlite to feed
into the max aggregate function?

>> My xRowid function simply returns the value of the "unix_time"
>> column, but even "select max(rowid)" is equally slow.
>> Steinar
>
> Why you need "select max(rowid)"? Something is wrong in your data
> design. Use autoincrement.

I don't need it, but a virtual table must provide one.  I'm not sure why.

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2012.0.1901 / Virus Database: 2411/4938 - Release Date: 04/15/12

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Why are two select statements 2000 times faster than one?

Steinar Midtskogen
In reply to this post by Steinar Midtskogen
To answer my own question, whether there is an efficient way to find
max() of an increasingly sorted column in a virtual array: What is
needed is to make sure that xBestIndex sets orderByConsumed, and that
the module takes care of all sorting.

--
Steinar
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users