Timing issue with min, max and count

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Timing issue with min, max and count

curmudgeon

I have a table with 2.4 million records. It’s a without rowid table (I don’t know if that’s significant) with an integer primary key (ID) and several secondary indexes of the form (OtherCol, ID). If I run

select min(ID), max(ID), count(*) from BigTbl;

It takes 0.67 secs

If I run the three commands separately

select min(ID) from BigTbl;
select max(ID) from BigTbl;
select count(*) from BigTbl;

the TOTAL time to run all 3 is around 0.1 secs.

explain query plan select min(ID) from BigTbl; suggests the primary key is used
explain query plan select max(ID) from BigTbl; suggests the primary key is used
explain query plan select count(*) from BigTbl; suggests a secondary index (call it Ndx) is used

Any combo also seems to use secondary index Ndx e.g.

explain query plan select min(ID), max(ID) from BigTbl;
and
explain query plan select min(ID), max(ID), count(*) from BigTbl;

both use secondary index Ndx.

All come up with the correct answer but obviously when Ndx is used min and max require checking all values of ID rather than obtaining the result from first and last entries in primary key.

Tom



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Timing issue with min, max and count

Richard Hipp-3
On 12/8/17, x <[hidden email]> wrote:
>
> I have a table with 2.4 million records. It’s a without rowid table (I don’t
> know if that’s significant) with an integer primary key (ID) and several
> secondary indexes of the form (OtherCol, ID). If I run

(1) If you are using INTEGER PRIMARY KEY, you should *not* be using a
WITHOUT ROWID.  You can.  It will get the correct answer.  But it will
be unnecessarily slower.

(2) You should never include a single-column PRIMARY KEY as part of an
index.  The database engine will do that for you automatically.

>
> select min(ID), max(ID), count(*) from BigTbl;
>
> It takes 0.67 secs
>
> If I run the three commands separately
>
> select min(ID) from BigTbl;
> select max(ID) from BigTbl;
> select count(*) from BigTbl;


SQLite internally performs special optimizations for the first two
cases, transforming them as follows:

(A) SELECT id FROM bittbl ORDER BY id LIMIT 1;
(B) SELECT id FROM bigtbl ORDER BY id DESC LIMIT 1;

And isolated "count(*)" without a WHERE uses a separate special
optimization in the storage engine that allows it to count the number
of entries in the btree very quick.

But these special optimizations only apply when min(), max(), and
count(*) are used in isolation.  Hence, they do not work for the first
query above that uses all three functions at one.

>
> the TOTAL time to run all 3 is around 0.1 secs.
>
> explain query plan select min(ID) from BigTbl; suggests the primary key is
> used
> explain query plan select max(ID) from BigTbl; suggests the primary key is
> used
> explain query plan select count(*) from BigTbl; suggests a secondary index
> (call it Ndx) is used
>
> Any combo also seems to use secondary index Ndx e.g.
>
> explain query plan select min(ID), max(ID) from BigTbl;
> and
> explain query plan select min(ID), max(ID), count(*) from BigTbl;
>
> both use secondary index Ndx.
>
> All come up with the correct answer but obviously when Ndx is used min and
> max require checking all values of ID rather than obtaining the result from
> first and last entries in primary key.
>
> Tom
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Timing issue with min, max and count

Simon Slavin-3
In reply to this post by curmudgeon


On 8 Dec 2017, at 12:20pm, x <[hidden email]> wrote:

> I have a table with 2.4 million records. It’s a without rowid table (I don’t know if that’s significant) with an integer primary key (ID) and several secondary indexes of the form (OtherCol, ID). If I run
>
> select min(ID), max(ID), count(*) from BigTbl;
>
> It takes [far longer than the individual SELECTs take in total)

Are you doing this in your own programming language ?

If so, can you try obtaining timings using the SQLite command-line tool ?  If you want to post them please tell us which version of SQLite you’re using.

You might find the output of

EXPLAIN QUERY PLAN select min(ID), max(ID), count(*) from BigTbl;

helps you (and us, if you want to post it) to understand what’s happening.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Timing issue with min, max and count

Simon Slavin-3


On 8 Dec 2017, at 1:17pm, Simon Slavin <[hidden email]> wrote:

> helps you (and us, if you want to post it) to understand what’s happening.

Please ignore my post.  Dr H explain your situation exactly.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Timing issue with min, max and count

curmudgeon
In reply to this post by Richard Hipp-3
>But these special optimizations only apply when min(), max(), and
count(*) are used in isolation.  Hence, they do not work for the first
query above that uses all three functions at one.

Thanks Richard.

>(1) If you are using INTEGER PRIMARY KEY, you should *not* be using a
WITHOUT ROWID.  You can.  It will get the correct answer.  But it will
be unnecessarily slower.

I created that table when i first started learning SQLite because I thought it would take up less space. I later read what you mention in (1) above and went back and rebuilt the table minus the WITHOUT ROWID. It then took up more space and there was no noticeable speed gain so I reverted back. As I said, it was a while ago but I’ll revisit the whole affair when I’ve more time.


>(2) You should never include a single-column PRIMARY KEY as part of an
index.  The database engine will do that for you automatically.

That’s something else I only realised recently.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users