Is there a performance difference between COUNT(*) and COUNT(name)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Is there a performance difference between COUNT(*) and COUNT(name)

Cecil Westerhof-5
I want to know the number of teas I have in stock. For this I use:
SELECT COUNT(Tea)
FROM   teaInStock

Tea cannot be NULL, so this is the same as:
SELECT COUNT(*)
FROM   teaInStock

​But I find the first more clear.
I almost always see the second variant. Is this because it is more
efficient, or are people just ‘lazy’?​

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is there a performance difference between COUNT(*) and COUNT(name)

R Smith
On 2017/09/05 10:21 PM, Cecil Westerhof wrote:

> I want to know the number of teas I have in stock. For this I use:
> SELECT COUNT(Tea)
> FROM   teaInStock
>
> Tea cannot be NULL, so this is the same as:
> SELECT COUNT(*)
> FROM   teaInStock
>
> ​But I find the first more clear.
> I almost always see the second variant. Is this because it is more
> efficient, or are people just ‘lazy’?​
>

Nothing to do with laziness - depends on what is wanted:
COUNT(*) counts the rows in the DB that matches the filter (WHERE clause).
COUNT(Tea) counts the Tea column only and will return only the number of
values that are not NULL.

A short script to demonstrate the differences:

   -- SQLite version 3.17.0  [ Release: 2017-02-13 ]  on SQLitespeed
version 2.0.2.4.
   --
================================================================================================

CREATE TABLE t(ID INT, Tea TEXT);

INSERT INTO t(ID, Tea) VALUES
(1, 'Ceylon'),
(2, Null),
(3, 'Earl Grey'),
(4, NULL),
(5, 'Jasmine')
;

SELECT * FROM t;


   --      ID      | Tea
   -- ------------ | ---------
   --       1      | Ceylon
   --       2      | NULL
   --       3      | Earl Grey
   --       4      | NULL
   --       5      | Jasmine

SELECT COUNT(*) FROM t;


   --   COUNT(*)
   -- ------------
   --       5

SELECT COUNT(Tea) FROM t;


   --  COUNT(Tea)
   -- ------------
   --       3

SELECT COUNT(*) FROM t WHERE Tea IS NULL;


   --   COUNT(*)
   -- ------------
   --       2

DROP TABLE t;

   --
================================================================================================



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is there a performance difference between COUNT(*) and COUNT(name)

John McKown
In reply to this post by Cecil Westerhof-5
On Tue, Sep 5, 2017 at 3:21 PM, Cecil Westerhof <[hidden email]>
wrote:

> I want to know the number of teas I have in stock. For this I use:
> SELECT COUNT(Tea)
> FROM   teaInStock
>
> Tea cannot be NULL, so this is the same as:
> SELECT COUNT(*)
> FROM   teaInStock
>
> ​But I find the first more clear.
> I almost always see the second variant. Is this because it is more
> efficient, or are people just ‘lazy’?​
>

​The first seems more efficient. Example using EXPLAIN:

sqlite> create table data (tea text);
sqlite> insert into data(tea) values("Lampsang Souchung");
sqlite> insert into data(tea) values("Keemun");
sqlite> explain select count(*) from data;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     7     0                    00  Start at 7
1     OpenRead       1     2     0     1              00  root=2 iDb=0
2     Count          1     1     0                    00  r[1]=count()
3     Close          1     0     0                    00
4     Copy           1     2     0                    00  r[2]=r[1]
5     ResultRow      2     1     0                    00  output=r[2]
6     Halt           0     0     0                    00
7     Transaction    0     0     1     0              01  usesStmtJournal=0
8     TableLock      0     2     0     data           00  iDb=0 root=2
write=0
9     Goto           0     1     0                    00
sqlite> explain select count(tea) from data;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    00  Start at 12
1     Null           0     1     2                    00  r[1..2]=NULL
2     OpenRead       0     2     0     1              00  root=2 iDb=0; data
3     Rewind         0     7     0                    00
4       Column         0     0     3                    00  r[3]=data.tea
5       AggStep0       0     3     1     count(1)       01  accum=r[1]
step(r[3])
6     Next           0     4     0                    01
7     Close          0     0     0                    00
8     AggFinal       1     1     0     count(1)       00  accum=r[1] N=1
9     Copy           1     4     0                    00  r[4]=r[1]
10    ResultRow      4     1     0                    00  output=r[4]
11    Halt           0     0     0                    00
12    Transaction    0     0     1     0              01  usesStmtJournal=0
13    TableLock      0     2     0     data           00  iDb=0 root=2
write=0
14    Goto           0     1     0                    00
sqlite> drop table data;
sqlite> -- see if NOT NULL makes a difference
sqlite> create table data(tea text not null);
sqlite> insert into data(tea) values("Lapsang Souchung");
sqlite> insert into data(tea) values("Keemun");
sqlite> explain select count(*) from data;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     7     0                    00  Start at 7
1     OpenRead       1     2     0     1              00  root=2 iDb=0
2     Count          1     1     0                    00  r[1]=count()
3     Close          1     0     0                    00
4     Copy           1     2     0                    00  r[2]=r[1]
5     ResultRow      2     1     0                    00  output=r[2]
6     Halt           0     0     0                    00
7     Transaction    0     0     3     0              01  usesStmtJournal=0
8     TableLock      0     2     0     data           00  iDb=0 root=2
write=0
9     Goto           0     1     0                    00
sqlite> explain select count(Tea) from data;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    00  Start at 12
1     Null           0     1     2                    00  r[1..2]=NULL
2     OpenRead       0     2     0     1              00  root=2 iDb=0; data
3     Rewind         0     7     0                    00
4       Column         0     0     3                    00  r[3]=data.tea
5       AggStep0       0     3     1     count(1)       01  accum=r[1]
step(r[3])
6     Next           0     4     0                    01
7     Close          0     0     0                    00
8     AggFinal       1     1     0     count(1)       00  accum=r[1] N=1
9     Copy           1     4     0                    00  r[4]=r[1]
10    ResultRow      4     1     0                    00  output=r[4]
11    Halt           0     0     0                    00
12    Transaction    0     0     3     0              01  usesStmtJournal=0
13    TableLock      0     2     0     data           00  iDb=0 root=2
write=0
14    Goto           0     1     0                    00
sqlite>


​NOT NULL doesn't make a difference. The EXPLAIN shows the operations, and
the first seems to be much more efficient; fewer steps & no loop.​



>
> --
> Cecil Westerhof
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Caution! The OP is an hyperpolysyllabicsesquipedalianist and this email may
cause stress to those with hippopotomonstrosesquipedaliophobia.

Maranatha! <><
John McKown
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is there a performance difference between COUNT(*) and COUNT(name)

Simon Slavin-3
In reply to this post by Cecil Westerhof-5


On 5 Sep 2017, at 9:21pm, Cecil Westerhof <[hidden email]> wrote:

> I want to know the number of teas I have in stock. For this I use:
> SELECT COUNT(Tea)
> FROM   teaInStock
>
> Tea cannot be NULL, so this is the same as:
> SELECT COUNT(*)
> FROM   teaInStock
>
> ​But I find the first more clear.
> I almost always see the second variant. Is this because it is more
> efficient, or are people just ‘lazy’?​

Your guess is right !

To do COUNT(*) SQLite has to retrieve all the rows.
To do COUNT(value) has to retrieve all the rows and test the value of each row to make sure it is not NULL.

Also, SQLite has a specific piece of code which makes COUNT(*) more efficient than counting the values.  However, unless you have a big database, the difference for your case may be small.  If you find COUNT(Tea) easier to understand perhaps you should use that one.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is there a performance difference between COUNT(*) and COUNT(name)

Cecil Westerhof-5
2017-09-05 23:11 GMT+02:00 Simon Slavin <[hidden email]>:

>
>
> On 5 Sep 2017, at 9:21pm, Cecil Westerhof <[hidden email]> wrote:
>
> > I want to know the number of teas I have in stock. For this I use:
> > SELECT COUNT(Tea)
> > FROM   teaInStock
> >
> > Tea cannot be NULL, so this is the same as:
> > SELECT COUNT(*)
> > FROM   teaInStock
> >
> > ​But I find the first more clear.
> > I almost always see the second variant. Is this because it is more
> > efficient, or are people just ‘lazy’?​
>
> Your guess is right !
>
> To do COUNT(*) SQLite has to retrieve all the rows.
> To do COUNT(value) has to retrieve all the rows and test the value of each
> row to make sure it is not NULL.
>
> Also, SQLite has a specific piece of code which makes COUNT(*) more
> efficient than counting the values.  However, unless you have a big
> database, the difference for your case may be small.  If you find
> COUNT(Tea) easier to understand perhaps you should use that one.
>

​I will keep using COUNT(Tea) then, but keep in the back of my mind that I
maybe should change that if a table becomes big.

Thanks.​

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Is there a performance difference between COUNT(*) and COUNT(name)

Hick Gunter
In reply to this post by Cecil Westerhof-5
Count(<name>) needs to extract the <name> field from the record, tallying only those that are NOT NULL.

Count(*) returns the total number of records in the table, with no need to extract a specific field.

When looking into efficiency, try using the .explain/explain feature.

asql> explain select count(a) from t;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
...
4     OpenRead       0     2     1     1              00  t
5     Rewind         0     9     0                    00  NULL
6     Column         0     0     3                    00  t.a
7     AggStep        0     3     1     count(1)       01  NULL
8     Next           0     6     0                    01  NULL
9     Close          0     0     0                    00  NULL
10    AggFinal       1     1     0     count(1)       00  NULL
11    Copy           1     4     0                    00  NULL
12    ResultRow      4     1     0                    00  NULL
...
asql> explain select count(*) from t;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
...
2     OpenRead       1     2     1                    00  NULL
3     Count          1     1     0                    00  NULL
4     Close          1     0     0                    00  NULL
5     Copy           1     2     0                    00  NULL
6     ResultRow      2     1     0                    00  NULL
...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Cecil Westerhof
Gesendet: Dienstag, 05. September 2017 22:22
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] [sqlite] Is there a performance difference between COUNT(*) and COUNT(name)

I want to know the number of teas I have in stock. For this I use:
SELECT COUNT(Tea)
FROM   teaInStock

Tea cannot be NULL, so this is the same as:
SELECT COUNT(*)
FROM   teaInStock

​But I find the first more clear.
I almost always see the second variant. Is this because it is more efficient, or are people just ‘lazy’?​

--
Cecil Westerhof
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Is there a performance difference between COUNT(*) and COUNT(name)

Dominique Devienne
On Wed, Sep 6, 2017 at 7:56 AM, Hick Gunter <[hidden email]> wrote:

> Count(<name>) needs to extract the <name> field from the record, tallying
> only those that are NOT NULL.
>

Technically it would not need to "extract" the <name> field, only lookup
the row header and
see whether that field/column is NULL or not (since NULL is a special
SERIAL TYPE of value 0 [1]).

asql> explain select count(a) from t;
> addr  opcode         p1    p2    p3    p4             p5  comment
> ----  -------------  ----  ----  ----  -------------  --  -------------
> ...
> 4     OpenRead       0     2     1     1              00  t
> 5     Rewind         0     9     0                    00  NULL
> 6     Column         0     0     3                    00  t.a
> 7     AggStep        0     3     1     count(1)       01  NULL
> ...
>

But as you showed, in practice it may still do, because of the Column VDBE
op-code.
Unless one imagines a specialized ColumnNullness of ColumnType new op-code
that
wouldn't need to extract the actual value/content, as a future
optimization. --DD

[1] https://www.sqlite.org/fileformat.html#record_format
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Is there a performance difference between COUNT(*) and COUNT(name)

John Found
In reply to this post by Cecil Westerhof-5
On Tue, 5 Sep 2017 23:31:32 +0200
Cecil Westerhof <[hidden email]> wrote:

> 2017-09-05 23:11 GMT+02:00 Simon Slavin <[hidden email]>:
>
> >
> >
> > On 5 Sep 2017, at 9:21pm, Cecil Westerhof <[hidden email]> wrote:
> >
> > > I want to know the number of teas I have in stock. For this I use:
> > > SELECT COUNT(Tea)
> > > FROM   teaInStock
> > >
> > > Tea cannot be NULL, so this is the same as:
> > > SELECT COUNT(*)
> > > FROM   teaInStock
> > >
> > > ​But I find the first more clear.
> > > I almost always see the second variant. Is this because it is more
> > > efficient, or are people just ‘lazy’?​
> >
> > Your guess is right !
> >
> > To do COUNT(*) SQLite has to retrieve all the rows.
> > To do COUNT(value) has to retrieve all the rows and test the value of each
> > row to make sure it is not NULL.
> >
> > Also, SQLite has a specific piece of code which makes COUNT(*) more
> > efficient than counting the values.  However, unless you have a big
> > database, the difference for your case may be small.  If you find
> > COUNT(Tea) easier to understand perhaps you should use that one.
> >
>
> ​I will keep using COUNT(Tea) then, but keep in the back of my mind that I
> maybe should change that if a table becomes big.

In my tests even on small tables count(colName) is at least 2 times slower than
count(*), even if both queries uses covering indexes. So, using count(colName) has
meaning only if you really want to count only not null rows.

Making exception for columns that "never contain NULL" in the name of "source clearness" actually is hard for detection hidden bug that can strike after long time on the database schema change.


>
> Thanks.​
>
> --
> Cecil Westerhof
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


--
http://fresh.flatassembler.net
http://asm32.info
John Found <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users