Quantcast

Index usefulness for GROUP BY

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Index usefulness for GROUP BY

Jeffrey Mattox
Given this DB schema (simplified, there are other columns):

  CREATE TABLE History (
    history_ID  INTEGER PRIMARY KEY,
    gameCount  INTEGER,
    weekday  INTEGER, /* 0=Sunday, 6=Saturday */
    hour  INTEGER, /* (0..23) */
    datetime  INTEGER /* unix datetime */ );

  CREATE INDEX  Idx_weekday  ON  History( weekday );

-------------
Now, I look at a recent set of rows...

SELECT TOTAL(gameCount), weekday  FROM History
      WHERE datetime >= strftime('%s','now','-28 days')
      GROUP BY weekday
      ORDER BY 1 DESC

QUERY PLANS:
without the index:
  0  0  0  SCAN TABLE History
  0  0  0  USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
  0  0  0  USE TEMP B-TREE FOR ORDER BY

with the index:
  0  0  0  SCAN TABLE History USING INDEX Idx_weekday
  0  0  0  USE TEMP B-TREE FOR ORDER BY

Either way, the entire table is scanned (right?).  My index covers the entire table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the WHERE clause, so the TEMP B-TREE is much smaller (right?).  So, is my index on weekday worthwhile, time-wise and space-wise?  (Query speed is not a big issue for me, and the DB is relatively small -- there are, at most, 60 rows added per day.  Memory is plentiful, OSX).

---
Jeff

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Index usefulness for GROUP BY

Hick Gunter
YES. AFAIK if SQLite detects that the rows are/can be made to be returned in GROUP BY order it can use internal variables to accumulate the group results. This is expected to be significantly faster than locating and updating a temporary BTree row for each record scanned.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Jeffrey Mattox
Gesendet: Freitag, 03. März 2017 11:30
An: SQLite mailing list <[hidden email]>
Betreff: [sqlite] Index usefulness for GROUP BY

Given this DB schema (simplified, there are other columns):

  CREATE TABLE History (
    history_ID  INTEGER PRIMARY KEY,
    gameCount  INTEGER,
    weekday  INTEGER, /* 0=Sunday, 6=Saturday */
    hour  INTEGER, /* (0..23) */
    datetime  INTEGER /* unix datetime */ );

  CREATE INDEX  Idx_weekday  ON  History( weekday );

-------------
Now, I look at a recent set of rows...

SELECT TOTAL(gameCount), weekday  FROM History
      WHERE datetime >= strftime('%s','now','-28 days')
      GROUP BY weekday
      ORDER BY 1 DESC

QUERY PLANS:
without the index:
  0  0  0  SCAN TABLE History
  0  0  0  USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
  0  0  0  USE TEMP B-TREE FOR ORDER BY

with the index:
  0  0  0  SCAN TABLE History USING INDEX Idx_weekday
  0  0  0  USE TEMP B-TREE FOR ORDER BY

Either way, the entire table is scanned (right?).  My index covers the entire table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the WHERE clause, so the TEMP B-TREE is much smaller (right?).  So, is my index on weekday worthwhile, time-wise and space-wise?  (Query speed is not a big issue for me, and the DB is relatively small -- there are, at most, 60 rows added per day.  Memory is plentiful, OSX).

---
Jeff

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Index usefulness for GROUP BY

Clemens Ladisch
In reply to this post by Jeffrey Mattox
Jeffrey Mattox wrote:
> is my index on weekday worthwhile, time-wise and space-wise?  (Query
> speed is not a big issue for me, and the DB is relatively small

Indexes are optimizations.  In a small DB, the effect is probably not
noticeable, which implies that you should not bother.

Where exactly the point is at which the index becomes useful in your
system is something which you have to measure yourself.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Index usefulness for GROUP BY

Jeffrey Mattox
In reply to this post by Jeffrey Mattox
Thank you for your replies.  I've found that my best index is on datetime since it eliminates the most uninteresting rows.  The query plan is

SEARCH TABLE History USING INDEX Idx_datetime (datetime>?)
USE TEMP B-TREE FOR GROUP BY
USE TEMP B-TREE FOR ORDER BY

In my case, it's also best to have no other indexes.  If I add an index on the GROUP BY column and if that column also appears in a WHERE clause, then that index is used and (the better) Idx_datetime is ignored.  When I get a larger dataset, I'll run ANALYZE to see if the optimizer chooses Idx_datetime (which I know would be the best index).

Jeff


> On Mar 3, 2017, at 4:29 AM, Jeffrey Mattox <[hidden email]> wrote:
>
> Given this DB schema (simplified, there are other columns):
>
>  CREATE TABLE History (
>    history_ID  INTEGER PRIMARY KEY,
>    gameCount  INTEGER,
>    weekday  INTEGER, /* 0=Sunday, 6=Saturday */
>    hour  INTEGER, /* (0..23) */
>    datetime  INTEGER /* unix datetime */ );
>
>  CREATE INDEX  Idx_weekday  ON  History( weekday );
>
> -------------
> Now, I look at a recent set of rows...
>
> SELECT TOTAL(gameCount), weekday  FROM History
>      WHERE datetime >= strftime('%s','now','-28 days')
>      GROUP BY weekday
>      ORDER BY 1 DESC
>
> QUERY PLANS:
> without the index:
>  0  0  0  SCAN TABLE History
>  0  0  0  USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
>  0  0  0  USE TEMP B-TREE FOR ORDER BY
>
> with the index:
>  0  0  0  SCAN TABLE History USING INDEX Idx_weekday
>  0  0  0  USE TEMP B-TREE FOR ORDER BY
>
> Either way, the entire table is scanned (right?).  My index covers the entire table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the WHERE clause, so the TEMP B-TREE is much smaller (right?).  So, is my index on weekday worthwhile, time-wise and space-wise?  (Query speed is not a big issue for me, and the DB is relatively small -- there are, at most, 60 rows added per day.  Memory is plentiful, OSX).
>
> ---
> Jeff
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Index usefulness for GROUP BY

Simon Slavin-3

On 4 Mar 2017, at 10:16pm, Jeffrey Mattox <[hidden email]> wrote:

> Thank you for your replies.  I've found that my best index is on datetime since it eliminates the most uninteresting rows.  The query plan is
>
> SEARCH TABLE History USING INDEX Idx_datetime (datetime>?)
> USE TEMP B-TREE FOR GROUP BY
> USE TEMP B-TREE FOR ORDER BY
>
>> -------------
>> Now, I look at a recent set of rows...
>>
>> SELECT TOTAL(gameCount), weekday  FROM History
>>     WHERE datetime >= strftime('%s','now','-28 days')
>>     GROUP BY weekday
>>     ORDER BY 1 DESC

Create these two additional indexes on History:

(datetime, weekday)
(weekday, datetime)

Then execute the ANALYZE command.

Then find out whether this has increased or decreased the time taken for the SELECT.

You can delete the two indexes it turns out not to be using.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...