without rowid and secondary indexes

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

without rowid and secondary indexes

Olivier Mascia
Dear all,

With https://www.sqlite.org/withoutrowid.html in mind, assuming something simple as:

  create table if not exists SHEETS(
  NODE integer not null,
  SHEET integer not null,
  NAME text,               -- often NULL, sometimes short text value (keyword)
  ...,                     -- 2 or 3 other small properties
  primary key(NODE,SHEET)
  ) without rowid;
  -- makes sense because small row and natural primary key is not integer

Now what if an index on NAME becomes valuable?
Will such a secondary index have a higher storage cost or performance cost than with an ordinary table?

  create index if not exists SHEETS_NAME on SHEETS(NAME);

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: without rowid and secondary indexes

Richard Hipp-3
Every index is composed of the columns being indexed followed by the
primary key of the table.  In a regular ROWID table, the "primary key"
is the rowid - an integer.  In a WITHOUT ROWID table, the primary key
is whatever you declare the PRIMARY KEY to be.

So whether or not there is more overhead in an index on a WITHOUT
ROWID table versus a normal rowid table depends on your choice of the
PRIMARY KEY.

In your example below, your PRIMARY KEY consists of two integer
instead of just a single rowid integer.  So it will take up slightly
more space.  Not a lot, but some.

Will that make a difference in your application?  I do not know.  I
suggest you try it and see.  This is one of the great beauties of SQL
(any SQL not just SQLite).  You can make a simple schema change like
this and try it out, without having to modify your application -
indeed without having to even recompile your application.  Simply run
the same build of your application using two different versions of the
database - one with an ordinary rowid table and the other with a
WITHOUT ROWID table - and see which one gives you better performance -
for whatever definition of "performance" is important to you.

On 4/10/18, Olivier Mascia <[hidden email]> wrote:

> Dear all,
>
> With https://www.sqlite.org/withoutrowid.html in mind, assuming something
> simple as:
>
>   create table if not exists SHEETS(
>   NODE integer not null,
>   SHEET integer not null,
>   NAME text,               -- often NULL, sometimes short text value
> (keyword)
>   ...,                     -- 2 or 3 other small properties
>   primary key(NODE,SHEET)
>   ) without rowid;
>   -- makes sense because small row and natural primary key is not integer
>
> Now what if an index on NAME becomes valuable?
> Will such a secondary index have a higher storage cost or performance cost
> than with an ordinary table?
>
>   create index if not exists SHEETS_NAME on SHEETS(NAME);
>
> --
> Best Regards, Meilleures salutations, Met vriendelijke groeten,
> Olivier Mascia
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: without rowid and secondary indexes

Olivier Mascia
Thanks!

> In your example below, your PRIMARY KEY consists of two integer
> instead of just a single rowid integer.  So it will take up slightly
> more space.  Not a lot, but some.
>
> Will that make a difference in your application?  I do not know.

Then, I will indeed run some more tests, but I guess I'm OK with adding the index while keeping the table without rowid because few rows will actually have a NAME value, most (like in more than 95% of them) will be NULL. What's more, most queries are PK centric.

So I should probably even better go with:

  create index if not exists SHEETS_NAME on SHEETS(NAME) where NAME is not NULL;

as my only queries involving NAME have a 'where NAME is not NULL' restriction. The purpose of the index being only to optimize such occasional dictionary building queries:

select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is not NULL;

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


> Le 10 avr. 2018 à 12:36, Richard Hipp <[hidden email]> a écrit :
>
> Every index is composed of the columns being indexed followed by the
> primary key of the table.  In a regular ROWID table, the "primary key"
> is the rowid - an integer.  In a WITHOUT ROWID table, the primary key
> is whatever you declare the PRIMARY KEY to be.
>
> So whether or not there is more overhead in an index on a WITHOUT
> ROWID table versus a normal rowid table depends on your choice of the
> PRIMARY KEY.
>
> In your example below, your PRIMARY KEY consists of two integer
> instead of just a single rowid integer.  So it will take up slightly
> more space.  Not a lot, but some.
>
> Will that make a difference in your application?  I do not know.  I
> suggest you try it and see.  This is one of the great beauties of SQL
> (any SQL not just SQLite).  You can make a simple schema change like
> this and try it out, without having to modify your application -
> indeed without having to even recompile your application.  Simply run
> the same build of your application using two different versions of the
> database - one with an ordinary rowid table and the other with a
> WITHOUT ROWID table - and see which one gives you better performance -
> for whatever definition of "performance" is important to you.
>
> On 4/10/18, Olivier Mascia <[hidden email]> wrote:
>> Dear all,
>>
>> With https://www.sqlite.org/withoutrowid.html in mind, assuming something
>> simple as:
>>
>>  create table if not exists SHEETS(
>>  NODE integer not null,
>>  SHEET integer not null,
>>  NAME text,               -- often NULL, sometimes short text value
>> (keyword)
>>  ...,                     -- 2 or 3 other small properties
>>  primary key(NODE,SHEET)
>>  ) without rowid;
>>  -- makes sense because small row and natural primary key is not integer
>>
>> Now what if an index on NAME becomes valuable?
>> Will such a secondary index have a higher storage cost or performance cost
>> than with an ordinary table?
>>
>>  create index if not exists SHEETS_NAME on SHEETS(NAME);
>>
>> --
>> Best Regards, Meilleures salutations, Met vriendelijke groeten,
>> Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: without rowid and secondary indexes

Clemens Ladisch
Olivier Mascia wrote:
> [...]
> So I should probably even better go with:
>
>   create index ... where NAME is not NULL;
>
> as my only queries involving NAME have a 'where NAME is not NULL' restriction.

To be sure, check with EXPLAIN QUERY PLAN whether the index is actually used.

> select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is not NULL;

The SHEET value is from some random row in the group, and the NAME value is
the same for all rows in the group.  Are you sure you do not want to use WHERE
instead of HAVING?


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: without rowid and secondary indexes

Olivier Mascia
> Le 10 avr. 2018 à 13:20, Clemens Ladisch <[hidden email]> a écrit :
>
>> select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is not NULL;
>
> The SHEET value is from some random row in the group, and the NAME value is
> the same for all rows in the group.  Are you sure you do not want to use WHERE
> instead of HAVING?

Thanks for the hint, Clemens.
Would you mean:

  select NAME,count(NAME) from SHEETS where SHEET>? group by NAME having NAME is not NULL;

is more logically appropriate? I'm inclined to think yes and that was what you meant.

Or that I might even go for:

  select NAME,count(NAME) from SHEETS where SHEET>? and NAME is not NULL group by NAME;

On a test table with statistically representative values (only ~95'000 rows), I get the same plan in all three cases:

  --EQP-- 0,0,0,SEARCH TABLE SHEETS USING COVERING INDEX SHEETS_NAME (NAME>?)


--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: without rowid and secondary indexes

Clemens Ladisch
Olivier Mascia wrote:

>> Le 10 avr. 2018 à 13:20, Clemens Ladisch <[hidden email]> a écrit :
>>> select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is not NULL;
>>
>> The SHEET value is from some random row in the group, and the NAME value is
>> the same for all rows in the group.  Are you sure you do not want to use WHERE
>> instead of HAVING?
>
> Would you mean:
>
>   select NAME,count(NAME) from SHEETS where SHEET>? group by NAME having NAME is not NULL;
>
> is more logically appropriate?

Probably.  The first query can give quite random results:

  delete from SHEETS;
  insert into SHEETS(NODE, SHEET, NAME) values(1, 1, 'a');
  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is not NULL;
  a|2
  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME is not NULL;
  a|1
  delete from SHEETS;
  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
  insert into SHEETS(NODE, SHEET, NAME) values(2, 1, 'a');
  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is not NULL;
  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME is not NULL;
  a|1

> Or that I might even go for:
>
>   select NAME,count(NAME) from SHEETS where SHEET>? and NAME is not NULL group by NAME;

This always has the same result as the second query.  (But "NAME is not NULL" in the
WHERE clause applies to the table column, while in the HAVING clause, it applies to
the result of the grouping.  So the last query makes it more likely that SQLite can
prove that it can use the index.)


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: without rowid and secondary indexes

Olivier Mascia
> Le 10 avr. 2018 à 18:25, Clemens Ladisch <[hidden email]> a écrit :
>
> Olivier Mascia wrote:
>> Would you mean:
>>
>>  select NAME,count(NAME) from SHEETS where SHEET>? group by NAME having NAME is not NULL;
>>
>> is more logically appropriate?
>
> Probably.  The first query can give quite random results:
>
>  delete from SHEETS;
>  insert into SHEETS(NODE, SHEET, NAME) values(1, 1, 'a');
>  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
>  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is not NULL;
>  a|2
>  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME is not NULL;
>  a|1
>  delete from SHEETS;
>  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
>  insert into SHEETS(NODE, SHEET, NAME) values(2, 1, 'a');
>  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is not NULL;
>  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME is not NULL;
>  a|1
>
>> Or that I might even go for:
>>
>>  select NAME,count(NAME) from SHEETS where SHEET>? and NAME is not NULL group by NAME;
>
> This always has the same result as the second query.  (But "NAME is not NULL" in the
> WHERE clause applies to the table column, while in the HAVING clause, it applies to
> the result of the grouping.  So the last query makes it more likely that SQLite can
> prove that it can use the index.)

Thanks a lot: this proved wonderfully educative, especially the oddities arising from:

> select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is not NULL;

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users