Packing integer primary key with field bits

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Packing integer primary key with field bits

curmudgeon
As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Hick Gunter
Simple answer: Don't!

This sounds like a misguided attempt to save space in the disk image by messing around with the rowid of a table (which is what "integer primary key" declares the column to be an alias of).

Whatever you stuff in there needs to be unique and, if you intend to use foreign keys, stable (unless you want to incur anomalies on updates). Additionally, SQLite may assign a completely different rowid (i.e. clobber your "packed field values") when running the REPLACE conflict resolution algortihm.

SQLite is already very good at saving space on disk. The most common values (NULL, 0 and 1) are stored using only 1 byte in the record header and other integer values only use the minimal number of bytes required to store the value.


-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 09:45
An: [hidden email]
Betreff: [sqlite] Packing integer primary key with field bits

As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

R Smith
In reply to this post by curmudgeon
The "anyone" that does this already is called "SQLite".

It doesn't amalgamate the INT fields as such, but it stores INTEGER
values in only as much bits as is needed, so it achieves the goal of
being more memory-efficient. So a 0, 1 or 2 takes up much less space
than say a 786587626 or other high-value integer.

It saves a lot more than is immediately evident, because it turns out in
*most* cases, *most* tables with INT columns/values have *most* values
in the low-end of the Integer value spectrum. Especially for Indexes,
Key relations and the like.

The way you refer to, is simply a form of compression, and it has the
advantage of saving space, but the common weakness of all compression
algorithms - It takes CPU cycles, i.e. Work. You asked about "speed"
gain - there is NO speed gain, there might be a bit of space gain, but
there is a Speed loss. Perhaps you think that because a processor is
natively 64bit (these days anyway) it will transfer 64bit integers just
as fast as any lower-bitted integer, and you are correct about that, but
if it involves ANY cycles to build that Integer (and it will involve
MANY), the gain turns into a loss quickly.

Remember: In the CPU registers, something that looks very lean to your
eye, like left-shifting 2 bytes packing it into a 16-bit INT [ex: i16 =
(b1 << 8) + b2] actually expands to many low-level CPU operations
involving moving the larger int into a register/adder, shifting it,
adding, etc. (Repeat-extrapolate for larger INTs.)  As you can see,
simply speeding the 8 bytes as-is into cpu/memory/device consecutively,
using only an 1/8th of the highway, is much faster than first building
the 64 bit INT and then sending that. The loss effectively repeats the
day you read it back and needs to decipher.

Verdict: Don't do it. Ever.

Cheers,
Ryan

On 2017/08/10 9:44 AM, x wrote:

> As in cramming numerous integer columns into a 64 bit integer.
>
> Does anyone do this?
>
> Is the speed gain worth the additional confusion of extracting the columns from the key?
>
> How is it best accomplished (virtual table maybe)?
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Hick Gunter
In reply to this post by curmudgeon
For the sake of the argument, let's try to devise a workable scheme for such an undertaking:

Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.

How to distribute these in the 64-bit rowid?

Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or
Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?

Let's try to select a row by real rowid:

SELECT id FROM t WHERE id & 0xffff = 1234;

There is no index on "id & 0xffff" so this translates into a full table scan (even if limited to one row, it still requires on average 1/2 of the rows to be read). Creating an index is no option because we are trying to save space, remember?

SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);

This only works with "real rowid high" ("real rowid low" requires 2^^32 ranges to work) and the lookup is fast, but the endpoints should better be calculated in the calling program.

So for a reasonably fast lookup by "real rowid", it needs to occupy the most significant bits.

Trying to select by one the values, by extension of the arguments given, will either be a nightmare, default to a full table scan or require a covering index (which requires more disk space than the 4-12 bytes per record we have "saved").

Additionally, if any of the statements returns more than one row for any "real rowid" then your table is shot up beyond repair...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 09:45
An: [hidden email]
Betreff: [sqlite] Packing integer primary key with field bits

As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Bart Smissaert
Not sure the OP wanted to touch the rowid/integer primary key.
I think he just was contemplating putting a number of integers in one
single, normal integer column.
Might be mistaken there.

RBS

On Thu, Aug 10, 2017 at 10:53 AM, Hick Gunter <[hidden email]> wrote:

> For the sake of the argument, let's try to devise a workable scheme for
> such an undertaking:
>
> Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.
>
> How to distribute these in the 64-bit rowid?
>
> Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or
> Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?
>
> Let's try to select a row by real rowid:
>
> SELECT id FROM t WHERE id & 0xffff = 1234;
>
> There is no index on "id & 0xffff" so this translates into a full table
> scan (even if limited to one row, it still requires on average 1/2 of the
> rows to be read). Creating an index is no option because we are trying to
> save space, remember?
>
> SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);
>
> This only works with "real rowid high" ("real rowid low" requires 2^^32
> ranges to work) and the lookup is fast, but the endpoints should better be
> calculated in the calling program.
>
> So for a reasonably fast lookup by "real rowid", it needs to occupy the
> most significant bits.
>
> Trying to select by one the values, by extension of the arguments given,
> will either be a nightmare, default to a full table scan or require a
> covering index (which requires more disk space than the 4-12 bytes per
> record we have "saved").
>
> Additionally, if any of the statements returns more than one row for any
> "real rowid" then your table is shot up beyond repair...
>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users [mailto:[hidden email]]
> Im Auftrag von x
> Gesendet: Donnerstag, 10. August 2017 09:45
> An: [hidden email]
> Betreff: [sqlite] Packing integer primary key with field bits
>
> As in cramming numerous integer columns into a 64 bit integer.
>
> Does anyone do this?
>
> Is the speed gain worth the additional confusion of extracting the columns
> from the key?
>
> How is it best accomplished (virtual table maybe)?
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___________________________________________
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: [hidden email]
>
> This communication (including any attachments) is intended for the use of
> the intended recipient(s) only and may contain information that is
> confidential, privileged or legally protected. Any unauthorized use or
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, please immediately notify the sender
> by return e-mail message and delete all copies of the original
> communication. Thank you for your cooperation.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Bart Smissaert
Actually looking at the subject title it looks he was.
Sorry.

RBS

On Thu, Aug 10, 2017 at 11:43 AM, Bart Smissaert <[hidden email]>
wrote:

> Not sure the OP wanted to touch the rowid/integer primary key.
> I think he just was contemplating putting a number of integers in one
> single, normal integer column.
> Might be mistaken there.
>
> RBS
>
> On Thu, Aug 10, 2017 at 10:53 AM, Hick Gunter <[hidden email]> wrote:
>
>> For the sake of the argument, let's try to devise a workable scheme for
>> such an undertaking:
>>
>> Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.
>>
>> How to distribute these in the 64-bit rowid?
>>
>> Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or
>> Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?
>>
>> Let's try to select a row by real rowid:
>>
>> SELECT id FROM t WHERE id & 0xffff = 1234;
>>
>> There is no index on "id & 0xffff" so this translates into a full table
>> scan (even if limited to one row, it still requires on average 1/2 of the
>> rows to be read). Creating an index is no option because we are trying to
>> save space, remember?
>>
>> SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);
>>
>> This only works with "real rowid high" ("real rowid low" requires 2^^32
>> ranges to work) and the lookup is fast, but the endpoints should better be
>> calculated in the calling program.
>>
>> So for a reasonably fast lookup by "real rowid", it needs to occupy the
>> most significant bits.
>>
>> Trying to select by one the values, by extension of the arguments given,
>> will either be a nightmare, default to a full table scan or require a
>> covering index (which requires more disk space than the 4-12 bytes per
>> record we have "saved").
>>
>> Additionally, if any of the statements returns more than one row for any
>> "real rowid" then your table is shot up beyond repair...
>>
>> -----Ursprüngliche Nachricht-----
>> Von: sqlite-users [mailto:[hidden email]]
>> Im Auftrag von x
>> Gesendet: Donnerstag, 10. August 2017 09:45
>> An: [hidden email]
>> Betreff: [sqlite] Packing integer primary key with field bits
>>
>> As in cramming numerous integer columns into a 64 bit integer.
>>
>> Does anyone do this?
>>
>> Is the speed gain worth the additional confusion of extracting the
>> columns from the key?
>>
>> How is it best accomplished (virtual table maybe)?
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>> ___________________________________________
>>  Gunter Hick
>> Software Engineer
>> Scientific Games International GmbH
>> FN 157284 a, HG Wien
>> Klitschgasse 2-4, A-1130 Vienna, Austria
>> Tel: +43 1 80100 0
>> E-Mail: [hidden email]
>>
>> This communication (including any attachments) is intended for the use of
>> the intended recipient(s) only and may contain information that is
>> confidential, privileged or legally protected. Any unauthorized use or
>> dissemination of this communication is strictly prohibited. If you have
>> received this communication in error, please immediately notify the sender
>> by return e-mail message and delete all copies of the original
>> communication. Thank you for your cooperation.
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

curmudgeon
In reply to this post by Hick Gunter
Thanks for the replies. I’m not sure I agree with Gunter and Ryan though. I’m thinking about this more from the gain in speed rather than saving space.

To clarify, I’m suggesting replacing a compound key (made up of several integer cols) with an integer primary key (which sqlite will use rather than the row id). I have done my homework on this so I’m familiar with Gunter’s points regarding ‘between’ and ‘high end bits’ but will the between on a single integer key not be faster than matching on the first m fields of an n compound key? If an index is needed on any non-high bit col an expression index would work just as fast for lookups (I suppose inserts would be slower). The savings on space would contribute to the speed as each disk read would contain more records.

Even forgetting about keys, if you packed say 8 columns into one int64 column would you not be saving a minimum of 7 bits?


From: Hick Gunter<mailto:[hidden email]>
Sent: 10 August 2017 10:53
To: 'SQLite mailing list'<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

For the sake of the argument, let's try to devise a workable scheme for such an undertaking:

Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.

How to distribute these in the 64-bit rowid?

Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or
Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?

Let's try to select a row by real rowid:

SELECT id FROM t WHERE id & 0xffff = 1234;

There is no index on "id & 0xffff" so this translates into a full table scan (even if limited to one row, it still requires on average 1/2 of the rows to be read). Creating an index is no option because we are trying to save space, remember?

SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);

This only works with "real rowid high" ("real rowid low" requires 2^^32 ranges to work) and the lookup is fast, but the endpoints should better be calculated in the calling program.

So for a reasonably fast lookup by "real rowid", it needs to occupy the most significant bits.

Trying to select by one the values, by extension of the arguments given, will either be a nightmare, default to a full table scan or require a covering index (which requires more disk space than the 4-12 bytes per record we have "saved").

Additionally, if any of the statements returns more than one row for any "real rowid" then your table is shot up beyond repair...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 09:45
An: [hidden email]
Betreff: [sqlite] Packing integer primary key with field bits

As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

curmudgeon
Saving a minimum of 7 bytes I meant.

From: x<mailto:[hidden email]>
Sent: 10 August 2017 12:19
To: SQLite mailing list<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

Thanks for the replies. I’m not sure I agree with Gunter and Ryan though. I’m thinking about this more from the gain in speed rather than saving space.

To clarify, I’m suggesting replacing a compound key (made up of several integer cols) with an integer primary key (which sqlite will use rather than the row id). I have done my homework on this so I’m familiar with Gunter’s points regarding ‘between’ and ‘high end bits’ but will the between on a single integer key not be faster than matching on the first m fields of an n compound key? If an index is needed on any non-high bit col an expression index would work just as fast for lookups (I suppose inserts would be slower). The savings on space would contribute to the speed as each disk read would contain more records.

Even forgetting about keys, if you packed say 8 columns into one int64 column would you not be saving a minimum of 7 bits?


From: Hick Gunter<mailto:[hidden email]>
Sent: 10 August 2017 10:53
To: 'SQLite mailing list'<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

For the sake of the argument, let's try to devise a workable scheme for such an undertaking:

Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.

How to distribute these in the 64-bit rowid?

Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or
Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?

Let's try to select a row by real rowid:

SELECT id FROM t WHERE id & 0xffff = 1234;

There is no index on "id & 0xffff" so this translates into a full table scan (even if limited to one row, it still requires on average 1/2 of the rows to be read). Creating an index is no option because we are trying to save space, remember?

SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);

This only works with "real rowid high" ("real rowid low" requires 2^^32 ranges to work) and the lookup is fast, but the endpoints should better be calculated in the calling program.

So for a reasonably fast lookup by "real rowid", it needs to occupy the most significant bits.

Trying to select by one the values, by extension of the arguments given, will either be a nightmare, default to a full table scan or require a covering index (which requires more disk space than the 4-12 bytes per record we have "saved").

Additionally, if any of the statements returns more than one row for any "real rowid" then your table is shot up beyond repair...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 09:45
An: [hidden email]
Betreff: [sqlite] Packing integer primary key with field bits

As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Hick Gunter
In reply to this post by curmudgeon
The first problem with packing several integer fields into one is that you lose the capability of expressing NULL values in the packed fields without resorting to complicated and time consuming arithmetics modulo 257.

The next is that you lose the possibility of expressing values outside of the intended range. Even if, at inception, you think you will only need 3 bits (e.g. NULL and 7 discrete values), murphy's law guarantees that there will arise a need for an 8th discrete value and you will have to unpack and repack all of the rows. If you still have some bits left unused.

Another method of avoiding the "wasted" space of a rowid is to declare WITHOUT ROWID tables.

Arguably, comparing 8 bytes is faster than decoding n fields; but the record format of SQLite allows the range of the values to be compared first.

E.g. you have 8 subfields f1 to f8 and your constraint is "f1=1 and f2=2 and f3=4", which translates into "between 0x0102040000000000 and 0x010204ffffffffff"

The index cell would have a (partial) header of s1s2s3s4s5s6s7s8 and a (partial) payload of.[f1] [f2] [f3] [f4] [f5] [f6] [f7] [f8]. Analysis of the constraints yields that the values are all representable in 8 bits, giving a "signature" of s1=9 and s2=1 and s3=1 which serves to eliminate most of the keys by comparing 1 to 3 bytes, and needs to compare 1 or 2 more bytes (f2 and f3, because f1 is not present) to find a match.

Apart from the technical details, overly concerning yourself with the speed of query execution at the beginning of design can be considered "premature optimization". Getting the application to run correctly should be the primary focus. Once it does that, see if the performance is ok. If not, writing better queries that run efficiently is probably orders of magnitude more important than the difference between comparing "rowids" and "keys".

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 13:19
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Packing integer primary key with field bits

Thanks for the replies. I’m not sure I agree with Gunter and Ryan though. I’m thinking about this more from the gain in speed rather than saving space.

To clarify, I’m suggesting replacing a compound key (made up of several integer cols) with an integer primary key (which sqlite will use rather than the row id). I have done my homework on this so I’m familiar with Gunter’s points regarding ‘between’ and ‘high end bits’ but will the between on a single integer key not be faster than matching on the first m fields of an n compound key? If an index is needed on any non-high bit col an expression index would work just as fast for lookups (I suppose inserts would be slower). The savings on space would contribute to the speed as each disk read would contain more records.

Even forgetting about keys, if you packed say 8 columns into one int64 column would you not be saving a minimum of 7 bits?


From: Hick Gunter<mailto:[hidden email]>
Sent: 10 August 2017 10:53
To: 'SQLite mailing list'<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

For the sake of the argument, let's try to devise a workable scheme for such an undertaking:

Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.

How to distribute these in the 64-bit rowid?

Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?

Let's try to select a row by real rowid:

SELECT id FROM t WHERE id & 0xffff = 1234;

There is no index on "id & 0xffff" so this translates into a full table scan (even if limited to one row, it still requires on average 1/2 of the rows to be read). Creating an index is no option because we are trying to save space, remember?

SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);

This only works with "real rowid high" ("real rowid low" requires 2^^32 ranges to work) and the lookup is fast, but the endpoints should better be calculated in the calling program.

So for a reasonably fast lookup by "real rowid", it needs to occupy the most significant bits.

Trying to select by one the values, by extension of the arguments given, will either be a nightmare, default to a full table scan or require a covering index (which requires more disk space than the 4-12 bytes per record we have "saved").

Additionally, if any of the statements returns more than one row for any "real rowid" then your table is shot up beyond repair...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 09:45
An: [hidden email]
Betreff: [sqlite] Packing integer primary key with field bits

As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

R Smith
In reply to this post by curmudgeon
On 2017/08/10 1:19 PM, x wrote:
> Thanks for the replies. I’m not sure I agree with Gunter and Ryan though. I’m thinking about this more from the gain in speed rather than saving space.
>
> To clarify, I’m suggesting replacing a compound key (made up of several integer cols) with an integer primary key (which sqlite will use rather than the row id). I have done my homework on this so I’m familiar with Gunter’s points regarding ‘between’ and ‘high end bits’ but will the between on a single integer key not be faster than matching on the first m fields of an n compound key? If an index is needed on any non-high bit col an expression index would work just as fast for lookups (I suppose inserts would be slower). The savings on space would contribute to the speed as each disk read would contain more records.

Ok, if you require ALL the packed records all the time, and will always
access it by the primary value (the first of the packed values) and is
very very sure you won't ever need expanding the value range, then you
might actually get a speed gain from it.

Problem is, the gain will be minuscule, and the price is high. Lots of
development time, loss of useful SQL aggregates and other functionality,
possible future reworks... All of that for a very small speed gain?  If
you are wanting that, why not simply use a custom structure and avoid
SQLite completely? The speed gain will actually be significant then, and
you're going to lose the SQL-ness of it anyway, so that shouldn't matter.

A structured array mapped to a physical byte-stream will be several
times faster than SQLite (or any other RDBMS for that matter).  SQL as
supported by the average RDBMS is only really helpful when you are
looking for SET-type relational data handling or very large data (and
your use case is specifically not for large data). Most RDBMSes have
great optimizations for speeding up resolving of relational-type
questions and their every-day-use advantages are legion, they are
however without exception NOT faster than - NOR intended to be faster
than - simple byte/structured array handling.

You might even find a synergy between using your own structured array
together with an SQLite DB which only get accessed once you need more
information than persists in the array itself - it's easy to make a
pilot and test the speed gains. And please do that before investing the
time to develop a fully fledged dual system.


>
> Even forgetting about keys, if you packed say 8 columns into one int64 column would you not be saving a minimum of 7 bits?

No you won't, SQLite stores Integer much more efficiently. Unless you
mean use ONLY the 64-bit index and not storing the values in separate
fields in the DB at all, in which case yes, you will save a few bytes,
possibly less than 7 though (I need to affirm the exact number, don't
know off the top...).

Cheers,
Ryan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Clemens Ladisch
In reply to this post by curmudgeon
x wrote:
> I’m thinking about this more from the gain in speed rather than saving space.

Database performance is usually limited by I/O, i.e., you gain speed by
saving space.

> I have done my homework on this

So what are the results of your measurements?


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

curmudgeon
In reply to this post by Hick Gunter
Valid point on the intended range Gunter. I don’t know enough about sqlite to fully understand your index cell paragraph. I thought the way sqlite worked was e.g. to get the value of the 3rd column it had to read the lengths of col1 & col2 so it knows where col 3 value starts (I’ve seen a comment that the most retrieved cols should be at the start of the record) and that indexes are stored the same way. If that’s the case would having to consider a single int not always be faster than considering the first few cols (although I suppose it only has to consider columns 2 and up to settle ties on col 1).

Ryan, you might be right on the structured array. Re the 8 cols packed together, I’m thinking it would save at least 7 bytes because sqlite would only require one ‘length byte’ (I’m unsure of the terminology), as opposed to 8, in the record.

Clemens, I was referring to bit shifting with respect to indexes homework. I’ve yet to test anything. Indeed, in this thread, I’m trying to determine if such testing is a waste of time.

Tom



From: Hick Gunter<mailto:[hidden email]>
Sent: 10 August 2017 13:40
To: 'SQLite mailing list'<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

The first problem with packing several integer fields into one is that you lose the capability of expressing NULL values in the packed fields without resorting to complicated and time consuming arithmetics modulo 257.

The next is that you lose the possibility of expressing values outside of the intended range. Even if, at inception, you think you will only need 3 bits (e.g. NULL and 7 discrete values), murphy's law guarantees that there will arise a need for an 8th discrete value and you will have to unpack and repack all of the rows. If you still have some bits left unused.

Another method of avoiding the "wasted" space of a rowid is to declare WITHOUT ROWID tables.

Arguably, comparing 8 bytes is faster than decoding n fields; but the record format of SQLite allows the range of the values to be compared first.

E.g. you have 8 subfields f1 to f8 and your constraint is "f1=1 and f2=2 and f3=4", which translates into "between 0x0102040000000000 and 0x010204ffffffffff"

The index cell would have a (partial) header of s1s2s3s4s5s6s7s8 and a (partial) payload of.[f1] [f2] [f3] [f4] [f5] [f6] [f7] [f8]. Analysis of the constraints yields that the values are all representable in 8 bits, giving a "signature" of s1=9 and s2=1 and s3=1 which serves to eliminate most of the keys by comparing 1 to 3 bytes, and needs to compare 1 or 2 more bytes (f2 and f3, because f1 is not present) to find a match.

Apart from the technical details, overly concerning yourself with the speed of query execution at the beginning of design can be considered "premature optimization". Getting the application to run correctly should be the primary focus. Once it does that, see if the performance is ok. If not, writing better queries that run efficiently is probably orders of magnitude more important than the difference between comparing "rowids" and "keys".

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 13:19
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Packing integer primary key with field bits

Thanks for the replies. I’m not sure I agree with Gunter and Ryan though. I’m thinking about this more from the gain in speed rather than saving space.

To clarify, I’m suggesting replacing a compound key (made up of several integer cols) with an integer primary key (which sqlite will use rather than the row id). I have done my homework on this so I’m familiar with Gunter’s points regarding ‘between’ and ‘high end bits’ but will the between on a single integer key not be faster than matching on the first m fields of an n compound key? If an index is needed on any non-high bit col an expression index would work just as fast for lookups (I suppose inserts would be slower). The savings on space would contribute to the speed as each disk read would contain more records.

Even forgetting about keys, if you packed say 8 columns into one int64 column would you not be saving a minimum of 7 bits?


From: Hick Gunter<mailto:[hidden email]>
Sent: 10 August 2017 10:53
To: 'SQLite mailing list'<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

For the sake of the argument, let's try to devise a workable scheme for such an undertaking:

Lets assume you have a 32-bit "real rowid" and four 8-bit "value" fields.

How to distribute these in the 64-bit rowid?

Rrid low = MSB | v1 | v2 | v3 | v4 | r1r2r3r4 | LSB or Rrid high =  MSB | r1r2r3r4 | v1 | v2 | v3 | v4 | LSB?

Let's try to select a row by real rowid:

SELECT id FROM t WHERE id & 0xffff = 1234;

There is no index on "id & 0xffff" so this translates into a full table scan (even if limited to one row, it still requires on average 1/2 of the rows to be read). Creating an index is no option because we are trying to save space, remember?

SELECT id FROM t WHERE id BETWEEN (1234<<32) and (1234<<32 + 0xffff);

This only works with "real rowid high" ("real rowid low" requires 2^^32 ranges to work) and the lookup is fast, but the endpoints should better be calculated in the calling program.

So for a reasonably fast lookup by "real rowid", it needs to occupy the most significant bits.

Trying to select by one the values, by extension of the arguments given, will either be a nightmare, default to a full table scan or require a covering index (which requires more disk space than the 4-12 bytes per record we have "saved").

Additionally, if any of the statements returns more than one row for any "real rowid" then your table is shot up beyond repair...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von x
Gesendet: Donnerstag, 10. August 2017 09:45
An: [hidden email]
Betreff: [sqlite] Packing integer primary key with field bits

As in cramming numerous integer columns into a 64 bit integer.

Does anyone do this?

Is the speed gain worth the additional confusion of extracting the columns from the key?

How is it best accomplished (virtual table maybe)?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Paul Sanderson
In reply to this post by R Smith
Space savings will depend very much on what other data is in the table.

If you have a 4096 byte page size and with an average record size of 1000
bytes then saving 7 bytes for each of the 4 records wont free up enough
space to fit a new record into that page. So savings in this scenario will
effectively be nil.

If on the otherhand the average record is 100 bytes you may well fit more
records into the page, conversely changing the page size to 64K would also
reduce the number of reads.

I suspect that biggest time savings may be gained by reducing disk I/O.

Better advice could possibly be given if we know the full table schema
including typical sizes for data in any fields/

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 10 August 2017 at 14:13, R Smith <[hidden email]> wrote:

> On 2017/08/10 1:19 PM, x wrote:
>
>> Thanks for the replies. I’m not sure I agree with Gunter and Ryan though.
>> I’m thinking about this more from the gain in speed rather than saving
>> space.
>>
>> To clarify, I’m suggesting replacing a compound key (made up of several
>> integer cols) with an integer primary key (which sqlite will use rather
>> than the row id). I have done my homework on this so I’m familiar with
>> Gunter’s points regarding ‘between’ and ‘high end bits’ but will the
>> between on a single integer key not be faster than matching on the first m
>> fields of an n compound key? If an index is needed on any non-high bit col
>> an expression index would work just as fast for lookups (I suppose inserts
>> would be slower). The savings on space would contribute to the speed as
>> each disk read would contain more records.
>>
>
> Ok, if you require ALL the packed records all the time, and will always
> access it by the primary value (the first of the packed values) and is very
> very sure you won't ever need expanding the value range, then you might
> actually get a speed gain from it.
>
> Problem is, the gain will be minuscule, and the price is high. Lots of
> development time, loss of useful SQL aggregates and other functionality,
> possible future reworks... All of that for a very small speed gain?  If you
> are wanting that, why not simply use a custom structure and avoid SQLite
> completely? The speed gain will actually be significant then, and you're
> going to lose the SQL-ness of it anyway, so that shouldn't matter.
>
> A structured array mapped to a physical byte-stream will be several times
> faster than SQLite (or any other RDBMS for that matter).  SQL as supported
> by the average RDBMS is only really helpful when you are looking for
> SET-type relational data handling or very large data (and your use case is
> specifically not for large data). Most RDBMSes have great optimizations for
> speeding up resolving of relational-type questions and their every-day-use
> advantages are legion, they are however without exception NOT faster than -
> NOR intended to be faster than - simple byte/structured array handling.
>
> You might even find a synergy between using your own structured array
> together with an SQLite DB which only get accessed once you need more
> information than persists in the array itself - it's easy to make a pilot
> and test the speed gains. And please do that before investing the time to
> develop a fully fledged dual system.
>
>
>
>> Even forgetting about keys, if you packed say 8 columns into one int64
>> column would you not be saving a minimum of 7 bits?
>>
>
> No you won't, SQLite stores Integer much more efficiently. Unless you mean
> use ONLY the 64-bit index and not storing the values in separate fields in
> the DB at all, in which case yes, you will save a few bytes, possibly less
> than 7 though (I need to affirm the exact number, don't know off the
> top...).
>
> Cheers,
> Ryan
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

R Smith
In reply to this post by curmudgeon


On 2017/08/10 4:51 PM, x wrote:
> Valid point on the intended range Gunter. I don’t know enough about sqlite to fully understand your index cell paragraph. I thought the way sqlite worked was e.g. to get the value of the 3rd column it had to read the lengths of col1 & col2 so it knows where col 3 value starts (I’ve seen a comment that the most retrieved cols should be at the start of the record)

This is true, but the cost of "reading" the bytes to get to the values
are no worse than what your cost would be for reading the Int64 and
unpacking the bytes from there... SQLite internally does the same thing
only without using an Int64, it just uses an array of bytes, but gleans
the values equally efficiently. If it was better to use Int64 values in
stead of a byte-array, SQLite would already be using it. By the way,
Int64 is just another way of saying "list of 8 bytes", which is just
another way of saying "list of 64 bits", there is nothing magically more
efficient about 64bit INT than a list of 8 bytes. Memory is memory.
Your idea might offer savings in that you predetermine the length of
stored values and ensure they fit into that list of 8 bytes, whereas
SQLite stores also a Length, but has the vastly superior property of
being able to store any size of any type of variable. Again, the speed
gain is dubious for the price it comes at.

To make matters worse, the real bottleneck is usually the storage layer,
and to read even 1 byte from a row, or Index, usually requires reading
an entire File-System Page (Typically ~4 Thousand bytes) but some
successive reads are usually gotten from the same page to make matters
more efficient. The point being, the reading of that File system page is
several magnitudes more time-intensive than unpacking the bytes. You
will not save anything much speed-wise, and very little space-wise, and
then only for smaller row-lengths.

By the way, the same goes for reading your own array of bytes from the
storage medium - the hope is of course that some good caching at both
the storage layer and your application layer would smooth out read/write
waits.

>   and that indexes are stored the same way.

This is not quite true. The Index is (usually) a B-Tree structure with
consistent size value indices. An Index is built for speed, not storage
efficiency. Adding an Index effectively copies the entire column(s) the
Index refers to into slightly less efficient (but more accessible)
storage - It's much worse space-wise than simply using separate fields
and no Index.
[Note: I am not 100% sure about the BTree storage format in SQLite's
case, I refer here to other DB Engines, but I would be surprised if an
SQLite BTree Index is stored exactly the same as the table Row format.]


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

curmudgeon
In reply to this post by Paul Sanderson
Paul, The record size will be fairly small. Probably under 100 bytes. Agree with you on your reduced disk I/o point, particularly as the index size might be significantly smaller than the full blown index size.


From: Paul Sanderson<mailto:[hidden email]>
Sent: 10 August 2017 16:15
To: SQLite mailing list<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits

Space savings will depend very much on what other data is in the table.

If you have a 4096 byte page size and with an average record size of 1000
bytes then saving 7 bytes for each of the 4 records wont free up enough
space to fit a new record into that page. So savings in this scenario will
effectively be nil.

If on the otherhand the average record is 100 bytes you may well fit more
records into the page, conversely changing the page size to 64K would also
reduce the number of reads.

I suspect that biggest time savings may be gained by reducing disk I/O.

Better advice could possibly be given if we know the full table schema
including typical sizes for data in any fields/

Paul
www.sandersonforensics.com<http://www.sandersonforensics.com>
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 10 August 2017 at 14:13, R Smith <[hidden email]> wrote:

> On 2017/08/10 1:19 PM, x wrote:
>
>> Thanks for the replies. I’m not sure I agree with Gunter and Ryan though.
>> I’m thinking about this more from the gain in speed rather than saving
>> space.
>>
>> To clarify, I’m suggesting replacing a compound key (made up of several
>> integer cols) with an integer primary key (which sqlite will use rather
>> than the row id). I have done my homework on this so I’m familiar with
>> Gunter’s points regarding ‘between’ and ‘high end bits’ but will the
>> between on a single integer key not be faster than matching on the first m
>> fields of an n compound key? If an index is needed on any non-high bit col
>> an expression index would work just as fast for lookups (I suppose inserts
>> would be slower). The savings on space would contribute to the speed as
>> each disk read would contain more records.
>>
>
> Ok, if you require ALL the packed records all the time, and will always
> access it by the primary value (the first of the packed values) and is very
> very sure you won't ever need expanding the value range, then you might
> actually get a speed gain from it.
>
> Problem is, the gain will be minuscule, and the price is high. Lots of
> development time, loss of useful SQL aggregates and other functionality,
> possible future reworks... All of that for a very small speed gain?  If you
> are wanting that, why not simply use a custom structure and avoid SQLite
> completely? The speed gain will actually be significant then, and you're
> going to lose the SQL-ness of it anyway, so that shouldn't matter.
>
> A structured array mapped to a physical byte-stream will be several times
> faster than SQLite (or any other RDBMS for that matter).  SQL as supported
> by the average RDBMS is only really helpful when you are looking for
> SET-type relational data handling or very large data (and your use case is
> specifically not for large data). Most RDBMSes have great optimizations for
> speeding up resolving of relational-type questions and their every-day-use
> advantages are legion, they are however without exception NOT faster than -
> NOR intended to be faster than - simple byte/structured array handling.
>
> You might even find a synergy between using your own structured array
> together with an SQLite DB which only get accessed once you need more
> information than persists in the array itself - it's easy to make a pilot
> and test the speed gains. And please do that before investing the time to
> develop a fully fledged dual system.
>
>
>
>> Even forgetting about keys, if you packed say 8 columns into one int64
>> column would you not be saving a minimum of 7 bits?
>>
>
> No you won't, SQLite stores Integer much more efficiently. Unless you mean
> use ONLY the 64-bit index and not storing the values in separate fields in
> the DB at all, in which case yes, you will save a few bytes, possibly less
> than 7 though (I need to affirm the exact number, don't know off the
> top...).
>
> Cheers,
> Ryan
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

curmudgeon
In reply to this post by R Smith
Ryan, I know what you’re saying about the list of bytes but I’d like it confirmed that the indexes are not stored in a format similar to the data.

Do you not think the cost of unpacking the bytes would be insignificant?

Another possibility that should be considered is that if the database is reduced considerably it might fit in memory or, at the vey least, memory caches will be able to hold a lot more records.

Tom

From: R Smith<mailto:[hidden email]>
Sent: 10 August 2017 16:24
To: [hidden email]<mailto:[hidden email]>
Subject: Re: [sqlite] Packing integer primary key with field bits



On 2017/08/10 4:51 PM, x wrote:
> Valid point on the intended range Gunter. I don’t know enough about sqlite to fully understand your index cell paragraph. I thought the way sqlite worked was e.g. to get the value of the 3rd column it had to read the lengths of col1 & col2 so it knows where col 3 value starts (I’ve seen a comment that the most retrieved cols should be at the start of the record)

This is true, but the cost of "reading" the bytes to get to the values
are no worse than what your cost would be for reading the Int64 and
unpacking the bytes from there... SQLite internally does the same thing
only without using an Int64, it just uses an array of bytes, but gleans
the values equally efficiently. If it was better to use Int64 values in
stead of a byte-array, SQLite would already be using it. By the way,
Int64 is just another way of saying "list of 8 bytes", which is just
another way of saying "list of 64 bits", there is nothing magically more
efficient about 64bit INT than a list of 8 bytes. Memory is memory.
Your idea might offer savings in that you predetermine the length of
stored values and ensure they fit into that list of 8 bytes, whereas
SQLite stores also a Length, but has the vastly superior property of
being able to store any size of any type of variable. Again, the speed
gain is dubious for the price it comes at.

To make matters worse, the real bottleneck is usually the storage layer,
and to read even 1 byte from a row, or Index, usually requires reading
an entire File-System Page (Typically ~4 Thousand bytes) but some
successive reads are usually gotten from the same page to make matters
more efficient. The point being, the reading of that File system page is
several magnitudes more time-intensive than unpacking the bytes. You
will not save anything much speed-wise, and very little space-wise, and
then only for smaller row-lengths.

By the way, the same goes for reading your own array of bytes from the
storage medium - the hope is of course that some good caching at both
the storage layer and your application layer would smooth out read/write
waits.

>   and that indexes are stored the same way.

This is not quite true. The Index is (usually) a B-Tree structure with
consistent size value indices. An Index is built for speed, not storage
efficiency. Adding an Index effectively copies the entire column(s) the
Index refers to into slightly less efficient (but more accessible)
storage - It's much worse space-wise than simply using separate fields
and no Index.
[Note: I am not 100% sure about the BTree storage format in SQLite's
case, I refer here to other DB Engines, but I would be surprised if an
SQLite BTree Index is stored exactly the same as the table Row format.]


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Nico Williams
In reply to this post by Clemens Ladisch
On Thu, Aug 10, 2017 at 03:36:31PM +0200, Clemens Ladisch wrote:
> x wrote:
> > I’m thinking about this more from the gain in speed rather than saving space.
>
> Database performance is usually limited by I/O, i.e., you gain speed by
> saving space.

Then proper compression (e.g., with zlib) will be much better for you
than what you're trying to do.

I don't know how to get SQLite3 DBs compressed though, short of writing
your own VFS (one might already exist) or hosting on ZFS (which works
very well).

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Bob Friesenhahn
In reply to this post by Clemens Ladisch
On Thu, 10 Aug 2017, Clemens Ladisch wrote:

> x wrote:
>> I’m thinking about this more from the gain in speed rather than saving space.
>
> Database performance is usually limited by I/O, i.e., you gain speed by
> saving space.

To be clear, database performance is usually limited by the number of
I/O operations possible within a given amount of time (IOPS) rather
than the bandwidth (bytes/second) available.

If the database is small enough to fit in the OS filesystem cache,
then hardly any I/O operations to underlying store will be needed to
satisfy requests.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

R Smith
In reply to this post by curmudgeon
On 2017/08/10 5:54 PM, x wrote:
> Ryan, I know what you’re saying about the list of bytes but I’d like it confirmed that the indexes are not stored in a format similar to the data.

Oh it's similar, just not the same. I looked it up:
http://sqlite.org/fileformat.html
Fascinating reading. Take special note of section 1.5 that discusses the
B-Tree formats at length, and 2.1 for the Record format, which also
clarifies exactly SQLite's varint type that explains how the Integer
packing, which we've referred to in previous posts, happens.

> Do you not think the cost of unpacking the bytes would be insignificant?

Never sure how to answer questions asked in the negative....  YES, I
don't think so...  or NO, I don't think so - which makes more sense?

Jokes aside, I agree, I do think it insignificant - I just don't think
it's significantly more insignificant than the cost of SQLite unpacking
the fields for you. My point is that it is already insignificant enough
given the givens. Any improvement to be had has a very high EPR.

(That's my invented measure: *Effort-to-Pleasure Ratio*.  I strive to
lower it for everything.)  :)

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Packing integer primary key with field bits

Jens Alfke-2
In reply to this post by Nico Williams

> On Aug 10, 2017, at 9:36 AM, Nico Williams <[hidden email]> wrote:
>
> Then proper compression (e.g., with zlib) will be much better for you
> than what you're trying to do.

zlib isn’t fast enough for usage like this. It’s better to use something like Snappy, which was designed to be very fast, at the expense of less compression. It works very well for database content.

If I were going to use it, I’d implement a pair of SQL functions to compress/uncompress blobs, then wrap calls to those around the SQL statements that read and write the column I wanted to compress.

(Of course this is total overkill for storing a couple of integers as discussed here.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12