sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Olivier Mascia
Hello,

What are use cases for these sqlite3_serialize / deserialize?
I understand what they do, from the documentation.
Though I'd like to understand what typically they were introduced for?
(Always trying to learn something here). :)

--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Clemens Ladisch
Olivier Mascia wrote:
> What are use cases for these sqlite3_serialize / deserialize?

A common question is "how do I save my in-memory database to disk?"
(The common answer is "use the backup API" or "use an on-disk DB
in the first place".)

Anyway, there are cases where you have a blob containing a database
(e.g., embedded in a blob field in another database, or stored in flash
in an embedded system) and do not want to write it to a file on order
to be able to access it.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Richard Hipp-3
In reply to this post by Olivier Mascia
On 3/18/18, Olivier Mascia <[hidden email]> wrote:
> Hello,
>
> What are use cases for these sqlite3_serialize / deserialize?
> I understand what they do, from the documentation.
> Though I'd like to understand what typically they were introduced for?
> (Always trying to learn something here). :)

These APIs support the concept of using small databases (small enough
to fit in memory) as a container for passing information around.

For example:  A client program might communicate with a server using
an HTTP request and sending an SQLite database as the POST data.  Or
the server might send an SQLite database as the reply.  If I had had
good sense, I might have designed Fossil to do this for the "sync" and
"clone" commands, rather than the elaborate protocol
(https://www.fossil-scm.org/fossil/doc/trunk/www/sync.wiki) it
currently uses.  Git does send a database for sync and clone, though
it is a bespoke "pack-file" key/value database, not an SQLite
database, and that is one aspect of Git that I think is superior to
Fossil.

Advantages of using an SQLite database as a container:

(1) It is easy to mix text and binary data without having to worry
with encodings.

(2) Applications can be easily enhanced and extended in
backwards-compatible ways by adding new columns and/or tables.

(3) Easy to manually view or modify the container content during
testing and debugging. (JSON also has this property, but JSON does not
work with binary data.)

(4) No need to write encoder/decoder logic.

(5) There is a high-level query language available to extract the
content in an order that might be very different from the order it
appears in the file.

(6) No need to worry with big-endian vs. little-endian translation,
nor UTF8 vs UTF16.  The database handles that automatically.

The first use-case for sqlite3_deserialize() was in the "sessionfuzz"
test program which we use together with AFL to fuzz-test the changeset
objects associated with the session extension.  The original
"sessionfuzz" accepted one or more command-line arguments where each
argument was a file containing a single changeset.  But, we also now
allow any of the arguments to be an SQLite database file containing
many separate changesets.  (An example of such a database is in the
test/sessionfuzz-data1.db file in the repository.)  The sessionfuzz
program reads each file named on the command-line into memory.  Then
it checks to see if the file begins with "SQLite format 3".  If it
does, that means the file is an SQLite database and it is processed
accordingly.  If not, it assumes the file is a corrupted changeset
that is a fuzzer input.  The sqlite3_deserialize() interface
streamlines the decoding process in cases where the chunk of memory
containing the file does begin with "SQLite format 3".

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Olivier Mascia
> Le 18 mars 2018 à 11:31, Richard Hipp <[hidden email]> a écrit :
>
>> What are use cases for these sqlite3_serialize / deserialize?
>> I understand what they do, from the documentation.
>> Though I'd like to understand what typically they were introduced for?
>> (Always trying to learn something here). :)
>
> These APIs support the concept of using small databases (small enough
> to fit in memory) as a container for passing information around.
>
> ...
> Advantages of using an SQLite database as a container:
>
> (1) It is easy to mix text and binary data without having to worry
> with encodings.
>
> (2) Applications can be easily enhanced and extended in
> backwards-compatible ways by adding new columns and/or tables.
>
> (3) Easy to manually view or modify the container content during
> testing and debugging. (JSON also has this property, but JSON does not
> work with binary data.)
>
> (4) No need to write encoder/decoder logic.
>
> (5) There is a high-level query language available to extract the
> content in an order that might be very different from the order it
> appears in the file.
>
> (6) No need to worry with big-endian vs. little-endian translation,
> nor UTF8 vs UTF16.  The database handles that automatically.

Clever concept.

I'll get code and play with it, but even before that, one question burns my mind: what about page size? Is the serialized format quite compact, or full of void unused space on 'pages'? (that obviously a good external compression would get rid of). Or said differently, how far or close is the serialized format to the on-disk SQLite file format?

Thanks,
--
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Richard Hipp-3
On 3/18/18, Olivier Mascia <[hidden email]> wrote:
>
> Is the serialized format quite compact, or full
> of void unused space on 'pages'? (that obviously a good external compression
> would get rid of). Or said differently, how far or close is the serialized
> format to the on-disk SQLite file format?

The serialization format is exactly the on-disk format.  So you can
write the serialization into a file, then open that file as a
database.  Or you can read a file off of disk into memory then
"deserialize" that blob into a database.

Normally when you open a :memory: database, the pages are spread out
in memory at arbitrary locations.  But with sqlite3_deserialize(),
SQLite keeps all the pages in one contiguous blob.

As for size, I have found that using page_size=512 gives maximum space
efficiency.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite3_serialize / sqlite3_deserialize (3.23.0 draft)

Dominique Devienne
On Sun, Mar 18, 2018 at 4:48 PM, Richard Hipp <[hidden email]> wrote:
> These APIs support the concept of using small databases (small enough
> to fit in memory) as a container for passing information around.

I very much like the concept. Thank you for this addition.

But then, this is begging for a JSON1-like extension to access an SQLite DB
inside a blob-column :)

Sure, that's denormalized, we all know that happens (case in point JSON1)
and for some types of
data like arrays (whose elements have no "natural keys"), that would be one
way. But then the 100 bytes
SQLite header, 1-page sqlite_master (for 1 table to store something into),
and at east 1 page for that table
puts the overhead of the first byte of data to 100+512+512 = 1124 byte, too
much for smallish data...


> On 3/18/18, Olivier Mascia <[hidden email]> wrote:
> > Is the serialized format quite compact, or full
> > of void unused space on 'pages'? (that obviously a good external
> compression
> > would get rid of). Or said differently, how far or close is the
> serialized
> > format to the on-disk SQLite file format?
>
> The serialization format is exactly the on-disk format.  So you can
> write the serialization into a file, then open that file as a
> database.  Or you can read a file off of disk into memory then
> "deserialize" that blob into a database.
>
> Normally when you open a :memory: database, the pages are spread out
> in memory at arbitrary locations.  But with sqlite3_deserialize(),
> SQLite keeps all the pages in one contiguous blob.
>
> As for size, I have found that using page_size=512 gives maximum space
> efficiency.
>

I guess Olivier's "compactness" question could also be viewed in terms of
"empty" pages after deletes,
or partially empty pages, i.e. are in-memory DBs always implicitly in a
"vaccumed" state, w/o any "unused" pages in the middle? --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users