Quantcast

BLOB sizes beyond 2GB?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

BLOB sizes beyond 2GB?

Andrew Cunningham-2
HI,
Is it likely the maximum BLOB size will be increased in a not too distant
future version of SQLite?

In a world of machines where 1TB memory is not unusual the current  upper
limit of ~2GB is proving to be restrictive for my use.

One might suggest that storing binary data of that size using SQLite is
abusing SQLite and an alternative, such as HDF , should be used.  To a
certain extent that is true, but having to manage multiple files with some
data in SQLite and other data in HDF files brings along another set of
complications. And SQLite reads/writes BLOBs at disk access speeds so
SQLite does not suffer from efficiency issues. Plus SQLite supports partial
BLOB read/write.

Andrew
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB sizes beyond 2GB?

Richard Hipp-3
On 3/27/17, Andrew Cunningham <[hidden email]> wrote:
> HI,
> Is it likely the maximum BLOB size will be increased in a not too distant
> future version of SQLite?
>

The maximum blob size could, in theory, be increased to 4GiB.  But the
current file format will not accommodate anything larger than that.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB sizes beyond 2GB?

Dominique Devienne
On Tue, Mar 28, 2017 at 11:08 AM, Richard Hipp <[hidden email]> wrote:

> On 3/27/17, Andrew Cunningham <[hidden email]> wrote:
> > Is it likely the maximum BLOB size will be increased in a not too distant
> > future version of SQLite?
>
> The maximum blob size could, in theory, be increased to 4GiB.  But the
> current file format will not accommodate anything larger than that.


Any chance you'd elaborate on which the format is blocking here?
I have no doubt you're right, but I'd really appreciate a better
understanding of that limitation.

As a naive developer, I can see the Record Format [1] uses a varint, which
can go up to 64-bit integers.
And also that there are Serial Types 10,11, which are "Not used. Reserved
for expansion".

Which combined with The B-tree Page Format [2], which has only 4 page types,
while a whole bytes is available for blob pages, a new blob-specific page
type would seem possible.

Given the above, I can (wrongly) imagine use Record Type 10 for "new-style
blobs",
which store a varint length for the "blob index" that follows, where that
blob index is a ordered list
of page-specific page numbers (as varints or not) where that blob is stored.

In such a scheme, updating a single byte of a blob requires changing 1 blob
page,
and the page(s) storing the "blob index"; and blobs can also be expanded or
contracted
transitionally w/o having to rewrite the whole blob.

I'm just trying to understand how/where that mental model is wrong. Thanks,
--DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB sizes beyond 2GB?

Paul Sanderson
I am sure Richard will correct me if I am wrong. But...

The format for a record is

1. payload length varint
2. rowid varint (optional)
3. serial type array varint
4. serial types
followed by the data for the serial types

The issue are as I see them:

The payload length varint above, this is the sum of 3 + 4 above plus all of
the following data forming the record. So as things stand you can't store
any record where the sum of the bytes in the serial types array and the
actual data that follows is greater than MAXVARINT because the total length
must be stored in 1. (MAXVARINT is actually max positive varint - see
below).

If you want to use one of the reserved serial types to store a blob of 6GB
then the serial type itself must be capable of storing the size of the
blob. Currently, a blob has *any* serial type of >= 12 and even, so the
maximum size for a blob is (MAXVARINT-12)/2 i.e. *any* even serial type >=
12 and a text serial type is any odd serial type >= 13. All of the
remaining utilised serial types (i.e. those <= 9) refer to fixed length
data (ints and a 64 bit real).

The remaining 2 serial types (remember these are just two bits from a
64-bit serial type, each serial type is not a separate varint in its own
right) could be used to signify something like a 128-bit integer or some
other fixed-length data type, but, 1 bit by definition cannot store an
arbitrary length value.

I guess that the change Richard mentions (to up to 4GB) would be by
treating the varints as unsigned integers, rather than signed as they
currently are. This could be done (as far as I can see) for all varints
other than the rowid without affecting existing DBs.











Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 28 March 2017 at 11:08, Dominique Devienne <[hidden email]> wrote:

> On Tue, Mar 28, 2017 at 11:08 AM, Richard Hipp <[hidden email]> wrote:
>
> > On 3/27/17, Andrew Cunningham <[hidden email]> wrote:
> > > Is it likely the maximum BLOB size will be increased in a not too
> distant
> > > future version of SQLite?
> >
> > The maximum blob size could, in theory, be increased to 4GiB.  But the
> > current file format will not accommodate anything larger than that.
>
>
> Any chance you'd elaborate on which the format is blocking here?
> I have no doubt you're right, but I'd really appreciate a better
> understanding of that limitation.
>
> As a naive developer, I can see the Record Format [1] uses a varint, which
> can go up to 64-bit integers.
> And also that there are Serial Types 10,11, which are "Not used. Reserved
> for expansion".
>
> Which combined with The B-tree Page Format [2], which has only 4 page
> types,
> while a whole bytes is available for blob pages, a new blob-specific page
> type would seem possible.
>
> Given the above, I can (wrongly) imagine use Record Type 10 for "new-style
> blobs",
> which store a varint length for the "blob index" that follows, where that
> blob index is a ordered list
> of page-specific page numbers (as varints or not) where that blob is
> stored.
>
> In such a scheme, updating a single byte of a blob requires changing 1 blob
> page,
> and the page(s) storing the "blob index"; and blobs can also be expanded or
> contracted
> transitionally w/o having to rewrite the whole blob.
>
> I'm just trying to understand how/where that mental model is wrong. Thanks,
> --DD
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB sizes beyond 2GB?

Dominique Devienne
On Tue, Mar 28, 2017 at 12:52 PM, Paul Sanderson <
[hidden email]> wrote:

> I am sure Richard will correct me if I am wrong. But...
>
> The format for a record is
>
> 1. payload length varint
> 2. rowid varint (optional)
> 3. serial type array varint
> 4. serial types
> followed by the data for the serial types
>
> The issue are as I see them:
>
> The payload length varint above, this is the sum of 3 + 4 above plus all of
> the following data forming the record. So as things stand you can't store
> any record where the sum of the bytes in the serial types array and the
> actual data that follows is greater than MAXVARINT because the total length
> must be stored in 1. (MAXVARINT is actually max positive varint - see
> below).
>

Good point. But still, MAXVARINT is 64-bit (see below) not 32-bit.

The record format makes extensive use of the variable-length integer or
> varint representation of 64-bit signed integers defined above.
>


> If you want to use one of the reserved serial types to store a blob of 6GB
> then the serial type itself must be capable of storing the size of the
> blob. Currently, a blob has *any* serial type of >= 12 and even, so the
> maximum size for a blob is (MAXVARINT-12)/2 i.e. *any* even serial type >=
> 12 and a text serial type is any odd serial type >= 13. All of the
> remaining utilised serial types (i.e. those <= 9) refer to fixed length
> data (ints and a 64 bit real).
>

I understand that. That's why I put the length in the "old style" blob
value itself.
But again, the varint encodes a 64-bit signed integer, and the "new style"
blob could
be assumed if the blob length exceed 2GiB (or 4 GiB), not even resorting to
the
two reserved serial types.


> The remaining 2 serial types (remember these are just two bits from a
> 64-bit serial type, each serial type is not a separate varint in its own
> right) could be used to signify something like a 128-bit integer or some
> other fixed-length data type, but, 1 bit by definition cannot store an
> arbitrary length value.
>

I understand that (see above). But using the level of indirection of storing
in the record only the meta-data of the blob, e.g. its full length, its
in-record
length (in case using 10, or 11 serial type, which cannot encode the length
like the traditional text and blob serial types), and the ordered list of
blob
pages to read the blob from, seems completely possible.


> I guess that the change Richard mentions (to up to 4GB) would be by
> treating the varints as unsigned integers, rather than signed as they
> currently are. This could be done (as far as I can see) for all varints
> other than the rowid without affecting existing DBs.
>

That would be an implementation limitation though, not a file format
limitation.

Again, I'm probably naive here, but I still don't clearly see the file
format limitation,
and that's what I'm trying to understand. I completely accept this would be
a lot of
work and that the incentive for Richard to do it is rather low, to
extremely low, although
of course that does bum me out, I have to admit :), but really
understanding the
limitation I'm not seeing now is what I'm after here. Thanks, --DD

PS: The alternate scheme of assuming new-style blob for length > 4 GiB,
which is more backward-compatible, could be further refined via a pragma to
put it lower, make the DB incompatible with older SQLite versions, but no
more
than the many other opt-in features old versions don't support.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BLOB sizes beyond 2GB?

Paul Sanderson
I was a bit/very dull, schoolboy error :( re 32-bit - long week and it's
only Tuesday :)

Re: storing the length in the blob itself this would affect parsing the
serial types where, as now, you can determine the record length by looking
at the serial types and 'skip' through them to load a specific column. If
the length is stored in the record itself then reading past a blob means
that the first part of the blob itself needs to be read. Whether this would
have any significant impact on speed for parsing serial types in general I
don't know.





Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 28 March 2017 at 12:33, Dominique Devienne <[hidden email]> wrote:

> On Tue, Mar 28, 2017 at 12:52 PM, Paul Sanderson <
> [hidden email]> wrote:
>
> > I am sure Richard will correct me if I am wrong. But...
> >
> > The format for a record is
> >
> > 1. payload length varint
> > 2. rowid varint (optional)
> > 3. serial type array varint
> > 4. serial types
> > followed by the data for the serial types
> >
> > The issue are as I see them:
> >
> > The payload length varint above, this is the sum of 3 + 4 above plus all
> of
> > the following data forming the record. So as things stand you can't store
> > any record where the sum of the bytes in the serial types array and the
> > actual data that follows is greater than MAXVARINT because the total
> length
> > must be stored in 1. (MAXVARINT is actually max positive varint - see
> > below).
> >
>
> Good point. But still, MAXVARINT is 64-bit (see below) not 32-bit.
>
> The record format makes extensive use of the variable-length integer or
> > varint representation of 64-bit signed integers defined above.
> >
>
>
> > If you want to use one of the reserved serial types to store a blob of
> 6GB
> > then the serial type itself must be capable of storing the size of the
> > blob. Currently, a blob has *any* serial type of >= 12 and even, so the
> > maximum size for a blob is (MAXVARINT-12)/2 i.e. *any* even serial type
> >=
> > 12 and a text serial type is any odd serial type >= 13. All of the
> > remaining utilised serial types (i.e. those <= 9) refer to fixed length
> > data (ints and a 64 bit real).
> >
>
> I understand that. That's why I put the length in the "old style" blob
> value itself.
> But again, the varint encodes a 64-bit signed integer, and the "new style"
> blob could
> be assumed if the blob length exceed 2GiB (or 4 GiB), not even resorting to
> the
> two reserved serial types.
>
>
> > The remaining 2 serial types (remember these are just two bits from a
> > 64-bit serial type, each serial type is not a separate varint in its own
> > right) could be used to signify something like a 128-bit integer or some
> > other fixed-length data type, but, 1 bit by definition cannot store an
> > arbitrary length value.
> >
>
> I understand that (see above). But using the level of indirection of
> storing
> in the record only the meta-data of the blob, e.g. its full length, its
> in-record
> length (in case using 10, or 11 serial type, which cannot encode the length
> like the traditional text and blob serial types), and the ordered list of
> blob
> pages to read the blob from, seems completely possible.
>
>
> > I guess that the change Richard mentions (to up to 4GB) would be by
> > treating the varints as unsigned integers, rather than signed as they
> > currently are. This could be done (as far as I can see) for all varints
> > other than the rowid without affecting existing DBs.
> >
>
> That would be an implementation limitation though, not a file format
> limitation.
>
> Again, I'm probably naive here, but I still don't clearly see the file
> format limitation,
> and that's what I'm trying to understand. I completely accept this would be
> a lot of
> work and that the incentive for Richard to do it is rather low, to
> extremely low, although
> of course that does bum me out, I have to admit :), but really
> understanding the
> limitation I'm not seeing now is what I'm after here. Thanks, --DD
>
> PS: The alternate scheme of assuming new-style blob for length > 4 GiB,
> which is more backward-compatible, could be further refined via a pragma to
> put it lower, make the DB incompatible with older SQLite versions, but no
> more
> than the many other opt-in features old versions don't support.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...