Random-access sequences

Random-access sequences

Matthias-Christian Ott
I know that this question is not strictly related to SQLite.

I want to persist a random-access sequence with SQLite. A random-access
sequence is a queue with random-access to its queued elements. Suppose
that a random-access sequence [x_0, ..., x_n] has the following operations:

enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
lookup([x_0, ..., x_n], i) = x_i
update([x_0, ..., x_n], i, y) =
  [x_0, ..., x_(i - 1), y, x_(i + 1), ..., x_n]
length([x_0, ..., x_n]) = n + 1

A naive list-based implementation would have at least one operation
that runs in O(n).

A possible implementation in SQLite with all operations in O(log n)
could be:

CREATE TABLE sequence (
  position INTEGER PRIMARY KEY AUTOINCREMENT,
  ...
);

-- enqueue
INSERT INTO sequence VALUES (...);

-- dequeue
SELECT position, ... FROM sequence ORDER BY position ASC LIMIT 1;
DELETE FROM sequence WHERE position = :position;

-- lookup
SELECT position AS start, ... FROM sequence
  ORDER BY position ASC LIMIT 1;
SELECT ... FROM sequence WHERE position = :start + :i;

-- update
SELECT position AS start, ... FROM sequence
  ORDER BY position ASC LIMIT 1;
UPDATE sequence SET ... WHERE position = :start + :i;

-- length
SELECT position AS start, ... FROM sequence
  ORDER BY position ASC LIMIT 1;
SELECT position AS end, ... FROM sequence
  ORDER BY position DESC LIMIT 1;
SELECT :end - :start + 1;
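
(Note that dequeue is a SELECT followed by a DELETE, so under concurrent
access the two statements have to run in one transaction to be atomic; a
minimal sketch:)

-- dequeue, atomically
BEGIN IMMEDIATE;
SELECT position, ... FROM sequence ORDER BY position ASC LIMIT 1;
DELETE FROM sequence WHERE position = :position;
COMMIT;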

Unfortunately, this limits the maximum number of elements that can ever
be inserted during a table's life-time to 2^63 - 1. While this might be
acceptable in some cases, it is an artificial limitation.

It seems that one way that is not subject to such limitations is to
abuse tables as lists and use algorithms for purely functional data
structures. There is an algorithm that could be used to implement
random-access sequences in SQLite efficiently [1] but it's neither
elegant nor simple.

Does anyone have another idea how to elegantly implement random-access
sequences in O(log n) in SQLite?

- Matthias-Christian

[1] Robert Hood and Robert Melville. Real-time queue operations in pure
    LISP

Re: Random-access sequences

Stephan Beal-3
On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott <[hidden email]>
wrote:

> Unfortunately, this limits the maximum number of elements that can ever
> be inserted during a table's life-time to 2^63 - 1. While this might be
> acceptable in some cases it is an artificial limitation.
>

Artificial, yes, but so is "64 bits." You will likely hit other limitations
far before getting anywhere near 2^63-1 insertions:

https://www.sqlite.org/limits.html

e.g. point #13:

*Maximum Number Of Rows In A Table*

The theoretical maximum number of rows in a table is 2^64 (18446744073709551616
or about 1.8e+19). This limit is unreachable since the maximum database
size of 140 terabytes will be reached first. A 140 terabytes database can
hold no more than approximately 1e+13 rows, and then only if there are no
indices and if each row contains very little data.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf

Re: Random-access sequences

Graham Holden
In reply to this post by Matthias-Christian Ott


-------- Original message --------
From: Stephan Beal <[hidden email]>
Date: 01/03/2016  12:07  (GMT+00:00)
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Random-access sequences
 
> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott <[hidden email]>
> wrote:

>> Unfortunately, this limits the maximum number of elements that can ever
>> be inserted during a table's life-time to 2^63 - 1. While this might be
>> acceptable in some cases it is an artificial limitation.
>>

> Artificial, yes, but so is "64 bits." You will likely hit other limitations
> far before getting anywhere near 2^63-1 insertions:

> https://www.sqlite.org/limits.html

> e.g. point #13:

> *Maximum Number Of Rows In A Table*

I don't think he's bothered about the maximum number of rows at one time, but that he might run out of new rowids. However, this feels like a needless concern: at 1,000 enqueue/dequeue operations per second, 2^63 IDs will last 292 million years...
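
(As a rough check of that figure: 2^63 is about 9.2 * 10^18; at 10^3 IDs
per second that is 9.2 * 10^15 seconds, which at roughly 3.16 * 10^7
seconds per year comes to about 2.9 * 10^8, i.e. 292 million, years.)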

Graham.

Re: Random-access sequences

Matthias-Christian Ott
In reply to this post by Stephan Beal-3
On 01/03/16 12:07, Stephan Beal wrote:

> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott <[hidden email]>
> wrote:
>
>> Unfortunately, this limits the maximum number of elements that can ever
>> be inserted during a table's life-time to 2^63 - 1. While this might be
>> acceptable in some cases it is an artificial limitation.
>>
>
> Artificial, yes, but so is "64 bits." You will likely hit other limitations
> far before getting anywhere near 2^63-1 insertions:
>
> https://www.sqlite.org/limits.html
>
> e.g. point #13:
>
> *Maximum Number Of Rows In A Table*
>
> The theoretical maximum number of rows in a table is 2^64 (18446744073709551616
> or about 1.8e+19). This limit is unreachable since the maximum database
> size of 140 terabytes will be reached first. A 140 terabytes database can
> hold no more than approximately 1e+13 rows, and then only if there are no
> indices and if each row contains very little data.

I use SQLite as a buffer for small amounts of data, so I'm nowhere near
the limits of the database. However, the 2^63 - 1 limit for autoincrement
columns applies to the life-time of the table, not to the size of the
database. I'm aware that pushing 2^63 - 1 elements through the queue is
not realistic even in the largest deployments. But saying that your
software breaks after 2^63 - 1 elements feels somewhat like saying that a
name can only be 20 characters because some mainframe software can't
handle more than that. It feels like you have failed as a software
developer to deliver a proper solution and now pretend that the improper
solution is proper because nobody will hit its limits.

Re: Random-access sequences

Richard Damon
In reply to this post by Stephan Beal-3
On 3/1/16 7:07 AM, Stephan Beal wrote:

> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott <[hidden email]>
> wrote:
>
>> Unfortunately, this limits the maximum number of elements that can ever
>> be inserted during a table's life-time to 2^63 - 1. While this might be
>> acceptable in some cases it is an artificial limitation.
>>
> Artificial, yes, but so is "64 bits." You will likely hit other limitations
> far before getting anywhere near 2^63-1 insertions:
>
> https://www.sqlite.org/limits.html
>
> e.g. point #13:
>
> *Maximum Number Of Rows In A Table*
>
> The theoretical maximum number of rows in a table is 2^64 (18446744073709551616
> or about 1.8e+19). This limit is unreachable since the maximum database
> size of 140 terabytes will be reached first. A 140 terabytes database can
> hold no more than approximately 1e+13 rows, and then only if there are no
> indices and if each row contains very little data.
>
You can hit 2^63 insertions well before hitting the size limit of the
database if you have also been doing deletions.

A simple repeated pattern of Add record N, Remove record N-1 will
eventually use up the limit on record numbers.

The answer here is to detect when the record number is getting near
'too big' and then renumber the records back into sequence from 1,
resetting the next record number. This might require changing data
elsewhere if record numbers have been saved anywhere, which means it
might make sense not to wait until overflow is imminent, but to do it
proactively when few records (ideally none) are referenced elsewhere
(and only in known locations).
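
A sketch of one way to do the renumbering in SQLite (the payload column
is hypothetical, standing in for whatever the table actually stores;
rebuilding into a fresh table avoids the primary-key collisions that an
in-place UPDATE of the positions could run into):

BEGIN EXCLUSIVE;
CREATE TABLE sequence_new (
  position INTEGER PRIMARY KEY AUTOINCREMENT,
  payload  -- hypothetical payload column
);
-- reinsert in queue order; AUTOINCREMENT hands out fresh positions 1..N
INSERT INTO sequence_new (payload)
  SELECT payload FROM sequence ORDER BY position;
DROP TABLE sequence;
-- RENAME also updates the table's sqlite_sequence entry
ALTER TABLE sequence_new RENAME TO sequence;
COMMIT;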

2^63 is such a big number that you can likely find a time during standard
PM (preventive maintenance) to do this with the database offline before
things overflow. At one insertion per nanosecond, you have almost 3
centuries before this needs to be done (if I am doing my math right), so
you can do it when you migrate to a new box.
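
(Checking that math: 2^63 nanoseconds is about 9.2 * 10^9 seconds, or
roughly 292 years, so "almost 3 centuries" holds.)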

With 32-bit numbers these limits were readily reachable; 64 bits was a
quantum leap in range, adding 32 doublings to the limit, and it will take
a while for workloads to catch up (though they likely will). The
140-terabyte limit is something that I could see being hit now in some
applications.

--
Richard Damon


Re: Random-access sequences

Matthias-Christian Ott
In reply to this post by Matthias-Christian Ott
On 01/03/16 11:59, Matthias-Christian Ott wrote:

> I know that this question is not strictly related to SQLite.
>
> I want to persist a random-access sequence with SQLite. A random-access
> sequence is a queue with random-access to its queued elements. Suppose
> that a random-access sequence [x_0, ..., x_n] has the following operations:
>
> enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
> dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
> lookup([x_0, ..., x_n], i) = x_i
> update([x_0, ..., x_n], i, y) =
>   [x_0, ..., x_(i - 1), y, x_(i + 1), ..., x_n]
> length([x_0, ..., x_n]) = n + 1

Thinking a bit more about it, I also need the following operation:

delete([x_0, ..., x_n], i) = [x_0, ..., x_(i - 1), x_(i + 1), ..., x_n]

So implementing the data structure with autoincrement does not work anymore.
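
To make the problem concrete: once delete() leaves gaps, position is no
longer start + i, and a plain lookup degenerates into an OFFSET scan that
reads i rows, i.e. O(n) rather than O(log n):

-- lookup with gaps: correct, but linear in i
SELECT ... FROM sequence ORDER BY position ASC LIMIT 1 OFFSET :i;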

- Matthias-Christian


Re: Random-access sequences

Wade, William
In RAM, the simple implementation would be to have a balanced tree, ordered by index, where every node knows how many elements are below it and has pointers to its children. The tree will have O(logN) depth, following a pointer is O(1), and all of your operations involve a small (constant-bound) number of nodes at each level.

In a typical indexed database, following a pointer (finding the row for a key) takes O(logN) time (rather than O(1) time), and it seems most of your operations would now cost O((logN)^2) time.

You could hack your database's B-tree implementation, so that each B-Tree page "knows" the number of records "below" the page. At that point you would essentially be putting the RAM implementation into a file (with an optimistic assumption that Read(file, page#) is a constant-time operation in the file system).

In either of the above, auto-increment works fine as long as the total number of enqueue() calls over the lifetime of the database is not too large, but in addition to the autoincrement, you need to update the "parents" of inserted or removed records.
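
For contrast, this is what rank lookup costs in plain SQL without any
counted structure (the payload column is hypothetical); the correlated
COUNT is answered by scanning the index, so each lookup is linear rather
than O((logN)^2):

-- lookup(i): the row with exactly i smaller positions
SELECT payload FROM sequence AS s
 WHERE (SELECT COUNT(*) FROM sequence AS t
         WHERE t.position < s.position) = :i;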

Regards

-----Original Message-----
From: Matthias-Christian Ott [mailto:[hidden email]]
Sent: Tuesday, March 01, 2016 10:38 AM
To: [hidden email]
Subject: Re: [sqlite] Random-access sequences

On 01/03/16 11:59, Matthias-Christian Ott wrote:

> I know that this question is not strictly related to SQLite.
>
> I want to persist a random-access sequence with SQLite. A
> random-access sequence is a queue with random-access to its queued
> elements. Suppose that a random-access sequence [x_0, ..., x_n] has the following operations:
>
> enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
> dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
> lookup([x_0, ..., x_n], i) = x_i
> update([x_0, ..., x_n], i, y) =
>   [x_0, ..., x_(i - 1), y, x_(i + 1), ..., x_n]
> length([x_0, ..., x_n]) = n + 1

Thinking a bit more about it, I also need the following operation:

delete([x_0, ..., x_n], i) = [x_0, ..., x_(i - 1), x_(i + 1), ..., x_n]

So implementing the data structure with autoincrement does not work anymore.

- Matthias-Christian




Re: Random-access sequences

James K. Lowden
In reply to this post by Richard Damon
On Tue, 1 Mar 2016 08:15:25 -0500
Richard Damon <[hidden email]> wrote:

> > The theoretical maximum number of rows in a table is 2^64
> > (18446744073709551616 or about 1.8e+19). This limit is unreachable
> > since the maximum database size of 140 terabytes will be reached
> > first. A 140 terabytes database can hold no more than approximately
> > 1e+13 rows, and then only if there are no indices and if each row
> > contains very little data.
> >
> You can hit 2^63 insertions well before hitting the size limit of the
> database if you have also been doing deletions.

Yes.  If you manage 1,000,000 insertions/second, that's 3.15576 *
10^13/year.  You would run out of integers in about 292,271 years.

To get around that, add an "epoch" column, also integer.
Initially it is always zero.  Whenever "position" exceeds 2^63,
increment "epoch" and reset "position" to zero.  

That will give you at least twice as many years.  
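
A sketch of that scheme (the payload column is hypothetical, and the
application is assumed to bump :epoch and reset :position before
position overflows):

CREATE TABLE sequence (
  epoch    INTEGER NOT NULL DEFAULT 0,
  position INTEGER NOT NULL,
  payload,
  PRIMARY KEY (epoch, position)
);

-- enqueue
INSERT INTO sequence (epoch, position, payload)
  VALUES (:epoch, :position, :payload);

-- dequeue: the oldest element is the smallest (epoch, position) pair
SELECT epoch, position, payload FROM sequence
  ORDER BY epoch ASC, position ASC LIMIT 1;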

--jkl

Re: Random-access sequences

R Smith


On 2016/03/02 2:26 AM, James K. Lowden wrote:

> On Tue, 1 Mar 2016 08:15:25 -0500
> Richard Damon <[hidden email]> wrote:
>
>>> The theoretical maximum number of rows in a table is 2^64
>>> (18446744073709551616 or about 1.8e+19). This limit is unreachable
>>> since the maximum database size of 140 terabytes will be reached
>>> first. A 140 terabytes database can hold no more than approximately
>>> 1e+13 rows, and then only if there are no indices and if each row
>>> contains very little data.
>>>
>> You can hit 2^63 insertions well before hitting the size limit of the
>> database if you have also been doing deletions.
> Yes.  If you manage 1,000,000 insertions/second, that's 3.15576 *
> 10^13/year.  You would run out of integers in about 292,271 years.
>
> To get around that, add an "epoch" column, also integer.
> Initially it is always zero.  Whenever "position" exceeds 2^63,
> increment "epoch" and reset "position" to zero.
>
> That will give you at least twice as many years.

No, that will give you 2^63 times as many years... which is many, many
times the age of the Universe, and still far fewer entries than there
are atoms in the known Universe (about 10^80 at last count). Good idea,
I think it will be enough. :)
