SQLite scalability


SQLite scalability

Baruch Burstein-2
Hi all,

I know SQLite is supposed to support DB sizes in the TB (I think the
default configuration can reach 1TB). I am curious if anyone actually uses
SQLite anywhere near this. Does anyone use it regularly for DBs 500GB+
in size, or with tables containing 10 billion+ rows? How much concurrency
does your use require? How long do selects take (assuming indexes are set
correctly)? Are there problems with locking? ("Normal" SQLite doesn't usually
suffer from locking since transactions are very quick, but if transactions
can be on the order of hundreds of milliseconds, I think locking can easily
happen if the DB is accessed concurrently a few times a second, though I am
not sure whether this applies only to writes.)

I understand that the answer to most of these questions can be very
hardware (and software) dependent, but I am just trying to get a feel for
SQLite's applicability for a project I am working on that may reach limits
like these.

Thanks,
Baruch

--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: SQLite scalability

R Smith
Hi Baruch,

I have some DBs that top a TB - the biggest is 1.7TB - but I have to confess I use it only for testing. However, I have seen people
post on this forum about DBs that dwarf mine by several orders of magnitude, and I have seen people talk about a query running for
several days. There is nothing intrinsic in SQLite that prevents you from having an insanely big DB if you can manage the physical
space. Some OS restrictions may apply, and you will need to raise the page-size settings - but you are really looking to implement
DB sizes that would be considered the low end of the spectrum, so there is no need to get into the finer points here.
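To make the page-size point concrete: a database's maximum size is page_size times max_page_count, so raising the page size before the schema is created raises the ceiling. A minimal sketch using Python's built-in sqlite3 module (the values are illustrative, not a tuning recommendation):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "big.db")
conn = sqlite3.connect(path)

# page_size must be set before the first table is written (or be followed
# by a VACUUM); 65536 bytes is the largest page size SQLite supports.
conn.execute("PRAGMA page_size = 65536")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.commit()

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]
# Theoretical size ceiling in bytes is the product of these two numbers.
print(page_size, max_pages)
conn.close()
```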

Further to this, query execution time in SQLite will be close to that of any RDBMS you can find - mostly faster, sometimes slower,
depending on the query. Your biggest speed concern will be your physical hardware, not the RDBMS.

What you should consider more carefully is whether you will need any distributed or multi-tiered access, or any form of user access
control... in which case SQLite is not your cup of tea.

This page is the best all-round description of when to use - and when NOT to use - SQLite, and it will probably give you the best
"feel" for it:
http://www.sqlite.org/whentouse.html

Hope this helps,
Ryan


On 2013/11/21 11:43, Baruch Burstein wrote:

> [snip]

Re: SQLite scalability

Eduardo Morras-2
In reply to this post by Baruch Burstein-2
On Thu, 21 Nov 2013 11:43:32 +0200
Baruch Burstein <[hidden email]> wrote:

> [snip]

It depends on what kind of use you need for your data. If you are going to do a lot of inserts, updates and deletes, perhaps SQLite isn't for you.

If its principal use is for selects, then when you populate the tables, do it pre-sorted by the primary key, or by the column which makes the highest-quality index for your use (look for 'low-quality indexes' in the SQLite docs, and do the opposite).
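As a sketch of the pre-sorted load (Python's sqlite3; the table and column names are made up for illustration): sorting the batch by the INTEGER PRIMARY KEY means each row is appended at the right-hand edge of the table's B-tree rather than inserted into the middle, which avoids page splits during a large load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(3, "c"), (1, "a"), (2, "b")]   # arrives unsorted
with conn:
    # Sort by the primary key before the bulk insert so the B-tree only
    # ever grows at its right-hand edge.
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     sorted(rows, key=lambda r: r[0]))

print(conn.execute("SELECT id FROM events ORDER BY rowid").fetchall())
# -> [(1,), (2,), (3,)]
```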

Normalize your DB as much as you can, but no more ;).

SQLite allows you to attach up to 30 DB files. Split your data between 2-3 DBs and put each one on a different disk.
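A sketch of that split (Python's sqlite3; the file names are hypothetical - in practice each file would sit on a different disk). Note that the run-time default is 10 attached databases; the ceiling is a compile-time option.

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
main_db = os.path.join(tmp, "main.db")   # hypothetical: disk 1
hot_db = os.path.join(tmp, "hot.db")     # hypothetical: disk 2

conn = sqlite3.connect(main_db)
conn.execute(f"ATTACH DATABASE '{hot_db}' AS hot")

# Each table lives in whichever file suits it.
conn.execute("CREATE TABLE main.archive (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE hot.recent (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO hot.recent VALUES (1, 'x')")
conn.commit()

# Queries span both files transparently.
total = conn.execute(
    "SELECT (SELECT count(*) FROM main.archive) + (SELECT count(*) FROM hot.recent)"
).fetchone()[0]
print(total)  # -> 1
```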

Increase the cache size before creating any table; other RDBMSs use a lot of memory, from several MB to several GB, so give the SQLite cache 500MB-1GB, for example.

Write your queries in different ways and test which is better; there are some tricks with indexes, joins, etc. that can help you (study the documentation).

Enable STAT3 or STAT4 (the compile-time options SQLITE_ENABLE_STAT3/STAT4) and run ANALYZE on your DB data.

Set auto_vacuum to FULL before creating any table, even if you don't plan to delete/update data. If I remember correctly (if not, please correct me), auto_vacuum adds metadata to the DB file that allows the SQLite engine to do some internal work faster.
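The last three tips boil down to per-database setup performed before (or immediately after) creating the schema. A hedged sketch in Python - the pragma values are illustrative, and STAT3/STAT4 only change how much detail ANALYZE collects when compiled in:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tuned.db")
conn = sqlite3.connect(path)

conn.execute("PRAGMA auto_vacuum = FULL")    # must precede the first CREATE TABLE
conn.execute("PRAGMA cache_size = -500000")  # negative means KiB: roughly 500 MB

conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b TEXT)")
conn.execute("CREATE INDEX t_b ON t(b)")
conn.executemany("INSERT INTO t(b) VALUES (?)", [("x",), ("y",), ("y",)])
conn.commit()

# ANALYZE fills sqlite_stat1 (plus sqlite_stat3/stat4 when compiled in),
# which the query planner consults when choosing indexes.
conn.execute("ANALYZE")
stats = conn.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]
print(stats)
conn.close()
```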



---   ---
Eduardo Morras <[hidden email]>

Re: SQLite scalability

Valentin Davydov-2
In reply to this post by Baruch Burstein-2
On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote:
> Hi all,
>
> I know SQLite is supposed to support DB sizes in the TB (I think the
> default configuration can reach 1TB). I am curious if anyone actually uses
> SQLite anywhere near this.

Yes.

> Does anyone use it regularly for DBs 500GB+
> in size, or with tables containing 10 billion rows+?

I've got an installation with an SQLite DB several terabytes in size.
It contains about 20 billion thoroughly indexed records, and grows every
day (more precisely, every night) by a few tens of millions of new records.

> How much concurrency does your use require?

I've spent some effort to eliminate concurrency in the application. That is,
updates and selects occur at very different times of the day.

> How long do selects take (assuming indexes are set correctly?)

It depends on the size of the select. A single row is selected instantaneously.
A uniqueness check takes about 1-2 minutes per 1 million records, most of the
time being spent parsing SQL commands. Aggregate functions over a substantial
fraction of the entire database, of course, take too long to be executed in
real time.

> Are there problems of locking ("normal" SQLite doesn't usually
> suffer from locking since transactions are very quick, but if transactions
> can be in the order of 100's of ms, I think locking can easily happen if
> the DB is accessed concurrently a few times a second, though I am not sure
> if this may only apply to writes).

Yes. A single transaction (the insertion of those tens of millions of new
records) takes hours in the worst case.

> I understand that the answer to most of these questions can be very
> hardware (and software) dependent,

Indeed not so. The only hardware capable of storing such an amount of data
is an array of magnetic disks, and their latency (about 10-20 ms for
random access) is much more than any reasonable software overhead. Even the
caches (the internal SQLite page cache and/or the operating system file
cache) occupy the same memory and therefore have almost the same effect. The
only software that determines the performance is SQLite itself; in my case,
perhaps, the tree-rebalancing algorithm.

> but I am just trying to get a feel for
> SQLite's applicability for a project I am working on that may reach limits
> like these.

The only definitive SQLite limits are documented in the relevant manual page.

Valentin Davydov.

Re: SQLite scalability

Howard Chu
Valentin Davydov wrote:

> On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote:
>> Hi all,
>>
>> I know SQLite is supposed to support DB sizes in the TB (I think the
>> default configuration can reach 1TB). I am curious if anyone actually uses
>> SQLite anywhere near this.
>
> Yes.
>
>> Does anyone use it regularly for DBs 500GB+
>> in size, or with tables containing 10 billion rows+?
>
> I've got an installation with an SQLite DB several terabytes in size.
> It contains about 20 billion thoroughly indexed records, and grows every
> day (more precisely, every night) by a few tens of millions of new records.
>
>> How much concurrency does your use require?
>
> I've spent some effort to eliminate concurrency in the application. That is,
> updates and selects occur at very different times of the day.
>
>> How long do selects take (assuming indexes are set correctly?)
>
> It depends on the size of the select. A single row is selected
> instantaneously. A uniqueness check takes about 1-2 minutes per 1 million
> records, most of the time being spent parsing SQL commands. Aggregate
> functions over a substantial fraction of the entire database, of course,
> take too long to be executed in real time.

Use SQLightning instead - concurrency issues are irrelevant then, since
writers don't block readers. And it will search multiple gigabytes per second,
as opposed to your millions-per-minute figure above.

>> Are there problems of locking ("normal" SQLite doesn't usually
>> suffer from locking since transactions are very quick, but if transactions
>> can be in the order of 100's of ms, I think locking can easily happen if
>> the DB is accessed concurrently a few times a second, though I am not sure
>> if this may only apply to writes).
>
> Yes. A single transaction (the insertion of those tens of millions of new
> records) takes hours in the worst case.

There's no good reason for insertion of 10 million records to take hours.
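For illustration, a sketch of the point in plain SQLite (Python's sqlite3; the row count is arbitrary): batching the whole load into a single transaction means the engine syncs once at commit instead of once per row, which on magnetic disks is usually the difference between minutes and hours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")

N = 1_000_000
with conn:  # one transaction for the whole batch: a single commit, a single sync
    conn.executemany("INSERT INTO t(v) VALUES (?)", ((i,) for i in range(N)))

count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)  # -> 1000000
```

On an on-disk database one would typically also consider PRAGMA journal_mode = WAL and PRAGMA synchronous = NORMAL (or OFF, for a load that can be redone from scratch) before such a bulk insert.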

>> I understand that the answer to most of these questions can be very
>> hardware (and software) dependent,
>
> Indeed not so. The only hardware capable of storing such an amount of data
> is an array of magnetic disks, and their latency (about 10-20 ms for
> random access) is much more than any reasonable software overhead. Even the
> caches (the internal SQLite page cache and/or the operating system file
> cache) occupy the same memory and therefore have almost the same effect.
> The only software that determines the performance is SQLite itself; in my
> case, perhaps, the tree-rebalancing algorithm.

1TB SSDs are only ~$500; there's no reason to limit yourself to the slowness
of magnetic disks these days.

http://www.amazon.com/Samsung-Electronics-EVO-Series-2-5-Inch-MZ-7TE1T0BW/dp/B00E3W16OU

SQLightning uses only the OS filesystem cache, so you get maximal use of the
available system RAM instead of wasting half of it with redundant copies in
application-level caches.

>> but I am just trying to get a feel for
>> SQLite's applicability for a project I am working on that may reach limits
>> like these.
>
> The only definitive SQLite limits are documented in the relevant manual page.

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/