Large databases howto


Large databases howto

Pedro Pascual
Hi,

We are evaluating SQLite for a huge database: around 1,000 million
records split across 200 files. Does anyone have experience with this,
or tricks/howtos to share?

The characteristics of our project:

    * The environment is a Unix box (IBM pSeries 64 bit) with fast
      (USCSI-3) disks.
    * No record deletes
    * Most of the (history) data will have a low update rate, and will
      be used mainly for queries.
    * Heavy inserts into the active file (5,000,000 per day), which is
      closed every day (no further inserts)
    * Just two indexes on the data.

Regards,
Pedro Pascual
Re: Large databases howto

D. Richard Hipp
On Thu, 2005-06-02 at 17:53 +0200, Pedro Pascual wrote:
>     * Heavy inserts in active file (5000000 per day), closed every day
>       (no more inserts)

If you will be querying from the active file, then
it will work best to batch your inserts.  Perhaps
insert into a temporary table.  Then when the temporary
table accumulates a few thousand entries, do this:

    INSERT INTO maintable SELECT * FROM temptable;
    DELETE FROM temptable;

If you try to do 5 million separate INSERT statements
(without a BEGIN...END wrapper around groups of INSERTs)
then each INSERT will be a separate transaction.  5
million inserts per day is about 58 transactions per second,
which is aggressive.  It will work better to batch your
inserts in order to keep the number of transactions down
to 1 or 2 per second.
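
A minimal sketch of such a batch wrapped in a single transaction
(the table and its columns here are made up purely for illustration):

    -- hypothetical table; substitute your real schema
    CREATE TABLE IF NOT EXISTS maintable(id INTEGER, ts TEXT, payload TEXT);

    BEGIN;
    INSERT INTO maintable VALUES (1, '2005-06-02 10:00:00', 'row one');
    INSERT INTO maintable VALUES (2, '2005-06-02 10:00:01', 'row two');
    -- ... a few thousand more INSERTs ...
    COMMIT;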

--
D. Richard Hipp <[hidden email]>

Re: Large databases howto

Tom Poindexter
In reply to this post by Pedro Pascual
On Thu, Jun 02, 2005 at 05:53:21PM +0200, Pedro Pascual wrote:

> Hi,
>
> We are evaluating using sqlite for a huge database: around 1000 millions
> records splitted into 200 files. Any experiences about this and
> tricks/howto's?
>
> The characteristics of our project:
>
>    * The environment is a Unix box (IBM pSeries 64 bit) with fast
>      (USCSI-3) disks.
>    * No record deletes
>    * Most of the (history) data will have a low update rate, and will
>      be used mainly for queries.
>    * Heavy inserts in active file (5000000 per day), closed every day
>      (no more inserts)
>    * Just two indexes on the data.
>
> Regards,
> Pedro Pascual


I don't see any problems right away; 5 million rows in a table is
easily handled by SQLite.  Richard's posting is good advice: if you
can batch your inserts, you will likely get better performance.
Also see the performance page if you haven't already:

    http://www.sqlite.org/cvstrac/wiki?p=PerformanceConsiderations

You might also want to experiment with page sizes greater than the
default, and the number of cache pages for queries.  
See the documentation for 'PRAGMA page_size = bytes;' and
'PRAGMA cache_size = Number-of-pages;'
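
A rough sketch (the numbers are only placeholders to experiment with,
not recommendations):

    PRAGMA page_size = 8192;    -- must be set before any tables are created
    PRAGMA cache_size = 10000;  -- number of pages kept in the page cache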

It sounds like you are creating a new database every day (and thus
a separate file).  Will you be attaching other databases for queries?
Your application sounds suspiciously like a real-time logging
application - if you don't allow queries on the current day's data,
perhaps create your table without indexes, then build the indexes in
batch once the next day's file is active?
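
Roughly, assuming a hypothetical log table with the two indexes you
mentioned:

    -- while the day's file is active: table only, no indexes
    CREATE TABLE log(ts TEXT, key TEXT, payload TEXT);

    -- once the file is closed and the next day's file is active:
    CREATE INDEX log_ts_idx  ON log(ts);
    CREATE INDEX log_key_idx ON log(key);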

I have a reasonably sized SQLite application running on Linux,
Windows, and AIX.  My application uses a single database file, has
approximately 35 tables.  Several of my tables are small, less than
50,000 rows, but a few tables have up to 60-80 million rows.  I do
mostly inserts and updates, and use the Tcl interface exclusively.  
My database file can run up to 60-70 gigabytes in file size.

I would also recommend running any queries and
updates beforehand through my IdxChk utility:

    http://www.sqlite.org/cvstrac/wiki?p=IdxChk

IdxChk should help spot any poorly performing queries ahead of time,
and verify that your indexes are being used.



--
Tom Poindexter
[hidden email]
http://www.nyx.net/~tpoindex/
Re: Large databases howto

Pedro Pascual
In reply to this post by Pedro Pascual
Thanks for your answers. They seem encouraging.

A few extra comments and questions:

    * We are doing tests on the active/daily file (5M records) with
      BEGIN...END batches of 1000 inserts (with 2 indexes), and it
      seems to be OK. We also need to do some queries/updates on this
      file (around 100K a day), but it seems that SQLite can cope with
      them.
    * What about querying/updating the roughly 200 (6 months of)
      historical data files (5M records each)? We know of the
      limitation of attaching at most 32 files. Any advice on improving
      the performance of querying such a huge database?
    * We are thinking of merging 7 daily files (5M records, with 2
      indexes, each file) into one weekly file. What is the optimal way
      of doing this? (One possible approach is sketched after this
      list.)
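
One approach we are considering, only a rough sketch (file and table
names here are placeholders), is to ATTACH each daily file to the
weekly one and copy the rows with INSERT ... SELECT:

    -- open the weekly file, e.g.:  sqlite3 week_22.db
    ATTACH DATABASE 'day_2005_06_02.db' AS day1;
    INSERT INTO main.records SELECT * FROM day1.records;
    DETACH DATABASE day1;
    -- repeat for the remaining six daily files, then create the
    -- two indexes on the weekly table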

Thanks again,
Pedro Pascual

Pedro Pascual wrote:

> Hi,
>
> We are evaluating using sqlite for a huge database: around 1000
> millions records splitted into 200 files. Any experiences about this
> and tricks/howto's?
>
> The characteristics of our project:
>
>    * The environment is a Unix box (IBM pSeries 64 bit) with fast
>      (USCSI-3) disks.
>    * No record deletes
>    * Most of the (history) data will have a low update rate, and will
>      be used mainly for queries.
>    * Heavy inserts in active file (5000000 per day), closed every day
>      (no more inserts)
>    * Just two indexes on the data.
>
> Regards,
> Pedro Pascual
>