Write performance

Write performance

Sean Heber
SQLite write (insert) performance doesn't seem very good to me, so I'm
trying to find ways to improve it in my application.  I had this idea
so I figured I'd run it past you folks, who likely know better how the
db works under the hood.

My thought is that perhaps the write performance suffers because the
data is written into the data file in some particular sorted order.
Thus it sometimes needs to reorder what is already there to insert
content into the middle of the file, rather than simply appending new
content at the end.

I've been inserting data into a table with a non-integer primary key.
The data does not arrive in an ordered fashion.  My thought is that
perhaps performance suffers because SQLite needs to rewrite huge
chunks of the data file to slot the new data into its correct position
based on the sorting of the primary key.  Would I get a performance
improvement if I gave my table an auto_increment integer primary key
and instead just added an index on the column I'm currently using as
the primary key?  The thinking is that the data would then be stored
in the file in numerical order of the auto_increment column, so new
rows would usually be appended at or near the end of the data file,
which is a much faster operation than inserting something into the
middle of a large (and growing) file.

Does that make any sense at all or am I reaching too far here?  :-)

l8r
Sean


Re: Write performance

Jay Sprenkle
On 6/17/05, Sean Heber <[hidden email]> wrote:
> SQLite write (insert) performance doesn't seem very good to me, so
> I'm trying to find ways to improve it in my application.  I had this
> idea so I figured I'd run it past you folks, who likely know better
> how the db works under the hood.

did you wrap your inserts in a transaction?
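
For instance, a minimal sketch using the C API (the table, column and
error handling here are illustrative, not from your app):

        #include <sqlite3.h>

        /* Insert a batch of rows inside ONE transaction so SQLite
         * syncs the database file once per batch, not once per row. */
        int insert_batch(sqlite3 *db, const char **values, int n)
        {
                int i, rc = sqlite3_exec(db, "BEGIN", 0, 0, 0);
                for (i = 0; i < n && rc == SQLITE_OK; i++) {
                        char *sql = sqlite3_mprintf(
                                "INSERT INTO my_table(val) VALUES(%Q)",
                                values[i]);
                        rc = sqlite3_exec(db, sql, 0, 0, 0);
                        sqlite3_free(sql);
                }
                return sqlite3_exec(db,
                        rc == SQLITE_OK ? "COMMIT" : "ROLLBACK", 0, 0, 0);
        }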

Re: Write performance

Cory Nelson
In reply to this post by Sean Heber
are you using a transaction?  if not, go read the wiki.

On 6/17/05, Sean Heber <[hidden email]> wrote:

> SQLite write (insert) performance doesn't seem very good to me, so
> I'm trying to find ways to improve it in my application.  I had this
> idea so I figured I'd run it past you folks, who likely know better
> how the db works under the hood.


--
Cory Nelson
http://www.int64.org

Re: Write performance

D. Richard Hipp
In reply to this post by Sean Heber
On Fri, 2005-06-17 at 14:02 -0500, Sean Heber wrote:
> I've been inserting data into a table with a non-integer primary key.
> The data does not arrive in an ordered fashion.  My thought is that
> perhaps performance suffers because SQLite needs to rewrite huge
> chunks of the data file to slot the new data into its correct
> position based on the sorting of the primary key.

No.  SQLite uses the B-Tree algorithm.  Inserting a new value in
the middle of a table involves rewriting 4 or 5 pages in the worst
case.  The usual case is to rewrite just the one page where the value
is being inserted.
--
D. Richard Hipp <[hidden email]>


Re: Write performance

Sean Heber
In reply to this post by Jay Sprenkle
>> SQLite write (insert) performance doesn't seem very good to me, so
>> I'm trying to find ways to improve it in my application.  I had this
>> idea so I figured I'd run it past you folks, who likely know better
>> how the db works under the hood.
>>
>
> did you wrap your inserts in a transaction?

I don't think I can do that, because the inserts come in at random
intervals as they are generated by outside influences.  I'm not just
doing an import of existing data; the data is created on the fly by
factors somewhat outside of my control.  I tried making an in-memory
temporary table, inserting into that, and periodically doing a bulk
"INSERT INTO real_table SELECT * FROM memory_table" inside a
transaction, but that didn't seem to make a significant difference.
Perhaps I've got something else wrong.
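
Schematically, what I tried looked something like this (table names
are placeholders for my real schema):

        /* keep the temp table in memory, not in a temp file */
        sqlite3_exec(db, "PRAGMA temp_store = MEMORY", 0, 0, 0);
        sqlite3_exec(db, "CREATE TEMP TABLE memory_table AS "
                         "SELECT * FROM real_table WHERE 0", 0, 0, 0);

        /* ... one INSERT INTO memory_table per incoming item ... */

        /* periodically, flush the staged rows in one transaction */
        sqlite3_exec(db, "BEGIN;"
                         "INSERT INTO real_table SELECT * FROM memory_table;"
                         "DELETE FROM memory_table;"
                         "COMMIT;", 0, 0, 0);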

Thanks for the suggestion.

l8r
Sean


Re: Write performance

Sean Heber
In reply to this post by D. Richard Hipp
> No.  SQLite uses the B-Tree algorithm.  Inserting a new value in
> the middle of a table involves rewriting 4 or 5 pages in the worst
> case.  The usual case is to rewrite just the one page where the value
> is being inserted.

Okay, that makes sense.  Perhaps my problem is that my data is too
big.  Is a page around 1k?  I often insert blob data in the 64k range;
perhaps that is my problem.

l8r
Sean


Re: Write performance

Eric Scouten
Sean Heber wrote:
> Okay, that makes sense.  Perhaps my problem is that my data is too
> big.  Is a page around 1k?  I often insert blob data in the 64k
> range; perhaps that is my problem.

My experience is that for chunks of data above something like 20K,
you're better off (performance-wise) writing them to separate,
individual files than storing them as SQLite blobs.
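
A sketch of that approach (the schema and path scheme are
illustrative, and it assumes a "blobs" directory already exists):

        #include <stdio.h>
        #include <sqlite3.h>

        /* Write the large payload to its own file and store only the
         * file's path in the database row. */
        int store_payload(sqlite3 *db, int id, const void *buf, size_t len)
        {
                char path[64];
                snprintf(path, sizeof path, "blobs/%d.bin", id);

                FILE *f = fopen(path, "wb");
                if (f == NULL)
                        return -1;
                size_t written = fwrite(buf, 1, len, f);
                fclose(f);
                if (written != len)
                        return -1;

                char *sql = sqlite3_mprintf(
                        "INSERT INTO items(id, path) VALUES(%d, %Q)",
                        id, path);
                int rc = sqlite3_exec(db, sql, 0, 0, 0);
                sqlite3_free(sql);
                return rc;
        }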

-Eric

Re: Write performance

Christian Smith
In reply to this post by Sean Heber
On Fri, 17 Jun 2005, Sean Heber wrote:

>> did you wrap your inserts in a transaction?
>
> I don't think I can do that, because the inserts come in at random
> intervals as they are generated by outside influences.  I'm not just
> doing an import of existing data; the data is created on the fly by
> factors somewhat outside of my control.  I tried making an in-memory
> temporary table, inserting into that, and periodically doing a bulk
> "INSERT INTO real_table SELECT * FROM memory_table" inside a
> transaction, but that didn't seem to make a significant difference.
> Perhaps I've got something else wrong.


If data is coming in slowly, then doing a transaction per data object is
no problem.

If data is coming in fast, then handling multiple data objects per
transaction is a must for efficiency.

Therefore, why not do both?  Poll for data objects to be processed,
and insert into the current transaction until no more data is
available or some threshold is reached:

/* wait_for_data(), data_available(), get_data() and insert_item()
 * are placeholders for however the application receives and stores
 * its items. */
void process(sqlite3 *db) {
        int threshold = 100;            /* max inserts per transaction */
        while (1) {
                int datacount = 0;
                wait_for_data();        /* block until something arrives */
                while (data_available() && datacount < threshold) {
                        void *item = get_data();  /* one item from the source */
                        if (0 == datacount)
                                sqlite3_exec(db, "BEGIN", 0, 0, 0);
                        insert_item(db, item);    /* one INSERT inside the open transaction */
                        datacount++;
                }
                if (datacount > 0)
                        sqlite3_exec(db, "COMMIT", 0, 0, 0);
        }
}

In this case, the consumer will batch up to threshold inserts per
transaction, increasing efficiency.  If the producer stops producing
data, the consumer will find no more data and exit its inner loop.
The threshold prevents the transaction (and its latency) from growing
indefinitely.

For tuning, you could even make the threshold dynamic: start small,
and increase its value up to some maximum whenever the current
threshold is exceeded.
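
E.g., continuing the sketch above (MAX_THRESHOLD being whatever cap
suits the application):

        /* after the inner loop: if the batch filled up completely,
         * the producer is outrunning us, so allow bigger batches */
        if (datacount == threshold && threshold < MAX_THRESHOLD)
                threshold *= 2;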


--
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \